[
  {
    "path": ".claude/plans/gha-gpu-runner.plan.md",
    "content": "# Plan: GCP GPU Runner for Integration Tests\n\n**Created:** 2026-03-16\n**Branch:** feat/gha-gpu-runner\n**Status:** Implemented (pending `pulumi up` and PR merge)\n\n## Overview\n\nThe python-audio-separator integration tests currently run on a CPU-only self-hosted\nGHA runner (`e2-standard-4`, 4 vCPU, 16GB RAM). With the new ensemble tests and\nmulti-stem verification tests, CI takes 30+ minutes because each model separation runs\non CPU. A GPU runner would reduce this to ~5 minutes.\n\n## Current State\n\n### Existing runner infrastructure\n- **Location:** `karaoke-gen/infrastructure/compute/github_runners.py` (Pulumi)\n- **Runners:** 3× `e2-standard-4` (general) + 1× `e2-standard-8` (Docker builds)\n- **Labels:** `self-hosted`, `Linux`, `X64`, `gcp`, `large-disk`\n- **Region:** `us-central1-a`\n- **Models:** Pre-cached at `/opt/audio-separator-models` on runner startup\n- **Org-level:** Runners are registered to `nomadkaraoke` org, available to all repos\n- **NAT:** All runners use Cloud NAT (no external IPs)\n\n### Current integration test workflow\n- File: `.github/workflows/run-integration-tests.yaml`\n- Runs on: `self-hosted` (picks up any org runner)\n- Tests: `poetry run pytest -sv --cov=audio_separator tests/integration`\n- Installs: `poetry install -E cpu`\n- Problem: All model inference on CPU → very slow for Roformer/Demucs models\n\n## Requirements\n\n- [x] GCE VM with NVIDIA GPU (T4 is cheapest, sufficient for inference)\n- [x] CUDA drivers + PyTorch GPU support pre-installed\n- [x] Models pre-cached on persistent disk (same as existing runners)\n- [x] Labeled `gpu` so workflow can target it specifically\n- [x] Cost-effective — only runs when needed (on-demand, not always-on)\n- [x] Integration test workflow updated to use `gpu` label\n- [x] Install `poetry install -E gpu` (onnxruntime-gpu) instead of `-E cpu`\n\n## Technical Approach\n\n### Option A: Dedicated GPU VM (simplest)\n\nAdd a new GPU runner VM to the existing Pulumi 
infrastructure. Use an `n1-standard-4`\nwith 1× NVIDIA T4 GPU. Cost: ~$0.61/hr on-demand or ~$0.19/hr spot for the whole VM (the T4 alone is ~$0.35/hr on-demand, ~$0.11/hr spot).\n\n**Pros:** Simple, fits existing patterns, fast startup (VM already running)\n**Cons:** Always-on cost if left running, or a slow cold start if started on demand\n\n### Option B: Spot GPU VM with startup/shutdown management\n\nSame as A, but use spot pricing and the existing runner_manager Cloud Function to\nstart/stop the VM based on CI demand.\n\n**Pros:** ~70% cheaper (~$0.19/hr vs ~$0.61/hr), fits existing management pattern\n**Cons:** Spot can be preempted mid-test (rare for short jobs); cold start ~2-3 min\n\n### Option C: Use a cloud GPU service (Modal, Lambda Labs, etc.)\n\nRun the integration tests on a cloud GPU service rather than on a self-hosted runner.\n\n**Pros:** No infrastructure to manage, pay-per-second\n**Cons:** More complex CI integration, different from existing patterns\n\n### Recommendation: Option B (Spot GPU VM)\n\nThe integration tests take <10 minutes on GPU, so spot preemption risk is low.\nA cold start is acceptable because runs are triggered by PR events. Cost: ~$0.03 per CI run.\n\n## Implementation Steps\n\n### 1. Pulumi infrastructure (in karaoke-gen repo)\n\n1. [x] Add `GITHUB_GPU_RUNNER` machine type to `config.py`: `n1-standard-4` + 1× T4\n2. [x] Add `GPU_RUNNER_LABELS` to `config.py`: `\"self-hosted,linux,x64,gcp,gpu\"`\n3. [x] Create GPU runner VM in `github_runners.py`:\n   - `n1-standard-4` (4 vCPU, 15GB RAM)\n   - 1× NVIDIA T4 GPU (`nvidia-tesla-t4`)\n   - `guest_accelerators` config\n   - `on_host_maintenance: \"TERMINATE\"` (required for GPU VMs)\n   - Same NAT/networking as existing runners\n4. [x] Create GPU startup script (`github_runner_gpu.sh`):\n   - Install NVIDIA drivers and the CUDA toolkit via the CUDA apt repo (cuda-drivers + cuda-toolkit-12-4)\n   - Verify GPU: `nvidia-smi`\n   - Pre-download models to `/opt/audio-separator-models`\n   - Register as GHA runner with `gpu` label\n5. [x] Add spot scheduling for cost optimization
\n6. [ ] Run `pulumi up` to create the VM\n\n### 2. Workflow update (in python-audio-separator repo)\n\n7. [x] Update `run-integration-tests.yaml`:\n   - Change `runs-on: self-hosted` to `runs-on: [self-hosted, gpu]`\n   - Change `poetry install -E cpu` to `poetry install -E gpu`\n   - Add `nvidia-smi` verification step\n   - Add 30-minute timeout\n8. [ ] Add fallback: if no GPU runner is available, fall back to CPU with a longer timeout\n   - Deferred: not needed initially because the runner_manager auto-starts the GPU VM on demand\n\n### 3. Startup script details\n\nThe GPU startup script needs to:\n```bash\n# Install NVIDIA drivers from NVIDIA's CUDA apt repo (Debian 12)\nwget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get install -y linux-headers-$(uname -r) cuda-drivers\n\n# Verify GPU\nnvidia-smi\n\n# Install the CUDA toolkit (PyTorch bundles its own CUDA runtime, so for\n# PyTorch the driver is what matters most)\nsudo apt-get install -y cuda-toolkit-12-4\n\n# Pre-download models\npip install \"audio-separator[gpu]\"\npython -c \"\nfrom audio_separator.separator import Separator\nsep = Separator(model_file_dir='/opt/audio-separator-models')\n# Download all models used in integration tests\nmodels = [\n    'model_bs_roformer_ep_317_sdr_12.9755.ckpt',\n    'mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt',\n    'MGM_MAIN_v4.pth',\n    'UVR-MDX-NET-Inst_HQ_4.onnx',\n    'kuielab_b_vocals.onnx',\n    '2_HP-UVR.pth',\n    'htdemucs_6s.yaml',\n    'htdemucs_ft.yaml',\n    # Ensemble preset models\n    'bs_roformer_vocals_resurrection_unwa.ckpt',\n    'melband_roformer_big_beta6x.ckpt',\n    'bs_roformer_vocals_revive_v2_unwa.ckpt',\n    'mel_band_roformer_kim_ft2_bleedless_unwa.ckpt',\n    'bs_roformer_vocals_revive_v3e_unwa.ckpt',\n    'mel_band_roformer_vocals_becruily.ckpt',\n    'mel_band_roformer_vocals_fv4_gabox.ckpt',\n    'mel_band_roformer_instrumental_fv7z_gabox.ckpt',\n    'bs_roformer_instrumental_resurrection_unwa.ckpt',\n    'melband_roformer_inst_v1e_plus.ckpt',\n    'mel_band_roformer_instrumental_becruily.ckpt',
\n    'mel_band_roformer_instrumental_instv8_gabox.ckpt',\n    'UVR-MDX-NET-Inst_HQ_5.onnx',\n    'mel_band_roformer_karaoke_gabox_v2.ckpt',\n    'mel_band_roformer_karaoke_becruily.ckpt',\n    # Multi-stem test models\n    '17_HP-Wind_Inst-UVR.pth',\n    'MDX23C-DrumSep-aufr33-jarredou.ckpt',\n    'dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt',\n]\nfor m in models:\n    sep.download_model_and_data(m)\n\"\n```\n\n## Cost Estimate\n\n| Config | Hourly | Est. run time | Per CI run | Monthly (est. 100 runs) |\n|--------|--------|---------------|------------|-------------------------|\n| n1-standard-4 + T4 (on-demand) | $0.61 | ~10 min | $0.10 | $10 |\n| n1-standard-4 + T4 (spot) | $0.19 | ~10 min | $0.03 | $3 |\n| Current CPU (e2-standard-4) | $0.13 | ~30 min | $0.07 | $7 |\n\nSpot GPU is cheaper per run than the current CPU runner because GPU runs finish several times faster (~5-10 min vs 30+ min).\n\n## Files to Create/Modify\n\n| File | Repo | Action |\n|------|------|--------|\n| `infrastructure/config.py` | karaoke-gen | Add GPU machine type + labels |\n| `infrastructure/compute/github_runners.py` | karaoke-gen | Add GPU runner VM |\n| `infrastructure/compute/startup_scripts/github_runner_gpu.sh` | karaoke-gen | GPU-specific startup |\n| `.github/workflows/run-integration-tests.yaml` | python-audio-separator | Target GPU runner |\n\n## Open Questions\n\n- [x] Should the GPU runner be spot or on-demand? → **Spot** ($0.19/hr, ~$3/mo)\n- [x] Should we keep the CPU fallback for when the GPU runner is unavailable? → **Deferred** (runner_manager auto-starts VM)\n- [x] Should the runner startup script install NVIDIA drivers from scratch each boot,\n      or use a pre-built GCP Deep Learning VM image? → **From scratch** (idempotent, matches existing pattern)\n- [x] Zone availability: T4 GPUs may not be available in us-central1-a → **Available** in all us-central1 zones (a, b, c, f)\n\n## Rollback Plan\n\nThe GPU runner is additive infrastructure. If it fails:\n1. Change the workflow back to `runs-on: self-hosted` (CPU)
\n2. Destroy just the GPU VM with `pulumi destroy --target <resource URN>`\n"
  },
  {
    "path": ".cursor/commands/analyze.md",
    "content": "---\ndescription: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGoal: Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/tasks` has successfully produced a complete `tasks.md`.\n\nSTRICTLY READ-ONLY: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually).\n\nConstitution Authority: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/analyze`.\n\nExecution steps:\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths:\n   - SPEC = FEATURE_DIR/spec.md\n   - PLAN = FEATURE_DIR/plan.md\n   - TASKS = FEATURE_DIR/tasks.md\n   Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command).\n\n2. 
Load artifacts:\n   - Parse spec.md sections: Overview/Context, Functional Requirements, Non-Functional Requirements, User Stories, Edge Cases (if present).\n   - Parse plan.md: Architecture/stack choices, Data Model references, Phases, Technical constraints.\n   - Parse tasks.md: Task IDs, descriptions, phase grouping, parallel markers [P], referenced file paths.\n   - Load constitution `.specify/memory/constitution.md` for principle validation.\n\n3. Build internal semantic models:\n   - Requirements inventory: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., \"User can upload file\" -> `user-can-upload-file`).\n   - User story/action inventory.\n   - Task coverage mapping: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases).\n   - Constitution rule set: Extract principle names and any MUST/SHOULD normative statements.\n\n4. Detection passes:\n   A. Duplication detection:\n      - Identify near-duplicate requirements. Mark lower-quality phrasing for consolidation.\n   B. Ambiguity detection:\n      - Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria.\n      - Flag unresolved placeholders (TODO, TKTK, ???, <placeholder>, etc.).\n   C. Underspecification:\n      - Requirements with verbs but missing object or measurable outcome.\n      - User stories missing acceptance criteria alignment.\n      - Tasks referencing files or components not defined in spec/plan.\n   D. Constitution alignment:\n      - Any requirement or plan element conflicting with a MUST principle.\n      - Missing mandated sections or quality gates from constitution.\n   E. Coverage gaps:\n      - Requirements with zero associated tasks.\n      - Tasks with no mapped requirement/story.\n      - Non-functional requirements not reflected in tasks (e.g., performance, security).\n   F. 
Inconsistency:\n      - Terminology drift (same concept named differently across files).\n      - Data entities referenced in plan but absent in spec (or vice versa).\n      - Task ordering contradictions (e.g., integration tasks before foundational setup tasks without a dependency note).\n      - Conflicting requirements (e.g., one requires Next.js as the framework while another specifies Vue).\n\n5. Severity assignment heuristic:\n   - CRITICAL: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality.\n   - HIGH: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion.\n   - MEDIUM: Terminology drift, missing non-functional task coverage, underspecified edge case.\n   - LOW: Style/wording improvements, minor redundancy not affecting execution order.\n\n6. Produce a Markdown report (no file writes) with sections:\n\n   ### Specification Analysis Report\n   | ID | Category | Severity | Location(s) | Summary | Recommendation |\n   |----|----------|----------|-------------|---------|----------------|\n   | D1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version |\n   (Add one row per finding; generate stable IDs prefixed by category initial.)\n\n   Additional subsections:\n   - Coverage Summary Table:\n     | Requirement Key | Has Task? | Task IDs | Notes |\n   - Constitution Alignment Issues (if any)\n   - Unmapped Tasks (if any)\n   - Metrics:\n     * Total Requirements\n     * Total Tasks\n     * Coverage % (requirements with >=1 task)\n     * Ambiguity Count\n     * Duplication Count\n     * Critical Issues Count
\n\n7. At the end of the report, output a concise Next Actions block:\n   - If CRITICAL issues exist: Recommend resolving them before `/implement`.\n   - If only LOW/MEDIUM issues exist: The user may proceed, but provide improvement suggestions.\n   - Provide explicit command suggestions: e.g., \"Run /specify with refinement\", \"Run /plan to adjust architecture\", \"Manually edit tasks.md to add coverage for 'performance-metrics'\".\n\n8. Ask the user: \"Would you like me to suggest concrete remediation edits for the top N issues?\" (Do NOT apply them automatically.)\n\nBehavior rules:\n- NEVER modify files.\n- NEVER hallucinate missing sections; if absent, report them.\n- KEEP findings deterministic: if rerun without changes, produce consistent IDs and counts.\n- LIMIT total findings in the main table to 50; aggregate the remainder in a summarized overflow note.\n- If zero issues are found, emit a success report with coverage statistics and a recommendation to proceed.\n\nContext: $ARGUMENTS\n"
  },
  {
    "path": ".cursor/commands/clarify.md",
    "content": "---\ndescription: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGoal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file.\n\nNote: This clarification workflow is expected to run (and be completed) BEFORE invoking `/plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases.\n\nExecution steps:\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields:\n   - `FEATURE_DIR`\n   - `FEATURE_SPEC`\n   - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.)\n   - If JSON parsing fails, abort and instruct user to re-run `/specify` or verify feature branch environment.\n\n2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. 
Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked).\n\n   Functional Scope & Behavior:\n   - Core user goals & success criteria\n   - Explicit out-of-scope declarations\n   - User roles / personas differentiation\n\n   Domain & Data Model:\n   - Entities, attributes, relationships\n   - Identity & uniqueness rules\n   - Lifecycle/state transitions\n   - Data volume / scale assumptions\n\n   Interaction & UX Flow:\n   - Critical user journeys / sequences\n   - Error/empty/loading states\n   - Accessibility or localization notes\n\n   Non-Functional Quality Attributes:\n   - Performance (latency, throughput targets)\n   - Scalability (horizontal/vertical, limits)\n   - Reliability & availability (uptime, recovery expectations)\n   - Observability (logging, metrics, tracing signals)\n   - Security & privacy (authN/Z, data protection, threat assumptions)\n   - Compliance / regulatory constraints (if any)\n\n   Integration & External Dependencies:\n   - External services/APIs and failure modes\n   - Data import/export formats\n   - Protocol/versioning assumptions\n\n   Edge Cases & Failure Handling:\n   - Negative scenarios\n   - Rate limiting / throttling\n   - Conflict resolution (e.g., concurrent edits)\n\n   Constraints & Tradeoffs:\n   - Technical constraints (language, storage, hosting)\n   - Explicit tradeoffs or rejected alternatives\n\n   Terminology & Consistency:\n   - Canonical glossary terms\n   - Avoided synonyms / deprecated terms\n\n   Completion Signals:\n   - Acceptance criteria testability\n   - Measurable Definition of Done style indicators\n\n   Misc / Placeholders:\n   - TODO markers / unresolved decisions\n   - Ambiguous adjectives (\"robust\", \"intuitive\") lacking quantification\n\n   For each category with Partial or Missing status, add a candidate question opportunity unless:\n   - Clarification would not materially change implementation or validation strategy\n   - Information 
is better deferred to planning phase (note internally)\n\n3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints:\n    - Maximum of 5 total questions across the whole session.\n    - Each question must be answerable with EITHER:\n       * A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR\n       * A one-word / short‑phrase answer (explicitly constrain: \"Answer in <=5 words\").\n   - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation.\n   - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved.\n   - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness).\n   - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests.\n   - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic.\n\n4. 
Sequential questioning loop (interactive):\n    - Present EXACTLY ONE question at a time.\n    - For multiple‑choice questions render options as a Markdown table:\n\n       | Option | Description |\n       |--------|-------------|\n       | A | <Option A description> |\n       | B | <Option B description> |\n       | C | <Option C description> | (add D/E as needed up to 5)\n       | Short | Provide a different short answer (<=5 words) | (Include only if free-form alternative is appropriate)\n\n    - For short‑answer style (no meaningful discrete options), output a single line after the question: `Format: Short answer (<=5 words)`.\n    - After the user answers:\n       * Validate the answer maps to one option or fits the <=5 word constraint.\n       * If ambiguous, ask for a quick disambiguation (count still belongs to same question; do not advance).\n       * Once satisfactory, record it in working memory (do not yet write to disk) and move to the next queued question.\n    - Stop asking further questions when:\n       * All critical ambiguities resolved early (remaining queued items become unnecessary), OR\n       * User signals completion (\"done\", \"good\", \"no more\"), OR\n       * You reach 5 asked questions.\n    - Never reveal future queued questions in advance.\n    - If no valid questions exist at start, immediately report no critical ambiguities.\n\n5. 
Integration after EACH accepted answer (incremental update approach):\n    - Maintain in-memory representation of the spec (loaded once at start) plus the raw file contents.\n    - For the first integrated answer in this session:\n       * Ensure a `## Clarifications` section exists (create it just after the highest-level contextual/overview section per the spec template if missing).\n       * Under it, create (if not present) a `### Session YYYY-MM-DD` subheading for today.\n    - Append a bullet line immediately after acceptance: `- Q: <question> → A: <final answer>`.\n    - Then immediately apply the clarification to the most appropriate section(s):\n       * Functional ambiguity → Update or add a bullet in Functional Requirements.\n       * User interaction / actor distinction → Update User Stories or Actors subsection (if present) with clarified role, constraint, or scenario.\n       * Data shape / entities → Update Data Model (add fields, types, relationships) preserving ordering; note added constraints succinctly.\n       * Non-functional constraint → Add/modify measurable criteria in Non-Functional / Quality Attributes section (convert vague adjective to metric or explicit target).\n       * Edge case / negative flow → Add a new bullet under Edge Cases / Error Handling (or create such subsection if template provides placeholder for it).\n       * Terminology conflict → Normalize term across spec; retain original only if necessary by adding `(formerly referred to as \"X\")` once.\n    - If the clarification invalidates an earlier ambiguous statement, replace that statement instead of duplicating; leave no obsolete contradictory text.\n    - Save the spec file AFTER each integration to minimize risk of context loss (atomic overwrite).\n    - Preserve formatting: do not reorder unrelated sections; keep heading hierarchy intact.\n    - Keep each inserted clarification minimal and testable (avoid narrative drift).\n\n6. 
Validation (performed after EACH write plus final pass):\n   - Clarifications session contains exactly one bullet per accepted answer (no duplicates).\n   - Total asked (accepted) questions ≤ 5.\n   - Updated sections contain no lingering vague placeholders the new answer was meant to resolve.\n   - No contradictory earlier statement remains (confirm now-invalid alternative choices were removed).\n   - Markdown structure valid; only allowed new headings: `## Clarifications`, `### Session YYYY-MM-DD`.\n   - Terminology consistency: same canonical term used across all updated sections.\n\n7. Write the updated spec back to `FEATURE_SPEC`.\n\n8. Report completion (after the questioning loop ends or terminates early):\n   - Number of questions asked & answered.\n   - Path to updated spec.\n   - Sections touched (list names).\n   - Coverage summary table listing each taxonomy category with Status: Resolved (was Partial/Missing and addressed), Deferred (exceeds question quota or better suited for planning), Clear (already sufficient), Outstanding (still Partial/Missing but low impact).\n   - If any Outstanding or Deferred remain, recommend whether to proceed to `/plan` or run `/clarify` again later post-plan.\n   - Suggested next command.\n\nBehavior rules:\n- If no meaningful ambiguities are found (or all potential questions would be low-impact), respond: \"No critical ambiguities detected worth formal clarification.\" and suggest proceeding.\n- If the spec file is missing, instruct the user to run `/specify` first (do not create a new spec here).\n- Never exceed 5 total asked questions (clarification retries for a single question do not count as new questions).\n- Avoid speculative tech stack questions unless the absence blocks functional clarity.\n- Respect user early termination signals (\"stop\", \"done\", \"proceed\").\n- If no questions are asked due to full coverage, output a compact coverage summary (all categories Clear), then suggest advancing.\n- If the quota is reached with unresolved high-impact categories remaining, explicitly flag them under Deferred with rationale.
\n\nContext for prioritization: $ARGUMENTS\n"
  },
  {
    "path": ".cursor/commands/constitution.md",
    "content": "---\ndescription: Create or update the project constitution from interactive or provided principle inputs, ensuring all dependent templates stay in sync.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nYou are updating the project constitution at `.specify/memory/constitution.md`. This file is a TEMPLATE containing placeholder tokens in square brackets (e.g. `[PROJECT_NAME]`, `[PRINCIPLE_1_NAME]`). Your job is to (a) collect/derive concrete values, (b) fill the template precisely, and (c) propagate any amendments across dependent artifacts.\n\nFollow this execution flow:\n\n1. Load the existing constitution template at `.specify/memory/constitution.md`.\n   - Identify every placeholder token of the form `[ALL_CAPS_IDENTIFIER]`.\n   **IMPORTANT**: The user might require fewer or more principles than the ones used in the template. If a number is specified, respect it and follow the general template. You will update the doc accordingly.\n\n2. Collect/derive values for placeholders:\n   - If user input (conversation) supplies a value, use it.\n   - Otherwise infer from existing repo context (README, docs, prior constitution versions if embedded).\n   - For governance dates: `RATIFICATION_DATE` is the original adoption date (if unknown, ask or mark TODO), `LAST_AMENDED_DATE` is today if changes are made, otherwise keep the previous value.\n   - `CONSTITUTION_VERSION` must increment according to semantic versioning rules:\n     * MAJOR: Backward-incompatible governance/principle removals or redefinitions.\n     * MINOR: New principle/section added or materially expanded guidance.\n     * PATCH: Clarifications, wording, typo fixes, non-semantic refinements.\n   - If the version bump type is ambiguous, propose reasoning before finalizing.
\n\n3. Draft the updated constitution content:\n   - Replace every placeholder with concrete text (no bracketed tokens left except intentionally retained template slots that the project has chosen not to define yet; explicitly justify any left).\n   - Preserve the heading hierarchy. Template comments can be removed once replaced unless they still add clarifying guidance.\n   - Ensure each Principle section has a succinct name line, a paragraph (or bullet list) capturing non‑negotiable rules, and an explicit rationale if not obvious.\n   - Ensure the Governance section lists the amendment procedure, versioning policy, and compliance review expectations.\n\n4. Consistency propagation checklist (convert prior checklist into active validations):\n   - Read `.specify/templates/plan-template.md` and ensure any \"Constitution Check\" or rules align with updated principles.\n   - Read `.specify/templates/spec-template.md` for scope/requirements alignment; update it if the constitution adds/removes mandatory sections or constraints.\n   - Read `.specify/templates/tasks-template.md` and ensure task categorization reflects new or removed principle-driven task types (e.g., observability, versioning, testing discipline).\n   - Read each command file in `.specify/templates/commands/*.md` (including this one) to verify no outdated references (agent-specific names like CLAUDE only) remain when generic guidance is required.\n   - Read any runtime guidance docs (e.g., `README.md`, `docs/quickstart.md`, or agent-specific guidance files if present). Update references to changed principles.\n\n5. Produce a Sync Impact Report (prepend as an HTML comment at top of the constitution file after update):\n   - Version change: old → new\n   - List of modified principles (old title → new title if renamed)\n   - Added sections\n   - Removed sections\n   - Templates requiring updates (✅ updated / ⚠ pending) with file paths\n   - Follow-up TODOs if any placeholders intentionally deferred.
\n\n6. Validation before final output:\n   - No remaining unexplained bracket tokens.\n   - Version line matches the report.\n   - Dates are in ISO format (YYYY-MM-DD).\n   - Principles are declarative, testable, and free of vague language (\"should\" → replace with MUST/SHOULD rationale where appropriate).\n\n7. Write the completed constitution back to `.specify/memory/constitution.md` (overwrite).\n\n8. Output a final summary to the user with:\n   - New version and bump rationale.\n   - Any files flagged for manual follow-up.\n   - Suggested commit message (e.g., `docs: amend constitution to vX.Y.Z (principle additions + governance update)`).\n\nFormatting & Style Requirements:\n- Use Markdown headings exactly as in the template (do not demote/promote levels).\n- Wrap long rationale lines to keep readability (<100 chars ideally) but do not hard enforce with awkward breaks.\n- Keep a single blank line between sections.\n- Avoid trailing whitespace.\n\nIf the user supplies partial updates (e.g., only one principle revision), still perform validation and version decision steps.\n\nIf critical info is missing (e.g., ratification date truly unknown), insert `TODO(<FIELD_NAME>): explanation` and include it in the Sync Impact Report under deferred items.\n\nDo not create a new template; always operate on the existing `.specify/memory/constitution.md` file.\n"
  },
  {
    "path": ".cursor/commands/implement.md",
    "content": "---\ndescription: Execute the implementation plan by processing and executing all tasks defined in tasks.md\n---\n\nThe user input can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute.\n\n2. Load and analyze the implementation context:\n   - **REQUIRED**: Read tasks.md for the complete task list and execution plan\n   - **REQUIRED**: Read plan.md for tech stack, architecture, and file structure\n   - **IF EXISTS**: Read data-model.md for entities and relationships\n   - **IF EXISTS**: Read contracts/ for API specifications and test requirements\n   - **IF EXISTS**: Read research.md for technical decisions and constraints\n   - **IF EXISTS**: Read quickstart.md for integration scenarios\n\n3. Parse tasks.md structure and extract:\n   - **Task phases**: Setup, Tests, Core, Integration, Polish\n   - **Task dependencies**: Sequential vs parallel execution rules\n   - **Task details**: ID, description, file paths, parallel markers [P]\n   - **Execution flow**: Order and dependency requirements\n\n4. Execute implementation following the task plan:\n   - **Phase-by-phase execution**: Complete each phase before moving to the next\n   - **Respect dependencies**: Run sequential tasks in order; parallel tasks [P] can run together\n   - **Follow TDD approach**: Execute test tasks before their corresponding implementation tasks\n   - **File-based coordination**: Tasks affecting the same files must run sequentially\n   - **Validation checkpoints**: Verify each phase completion before proceeding
\n\n5. Implementation execution rules:\n   - **Setup first**: Initialize project structure, dependencies, configuration\n   - **Tests before code**: If tests are required, write them for contracts, entities, and integration scenarios before the corresponding implementation\n   - **Core development**: Implement models, services, CLI commands, endpoints\n   - **Integration work**: Database connections, middleware, logging, external services\n   - **Polish and validation**: Unit tests, performance optimization, documentation\n\n6. Progress tracking and error handling:\n   - Report progress after each completed task\n   - Halt execution if any non-parallel task fails\n   - For parallel tasks [P], continue with successful tasks and report failed ones\n   - Provide clear error messages with context for debugging\n   - Suggest next steps if implementation cannot proceed\n   - **IMPORTANT**: Mark each completed task as [X] in the tasks file.\n\n7. Completion validation:\n   - Verify all required tasks are completed\n   - Check that implemented features match the original specification\n   - Validate that tests pass and coverage meets requirements\n   - Confirm the implementation follows the technical plan\n   - Report final status with a summary of completed work\n\nNote: This command assumes a complete task breakdown exists in tasks.md. If tasks are incomplete or missing, suggest running `/tasks` first to regenerate the task list.\n"
  },
  {
    "path": ".cursor/commands/plan.md",
    "content": "---\ndescription: Execute the implementation planning workflow using the plan template to generate design artifacts.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGiven the implementation details provided as an argument, do this:\n\n1. Run `.specify/scripts/bash/setup-plan.sh --json` from the repo root and parse JSON for FEATURE_SPEC, IMPL_PLAN, SPECS_DIR, BRANCH. All future file paths must be absolute.\n   - BEFORE proceeding, inspect FEATURE_SPEC for a `## Clarifications` section with at least one `Session` subheading. If missing or clearly ambiguous areas remain (vague adjectives, unresolved critical choices), PAUSE and instruct the user to run `/clarify` first to reduce rework. Only continue if: (a) Clarifications exist OR (b) an explicit user override is provided (e.g., \"proceed without clarification\"). Do not attempt to fabricate clarifications yourself.\n2. Read and analyze the feature specification to understand:\n   - The feature requirements and user stories\n   - Functional and non-functional requirements\n   - Success criteria and acceptance criteria\n   - Any technical constraints or dependencies mentioned\n\n3. Read the constitution at `.specify/memory/constitution.md` to understand constitutional requirements.\n\n4. 
Execute the implementation plan template:\n   - Load `.specify/templates/plan-template.md` (already copied to IMPL_PLAN path)\n   - Set Input path to FEATURE_SPEC\n   - Run the Execution Flow (main) function steps 1-9\n   - The template is self-contained and executable\n   - Follow error handling and gate checks as specified\n   - Let the template guide artifact generation in $SPECS_DIR:\n     * Phase 0 generates research.md\n     * Phase 1 generates data-model.md, contracts/, quickstart.md\n     * Phase 2 generates tasks.md\n   - Incorporate user-provided details from arguments into Technical Context: $ARGUMENTS\n   - Update Progress Tracking as you complete each phase\n\n5. Verify execution completed:\n   - Check Progress Tracking shows all phases complete\n   - Ensure all required artifacts were generated\n   - Confirm no ERROR states in execution\n\n6. Report results with branch name, file paths, and generated artifacts.\n\nUse absolute paths with the repository root for all file operations to avoid path issues.\n"
  },
  {
    "path": ".cursor/commands/specify.md",
    "content": "---\ndescription: Create or update the feature specification from a natural language feature description.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nThe text the user typed after `/specify` in the triggering message **is** the feature description. Assume you always have it available in this conversation even if `$ARGUMENTS` appears literally below. Do not ask the user to repeat it unless they provided an empty command.\n\nGiven that feature description, do this:\n\n1. Run the script `.specify/scripts/bash/create-new-feature.sh --json \"$ARGUMENTS\"` from repo root and parse its JSON output for BRANCH_NAME and SPEC_FILE. All file paths must be absolute.\n  **IMPORTANT** You must only ever run this script once. The JSON is provided in the terminal as output - always refer to it to get the actual content you're looking for.\n2. Load `.specify/templates/spec-template.md` to understand required sections.\n3. Write the specification to SPEC_FILE using the template structure, replacing placeholders with concrete details derived from the feature description (arguments) while preserving section order and headings.\n4. Report completion with branch name, spec file path, and readiness for the next phase.\n\nNote: The script creates and checks out the new branch and initializes the spec file before writing.\n"
  },
  {
    "path": ".cursor/commands/tasks.md",
    "content": "---\ndescription: Generate an actionable, dependency-ordered tasks.md for the feature based on available design artifacts.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute.\n2. Load and analyze available design documents:\n   - Always read plan.md for tech stack and libraries\n   - IF EXISTS: Read data-model.md for entities\n   - IF EXISTS: Read contracts/ for API endpoints\n   - IF EXISTS: Read research.md for technical decisions\n   - IF EXISTS: Read quickstart.md for test scenarios\n\n   Note: Not all projects have all documents. For example:\n   - CLI tools might not have contracts/\n   - Simple libraries might not need data-model.md\n   - Generate tasks based on what's available\n\n3. Generate tasks following the template:\n   - Use `.specify/templates/tasks-template.md` as the base\n   - Replace example tasks with actual tasks based on:\n     * **Setup tasks**: Project init, dependencies, linting\n     * **Test tasks [P]**: One per contract, one per integration scenario\n     * **Core tasks**: One per entity, service, CLI command, endpoint\n     * **Integration tasks**: DB connections, middleware, logging\n     * **Polish tasks [P]**: Unit tests, performance, docs\n\n4. Task generation rules:\n   - Each contract file → contract test task marked [P]\n   - Each entity in data-model → model creation task marked [P]\n   - Each endpoint → implementation task (not parallel if shared files)\n   - Each user story → integration test marked [P]\n   - Different files = can be parallel [P]\n   - Same file = sequential (no [P])\n\n5. 
Order tasks by dependencies:\n   - Setup before everything\n   - Tests before implementation (TDD)\n   - Models before services\n   - Services before endpoints\n   - Core before integration\n   - Everything before polish\n\n6. Include parallel execution examples:\n   - Group [P] tasks that can run together\n   - Show actual Task agent commands\n\n7. Create FEATURE_DIR/tasks.md with:\n   - Correct feature name from implementation plan\n   - Numbered tasks (T001, T002, etc.)\n   - Clear file paths for each task\n   - Dependency notes\n   - Parallel execution guidance\n\nContext for task generation: $ARGUMENTS\n\nThe tasks.md should be immediately executable - each task must be specific enough that an LLM can complete it without additional context.\n"
  },
  {
    "path": ".cursor/rules/specify-rules.mdc",
    "content": "# python-audio-separator Development Guidelines\n\nAuto-generated from all feature plans. Last updated: 2025-09-25\n\n## Active Technologies\n- Python 3.11+ + PyTorch, librosa, soundfile, numpy, onnxruntime (001-update-roformer-implementation)\n\n## Project Structure\n```\nsrc/\ntests/\n```\n\n## Commands\n- `cd src`\n- `pytest`\n- `ruff check .`\n\n## Code Style\nPython 3.11+: Follow standard conventions\n\n## Recent Changes\n- 001-update-roformer-implementation: Added Python 3.11+ + PyTorch, librosa, soundfile, numpy, onnxruntime\n\n<!-- MANUAL ADDITIONS START -->\n<!-- MANUAL ADDITIONS END -->"
  },
  {
    "path": ".github/FUNDING.yml",
    "content": "github: beveradb"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/BUG_REPORT.yml",
    "content": "name: Bug report\ndescription: Report a problem you encountered\ntitle: \"[Bug]: \"\nlabels: [\"bug\"]\nbody:\n  - type: textarea\n    id: bug-description\n    attributes:\n      label: Describe the bug\n      description: Please provide a concise description of the bug.\n      placeholder: Bug description\n    validations:\n      required: true\n  - type: checkboxes\n    attributes:\n      label: Have you searched for existing issues?  🔎\n      description: Please search to see if there is already an issue for the problem you encountered.\n      options:\n        - label: I have searched and found no existing issues.\n          required: true\n  - type: markdown\n    attributes:\n      value: \"---\"\n  - type: textarea\n    id: screenshots\n    attributes:\n      label: Screenshots or Videos\n      description: Add screenshots, gifs, or videos to help explain your problem.\n      placeholder: Upload screenshots, gifs, and videos here.\n    validations:\n      required: false\n  - type: textarea\n    id: logs\n    attributes:\n      label: Logs\n      description: Please include the full stack trace of the errors you encounter.\n      render: shell\n  - type: markdown\n    attributes:\n      value: \"---\"\n  - type: textarea\n    id: system-info\n    attributes:\n      label: System Info\n      description: Provide information about your system.\n      value: |\n        Operating System: \n        Python version: \n        Other...\n      render: shell\n    validations:\n      required: true\n  - type: textarea\n    id: additional\n    attributes:\n      label: Additional Information\n      description: Add any other useful information about the problem here.\n      placeholder: Is there any additional helpful information you can share?\n    validations:\n      required: false\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/FEATURE_REQUEST.yml",
    "content": "name: Feature request\ndescription: Suggest an idea for this project\ntitle: \"[Feature]: \"\nlabels: [\"enhancement\", \"feature\"]\nbody:\n  - type: textarea\n    id: description\n    attributes:\n      label: Description\n      description: Clearly and concisely describe what you would like to change, add, or implement.\n      placeholder: Tell us your idea.\n    validations:\n      required: true\n"
  },
  {
    "path": ".github/prompts/analyze.prompt.md",
    "content": "---\ndescription: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGoal: Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/tasks` has successfully produced a complete `tasks.md`.\n\nSTRICTLY READ-ONLY: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually).\n\nConstitution Authority: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/analyze`.\n\nExecution steps:\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths:\n   - SPEC = FEATURE_DIR/spec.md\n   - PLAN = FEATURE_DIR/plan.md\n   - TASKS = FEATURE_DIR/tasks.md\n   Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command).\n\n2. 
Load artifacts:\n   - Parse spec.md sections: Overview/Context, Functional Requirements, Non-Functional Requirements, User Stories, Edge Cases (if present).\n   - Parse plan.md: Architecture/stack choices, Data Model references, Phases, Technical constraints.\n   - Parse tasks.md: Task IDs, descriptions, phase grouping, parallel markers [P], referenced file paths.\n   - Load constitution `.specify/memory/constitution.md` for principle validation.\n\n3. Build internal semantic models:\n   - Requirements inventory: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., \"User can upload file\" -> `user-can-upload-file`).\n   - User story/action inventory.\n   - Task coverage mapping: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases).\n   - Constitution rule set: Extract principle names and any MUST/SHOULD normative statements.\n\n4. Detection passes:\n   A. Duplication detection:\n      - Identify near-duplicate requirements. Mark lower-quality phrasing for consolidation.\n   B. Ambiguity detection:\n      - Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria.\n      - Flag unresolved placeholders (TODO, TKTK, ???, <placeholder>, etc.).\n   C. Underspecification:\n      - Requirements with verbs but missing object or measurable outcome.\n      - User stories missing acceptance criteria alignment.\n      - Tasks referencing files or components not defined in spec/plan.\n   D. Constitution alignment:\n      - Any requirement or plan element conflicting with a MUST principle.\n      - Missing mandated sections or quality gates from constitution.\n   E. Coverage gaps:\n      - Requirements with zero associated tasks.\n      - Tasks with no mapped requirement/story.\n      - Non-functional requirements not reflected in tasks (e.g., performance, security).\n   F. 
Inconsistency:\n      - Terminology drift (same concept named differently across files).\n      - Data entities referenced in plan but absent in spec (or vice versa).\n      - Task ordering contradictions (e.g., integration tasks before foundational setup tasks without a dependency note).\n      - Conflicting requirements (e.g., one requires Next.js while another specifies Vue as the framework).\n\n5. Severity assignment heuristic:\n   - CRITICAL: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality.\n   - HIGH: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion.\n   - MEDIUM: Terminology drift, missing non-functional task coverage, underspecified edge case.\n   - LOW: Style/wording improvements, minor redundancy not affecting execution order.\n\n6. Produce a Markdown report (no file writes) with sections:\n\n   ### Specification Analysis Report\n   | ID | Category | Severity | Location(s) | Summary | Recommendation |\n   |----|----------|----------|-------------|---------|----------------|\n   | A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version |\n   (Add one row per finding; generate stable IDs prefixed by category initial.)\n\n   Additional subsections:\n   - Coverage Summary Table:\n     | Requirement Key | Has Task? | Task IDs | Notes |\n   - Constitution Alignment Issues (if any)\n   - Unmapped Tasks (if any)\n   - Metrics:\n     * Total Requirements\n     * Total Tasks\n     * Coverage % (requirements with >=1 task)\n     * Ambiguity Count\n     * Duplication Count\n     * Critical Issues Count\n\n7. 
At end of report, output a concise Next Actions block:\n   - If CRITICAL issues exist: Recommend resolving before `/implement`.\n   - If only LOW/MEDIUM: User may proceed, but provide improvement suggestions.\n   - Provide explicit command suggestions: e.g., \"Run /specify with refinement\", \"Run /plan to adjust architecture\", \"Manually edit tasks.md to add coverage for 'performance-metrics'\".\n\n8. Ask the user: \"Would you like me to suggest concrete remediation edits for the top N issues?\" (Do NOT apply them automatically.)\n\nBehavior rules:\n- NEVER modify files.\n- NEVER hallucinate missing sections—if absent, report them.\n- KEEP findings deterministic: if rerun without changes, produce consistent IDs and counts.\n- LIMIT total findings in the main table to 50; aggregate remainder in a summarized overflow note.\n- If zero issues found, emit a success report with coverage statistics and proceed recommendation.\n\nContext: $ARGUMENTS\n"
  },
  {
    "path": ".github/prompts/clarify.prompt.md",
    "content": "---\ndescription: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGoal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file.\n\nNote: This clarification workflow is expected to run (and be completed) BEFORE invoking `/plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases.\n\nExecution steps:\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields:\n   - `FEATURE_DIR`\n   - `FEATURE_SPEC`\n   - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.)\n   - If JSON parsing fails, abort and instruct user to re-run `/specify` or verify feature branch environment.\n\n2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. 
Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked).\n\n   Functional Scope & Behavior:\n   - Core user goals & success criteria\n   - Explicit out-of-scope declarations\n   - User roles / personas differentiation\n\n   Domain & Data Model:\n   - Entities, attributes, relationships\n   - Identity & uniqueness rules\n   - Lifecycle/state transitions\n   - Data volume / scale assumptions\n\n   Interaction & UX Flow:\n   - Critical user journeys / sequences\n   - Error/empty/loading states\n   - Accessibility or localization notes\n\n   Non-Functional Quality Attributes:\n   - Performance (latency, throughput targets)\n   - Scalability (horizontal/vertical, limits)\n   - Reliability & availability (uptime, recovery expectations)\n   - Observability (logging, metrics, tracing signals)\n   - Security & privacy (authN/Z, data protection, threat assumptions)\n   - Compliance / regulatory constraints (if any)\n\n   Integration & External Dependencies:\n   - External services/APIs and failure modes\n   - Data import/export formats\n   - Protocol/versioning assumptions\n\n   Edge Cases & Failure Handling:\n   - Negative scenarios\n   - Rate limiting / throttling\n   - Conflict resolution (e.g., concurrent edits)\n\n   Constraints & Tradeoffs:\n   - Technical constraints (language, storage, hosting)\n   - Explicit tradeoffs or rejected alternatives\n\n   Terminology & Consistency:\n   - Canonical glossary terms\n   - Avoided synonyms / deprecated terms\n\n   Completion Signals:\n   - Acceptance criteria testability\n   - Measurable Definition of Done style indicators\n\n   Misc / Placeholders:\n   - TODO markers / unresolved decisions\n   - Ambiguous adjectives (\"robust\", \"intuitive\") lacking quantification\n\n   For each category with Partial or Missing status, add a candidate question opportunity unless:\n   - Clarification would not materially change implementation or validation strategy\n   - Information 
is better deferred to planning phase (note internally)\n\n3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints:\n    - Maximum of 5 total questions across the whole session.\n    - Each question must be answerable with EITHER:\n       * A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR\n       * A one-word / short‑phrase answer (explicitly constrain: \"Answer in <=5 words\").\n   - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation.\n   - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved.\n   - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness).\n   - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests.\n   - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic.\n\n4. 
Sequential questioning loop (interactive):\n    - Present EXACTLY ONE question at a time.\n    - For multiple‑choice questions render options as a Markdown table:\n\n       | Option | Description |\n       |--------|-------------|\n       | A | <Option A description> |\n       | B | <Option B description> |\n       | C | <Option C description> | (add D/E as needed up to 5)\n       | Short | Provide a different short answer (<=5 words) | (Include only if free-form alternative is appropriate)\n\n    - For short‑answer style (no meaningful discrete options), output a single line after the question: `Format: Short answer (<=5 words)`.\n    - After the user answers:\n       * Validate the answer maps to one option or fits the <=5 word constraint.\n       * If ambiguous, ask for a quick disambiguation (count still belongs to same question; do not advance).\n       * Once satisfactory, record it in working memory (do not yet write to disk) and move to the next queued question.\n    - Stop asking further questions when:\n       * All critical ambiguities resolved early (remaining queued items become unnecessary), OR\n       * User signals completion (\"done\", \"good\", \"no more\"), OR\n       * You reach 5 asked questions.\n    - Never reveal future queued questions in advance.\n    - If no valid questions exist at start, immediately report no critical ambiguities.\n\n5. 
Integration after EACH accepted answer (incremental update approach):\n    - Maintain in-memory representation of the spec (loaded once at start) plus the raw file contents.\n    - For the first integrated answer in this session:\n       * Ensure a `## Clarifications` section exists (create it just after the highest-level contextual/overview section per the spec template if missing).\n       * Under it, create (if not present) a `### Session YYYY-MM-DD` subheading for today.\n    - Append a bullet line immediately after acceptance: `- Q: <question> → A: <final answer>`.\n    - Then immediately apply the clarification to the most appropriate section(s):\n       * Functional ambiguity → Update or add a bullet in Functional Requirements.\n       * User interaction / actor distinction → Update User Stories or Actors subsection (if present) with clarified role, constraint, or scenario.\n       * Data shape / entities → Update Data Model (add fields, types, relationships) preserving ordering; note added constraints succinctly.\n       * Non-functional constraint → Add/modify measurable criteria in Non-Functional / Quality Attributes section (convert vague adjective to metric or explicit target).\n       * Edge case / negative flow → Add a new bullet under Edge Cases / Error Handling (or create such subsection if template provides placeholder for it).\n       * Terminology conflict → Normalize term across spec; retain original only if necessary by adding `(formerly referred to as \"X\")` once.\n    - If the clarification invalidates an earlier ambiguous statement, replace that statement instead of duplicating; leave no obsolete contradictory text.\n    - Save the spec file AFTER each integration to minimize risk of context loss (atomic overwrite).\n    - Preserve formatting: do not reorder unrelated sections; keep heading hierarchy intact.\n    - Keep each inserted clarification minimal and testable (avoid narrative drift).\n\n6. 
Validation (performed after EACH write plus final pass):\n   - Clarifications session contains exactly one bullet per accepted answer (no duplicates).\n   - Total asked (accepted) questions ≤ 5.\n   - Updated sections contain no lingering vague placeholders the new answer was meant to resolve.\n   - No contradictory earlier statement remains (scan for now-invalid alternative choices removed).\n   - Markdown structure valid; only allowed new headings: `## Clarifications`, `### Session YYYY-MM-DD`.\n   - Terminology consistency: same canonical term used across all updated sections.\n\n7. Write the updated spec back to `FEATURE_SPEC`.\n\n8. Report completion (after questioning loop ends or early termination):\n   - Number of questions asked & answered.\n   - Path to updated spec.\n   - Sections touched (list names).\n   - Coverage summary table listing each taxonomy category with Status: Resolved (was Partial/Missing and addressed), Deferred (exceeds question quota or better suited for planning), Clear (already sufficient), Outstanding (still Partial/Missing but low impact).\n   - If any Outstanding or Deferred remain, recommend whether to proceed to `/plan` or run `/clarify` again later post-plan.\n   - Suggested next command.\n\nBehavior rules:\n- If no meaningful ambiguities found (or all potential questions would be low-impact), respond: \"No critical ambiguities detected worth formal clarification.\" and suggest proceeding.\n- If spec file missing, instruct user to run `/specify` first (do not create a new spec here).\n- Never exceed 5 total asked questions (clarification retries for a single question do not count as new questions).\n- Avoid speculative tech stack questions unless the absence blocks functional clarity.\n- Respect user early termination signals (\"stop\", \"done\", \"proceed\").\n - If no questions asked due to full coverage, output a compact coverage summary (all categories Clear) then suggest advancing.\n - If quota reached with unresolved 
high-impact categories remaining, explicitly flag them under Deferred with rationale.\n\nContext for prioritization: $ARGUMENTS\n"
  },
  {
    "path": ".github/prompts/constitution.prompt.md",
    "content": "---\ndescription: Create or update the project constitution from interactive or provided principle inputs, ensuring all dependent templates stay in sync.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nYou are updating the project constitution at `.specify/memory/constitution.md`. This file is a TEMPLATE containing placeholder tokens in square brackets (e.g. `[PROJECT_NAME]`, `[PRINCIPLE_1_NAME]`). Your job is to (a) collect/derive concrete values, (b) fill the template precisely, and (c) propagate any amendments across dependent artifacts.\n\nFollow this execution flow:\n\n1. Load the existing constitution template at `.specify/memory/constitution.md`.\n   - Identify every placeholder token of the form `[ALL_CAPS_IDENTIFIER]`.\n   **IMPORTANT**: The user might require fewer or more principles than the template provides. If a number is specified, respect it while following the general template structure, and update the document accordingly.\n\n2. Collect/derive values for placeholders:\n   - If user input (conversation) supplies a value, use it.\n   - Otherwise infer from existing repo context (README, docs, prior constitution versions if embedded).\n   - For governance dates: `RATIFICATION_DATE` is the original adoption date (if unknown, ask or mark TODO), `LAST_AMENDED_DATE` is today if changes are made, otherwise keep the previous value.\n   - `CONSTITUTION_VERSION` must increment according to semantic versioning rules:\n     * MAJOR: Backward-incompatible governance/principle removals or redefinitions.\n     * MINOR: New principle/section added or materially expanded guidance.\n     * PATCH: Clarifications, wording, typo fixes, non-semantic refinements.\n   - If the version bump type is ambiguous, propose reasoning before finalizing.\n\n3. 
Draft the updated constitution content:\n   - Replace every placeholder with concrete text (no bracketed tokens left except intentionally retained template slots that the project has chosen not to define yet; explicitly justify any that remain).\n   - Preserve the heading hierarchy; template comments can be removed once replaced unless they still add clarifying guidance.\n   - Ensure each Principle section has a succinct name line, a paragraph (or bullet list) capturing non‑negotiable rules, and an explicit rationale if not obvious.\n   - Ensure the Governance section lists the amendment procedure, versioning policy, and compliance review expectations.\n\n4. Consistency propagation checklist (convert prior checklist into active validations):\n   - Read `.specify/templates/plan-template.md` and ensure any \"Constitution Check\" or rules align with updated principles.\n   - Read `.specify/templates/spec-template.md` for scope/requirements alignment; update it if the constitution adds/removes mandatory sections or constraints.\n   - Read `.specify/templates/tasks-template.md` and ensure task categorization reflects new or removed principle-driven task types (e.g., observability, versioning, testing discipline).\n   - Read each command file in `.specify/templates/commands/*.md` (including this one) to verify no outdated references (agent-specific names like CLAUDE only) remain when generic guidance is required.\n   - Read any runtime guidance docs (e.g., `README.md`, `docs/quickstart.md`, or agent-specific guidance files if present). Update references to changed principles.\n\n5. Produce a Sync Impact Report (prepend as an HTML comment at top of the constitution file after update):\n   - Version change: old → new\n   - List of modified principles (old title → new title if renamed)\n   - Added sections\n   - Removed sections\n   - Templates requiring updates (✅ updated / ⚠ pending) with file paths\n   - Follow-up TODOs if any placeholders intentionally deferred.\n\n6. 
Validation before final output:\n   - No remaining unexplained bracket tokens.\n   - Version line matches report.\n   - Dates ISO format YYYY-MM-DD.\n   - Principles are declarative, testable, and free of vague language (\"should\" → replace with MUST/SHOULD rationale where appropriate).\n\n7. Write the completed constitution back to `.specify/memory/constitution.md` (overwrite).\n\n8. Output a final summary to the user with:\n   - New version and bump rationale.\n   - Any files flagged for manual follow-up.\n   - Suggested commit message (e.g., `docs: amend constitution to vX.Y.Z (principle additions + governance update)`).\n\nFormatting & Style Requirements:\n- Use Markdown headings exactly as in the template (do not demote/promote levels).\n- Wrap long rationale lines to keep readability (<100 chars ideally) but do not hard enforce with awkward breaks.\n- Keep a single blank line between sections.\n- Avoid trailing whitespace.\n\nIf the user supplies partial updates (e.g., only one principle revision), still perform validation and version decision steps.\n\nIf critical info missing (e.g., ratification date truly unknown), insert `TODO(<FIELD_NAME>): explanation` and include in the Sync Impact Report under deferred items.\n\nDo not create a new template; always operate on the existing `.specify/memory/constitution.md` file.\n"
  },
  {
    "path": ".github/prompts/implement.prompt.md",
    "content": "---\ndescription: Execute the implementation plan by processing and executing all tasks defined in tasks.md\n---\n\nThe user input can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute.\n\n2. Load and analyze the implementation context:\n   - **REQUIRED**: Read tasks.md for the complete task list and execution plan\n   - **REQUIRED**: Read plan.md for tech stack, architecture, and file structure\n   - **IF EXISTS**: Read data-model.md for entities and relationships\n   - **IF EXISTS**: Read contracts/ for API specifications and test requirements\n   - **IF EXISTS**: Read research.md for technical decisions and constraints\n   - **IF EXISTS**: Read quickstart.md for integration scenarios\n\n3. Parse tasks.md structure and extract:\n   - **Task phases**: Setup, Tests, Core, Integration, Polish\n   - **Task dependencies**: Sequential vs parallel execution rules\n   - **Task details**: ID, description, file paths, parallel markers [P]\n   - **Execution flow**: Order and dependency requirements\n\n4. Execute implementation following the task plan:\n   - **Phase-by-phase execution**: Complete each phase before moving to the next\n   - **Respect dependencies**: Run sequential tasks in order, parallel tasks [P] can run together  \n   - **Follow TDD approach**: Execute test tasks before their corresponding implementation tasks\n   - **File-based coordination**: Tasks affecting the same files must run sequentially\n   - **Validation checkpoints**: Verify each phase completion before proceeding\n\n5. 
Implementation execution rules:\n   - **Setup first**: Initialize project structure, dependencies, configuration\n   - **Tests before code**: When tests are required, write them for contracts, entities, and integration scenarios before the code they cover\n   - **Core development**: Implement models, services, CLI commands, endpoints\n   - **Integration work**: Database connections, middleware, logging, external services\n   - **Polish and validation**: Unit tests, performance optimization, documentation\n\n6. Progress tracking and error handling:\n   - Report progress after each completed task\n   - Halt execution if any non-parallel task fails\n   - For parallel tasks [P], continue with successful tasks, report failed ones\n   - Provide clear error messages with context for debugging\n   - Suggest next steps if implementation cannot proceed\n   - **IMPORTANT** For completed tasks, make sure to mark the task off as [X] in the tasks file.\n\n7. Completion validation:\n   - Verify all required tasks are completed\n   - Check that implemented features match the original specification\n   - Validate that tests pass and coverage meets requirements\n   - Confirm the implementation follows the technical plan\n   - Report final status with summary of completed work\n\nNote: This command assumes a complete task breakdown exists in tasks.md. If tasks are incomplete or missing, suggest running `/tasks` first to regenerate the task list.\n"
  },
  {
    "path": ".github/prompts/plan.prompt.md",
    "content": "---\ndescription: Execute the implementation planning workflow using the plan template to generate design artifacts.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nGiven the implementation details provided as an argument, do this:\n\n1. Run `.specify/scripts/bash/setup-plan.sh --json` from the repo root and parse JSON for FEATURE_SPEC, IMPL_PLAN, SPECS_DIR, BRANCH. All future file paths must be absolute.\n   - BEFORE proceeding, inspect FEATURE_SPEC for a `## Clarifications` section with at least one `Session` subheading. If missing or clearly ambiguous areas remain (vague adjectives, unresolved critical choices), PAUSE and instruct the user to run `/clarify` first to reduce rework. Only continue if: (a) Clarifications exist OR (b) an explicit user override is provided (e.g., \"proceed without clarification\"). Do not attempt to fabricate clarifications yourself.\n2. Read and analyze the feature specification to understand:\n   - The feature requirements and user stories\n   - Functional and non-functional requirements\n   - Success criteria and acceptance criteria\n   - Any technical constraints or dependencies mentioned\n\n3. Read the constitution at `.specify/memory/constitution.md` to understand constitutional requirements.\n\n4. 
Execute the implementation plan template:\n   - Load `.specify/templates/plan-template.md` (already copied to IMPL_PLAN path)\n   - Set Input path to FEATURE_SPEC\n   - Run the Execution Flow (main) function steps 1-9\n   - The template is self-contained and executable\n   - Follow error handling and gate checks as specified\n   - Let the template guide artifact generation in $SPECS_DIR:\n     * Phase 0 generates research.md\n     * Phase 1 generates data-model.md, contracts/, quickstart.md\n     * Phase 2 generates tasks.md\n   - Incorporate user-provided details from arguments into Technical Context: $ARGUMENTS\n   - Update Progress Tracking as you complete each phase\n\n5. Verify execution completed:\n   - Check Progress Tracking shows all phases complete\n   - Ensure all required artifacts were generated\n   - Confirm no ERROR states in execution\n\n6. Report results with branch name, file paths, and generated artifacts.\n\nUse absolute paths with the repository root for all file operations to avoid path issues.\n"
  },
  {
    "path": ".github/prompts/specify.prompt.md",
    "content": "---\ndescription: Create or update the feature specification from a natural language feature description.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\nThe text the user typed after `/specify` in the triggering message **is** the feature description. Assume you always have it available in this conversation even if `$ARGUMENTS` appears literally above. Do not ask the user to repeat it unless they provided an empty command.\n\nGiven that feature description, do this:\n\n1. Run the script `.specify/scripts/bash/create-new-feature.sh --json \"$ARGUMENTS\"` from repo root and parse its JSON output for BRANCH_NAME and SPEC_FILE. All file paths must be absolute.\n  **IMPORTANT** You must only ever run this script once. The JSON is provided in the terminal as output - always refer to it to get the actual content you're looking for.\n2. Load `.specify/templates/spec-template.md` to understand required sections.\n3. Write the specification to SPEC_FILE using the template structure, replacing placeholders with concrete details derived from the feature description (arguments) while preserving section order and headings.\n4. Report completion with branch name, spec file path, and readiness for the next phase.\n\nNote: The script creates and checks out the new branch and initializes the spec file before writing.\n"
  },
  {
    "path": ".github/prompts/tasks.prompt.md",
    "content": "---\ndescription: Generate an actionable, dependency-ordered tasks.md for the feature based on available design artifacts.\n---\n\nThe user input to you can be provided directly by the agent or as a command argument - you **MUST** consider it before proceeding with the prompt (if not empty).\n\nUser input:\n\n$ARGUMENTS\n\n1. Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute.\n2. Load and analyze available design documents:\n   - Always read plan.md for tech stack and libraries\n   - IF EXISTS: Read data-model.md for entities\n   - IF EXISTS: Read contracts/ for API endpoints\n   - IF EXISTS: Read research.md for technical decisions\n   - IF EXISTS: Read quickstart.md for test scenarios\n\n   Note: Not all projects have all documents. For example:\n   - CLI tools might not have contracts/\n   - Simple libraries might not need data-model.md\n   - Generate tasks based on what's available\n\n3. Generate tasks following the template:\n   - Use `.specify/templates/tasks-template.md` as the base\n   - Replace example tasks with actual tasks based on:\n     * **Setup tasks**: Project init, dependencies, linting\n     * **Test tasks [P]**: One per contract, one per integration scenario\n     * **Core tasks**: One per entity, service, CLI command, endpoint\n     * **Integration tasks**: DB connections, middleware, logging\n     * **Polish tasks [P]**: Unit tests, performance, docs\n\n4. Task generation rules:\n   - Each contract file → contract test task marked [P]\n   - Each entity in data-model → model creation task marked [P]\n   - Each endpoint → implementation task (not parallel if shared files)\n   - Each user story → integration test marked [P]\n   - Different files = can be parallel [P]\n   - Same file = sequential (no [P])\n\n5. 
Order tasks by dependencies:\n   - Setup before everything\n   - Tests before implementation (TDD)\n   - Models before services\n   - Services before endpoints\n   - Core before integration\n   - Everything before polish\n\n6. Include parallel execution examples:\n   - Group [P] tasks that can run together\n   - Show actual Task agent commands\n\n7. Create FEATURE_DIR/tasks.md with:\n   - Correct feature name from implementation plan\n   - Numbered tasks (T001, T002, etc.)\n   - Clear file paths for each task\n   - Dependency notes\n   - Parallel execution guidance\n\nContext for task generation: $ARGUMENTS\n\nThe tasks.md should be immediately executable - each task must be specific enough that an LLM can complete it without additional context.\n"
  },
  {
    "path": ".github/workflows/deploy-to-cloudrun.yml",
    "content": "name: Deploy to Cloud Run\n\non:\n  # Deploy when a new PyPI release is published\n  workflow_run:\n    workflows: [\"publish-to-pypi\"]  # must match the name: field of publish-to-pypi.yml\n    types: [completed]\n\n  # Deploy on changes to Dockerfile or Cloud Run server\n  push:\n    branches: [main]\n    paths:\n      - \"Dockerfile.cloudrun\"\n      - \"audio_separator/remote/deploy_cloudrun.py\"\n      - \"audio_separator/ensemble_presets.json\"\n      - \"cloudbuild.yaml\"\n\n  # Manual deployment\n  workflow_dispatch:\n\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    # Only run on successful PyPI publish (or push/manual triggers)\n    if: ${{ github.event_name != 'workflow_run' || github.event.workflow_run.conclusion == 'success' }}\n\n    permissions:\n      contents: read\n      id-token: write  # Required for Workload Identity Federation\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@v4\n\n      - name: Authenticate to Google Cloud\n        uses: google-github-actions/auth@v2\n        with:\n          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}\n          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}\n\n      - name: Set up Cloud SDK\n        uses: google-github-actions/setup-gcloud@v2\n\n      # Use Cloud Build for the Docker build — it has native x86 with enough\n      # RAM to load ML models during the build (baking models into the image).\n      - name: Build and push via Cloud Build\n        run: |\n          gcloud builds submit \\\n            --config cloudbuild.yaml \\\n            --region=us-east4 \\\n            --project=nomadkaraoke \\\n            --substitutions=SHORT_SHA=${GITHUB_SHA::8}\n\n      - name: Deploy to Cloud Run\n        run: |\n          gcloud run services update audio-separator \\\n            --image=\"us-east4-docker.pkg.dev/nomadkaraoke/audio-separator/api:${GITHUB_SHA::8}\" \\\n            --region=us-east4 \\\n            --project=nomadkaraoke \\\n            --quiet\n"
  },
  {
    "path": ".github/workflows/deploy-to-modal.yml",
    "content": "name: deploy-to-modal\n\non:\n  # Deploy after PyPI publish (when new version is available)\n  workflow_run:\n    workflows: [\"publish-to-pypi\"]\n    types:\n      - completed\n  # Also deploy on direct changes to Modal deployment files\n  push:\n    branches:\n      - main\n    paths:\n      - 'audio_separator/remote/deploy_modal.py'\n  # Manual trigger\n  workflow_dispatch:\n\njobs:\n  deploy-modal:\n    # Only run if publish-to-pypi succeeded (or if triggered directly)\n    if: |\n      github.event_name == 'workflow_dispatch' ||\n      github.event_name == 'push' ||\n      (github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.12'\n\n      - name: Install Modal CLI and dependencies\n        run: |\n          pip install modal\n          # Install dependencies needed to parse deploy_modal.py\n          pip install fastapi filetype python-multipart\n\n      - name: Deploy to Modal\n        env:\n          MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}\n          MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}\n        run: |\n          modal deploy audio_separator/remote/deploy_modal.py\n\n      - name: Verify deployment\n        run: |\n          sleep 10  # Wait for container to be ready\n          VERSION=$(curl -s https://nomadkaraoke--audio-separator-api.modal.run/health | jq -r '.version')\n          echo \"Deployed version: $VERSION\"\n"
  },
  {
    "path": ".github/workflows/github-sponsors.yml",
    "content": "name: Generate Sponsors README\non:\n  workflow_dispatch:\n  schedule:\n    - cron: 30 15 * * 0-6\npermissions:\n  contents: write\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout 🛎️\n        uses: actions/checkout@v4\n\n      - name: Generate Sponsors 💖\n        uses: JamesIves/github-sponsors-readme-action@v1\n        with:\n          token: ${{ secrets.SPONSORS_WORKFLOW_PAT }}\n          file: 'README.md'\n\n      - name: Deploy to GitHub Pages 🚀\n        uses: JamesIves/github-pages-deploy-action@v4\n        with:\n          branch: main\n          folder: '.'\n"
  },
  {
    "path": ".github/workflows/publish-to-docker.yml",
    "content": "name: publish-to-docker\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - pyproject.toml\n      - Dockerfile.cpu\n      - Dockerfile.gpu\n      - Dockerfile.runpod\n  workflow_dispatch:\n\njobs:\n  build-and-push-docker:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Delete huge unnecessary tools folder\n        run: rm -rf /opt/hostedtoolcache\n\n      - name: Checkout\n        uses: actions/checkout@v4\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.x'\n\n      - name: Install TOML\n        run: pip install toml\n\n      - name: Get version from pyproject.toml\n        run: |\n          VERSION=$(python -c \"import toml; print(toml.load('pyproject.toml')['tool']['poetry']['version'])\")\n          echo \"VERSION=$VERSION\" >> $GITHUB_ENV\n\n      - name: Set up QEMU\n        uses: docker/setup-qemu-action@v3\n\n      - name: Set up Docker Buildx\n        uses: docker/setup-buildx-action@v3\n\n      - name: Login to Docker Hub\n        uses: docker/login-action@v3\n        with:\n          username: ${{ secrets.DOCKERHUB_USERNAME }}\n          password: ${{ secrets.DOCKERHUB_TOKEN }}\n\n      - name: Build and push Docker image for CPU\n        if: ${{ github.ref == 'refs/heads/main' }}\n        uses: docker/build-push-action@v5\n        with:\n          file: Dockerfile.cpu\n          context: .\n          platforms: linux/amd64,linux/arm64\n          push: true\n          tags: |\n            beveradb/audio-separator:cpu-${{ env.VERSION }}\n            beveradb/audio-separator:cpu\n            beveradb/audio-separator:latest\n\n      - name: Build and push Docker image for CUDA 11 GPU\n        if: ${{ github.ref == 'refs/heads/main' }}\n        uses: docker/build-push-action@v5\n        with:\n          file: Dockerfile.cuda11\n          context: .\n          platforms: linux/amd64\n          push: true\n          tags: |\n            
beveradb/audio-separator:gpu-${{ env.VERSION }}\n            beveradb/audio-separator:gpu\n\n      - name: Build and push Docker image for CUDA 12 GPU\n        if: ${{ github.ref == 'refs/heads/main' }}\n        uses: docker/build-push-action@v5\n        with:\n          file: Dockerfile.cuda12\n          context: .\n          platforms: linux/amd64\n          push: true\n          tags: |\n            beveradb/audio-separator:cuda12-${{ env.VERSION }}\n            beveradb/audio-separator:cuda12\n\n      # Deliberately commented out because Github CI can't build this runpod image due to disk space limits\n      # Instead, I build this (17GB) image locally and push it to Docker Hub manually.\n      # - name: Build and push Docker image for Runpod\n      #   if: ${{ github.ref == 'refs/heads/main' }}\n      #   uses: docker/build-push-action@v5\n      #   with:\n      #     file: Dockerfile.runpod\n      #     context: .\n      #     platforms: linux/amd64\n      #     push: true\n      #     tags: |\n      #       beveradb/audio-separator:runpod-${{ env.VERSION }}\n      #       beveradb/audio-separator:runpod\n"
  },
  {
    "path": ".github/workflows/publish-to-pypi.yml",
    "content": "name: publish-to-pypi\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - pyproject.toml\n  workflow_dispatch:\n\njobs:\n  # Auto-publish when version is increased\n  publish-pypi:\n    # Only publish on `main` branch\n    if: github.ref == 'refs/heads/main'\n    runs-on: ubuntu-latest\n    permissions: # Don't forget permissions\n      contents: write\n\n    steps:\n      - uses: etils-actions/pypi-auto-publish@v1\n        with:\n          pypi-token: ${{ secrets.PYPI_API_TOKEN }}\n          gh-token: ${{ secrets.GITHUB_TOKEN }}\n          parse-changelog: false\n"
  },
  {
    "path": ".github/workflows/run-integration-tests.yaml",
    "content": "name: run-integration-tests\n\non:\n  pull_request:\n\njobs:\n  changes:\n    runs-on: ubuntu-latest\n    outputs:\n      should_run: ${{ steps.filter.outputs.code }}\n    steps:\n      - uses: actions/checkout@v4\n      - uses: dorny/paths-filter@v3\n        id: filter\n        with:\n          filters: |\n            code:\n              - 'audio_separator/**'\n              - 'tests/**'\n              - 'pyproject.toml'\n              - 'poetry.lock'\n              - '.github/workflows/run-integration-tests.yaml'\n\n  # ── Integration test jobs (parallel across 3 GPU runners) ──────────\n  #\n  # Balanced to ~7 min each so all 3 finish around the same time.\n  #\n  #   ensemble-presets (~8 min): test_ensemble_integration (heaviest single file)\n  #   core-models      (~7 min): test_24bit + test_cli + test_separator_output + roformer tests\n  #   stems-and-quality (~6 min): test_ensemble_meaningful + test_multi_stem + test_remote_api\n\n  ensemble-presets:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: [self-hosted, gpu]\n    timeout-minutes: 15\n    env:\n      AUDIO_SEPARATOR_MODEL_DIR: /opt/audio-separator-models\n    steps:\n      - uses: actions/checkout@v4\n      - name: Verify GPU availability\n        run: nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv,noheader\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n      - name: Install pipx and poetry\n        run: |\n          python -m pip install --user pipx && python -m pipx ensurepath\n          python -m pipx install poetry\n          echo \"$HOME/.local/bin\" >> $GITHUB_PATH\n      - name: Install system dependencies\n        run: sudo apt-get update && sudo apt-get install -y ffmpeg libsamplerate0 libsamplerate-dev\n      - name: Set up Python with cache\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n          cache: 
poetry\n      - name: Install Poetry dependencies (GPU)\n        run: poetry install -E gpu\n      - name: Verify pre-cached models\n        run: |\n          MODEL_COUNT=$(ls -1 $AUDIO_SEPARATOR_MODEL_DIR | wc -l)\n          echo \"Pre-cached models: $MODEL_COUNT\"\n          if [ \"$MODEL_COUNT\" -lt 10 ]; then\n            echo \"::warning::Expected at least 10 pre-cached model files, found $MODEL_COUNT\"\n          fi\n      - name: \"Run: ensemble preset tests (~8 min)\"\n        run: poetry run pytest -sv tests/integration/test_ensemble_integration.py\n      - name: Upload test artifacts\n        if: always()\n        uses: actions/upload-artifact@v4\n        with:\n          name: ensemble-presets-results\n          path: |\n            *.flac\n            tests/*.flac\n\n  core-models:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: [self-hosted, gpu]\n    timeout-minutes: 15\n    env:\n      AUDIO_SEPARATOR_MODEL_DIR: /opt/audio-separator-models\n    steps:\n      - uses: actions/checkout@v4\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n      - name: Install pipx and poetry\n        run: |\n          python -m pip install --user pipx && python -m pipx ensurepath\n          python -m pipx install poetry\n          echo \"$HOME/.local/bin\" >> $GITHUB_PATH\n      - name: Install system dependencies\n        run: sudo apt-get update && sudo apt-get install -y ffmpeg libsamplerate0 libsamplerate-dev\n      - name: Set up Python with cache\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n          cache: poetry\n      - name: Install Poetry dependencies (GPU)\n        run: poetry install -E gpu\n      - name: \"Run: 24-bit, CLI, output, and roformer tests (~7 min)\"\n        run: |\n          poetry run pytest -sv \\\n            tests/integration/test_24bit_preservation.py \\\n            
tests/integration/test_cli_integration.py \\\n            tests/integration/test_separator_output_integration.py \\\n            tests/integration/test_roformer_audio_quality.py \\\n            tests/integration/test_roformer_backward_compatibility.py \\\n            tests/integration/test_roformer_config_validation.py \\\n            tests/integration/test_roformer_e2e.py \\\n            tests/integration/test_roformer_fallback_mechanism.py \\\n            tests/integration/test_roformer_model_switching.py \\\n            tests/integration/test_roformer_new_parameters.py\n      - name: Upload test artifacts\n        if: always()\n        uses: actions/upload-artifact@v4\n        with:\n          name: core-models-results\n          path: |\n            *.flac\n            tests/*.flac\n\n  stems-and-quality:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: [self-hosted, gpu]\n    timeout-minutes: 15\n    env:\n      AUDIO_SEPARATOR_MODEL_DIR: /opt/audio-separator-models\n    steps:\n      - uses: actions/checkout@v4\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n      - name: Install pipx and poetry\n        run: |\n          python -m pip install --user pipx && python -m pipx ensurepath\n          python -m pipx install poetry\n          echo \"$HOME/.local/bin\" >> $GITHUB_PATH\n      - name: Install system dependencies\n        run: sudo apt-get update && sudo apt-get install -y ffmpeg libsamplerate0 libsamplerate-dev\n      - name: Set up Python with cache\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n          cache: poetry\n      - name: Install Poetry dependencies (GPU)\n        run: poetry install -E gpu\n      - name: \"Run: ensemble quality, multi-stem, and remote API tests (~6 min)\"\n        run: |\n          poetry run pytest -sv \\\n            tests/integration/test_ensemble_meaningful.py \\\n         
   tests/integration/test_multi_stem_verification.py \\\n            tests/integration/test_remote_api_integration.py\n      - name: Upload test artifacts\n        if: always()\n        uses: actions/upload-artifact@v4\n        with:\n          name: stems-and-quality-results\n          path: |\n            *.flac\n            tests/*.flac\n\n  # ── Gate job for branch protection ────────────────────────────────\n\n  integration-test:\n    needs: [changes, ensemble-presets, core-models, stems-and-quality]\n    if: always()\n    runs-on: ubuntu-latest\n    steps:\n      - name: Check test results\n        run: |\n          if [[ \"${{ needs.changes.outputs.should_run }}\" != \"true\" ]]; then\n            echo \"Tests skipped - no code changes detected\"\n            exit 0\n          fi\n\n          echo \"ensemble-presets:   ${{ needs.ensemble-presets.result }}\"\n          echo \"core-models:        ${{ needs.core-models.result }}\"\n          echo \"stems-and-quality:  ${{ needs.stems-and-quality.result }}\"\n\n          if [[ \"${{ needs.ensemble-presets.result }}\" == \"failure\" ]] || \\\n             [[ \"${{ needs.core-models.result }}\" == \"failure\" ]] || \\\n             [[ \"${{ needs.stems-and-quality.result }}\" == \"failure\" ]]; then\n            echo \"Integration tests failed\"\n            exit 1\n          fi\n          echo \"All integration tests passed\"\n"
  },
  {
    "path": ".github/workflows/run-unit-tests.yaml",
    "content": "name: run-unit-tests\n\non:\n  pull_request:\n\njobs:\n  changes:\n    runs-on: ubuntu-latest\n    outputs:\n      should_run: ${{ steps.filter.outputs.code }}\n    steps:\n      - uses: actions/checkout@v4\n      - uses: dorny/paths-filter@v3\n        id: filter\n        with:\n          filters: |\n            code:\n              - 'audio_separator/**'\n              - 'tests/**'\n              - 'pyproject.toml'\n              - 'poetry.lock'\n\n  test-ubuntu:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: ubuntu-latest\n\n    strategy:\n      matrix:\n        python-version: ['3.10', '3.11', '3.12', '3.13']\n\n    steps:\n      - name: Free disk space\n        uses: jlumbroso/free-disk-space@main\n        with:\n          tool-cache: false\n          android: true\n          dotnet: true\n          haskell: true\n          large-packages: false\n          docker-images: true\n          swap-storage: true\n\n      - name: Checkout project\n        uses: actions/checkout@v4\n\n      - name: Install poetry\n        run: pipx install poetry\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: ${{ matrix.python-version }}\n          cache: poetry # caching dependencies from poetry.lock\n\n      - name: Install Poetry dependencies (CPU)\n        run: poetry install -E cpu\n\n      - name: Run unit tests with coverage\n        run: poetry run pytest tests/unit\n\n  test-macos:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: macos-latest\n\n    strategy:\n      matrix:\n        python-version: ['3.10', '3.11', '3.12', '3.13']\n\n    steps:\n      - name: Checkout project\n        uses: actions/checkout@v4\n\n      - name: Install poetry\n        run: pipx install poetry\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: ${{ matrix.python-version }}\n          
cache: poetry # caching dependencies from poetry.lock\n\n      - name: Install Poetry dependencies (CPU)\n        run: poetry install -E cpu\n\n      - name: Run unit tests with coverage\n        run: |\n          poetry run pytest tests/unit\n\n  test-windows:\n    needs: changes\n    if: needs.changes.outputs.should_run == 'true'\n    runs-on: windows-latest\n\n    strategy:\n      matrix:\n        python-version: ['3.10', '3.11', '3.12', '3.13']\n\n    steps:\n      - name: Checkout project\n        uses: actions/checkout@v4\n\n      - name: Install poetry\n        run: pipx install poetry\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: ${{ matrix.python-version }}\n          cache: poetry # caching dependencies from poetry.lock\n\n      - name: Install Poetry dependencies (CPU)\n        run: poetry install -E cpu\n\n      - name: Run unit tests with coverage\n        run: poetry run pytest tests/unit\n\n  # Gate job for branch protection - always reports a status\n  unit-tests:\n    needs: [changes, test-ubuntu, test-macos, test-windows]\n    if: always()\n    runs-on: ubuntu-latest\n    steps:\n      - name: Check test results\n        run: |\n          if [[ \"${{ needs.changes.outputs.should_run }}\" != \"true\" ]]; then\n            echo \"Tests skipped - no code changes detected\"\n            exit 0\n          fi\n          if [[ \"${{ needs.test-ubuntu.result }}\" == \"failure\" ]] || \\\n             [[ \"${{ needs.test-macos.result }}\" == \"failure\" ]] || \\\n             [[ \"${{ needs.test-windows.result }}\" == \"failure\" ]]; then\n            echo \"Some tests failed\"\n            exit 1\n          fi\n          echo \"All tests passed\"\n"
  },
  {
    "path": ".gitignore",
    "content": "# Andrew env\n.DS_Store\n.vscode\n\n# Andrew functional adds\n/tracks/\n/lyrics/\n/.cache/\n*.onnx\n*.pth\n*.wav\n/*.flac\n*.mp3\ntests/model-metrics/results\ntests/model-metrics/datasets\ntemp_images\n\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially 
recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n#.idea/\n"
  },
  {
    "path": ".specify/memory/constitution.md",
    "content": "<!--\nSync Impact Report:\nVersion change: Initial → 1.0.0\nModified principles: All (new constitution)\nAdded sections: Core Principles, Performance Standards, Quality Assurance\nRemoved sections: None (template placeholders)\nTemplates requiring updates:\n- ✅ plan-template.md (Constitution Check section updated)\n- ✅ spec-template.md (aligned with principles)\n- ✅ tasks-template.md (TDD enforcement aligned)\n- ✅ agent-file-template.md (no changes needed)\nFollow-up TODOs: None\n-->\n\n# Audio Separator Constitution\n\n## Core Principles\n\n### I. Library-First Architecture\nThe `Separator` class MUST be the primary interface for all audio separation functionality. CLI and remote API are thin wrappers around the library. Libraries MUST be self-contained, independently testable, and documented with clear separation of concerns between architectures (MDX, VR, Demucs, MDXC).\n\n**Rationale**: This ensures the core functionality can be integrated into other projects while maintaining a consistent API across all interfaces.\n\n### II. Multi-Interface Consistency\nEvery core feature MUST be accessible via three interfaces: Python API, CLI, and Remote API. Parameter names and behavior MUST be identical across all interfaces. All interfaces MUST support the same model architectures and processing options.\n\n**Rationale**: Users should have a consistent experience regardless of how they access the functionality, enabling seamless transitions between local and remote processing.\n\n### III. Test-First Development (NON-NEGOTIABLE)\nTDD is mandatory: Tests written → Tests fail → Implementation → Tests pass. All new features MUST include unit tests, integration tests with audio validation (SSIM comparison), and CLI tests. No code merges without passing tests.\n\n**Rationale**: Audio processing requires precision and consistency. Automated testing with perceptual validation ensures output quality remains stable across changes.\n\n### IV. Performance & Resource Efficiency\nHardware acceleration MUST be supported (CUDA, CoreML, DirectML). Memory usage MUST be optimized for large audio files through streaming and batch processing. Processing parameters MUST be tunable for different hardware capabilities.\n\n**Rationale**: Audio separation is computationally intensive. Efficient resource usage enables processing of longer files and broader hardware compatibility.\n\n### V. Model Architecture Separation\nEach model architecture (MDX, VR, Demucs, MDXC) MUST be implemented in separate modules inheriting from `CommonSeparator`. Loading one architecture MUST NOT load code from others. Architecture-specific parameters MUST be isolated and documented.\n\n**Rationale**: This prevents conflicts between different model types and keeps memory usage minimal by loading only required components.\n\n## Performance Standards\n\nAll audio processing operations MUST meet these requirements:\n- **Memory efficiency**: Support files larger than available RAM through streaming\n- **GPU utilization**: Automatically detect and utilize available hardware acceleration\n- **Batch processing**: Support processing multiple files without model reloading\n- **Output consistency**: Identical inputs MUST produce identical outputs (deterministic)\n\n## Quality Assurance\n\n### Testing Requirements\n- **Unit tests**: All core classes and functions\n- **Integration tests**: End-to-end audio processing with SSIM validation\n- **Performance tests**: Memory usage and processing speed benchmarks\n- **Cross-platform tests**: Windows, macOS, Linux compatibility\n\n### Audio Validation\nOutput quality MUST be validated using:\n- Waveform and spectrogram image comparison (SSIM ≥ 0.95)\n- Reference audio files for each supported model architecture\n- Automated regression testing on model output changes\n\n## Governance\n\nThis constitution supersedes all other development practices. All pull requests MUST verify compliance with these principles. Any deviation MUST be explicitly justified and documented.\n\n**Amendment Process**: Changes require documentation of impact, approval from maintainers, and a migration plan for affected code.\n\n**Compliance Review**: All features undergo a constitutional compliance check during the planning phase and post-implementation validation.\n\n**Version**: 1.0.0 | **Ratified**: 2025-09-25 | **Last Amended**: 2025-09-25"
  },
  {
    "path": ".specify/scripts/bash/check-prerequisites.sh",
    "content": "#!/usr/bin/env bash\n\n# Consolidated prerequisite checking script\n#\n# This script provides unified prerequisite checking for Spec-Driven Development workflow.\n# It replaces the functionality previously spread across multiple scripts.\n#\n# Usage: ./check-prerequisites.sh [OPTIONS]\n#\n# OPTIONS:\n#   --json              Output in JSON format\n#   --require-tasks     Require tasks.md to exist (for implementation phase)\n#   --include-tasks     Include tasks.md in AVAILABLE_DOCS list\n#   --paths-only        Only output path variables (no validation)\n#   --help, -h          Show help message\n#\n# OUTPUTS:\n#   JSON mode: {\"FEATURE_DIR\":\"...\", \"AVAILABLE_DOCS\":[\"...\"]}\n#   Text mode: FEATURE_DIR:... \\n AVAILABLE_DOCS: \\n ✓/✗ file.md\n#   Paths only: REPO_ROOT: ... \\n BRANCH: ... \\n FEATURE_DIR: ... etc.\n\nset -e\n\n# Parse command line arguments\nJSON_MODE=false\nREQUIRE_TASKS=false\nINCLUDE_TASKS=false\nPATHS_ONLY=false\n\nfor arg in \"$@\"; do\n    case \"$arg\" in\n        --json)\n            JSON_MODE=true\n            ;;\n        --require-tasks)\n            REQUIRE_TASKS=true\n            ;;\n        --include-tasks)\n            INCLUDE_TASKS=true\n            ;;\n        --paths-only)\n            PATHS_ONLY=true\n            ;;\n        --help|-h)\n            cat << 'EOF'\nUsage: check-prerequisites.sh [OPTIONS]\n\nConsolidated prerequisite checking for Spec-Driven Development workflow.\n\nOPTIONS:\n  --json              Output in JSON format\n  --require-tasks     Require tasks.md to exist (for implementation phase)\n  --include-tasks     Include tasks.md in AVAILABLE_DOCS list\n  --paths-only        Only output path variables (no prerequisite validation)\n  --help, -h          Show this help message\n\nEXAMPLES:\n  # Check task prerequisites (plan.md required)\n  ./check-prerequisites.sh --json\n  \n  # Check implementation prerequisites (plan.md + tasks.md required)\n  ./check-prerequisites.sh --json --require-tasks 
--include-tasks\n  \n  # Get feature paths only (no validation)\n  ./check-prerequisites.sh --paths-only\n  \nEOF\n            exit 0\n            ;;\n        *)\n            echo \"ERROR: Unknown option '$arg'. Use --help for usage information.\" >&2\n            exit 1\n            ;;\n    esac\ndone\n\n# Source common functions\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\nsource \"$SCRIPT_DIR/common.sh\"\n\n# Get feature paths and validate branch\neval $(get_feature_paths)\ncheck_feature_branch \"$CURRENT_BRANCH\" \"$HAS_GIT\" || exit 1\n\n# If paths-only mode, output paths and exit (support JSON + paths-only combined)\nif $PATHS_ONLY; then\n    if $JSON_MODE; then\n        # Minimal JSON paths payload (no validation performed)\n        printf '{\"REPO_ROOT\":\"%s\",\"BRANCH\":\"%s\",\"FEATURE_DIR\":\"%s\",\"FEATURE_SPEC\":\"%s\",\"IMPL_PLAN\":\"%s\",\"TASKS\":\"%s\"}\\n' \\\n            \"$REPO_ROOT\" \"$CURRENT_BRANCH\" \"$FEATURE_DIR\" \"$FEATURE_SPEC\" \"$IMPL_PLAN\" \"$TASKS\"\n    else\n        echo \"REPO_ROOT: $REPO_ROOT\"\n        echo \"BRANCH: $CURRENT_BRANCH\"\n        echo \"FEATURE_DIR: $FEATURE_DIR\"\n        echo \"FEATURE_SPEC: $FEATURE_SPEC\"\n        echo \"IMPL_PLAN: $IMPL_PLAN\"\n        echo \"TASKS: $TASKS\"\n    fi\n    exit 0\nfi\n\n# Validate required directories and files\nif [[ ! -d \"$FEATURE_DIR\" ]]; then\n    echo \"ERROR: Feature directory not found: $FEATURE_DIR\" >&2\n    echo \"Run /specify first to create the feature structure.\" >&2\n    exit 1\nfi\n\nif [[ ! -f \"$IMPL_PLAN\" ]]; then\n    echo \"ERROR: plan.md not found in $FEATURE_DIR\" >&2\n    echo \"Run /plan first to create the implementation plan.\" >&2\n    exit 1\nfi\n\n# Check for tasks.md if required\nif $REQUIRE_TASKS && [[ ! 
-f \"$TASKS\" ]]; then\n    echo \"ERROR: tasks.md not found in $FEATURE_DIR\" >&2\n    echo \"Run /tasks first to create the task list.\" >&2\n    exit 1\nfi\n\n# Build list of available documents\ndocs=()\n\n# Always check these optional docs\n[[ -f \"$RESEARCH\" ]] && docs+=(\"research.md\")\n[[ -f \"$DATA_MODEL\" ]] && docs+=(\"data-model.md\")\n\n# Check contracts directory (only if it exists and has files)\nif [[ -d \"$CONTRACTS_DIR\" ]] && [[ -n \"$(ls -A \"$CONTRACTS_DIR\" 2>/dev/null)\" ]]; then\n    docs+=(\"contracts/\")\nfi\n\n[[ -f \"$QUICKSTART\" ]] && docs+=(\"quickstart.md\")\n\n# Include tasks.md if requested and it exists\nif $INCLUDE_TASKS && [[ -f \"$TASKS\" ]]; then\n    docs+=(\"tasks.md\")\nfi\n\n# Output results\nif $JSON_MODE; then\n    # Build JSON array of documents\n    if [[ ${#docs[@]} -eq 0 ]]; then\n        json_docs=\"[]\"\n    else\n        json_docs=$(printf '\"%s\",' \"${docs[@]}\")\n        json_docs=\"[${json_docs%,}]\"\n    fi\n    \n    printf '{\"FEATURE_DIR\":\"%s\",\"AVAILABLE_DOCS\":%s}\\n' \"$FEATURE_DIR\" \"$json_docs\"\nelse\n    # Text output\n    echo \"FEATURE_DIR:$FEATURE_DIR\"\n    echo \"AVAILABLE_DOCS:\"\n    \n    # Show status of each potential document\n    check_file \"$RESEARCH\" \"research.md\"\n    check_file \"$DATA_MODEL\" \"data-model.md\"\n    check_dir \"$CONTRACTS_DIR\" \"contracts/\"\n    check_file \"$QUICKSTART\" \"quickstart.md\"\n    \n    if $INCLUDE_TASKS; then\n        check_file \"$TASKS\" \"tasks.md\"\n    fi\nfi"
  },
  {
    "path": ".specify/scripts/bash/common.sh",
    "content": "#!/usr/bin/env bash\n# Common functions and variables for all scripts\n\n# Get repository root, with fallback for non-git repositories\nget_repo_root() {\n    if git rev-parse --show-toplevel >/dev/null 2>&1; then\n        git rev-parse --show-toplevel\n    else\n        # Fall back to script location for non-git repos\n        local script_dir=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\n        (cd \"$script_dir/../../..\" && pwd)\n    fi\n}\n\n# Get current branch, with fallback for non-git repositories\nget_current_branch() {\n    # First check if SPECIFY_FEATURE environment variable is set\n    if [[ -n \"${SPECIFY_FEATURE:-}\" ]]; then\n        echo \"$SPECIFY_FEATURE\"\n        return\n    fi\n    \n    # Then check git if available\n    if git rev-parse --abbrev-ref HEAD >/dev/null 2>&1; then\n        git rev-parse --abbrev-ref HEAD\n        return\n    fi\n    \n    # For non-git repos, try to find the latest feature directory\n    local repo_root=$(get_repo_root)\n    local specs_dir=\"$repo_root/specs\"\n    \n    if [[ -d \"$specs_dir\" ]]; then\n        local latest_feature=\"\"\n        local highest=0\n        \n        for dir in \"$specs_dir\"/*; do\n            if [[ -d \"$dir\" ]]; then\n                local dirname=$(basename \"$dir\")\n                if [[ \"$dirname\" =~ ^([0-9]{3})- ]]; then\n                    local number=${BASH_REMATCH[1]}\n                    number=$((10#$number))\n                    if [[ \"$number\" -gt \"$highest\" ]]; then\n                        highest=$number\n                        latest_feature=$dirname\n                    fi\n                fi\n            fi\n        done\n        \n        if [[ -n \"$latest_feature\" ]]; then\n            echo \"$latest_feature\"\n            return\n        fi\n    fi\n    \n    echo \"main\"  # Final fallback\n}\n\n# Check if we have git available\nhas_git() {\n    git rev-parse --show-toplevel >/dev/null 2>&1\n}\n\ncheck_feature_branch() 
{\n    local branch=\"$1\"\n    local has_git_repo=\"$2\"\n    \n    # For non-git repos, we can't enforce branch naming but still provide output\n    if [[ \"$has_git_repo\" != \"true\" ]]; then\n        echo \"[specify] Warning: Git repository not detected; skipped branch validation\" >&2\n        return 0\n    fi\n    \n    if [[ ! \"$branch\" =~ ^[0-9]{3}- ]]; then\n        echo \"ERROR: Not on a feature branch. Current branch: $branch\" >&2\n        echo \"Feature branches should be named like: 001-feature-name\" >&2\n        return 1\n    fi\n    \n    return 0\n}\n\nget_feature_dir() { echo \"$1/specs/$2\"; }\n\nget_feature_paths() {\n    local repo_root=$(get_repo_root)\n    local current_branch=$(get_current_branch)\n    local has_git_repo=\"false\"\n    \n    if has_git; then\n        has_git_repo=\"true\"\n    fi\n    \n    local feature_dir=$(get_feature_dir \"$repo_root\" \"$current_branch\")\n    \n    cat <<EOF\nREPO_ROOT='$repo_root'\nCURRENT_BRANCH='$current_branch'\nHAS_GIT='$has_git_repo'\nFEATURE_DIR='$feature_dir'\nFEATURE_SPEC='$feature_dir/spec.md'\nIMPL_PLAN='$feature_dir/plan.md'\nTASKS='$feature_dir/tasks.md'\nRESEARCH='$feature_dir/research.md'\nDATA_MODEL='$feature_dir/data-model.md'\nQUICKSTART='$feature_dir/quickstart.md'\nCONTRACTS_DIR='$feature_dir/contracts'\nEOF\n}\n\ncheck_file() { [[ -f \"$1\" ]] && echo \"  ✓ $2\" || echo \"  ✗ $2\"; }\ncheck_dir() { [[ -d \"$1\" && -n $(ls -A \"$1\" 2>/dev/null) ]] && echo \"  ✓ $2\" || echo \"  ✗ $2\"; }\n"
  },
  {
    "path": ".specify/scripts/bash/create-new-feature.sh",
    "content": "#!/usr/bin/env bash\n\nset -e\n\nJSON_MODE=false\nARGS=()\nfor arg in \"$@\"; do\n    case \"$arg\" in\n        --json) JSON_MODE=true ;;\n        --help|-h) echo \"Usage: $0 [--json] <feature_description>\"; exit 0 ;;\n        *) ARGS+=(\"$arg\") ;;\n    esac\ndone\n\nFEATURE_DESCRIPTION=\"${ARGS[*]}\"\nif [ -z \"$FEATURE_DESCRIPTION\" ]; then\n    echo \"Usage: $0 [--json] <feature_description>\" >&2\n    exit 1\nfi\n\n# Function to find the repository root by searching for existing project markers\nfind_repo_root() {\n    local dir=\"$1\"\n    while [ \"$dir\" != \"/\" ]; do\n        if [ -d \"$dir/.git\" ] || [ -d \"$dir/.specify\" ]; then\n            echo \"$dir\"\n            return 0\n        fi\n        dir=\"$(dirname \"$dir\")\"\n    done\n    return 1\n}\n\n# Resolve repository root. Prefer git information when available, but fall back\n# to searching for repository markers so the workflow still functions in repositories that\n# were initialised with --no-git.\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\n\nif git rev-parse --show-toplevel >/dev/null 2>&1; then\n    REPO_ROOT=$(git rev-parse --show-toplevel)\n    HAS_GIT=true\nelse\n    REPO_ROOT=\"$(find_repo_root \"$SCRIPT_DIR\")\"\n    if [ -z \"$REPO_ROOT\" ]; then\n        echo \"Error: Could not determine repository root. 
Please run this script from within the repository.\" >&2\n        exit 1\n    fi\n    HAS_GIT=false\nfi\n\ncd \"$REPO_ROOT\"\n\nSPECS_DIR=\"$REPO_ROOT/specs\"\nmkdir -p \"$SPECS_DIR\"\n\nHIGHEST=0\nif [ -d \"$SPECS_DIR\" ]; then\n    for dir in \"$SPECS_DIR\"/*; do\n        [ -d \"$dir\" ] || continue\n        dirname=$(basename \"$dir\")\n        number=$(echo \"$dirname\" | grep -o '^[0-9]\\+' || echo \"0\")\n        number=$((10#$number))\n        if [ \"$number\" -gt \"$HIGHEST\" ]; then HIGHEST=$number; fi\n    done\nfi\n\nNEXT=$((HIGHEST + 1))\nFEATURE_NUM=$(printf \"%03d\" \"$NEXT\")\n\nBRANCH_NAME=$(echo \"$FEATURE_DESCRIPTION\" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\\+/-/g' | sed 's/^-//' | sed 's/-$//')\nWORDS=$(echo \"$BRANCH_NAME\" | tr '-' '\\n' | grep -v '^$' | head -3 | tr '\\n' '-' | sed 's/-$//')\nBRANCH_NAME=\"${FEATURE_NUM}-${WORDS}\"\n\nif [ \"$HAS_GIT\" = true ]; then\n    git checkout -b \"$BRANCH_NAME\"\nelse\n    >&2 echo \"[specify] Warning: Git repository not detected; skipped branch creation for $BRANCH_NAME\"\nfi\n\nFEATURE_DIR=\"$SPECS_DIR/$BRANCH_NAME\"\nmkdir -p \"$FEATURE_DIR\"\n\nTEMPLATE=\"$REPO_ROOT/.specify/templates/spec-template.md\"\nSPEC_FILE=\"$FEATURE_DIR/spec.md\"\nif [ -f \"$TEMPLATE\" ]; then cp \"$TEMPLATE\" \"$SPEC_FILE\"; else touch \"$SPEC_FILE\"; fi\n\n# Set the SPECIFY_FEATURE environment variable for the current session\nexport SPECIFY_FEATURE=\"$BRANCH_NAME\"\n\nif $JSON_MODE; then\n    printf '{\"BRANCH_NAME\":\"%s\",\"SPEC_FILE\":\"%s\",\"FEATURE_NUM\":\"%s\"}\\n' \"$BRANCH_NAME\" \"$SPEC_FILE\" \"$FEATURE_NUM\"\nelse\n    echo \"BRANCH_NAME: $BRANCH_NAME\"\n    echo \"SPEC_FILE: $SPEC_FILE\"\n    echo \"FEATURE_NUM: $FEATURE_NUM\"\n    echo \"SPECIFY_FEATURE environment variable set to: $BRANCH_NAME\"\nfi\n"
  },
  {
    "path": ".specify/scripts/bash/setup-plan.sh",
    "content": "#!/usr/bin/env bash\n\nset -e\n\n# Parse command line arguments\nJSON_MODE=false\nARGS=()\n\nfor arg in \"$@\"; do\n    case \"$arg\" in\n        --json) \n            JSON_MODE=true \n            ;;\n        --help|-h) \n            echo \"Usage: $0 [--json]\"\n            echo \"  --json    Output results in JSON format\"\n            echo \"  --help    Show this help message\"\n            exit 0 \n            ;;\n        *) \n            ARGS+=(\"$arg\") \n            ;;\n    esac\ndone\n\n# Get script directory and load common functions\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\nsource \"$SCRIPT_DIR/common.sh\"\n\n# Get all paths and variables from common functions\neval $(get_feature_paths)\n\n# Check if we're on a proper feature branch (only for git repos)\ncheck_feature_branch \"$CURRENT_BRANCH\" \"$HAS_GIT\" || exit 1\n\n# Ensure the feature directory exists\nmkdir -p \"$FEATURE_DIR\"\n\n# Copy plan template if it exists\nTEMPLATE=\"$REPO_ROOT/.specify/templates/plan-template.md\"\nif [[ -f \"$TEMPLATE\" ]]; then\n    cp \"$TEMPLATE\" \"$IMPL_PLAN\"\n    echo \"Copied plan template to $IMPL_PLAN\"\nelse\n    echo \"Warning: Plan template not found at $TEMPLATE\"\n    # Create a basic plan file if template doesn't exist\n    touch \"$IMPL_PLAN\"\nfi\n\n# Output results\nif $JSON_MODE; then\n    printf '{\"FEATURE_SPEC\":\"%s\",\"IMPL_PLAN\":\"%s\",\"SPECS_DIR\":\"%s\",\"BRANCH\":\"%s\",\"HAS_GIT\":\"%s\"}\\n' \\\n        \"$FEATURE_SPEC\" \"$IMPL_PLAN\" \"$FEATURE_DIR\" \"$CURRENT_BRANCH\" \"$HAS_GIT\"\nelse\n    echo \"FEATURE_SPEC: $FEATURE_SPEC\"\n    echo \"IMPL_PLAN: $IMPL_PLAN\" \n    echo \"SPECS_DIR: $FEATURE_DIR\"\n    echo \"BRANCH: $CURRENT_BRANCH\"\n    echo \"HAS_GIT: $HAS_GIT\"\nfi\n"
  },
  {
    "path": ".specify/scripts/bash/update-agent-context.sh",
    "content": "#!/usr/bin/env bash\n\n# Update agent context files with information from plan.md\n#\n# This script maintains AI agent context files by parsing feature specifications \n# and updating agent-specific configuration files with project information.\n#\n# MAIN FUNCTIONS:\n# 1. Environment Validation\n#    - Verifies git repository structure and branch information\n#    - Checks for required plan.md files and templates\n#    - Validates file permissions and accessibility\n#\n# 2. Plan Data Extraction\n#    - Parses plan.md files to extract project metadata\n#    - Identifies language/version, frameworks, databases, and project types\n#    - Handles missing or incomplete specification data gracefully\n#\n# 3. Agent File Management\n#    - Creates new agent context files from templates when needed\n#    - Updates existing agent files with new project information\n#    - Preserves manual additions and custom configurations\n#    - Supports multiple AI agent formats and directory structures\n#\n# 4. Content Generation\n#    - Generates language-specific build/test commands\n#    - Creates appropriate project directory structures\n#    - Updates technology stacks and recent changes sections\n#    - Maintains consistent formatting and timestamps\n#\n# 5. 
Multi-Agent Support\n#    - Handles agent-specific file paths and naming conventions\n#    - Supports: Claude, Gemini, Copilot, Cursor, Qwen, opencode, Codex, Windsurf\n#    - Can update single agents or all existing agent files\n#    - Creates default Claude file if no agent files exist\n#\n# Usage: ./update-agent-context.sh [agent_type]\n# Agent types: claude|gemini|copilot|cursor|qwen|opencode|codex|windsurf\n# Leave empty to update all existing agent files\n\nset -e\n\n# Enable strict error handling\nset -u\nset -o pipefail\n\n#==============================================================================\n# Configuration and Global Variables\n#==============================================================================\n\n# Get script directory and load common functions\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\nsource \"$SCRIPT_DIR/common.sh\"\n\n# Get all paths and variables from common functions\neval $(get_feature_paths)\n\nNEW_PLAN=\"$IMPL_PLAN\"  # Alias for compatibility with existing code\nAGENT_TYPE=\"${1:-}\"\n\n# Agent-specific file paths  \nCLAUDE_FILE=\"$REPO_ROOT/CLAUDE.md\"\nGEMINI_FILE=\"$REPO_ROOT/GEMINI.md\"\nCOPILOT_FILE=\"$REPO_ROOT/.github/copilot-instructions.md\"\nCURSOR_FILE=\"$REPO_ROOT/.cursor/rules/specify-rules.mdc\"\nQWEN_FILE=\"$REPO_ROOT/QWEN.md\"\nAGENTS_FILE=\"$REPO_ROOT/AGENTS.md\"\nWINDSURF_FILE=\"$REPO_ROOT/.windsurf/rules/specify-rules.md\"\nKILOCODE_FILE=\"$REPO_ROOT/.kilocode/rules/specify-rules.md\"\nAUGGIE_FILE=\"$REPO_ROOT/.augment/rules/specify-rules.md\"\nROO_FILE=\"$REPO_ROOT/.roo/rules/specify-rules.md\"\n\n# Template file\nTEMPLATE_FILE=\"$REPO_ROOT/.specify/templates/agent-file-template.md\"\n\n# Global variables for parsed plan data\nNEW_LANG=\"\"\nNEW_FRAMEWORK=\"\"\nNEW_DB=\"\"\nNEW_PROJECT_TYPE=\"\"\n\n#==============================================================================\n# Utility 
Functions\n#==============================================================================\n\nlog_info() {\n    echo \"INFO: $1\"\n}\n\nlog_success() {\n    echo \"✓ $1\"\n}\n\nlog_error() {\n    echo \"ERROR: $1\" >&2\n}\n\nlog_warning() {\n    echo \"WARNING: $1\" >&2\n}\n\n# Cleanup function for temporary files\ncleanup() {\n    local exit_code=$?\n    rm -f /tmp/agent_update_*_$$\n    rm -f /tmp/manual_additions_$$\n    exit $exit_code\n}\n\n# Set up cleanup trap\ntrap cleanup EXIT INT TERM\n\n#==============================================================================\n# Validation Functions\n#==============================================================================\n\nvalidate_environment() {\n    # Check if we have a current branch/feature (git or non-git)\n    if [[ -z \"$CURRENT_BRANCH\" ]]; then\n        log_error \"Unable to determine current feature\"\n        if [[ \"$HAS_GIT\" == \"true\" ]]; then\n            log_info \"Make sure you're on a feature branch\"\n        else\n            log_info \"Set SPECIFY_FEATURE environment variable or create a feature first\"\n        fi\n        exit 1\n    fi\n    \n    # Check if plan.md exists\n    if [[ ! -f \"$NEW_PLAN\" ]]; then\n        log_error \"No plan.md found at $NEW_PLAN\"\n        log_info \"Make sure you're working on a feature with a corresponding spec directory\"\n        if [[ \"$HAS_GIT\" != \"true\" ]]; then\n            log_info \"Use: export SPECIFY_FEATURE=your-feature-name or create a new feature first\"\n        fi\n        exit 1\n    fi\n    \n    # Check if template exists (needed for new files)\n    if [[ ! 
-f \"$TEMPLATE_FILE\" ]]; then\n        log_warning \"Template file not found at $TEMPLATE_FILE\"\n        log_warning \"Creating new agent files will fail\"\n    fi\n}\n\n#==============================================================================\n# Plan Parsing Functions\n#==============================================================================\n\nextract_plan_field() {\n    local field_pattern=\"$1\"\n    local plan_file=\"$2\"\n    \n    grep \"^\\*\\*${field_pattern}\\*\\*: \" \"$plan_file\" 2>/dev/null | \\\n        head -1 | \\\n        sed \"s|^\\*\\*${field_pattern}\\*\\*: ||\" | \\\n        sed 's/^[ \\t]*//;s/[ \\t]*$//' | \\\n        grep -v \"NEEDS CLARIFICATION\" | \\\n        grep -v \"^N/A$\" || echo \"\"\n}\n\nparse_plan_data() {\n    local plan_file=\"$1\"\n    \n    if [[ ! -f \"$plan_file\" ]]; then\n        log_error \"Plan file not found: $plan_file\"\n        return 1\n    fi\n    \n    if [[ ! -r \"$plan_file\" ]]; then\n        log_error \"Plan file is not readable: $plan_file\"\n        return 1\n    fi\n    \n    log_info \"Parsing plan data from $plan_file\"\n    \n    NEW_LANG=$(extract_plan_field \"Language/Version\" \"$plan_file\")\n    NEW_FRAMEWORK=$(extract_plan_field \"Primary Dependencies\" \"$plan_file\")\n    NEW_DB=$(extract_plan_field \"Storage\" \"$plan_file\")\n    NEW_PROJECT_TYPE=$(extract_plan_field \"Project Type\" \"$plan_file\")\n    \n    # Log what we found\n    if [[ -n \"$NEW_LANG\" ]]; then\n        log_info \"Found language: $NEW_LANG\"\n    else\n        log_warning \"No language information found in plan\"\n    fi\n    \n    if [[ -n \"$NEW_FRAMEWORK\" ]]; then\n        log_info \"Found framework: $NEW_FRAMEWORK\"\n    fi\n    \n    if [[ -n \"$NEW_DB\" ]] && [[ \"$NEW_DB\" != \"N/A\" ]]; then\n        log_info \"Found database: $NEW_DB\"\n    fi\n    \n    if [[ -n \"$NEW_PROJECT_TYPE\" ]]; then\n        log_info \"Found project type: $NEW_PROJECT_TYPE\"\n    fi\n}\n\nformat_technology_stack() {\n   
 local lang=\"$1\"\n    local framework=\"$2\"\n    local parts=()\n    \n    # Add non-empty parts\n    [[ -n \"$lang\" && \"$lang\" != \"NEEDS CLARIFICATION\" ]] && parts+=(\"$lang\")\n    [[ -n \"$framework\" && \"$framework\" != \"NEEDS CLARIFICATION\" && \"$framework\" != \"N/A\" ]] && parts+=(\"$framework\")\n    \n    # Join with proper formatting\n    if [[ ${#parts[@]} -eq 0 ]]; then\n        echo \"\"\n    elif [[ ${#parts[@]} -eq 1 ]]; then\n        echo \"${parts[0]}\"\n    else\n        # Join multiple parts with \" + \"\n        local result=\"${parts[0]}\"\n        for ((i=1; i<${#parts[@]}; i++)); do\n            result=\"$result + ${parts[i]}\"\n        done\n        echo \"$result\"\n    fi\n}\n\n#==============================================================================\n# Template and Content Generation Functions\n#==============================================================================\n\nget_project_structure() {\n    local project_type=\"$1\"\n    \n    if [[ \"$project_type\" == *\"web\"* ]]; then\n        echo \"backend/\\\\nfrontend/\\\\ntests/\"\n    else\n        echo \"src/\\\\ntests/\"\n    fi\n}\n\nget_commands_for_language() {\n    local lang=\"$1\"\n    \n    case \"$lang\" in\n        *\"Python\"*)\n            echo \"cd src && pytest && ruff check .\"\n            ;;\n        *\"Rust\"*)\n            echo \"cargo test && cargo clippy\"\n            ;;\n        *\"JavaScript\"*|*\"TypeScript\"*)\n            echo \"npm test && npm run lint\"\n            ;;\n        *)\n            echo \"# Add commands for $lang\"\n            ;;\n    esac\n}\n\nget_language_conventions() {\n    local lang=\"$1\"\n    echo \"$lang: Follow standard conventions\"\n}\n\ncreate_new_agent_file() {\n    local target_file=\"$1\"\n    local temp_file=\"$2\"\n    local project_name=\"$3\"\n    local current_date=\"$4\"\n    \n    if [[ ! 
-f \"$TEMPLATE_FILE\" ]]; then\n        log_error \"Template not found at $TEMPLATE_FILE\"\n        return 1\n    fi\n    \n    if [[ ! -r \"$TEMPLATE_FILE\" ]]; then\n        log_error \"Template file is not readable: $TEMPLATE_FILE\"\n        return 1\n    fi\n    \n    log_info \"Creating new agent context file from template...\"\n    \n    if ! cp \"$TEMPLATE_FILE\" \"$temp_file\"; then\n        log_error \"Failed to copy template file\"\n        return 1\n    fi\n    \n    # Replace template placeholders\n    local project_structure\n    project_structure=$(get_project_structure \"$NEW_PROJECT_TYPE\")\n    \n    local commands\n    commands=$(get_commands_for_language \"$NEW_LANG\")\n    \n    local language_conventions\n    language_conventions=$(get_language_conventions \"$NEW_LANG\")\n    \n    # Perform substitutions with error checking using safer approach\n    # Escape special characters for sed by using a different delimiter or escaping\n    local escaped_lang=$(printf '%s\\n' \"$NEW_LANG\" | sed 's/[\\[\\.*^$()+{}|]/\\\\&/g')\n    local escaped_framework=$(printf '%s\\n' \"$NEW_FRAMEWORK\" | sed 's/[\\[\\.*^$()+{}|]/\\\\&/g')\n    local escaped_branch=$(printf '%s\\n' \"$CURRENT_BRANCH\" | sed 's/[\\[\\.*^$()+{}|]/\\\\&/g')\n    \n    # Build technology stack and recent change strings conditionally\n    local tech_stack\n    if [[ -n \"$escaped_lang\" && -n \"$escaped_framework\" ]]; then\n        tech_stack=\"- $escaped_lang + $escaped_framework ($escaped_branch)\"\n    elif [[ -n \"$escaped_lang\" ]]; then\n        tech_stack=\"- $escaped_lang ($escaped_branch)\"\n    elif [[ -n \"$escaped_framework\" ]]; then\n        tech_stack=\"- $escaped_framework ($escaped_branch)\"\n    else\n        tech_stack=\"- ($escaped_branch)\"\n    fi\n\n    local recent_change\n    if [[ -n \"$escaped_lang\" && -n \"$escaped_framework\" ]]; then\n        recent_change=\"- $escaped_branch: Added $escaped_lang + $escaped_framework\"\n    elif [[ -n \"$escaped_lang\" ]]; 
then\n        recent_change=\"- $escaped_branch: Added $escaped_lang\"\n    elif [[ -n \"$escaped_framework\" ]]; then\n        recent_change=\"- $escaped_branch: Added $escaped_framework\"\n    else\n        recent_change=\"- $escaped_branch: Added\"\n    fi\n\n    local substitutions=(\n        \"s|\\[PROJECT NAME\\]|$project_name|\"\n        \"s|\\[DATE\\]|$current_date|\"\n        \"s|\\[EXTRACTED FROM ALL PLAN.MD FILES\\]|$tech_stack|\"\n        \"s|\\[ACTUAL STRUCTURE FROM PLANS\\]|$project_structure|g\"\n        \"s|\\[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES\\]|$commands|\"\n        \"s|\\[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE\\]|$language_conventions|\"\n        \"s|\\[LAST 3 FEATURES AND WHAT THEY ADDED\\]|$recent_change|\"\n    )\n    \n    for substitution in \"${substitutions[@]}\"; do\n        if ! sed -i.bak -e \"$substitution\" \"$temp_file\"; then\n            log_error \"Failed to perform substitution: $substitution\"\n            rm -f \"$temp_file\" \"$temp_file.bak\"\n            return 1\n        fi\n    done\n    \n    # Convert \\n sequences to actual newlines\n    newline=$(printf '\\n')\n    sed -i.bak2 \"s/\\\\\\\\n/${newline}/g\" \"$temp_file\"\n    \n    # Clean up backup files\n    rm -f \"$temp_file.bak\" \"$temp_file.bak2\"\n    \n    return 0\n}\n\n\n\n\nupdate_existing_agent_file() {\n    local target_file=\"$1\"\n    local current_date=\"$2\"\n    \n    log_info \"Updating existing agent context file...\"\n    \n    # Use a single temporary file for atomic update\n    local temp_file\n    temp_file=$(mktemp) || {\n        log_error \"Failed to create temporary file\"\n        return 1\n    }\n    \n    # Process the file in one pass\n    local tech_stack=$(format_technology_stack \"$NEW_LANG\" \"$NEW_FRAMEWORK\")\n    local new_tech_entries=()\n    local new_change_entry=\"\"\n    \n    # Prepare new technology entries\n    if [[ -n \"$tech_stack\" ]] && ! 
grep -q \"$tech_stack\" \"$target_file\"; then\n        new_tech_entries+=(\"- $tech_stack ($CURRENT_BRANCH)\")\n    fi\n    \n    if [[ -n \"$NEW_DB\" ]] && [[ \"$NEW_DB\" != \"N/A\" ]] && [[ \"$NEW_DB\" != \"NEEDS CLARIFICATION\" ]] && ! grep -q \"$NEW_DB\" \"$target_file\"; then\n        new_tech_entries+=(\"- $NEW_DB ($CURRENT_BRANCH)\")\n    fi\n    \n    # Prepare new change entry\n    if [[ -n \"$tech_stack\" ]]; then\n        new_change_entry=\"- $CURRENT_BRANCH: Added $tech_stack\"\n    elif [[ -n \"$NEW_DB\" ]] && [[ \"$NEW_DB\" != \"N/A\" ]] && [[ \"$NEW_DB\" != \"NEEDS CLARIFICATION\" ]]; then\n        new_change_entry=\"- $CURRENT_BRANCH: Added $NEW_DB\"\n    fi\n    \n    # Process file line by line\n    local in_tech_section=false\n    local in_changes_section=false\n    local tech_entries_added=false\n    local changes_entries_added=false\n    local existing_changes_count=0\n    \n    while IFS= read -r line || [[ -n \"$line\" ]]; do\n        # Handle Active Technologies section\n        if [[ \"$line\" == \"## Active Technologies\" ]]; then\n            echo \"$line\" >> \"$temp_file\"\n            in_tech_section=true\n            continue\n        elif [[ $in_tech_section == true ]] && [[ \"$line\" =~ ^##[[:space:]] ]]; then\n            # Add new tech entries before closing the section\n            if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then\n                printf '%s\\n' \"${new_tech_entries[@]}\" >> \"$temp_file\"\n                tech_entries_added=true\n            fi\n            echo \"$line\" >> \"$temp_file\"\n            in_tech_section=false\n            continue\n        elif [[ $in_tech_section == true ]] && [[ -z \"$line\" ]]; then\n            # Add new tech entries before empty line in tech section\n            if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then\n                printf '%s\\n' \"${new_tech_entries[@]}\" >> \"$temp_file\"\n                
tech_entries_added=true\n            fi\n            echo \"$line\" >> \"$temp_file\"\n            continue\n        fi\n        \n        # Handle Recent Changes section\n        if [[ \"$line\" == \"## Recent Changes\" ]]; then\n            echo \"$line\" >> \"$temp_file\"\n            # Add new change entry right after the heading\n            if [[ -n \"$new_change_entry\" ]]; then\n                echo \"$new_change_entry\" >> \"$temp_file\"\n            fi\n            in_changes_section=true\n            changes_entries_added=true\n            continue\n        elif [[ $in_changes_section == true ]] && [[ \"$line\" =~ ^##[[:space:]] ]]; then\n            echo \"$line\" >> \"$temp_file\"\n            in_changes_section=false\n            continue\n        elif [[ $in_changes_section == true ]] && [[ \"$line\" == \"- \"* ]]; then\n            # Keep only first 2 existing changes\n            if [[ $existing_changes_count -lt 2 ]]; then\n                echo \"$line\" >> \"$temp_file\"\n                ((existing_changes_count++))\n            fi\n            continue\n        fi\n        \n        # Update timestamp\n        if [[ \"$line\" =~ \\*\\*Last\\ updated\\*\\*:.*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ]]; then\n            echo \"$line\" | sed \"s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/$current_date/\" >> \"$temp_file\"\n        else\n            echo \"$line\" >> \"$temp_file\"\n        fi\n    done < \"$target_file\"\n    \n    # Post-loop check: if we're still in the Active Technologies section and haven't added new entries\n    if [[ $in_tech_section == true ]] && [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then\n        printf '%s\\n' \"${new_tech_entries[@]}\" >> \"$temp_file\"\n    fi\n    \n    # Move temp file to target atomically\n    if ! 
mv \"$temp_file\" \"$target_file\"; then\n        log_error \"Failed to update target file\"\n        rm -f \"$temp_file\"\n        return 1\n    fi\n    \n    return 0\n}\n#==============================================================================\n# Main Agent File Update Function\n#==============================================================================\n\nupdate_agent_file() {\n    local target_file=\"$1\"\n    local agent_name=\"$2\"\n    \n    if [[ -z \"$target_file\" ]] || [[ -z \"$agent_name\" ]]; then\n        log_error \"update_agent_file requires target_file and agent_name parameters\"\n        return 1\n    fi\n    \n    log_info \"Updating $agent_name context file: $target_file\"\n    \n    local project_name\n    project_name=$(basename \"$REPO_ROOT\")\n    local current_date\n    current_date=$(date +%Y-%m-%d)\n    \n    # Create directory if it doesn't exist\n    local target_dir\n    target_dir=$(dirname \"$target_file\")\n    if [[ ! -d \"$target_dir\" ]]; then\n        if ! mkdir -p \"$target_dir\"; then\n            log_error \"Failed to create directory: $target_dir\"\n            return 1\n        fi\n    fi\n    \n    if [[ ! 
-f \"$target_file\" ]]; then\n        # Create new file from template\n        local temp_file\n        temp_file=$(mktemp) || {\n            log_error \"Failed to create temporary file\"\n            return 1\n        }\n        \n        if create_new_agent_file \"$target_file\" \"$temp_file\" \"$project_name\" \"$current_date\"; then\n            if mv \"$temp_file\" \"$target_file\"; then\n                log_success \"Created new $agent_name context file\"\n            else\n                log_error \"Failed to move temporary file to $target_file\"\n                rm -f \"$temp_file\"\n                return 1\n            fi\n        else\n            log_error \"Failed to create new agent file\"\n            rm -f \"$temp_file\"\n            return 1\n        fi\n    else\n        # Update existing file\n        if [[ ! -r \"$target_file\" ]]; then\n            log_error \"Cannot read existing file: $target_file\"\n            return 1\n        fi\n        \n        if [[ ! -w \"$target_file\" ]]; then\n            log_error \"Cannot write to existing file: $target_file\"\n            return 1\n        fi\n        \n        if update_existing_agent_file \"$target_file\" \"$current_date\"; then\n            log_success \"Updated existing $agent_name context file\"\n        else\n            log_error \"Failed to update existing agent file\"\n            return 1\n        fi\n    fi\n    \n    return 0\n}\n\n#==============================================================================\n# Agent Selection and Processing\n#==============================================================================\n\nupdate_specific_agent() {\n    local agent_type=\"$1\"\n    \n    case \"$agent_type\" in\n        claude)\n            update_agent_file \"$CLAUDE_FILE\" \"Claude Code\"\n            ;;\n        gemini)\n            update_agent_file \"$GEMINI_FILE\" \"Gemini CLI\"\n            ;;\n        copilot)\n            update_agent_file \"$COPILOT_FILE\" \"GitHub 
Copilot\"\n            ;;\n        cursor)\n            update_agent_file \"$CURSOR_FILE\" \"Cursor IDE\"\n            ;;\n        qwen)\n            update_agent_file \"$QWEN_FILE\" \"Qwen Code\"\n            ;;\n        opencode)\n            update_agent_file \"$AGENTS_FILE\" \"opencode\"\n            ;;\n        codex)\n            update_agent_file \"$AGENTS_FILE\" \"Codex CLI\"\n            ;;\n        windsurf)\n            update_agent_file \"$WINDSURF_FILE\" \"Windsurf\"\n            ;;\n        kilocode)\n            update_agent_file \"$KILOCODE_FILE\" \"Kilo Code\"\n            ;;\n        auggie)\n            update_agent_file \"$AUGGIE_FILE\" \"Auggie CLI\"\n            ;;\n        roo)\n            update_agent_file \"$ROO_FILE\" \"Roo Code\"\n            ;;\n        *)\n            log_error \"Unknown agent type '$agent_type'\"\n            log_error \"Expected: claude|gemini|copilot|cursor|qwen|opencode|codex|windsurf|kilocode|auggie|roo\"\n            exit 1\n            ;;\n    esac\n}\n\nupdate_all_existing_agents() {\n    local found_agent=false\n    \n    # Check each possible agent file and update if it exists\n    if [[ -f \"$CLAUDE_FILE\" ]]; then\n        update_agent_file \"$CLAUDE_FILE\" \"Claude Code\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$GEMINI_FILE\" ]]; then\n        update_agent_file \"$GEMINI_FILE\" \"Gemini CLI\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$COPILOT_FILE\" ]]; then\n        update_agent_file \"$COPILOT_FILE\" \"GitHub Copilot\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$CURSOR_FILE\" ]]; then\n        update_agent_file \"$CURSOR_FILE\" \"Cursor IDE\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$QWEN_FILE\" ]]; then\n        update_agent_file \"$QWEN_FILE\" \"Qwen Code\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$AGENTS_FILE\" ]]; then\n        update_agent_file \"$AGENTS_FILE\" \"Codex/opencode\"\n        found_agent=true\n    fi\n    \n    if 
[[ -f \"$WINDSURF_FILE\" ]]; then\n        update_agent_file \"$WINDSURF_FILE\" \"Windsurf\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$KILOCODE_FILE\" ]]; then\n        update_agent_file \"$KILOCODE_FILE\" \"Kilo Code\"\n        found_agent=true\n    fi\n\n    if [[ -f \"$AUGGIE_FILE\" ]]; then\n        update_agent_file \"$AUGGIE_FILE\" \"Auggie CLI\"\n        found_agent=true\n    fi\n    \n    if [[ -f \"$ROO_FILE\" ]]; then\n        update_agent_file \"$ROO_FILE\" \"Roo Code\"\n        found_agent=true\n    fi\n    \n    # If no agent files exist, create a default Claude file\n    if [[ \"$found_agent\" == false ]]; then\n        log_info \"No existing agent files found, creating default Claude file...\"\n        update_agent_file \"$CLAUDE_FILE\" \"Claude Code\"\n    fi\n}\nprint_summary() {\n    echo\n    log_info \"Summary of changes:\"\n    \n    if [[ -n \"$NEW_LANG\" ]]; then\n        echo \"  - Added language: $NEW_LANG\"\n    fi\n    \n    if [[ -n \"$NEW_FRAMEWORK\" ]]; then\n        echo \"  - Added framework: $NEW_FRAMEWORK\"\n    fi\n    \n    if [[ -n \"$NEW_DB\" ]] && [[ \"$NEW_DB\" != \"N/A\" ]]; then\n        echo \"  - Added database: $NEW_DB\"\n    fi\n    \n    echo\n    log_info \"Usage: $0 [claude|gemini|copilot|cursor|qwen|opencode|codex|windsurf|kilocode|auggie|roo]\"\n}\n\n#==============================================================================\n# Main Execution\n#==============================================================================\n\nmain() {\n    # Validate environment before proceeding\n    validate_environment\n    \n    log_info \"=== Updating agent context files for feature $CURRENT_BRANCH ===\"\n    \n    # Parse the plan file to extract project information\n    if ! 
parse_plan_data \"$NEW_PLAN\"; then\n        log_error \"Failed to parse plan data\"\n        exit 1\n    fi\n    \n    # Process based on agent type argument\n    local success=true\n    \n    if [[ -z \"$AGENT_TYPE\" ]]; then\n        # No specific agent provided - update all existing agent files\n        log_info \"No agent specified, updating all existing agent files...\"\n        if ! update_all_existing_agents; then\n            success=false\n        fi\n    else\n        # Specific agent provided - update only that agent\n        log_info \"Updating specific agent: $AGENT_TYPE\"\n        if ! update_specific_agent \"$AGENT_TYPE\"; then\n            success=false\n        fi\n    fi\n    \n    # Print summary\n    print_summary\n    \n    if [[ \"$success\" == true ]]; then\n        log_success \"Agent context update completed successfully\"\n        exit 0\n    else\n        log_error \"Agent context update completed with errors\"\n        exit 1\n    fi\n}\n\n# Execute main function if script is run directly\nif [[ \"${BASH_SOURCE[0]}\" == \"${0}\" ]]; then\n    main \"$@\"\nfi\n"
  },
  {
    "path": ".specify/templates/agent-file-template.md",
    "content": "# [PROJECT NAME] Development Guidelines\n\nAuto-generated from all feature plans. Last updated: [DATE]\n\n## Active Technologies\n[EXTRACTED FROM ALL PLAN.MD FILES]\n\n## Project Structure\n```\n[ACTUAL STRUCTURE FROM PLANS]\n```\n\n## Commands\n[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES]\n\n## Code Style\n[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE]\n\n## Recent Changes\n[LAST 3 FEATURES AND WHAT THEY ADDED]\n\n<!-- MANUAL ADDITIONS START -->\n<!-- MANUAL ADDITIONS END -->"
  },
  {
    "path": ".specify/templates/plan-template.md",
    "content": "\n# Implementation Plan: [FEATURE]\n\n**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]\n**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`\n\n## Execution Flow (/plan command scope)\n```\n1. Load feature spec from Input path\n   → If not found: ERROR \"No feature spec at {path}\"\n2. Fill Technical Context (scan for NEEDS CLARIFICATION)\n   → Detect Project Type from context (web=frontend+backend, mobile=app+api)\n   → Set Structure Decision based on project type\n3. Fill the Constitution Check section based on the content of the constitution document.\n4. Evaluate Constitution Check section below\n   → If violations exist: Document in Complexity Tracking\n   → If no justification possible: ERROR \"Simplify approach first\"\n   → Update Progress Tracking: Initial Constitution Check\n5. Execute Phase 0 → research.md\n   → If NEEDS CLARIFICATION remain: ERROR \"Resolve unknowns\"\n6. Execute Phase 1 → contracts, data-model.md, quickstart.md, agent-specific template file (e.g., `CLAUDE.md` for Claude Code, `.github/copilot-instructions.md` for GitHub Copilot, `GEMINI.md` for Gemini CLI, `QWEN.md` for Qwen Code or `AGENTS.md` for opencode).\n7. Re-evaluate Constitution Check section\n   → If new violations: Refactor design, return to Phase 1\n   → Update Progress Tracking: Post-Design Constitution Check\n8. Plan Phase 2 → Describe task generation approach (DO NOT create tasks.md)\n9. STOP - Ready for /tasks command\n```\n\n**IMPORTANT**: The /plan command STOPS at step 9. 
Phases 2-4 are executed by other commands:\n- Phase 2: /tasks command creates tasks.md\n- Phase 3-4: Implementation execution (manual or via tools)\n\n## Summary\n[Extract from feature spec: primary requirement + technical approach from research]\n\n## Technical Context\n**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]  \n**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]  \n**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]  \n**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]  \n**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]\n**Project Type**: [single/web/mobile - determines source structure]  \n**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]  \n**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]  \n**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]\n\n## Constitution Check\n*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*\n\n### I. Library-First Architecture\n- [ ] Core functionality implemented in `Separator` class or similar library pattern\n- [ ] CLI/Remote API are thin wrappers, not containing business logic\n- [ ] Clear separation between model architectures (MDX, VR, Demucs, MDXC)\n\n### II. Multi-Interface Consistency  \n- [ ] Feature accessible via Python API, CLI, and Remote API (if applicable)\n- [ ] Parameter names identical across all interfaces\n- [ ] Same model architectures supported across interfaces\n\n### III. Test-First Development (NON-NEGOTIABLE)\n- [ ] Tests written before implementation\n- [ ] Unit tests for all core functionality\n- [ ] Integration tests with audio validation (SSIM comparison)\n- [ ] CLI tests for all exposed functionality\n\n### IV. 
Performance & Resource Efficiency\n- [ ] Hardware acceleration support considered (CUDA, CoreML, DirectML)\n- [ ] Memory optimization for large files (streaming/batch processing)\n- [ ] Tunable parameters for different hardware capabilities\n\n### V. Model Architecture Separation\n- [ ] Each architecture in separate modules\n- [ ] Inherits from `CommonSeparator` pattern\n- [ ] Architecture-specific parameters isolated\n- [ ] Loading one architecture doesn't load others\n\n## Project Structure\n\n### Documentation (this feature)\n```\nspecs/[###-feature]/\n├── plan.md              # This file (/plan command output)\n├── research.md          # Phase 0 output (/plan command)\n├── data-model.md        # Phase 1 output (/plan command)\n├── quickstart.md        # Phase 1 output (/plan command)\n├── contracts/           # Phase 1 output (/plan command)\n└── tasks.md             # Phase 2 output (/tasks command - NOT created by /plan)\n```\n\n### Source Code (repository root)\n```\n# Option 1: Single project (DEFAULT)\nsrc/\n├── models/\n├── services/\n├── cli/\n└── lib/\n\ntests/\n├── contract/\n├── integration/\n└── unit/\n\n# Option 2: Web application (when \"frontend\" + \"backend\" detected)\nbackend/\n├── src/\n│   ├── models/\n│   ├── services/\n│   └── api/\n└── tests/\n\nfrontend/\n├── src/\n│   ├── components/\n│   ├── pages/\n│   └── services/\n└── tests/\n\n# Option 3: Mobile + API (when \"iOS/Android\" detected)\napi/\n└── [same as backend above]\n\nios/ or android/\n└── [platform-specific structure]\n```\n\n**Structure Decision**: [DEFAULT to Option 1 unless Technical Context indicates web/mobile app]\n\n## Phase 0: Outline & Research\n1. **Extract unknowns from Technical Context** above:\n   - For each NEEDS CLARIFICATION → research task\n   - For each dependency → best practices task\n   - For each integration → patterns task\n\n2. 
**Generate and dispatch research agents**:\n   ```\n   For each unknown in Technical Context:\n     Task: \"Research {unknown} for {feature context}\"\n   For each technology choice:\n     Task: \"Find best practices for {tech} in {domain}\"\n   ```\n\n3. **Consolidate findings** in `research.md` using format:\n   - Decision: [what was chosen]\n   - Rationale: [why chosen]\n   - Alternatives considered: [what else evaluated]\n\n**Output**: research.md with all NEEDS CLARIFICATION resolved\n\n## Phase 1: Design & Contracts\n*Prerequisites: research.md complete*\n\n1. **Extract entities from feature spec** → `data-model.md`:\n   - Entity name, fields, relationships\n   - Validation rules from requirements\n   - State transitions if applicable\n\n2. **Generate API contracts** from functional requirements:\n   - For each user action → endpoint\n   - Use standard REST/GraphQL patterns\n   - Output OpenAPI/GraphQL schema to `/contracts/`\n\n3. **Generate contract tests** from contracts:\n   - One test file per endpoint\n   - Assert request/response schemas\n   - Tests must fail (no implementation yet)\n\n4. **Extract test scenarios** from user stories:\n   - Each story → integration test scenario\n   - Quickstart test = story validation steps\n\n5. **Update agent file incrementally** (O(1) operation):\n   - Run `.specify/scripts/bash/update-agent-context.sh cursor`\n     **IMPORTANT**: Execute it exactly as specified above. 
Do not add or remove any arguments.\n   - If exists: Add only NEW tech from current plan\n   - Preserve manual additions between markers\n   - Update recent changes (keep last 3)\n   - Keep under 150 lines for token efficiency\n   - Output to repository root\n\n**Output**: data-model.md, /contracts/*, failing tests, quickstart.md, agent-specific file\n\n## Phase 2: Task Planning Approach\n*This section describes what the /tasks command will do - DO NOT execute during /plan*\n\n**Task Generation Strategy**:\n- Load `.specify/templates/tasks-template.md` as base\n- Generate tasks from Phase 1 design docs (contracts, data model, quickstart)\n- Each contract → contract test task [P]\n- Each entity → model creation task [P] \n- Each user story → integration test task\n- Implementation tasks to make tests pass\n\n**Ordering Strategy**:\n- TDD order: Tests before implementation \n- Dependency order: Models before services before UI\n- Mark [P] for parallel execution (independent files)\n\n**Estimated Output**: 25-30 numbered, ordered tasks in tasks.md\n\n**IMPORTANT**: This phase is executed by the /tasks command, NOT by /plan\n\n## Phase 3+: Future Implementation\n*These phases are beyond the scope of the /plan command*\n\n**Phase 3**: Task execution (/tasks command creates tasks.md)  \n**Phase 4**: Implementation (execute tasks.md following constitutional principles)  \n**Phase 5**: Validation (run tests, execute quickstart.md, performance validation)\n\n## Complexity Tracking\n*Fill ONLY if Constitution Check has violations that must be justified*\n\n| Violation | Why Needed | Simpler Alternative Rejected Because |\n|-----------|------------|-------------------------------------|\n| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |\n| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |\n\n\n## Progress Tracking\n*This checklist is updated during execution flow*\n\n**Phase Status**:\n- [ ] Phase 0: Research 
complete (/plan command)\n- [ ] Phase 1: Design complete (/plan command)\n- [ ] Phase 2: Task planning complete (/plan command - describe approach only)\n- [ ] Phase 3: Tasks generated (/tasks command)\n- [ ] Phase 4: Implementation complete\n- [ ] Phase 5: Validation passed\n\n**Gate Status**:\n- [ ] Initial Constitution Check: PASS\n- [ ] Post-Design Constitution Check: PASS\n- [ ] All NEEDS CLARIFICATION resolved\n- [ ] Complexity deviations documented\n\n---\n*Based on Constitution v1.0.0 - See `.specify/memory/constitution.md`*\n"
  },
  {
    "path": ".specify/templates/spec-template.md",
    "content": "# Feature Specification: [FEATURE NAME]\n\n**Feature Branch**: `[###-feature-name]`  \n**Created**: [DATE]  \n**Status**: Draft  \n**Input**: User description: \"$ARGUMENTS\"\n\n## Execution Flow (main)\n```\n1. Parse user description from Input\n   → If empty: ERROR \"No feature description provided\"\n2. Extract key concepts from description\n   → Identify: actors, actions, data, constraints\n3. For each unclear aspect:\n   → Mark with [NEEDS CLARIFICATION: specific question]\n4. Fill User Scenarios & Testing section\n   → If no clear user flow: ERROR \"Cannot determine user scenarios\"\n5. Generate Functional Requirements\n   → Each requirement must be testable\n   → Mark ambiguous requirements\n6. Identify Key Entities (if data involved)\n7. Run Review Checklist\n   → If any [NEEDS CLARIFICATION]: WARN \"Spec has uncertainties\"\n   → If implementation details found: ERROR \"Remove tech details\"\n8. Return: SUCCESS (spec ready for planning)\n```\n\n---\n\n## ⚡ Quick Guidelines\n- ✅ Focus on WHAT users need and WHY\n- ❌ Avoid HOW to implement (no tech stack, APIs, code structure)\n- 👥 Written for business stakeholders, not developers\n\n### Section Requirements\n- **Mandatory sections**: Must be completed for every feature\n- **Optional sections**: Include only when relevant to the feature\n- When a section doesn't apply, remove it entirely (don't leave as \"N/A\")\n\n### For AI Generation\nWhen creating this spec from a user prompt:\n1. **Mark all ambiguities**: Use [NEEDS CLARIFICATION: specific question] for any assumption you'd need to make\n2. **Don't guess**: If the prompt doesn't specify something (e.g., \"login system\" without auth method), mark it\n3. **Think like a tester**: Every vague requirement should fail the \"testable and unambiguous\" checklist item\n4. 
**Common underspecified areas**:\n   - User types and permissions\n   - Data retention/deletion policies  \n   - Performance targets and scale\n   - Error handling behaviors\n   - Integration requirements\n   - Security/compliance needs\n\n---\n\n## User Scenarios & Testing *(mandatory)*\n\n### Primary User Story\n[Describe the main user journey in plain language]\n\n### Acceptance Scenarios\n1. **Given** [initial state], **When** [action], **Then** [expected outcome]\n2. **Given** [initial state], **When** [action], **Then** [expected outcome]\n\n### Edge Cases\n- What happens when [boundary condition]?\n- How does system handle [error scenario]?\n\n## Requirements *(mandatory)*\n\n### Functional Requirements\n- **FR-001**: System MUST [specific capability, e.g., \"allow users to create accounts\"]\n- **FR-002**: System MUST [specific capability, e.g., \"validate email addresses\"]  \n- **FR-003**: Users MUST be able to [key interaction, e.g., \"reset their password\"]\n- **FR-004**: System MUST [data requirement, e.g., \"persist user preferences\"]\n- **FR-005**: System MUST [behavior, e.g., \"log all security events\"]\n\n*Example of marking unclear requirements:*\n- **FR-006**: System MUST authenticate users via [NEEDS CLARIFICATION: auth method not specified - email/password, SSO, OAuth?]\n- **FR-007**: System MUST retain user data for [NEEDS CLARIFICATION: retention period not specified]\n\n### Key Entities *(include if feature involves data)*\n- **[Entity 1]**: [What it represents, key attributes without implementation]\n- **[Entity 2]**: [What it represents, relationships to other entities]\n\n---\n\n## Review & Acceptance Checklist\n*GATE: Automated checks run during main() execution*\n\n### Content Quality\n- [ ] No implementation details (languages, frameworks, APIs)\n- [ ] Focused on user value and business needs\n- [ ] Written for non-technical stakeholders\n- [ ] All mandatory sections completed\n\n### Requirement Completeness\n- [ ] No [NEEDS 
CLARIFICATION] markers remain\n- [ ] Requirements are testable and unambiguous  \n- [ ] Success criteria are measurable\n- [ ] Scope is clearly bounded\n- [ ] Dependencies and assumptions identified\n\n---\n\n## Execution Status\n*Updated by main() during processing*\n\n- [ ] User description parsed\n- [ ] Key concepts extracted\n- [ ] Ambiguities marked\n- [ ] User scenarios defined\n- [ ] Requirements generated\n- [ ] Entities identified\n- [ ] Review checklist passed\n\n---\n"
  },
  {
    "path": ".specify/templates/tasks-template.md",
    "content": "# Tasks: [FEATURE NAME]\n\n**Input**: Design documents from `/specs/[###-feature-name]/`\n**Prerequisites**: plan.md (required), research.md, data-model.md, contracts/\n\n## Execution Flow (main)\n```\n1. Load plan.md from feature directory\n   → If not found: ERROR \"No implementation plan found\"\n   → Extract: tech stack, libraries, structure\n2. Load optional design documents:\n   → data-model.md: Extract entities → model tasks\n   → contracts/: Each file → contract test task\n   → research.md: Extract decisions → setup tasks\n3. Generate tasks by category:\n   → Setup: project init, dependencies, linting\n   → Tests: contract tests, integration tests\n   → Core: models, services, CLI commands\n   → Integration: DB, middleware, logging\n   → Polish: unit tests, performance, docs\n4. Apply task rules:\n   → Different files = mark [P] for parallel\n   → Same file = sequential (no [P])\n   → Tests before implementation (TDD)\n5. Number tasks sequentially (T001, T002...)\n6. Generate dependency graph\n7. Create parallel execution examples\n8. Validate task completeness:\n   → All contracts have tests?\n   → All entities have models?\n   → All endpoints implemented?\n9. Return: SUCCESS (tasks ready for execution)\n```\n\n## Format: `[ID] [P?] 
Description`\n- **[P]**: Can run in parallel (different files, no dependencies)\n- Include exact file paths in descriptions\n\n## Path Conventions\n- **Single project**: `src/`, `tests/` at repository root\n- **Web app**: `backend/src/`, `frontend/src/`\n- **Mobile**: `api/src/`, `ios/src/` or `android/src/`\n- Paths shown below assume single project - adjust based on plan.md structure\n\n## Phase 3.1: Setup\n- [ ] T001 Create project structure per implementation plan\n- [ ] T002 Initialize [language] project with [framework] dependencies\n- [ ] T003 [P] Configure linting and formatting tools\n\n## Phase 3.2: Tests First (TDD) ⚠️ MUST COMPLETE BEFORE 3.3\n**CRITICAL: These tests MUST be written and MUST FAIL before ANY implementation**\n- [ ] T004 [P] Contract test POST /api/users in tests/contract/test_users_post.py\n- [ ] T005 [P] Contract test GET /api/users/{id} in tests/contract/test_users_get.py\n- [ ] T006 [P] Integration test user registration in tests/integration/test_registration.py\n- [ ] T007 [P] Integration test auth flow in tests/integration/test_auth.py\n\n## Phase 3.3: Core Implementation (ONLY after tests are failing)\n- [ ] T008 [P] User model in src/models/user.py\n- [ ] T009 [P] UserService CRUD in src/services/user_service.py\n- [ ] T010 [P] CLI --create-user in src/cli/user_commands.py\n- [ ] T011 POST /api/users endpoint\n- [ ] T012 GET /api/users/{id} endpoint\n- [ ] T013 Input validation\n- [ ] T014 Error handling and logging\n\n## Phase 3.4: Integration\n- [ ] T015 Connect UserService to DB\n- [ ] T016 Auth middleware\n- [ ] T017 Request/response logging\n- [ ] T018 CORS and security headers\n\n## Phase 3.5: Polish\n- [ ] T019 [P] Unit tests for validation in tests/unit/test_validation.py\n- [ ] T020 Performance tests (<200ms)\n- [ ] T021 [P] Update docs/api.md\n- [ ] T022 Remove duplication\n- [ ] T023 Run manual-testing.md\n\n## Dependencies\n- Tests (T004-T007) before implementation (T008-T014)\n- T008 blocks T009, T015\n- T016 blocks 
T018\n- Implementation before polish (T019-T023)\n\n## Parallel Example\n```\n# Launch T004-T007 together:\nTask: \"Contract test POST /api/users in tests/contract/test_users_post.py\"\nTask: \"Contract test GET /api/users/{id} in tests/contract/test_users_get.py\"\nTask: \"Integration test registration in tests/integration/test_registration.py\"\nTask: \"Integration test auth in tests/integration/test_auth.py\"\n```\n\n## Notes\n- [P] tasks = different files, no dependencies\n- Verify tests fail before implementing\n- Commit after each task\n- Avoid: vague tasks, same file conflicts\n\n## Task Generation Rules\n*Applied during main() execution*\n\n1. **From Contracts**:\n   - Each contract file → contract test task [P]\n   - Each endpoint → implementation task\n   \n2. **From Data Model**:\n   - Each entity → model creation task [P]\n   - Relationships → service layer tasks\n   \n3. **From User Stories**:\n   - Each story → integration test [P]\n   - Quickstart scenarios → validation tasks\n\n4. **Ordering**:\n   - Setup → Tests → Models → Services → Endpoints → Polish\n   - Dependencies block parallel execution\n\n## Validation Checklist\n*GATE: Checked by main() before returning*\n\n- [ ] All contracts have corresponding tests\n- [ ] All entities have model tasks\n- [ ] All tests come before implementation\n- [ ] Parallel tasks truly independent\n- [ ] Each task specifies exact file path\n- [ ] No task modifies same file as another [P] task"
  },
  {
    "path": "Dockerfile.cloudrun",
    "content": "# Audio Separator API - Cloud Run GPU Deployment\n# Optimized for NVIDIA L4 GPU on Google Cloud Run\n#\n# Models are baked into the image for zero cold-start latency.\n# To update models, rebuild the image.\n#\n# Build: docker build -f Dockerfile.cloudrun -t audio-separator-cloudrun .\n# Run:   docker run --gpus all -p 8080:8080 audio-separator-cloudrun\n\nFROM nvidia/cuda:12.6.3-runtime-ubuntu22.04\n\n# Prevent interactive prompts during package installation\nENV DEBIAN_FRONTEND=noninteractive\n\n# Install Python 3.12 from deadsnakes PPA (onnxruntime-gpu requires >= 3.11)\n# and system dependencies\nRUN apt-get update && apt-get install -y --no-install-recommends \\\n    software-properties-common \\\n    && add-apt-repository -y ppa:deadsnakes/ppa \\\n    && apt-get update && apt-get install -y --no-install-recommends \\\n    # Python 3.12\n    python3.12 \\\n    python3.12-dev \\\n    python3.12-venv \\\n    # FFmpeg\n    ffmpeg \\\n    # Audio libraries\n    libsndfile1 \\\n    libsndfile1-dev \\\n    libsox-dev \\\n    sox \\\n    libportaudio2 \\\n    portaudio19-dev \\\n    libasound2-dev \\\n    libpulse-dev \\\n    libjack-dev \\\n    libsamplerate0 \\\n    libsamplerate0-dev \\\n    # Build tools (for compiling Python packages with C extensions)\n    build-essential \\\n    gcc \\\n    g++ \\\n    pkg-config \\\n    # Utilities\n    curl \\\n    && rm -rf /var/lib/apt/lists/* \\\n    && python3.12 --version && ffmpeg -version\n\n# Set Python 3.12 as default and install pip\nRUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 \\\n    && update-alternatives --install /usr/bin/python python /usr/bin/python3.12 1 \\\n    && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12 \\\n    && python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel\n\n# Install audio-separator with GPU support and API dependencies\nCOPY . 
/tmp/audio-separator-src\nRUN cd /tmp/audio-separator-src \\\n    && pip install --no-cache-dir \".[gpu]\" \\\n    && pip install --no-cache-dir \\\n        \"fastapi>=0.104.0\" \\\n        \"uvicorn[standard]>=0.24.0\" \\\n        \"python-multipart>=0.0.6\" \\\n        \"filetype>=1.2.0\" \\\n        \"google-cloud-storage>=2.0.0\" \\\n        \"google-cloud-firestore>=2.0.0\" \\\n    && rm -rf /tmp/audio-separator-src\n\n# Set up CUDA library paths\nRUN echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf && ldconfig\n\n# Environment configuration\nENV MODEL_DIR=/models \\\n    STORAGE_DIR=/tmp/storage \\\n    PORT=8080 \\\n    LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH \\\n    PATH=/usr/local/cuda/bin:$PATH \\\n    PYTHONUNBUFFERED=1\n\n# Create directories\nRUN mkdir -p /models /tmp/storage/outputs\n\n# Bake ensemble preset models into the image.\n# These are the models used by the default presets (instrumental_clean + karaoke).\n# Total: ~1-1.5 GB. This eliminates cold-start model download time.\nCOPY scripts/download_preset_models.py /tmp/download_preset_models.py\nRUN python3 /tmp/download_preset_models.py && rm /tmp/download_preset_models.py && ls -lh /models/\n\n# Expose Cloud Run default port\nEXPOSE 8080\n\n# Health check for container orchestration\nHEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \\\n    CMD curl -f http://localhost:8080/health || exit 1\n\n# Run the API server\nCMD [\"python3\", \"-m\", \"audio_separator.remote.deploy_cloudrun\"]\n"
  },
  {
    "path": "Dockerfile.cpu",
    "content": "# Use an official Python runtime as a parent image\nFROM python:3.12-slim\n\n# Set the working directory in the container\nWORKDIR /workdir\n\n# Install FFmpeg and build tools, then clear the apt lists to keep the image small\nRUN apt-get update && apt-get install -y ffmpeg build-essential \\\n    && rm -rf /var/lib/apt/lists/*\n\nRUN python -m pip install --upgrade pip\n\nRUN --mount=type=cache,target=/root/.cache \\\n    pip install \"audio-separator[cpu]\"\n\n# Run audio-separator when the container launches\nENTRYPOINT [\"audio-separator\"]\n"
  },
  {
    "path": "Dockerfile.cuda11",
    "content": "# Use the official NVIDIA CUDA 11.8 base image as the parent image\nFROM nvidia/cuda:11.8.0-base-ubuntu22.04\n\n# Set the working directory in the container\nWORKDIR /workdir\n\nRUN apt-get update && apt-get install -y \\\n    ffmpeg \\\n    build-essential \\\n    python3 \\\n    python3-pip \\\n    && rm -rf /var/lib/apt/lists/*\n\nRUN python3 -m pip install --upgrade pip\n\n# Install the GPU version of audio-separator, which installs the CUDA 11-compatible version of ONNXRuntime\n# (the default CUDA version for ORT is 11.8, see https://onnxruntime.ai/docs/install/ for more info)\nRUN --mount=type=cache,target=/root/.cache \\\n    pip3 install \"audio-separator[gpu]\"\n\n# Run audio-separator when the container launches\nENTRYPOINT [\"audio-separator\"]\n"
  },
  {
    "path": "Dockerfile.cuda12",
    "content": "# Use a CUDA 12.3 development (devel) image as the base image\nFROM nvidia/cuda:12.3.1-devel-ubuntu22.04\n\n# Set the working directory in the container\nWORKDIR /workdir\n\nRUN apt-get update && apt-get install -y \\\n    ffmpeg \\\n    build-essential \\\n    python3 \\\n    python3-pip \\\n    && rm -rf /var/lib/apt/lists/*\n\nRUN python3 -m pip install --upgrade pip\n\n# Install the CUDA 12-compatible version of ONNXRuntime (the default CUDA version for ORT is still 11.8, so they've provided a separate package index)\n# See https://onnxruntime.ai/docs/install/\nRUN pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/\n\n# Install audio-separator without any specific onnxruntime (onnxruntime should already be satisfied by the above)\nRUN --mount=type=cache,target=/root/.cache \\\n    pip3 install \"audio-separator\"\n\n# Run audio-separator when the container launches\nENTRYPOINT [\"audio-separator\"]\n"
  },
  {
    "path": "Dockerfile.runpod-cuda11",
    "content": "# Runpod Base image: https://github.com/runpod/containers/blob/main/official-templates/base/Dockerfile\nFROM runpod/base:0.5.0-cuda11.8.0\n\nRUN apt-get update && apt-get install -y \\\n    ffmpeg \\\n    && rm -rf /var/lib/apt/lists/*\n\nRUN python3 -m pip install --upgrade pip\n\n# Install the CUDA 11 compatible version of ONNXRuntime (The default CUDA version for ORT is 11.8)\n# See https://onnxruntime.ai/docs/install/\nRUN pip install onnxruntime-gpu\n\n# Install audio-separator without any specific onnxruntime (onnxruntime should already be satisfied by the above)\nRUN --mount=type=cache,target=/root/.cache \\\n    pip3 install \"audio-separator\"\n"
  },
  {
    "path": "Dockerfile.runpod-cuda12",
    "content": "# Runpod Base image: https://github.com/runpod/containers/blob/main/official-templates/base/Dockerfile\nFROM runpod/base:0.6.2-cuda12.1.0\n\nRUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\nRUN dpkg -i cuda-keyring_1.1-1_all.deb\n\nRUN apt-get update && apt-get upgrade -y\n\nRUN apt-get install -y \\\n    ffmpeg \\\n    cuda-toolkit \\\n    cudnn9-cuda-12\n\nRUN python3 -m pip install --upgrade pip\n\n# Install the CUDA 12 compatible version of ONNXRuntime (the default CUDA version for ORT is still 11.8 so they've provided a separate package index)\n# See https://onnxruntime.ai/docs/install/\nRUN pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/\n\n# Install audio-separator without any specific onnxruntime (onnxruntime should already be satisfied by the above)\nRUN --mount=type=cache,target=/root/.cache \\\n    pip3 install \"audio-separator\"\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 karaokenerds\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "<div align=\"center\">\n\n# 🎶 Audio Separator 🎶\n\n[![PyPI version](https://badge.fury.io/py/audio-separator.svg)](https://badge.fury.io/py/audio-separator)\n[![Conda Version](https://img.shields.io/conda/vn/conda-forge/audio-separator.svg)](https://anaconda.org/conda-forge/audio-separator)\n[![Docker pulls](https://img.shields.io/docker/pulls/beveradb/audio-separator.svg)](https://hub.docker.com/r/beveradb/audio-separator/tags)\n[![codecov](https://codecov.io/gh/karaokenerds/python-audio-separator/graph/badge.svg?token=N7YK4ET5JP)](https://codecov.io/gh/karaokenerds/python-audio-separator)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1gSlmSmna7f7fH6OjsiMEDLl-aJ9kGPkY?usp=sharing)\n[![Open In Huggingface](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/nomadkaraoke/audio-separator)\n\n</div>\n\n**Summary:** Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net, VR Arch, Demucs and MDXC models available in UVR by @Anjok07 & @aufr33.\n\nAudio Separator is a Python package that allows you to separate an audio file into various stems, using models trained by @Anjok07 for use with [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui).\n\nThe simplest (and probably most used) use case for this package is to separate an audio file into two stems, Instrumental and Vocals, which can be very useful for producing karaoke videos! 
However, the models available in UVR can separate audio into many more stems, such as Drums, Bass, Piano, and Guitar, and perform other audio processing tasks, such as denoising or removing echo/reverb.\n\n<details>\n<summary align=\"center\"><b>Table of Contents</b></summary>\n\n- [🎶 Audio Separator 🎶](#-audio-separator-)\n  - [Features](#features)\n  - [Installation 🛠️](#installation-%EF%B8%8F)\n    - [🐳 Docker](#-docker)\n    - [🎮 Nvidia GPU with CUDA or 🧪 Google Colab](#-nvidia-gpu-with-cuda-or--google-colab)\n    - [ Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)](#-apple-silicon-macos-sonoma-with-m1-or-newer-cpu-coreml-acceleration)\n    - [🐢 No hardware acceleration, CPU only](#-no-hardware-acceleration-cpu-only)\n    - [🎥 FFmpeg dependency](#-ffmpeg-dependency)\n  - [GPU / CUDA specific installation steps with Pip](#gpu--cuda-specific-installation-steps-with-pip)\n    - [Multiple CUDA library versions may be needed](#multiple-cuda-library-versions-may-be-needed)\n  - [Usage 🚀](#usage-)\n    - [Command Line Interface (CLI)](#command-line-interface-cli)\n    - [Listing and Filtering Available Models](#listing-and-filtering-available-models)\n      - [Filtering Models](#filtering-models)\n      - [Limiting Results](#limiting-results)\n      - [JSON Output](#json-output)\n    - [Processing Large Files](#processing-large-files)\n    - [Ensembling Multiple Models](#ensembling-multiple-models)\n    - [Full command-line interface options](#full-command-line-interface-options)\n    - [As a Dependency in a Python Project](#as-a-dependency-in-a-python-project)\n      - [Batch processing and processing with multiple models](#batch-processing-and-processing-with-multiple-models)\n      - [Renaming Stems](#renaming-stems)\n  - [Parameters for the Separator class](#parameters-for-the-separator-class)\n  - [Remote API Usage 🌐](#remote-api-usage-)\n  - [Requirements 📋](#requirements-)\n  - [Developing Locally](#developing-locally)\n    - [Prerequisites](#prerequisites)\n    - [Clone the Repository](#clone-the-repository)\n    - [Create and activate the Conda 
Environment](#create-and-activate-the-conda-environment)\n    - [Install Dependencies](#install-dependencies)\n    - [Running the Command-Line Interface Locally](#running-the-command-line-interface-locally)\n    - [Deactivate the Virtual Environment](#deactivate-the-virtual-environment)\n    - [Building the Package](#building-the-package)\n  - [Contributing 🤝](#contributing-)\n  - [License 📄](#license-)\n  - [Credits 🙏](#credits-)\n  - [Contact 💌](#contact-)\n  - [Thanks to all contributors for their efforts](#thanks-to-all-contributors-for-their-efforts)\n</details>\n\n---\n\n## Features\n\n- Separate audio into multiple stems, e.g. instrumental and vocals.\n- Supports all common audio formats (WAV, MP3, FLAC, M4A, etc.).\n- Run inference using pre-trained models in PTH or ONNX format.\n- CLI support for easy use in scripts and batch processing.\n- Python API for integration into other projects.\n\n## Installation 🛠️\n\n### 🐳 Docker\n\nIf you're able to use Docker, you don't actually need to _install_ anything - there are [images published on Docker Hub](https://hub.docker.com/r/beveradb/audio-separator/tags) for GPU (CUDA) and CPU inference, for both `amd64` and `arm64` platforms.\n\nYou probably want to volume-mount a folder containing whatever file you want to separate, which can then also be used as the output folder.\n\nFor instance, if your current directory has the file `input.wav`, you could execute `audio-separator` as shown below (see [usage](#usage-) section for more details):\n\n```sh\ndocker run -it -v `pwd`:/workdir beveradb/audio-separator input.wav\n```\n\nIf you're using a machine with a GPU, you'll want to use the GPU-specific image and pass the GPU device through to the container, like this:\n\n```sh\ndocker run -it --gpus all -v `pwd`:/workdir beveradb/audio-separator:gpu input.wav\n```\n\nIf the GPU isn't being detected, make sure your Docker runtime environment is passing through the GPU correctly - there are [various 
guides](https://www.celantur.com/blog/run-cuda-in-docker-on-linux/) online to help with that.\n\n### 🎮 Nvidia GPU with CUDA or 🧪 Google Colab\n\n**Supported CUDA Versions:** 11.8 and 12.2\n\n💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:\n `ONNXruntime has CUDAExecutionProvider available, enabling acceleration`\n\nConda:\n```sh\nconda install pytorch=*=*cuda* onnxruntime=*=*cuda* audio-separator -c pytorch -c conda-forge\n```\n\nPip:\n```sh\npip install \"audio-separator[gpu]\"\n```\n\nDocker:\n```sh\nbeveradb/audio-separator:gpu\n```\n\n###  Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)\n\n💬 If successfully configured, you should see this log message when running `audio-separator --env_info`:\n `ONNXruntime has CoreMLExecutionProvider available, enabling acceleration`\n\nPip:\n```sh\npip install \"audio-separator[cpu]\"\n```\n\n### 🐢 No hardware acceleration, CPU only\n\nConda:\n```sh\nconda install audio-separator -c pytorch -c conda-forge\n```\n\nPip:\n```sh\npip install \"audio-separator[cpu]\"\n```\n\nDocker:\n```sh\nbeveradb/audio-separator\n```\n\n### 🎥 FFmpeg dependency\n\n💬 To test if `audio-separator` has been successfully configured to use FFmpeg, run `audio-separator --env_info`. The log will show `FFmpeg installed`.\n\nIf you installed `audio-separator` using `conda` or `docker`, FFmpeg should already be available in your environment.\n\nYou may need to separately install FFmpeg. 
It should be easy to install on most platforms, e.g.:\n\n🐧 Debian/Ubuntu:\n```sh\napt-get update; apt-get install -y ffmpeg\n```\n\n macOS:\n```sh\nbrew update; brew install ffmpeg\n```\n\n## GPU / CUDA specific installation steps with Pip\n\nIn theory, all you should need to do to get `audio-separator` working with a GPU is install it with the `[gpu]` extra as above.\n\nHowever, getting both PyTorch and ONNX Runtime working with CUDA support can be tricky, so it may not work on the first attempt.\n\nYou may need to reinstall both packages directly, allowing pip to calculate the right versions for your platform, for example:\n\n- `pip uninstall torch onnxruntime`\n- `pip cache purge`\n- `pip install --force-reinstall torch torchvision torchaudio`\n- `pip install --force-reinstall onnxruntime-gpu`\n\nI generally recommend installing the latest version of PyTorch for your environment using the command recommended by the wizard here:\n<https://pytorch.org/get-started/locally/>\n\n### Multiple CUDA library versions may be needed\n\nDepending on your CUDA version and environment, you may need to install specific version(s) of CUDA libraries for ONNX Runtime to use your GPU.\n\n🧪 Google Colab, for example, now uses CUDA 12 by default, but ONNX Runtime still needs CUDA 11 libraries to work.\n\nIf you see the error `Failed to load library` or `cannot open shared object file` when you run `audio-separator`, this is likely the issue.\n\nYou can install the CUDA 11 libraries _alongside_ CUDA 12 like so:\n```sh\napt update; apt install nvidia-cuda-toolkit\n```\n\nIf you encounter the following messages when running on Google Colab or in another environment:\n```\n[E:onnxruntime:Default, provider_bridge_ort.cc:1862 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1539 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: 
libcudnn_adv.so.9: cannot open shared object file: No such file or directory\n\n[W:onnxruntime:Default, onnxruntime_pybind_state.cc:993 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.\n```\nYou can resolve this by running the following command:\n```sh\npython -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/\n```\n\n> Note: if anyone knows a cleaner way to support the different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!\n\n## Usage 🚀\n\n### Command Line Interface (CLI)\n\nYou can use Audio Separator via the command line, for example:\n\n```sh\naudio-separator /path/to/your/input/audio.wav --model_filename model_bs_roformer_ep_317_sdr_12.9755.ckpt\n```\n\nThis command downloads the specified model file, processes the `audio.wav` input audio, and generates two new files in the current directory: one containing the vocals and one containing the instrumental.\n\n**Note:** You do not need to download any files yourself - audio-separator does that automatically for you!\n\nTo see a list of supported models, run `audio-separator --list_models`.\n\nAny file listed in the `--list_models` output can be specified (including its file extension) with the `--model_filename` parameter (e.g. 
`--model_filename UVR_MDXNET_KARA_2.onnx`) and it will be automatically downloaded to the `--model_file_dir` (default: `/tmp/audio-separator-models/`) folder on first usage.\n\n### Listing and Filtering Available Models\n\nYou can view all available models using the `--list_models` (or `-l`) flag:\n\n```sh\naudio-separator --list_models\n```\n\nThe output shows a table with the following columns:\n- Model Filename: The filename to use with `--model_filename`\n- Arch: The model architecture (MDX, MDXC, Demucs, etc.)\n- Output Stems (SDR): The stems this model can separate, with Signal-to-Distortion Ratio scores where available\n- Friendly Name: A human-readable name describing the model\n\n#### Filtering Models\n\nYou can filter and sort the model list by stem type using `--list_filter`. For example, to find models that can separate drums:\n\n```sh\naudio-separator -l --list_filter=drums\n```\n\nExample output:\n```\n-----------------------------------------------------------------------------------------------------------------------------------\nModel Filename        Arch    Output Stems (SDR)                                            Friendly Name\n-----------------------------------------------------------------------------------------------------------------------------------\nhtdemucs_ft.yaml      Demucs  vocals (10.8), drums (10.1), bass (11.9), other               Demucs v4: htdemucs_ft\nhdemucs_mmi.yaml      Demucs  vocals (10.3), drums (9.7), bass (12.0), other                Demucs v4: hdemucs_mmi\nhtdemucs.yaml         Demucs  vocals (10.0), drums (9.4), bass (11.3), other                Demucs v4: htdemucs\nhtdemucs_6s.yaml      Demucs  vocals (9.7), drums (8.5), bass (10.0), guitar, piano, other  Demucs v4: htdemucs_6s\n```\n\n#### Limiting Results\n\nYou can limit the number of results shown using `--list_limit`. This is useful for finding the best performing models for a particular stem. 
For example, to see the top 5 vocal separation models:\n\n```sh\naudio-separator -l --list_filter=vocals --list_limit=5\n```\n\nExample output:\n```\n--------------------------------------------------------------------------------------------------------------------------------------------------------------\nModel Filename                             Arch  Output Stems (SDR)                   Friendly Name\n--------------------------------------------------------------------------------------------------------------------------------------------------------------\nmodel_bs_roformer_ep_317_sdr_12.9755.ckpt  MDXC  vocals* (12.9), instrumental (17.0)  Roformer Model: BS-Roformer-Viperx-1297\nmodel_bs_roformer_ep_368_sdr_12.9628.ckpt  MDXC  vocals* (12.9), instrumental (17.0)  Roformer Model: BS-Roformer-Viperx-1296\nvocals_mel_band_roformer.ckpt              MDXC  vocals* (12.6), other                Roformer Model: MelBand Roformer | Vocals by Kimberley Jensen\nmelband_roformer_big_beta4.ckpt            MDXC  vocals* (12.5), other                Roformer Model: MelBand Roformer Kim | Big Beta 4 FT by unwa\nmel_band_roformer_kim_ft_unwa.ckpt         MDXC  vocals* (12.4), other                Roformer Model: MelBand Roformer Kim | FT by unwa\n```\n\n#### JSON Output\n\nFor programmatic use, you can output the model list in JSON format:\n\n```sh\naudio-separator -l --list_format=json\n```\n\n### Processing Large Files\n\nFor very long audio files (>1 hour), you may encounter out-of-memory errors. The `--chunk_duration` option automatically splits large files into smaller chunks, processes them separately, and merges the results:\n\n```sh\n# Process an 8-hour podcast in 10-minute chunks\naudio-separator long_podcast.wav --chunk_duration 600\n\n# Adjust chunk size based on available memory\naudio-separator very_long_audio.wav --chunk_duration 300  # 5-minute chunks\n```\n\n#### How It Works\n\n1. 
**Split**: The input file is split into fixed-duration chunks (e.g., 10 minutes)\n2. **Process**: Each chunk is processed separately, reducing peak memory usage\n3. **Merge**: The results are merged back together by simple concatenation\n\nThe chunking feature supports all model types:\n- **2-stem models** (e.g., MDX): Vocals + Instrumental\n- **4-stem models** (e.g., Demucs): Drums, Bass, Other, Vocals\n- **6-stem models** (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano\n\n#### Benefits\n\n- **Prevents OOM errors**: Process files of any length without running out of memory\n- **Predictable memory usage**: Memory usage stays bounded regardless of file length\n- **Full model quality within each chunk**: Each chunk is fully processed with the selected model (see the note on audio quality below regarding chunk boundaries)\n- **Multi-stem support**: Works seamlessly with 2-, 4-, and 6-stem models\n\n#### Recommendations\n\n- **Files > 1 hour**: Use `--chunk_duration 600` (10 minutes)\n- **Memory-constrained systems**: Use smaller chunks (300-600 seconds)\n- **Ample memory**: You may not need chunking at all\n\n#### Note on Audio Quality\n\nChunks are concatenated without crossfading, which may occasionally cause minor artifacts at chunk boundaries. In most cases these are not noticeable. Simple concatenation adds minimal processing overhead while still avoiding out-of-memory errors.\n\n### Ensembling Multiple Models\n\nYou can combine the results of multiple models to improve separation quality. Ensembling runs each model and then combines their outputs using the specified algorithm.\n\n#### CLI Usage\n\nUse `-m` for the primary model and `--extra_models` for additional models. 
You can also specify the ensemble algorithm using `--ensemble_algorithm`.\n\n```sh\n# Ensemble two models using the default 'avg_wave' algorithm\naudio-separator audio.wav -m model1.ckpt --extra_models model2.onnx\n\n# Ensemble multiple models using a specific algorithm\naudio-separator audio.wav -m model1.ckpt --extra_models model2.onnx model3.ckpt --ensemble_algorithm max_fft\n\n# With custom weights (must match the number of models)\naudio-separator audio.wav -m model1.ckpt --extra_models model2.onnx --ensemble_weights 2.0 1.0\n```\n\n#### Python API Usage\n\n```python\nfrom audio_separator.separator import Separator\n\n# Initialize the Separator class with custom parameters\nseparator = Separator(\n    output_dir='output',\n    ensemble_algorithm='avg_wave'\n)\n\n# List of models to ensemble\n# Note: These models will be downloaded automatically if not present\nmodels = [\n    'UVR-MDX-NET-Inst_HQ_3.onnx',\n    'UVR_MDXNET_KARA_2.onnx'\n]\n\n# Specify multiple models for ensembling\nseparator.load_model(model_filename=models)\n\n# Perform separation\noutput_files = separator.separate('audio.wav')\n```\n\n#### Supported Ensemble Algorithms\n- `avg_wave`: Weighted average of waveforms (default)\n- `median_wave`: Median of waveforms\n- `min_wave`: Minimum of waveforms\n- `max_wave`: Maximum of waveforms\n- `avg_fft`: Weighted average of spectrograms\n- `median_fft`: Median of spectrograms\n- `min_fft`: Minimum of spectrograms\n- `max_fft`: Maximum of spectrograms\n- `uvr_max_spec`: UVR-based maximum spectrogram ensemble\n- `uvr_min_spec`: UVR-based minimum spectrogram ensemble\n- `ensemble_wav`: UVR-based least noisy chunk ensemble\n\n#### Ensemble Presets\n\nInstead of specifying models and algorithms manually, you can use curated presets based on community-tested combinations:\n\n```sh\n# List available presets\naudio-separator --list_presets\n\n# Use a preset (models and algorithm are configured automatically)\naudio-separator audio.wav --ensemble_preset 
vocal_balanced\n\n# Override a preset's algorithm\naudio-separator audio.wav --ensemble_preset vocal_balanced --ensemble_algorithm max_fft\n```\n\n**Python API:**\n```python\nseparator = Separator(output_dir='output', ensemble_preset='vocal_balanced')\nseparator.load_model()  # Uses preset's models automatically\noutput_files = separator.separate('audio.wav')\n```\n\nAvailable presets:\n\n| Preset | Use Case | Models | Algorithm |\n|--------|----------|--------|-----------|\n| `instrumental_clean` | Cleanest instrumentals, minimal vocal bleed | 2 | `uvr_max_spec` |\n| `instrumental_full` | Maximum instrument preservation | 2 | `uvr_max_spec` |\n| `instrumental_balanced` | Good noise/fullness balance | 2 | `uvr_max_spec` |\n| `instrumental_low_resource` | Fast, low VRAM | 2 | `avg_fft` |\n| `vocal_balanced` | Best overall vocal quality | 2 | `avg_fft` |\n| `vocal_clean` | Minimal instrument bleed | 2 | `min_fft` |\n| `vocal_full` | Maximum vocal capture | 2 | `max_fft` |\n| `vocal_rvc` | Optimized for RVC/AI training | 2 | `avg_wave` |\n| `karaoke` | Lead vocal removal | 3 | `avg_wave` |\n\nPresets are defined in `audio_separator/ensemble_presets.json` — contributions welcome via PR!\n\n### Full command-line interface options\n\n```sh\nusage: audio-separator [-h] [-v] [-d] [-e] [-l] [--log_level LOG_LEVEL] [--list_filter LIST_FILTER] [--list_limit LIST_LIMIT] [--list_format {pretty,json}] [-m MODEL_FILENAME] [--output_format OUTPUT_FORMAT]\n                       [--output_bitrate OUTPUT_BITRATE] [--output_dir OUTPUT_DIR] [--model_file_dir MODEL_FILE_DIR] [--download_model_only] [--invert_spect] [--normalization NORMALIZATION]\n                       [--amplification AMPLIFICATION] [--single_stem SINGLE_STEM] [--sample_rate SAMPLE_RATE] [--use_soundfile] [--use_autocast] [--custom_output_names CUSTOM_OUTPUT_NAMES]\n                       [--mdx_segment_size MDX_SEGMENT_SIZE] [--mdx_overlap MDX_OVERLAP] [--mdx_batch_size MDX_BATCH_SIZE] [--mdx_hop_length 
MDX_HOP_LENGTH] [--mdx_enable_denoise] [--vr_batch_size VR_BATCH_SIZE]\n                       [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta] [--vr_high_end_process] [--vr_enable_post_process]\n                       [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_segment_size DEMUCS_SEGMENT_SIZE] [--demucs_shifts DEMUCS_SHIFTS] [--demucs_overlap DEMUCS_OVERLAP]\n                       [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED] [--mdxc_segment_size MDXC_SEGMENT_SIZE] [--mdxc_override_model_segment_size] [--mdxc_overlap MDXC_OVERLAP]\n                       [--mdxc_batch_size MDXC_BATCH_SIZE] [--mdxc_pitch_shift MDXC_PITCH_SHIFT]\n                       [audio_files ...]\n\nSeparate audio file into different stems.\n\npositional arguments:\n  audio_files                                            The audio file paths or directory to separate, in any common format.\n\noptions:\n  -h, --help                                             show this help message and exit\n\nInfo and Debugging:\n  -v, --version                                          Show the program's version number and exit.\n  -d, --debug                                            Enable debug logging, equivalent to --log_level=debug.\n  -e, --env_info                                         Print environment information and exit.\n  -l, --list_models                                      List all supported models and exit. Use --list_filter to filter/sort the list and --list_limit to show only top N results.\n  --log_level LOG_LEVEL                                  Log level, e.g. info, debug, warning (default: info).\n  --list_filter LIST_FILTER                              Filter and sort the model list by 'name', 'filename', or any stem e.g. 
vocals, instrumental, drums\n  --list_limit LIST_LIMIT                                Limit the number of models shown\n  --list_format {pretty,json}                            Format for listing models: 'pretty' for formatted output, 'json' for raw JSON dump\n\nSeparation I/O Params:\n  -m MODEL_FILENAME, --model_filename MODEL_FILENAME     Model to use for separation (default: model_bs_roformer_ep_317_sdr_12.9755.yaml). Example: -m 2_HP-UVR.pth\n  --output_format OUTPUT_FORMAT                          Output format for separated files, any common format (default: FLAC). Example: --output_format=MP3\n  --output_bitrate OUTPUT_BITRATE                        Output bitrate for separated files, any ffmpeg-compatible bitrate (default: None). Example: --output_bitrate=320k\n  --output_dir OUTPUT_DIR                                Directory to write output files (default: <current dir>). Example: --output_dir=/app/separated\n  --model_file_dir MODEL_FILE_DIR                        Model files directory (default: /tmp/audio-separator-models/). Example: --model_file_dir=/app/models\n  --download_model_only                                  Download a single model file only, without performing separation.\n\nCommon Separation Parameters:\n  --invert_spect                                         Invert secondary stem using spectrogram (default: False). Example: --invert_spect\n  --normalization NORMALIZATION                          Max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization=0.7\n  --amplification AMPLIFICATION                          Min peak amplitude to amplify input and output audio to (default: 0.0). Example: --amplification=0.4\n  --single_stem SINGLE_STEM                              Output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. 
Example: --single_stem=Instrumental\n  --sample_rate SAMPLE_RATE                              Modify the sample rate of the output audio (default: 44100). Example: --sample_rate=44100\n  --use_soundfile                                        Use soundfile to write audio output (default: False). Example: --use_soundfile\n  --use_autocast                                         Use PyTorch autocast for faster inference (default: False). Do not use for CPU inference. Example: --use_autocast\n  --custom_output_names CUSTOM_OUTPUT_NAMES              Custom names for all output files in JSON format (default: None). Example: --custom_output_names='{\"Vocals\": \"vocals_output\", \"Drums\": \"drums_output\"}'\n\nMDX Architecture Parameters:\n  --mdx_segment_size MDX_SEGMENT_SIZE                    Larger consumes more resources, but may give better results (default: 256). Example: --mdx_segment_size=256\n  --mdx_overlap MDX_OVERLAP                              Amount of overlap between prediction windows, 0.001-0.999. Higher is better but slower (default: 0.25). Example: --mdx_overlap=0.25\n  --mdx_batch_size MDX_BATCH_SIZE                        Larger consumes more RAM but may process slightly faster (default: 1). Example: --mdx_batch_size=4\n  --mdx_hop_length MDX_HOP_LENGTH                        Usually called stride in neural networks, only change if you know what you're doing (default: 1024). Example: --mdx_hop_length=1024\n  --mdx_enable_denoise                                   Enable denoising during separation (default: False). Example: --mdx_enable_denoise\n\nVR Architecture Parameters:\n  --vr_batch_size VR_BATCH_SIZE                          Number of batches to process at a time. Higher = more RAM, slightly faster processing (default: 1). Example: --vr_batch_size=16\n  --vr_window_size VR_WINDOW_SIZE                        Balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). 
Example: --vr_window_size=320\n  --vr_aggression VR_AGGRESSION                          Intensity of primary stem extraction, -100 - 100. Typically, 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2\n  --vr_enable_tta                                        Enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta\n  --vr_high_end_process                                  Mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process\n  --vr_enable_post_process                               Identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process\n  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  Threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1\n\nDemucs Architecture Parameters:\n  --demucs_segment_size DEMUCS_SEGMENT_SIZE              Size of segments into which the audio is split, 1-100. Higher = slower but better quality (default: Default). Example: --demucs_segment_size=256\n  --demucs_shifts DEMUCS_SHIFTS                          Number of predictions with random shifts, higher = slower but better quality (default: 2). Example: --demucs_shifts=4\n  --demucs_overlap DEMUCS_OVERLAP                        Overlap between prediction windows, 0.001-0.999. Higher = slower but better quality (default: 0.25). Example: --demucs_overlap=0.25\n  --demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED      Enable segment-wise processing (default: True). Example: --demucs_segments_enabled=False\n\nMDXC Architecture Parameters:\n  --mdxc_segment_size MDXC_SEGMENT_SIZE                  Larger consumes more resources, but may give better results (default: 256). Example: --mdxc_segment_size=256\n  --mdxc_override_model_segment_size                     Override model default segment size instead of using the model default value. 
Example: --mdxc_override_model_segment_size\n  --mdxc_overlap MDXC_OVERLAP                            Amount of overlap between prediction windows, 2-50. Higher is better but slower (default: 8). Example: --mdxc_overlap=8\n  --mdxc_batch_size MDXC_BATCH_SIZE                      Larger consumes more RAM but may process slightly faster (default: 1). Example: --mdxc_batch_size=4\n  --mdxc_pitch_shift MDXC_PITCH_SHIFT                    Shift audio pitch by a number of semitones while processing. May improve output for deep/high vocals. (default: 0). Example: --mdxc_pitch_shift=2\n```\n\n### As a Dependency in a Python Project\n\nYou can use Audio Separator in your own Python project. Here's a minimal example using the default two-stem (Instrumental and Vocals) model:\n\n```python\nfrom audio_separator.separator import Separator\n\n# Initialize the Separator class (with optional configuration properties, below)\nseparator = Separator()\n\n# Load a machine learning model (if unspecified, defaults to 'model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt')\nseparator.load_model()\n\n# Perform the separation on a single audio file; the loaded model can be reused for further calls\noutput_files = separator.separate('audio1.wav')\n\nprint(f\"Separation complete! Output file(s): {' '.join(output_files)}\")\n```\n\n#### Batch processing and processing with multiple models\n\nYou can process multiple files without reloading the model to save time and memory.\n\nYou only need to load a model when choosing or changing models. 
See example below:\n\n```python\nfrom audio_separator.separator import Separator\n\n# Initialize the Separator class (with optional configuration properties, below)\nseparator = Separator()\n\n# Load a model\nseparator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')\n\n# Separate multiple audio files without reloading the model\noutput_files = separator.separate(['audio1.wav', 'audio2.wav', 'audio3.wav'])\n\n# Load a different model\nseparator.load_model(model_filename='UVR_MDXNET_KARA_2.onnx')\n\n# Separate the same files with the new model\noutput_files = separator.separate(['audio1.wav', 'audio2.wav', 'audio3.wav'])\n```\n\nYou can also specify the path to a folder containing audio files instead of listing the full paths to each of them:\n```python\nfrom audio_separator.separator import Separator\n\n# Initialize the Separator class (with optional configuration properties, below)\nseparator = Separator()\n\n# Load a model\nseparator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')\n\n# Separate all audio files located in a folder\noutput_files = separator.separate('path/to/audio_directory')\n```\n\n#### Renaming Stems\n\nYou can rename the output files by specifying the desired names. 
For example:\n```python\noutput_names = {\n    \"Vocals\": \"vocals_output\",\n    \"Instrumental\": \"instrumental_output\",\n}\noutput_files = separator.separate('audio1.wav', output_names)\n```\nIn this case, the output file names will be: `vocals_output.wav` and `instrumental_output.wav`.\n\nYou can also rename specific stems:\n\n- To rename the Vocals stem:\n  ```python\n  output_names = {\n      \"Vocals\": \"vocals_output\",\n  }\n  output_files = separator.separate('audio1.wav', output_names)\n  ```\n  > The output files will be named: `vocals_output.wav` and `audio1_(Instrumental)_model_mel_band_roformer_ep_3005_sdr_11.wav`\n- To rename the Instrumental stem:\n  ```python\n  output_names = {\n      \"Instrumental\": \"instrumental_output\",\n  }\n  output_files = separator.separate('audio1.wav', output_names)\n  ```\n  > The output files will be named: `audio1_(Vocals)_model_mel_band_roformer_ep_3005_sdr_11.wav` and `instrumental_output.wav`\n- List of stems for Demucs models:\n  - htdemucs_6s.yaml\n    ```python\n    output_names = {\n        \"Vocals\": \"vocals_output\",\n        \"Drums\": \"drums_output\",\n        \"Bass\": \"bass_output\",\n        \"Other\": \"other_output\",\n        \"Guitar\": \"guitar_output\",\n        \"Piano\": \"piano_output\",\n    }\n    ```\n  - Other Demucs models\n    ```python\n    output_names = {\n        \"Vocals\": \"vocals_output\",\n        \"Drums\": \"drums_output\",\n        \"Bass\": \"bass_output\",\n        \"Other\": \"other_output\",\n    }\n    ```\n\n## Parameters for the Separator class\n\n- **`log_level`:** (Optional) Logging level, e.g., INFO, DEBUG, WARNING. `Default: logging.INFO`\n- **`log_formatter`:** (Optional) The log format. Default: None, which falls back to '%(asctime)s - %(levelname)s - %(module)s - %(message)s'\n- **`model_file_dir`:** (Optional) Directory to cache model files in. 
`Default: /tmp/audio-separator-models/`\n- **`output_dir`:** (Optional) Directory where the separated files will be saved. If not specified, uses the current directory.\n- **`output_format`:** (Optional) Format to encode output files, any common format (WAV, MP3, FLAC, M4A, etc.). `Default: WAV`\n- **`normalization_threshold`:** (Optional) The amount by which the amplitude of the output audio will be multiplied. `Default: 0.9`\n- **`amplification_threshold`:** (Optional) The minimum amplitude level at which the waveform will be amplified. If the peak amplitude of the audio is below this threshold, the waveform will be scaled up to meet it. `Default: 0.0`\n- **`output_single_stem`:** (Optional) Output only a single stem, such as 'Instrumental' or 'Vocals'. `Default: None`\n- **`invert_using_spec`:** (Optional) Flag to invert using spectrogram. `Default: False`\n- **`sample_rate`:** (Optional) Set the sample rate of the output audio. `Default: 44100`\n- **`use_soundfile`:** (Optional) Use soundfile for output writing, which can solve OOM issues, especially on longer audio. `Default: False`\n- **`use_autocast`:** (Optional) Flag to use PyTorch autocast for faster inference. Do not use for CPU inference. `Default: False`\n- **`mdx_params`:** (Optional) MDX Architecture Specific Attributes & Defaults. `Default: {\"hop_length\": 1024, \"segment_size\": 256, \"overlap\": 0.25, \"batch_size\": 1, \"enable_denoise\": False}`\n- **`vr_params`:** (Optional) VR Architecture Specific Attributes & Defaults. `Default: {\"batch_size\": 1, \"window_size\": 512, \"aggression\": 5, \"enable_tta\": False, \"enable_post_process\": False, \"post_process_threshold\": 0.2, \"high_end_process\": False}`\n- **`demucs_params`:** (Optional) Demucs Architecture Specific Attributes & Defaults. 
`Default: {\"segment_size\": \"Default\", \"shifts\": 2, \"overlap\": 0.25, \"segments_enabled\": True}` _(Note: `segment_size` \"Default\" uses the model's internal default, typically 40 for older Demucs models and 10 for Demucs v4/htdemucs)_\n- **`mdxc_params`:** (Optional) MDXC Architecture Specific Attributes & Defaults. `Default: {\"segment_size\": 256, \"override_model_segment_size\": False, \"batch_size\": 1, \"overlap\": 8, \"pitch_shift\": 0}`\n- **`ensemble_algorithm`:** (Optional) Algorithm to use for ensembling multiple models. `Default: 'avg_wave'`\n- **`ensemble_weights`:** (Optional) Weights for each model in the ensemble. `Default: None` (equal weights)\n- **`ensemble_preset`:** (Optional) Named ensemble preset (e.g. `'vocal_balanced'`, `'karaoke'`). Sets models, algorithm, and weights automatically. Use `Separator(info_only=True).list_ensemble_presets()` to see all. `Default: None`\n\n## Remote API Usage 🌐\n\nAudio Separator includes a remote API client that allows you to connect to a deployed Audio Separator API service, enabling you to perform audio separation without running the models locally. The API uses asynchronous processing with job polling for efficient handling of separation tasks.\n\nTo deploy Audio Separator as an API on modal.com and use this for remote processing, please see the detailed documentation here: [audio_separator/remote/README.md](audio_separator/remote/README.md).\n\n## Requirements 📋\n\nPython >= 3.10\n\nLibraries: torch, onnx, onnxruntime, numpy, librosa, requests, six, tqdm, pydub\n\n## Developing Locally\n\nThis project uses Poetry for dependency management and packaging. 
Follow these steps to set up a local development environment:\n\n### Prerequisites\n\n- Make sure you have Python 3.10 or newer installed on your machine.\n- Install Conda (I recommend Miniforge: [Miniforge GitHub](https://github.com/conda-forge/miniforge)) to manage your Python virtual environments.\n\n### Clone the Repository\n\nClone the repository to your local machine:\n\n```sh\ngit clone https://github.com/YOUR_USERNAME/python-audio-separator.git\ncd python-audio-separator\n```\n\nReplace `YOUR_USERNAME` with your GitHub username if you've forked the repository, or use the main repository URL (`https://github.com/nomadkaraoke/python-audio-separator.git`) if you have the necessary permissions.\n\n### Create and activate the Conda Environment\n\nTo create and activate the conda environment, use the following commands:\n\n```sh\nconda env create\nconda activate audio-separator-dev\n```\n\n### Install Dependencies\n\nOnce you're inside the conda env, run the following command to install the project dependencies:\n\n```sh\npoetry install\n```\n\nThen install the extra dependencies that match your hardware: `cpu`, `gpu`, or `dml` (DirectML).\n```sh\npoetry install --extras \"cpu\"\n```\nor\n```sh\npoetry install --extras \"gpu\"\n```\nor\n```sh\npoetry install --extras \"dml\"\n```\n\n### Running the Command-Line Interface Locally\n\nYou can run the CLI command directly within the virtual environment. For example:\n\n```sh\naudio-separator path/to/your/audio-file.wav\n```\n\n### Deactivate the Virtual Environment\n\nOnce you are done with your development work, you can exit the virtual environment by simply typing:\n\n```sh\nconda deactivate\n```\n\n### Building the Package\n\nTo build the package for distribution, use the following command:\n\n```sh\npoetry build\n```\n\nThis will generate the distribution packages in the `dist` directory, but for now only @beveradb can publish them to PyPI.\n\n## Contributing 🤝\n\nContributions are very much welcome! 
Please fork the repository and submit a pull request with your changes, and I'll try to review, merge and publish promptly!\n\n- This project is 100% open-source and free for anyone to use and modify as they wish.\n- If the maintenance workload for this repo somehow becomes too much for me, I'll ask for volunteers to share maintainership of the repo, though I don't think that is very likely.\n- Development and support for the MDX-Net separation models is part of the main [UVR project](https://github.com/Anjok07/ultimatevocalremovergui); this repo is just a CLI/Python package wrapper to simplify running those models programmatically. So, if you want to try to improve the actual models, please get involved in the UVR project and look for guidance there!\n\n## License 📄\n\nThis project is licensed under the MIT [License](LICENSE).\n\n- **Please Note:** If you choose to integrate this project into some other project using the default model or any other model trained as part of the [UVR](https://github.com/Anjok07/ultimatevocalremovergui) project, please honor the MIT license by providing credit to UVR and its developers!\n\n## Credits 🙏\n\n- [Anjok07](https://github.com/Anjok07) - Author of [Ultimate Vocal Remover GUI](https://github.com/Anjok07/ultimatevocalremovergui), which almost all of the code in this repo was copied from! Definitely deserving of credit for anything good from this project. Thank you!\n- [DilanBoskan](https://github.com/DilanBoskan) - Your contributions at the start of this project were essential to the success of UVR. Thank you!\n- [Kuielab & Woosung Choi](https://github.com/kuielab) - Developed the original MDX-Net AI code.\n- [KimberleyJSN](https://github.com/KimberleyJensen) - Advised and aided the implementation of the training scripts for MDX-Net and Demucs. Thank you!\n- [Hv](https://github.com/NaJeongMo/Colab-for-MDX_B) - Helped implement chunks into the MDX-Net AI code. 
Thank you!\n- [zhzhongshi](https://github.com/zhzhongshi) - Helped add support for the MDXC models in `audio-separator`. Thank you!\n\n## Contact 💌\n\nFor questions or feedback, please raise an issue or reach out to @beveradb ([Andrew Beveridge](mailto:andrew@beveridge.uk)) directly.\n\n---\n<div align=\"center\">\n\n<!-- sponsors --><!-- sponsors -->\n\n## Thanks to all contributors for their efforts\n\n<a href=\"https://github.com/nomadkaraoke/python-audio-separator/graphs/contributors\">\n  <img src=\"https://contrib.rocks/image?repo=nomadkaraoke/python-audio-separator\" />\n</a>\n\n</div>"
  },
  {
    "path": "TODO.md",
    "content": "# Audio-Separator TO-DO list\n\nIf you see something here, Andrew is aware it needs to be done, and he will hopefully get to it soon but can't make any promises!\nThis isn't his full time job, and he's doing his best to keep up with everything.\n\nIf you'd like something to be done sooner, please consider trying to work on it yourself and submitting a pull request!\nIf you don't know how to code, please consider learning - it's free, and anyone can do it! https://www.freecodecamp.org/learn/scientific-computing-with-python/\n\n## TODO:\n\n- Add unit tests to all uvr lib functions to ensure no obvious errors are missed\n- Add end-to-end tests which download all models and test separation with a very short input file for speed\n- Add ability for user to download all models ahead of time\n- Add tests for Windows, Linux and macOS separately\n- Add tests for Python 3.10, 3.11\n- Add support for Ensemble mode\n- Add support for Chaining multiple models\n- Add support for Splitting multiple stems from a single input file by running different models\n"
  },
  {
    "path": "audio_separator/__init__.py",
    "content": ""
  },
  {
    "path": "audio_separator/ensemble_presets.json",
    "content": "{\n    \"version\": 1,\n    \"presets\": {\n        \"instrumental_clean\": {\n            \"name\": \"Instrumental Clean\",\n            \"description\": \"Cleanest instrumentals with minimal vocal bleed — Fv7z (bleedless 44.61) + Resurrection Inst (SDR 17.25)\",\n            \"models\": [\n                \"mel_band_roformer_instrumental_fv7z_gabox.ckpt\",\n                \"bs_roformer_instrumental_resurrection_unwa.ckpt\"\n            ],\n            \"algorithm\": \"uvr_max_spec\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"instrumental_full\": {\n            \"name\": \"Instrumental Full\",\n            \"description\": \"Maximum instrument preservation — v1e+ (fullness 37.89) + becruily inst (SOTA SDR 17.55)\",\n            \"models\": [\n                \"melband_roformer_inst_v1e_plus.ckpt\",\n                \"mel_band_roformer_instrumental_becruily.ckpt\"\n            ],\n            \"algorithm\": \"uvr_max_spec\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"instrumental_balanced\": {\n            \"name\": \"Instrumental Balanced\",\n            \"description\": \"Good balance of noise and fullness — Gabox INSTV8 + Resurrection Inst\",\n            \"models\": [\n                \"mel_band_roformer_instrumental_instv8_gabox.ckpt\",\n                \"bs_roformer_instrumental_resurrection_unwa.ckpt\"\n            ],\n            \"algorithm\": \"uvr_max_spec\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"instrumental_low_resource\": {\n            \"name\": \"Instrumental Low Resource\",\n            \"description\": \"Fast ensemble for low VRAM — Resurrection Inst (200MB) + MDX HQ_5 (ONNX, very fast)\",\n            \"models\": [\n                \"bs_roformer_instrumental_resurrection_unwa.ckpt\",\n                
\"UVR-MDX-NET-Inst_HQ_5.onnx\"\n            ],\n            \"algorithm\": \"avg_fft\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"vocal_balanced\": {\n            \"name\": \"Vocal Balanced\",\n            \"description\": \"Best overall vocal quality — Resurrection (SDR 11.34) + Beta 6X (SDR 11.12) averaged\",\n            \"models\": [\n                \"bs_roformer_vocals_resurrection_unwa.ckpt\",\n                \"melband_roformer_big_beta6x.ckpt\"\n            ],\n            \"algorithm\": \"avg_fft\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"vocal_clean\": {\n            \"name\": \"Vocal Clean\",\n            \"description\": \"Minimal instrument bleed in vocals — Revive 2 (bleedless 40.07) + FT2 bleedless (39.30) with min FFT\",\n            \"models\": [\n                \"bs_roformer_vocals_revive_v2_unwa.ckpt\",\n                \"mel_band_roformer_kim_ft2_bleedless_unwa.ckpt\"\n            ],\n            \"algorithm\": \"min_fft\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"vocal_full\": {\n            \"name\": \"Vocal Full\",\n            \"description\": \"Maximum vocal capture including harmonies — Revive 3e (fullness 21.43) + becruily vocal with max FFT\",\n            \"models\": [\n                \"bs_roformer_vocals_revive_v3e_unwa.ckpt\",\n                \"mel_band_roformer_vocals_becruily.ckpt\"\n            ],\n            \"algorithm\": \"max_fft\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"vocal_rvc\": {\n            \"name\": \"Vocal RVC\",\n            \"description\": \"Optimized for RVC/AI voice training data — Beta 6X + Gabox voc_fv4 averaged\",\n            \"models\": [\n                \"melband_roformer_big_beta6x.ckpt\",\n                
\"mel_band_roformer_vocals_fv4_gabox.ckpt\"\n            ],\n            \"algorithm\": \"avg_wave\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        },\n        \"karaoke\": {\n            \"name\": \"Karaoke\",\n            \"description\": \"Lead vocal removal — 3-model karaoke ensemble reaches SDR ~10.6 vs ~10.2 single model\",\n            \"models\": [\n                \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\",\n                \"mel_band_roformer_karaoke_gabox_v2.ckpt\",\n                \"mel_band_roformer_karaoke_becruily.ckpt\"\n            ],\n            \"algorithm\": \"avg_wave\",\n            \"weights\": null,\n            \"contributor\": \"deton24 community guide\"\n        }\n    }\n}\n"
  },
  {
    "path": "audio_separator/model-data.json",
    "content": "{\n    \"vr_model_data\": {\n        \"97dc361a7a88b2c4542f68364b32c7f6\": {\n            \"vr_model_param\": \"4band_v4_ms_fullband\",\n            \"primary_stem\": \"Dry\",\n            \"nout\": 32,\n            \"nout_lstm\": 128,\n            \"is_karaoke\": false,\n            \"is_bv_model\": false,\n            \"is_bv_model_rebalanced\": 0.0\n        },\n        \"581f4461b8d8c7e43af546eebd3b6a2a\": {\n            \"vr_model_param\": \"4band_v4_ms_fullband\",\n            \"primary_stem\": \"Vocals\",\n            \"nout\": 64,\n            \"nout_lstm\": 128,\n            \"is_karaoke\": false,\n            \"is_bv_model\": true,\n            \"is_bv_model_rebalanced\": 0.9\n        }\n    },\n    \"mdx_model_data\": {\n        \"cb790d0c913647ced70fc6b38f5bea1a\": {\n            \"compensate\": 1.010,\n            \"mdx_dim_f_set\": 2560,\n            \"mdx_dim_t_set\": 8,\n            \"mdx_n_fft_scale_set\": 5120,\n            \"primary_stem\": \"Instrumental\"\n        }\n    }\n}"
  },
  {
    "path": "audio_separator/models-scores.json",
    "content": "{\n  \"1_HP-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 1_HP-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.24871,\n            \"SIR\": 15.4624,\n            \"SAR\": 4.83391,\n            \"ISR\": 8.8857\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8927,\n            \"SIR\": 19.3158,\n            \"SAR\": 18.24,\n            \"ISR\": 18.8107\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.43096,\n            \"SIR\": 12.0094,\n            \"SAR\": 5.75084,\n            \"ISR\": 12.7187\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6068,\n            \"SIR\": 18.0944,\n            \"SAR\": 12.3348,\n            \"ISR\": 14.9249\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.37912,\n            \"SIR\": 18.9104,\n            \"SAR\": 8.54122,\n            \"ISR\": 15.3526\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3357,\n            \"SIR\": 21.73,\n            \"SAR\": 16.6099,\n            \"ISR\": 21.9107\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.68601,\n            \"SIR\": 5.35433,\n            \"SAR\": 4.00494,\n            \"ISR\": 12.7504\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3974,\n            \"SIR\": 21.6121,\n            \"SAR\": 13.2817,\n            \"ISR\": 15.9813\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.77937,\n 
           \"SIR\": 20.2537,\n            \"SAR\": 9.60245,\n            \"ISR\": 13.3208\n          },\n          \"instrumental\": {\n            \"SDR\": 12.781,\n            \"SIR\": 15.3895,\n            \"SAR\": 13.3742,\n            \"ISR\": 23.1482\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.84219,\n            \"SIR\": 16.1098,\n            \"SAR\": 9.47788,\n            \"ISR\": 13.4743\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0084,\n            \"SIR\": 15.7127,\n            \"SAR\": 12.9399,\n            \"ISR\": 19.7754\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.73747,\n            \"SIR\": 18.9403,\n            \"SAR\": 10.333,\n            \"ISR\": 14.4066\n          },\n          \"instrumental\": {\n            \"SDR\": 14.318,\n            \"SIR\": 18.8837,\n            \"SAR\": 15.8694,\n            \"ISR\": 22.3595\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91892,\n            \"SIR\": 20.3204,\n            \"SAR\": 5.93576,\n            \"ISR\": 10.5446\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9788,\n            \"SIR\": 23.2555,\n            \"SAR\": 21.5719,\n            \"ISR\": 21.9236\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.15237,\n            \"SIR\": 25.8856,\n            \"SAR\": 9.99461,\n            \"ISR\": 12.9868\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4142,\n            \"SIR\": 16.693,\n            \"SAR\": 15.1214,\n            
\"ISR\": 18.7991\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.03781,\n            \"SIR\": 18.0127,\n            \"SAR\": 7.57484,\n            \"ISR\": 10.4891\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3057,\n            \"SIR\": 14.2951,\n            \"SAR\": 13.8743,\n            \"ISR\": 24.6434\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2091,\n            \"SIR\": 23.0624,\n            \"SAR\": 10.6547,\n            \"ISR\": 17.3402\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2878,\n            \"SIR\": 20.8786,\n            \"SAR\": 14.4586,\n            \"ISR\": 22.0027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0986,\n            \"SIR\": 20.6672,\n            \"SAR\": 9.75169,\n            \"ISR\": 15.2152\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9512,\n            \"SIR\": 21.3993,\n            \"SAR\": 16.138,\n            \"ISR\": 27.0006\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.76939,\n            \"SIR\": 17.5586,\n            \"SAR\": 6.98892,\n            \"ISR\": 12.3147\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8124,\n            \"SIR\": 20.1259,\n            \"SAR\": 15.7944,\n            \"ISR\": 24.8105\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.43502,\n            
\"SIR\": 18.5542,\n            \"SAR\": 7.8475,\n            \"ISR\": 13.134\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5321,\n            \"SIR\": 20.5767,\n            \"SAR\": 16.245,\n            \"ISR\": 18.7807\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.57991,\n            \"SIR\": 16.4946,\n            \"SAR\": 3.3243,\n            \"ISR\": 7.31346\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8819,\n            \"SIR\": 17.3794,\n            \"SAR\": 18.4389,\n            \"ISR\": 29.5074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2519,\n            \"SIR\": 19.1926,\n            \"SAR\": 8.72244,\n            \"ISR\": 14.1103\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3555,\n            \"SIR\": 18.9906,\n            \"SAR\": 15.1787,\n            \"ISR\": 19.5638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.89595,\n            \"SIR\": 22.4832,\n            \"SAR\": 6.83054,\n            \"ISR\": 10.5802\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7255,\n            \"SIR\": 17.3371,\n            \"SAR\": 16.0628,\n            \"ISR\": 21.1358\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -20.544,\n            \"SIR\": -37.706,\n            \"SAR\": 0.60976,\n            \"ISR\": 0.2973\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6498,\n            \"SIR\": 49.0042,\n            \"SAR\": 12.4641,\n            \"ISR\": 
16.1127\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.08146,\n            \"SIR\": 15.5686,\n            \"SAR\": 5.40936,\n            \"ISR\": 8.49432\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2047,\n            \"SIR\": 17.0075,\n            \"SAR\": 16.996,\n            \"ISR\": 26.9043\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.09982,\n            \"SIR\": 18.5838,\n            \"SAR\": 8.74048,\n            \"ISR\": 13.0896\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9941,\n            \"SIR\": 19.1888,\n            \"SAR\": 16.7467,\n            \"ISR\": 19.5069\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.99043,\n            \"SIR\": 18.7958,\n            \"SAR\": 9.75759,\n            \"ISR\": 14.2486\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8229,\n            \"SIR\": 25.0371,\n            \"SAR\": 19.7476,\n            \"ISR\": 27.6669\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.56311,\n            \"SIR\": 18.1942,\n            \"SAR\": 6.0561,\n            \"ISR\": 11.368\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8085,\n            \"SIR\": 17.9619,\n            \"SAR\": 15.7333,\n            \"ISR\": 18.1057\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.47258,\n            \"SIR\": 
13.2462,\n            \"SAR\": 3.17736,\n            \"ISR\": 5.63786\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3255,\n            \"SIR\": 12.4029,\n            \"SAR\": 15.0933,\n            \"ISR\": 19.4058\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.18906,\n            \"SIR\": 22.5721,\n            \"SAR\": 5.73903,\n            \"ISR\": 9.24557\n          },\n          \"instrumental\": {\n            \"SDR\": 14.975,\n            \"SIR\": 19.4188,\n            \"SAR\": 17.948,\n            \"ISR\": 18.9743\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.18507,\n            \"SIR\": 16.4992,\n            \"SAR\": 6.30036,\n            \"ISR\": 10.0651\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1405,\n            \"SIR\": 18.0741,\n            \"SAR\": 16.8313,\n            \"ISR\": 26.3192\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.5812,\n            \"SIR\": 16.1195,\n            \"SAR\": 7.46711,\n            \"ISR\": 13.4299\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0101,\n            \"SIR\": 18.3,\n            \"SAR\": 14.5392,\n            \"ISR\": 21.4193\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.76299,\n            \"SIR\": 18.7505,\n            \"SAR\": 8.23403,\n            \"ISR\": 11.5089\n          },\n          \"instrumental\": {\n            \"SDR\": 9.76063,\n            \"SIR\": 13.0942,\n            \"SAR\": 11.6006,\n            \"ISR\": 21.6246\n          
}\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.71433,\n            \"SIR\": 14.0367,\n            \"SAR\": 3.94581,\n            \"ISR\": 8.26944\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3109,\n            \"SIR\": 12.6439,\n            \"SAR\": 12.5739,\n            \"ISR\": 21.6968\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.12746,\n            \"SIR\": 21.5389,\n            \"SAR\": 9.30687,\n            \"ISR\": 13.3151\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8668,\n            \"SIR\": 18.001,\n            \"SAR\": 15.5456,\n            \"ISR\": 17.7115\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7225,\n            \"SIR\": 22.8344,\n            \"SAR\": 10.58,\n            \"ISR\": 13.4283\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5839,\n            \"SIR\": 13.1001,\n            \"SAR\": 10.4395,\n            \"ISR\": 23.6291\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.20913,\n            \"SIR\": 19.282,\n            \"SAR\": 10.0456,\n            \"ISR\": 13.8778\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7055,\n            \"SIR\": 16.7757,\n            \"SAR\": 13.7724,\n            \"ISR\": 16.9293\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1419,\n            \"SIR\": 30.1871,\n            \"SAR\": 10.9111,\n            \"ISR\": 
15.3985\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1001,\n            \"SIR\": 22.8839,\n            \"SAR\": 19.7081,\n            \"ISR\": 19.6091\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.88926,\n            \"SIR\": 8.76664,\n            \"SAR\": 4.81204,\n            \"ISR\": 8.86849\n          },\n          \"instrumental\": {\n            \"SDR\": 9.9125,\n            \"SIR\": 14.02,\n            \"SAR\": 12.4588,\n            \"ISR\": 16.7232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4278,\n            \"SIR\": 24.0343,\n            \"SAR\": 12.6541,\n            \"ISR\": 21.4862\n          },\n          \"instrumental\": {\n            \"SDR\": 20.8791,\n            \"SIR\": 29.5482,\n            \"SAR\": 21.5446,\n            \"ISR\": 27.4235\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1582,\n            \"SIR\": 19.0278,\n            \"SAR\": 11.3987,\n            \"ISR\": 18.4367\n          },\n          \"instrumental\": {\n            \"SDR\": 10.827,\n            \"SIR\": 17.4836,\n            \"SAR\": 11.1509,\n            \"ISR\": 18.6054\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.23827,\n            \"SIR\": 17.2753,\n            \"SAR\": 6.55888,\n            \"ISR\": 10.6605\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4443,\n            \"SIR\": 15.754,\n            \"SAR\": 13.6843,\n            \"ISR\": 24.0501\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow 
Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.15022,\n            \"SIR\": 12.9594,\n            \"SAR\": 5.2702,\n            \"ISR\": 12.1\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2903,\n            \"SIR\": 22.4148,\n            \"SAR\": 17.1974,\n            \"ISR\": 18.2176\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.1041,\n            \"SIR\": 22.9987,\n            \"SAR\": 8.3819,\n            \"ISR\": 13.277\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2919,\n            \"SIR\": 20.1393,\n            \"SAR\": 17.1399,\n            \"ISR\": 29.5608\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.70997,\n            \"SIR\": 25.086,\n            \"SAR\": 9.61721,\n            \"ISR\": 13.4541\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8108,\n            \"SIR\": 21.3215,\n            \"SAR\": 18.4017,\n            \"ISR\": 24.6592\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.37693,\n            \"SIR\": 21.8776,\n            \"SAR\": 8.69275,\n            \"ISR\": 13.1925\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2943,\n            \"SIR\": 16.7404,\n            \"SAR\": 13.7403,\n            \"ISR\": 21.0744\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.90122,\n            \"SIR\": 20.671,\n            \"SAR\": 8.24112,\n            \"ISR\": 12.8178\n          },\n          \"instrumental\": {\n  
          \"SDR\": 14.483,\n            \"SIR\": 19.329,\n            \"SAR\": 16.6609,\n            \"ISR\": 27.6775\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.90122,\n        \"SIR\": 18.7958,\n        \"SAR\": 8.23403,\n        \"ISR\": 12.9868\n      },\n      \"instrumental\": {\n        \"SDR\": 13.7255,\n        \"SIR\": 18.3,\n        \"SAR\": 15.7333,\n        \"ISR\": 21.4193\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"2_HP-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 2_HP-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.00778,\n            \"SIR\": 16.5022,\n            \"SAR\": 4.95801,\n            \"ISR\": 9.04026\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9485,\n            \"SIR\": 20.0149,\n            \"SAR\": 19.0058,\n            \"ISR\": 18.6304\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.15725,\n            \"SIR\": 10.082,\n            \"SAR\": 5.58263,\n            \"ISR\": 12.6914\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3331,\n            \"SIR\": 17.7535,\n            \"SAR\": 12.1266,\n            \"ISR\": 13.9989\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.84617,\n            \"SIR\": 20.7391,\n            \"SAR\": 9.16417,\n            \"ISR\": 14.8747\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8721,\n            \"SIR\": 21.0334,\n            \"SAR\": 17.0938,\n            \"ISR\": 24.1636\n          }\n        }\n      },\n  
    {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.17646,\n            \"SIR\": 6.21319,\n            \"SAR\": 3.37286,\n            \"ISR\": 12.6694\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8308,\n            \"SIR\": 20.8908,\n            \"SAR\": 13.6898,\n            \"ISR\": 17.2492\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78333,\n            \"SIR\": 19.528,\n            \"SAR\": 10.1362,\n            \"ISR\": 13.927\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5802,\n            \"SIR\": 15.6572,\n            \"SAR\": 13.8603,\n            \"ISR\": 22.9156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.94432,\n            \"SIR\": 16.1917,\n            \"SAR\": 9.61374,\n            \"ISR\": 13.5481\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1097,\n            \"SIR\": 15.2072,\n            \"SAR\": 13.296,\n            \"ISR\": 20.4126\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0817,\n            \"SIR\": 18.7539,\n            \"SAR\": 10.6444,\n            \"ISR\": 14.5992\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4766,\n            \"SIR\": 18.5632,\n            \"SAR\": 16.1782,\n            \"ISR\": 24.8029\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.12834,\n            \"SIR\": 21.3618,\n            \"SAR\": 7.37531,\n            \"ISR\": 10.0298\n          },\n          
\"instrumental\": {\n            \"SDR\": 17.2119,\n            \"SIR\": 21.2127,\n            \"SAR\": 21.2972,\n            \"ISR\": 21.9574\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.15686,\n            \"SIR\": 25.3675,\n            \"SAR\": 10.0502,\n            \"ISR\": 13.0838\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3372,\n            \"SIR\": 16.2325,\n            \"SAR\": 15.0541,\n            \"ISR\": 19.6884\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.18828,\n            \"SIR\": 18.2793,\n            \"SAR\": 8.08023,\n            \"ISR\": 10.8534\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1823,\n            \"SIR\": 14.3265,\n            \"SAR\": 14.2533,\n            \"ISR\": 24.7347\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1216,\n            \"SIR\": 25.082,\n            \"SAR\": 10.8916,\n            \"ISR\": 16.3986\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4235,\n            \"SIR\": 19.3639,\n            \"SAR\": 14.9457,\n            \"ISR\": 22.6478\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.98685,\n            \"SIR\": 20.6453,\n            \"SAR\": 9.72844,\n            \"ISR\": 14.7501\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7031,\n            \"SIR\": 20.4256,\n            \"SAR\": 16.3468,\n            \"ISR\": 27.6831\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.45618,\n            \"SIR\": 18.7851,\n            \"SAR\": 7.63734,\n            \"ISR\": 12.4079\n          },\n          \"instrumental\": {\n            \"SDR\": 13.487,\n            \"SIR\": 17.408,\n            \"SAR\": 15.3937,\n            \"ISR\": 25.7793\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.32599,\n            \"SIR\": 21.1849,\n            \"SAR\": 8.68249,\n            \"ISR\": 13.2348\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2519,\n            \"SIR\": 18.2063,\n            \"SAR\": 15.7006,\n            \"ISR\": 18.1793\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.12959,\n            \"SIR\": 16.0573,\n            \"SAR\": 4.07229,\n            \"ISR\": 7.33628\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5519,\n            \"SIR\": 16.3618,\n            \"SAR\": 17.5517,\n            \"ISR\": 26.7607\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.24351,\n            \"SIR\": 19.1347,\n            \"SAR\": 8.96653,\n            \"ISR\": 13.8539\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1893,\n            \"SIR\": 18.1446,\n            \"SAR\": 15.3549,\n            \"ISR\": 18.6805\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.08125,\n            \"SIR\": 20.9199,\n            \"SAR\": 7.14508,\n            \"ISR\": 11.225\n          },\n        
  \"instrumental\": {\n            \"SDR\": 13.6604,\n            \"SIR\": 17.9009,\n            \"SAR\": 15.7184,\n            \"ISR\": 19.1352\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -33.3785,\n            \"SIR\": -39.6633,\n            \"SAR\": 1.02618,\n            \"ISR\": 1.94862\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6712,\n            \"SIR\": 48.8715,\n            \"SAR\": 11.7105,\n            \"ISR\": 14.6499\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04137,\n            \"SIR\": 15.3063,\n            \"SAR\": 5.57995,\n            \"ISR\": 8.85756\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0659,\n            \"SIR\": 17.1653,\n            \"SAR\": 17.1281,\n            \"ISR\": 27.3679\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.7064,\n            \"SIR\": 18.7822,\n            \"SAR\": 9.0536,\n            \"ISR\": 12.8876\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7245,\n            \"SIR\": 18.3741,\n            \"SAR\": 16.7035,\n            \"ISR\": 19.5378\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.466,\n            \"SIR\": 21.2373,\n            \"SAR\": 11.4843,\n            \"ISR\": 16.0906\n          },\n          \"instrumental\": {\n            \"SDR\": 16.693,\n            \"SIR\": 21.0931,\n            \"SAR\": 18.3088,\n            \"ISR\": 28.5648\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry 
And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.64894,\n            \"SIR\": 20.1887,\n            \"SAR\": 7.02405,\n            \"ISR\": 11.7596\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4303,\n            \"SIR\": 17.418,\n            \"SAR\": 16.2131,\n            \"ISR\": 18.3726\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.54851,\n            \"SIR\": 13.8686,\n            \"SAR\": 3.21556,\n            \"ISR\": 5.48372\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4532,\n            \"SIR\": 12.1858,\n            \"SAR\": 15.7379,\n            \"ISR\": 19.6449\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.98389,\n            \"SIR\": 20.7384,\n            \"SAR\": 7.43142,\n            \"ISR\": 11.3862\n          },\n          \"instrumental\": {\n            \"SDR\": 15.661,\n            \"SIR\": 19.5961,\n            \"SAR\": 17.9449,\n            \"ISR\": 24.9838\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.18863,\n            \"SIR\": 16.5062,\n            \"SAR\": 6.34427,\n            \"ISR\": 10.1041\n          },\n          \"instrumental\": {\n            \"SDR\": 14.146,\n            \"SIR\": 17.8738,\n            \"SAR\": 17.0247,\n            \"ISR\": 26.666\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.56396,\n            \"SIR\": 16.783,\n            \"SAR\": 7.68562,\n            \"ISR\": 12.7959\n          },\n          
\"instrumental\": {\n            \"SDR\": 13.8673,\n            \"SIR\": 17.4061,\n            \"SAR\": 14.6248,\n            \"ISR\": 21.2252\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.96098,\n            \"SIR\": 18.2876,\n            \"SAR\": 8.56374,\n            \"ISR\": 11.7566\n          },\n          \"instrumental\": {\n            \"SDR\": 9.64605,\n            \"SIR\": 13.1294,\n            \"SAR\": 12.0116,\n            \"ISR\": 21.0546\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.92757,\n            \"SIR\": 12.2991,\n            \"SAR\": 4.36964,\n            \"ISR\": 8.5524\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1842,\n            \"SIR\": 12.6915,\n            \"SAR\": 12.5298,\n            \"ISR\": 20.9447\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.43627,\n            \"SIR\": 20.6598,\n            \"SAR\": 9.49164,\n            \"ISR\": 13.7033\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6497,\n            \"SIR\": 17.4029,\n            \"SAR\": 15.5961,\n            \"ISR\": 17.1511\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3885,\n            \"SIR\": 22.1686,\n            \"SAR\": 9.95249,\n            \"ISR\": 12.6987\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2248,\n            \"SIR\": 12.738,\n            \"SAR\": 10.7406,\n            \"ISR\": 23.7747\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n 
       \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.1546,\n            \"SIR\": 19.1153,\n            \"SAR\": 9.92178,\n            \"ISR\": 13.9526\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8004,\n            \"SIR\": 15.8696,\n            \"SAR\": 13.8896,\n            \"ISR\": 19.32\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3749,\n            \"SIR\": 30.0389,\n            \"SAR\": 11.0854,\n            \"ISR\": 15.5681\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5418,\n            \"SIR\": 22.7203,\n            \"SAR\": 19.6135,\n            \"ISR\": 19.3445\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.48667,\n            \"SIR\": 12.2044,\n            \"SAR\": 8.7007,\n            \"ISR\": 9.78261\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1585,\n            \"SIR\": 15.7049,\n            \"SAR\": 16.2092,\n            \"ISR\": 18.1336\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4054,\n            \"SIR\": 29.0948,\n            \"SAR\": 14.8905,\n            \"ISR\": 21.3908\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3764,\n            \"SIR\": 27.1385,\n            \"SAR\": 20.5411,\n            \"ISR\": 28.9143\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2836,\n            \"SIR\": 20.2702,\n            \"SAR\": 12.5767,\n            \"ISR\": 18.5102\n          },\n          \"instrumental\": {\n            \"SDR\": 9.73039,\n            \"SIR\": 15.0866,\n  
          \"SAR\": 11.1375,\n            \"ISR\": 19.1111\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.3909,\n            \"SIR\": 15.7194,\n            \"SAR\": 6.61467,\n            \"ISR\": 11.0887\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3889,\n            \"SIR\": 15.7268,\n            \"SAR\": 13.6124,\n            \"ISR\": 23.3422\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.67221,\n            \"SIR\": 15.8116,\n            \"SAR\": 6.46644,\n            \"ISR\": 11.5847\n          },\n          \"instrumental\": {\n            \"SDR\": 13.714,\n            \"SIR\": 19.362,\n            \"SAR\": 16.8602,\n            \"ISR\": 18.5052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.28571,\n            \"SIR\": 22.7856,\n            \"SAR\": 8.59891,\n            \"ISR\": 12.9408\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7266,\n            \"SIR\": 19.0016,\n            \"SAR\": 17.0098,\n            \"ISR\": 30.5897\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.74135,\n            \"SIR\": 24.7042,\n            \"SAR\": 10.1537,\n            \"ISR\": 13.61\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5424,\n            \"SIR\": 21.2135,\n            \"SAR\": 18.6524,\n            \"ISR\": 24.0354\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
8.22333,\n            \"SIR\": 22.2872,\n            \"SAR\": 8.58902,\n            \"ISR\": 12.3432\n          },\n          \"instrumental\": {\n            \"SDR\": 11.966,\n            \"SIR\": 15.8178,\n            \"SAR\": 13.9315,\n            \"ISR\": 21.7672\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.56321,\n            \"SIR\": 19.3301,\n            \"SAR\": 8.16277,\n            \"ISR\": 13.0004\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1486,\n            \"SIR\": 19.1956,\n            \"SAR\": 16.5119,\n            \"ISR\": 26.3537\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.22333,\n        \"SIR\": 19.1347,\n        \"SAR\": 8.58902,\n        \"ISR\": 12.6987\n      },\n      \"instrumental\": {\n        \"SDR\": 13.487,\n        \"SIR\": 17.8738,\n        \"SAR\": 15.7184,\n        \"ISR\": 21.2252\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"3_HP-Vocal-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 3_HP-Vocal-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.91517,\n            \"SIR\": 16.8581,\n            \"SAR\": 3.2452,\n            \"ISR\": 8.16718\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4662,\n            \"SIR\": 20.2019,\n            \"SAR\": 18.6457,\n            \"ISR\": 19.0081\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.87231,\n            \"SIR\": 10.679,\n            \"SAR\": 5.30314,\n            \"ISR\": 14.2979\n          },\n          
\"instrumental\": {\n            \"SDR\": 10.1178,\n            \"SIR\": 19.7643,\n            \"SAR\": 11.6539,\n            \"ISR\": 14.2331\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.78584,\n            \"SIR\": 20.6745,\n            \"SAR\": 8.79637,\n            \"ISR\": 15.9796\n          },\n          \"instrumental\": {\n            \"SDR\": 15.39,\n            \"SIR\": 22.6865,\n            \"SAR\": 16.7412,\n            \"ISR\": 20.3054\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.33282,\n            \"SIR\": 6.81916,\n            \"SAR\": 3.38208,\n            \"ISR\": 12.2759\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6479,\n            \"SIR\": 20.9998,\n            \"SAR\": 13.1945,\n            \"ISR\": 17.4514\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3712,\n            \"SIR\": 21.8084,\n            \"SAR\": 9.60849,\n            \"ISR\": 14.0572\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0453,\n            \"SIR\": 16.2533,\n            \"SAR\": 13.2216,\n            \"ISR\": 20.2902\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.19688,\n            \"SIR\": 17.5484,\n            \"SAR\": 9.57108,\n            \"ISR\": 15.1891\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4714,\n            \"SIR\": 17.7027,\n            \"SAR\": 12.7438,\n            \"ISR\": 20.8473\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4279,\n            \"SIR\": 19.8254,\n            \"SAR\": 10.8294,\n            \"ISR\": 16.3092\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9479,\n            \"SIR\": 20.8494,\n            \"SAR\": 15.8114,\n            \"ISR\": 24.9202\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.18588,\n            \"SIR\": 21.6962,\n            \"SAR\": 6.1854,\n            \"ISR\": 11.0684\n          },\n          \"instrumental\": {\n            \"SDR\": 18.9171,\n            \"SIR\": 25.7725,\n            \"SAR\": 21.9094,\n            \"ISR\": 22.6954\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.64193,\n            \"SIR\": 26.6181,\n            \"SAR\": 10.1025,\n            \"ISR\": 14.7193\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9314,\n            \"SIR\": 18.7568,\n            \"SAR\": 15.1092,\n            \"ISR\": 18.9239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.62488,\n            \"SIR\": 20.0665,\n            \"SAR\": 8.16844,\n            \"ISR\": 11.9757\n          },\n          \"instrumental\": {\n            \"SDR\": 13.978,\n            \"SIR\": 15.9196,\n            \"SAR\": 13.993,\n            \"ISR\": 25.775\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0016,\n            \"SIR\": 25.9658,\n            \"SAR\": 11.3507,\n            \"ISR\": 18.836\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.0458,\n            \"SIR\": 21.9457,\n            \"SAR\": 14.9457,\n            \"ISR\": 23.2925\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3204,\n            \"SIR\": 21.8704,\n            \"SAR\": 9.97393,\n            \"ISR\": 16.7\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3138,\n            \"SIR\": 23.1138,\n            \"SAR\": 16.2695,\n            \"ISR\": 27.9776\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.10087,\n            \"SIR\": 18.958,\n            \"SAR\": 7.08537,\n            \"ISR\": 13.6148\n          },\n          \"instrumental\": {\n            \"SDR\": 14.759,\n            \"SIR\": 21.1519,\n            \"SAR\": 15.4328,\n            \"ISR\": 25.6944\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.09266,\n            \"SIR\": 18.035,\n            \"SAR\": 7.28869,\n            \"ISR\": 12.4561\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3853,\n            \"SIR\": 23.3289,\n            \"SAR\": 17.736,\n            \"ISR\": 19.4768\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.86776,\n            \"SIR\": 15.9253,\n            \"SAR\": 1.5362,\n            \"ISR\": 7.23622\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8172,\n            \"SIR\": 18.5467,\n            \"SAR\": 17.8699,\n            \"ISR\": 29.3533\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 8.4086,\n            \"SIR\": 20.583,\n            \"SAR\": 8.67683,\n            \"ISR\": 15.3268\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6506,\n            \"SIR\": 20.222,\n            \"SAR\": 14.9289,\n            \"ISR\": 19.2371\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.65148,\n            \"SIR\": 19.8511,\n            \"SAR\": 6.44746,\n            \"ISR\": 11.2586\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9796,\n            \"SIR\": 18.2897,\n            \"SAR\": 15.591,\n            \"ISR\": 20.778\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -24.8196,\n            \"SIR\": -33.9034,\n            \"SAR\": 0.063335,\n            \"ISR\": -0.206185\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1553,\n            \"SIR\": 48.2966,\n            \"SAR\": 12.8159,\n            \"ISR\": 17.3851\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.13427,\n            \"SIR\": 17.0125,\n            \"SAR\": 5.3305,\n            \"ISR\": 9.62459\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7184,\n            \"SIR\": 18.1468,\n            \"SAR\": 16.7563,\n            \"ISR\": 27.7495\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.51187,\n            \"SIR\": 20.3525,\n            \"SAR\": 9.70296,\n            \"ISR\": 16.0271\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1807,\n            
\"SIR\": 21.9168,\n            \"SAR\": 16.1741,\n            \"ISR\": 19.5927\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.501,\n            \"SIR\": 21.295,\n            \"SAR\": 10.8468,\n            \"ISR\": 18.1803\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3438,\n            \"SIR\": 26.0776,\n            \"SAR\": 18.9514,\n            \"ISR\": 29.0685\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0026,\n            \"SIR\": 19.2467,\n            \"SAR\": 5.54823,\n            \"ISR\": 10.5623\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1638,\n            \"SIR\": 19.4386,\n            \"SAR\": 16.8441,\n            \"ISR\": 18.7556\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.43321,\n            \"SIR\": 15.4703,\n            \"SAR\": 2.65737,\n            \"ISR\": 5.55385\n          },\n          \"instrumental\": {\n            \"SDR\": 10.468,\n            \"SIR\": 12.3601,\n            \"SAR\": 15.0377,\n            \"ISR\": 19.4039\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0421,\n            \"SIR\": 23.0577,\n            \"SAR\": 6.05479,\n            \"ISR\": 11.215\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2768,\n            \"SIR\": 20.0932,\n            \"SAR\": 16.934,\n            \"ISR\": 21.3088\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n   
         \"SDR\": 6.20149,\n            \"SIR\": 19.1073,\n            \"SAR\": 5.96094,\n            \"ISR\": 10.3622\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1593,\n            \"SIR\": 18.4907,\n            \"SAR\": 16.4001,\n            \"ISR\": 28.6375\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.96345,\n            \"SIR\": 17.3898,\n            \"SAR\": 6.72148,\n            \"ISR\": 13.3801\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5203,\n            \"SIR\": 19.5884,\n            \"SAR\": 14.9257,\n            \"ISR\": 20.8662\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.08213,\n            \"SIR\": 20.4998,\n            \"SAR\": 8.55644,\n            \"ISR\": 12.3076\n          },\n          \"instrumental\": {\n            \"SDR\": 9.84642,\n            \"SIR\": 14.1563,\n            \"SAR\": 11.5201,\n            \"ISR\": 22.9371\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.68281,\n            \"SIR\": 14.2775,\n            \"SAR\": 3.46882,\n            \"ISR\": 9.08678\n          },\n          \"instrumental\": {\n            \"SDR\": 10.683,\n            \"SIR\": 13.6028,\n            \"SAR\": 12.0386,\n            \"ISR\": 21.9524\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.54331,\n            \"SIR\": 21.2135,\n            \"SAR\": 9.13919,\n            \"ISR\": 15.824\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0925,\n            \"SIR\": 20.0763,\n   
         \"SAR\": 15.095,\n            \"ISR\": 17.4334\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4474,\n            \"SIR\": 24.3422,\n            \"SAR\": 10.9209,\n            \"ISR\": 14.2822\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9888,\n            \"SIR\": 13.9326,\n            \"SAR\": 10.5722,\n            \"ISR\": 24.7506\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.48696,\n            \"SIR\": 20.8567,\n            \"SAR\": 9.90332,\n            \"ISR\": 14.5902\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1449,\n            \"SIR\": 17.3311,\n            \"SAR\": 13.4512,\n            \"ISR\": 18.9254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5409,\n            \"SIR\": 30.087,\n            \"SAR\": 10.7854,\n            \"ISR\": 16.8471\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6245,\n            \"SIR\": 24.8654,\n            \"SAR\": 20.0458,\n            \"ISR\": 19.6017\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.86481,\n            \"SIR\": 8.85314,\n            \"SAR\": 5.45329,\n            \"ISR\": 10.1158\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6862,\n            \"SIR\": 15.5776,\n            \"SAR\": 13.5164,\n            \"ISR\": 15.7832\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1155,\n            \"SIR\": 28.5913,\n            
\"SAR\": 14.2347,\n            \"ISR\": 22.5612\n          },\n          \"instrumental\": {\n            \"SDR\": 20.2309,\n            \"SIR\": 29.5669,\n            \"SAR\": 20.6627,\n            \"ISR\": 32.7618\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3247,\n            \"SIR\": 20.9764,\n            \"SAR\": 12.2169,\n            \"ISR\": 21.2506\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3742,\n            \"SIR\": 18.7744,\n            \"SAR\": 10.7533,\n            \"ISR\": 19.3459\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.31983,\n            \"SIR\": 18.6562,\n            \"SAR\": 6.15057,\n            \"ISR\": 12.1549\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0936,\n            \"SIR\": 17.3127,\n            \"SAR\": 13.3578,\n            \"ISR\": 24.9136\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.68265,\n            \"SIR\": 17.4903,\n            \"SAR\": 5.99527,\n            \"ISR\": 12.1328\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8438,\n            \"SIR\": 20.3494,\n            \"SAR\": 15.9869,\n            \"ISR\": 18.8644\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.64884,\n            \"SIR\": 25.7387,\n            \"SAR\": 8.71953,\n            \"ISR\": 14.5045\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7677,\n            \"SIR\": 21.1307,\n            \"SAR\": 17.1144,\n            \"ISR\": 31.3504\n          }\n      
  }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.66948,\n            \"SIR\": 25.4119,\n            \"SAR\": 9.57384,\n            \"ISR\": 14.9568\n          },\n          \"instrumental\": {\n            \"SDR\": 17.821,\n            \"SIR\": 23.8168,\n            \"SAR\": 18.7191,\n            \"ISR\": 24.7344\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92995,\n            \"SIR\": 22.3634,\n            \"SAR\": 9.40023,\n            \"ISR\": 14.7581\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3516,\n            \"SIR\": 19.0998,\n            \"SAR\": 14.5473,\n            \"ISR\": 21.3029\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.21814,\n            \"SIR\": 21.5639,\n            \"SAR\": 8.34549,\n            \"ISR\": 14.0699\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6893,\n            \"SIR\": 20.8312,\n            \"SAR\": 16.6097,\n            \"ISR\": 28.4509\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.21814,\n        \"SIR\": 20.3525,\n        \"SAR\": 8.34549,\n        \"ISR\": 14.0572\n      },\n      \"instrumental\": {\n        \"SDR\": 14.0458,\n        \"SIR\": 20.0763,\n        \"SAR\": 15.4328,\n        \"ISR\": 20.8662\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"4_HP-Vocal-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 4_HP-Vocal-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 5.13247,\n            \"SIR\": 15.6369,\n            \"SAR\": 4.46161,\n            \"ISR\": 9.59156\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9272,\n            \"SIR\": 20.4818,\n            \"SAR\": 17.6023,\n            \"ISR\": 18.6389\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.70667,\n            \"SIR\": 9.1547,\n            \"SAR\": 5.63561,\n            \"ISR\": 15.3189\n          },\n          \"instrumental\": {\n            \"SDR\": 9.99389,\n            \"SIR\": 20.6067,\n            \"SAR\": 11.2725,\n            \"ISR\": 14.0836\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.67695,\n            \"SIR\": 19.5138,\n            \"SAR\": 8.81367,\n            \"ISR\": 16.6883\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2974,\n            \"SIR\": 23.0615,\n            \"SAR\": 16.4926,\n            \"ISR\": 21.7556\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.68009,\n            \"SIR\": 6.93765,\n            \"SAR\": 3.80179,\n            \"ISR\": 12.8639\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6498,\n            \"SIR\": 21.7705,\n            \"SAR\": 13.1645,\n            \"ISR\": 17.2212\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2667,\n            \"SIR\": 20.1792,\n            \"SAR\": 10.1437,\n            \"ISR\": 15.7367\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0328,\n            \"SIR\": 17.8452,\n            \"SAR\": 
13.5695,\n            \"ISR\": 22.1146\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.98945,\n            \"SIR\": 16.0069,\n            \"SAR\": 9.495,\n            \"ISR\": 15.8425\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1837,\n            \"SIR\": 18.1981,\n            \"SAR\": 12.391,\n            \"ISR\": 19.1886\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3533,\n            \"SIR\": 18.5671,\n            \"SAR\": 10.644,\n            \"ISR\": 16.3089\n          },\n          \"instrumental\": {\n            \"SDR\": 14.794,\n            \"SIR\": 20.8559,\n            \"SAR\": 15.6846,\n            \"ISR\": 23.579\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.77921,\n            \"SIR\": 21.2356,\n            \"SAR\": 6.48015,\n            \"ISR\": 12.9815\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2221,\n            \"SIR\": 24.5152,\n            \"SAR\": 20.7935,\n            \"ISR\": 22.7143\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.42073,\n            \"SIR\": 24.541,\n            \"SAR\": 9.88334,\n            \"ISR\": 15.7559\n          },\n          \"instrumental\": {\n            \"SDR\": 12.643,\n            \"SIR\": 19.5706,\n            \"SAR\": 14.6142,\n            \"ISR\": 18.391\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.52604,\n            \"SIR\": 
16.957,\n            \"SAR\": 8.037,\n            \"ISR\": 12.5391\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7151,\n            \"SIR\": 16.3357,\n            \"SAR\": 13.7748,\n            \"ISR\": 22.8928\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8826,\n            \"SIR\": 24.1573,\n            \"SAR\": 11.2586,\n            \"ISR\": 19.4754\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5627,\n            \"SIR\": 22.4968,\n            \"SAR\": 14.7527,\n            \"ISR\": 22.4351\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2098,\n            \"SIR\": 20.6167,\n            \"SAR\": 10.0719,\n            \"ISR\": 17.5743\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3437,\n            \"SIR\": 24.0073,\n            \"SAR\": 16.3457,\n            \"ISR\": 26.8621\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.12336,\n            \"SIR\": 17.4523,\n            \"SAR\": 7.43998,\n            \"ISR\": 14.7645\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1192,\n            \"SIR\": 20.6636,\n            \"SAR\": 15.001,\n            \"ISR\": 23.6453\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.72779,\n            \"SIR\": 16.3995,\n            \"SAR\": 7.32047,\n            \"ISR\": 16.9334\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9237,\n            \"SIR\": 25.5636,\n            \"SAR\": 16.2709,\n            
\"ISR\": 18.4361\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.87861,\n            \"SIR\": 16.0076,\n            \"SAR\": 2.46015,\n            \"ISR\": 7.74823\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4113,\n            \"SIR\": 17.9885,\n            \"SAR\": 17.3031,\n            \"ISR\": 28.4045\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.34587,\n            \"SIR\": 19.053,\n            \"SAR\": 8.61658,\n            \"ISR\": 15.9432\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2879,\n            \"SIR\": 20.8668,\n            \"SAR\": 14.7945,\n            \"ISR\": 18.6885\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.62502,\n            \"SIR\": 16.7401,\n            \"SAR\": 6.47612,\n            \"ISR\": 11.634\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4188,\n            \"SIR\": 18.297,\n            \"SAR\": 14.8591,\n            \"ISR\": 19.6905\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -21.2094,\n            \"SIR\": -40.1099,\n            \"SAR\": 0.462865,\n            \"ISR\": 3.67031\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5227,\n            \"SIR\": 52.239,\n            \"SAR\": 11.0897,\n            \"ISR\": 13.9405\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.35629,\n            \"SIR\": 15.8628,\n        
    \"SAR\": 5.76441,\n            \"ISR\": 10.1244\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5837,\n            \"SIR\": 18.7391,\n            \"SAR\": 16.6746,\n            \"ISR\": 26.3426\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.57781,\n            \"SIR\": 18.1643,\n            \"SAR\": 8.6365,\n            \"ISR\": 15.784\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1628,\n            \"SIR\": 22.1182,\n            \"SAR\": 16.0756,\n            \"ISR\": 19.0953\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1506,\n            \"SIR\": 19.4336,\n            \"SAR\": 10.5703,\n            \"ISR\": 18.1863\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8857,\n            \"SIR\": 27.7745,\n            \"SAR\": 19.2882,\n            \"ISR\": 27.3143\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.92969,\n            \"SIR\": 16.7152,\n            \"SAR\": 5.4446,\n            \"ISR\": 11.2325\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4916,\n            \"SIR\": 20.5148,\n            \"SAR\": 16.1495,\n            \"ISR\": 17.8874\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.39677,\n            \"SIR\": 13.9635,\n            \"SAR\": 2.82003,\n            \"ISR\": 5.70048\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3135,\n            \"SIR\": 12.4438,\n            \"SAR\": 14.9513,\n            
\"ISR\": 18.6463\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.6162,\n            \"SIR\": 21.7675,\n            \"SAR\": 7.44895,\n            \"ISR\": 13.2748\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4824,\n            \"SIR\": 20.7977,\n            \"SAR\": 16.5705,\n            \"ISR\": 21.3009\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.16051,\n            \"SIR\": 18.0268,\n            \"SAR\": 6.00783,\n            \"ISR\": 10.643\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1873,\n            \"SIR\": 18.7806,\n            \"SAR\": 16.3202,\n            \"ISR\": 27.5497\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.83194,\n            \"SIR\": 13.4755,\n            \"SAR\": 6.8711,\n            \"ISR\": 15.2048\n          },\n          \"instrumental\": {\n            \"SDR\": 14.343,\n            \"SIR\": 21.0919,\n            \"SAR\": 13.7404,\n            \"ISR\": 19.9369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42617,\n            \"SIR\": 18.5918,\n            \"SAR\": 8.83512,\n            \"ISR\": 13.4685\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0471,\n            \"SIR\": 15.4826,\n            \"SAR\": 11.6525,\n            \"ISR\": 20.8751\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.40799,\n            \"SIR\": 9.89043,\n            
\"SAR\": 3.94591,\n            \"ISR\": 10.4701\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2851,\n            \"SIR\": 14.5194,\n            \"SAR\": 11.2261,\n            \"ISR\": 17.6695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.3505,\n            \"SIR\": 19.1756,\n            \"SAR\": 8.86374,\n            \"ISR\": 15.6199\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8628,\n            \"SIR\": 19.8675,\n            \"SAR\": 15.0554,\n            \"ISR\": 16.7884\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2802,\n            \"SIR\": 22.5119,\n            \"SAR\": 11.4679,\n            \"ISR\": 15.7357\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7872,\n            \"SIR\": 15.2864,\n            \"SAR\": 11.1203,\n            \"ISR\": 23.0367\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.25619,\n            \"SIR\": 19.2091,\n            \"SAR\": 9.88221,\n            \"ISR\": 14.9679\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6922,\n            \"SIR\": 17.9643,\n            \"SAR\": 13.6356,\n            \"ISR\": 16.0659\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2684,\n            \"SIR\": 28.9612,\n            \"SAR\": 10.5797,\n            \"ISR\": 17.0502\n          },\n          \"instrumental\": {\n            \"SDR\": 16.436,\n            \"SIR\": 24.7021,\n            \"SAR\": 19.7067,\n            \"ISR\": 19.5462\n          }\n        }\n      },\n      {\n   
     \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.59637,\n            \"SIR\": 9.13157,\n            \"SAR\": 6.39332,\n            \"ISR\": 9.71978\n          },\n          \"instrumental\": {\n            \"SDR\": 10.194,\n            \"SIR\": 15.0814,\n            \"SAR\": 12.5173,\n            \"ISR\": 15.308\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3684,\n            \"SIR\": 28.6667,\n            \"SAR\": 14.4789,\n            \"ISR\": 22.8522\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3386,\n            \"SIR\": 29.4806,\n            \"SAR\": 19.9103,\n            \"ISR\": 32.3307\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0873,\n            \"SIR\": 17.8703,\n            \"SAR\": 10.8269,\n            \"ISR\": 20.4341\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0302,\n            \"SIR\": 19.7611,\n            \"SAR\": 11.225,\n            \"ISR\": 17.6389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.11536,\n            \"SIR\": 15.587,\n            \"SAR\": 6.2916,\n            \"ISR\": 13.1744\n          },\n          \"instrumental\": {\n            \"SDR\": 11.569,\n            \"SIR\": 17.7935,\n            \"SAR\": 12.7457,\n            \"ISR\": 21.6239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04269,\n            \"SIR\": 17.7199,\n            \"SAR\": 6.36607,\n            \"ISR\": 12.8428\n          },\n          
\"instrumental\": {\n            \"SDR\": 12.9125,\n            \"SIR\": 20.4129,\n            \"SAR\": 15.4367,\n            \"ISR\": 18.6254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.78669,\n            \"SIR\": 23.9042,\n            \"SAR\": 8.79118,\n            \"ISR\": 15.3333\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5209,\n            \"SIR\": 21.7498,\n            \"SAR\": 16.7422,\n            \"ISR\": 29.4908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.97353,\n            \"SIR\": 24.067,\n            \"SAR\": 9.98042,\n            \"ISR\": 15.3978\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2547,\n            \"SIR\": 23.3484,\n            \"SAR\": 18.2358,\n            \"ISR\": 24.1993\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92523,\n            \"SIR\": 21.1312,\n            \"SAR\": 9.516,\n            \"ISR\": 15.3797\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8788,\n            \"SIR\": 19.4166,\n            \"SAR\": 14.0514,\n            \"ISR\": 20.5532\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.03945,\n            \"SIR\": 19.9597,\n            \"SAR\": 8.36785,\n            \"ISR\": 14.808\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5298,\n            \"SIR\": 21.413,\n            \"SAR\": 16.0827,\n            \"ISR\": 26.727\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n       
 \"SDR\": 8.34587,\n        \"SIR\": 18.1643,\n        \"SAR\": 8.36785,\n        \"ISR\": 15.2048\n      },\n      \"instrumental\": {\n        \"SDR\": 13.5627,\n        \"SIR\": 20.5148,\n        \"SAR\": 15.001,\n        \"ISR\": 20.5532\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"5_HP-Karaoke-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 5_HP-Karaoke-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.37002,\n            \"SIR\": 20.6581,\n            \"SAR\": 1.28458,\n            \"ISR\": 4.10107\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9129,\n            \"SIR\": 15.7183,\n            \"SAR\": 20.1,\n            \"ISR\": 19.509\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.12631,\n            \"SIR\": 20.9962,\n            \"SAR\": 0.86005,\n            \"ISR\": 5.41791\n          },\n          \"instrumental\": {\n            \"SDR\": 9.77765,\n            \"SIR\": 10.6192,\n            \"SAR\": 13.3887,\n            \"ISR\": 18.5952\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.75754,\n            \"SIR\": 23.402,\n            \"SAR\": 4.82637,\n            \"ISR\": 8.0752\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2249,\n            \"SIR\": 13.8208,\n            \"SAR\": 14.8301,\n            \"ISR\": 22.8881\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.626275,\n            \"SIR\": 9.94143,\n            \"SAR\": 0.515735,\n         
   \"ISR\": 7.10507\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9709,\n            \"SIR\": 15.449,\n            \"SAR\": 13.8028,\n            \"ISR\": 23.6236\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.22375,\n            \"SIR\": 23.2173,\n            \"SAR\": 4.45585,\n            \"ISR\": 7.65573\n          },\n          \"instrumental\": {\n            \"SDR\": 9.83016,\n            \"SIR\": 9.65897,\n            \"SAR\": 11.5199,\n            \"ISR\": 22.2742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.03802,\n            \"SIR\": 16.5898,\n            \"SAR\": -0.86877,\n            \"ISR\": 2.67806\n          },\n          \"instrumental\": {\n            \"SDR\": 4.67641,\n            \"SIR\": 4.5102,\n            \"SAR\": 13.2892,\n            \"ISR\": 20.0426\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.64396,\n            \"SIR\": 22.754,\n            \"SAR\": 2e-05,\n            \"ISR\": 3.11022\n          },\n          \"instrumental\": {\n            \"SDR\": 7.90129,\n            \"SIR\": 7.06185,\n            \"SAR\": 14.6852,\n            \"ISR\": 22.1027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.98136,\n            \"SIR\": 21.9962,\n            \"SAR\": 5.95488,\n            \"ISR\": 7.88767\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4522,\n            \"SIR\": 18.8283,\n            \"SAR\": 21.0703,\n            \"ISR\": 22.5482\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.29402,\n            \"SIR\": 26.6818,\n            \"SAR\": 10.0371,\n            \"ISR\": 13.1765\n          },\n          \"instrumental\": {\n            \"SDR\": 12.455,\n            \"SIR\": 16.2953,\n            \"SAR\": 15.1409,\n            \"ISR\": 20.0204\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.17494,\n            \"SIR\": 24.4007,\n            \"SAR\": 4.11477,\n            \"ISR\": 7.16253\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8207,\n            \"SIR\": 9.76355,\n            \"SAR\": 12.7336,\n            \"ISR\": 30.9008\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5061,\n            \"SIR\": 27.6891,\n            \"SAR\": 11.2293,\n            \"ISR\": 16.4247\n          },\n          \"instrumental\": {\n            \"SDR\": 13.739,\n            \"SIR\": 19.5739,\n            \"SAR\": 15.3707,\n            \"ISR\": 23.5726\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.91157,\n            \"SIR\": 25.7552,\n            \"SAR\": 7.14882,\n            \"ISR\": 9.55243\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2584,\n            \"SIR\": 14.2008,\n            \"SAR\": 14.9355,\n            \"ISR\": 32.1099\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.83911,\n            \"SIR\": 23.4852,\n            \"SAR\": 0.77468,\n            
\"ISR\": 4.56767\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2504,\n            \"SIR\": 8.17764,\n            \"SAR\": 13.5702,\n            \"ISR\": 23.2439\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.267,\n            \"SIR\": 24.4517,\n            \"SAR\": 6.91519,\n            \"ISR\": 10.2792\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2564,\n            \"SIR\": 16.2753,\n            \"SAR\": 14.9453,\n            \"ISR\": 19.2375\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.94536,\n            \"SIR\": 20.8094,\n            \"SAR\": 2.50427,\n            \"ISR\": 4.86938\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4875,\n            \"SIR\": 13.2666,\n            \"SAR\": 17.6702,\n            \"ISR\": 30.5098\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.5519,\n            \"SIR\": 21.514,\n            \"SAR\": 4.61754,\n            \"ISR\": 5.76929\n          },\n          \"instrumental\": {\n            \"SDR\": 9.75042,\n            \"SIR\": 10.0978,\n            \"SAR\": 15.7619,\n            \"ISR\": 20.788\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.1929,\n            \"SIR\": 25.0865,\n            \"SAR\": 3.90898,\n            \"ISR\": 7.33379\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8645,\n            \"SIR\": 14.2523,\n            \"SAR\": 14.8487,\n            \"ISR\": 18.4454\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.47434,\n            \"SIR\": 17.5772,\n            \"SAR\": 1.74966,\n            \"ISR\": 4.74526\n          },\n          \"instrumental\": {\n            \"SDR\": 9.0956,\n            \"SIR\": 8.34041,\n            \"SAR\": 11.9886,\n            \"ISR\": 28.2925\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.16767,\n            \"SIR\": 24.4379,\n            \"SAR\": -0.36296,\n            \"ISR\": 1.54159\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5778,\n            \"SIR\": 10.313,\n            \"SAR\": 22.3135,\n            \"ISR\": 19.6916\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.09398,\n            \"SIR\": 20.3467,\n            \"SAR\": 6.88533,\n            \"ISR\": 9.06582\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7232,\n            \"SIR\": 14.0921,\n            \"SAR\": 15.5965,\n            \"ISR\": 20.2129\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.83676,\n            \"SIR\": 24.0007,\n            \"SAR\": 9.26256,\n            \"ISR\": 10.9254\n          },\n          \"instrumental\": {\n            \"SDR\": 13.531,\n            \"SIR\": 15.8398,\n            \"SAR\": 17.2165,\n            \"ISR\": 31.7996\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.12105,\n            \"SIR\": 21.447,\n            \"SAR\": 6.409,\n            \"ISR\": 
9.03096\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4939,\n            \"SIR\": 15.4179,\n            \"SAR\": 15.4303,\n            \"ISR\": 18.6236\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.33194,\n            \"SIR\": 16.961,\n            \"SAR\": -0.19185,\n            \"ISR\": 2.87461\n          },\n          \"instrumental\": {\n            \"SDR\": 9.6199,\n            \"SIR\": 9.84252,\n            \"SAR\": 16.2327,\n            \"ISR\": 20.7356\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.98419,\n            \"SIR\": 28.532,\n            \"SAR\": 0.58217,\n            \"ISR\": 3.9544\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9419,\n            \"SIR\": 10.6751,\n            \"SAR\": 14.4384,\n            \"ISR\": 18.9691\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.20256,\n            \"SIR\": 20.3411,\n            \"SAR\": 3.31471,\n            \"ISR\": 6.02608\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8473,\n            \"SIR\": 13.6137,\n            \"SAR\": 16.5213,\n            \"ISR\": 28.6294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.73143,\n            \"SIR\": 20.9692,\n            \"SAR\": 4.92334,\n            \"ISR\": 7.27296\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9532,\n            \"SIR\": 12.4452,\n            \"SAR\": 13.9094,\n            \"ISR\": 21.2634\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.94257,\n            \"SIR\": 20.0432,\n            \"SAR\": 7.22916,\n            \"ISR\": 9.16384\n          },\n          \"instrumental\": {\n            \"SDR\": 8.09578,\n            \"SIR\": 9.96014,\n            \"SAR\": 11.2158,\n            \"ISR\": 23.4258\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.67678,\n            \"SIR\": 20.6534,\n            \"SAR\": 0.01114,\n            \"ISR\": 3.09242\n          },\n          \"instrumental\": {\n            \"SDR\": 8.43165,\n            \"SIR\": 7.24666,\n            \"SAR\": 15.372,\n            \"ISR\": 33.9394\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.43215,\n            \"SIR\": 23.2316,\n            \"SAR\": 2.39183,\n            \"ISR\": 5.04692\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8677,\n            \"SIR\": 10.1434,\n            \"SAR\": 14.2553,\n            \"ISR\": 18.664\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.12551,\n            \"SIR\": 26.6695,\n            \"SAR\": 2.34288,\n            \"ISR\": 4.43896\n          },\n          \"instrumental\": {\n            \"SDR\": 5.10695,\n            \"SIR\": 3.9547,\n            \"SAR\": 5.4535,\n            \"ISR\": 28.8859\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.25431,\n            \"SIR\": 23.7119,\n            \"SAR\": 1.08353,\n            \"ISR\": 4.6803\n          },\n          \"instrumental\": 
{\n            \"SDR\": 5.79186,\n            \"SIR\": 6.29142,\n            \"SAR\": 11.0789,\n            \"ISR\": 19.9545\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.60276,\n            \"SIR\": 34.1921,\n            \"SAR\": 3.3619,\n            \"ISR\": 5.20793\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4475,\n            \"SIR\": 13.3676,\n            \"SAR\": 16.51,\n            \"ISR\": 19.6462\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.60822,\n            \"SIR\": 17.869,\n            \"SAR\": 0.38194,\n            \"ISR\": 4.6327\n          },\n          \"instrumental\": {\n            \"SDR\": 8.18715,\n            \"SIR\": 9.50725,\n            \"SAR\": 13.0135,\n            \"ISR\": 18.7294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.38711,\n            \"SIR\": 26.3399,\n            \"SAR\": 4.74832,\n            \"ISR\": 8.57357\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6023,\n            \"SIR\": 14.1386,\n            \"SAR\": 14.9156,\n            \"ISR\": 28.9532\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.09805,\n            \"SIR\": 20.559,\n            \"SAR\": 3.8303,\n            \"ISR\": 6.01517\n          },\n          \"instrumental\": {\n            \"SDR\": 2.49153,\n            \"SIR\": 3.15333,\n            \"SAR\": 9.33386,\n            \"ISR\": 24.2134\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n   
         \"SDR\": 4.255,\n            \"SIR\": 21.3071,\n            \"SAR\": 4.10065,\n            \"ISR\": 7.21003\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3007,\n            \"SIR\": 11.0636,\n            \"SAR\": 13.2231,\n            \"ISR\": 30.2452\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.58559,\n            \"SIR\": 19.4079,\n            \"SAR\": 6.9424,\n            \"ISR\": 11.2179\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7295,\n            \"SIR\": 17.6338,\n            \"SAR\": 15.811,\n            \"ISR\": 19.1735\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.78755,\n            \"SIR\": 24.0087,\n            \"SAR\": 7.47872,\n            \"ISR\": 11.2765\n          },\n          \"instrumental\": {\n            \"SDR\": 14.543,\n            \"SIR\": 17.0304,\n            \"SAR\": 16.3599,\n            \"ISR\": 28.2525\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.81987,\n            \"SIR\": 26.6712,\n            \"SAR\": 6.12698,\n            \"ISR\": 9.03664\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1393,\n            \"SIR\": 15.7737,\n            \"SAR\": 16.1919,\n            \"ISR\": 24.8093\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2805,\n            \"SIR\": 25.3755,\n            \"SAR\": 8.52245,\n            \"ISR\": 11.7108\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4196,\n            \"SIR\": 13.5415,\n         
   \"SAR\": 13.3605,\n            \"ISR\": 22.6007\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.77034,\n            \"SIR\": 20.2996,\n            \"SAR\": 8.54363,\n            \"ISR\": 12.7826\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3102,\n            \"SIR\": 18.9809,\n            \"SAR\": 16.7468,\n            \"ISR\": 27.2321\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 5.20256,\n        \"SIR\": 22.754,\n        \"SAR\": 4.10065,\n        \"ISR\": 7.16253\n      },\n      \"instrumental\": {\n        \"SDR\": 12.2249,\n        \"SIR\": 13.2666,\n        \"SAR\": 14.9156,\n        \"ISR\": 22.5482\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"6_HP-Karaoke-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 6_HP-Karaoke-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.40569,\n            \"SIR\": 22.1346,\n            \"SAR\": 0.01828,\n            \"ISR\": 4.20645\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9884,\n            \"SIR\": 15.2129,\n            \"SAR\": 19.224,\n            \"ISR\": 19.398\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.61504,\n            \"SIR\": 18.5853,\n            \"SAR\": 1.75763,\n            \"ISR\": 6.13348\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3293,\n            \"SIR\": 11.221,\n            \"SAR\": 13.52,\n            \"ISR\": 18.4318\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy 
Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.27414,\n            \"SIR\": 23.2346,\n            \"SAR\": 4.95217,\n            \"ISR\": 8.29752\n          },\n          \"instrumental\": {\n            \"SDR\": 12.558,\n            \"SIR\": 14.0447,\n            \"SAR\": 15.2018,\n            \"ISR\": 20.5359\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00614,\n            \"SIR\": 7.77224,\n            \"SAR\": 0.14026,\n            \"ISR\": 7.2008\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6506,\n            \"SIR\": 15.861,\n            \"SAR\": 13.7142,\n            \"ISR\": 21.8189\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.50836,\n            \"SIR\": 23.3981,\n            \"SAR\": 4.87885,\n            \"ISR\": 8.29334\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5566,\n            \"SIR\": 10.2813,\n            \"SAR\": 11.4075,\n            \"ISR\": 23.0069\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.63666,\n            \"SIR\": 17.1416,\n            \"SAR\": 0.34978,\n            \"ISR\": 3.49187\n          },\n          \"instrumental\": {\n            \"SDR\": 5.18988,\n            \"SIR\": 5.19426,\n            \"SAR\": 12.7515,\n            \"ISR\": 19.6689\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.001045,\n            \"SIR\": 17.143,\n            \"SAR\": 0.0,\n            \"ISR\": 0.402455\n          },\n          \"instrumental\": {\n            \"SDR\": 5.62452,\n   
         \"SIR\": 4.90761,\n            \"SAR\": 24.9334,\n            \"ISR\": 17.5162\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.49108,\n            \"SIR\": 21.2738,\n            \"SAR\": 4.47166,\n            \"ISR\": 7.18649\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9468,\n            \"SIR\": 19.7873,\n            \"SAR\": 21.9072,\n            \"ISR\": 21.991\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.50031,\n            \"SIR\": 26.7489,\n            \"SAR\": 10.4,\n            \"ISR\": 13.7043\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5768,\n            \"SIR\": 17.0184,\n            \"SAR\": 15.1751,\n            \"ISR\": 18.7333\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.83438,\n            \"SIR\": 23.3221,\n            \"SAR\": 4.48195,\n            \"ISR\": 7.44542\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6954,\n            \"SIR\": 9.97841,\n            \"SAR\": 12.7788,\n            \"ISR\": 30.7868\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4136,\n            \"SIR\": 26.3365,\n            \"SAR\": 12.1671,\n            \"ISR\": 17.9544\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4065,\n            \"SIR\": 20.9015,\n            \"SAR\": 15.8924,\n            \"ISR\": 23.4855\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n    
      \"vocals\": {\n            \"SDR\": 9.80753,\n            \"SIR\": 26.6171,\n            \"SAR\": 7.61718,\n            \"ISR\": 10.0965\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2886,\n            \"SIR\": 14.2347,\n            \"SAR\": 15.1665,\n            \"ISR\": 33.5654\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.74924,\n            \"SIR\": 21.6133,\n            \"SAR\": 1.58186,\n            \"ISR\": 5.23802\n          },\n          \"instrumental\": {\n            \"SDR\": 13.426,\n            \"SIR\": 11.1279,\n            \"SAR\": 14.2701,\n            \"ISR\": 25.4548\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.64021,\n            \"SIR\": 20.95,\n            \"SAR\": 5.06153,\n            \"ISR\": 9.5488\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8081,\n            \"SIR\": 19.9898,\n            \"SAR\": 18.3629,\n            \"ISR\": 18.8289\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.59429,\n            \"SIR\": 19.959,\n            \"SAR\": 2.25885,\n            \"ISR\": 5.76088\n          },\n          \"instrumental\": {\n            \"SDR\": 13.703,\n            \"SIR\": 15.297,\n            \"SAR\": 18.129,\n            \"ISR\": 30.1415\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.18976,\n            \"SIR\": 20.748,\n            \"SAR\": 5.54638,\n            \"ISR\": 5.5326\n          },\n          \"instrumental\": {\n            \"SDR\": 9.45943,\n            \"SIR\": 9.8325,\n     
       \"SAR\": 16.0838,\n            \"ISR\": 20.532\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.31114,\n            \"SIR\": 25.3629,\n            \"SAR\": 3.98212,\n            \"ISR\": 7.27301\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9745,\n            \"SIR\": 13.7369,\n            \"SAR\": 15.2978,\n            \"ISR\": 19.9168\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.27974,\n            \"SIR\": 7.30975,\n            \"SAR\": 0.66439,\n            \"ISR\": 4.19622\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1283,\n            \"SIR\": 20.2394,\n            \"SAR\": 15.4304,\n            \"ISR\": 29.5477\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.26588,\n            \"SIR\": 21.2452,\n            \"SAR\": 0.0017,\n            \"ISR\": 2.15839\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9829,\n            \"SIR\": 10.7949,\n            \"SAR\": 19.583,\n            \"ISR\": 20.0131\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.8873,\n            \"SIR\": 20.8616,\n            \"SAR\": 5.88701,\n            \"ISR\": 7.92103\n          },\n          \"instrumental\": {\n            \"SDR\": 12.016,\n            \"SIR\": 13.5267,\n            \"SAR\": 14.9936,\n            \"ISR\": 21.0662\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
5.19442,\n            \"SIR\": 23.5415,\n            \"SAR\": 6.80425,\n            \"ISR\": 9.1183\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5002,\n            \"SIR\": 16.4718,\n            \"SAR\": 18.3747,\n            \"ISR\": 33.361\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.03535,\n            \"SIR\": 4.1695,\n            \"SAR\": 0.1037,\n            \"ISR\": 3.26091\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9356,\n            \"SIR\": 22.3127,\n            \"SAR\": 21.6331,\n            \"ISR\": 18.4239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.88156,\n            \"SIR\": 16.7721,\n            \"SAR\": 1.47556,\n            \"ISR\": 3.83899\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1574,\n            \"SIR\": 10.7556,\n            \"SAR\": 16.1756,\n            \"ISR\": 20.2921\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3e-05,\n            \"SIR\": 25.0255,\n            \"SAR\": 0.00053,\n            \"ISR\": 3.27158\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0022,\n            \"SIR\": 13.6025,\n            \"SAR\": 17.5628,\n            \"ISR\": 19.5107\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.42512,\n            \"SIR\": 21.2921,\n            \"SAR\": 3.55915,\n            \"ISR\": 6.23482\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0463,\n            \"SIR\": 13.8092,\n            \"SAR\": 
16.4955,\n            \"ISR\": 34.3083\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.30165,\n            \"SIR\": 19.3769,\n            \"SAR\": 4.68588,\n            \"ISR\": 7.54588\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2955,\n            \"SIR\": 14.3822,\n            \"SAR\": 15.2098,\n            \"ISR\": 24.3896\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.95729,\n            \"SIR\": 20.5847,\n            \"SAR\": 7.485,\n            \"ISR\": 9.2506\n          },\n          \"instrumental\": {\n            \"SDR\": 8.41637,\n            \"SIR\": 10.3323,\n            \"SAR\": 11.2411,\n            \"ISR\": 24.6364\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.54235,\n            \"SIR\": 22.0931,\n            \"SAR\": 0.00234,\n            \"ISR\": 3.44139\n          },\n          \"instrumental\": {\n            \"SDR\": 9.43017,\n            \"SIR\": 8.279,\n            \"SAR\": 15.2115,\n            \"ISR\": 23.9707\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.71403,\n            \"SIR\": 22.1302,\n            \"SAR\": 2.98669,\n            \"ISR\": 5.61629\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4655,\n            \"SIR\": 10.9668,\n            \"SAR\": 14.0492,\n            \"ISR\": 18.3294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.26775,\n            
\"SIR\": 27.2817,\n            \"SAR\": 4.90781,\n            \"ISR\": 7.6703\n          },\n          \"instrumental\": {\n            \"SDR\": 7.92315,\n            \"SIR\": 6.76672,\n            \"SAR\": 6.81889,\n            \"ISR\": 29.6706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.26969,\n            \"SIR\": 24.0666,\n            \"SAR\": 3.02549,\n            \"ISR\": 6.08719\n          },\n          \"instrumental\": {\n            \"SDR\": 6.98965,\n            \"SIR\": 7.60775,\n            \"SAR\": 10.7882,\n            \"ISR\": 19.6705\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.82935,\n            \"SIR\": 31.2641,\n            \"SAR\": -0.001175,\n            \"ISR\": 3.07653\n          },\n          \"instrumental\": {\n            \"SDR\": 15.222,\n            \"SIR\": 11.5805,\n            \"SAR\": 15.2478,\n            \"ISR\": 19.7549\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.0,\n            \"SIR\": 1.65806,\n            \"SAR\": -2e-05,\n            \"ISR\": 0.01738\n          },\n          \"instrumental\": {\n            \"SDR\": 6.78315,\n            \"SIR\": 6.05593,\n            \"SAR\": 32.2085,\n            \"ISR\": 19.1614\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.74493,\n            \"SIR\": 20.266,\n            \"SAR\": 5.54303,\n            \"ISR\": 10.3481\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6914,\n            \"SIR\": 17.2938,\n            \"SAR\": 15.8017,\n            \"ISR\": 27.5643\n          }\n        }\n      },\n      
{\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.81733,\n            \"SIR\": 19.0782,\n            \"SAR\": 4.25342,\n            \"ISR\": 7.31583\n          },\n          \"instrumental\": {\n            \"SDR\": 5.96654,\n            \"SIR\": 6.49313,\n            \"SAR\": 9.05095,\n            \"ISR\": 22.7231\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.10796,\n            \"SIR\": 22.7148,\n            \"SAR\": 3.64233,\n            \"ISR\": 7.24707\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6031,\n            \"SIR\": 11.3933,\n            \"SAR\": 12.9657,\n            \"ISR\": 31.9148\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.34193,\n            \"SIR\": 18.6291,\n            \"SAR\": 6.91061,\n            \"ISR\": 11.1377\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1981,\n            \"SIR\": 18.1297,\n            \"SAR\": 16.5326,\n            \"ISR\": 19.5379\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.46422,\n            \"SIR\": 25.181,\n            \"SAR\": 8.59972,\n            \"ISR\": 12.585\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2965,\n            \"SIR\": 18.6346,\n            \"SAR\": 17.4095,\n            \"ISR\": 23.6427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.62149,\n            \"SIR\": 25.5108,\n            \"SAR\": 5.43217,\n            \"ISR\": 
9.20098\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4761,\n            \"SIR\": 16.7174,\n            \"SAR\": 16.7227,\n            \"ISR\": 24.5484\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.52432,\n            \"SIR\": 25.6619,\n            \"SAR\": 8.61282,\n            \"ISR\": 12.2143\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7204,\n            \"SIR\": 14.2243,\n            \"SAR\": 13.4859,\n            \"ISR\": 22.2115\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.55174,\n            \"SIR\": 19.6464,\n            \"SAR\": 8.34298,\n            \"ISR\": 13.0978\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3714,\n            \"SIR\": 19.1923,\n            \"SAR\": 16.5584,\n            \"ISR\": 27.3065\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 4.64021,\n        \"SIR\": 21.2921,\n        \"SAR\": 4.25342,\n        \"ISR\": 7.24707\n      },\n      \"instrumental\": {\n        \"SDR\": 12.9745,\n        \"SIR\": 13.7369,\n        \"SAR\": 15.2978,\n        \"ISR\": 21.991\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"7_HP2-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 7_HP2-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.93236,\n            \"SIR\": 17.387,\n            \"SAR\": 4.36463,\n            \"ISR\": 8.40421\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0848,\n            \"SIR\": 19.1375,\n            \"SAR\": 
18.9953,\n            \"ISR\": 18.7277\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.05415,\n            \"SIR\": 13.9499,\n            \"SAR\": 6.28066,\n            \"ISR\": 12.4333\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1818,\n            \"SIR\": 17.7609,\n            \"SAR\": 13.2374,\n            \"ISR\": 15.8376\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.83053,\n            \"SIR\": 20.3737,\n            \"SAR\": 9.04699,\n            \"ISR\": 15.2785\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7707,\n            \"SIR\": 21.1389,\n            \"SAR\": 17.0835,\n            \"ISR\": 25.3722\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.40247,\n            \"SIR\": 6.60429,\n            \"SAR\": 4.62521,\n            \"ISR\": 12.9349\n          },\n          \"instrumental\": {\n            \"SDR\": 12.178,\n            \"SIR\": 21.1637,\n            \"SAR\": 13.9718,\n            \"ISR\": 17.3191\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4273,\n            \"SIR\": 21.1451,\n            \"SAR\": 10.3944,\n            \"ISR\": 14.0964\n          },\n          \"instrumental\": {\n            \"SDR\": 13.181,\n            \"SIR\": 15.7752,\n            \"SAR\": 14.2406,\n            \"ISR\": 24.6048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.18038,\n            \"SIR\": 16.9563,\n            \"SAR\": 
9.8804,\n            \"ISR\": 13.8982\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3496,\n            \"SIR\": 15.5824,\n            \"SAR\": 13.4221,\n            \"ISR\": 21.1611\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3502,\n            \"SIR\": 19.9206,\n            \"SAR\": 10.988,\n            \"ISR\": 13.9799\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4664,\n            \"SIR\": 17.7813,\n            \"SAR\": 16.62,\n            \"ISR\": 26.3967\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.43604,\n            \"SIR\": 22.297,\n            \"SAR\": 7.31999,\n            \"ISR\": 10.1423\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9287,\n            \"SIR\": 21.0305,\n            \"SAR\": 21.4404,\n            \"ISR\": 21.0038\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.50602,\n            \"SIR\": 26.4427,\n            \"SAR\": 10.4688,\n            \"ISR\": 13.5006\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4628,\n            \"SIR\": 16.5996,\n            \"SAR\": 15.2764,\n            \"ISR\": 18.6923\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.19081,\n            \"SIR\": 18.8539,\n            \"SAR\": 8.2361,\n            \"ISR\": 11.0304\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9633,\n            \"SIR\": 14.3792,\n            \"SAR\": 14.4101,\n            \"ISR\": 23.0108\n          }\n        }\n      },\n 
     {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5409,\n            \"SIR\": 22.9143,\n            \"SAR\": 11.5136,\n            \"ISR\": 17.2841\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9244,\n            \"SIR\": 20.2426,\n            \"SAR\": 15.2089,\n            \"ISR\": 22.1341\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.288,\n            \"SIR\": 20.8054,\n            \"SAR\": 10.0895,\n            \"ISR\": 15.978\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1317,\n            \"SIR\": 21.3943,\n            \"SAR\": 16.6096,\n            \"ISR\": 28.0967\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.24604,\n            \"SIR\": 21.1342,\n            \"SAR\": 8.46357,\n            \"ISR\": 12.9112\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9823,\n            \"SIR\": 16.7981,\n            \"SAR\": 14.6151,\n            \"ISR\": 27.3577\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.44201,\n            \"SIR\": 22.99,\n            \"SAR\": 9.50292,\n            \"ISR\": 14.1576\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9197,\n            \"SIR\": 18.1114,\n            \"SAR\": 15.4591,\n            \"ISR\": 18.577\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.3843,\n            \"SIR\": 16.7187,\n            \"SAR\": 4.53434,\n            \"ISR\": 
7.55322\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6042,\n            \"SIR\": 16.3222,\n            \"SAR\": 17.4393,\n            \"ISR\": 27.1831\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42091,\n            \"SIR\": 19.764,\n            \"SAR\": 9.13352,\n            \"ISR\": 14.1109\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3145,\n            \"SIR\": 18.2575,\n            \"SAR\": 15.4218,\n            \"ISR\": 18.9897\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.91406,\n            \"SIR\": 22.991,\n            \"SAR\": 6.97285,\n            \"ISR\": 10.8305\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2234,\n            \"SIR\": 17.5496,\n            \"SAR\": 16.5915,\n            \"ISR\": 21.3062\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.3854,\n            \"SIR\": -36.6305,\n            \"SAR\": 0.13971,\n            \"ISR\": 3.13368\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4084,\n            \"SIR\": 48.926,\n            \"SAR\": 13.1819,\n            \"ISR\": 17.6772\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.28245,\n            \"SIR\": 15.189,\n            \"SAR\": 5.87541,\n            \"ISR\": 9.4287\n          },\n          \"instrumental\": {\n            \"SDR\": 16.362,\n            \"SIR\": 17.5719,\n            \"SAR\": 17.2561,\n            \"ISR\": 27.0459\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.04796,\n            \"SIR\": 18.9988,\n            \"SAR\": 9.62702,\n            \"ISR\": 13.8841\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7249,\n            \"SIR\": 19.232,\n            \"SAR\": 16.2698,\n            \"ISR\": 19.1654\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4078,\n            \"SIR\": 22.8579,\n            \"SAR\": 12.4746,\n            \"ISR\": 17.4099\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8064,\n            \"SIR\": 22.412,\n            \"SAR\": 18.605,\n            \"ISR\": 29.699\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.09854,\n            \"SIR\": 22.8819,\n            \"SAR\": 7.51915,\n            \"ISR\": 11.6399\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8702,\n            \"SIR\": 16.6661,\n            \"SAR\": 16.3938,\n            \"ISR\": 18.2569\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.63844,\n            \"SIR\": 14.6048,\n            \"SAR\": 3.2755,\n            \"ISR\": 5.58584\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5784,\n            \"SIR\": 12.3036,\n            \"SAR\": 15.6486,\n            \"ISR\": 19.4757\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.14744,\n            \"SIR\": 22.3991,\n            \"SAR\": 7.63438,\n            \"ISR\": 
11.5593\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2128,\n            \"SIR\": 18.3167,\n            \"SAR\": 16.9511,\n            \"ISR\": 22.8588\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.44784,\n            \"SIR\": 17.4852,\n            \"SAR\": 6.56487,\n            \"ISR\": 10.3718\n          },\n          \"instrumental\": {\n            \"SDR\": 14.429,\n            \"SIR\": 18.2085,\n            \"SAR\": 17.1185,\n            \"ISR\": 28.0977\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.09312,\n            \"SIR\": 17.7768,\n            \"SAR\": 8.12612,\n            \"ISR\": 13.4193\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8988,\n            \"SIR\": 17.9638,\n            \"SAR\": 14.9284,\n            \"ISR\": 20.6086\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.30713,\n            \"SIR\": 18.7674,\n            \"SAR\": 9.09077,\n            \"ISR\": 12.7464\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0367,\n            \"SIR\": 13.9504,\n            \"SAR\": 12.0476,\n            \"ISR\": 21.8967\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.10818,\n            \"SIR\": 14.1312,\n            \"SAR\": 4.42982,\n            \"ISR\": 8.71197\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4711,\n            \"SIR\": 13.0184,\n            \"SAR\": 12.7884,\n            \"ISR\": 22.791\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.33733,\n            \"SIR\": 22.0611,\n            \"SAR\": 9.76658,\n            \"ISR\": 13.5259\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0386,\n            \"SIR\": 17.8098,\n            \"SAR\": 16.0153,\n            \"ISR\": 17.5294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4636,\n            \"SIR\": 24.6266,\n            \"SAR\": 10.826,\n            \"ISR\": 13.4003\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1421,\n            \"SIR\": 13.3813,\n            \"SAR\": 11.2898,\n            \"ISR\": 26.1565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.0366,\n            \"SIR\": 19.3842,\n            \"SAR\": 9.92364,\n            \"ISR\": 13.7406\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1188,\n            \"SIR\": 16.1186,\n            \"SAR\": 14.2281,\n            \"ISR\": 18.0852\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1914,\n            \"SIR\": 29.5799,\n            \"SAR\": 11.0196,\n            \"ISR\": 15.7821\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2859,\n            \"SIR\": 22.9974,\n            \"SAR\": 20.1242,\n            \"ISR\": 19.4437\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.99552,\n            \"SIR\": 11.9636,\n            \"SAR\": 9.74422,\n            \"ISR\": 10.283\n          },\n          \"instrumental\": {\n           
 \"SDR\": 12.817,\n            \"SIR\": 15.7251,\n            \"SAR\": 17.3806,\n            \"ISR\": 18.6098\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3667,\n            \"SIR\": 28.5043,\n            \"SAR\": 14.6887,\n            \"ISR\": 21.6084\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6669,\n            \"SIR\": 26.6962,\n            \"SAR\": 19.7765,\n            \"ISR\": 31.3672\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.45,\n            \"SIR\": 20.6107,\n            \"SAR\": 12.6396,\n            \"ISR\": 18.915\n          },\n          \"instrumental\": {\n            \"SDR\": 9.79974,\n            \"SIR\": 15.8961,\n            \"SAR\": 11.2031,\n            \"ISR\": 19.5192\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.71609,\n            \"SIR\": 14.1348,\n            \"SAR\": 7.11794,\n            \"ISR\": 11.2593\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5634,\n            \"SIR\": 15.9534,\n            \"SAR\": 13.5375,\n            \"ISR\": 21.8897\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.89447,\n            \"SIR\": 16.8714,\n            \"SAR\": 6.65307,\n            \"ISR\": 11.5714\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5069,\n            \"SIR\": 18.8977,\n            \"SAR\": 16.5729,\n            \"ISR\": 18.9864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 8.77945,\n            \"SIR\": 24.268,\n            \"SAR\": 9.28198,\n            \"ISR\": 13.7379\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7713,\n            \"SIR\": 20.2268,\n            \"SAR\": 17.9744,\n            \"ISR\": 32.1029\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.80116,\n            \"SIR\": 26.0144,\n            \"SAR\": 10.0654,\n            \"ISR\": 13.3923\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7233,\n            \"SIR\": 20.8795,\n            \"SAR\": 18.5056,\n            \"ISR\": 24.2901\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.27226,\n            \"SIR\": 21.8389,\n            \"SAR\": 8.96911,\n            \"ISR\": 13.0366\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1889,\n            \"SIR\": 16.3723,\n            \"SAR\": 13.755,\n            \"ISR\": 21.1589\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.95115,\n            \"SIR\": 20.8892,\n            \"SAR\": 8.6603,\n            \"ISR\": 12.6023\n          },\n          \"instrumental\": {\n            \"SDR\": 14.436,\n            \"SIR\": 18.7546,\n            \"SAR\": 16.8965,\n            \"ISR\": 28.7178\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.30713,\n        \"SIR\": 20.3737,\n        \"SAR\": 9.04699,\n        \"ISR\": 13.0366\n      },\n      \"instrumental\": {\n        \"SDR\": 13.5069,\n        \"SIR\": 17.8098,\n        \"SAR\": 16.0153,\n        \"ISR\": 21.3062\n      }\n    },\n    
\"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"8_HP2-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 8_HP2-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.5483,\n            \"SIR\": 18.0967,\n            \"SAR\": 4.25896,\n            \"ISR\": 7.83695\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0298,\n            \"SIR\": 18.8059,\n            \"SAR\": 19.0634,\n            \"ISR\": 19.0558\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.5217,\n            \"SIR\": 9.25839,\n            \"SAR\": 5.11401,\n            \"ISR\": 12.5705\n          },\n          \"instrumental\": {\n            \"SDR\": 9.79435,\n            \"SIR\": 17.7386,\n            \"SAR\": 11.5506,\n            \"ISR\": 13.705\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.91456,\n            \"SIR\": 19.9612,\n            \"SAR\": 9.02234,\n            \"ISR\": 14.8232\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8334,\n            \"SIR\": 21.5055,\n            \"SAR\": 17.1998,\n            \"ISR\": 24.742\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.8577,\n            \"SIR\": 5.69392,\n            \"SAR\": 3.41346,\n            \"ISR\": 12.0947\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8434,\n            \"SIR\": 20.992,\n            \"SAR\": 13.6334,\n            \"ISR\": 16.7193\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - 
Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.72525,\n            \"SIR\": 20.2734,\n            \"SAR\": 9.75812,\n            \"ISR\": 12.9511\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7109,\n            \"SIR\": 14.9283,\n            \"SAR\": 13.7881,\n            \"ISR\": 23.4522\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.86058,\n            \"SIR\": 17.1129,\n            \"SAR\": 9.56094,\n            \"ISR\": 12.586\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1257,\n            \"SIR\": 14.8721,\n            \"SAR\": 13.4094,\n            \"ISR\": 20.8899\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0362,\n            \"SIR\": 19.4624,\n            \"SAR\": 10.5458,\n            \"ISR\": 13.9267\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5052,\n            \"SIR\": 18.4534,\n            \"SAR\": 16.2467,\n            \"ISR\": 24.5399\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.89598,\n            \"SIR\": 22.4134,\n            \"SAR\": 7.45517,\n            \"ISR\": 10.0019\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9104,\n            \"SIR\": 20.3231,\n            \"SAR\": 21.5075,\n            \"ISR\": 22.0859\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.35862,\n            \"SIR\": 25.3578,\n            \"SAR\": 10.1324,\n            \"ISR\": 13.3908\n          },\n          \"instrumental\": {\n         
   \"SDR\": 12.4171,\n            \"SIR\": 16.8515,\n            \"SAR\": 15.1402,\n            \"ISR\": 18.6435\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.1635,\n            \"SIR\": 19.0642,\n            \"SAR\": 7.91343,\n            \"ISR\": 10.4067\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1921,\n            \"SIR\": 14.228,\n            \"SAR\": 14.3025,\n            \"ISR\": 25.2771\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.99689,\n            \"SIR\": 21.3627,\n            \"SAR\": 10.7317,\n            \"ISR\": 16.633\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1696,\n            \"SIR\": 20.0343,\n            \"SAR\": 14.5301,\n            \"ISR\": 21.2391\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.94976,\n            \"SIR\": 21.9598,\n            \"SAR\": 9.5192,\n            \"ISR\": 13.3106\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9411,\n            \"SIR\": 19.6014,\n            \"SAR\": 16.6301,\n            \"ISR\": 28.4452\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.69217,\n            \"SIR\": 19.6891,\n            \"SAR\": 8.05813,\n            \"ISR\": 12.1802\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4725,\n            \"SIR\": 17.321,\n            \"SAR\": 15.1231,\n            \"ISR\": 25.7254\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n       
 \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.63925,\n            \"SIR\": 21.3111,\n            \"SAR\": 8.93167,\n            \"ISR\": 12.888\n          },\n          \"instrumental\": {\n            \"SDR\": 14.465,\n            \"SIR\": 18.3457,\n            \"SAR\": 15.6787,\n            \"ISR\": 18.7529\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.01996,\n            \"SIR\": 19.4165,\n            \"SAR\": 3.80985,\n            \"ISR\": 6.81264\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4967,\n            \"SIR\": 15.6968,\n            \"SAR\": 17.7825,\n            \"ISR\": 30.6874\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.38851,\n            \"SIR\": 19.5639,\n            \"SAR\": 9.12248,\n            \"ISR\": 13.4169\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2897,\n            \"SIR\": 18.119,\n            \"SAR\": 15.6201,\n            \"ISR\": 19.0483\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.81846,\n            \"SIR\": 21.1041,\n            \"SAR\": 6.8353,\n            \"ISR\": 10.2105\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6215,\n            \"SIR\": 16.768,\n            \"SAR\": 15.9908,\n            \"ISR\": 20.8525\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -15.0131,\n            \"SIR\": -34.6622,\n            \"SAR\": 0.42234,\n            \"ISR\": 0.90929\n          },\n          \"instrumental\": {\n            \"SDR\": 
11.2162,\n            \"SIR\": 46.1306,\n            \"SAR\": 12.576,\n            \"ISR\": 15.6722\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.99497,\n            \"SIR\": 16.9166,\n            \"SAR\": 5.58985,\n            \"ISR\": 8.1484\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3515,\n            \"SIR\": 16.7041,\n            \"SAR\": 17.2802,\n            \"ISR\": 28.3457\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.89536,\n            \"SIR\": 20.1338,\n            \"SAR\": 8.90963,\n            \"ISR\": 12.2676\n          },\n          \"instrumental\": {\n            \"SDR\": 13.667,\n            \"SIR\": 17.7732,\n            \"SAR\": 16.9273,\n            \"ISR\": 19.953\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5114,\n            \"SIR\": 21.0131,\n            \"SAR\": 11.5402,\n            \"ISR\": 16.1858\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6453,\n            \"SIR\": 21.3635,\n            \"SAR\": 18.5128,\n            \"ISR\": 28.6812\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.52437,\n            \"SIR\": 18.6882,\n            \"SAR\": 6.44174,\n            \"ISR\": 10.934\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0757,\n            \"SIR\": 17.6763,\n            \"SAR\": 17.08,\n            \"ISR\": 18.2607\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.38589,\n            \"SIR\": 13.6967,\n            \"SAR\": 2.98159,\n            \"ISR\": 5.12145\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2681,\n            \"SIR\": 11.843,\n            \"SAR\": 15.5523,\n            \"ISR\": 19.5709\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.73026,\n            \"SIR\": 22.5991,\n            \"SAR\": 6.52306,\n            \"ISR\": 9.44696\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9734,\n            \"SIR\": 18.1106,\n            \"SAR\": 17.3611,\n            \"ISR\": 21.8388\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.12047,\n            \"SIR\": 17.6122,\n            \"SAR\": 6.14048,\n            \"ISR\": 9.4555\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0483,\n            \"SIR\": 17.3905,\n            \"SAR\": 17.0588,\n            \"ISR\": 27.6397\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.29692,\n            \"SIR\": 15.8046,\n            \"SAR\": 7.455,\n            \"ISR\": 12.4417\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5648,\n            \"SIR\": 17.2329,\n            \"SAR\": 14.3899,\n            \"ISR\": 19.3636\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.99284,\n            \"SIR\": 18.9859,\n            \"SAR\": 8.62869,\n            \"ISR\": 12.0115\n          },\n          \"instrumental\": {\n            \"SDR\": 9.88753,\n         
   \"SIR\": 13.7357,\n            \"SAR\": 12.1866,\n            \"ISR\": 21.7101\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.715,\n            \"SIR\": 13.7218,\n            \"SAR\": 4.32933,\n            \"ISR\": 7.9927\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3648,\n            \"SIR\": 12.5321,\n            \"SAR\": 13.1412,\n            \"ISR\": 21.7225\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.3018,\n            \"SIR\": 21.214,\n            \"SAR\": 9.3766,\n            \"ISR\": 13.2167\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9149,\n            \"SIR\": 17.7767,\n            \"SAR\": 15.8697,\n            \"ISR\": 17.5457\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6007,\n            \"SIR\": 23.0134,\n            \"SAR\": 10.1728,\n            \"ISR\": 12.4659\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3141,\n            \"SIR\": 11.6633,\n            \"SAR\": 9.94532,\n            \"ISR\": 23.8187\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.45214,\n            \"SIR\": 20.0448,\n            \"SAR\": 10.3347,\n            \"ISR\": 13.5173\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8848,\n            \"SIR\": 15.7662,\n            \"SAR\": 14.2046,\n            \"ISR\": 18.4133\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2031,\n 
           \"SIR\": 32.0996,\n            \"SAR\": 11.3027,\n            \"ISR\": 14.7336\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1664,\n            \"SIR\": 22.1659,\n            \"SAR\": 20.1328,\n            \"ISR\": 19.7074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.32301,\n            \"SIR\": 9.21441,\n            \"SAR\": 6.13258,\n            \"ISR\": 10.45\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0885,\n            \"SIR\": 15.2584,\n            \"SAR\": 13.273,\n            \"ISR\": 15.3141\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.8531,\n            \"SIR\": 28.1247,\n            \"SAR\": 14.4393,\n            \"ISR\": 20.6479\n          },\n          \"instrumental\": {\n            \"SDR\": 19.439,\n            \"SIR\": 27.2839,\n            \"SAR\": 20.652,\n            \"ISR\": 27.5765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9901,\n            \"SIR\": 20.7114,\n            \"SAR\": 12.2736,\n            \"ISR\": 17.7449\n          },\n          \"instrumental\": {\n            \"SDR\": 9.81901,\n            \"SIR\": 14.9903,\n            \"SAR\": 11.2065,\n            \"ISR\": 19.5773\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.45781,\n            \"SIR\": 16.1148,\n            \"SAR\": 6.90846,\n            \"ISR\": 10.8658\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5644,\n            \"SIR\": 15.6986,\n            \"SAR\": 13.8691,\n            \"ISR\": 
22.8434\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.50834,\n            \"SIR\": 15.3045,\n            \"SAR\": 6.04962,\n            \"ISR\": 11.4284\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6646,\n            \"SIR\": 20.2119,\n            \"SAR\": 16.6316,\n            \"ISR\": 18.326\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.22251,\n            \"SIR\": 23.4458,\n            \"SAR\": 8.59403,\n            \"ISR\": 12.6463\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9076,\n            \"SIR\": 19.1517,\n            \"SAR\": 17.0845,\n            \"ISR\": 29.8717\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.65171,\n            \"SIR\": 25.0941,\n            \"SAR\": 9.78456,\n            \"ISR\": 12.5468\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5769,\n            \"SIR\": 20.5809,\n            \"SAR\": 18.5276,\n            \"ISR\": 24.697\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.30663,\n            \"SIR\": 21.702,\n            \"SAR\": 8.76429,\n            \"ISR\": 12.5591\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1806,\n            \"SIR\": 16.273,\n            \"SAR\": 13.9889,\n            \"ISR\": 21.2666\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.82587,\n            \"SIR\": 20.0076,\n            
\"SAR\": 8.39395,\n            \"ISR\": 12.7493\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2692,\n            \"SIR\": 19.0398,\n            \"SAR\": 16.5854,\n            \"ISR\": 27.17\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.22251,\n        \"SIR\": 19.9612,\n        \"SAR\": 8.59403,\n        \"ISR\": 12.4659\n      },\n      \"instrumental\": {\n        \"SDR\": 13.4967,\n        \"SIR\": 17.7386,\n        \"SAR\": 15.6787,\n        \"ISR\": 21.2666\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"9_HP2-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 9_HP2-UVR\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.22431,\n            \"SIR\": 18.2506,\n            \"SAR\": 4.90021,\n            \"ISR\": 8.40822\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1383,\n            \"SIR\": 19.3881,\n            \"SAR\": 19.1667,\n            \"ISR\": 19.0479\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91892,\n            \"SIR\": 13.4315,\n            \"SAR\": 5.87209,\n            \"ISR\": 11.5725\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1616,\n            \"SIR\": 17.0683,\n            \"SAR\": 13.1556,\n            \"ISR\": 15.7575\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.71013,\n            \"SIR\": 20.7683,\n            \"SAR\": 8.92713,\n            \"ISR\": 14.4059\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6761,\n            \"SIR\": 
20.8755,\n            \"SAR\": 17.1283,\n            \"ISR\": 22.5582\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.77383,\n            \"SIR\": 6.66241,\n            \"SAR\": 3.43567,\n            \"ISR\": 11.2882\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9648,\n            \"SIR\": 20.1375,\n            \"SAR\": 13.7891,\n            \"ISR\": 17.8352\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.77113,\n            \"SIR\": 20.1864,\n            \"SAR\": 9.9855,\n            \"ISR\": 13.1812\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7192,\n            \"SIR\": 15.3332,\n            \"SAR\": 13.7734,\n            \"ISR\": 23.266\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.95266,\n            \"SIR\": 17.2787,\n            \"SAR\": 9.72487,\n            \"ISR\": 12.7692\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1685,\n            \"SIR\": 15.0105,\n            \"SAR\": 13.5501,\n            \"ISR\": 21.0401\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0512,\n            \"SIR\": 19.6575,\n            \"SAR\": 10.5758,\n            \"ISR\": 14.0432\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5793,\n            \"SIR\": 18.5896,\n            \"SAR\": 16.3065,\n            \"ISR\": 25.0471\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.34105,\n           
 \"SIR\": 22.0818,\n            \"SAR\": 7.45141,\n            \"ISR\": 10.1942\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2842,\n            \"SIR\": 21.2746,\n            \"SAR\": 21.3936,\n            \"ISR\": 22.2958\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.30976,\n            \"SIR\": 26.1301,\n            \"SAR\": 10.1868,\n            \"ISR\": 12.9209\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4818,\n            \"SIR\": 16.3577,\n            \"SAR\": 15.3424,\n            \"ISR\": 18.9259\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.36682,\n            \"SIR\": 19.3628,\n            \"SAR\": 8.15558,\n            \"ISR\": 10.5532\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2204,\n            \"SIR\": 14.351,\n            \"SAR\": 14.4808,\n            \"ISR\": 25.6181\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1264,\n            \"SIR\": 22.8787,\n            \"SAR\": 10.9172,\n            \"ISR\": 16.2352\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2989,\n            \"SIR\": 19.5949,\n            \"SAR\": 14.9069,\n            \"ISR\": 22.0512\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.96521,\n            \"SIR\": 21.5663,\n            \"SAR\": 9.65443,\n            \"ISR\": 14.2792\n          },\n          \"instrumental\": {\n            \"SDR\": 15.013,\n            \"SIR\": 20.2952,\n            \"SAR\": 
16.6908,\n            \"ISR\": 28.1565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.7797,\n            \"SIR\": 18.8522,\n            \"SAR\": 7.17519,\n            \"ISR\": 11.1195\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3973,\n            \"SIR\": 18.1526,\n            \"SAR\": 16.0058,\n            \"ISR\": 25.9338\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.82901,\n            \"SIR\": 19.2128,\n            \"SAR\": 8.15814,\n            \"ISR\": 12.5959\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2524,\n            \"SIR\": 19.3853,\n            \"SAR\": 16.9795,\n            \"ISR\": 18.5459\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.58801,\n            \"SIR\": 16.3083,\n            \"SAR\": 2.74386,\n            \"ISR\": 6.85394\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8996,\n            \"SIR\": 17.6975,\n            \"SAR\": 19.1122,\n            \"ISR\": 29.7667\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.34758,\n            \"SIR\": 19.9418,\n            \"SAR\": 9.10171,\n            \"ISR\": 13.4363\n          },\n          \"instrumental\": {\n            \"SDR\": 13.359,\n            \"SIR\": 18.1552,\n            \"SAR\": 15.6388,\n            \"ISR\": 19.2491\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.77543,\n            \"SIR\": 
19.4681,\n            \"SAR\": 6.6889,\n            \"ISR\": 9.95323\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6185,\n            \"SIR\": 16.8184,\n            \"SAR\": 15.8543,\n            \"ISR\": 20.5589\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.2878,\n            \"SIR\": -35.3836,\n            \"SAR\": 0.43443,\n            \"ISR\": 0.6351\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6237,\n            \"SIR\": 46.4515,\n            \"SAR\": 12.9895,\n            \"ISR\": 16.6862\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04038,\n            \"SIR\": 16.6786,\n            \"SAR\": 5.28286,\n            \"ISR\": 8.36533\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3444,\n            \"SIR\": 16.8185,\n            \"SAR\": 17.3767,\n            \"ISR\": 28.0542\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.61089,\n            \"SIR\": 19.8581,\n            \"SAR\": 9.04009,\n            \"ISR\": 12.5559\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6583,\n            \"SIR\": 18.2553,\n            \"SAR\": 16.6041,\n            \"ISR\": 19.6878\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.31669,\n            \"SIR\": 20.4576,\n            \"SAR\": 10.4625,\n            \"ISR\": 14.7162\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9295,\n            \"SIR\": 22.342,\n            \"SAR\": 19.3893,\n            
\"ISR\": 29.0074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00221,\n            \"SIR\": 4.74317,\n            \"SAR\": 0.00125,\n            \"ISR\": 4.2203\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6629,\n            \"SIR\": 24.4063,\n            \"SAR\": 21.6347,\n            \"ISR\": 18.6024\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.44665,\n            \"SIR\": 14.0945,\n            \"SAR\": 3.01467,\n            \"ISR\": 5.32001\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3958,\n            \"SIR\": 12.0877,\n            \"SAR\": 15.5962,\n            \"ISR\": 20.362\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.067,\n            \"SIR\": 21.6977,\n            \"SAR\": 6.9542,\n            \"ISR\": 10.0683\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4921,\n            \"SIR\": 19.7167,\n            \"SAR\": 18.0298,\n            \"ISR\": 22.376\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.20469,\n            \"SIR\": 17.5996,\n            \"SAR\": 6.2824,\n            \"ISR\": 9.6513\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1098,\n            \"SIR\": 17.6084,\n            \"SAR\": 17.0794,\n            \"ISR\": 27.5637\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.27363,\n            \"SIR\": 17.11,\n   
         \"SAR\": 7.5736,\n            \"ISR\": 12.0952\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9028,\n            \"SIR\": 17.2235,\n            \"SAR\": 14.9143,\n            \"ISR\": 19.7105\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.98183,\n            \"SIR\": 19.1511,\n            \"SAR\": 8.49003,\n            \"ISR\": 11.5501\n          },\n          \"instrumental\": {\n            \"SDR\": 9.5881,\n            \"SIR\": 13.0311,\n            \"SAR\": 11.7381,\n            \"ISR\": 22.1402\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.83956,\n            \"SIR\": 13.4483,\n            \"SAR\": 4.2598,\n            \"ISR\": 7.71912\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2326,\n            \"SIR\": 12.5918,\n            \"SAR\": 13.1154,\n            \"ISR\": 22.0488\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.27487,\n            \"SIR\": 22.0604,\n            \"SAR\": 9.2993,\n            \"ISR\": 13.1682\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0906,\n            \"SIR\": 17.8775,\n            \"SAR\": 16.0108,\n            \"ISR\": 17.8114\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5831,\n            \"SIR\": 23.3267,\n            \"SAR\": 9.95767,\n            \"ISR\": 12.1257\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3388,\n            \"SIR\": 11.5746,\n            \"SAR\": 9.83662,\n            \"ISR\": 24.0402\n          }\n  
      }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.27707,\n            \"SIR\": 19.8508,\n            \"SAR\": 10.2183,\n            \"ISR\": 13.398\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0478,\n            \"SIR\": 15.9282,\n            \"SAR\": 14.0329,\n            \"ISR\": 18.2832\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2488,\n            \"SIR\": 31.8434,\n            \"SAR\": 11.2218,\n            \"ISR\": 15.0046\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2059,\n            \"SIR\": 22.4921,\n            \"SAR\": 20.0902,\n            \"ISR\": 19.688\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.7769,\n            \"SIR\": 10.7739,\n            \"SAR\": 6.76779,\n            \"ISR\": 8.6522\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0431,\n            \"SIR\": 14.4797,\n            \"SAR\": 14.4037,\n            \"ISR\": 17.2019\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7959,\n            \"SIR\": 28.6967,\n            \"SAR\": 13.964,\n            \"ISR\": 20.2307\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7435,\n            \"SIR\": 26.9783,\n            \"SAR\": 20.9271,\n            \"ISR\": 29.2887\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6518,\n            \"SIR\": 21.2137,\n            \"SAR\": 11.8402,\n            \"ISR\": 16.9075\n          },\n          
\"instrumental\": {\n            \"SDR\": 10.3149,\n            \"SIR\": 14.7172,\n            \"SAR\": 11.4537,\n            \"ISR\": 20.5416\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.49728,\n            \"SIR\": 16.6313,\n            \"SAR\": 6.91648,\n            \"ISR\": 11.128\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7356,\n            \"SIR\": 15.9875,\n            \"SAR\": 13.9733,\n            \"ISR\": 23.4678\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.33799,\n            \"SIR\": 14.4166,\n            \"SAR\": 5.4313,\n            \"ISR\": 10.8145\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8741,\n            \"SIR\": 21.0435,\n            \"SAR\": 17.9999,\n            \"ISR\": 18.6252\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.99198,\n            \"SIR\": 23.5181,\n            \"SAR\": 8.34209,\n            \"ISR\": 12.5971\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2871,\n            \"SIR\": 19.385,\n            \"SAR\": 17.2405,\n            \"ISR\": 29.9486\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.57819,\n            \"SIR\": 25.6947,\n            \"SAR\": 9.74428,\n            \"ISR\": 12.9988\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9069,\n            \"SIR\": 20.8965,\n            \"SAR\": 18.9081,\n            \"ISR\": 24.8489\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - 
All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.26817,\n            \"SIR\": 22.9365,\n            \"SAR\": 8.88114,\n            \"ISR\": 12.2538\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3143,\n            \"SIR\": 16.0665,\n            \"SAR\": 14.2893,\n            \"ISR\": 21.7174\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.91646,\n            \"SIR\": 20.7634,\n            \"SAR\": 8.41774,\n            \"ISR\": 12.6083\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3384,\n            \"SIR\": 18.8894,\n            \"SAR\": 16.899,\n            \"ISR\": 27.8449\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.98183,\n        \"SIR\": 19.6575,\n        \"SAR\": 8.34209,\n        \"ISR\": 12.1257\n      },\n      \"instrumental\": {\n        \"SDR\": 13.6583,\n        \"SIR\": 18.1526,\n        \"SAR\": 16.0058,\n        \"ISR\": 22.0488\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"10_SP-UVR-2B-32000-1.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 10_SP-UVR-2B-32000-1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.55666,\n            \"SIR\": 13.1594,\n            \"SAR\": 4.60601,\n            \"ISR\": 8.92065\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0844,\n            \"SIR\": 20.3956,\n            \"SAR\": 18.4196,\n            \"ISR\": 18.3987\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.31353,\n            \"SIR\": 
9.47502,\n            \"SAR\": 5.09269,\n            \"ISR\": 12.714\n          },\n          \"instrumental\": {\n            \"SDR\": 9.37283,\n            \"SIR\": 17.8028,\n            \"SAR\": 10.9615,\n            \"ISR\": 13.6907\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.39411,\n            \"SIR\": 18.5401,\n            \"SAR\": 8.48992,\n            \"ISR\": 14.5754\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8965,\n            \"SIR\": 21.17,\n            \"SAR\": 16.3839,\n            \"ISR\": 19.8328\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.74179,\n            \"SIR\": 5.5464,\n            \"SAR\": 3.16704,\n            \"ISR\": 12.0086\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5079,\n            \"SIR\": 20.401,\n            \"SAR\": 13.3993,\n            \"ISR\": 16.4562\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.52752,\n            \"SIR\": 19.8896,\n            \"SAR\": 9.07667,\n            \"ISR\": 12.0783\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5666,\n            \"SIR\": 14.1582,\n            \"SAR\": 13.1657,\n            \"ISR\": 21.3663\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.67034,\n            \"SIR\": 15.6915,\n            \"SAR\": 9.32621,\n            \"ISR\": 13.102\n          },\n          \"instrumental\": {\n            \"SDR\": 10.894,\n            \"SIR\": 15.2413,\n            \"SAR\": 13.0435,\n            \"ISR\": 19.5403\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.73292,\n            \"SIR\": 18.7999,\n            \"SAR\": 10.3168,\n            \"ISR\": 13.7927\n          },\n          \"instrumental\": {\n            \"SDR\": 14.163,\n            \"SIR\": 18.2858,\n            \"SAR\": 15.9504,\n            \"ISR\": 23.6148\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.36253,\n            \"SIR\": 20.2471,\n            \"SAR\": 5.68376,\n            \"ISR\": 9.20478\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0967,\n            \"SIR\": 22.247,\n            \"SAR\": 22.1453,\n            \"ISR\": 21.5731\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.71165,\n            \"SIR\": 25.2214,\n            \"SAR\": 9.7953,\n            \"ISR\": 12.3997\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2931,\n            \"SIR\": 16.0846,\n            \"SAR\": 14.9762,\n            \"ISR\": 19.2765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.70022,\n            \"SIR\": 17.3035,\n            \"SAR\": 7.49171,\n            \"ISR\": 10.3478\n          },\n          \"instrumental\": {\n            \"SDR\": 12.927,\n            \"SIR\": 14.1498,\n            \"SAR\": 13.9197,\n            \"ISR\": 23.1908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.81558,\n            \"SIR\": 22.7435,\n            \"SAR\": 10.4558,\n            \"ISR\": 
16.2504\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9624,\n            \"SIR\": 19.7632,\n            \"SAR\": 14.1727,\n            \"ISR\": 22.2328\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79323,\n            \"SIR\": 20.358,\n            \"SAR\": 9.15326,\n            \"ISR\": 13.3217\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6342,\n            \"SIR\": 19.7319,\n            \"SAR\": 15.9498,\n            \"ISR\": 26.4234\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.06859,\n            \"SIR\": 17.2859,\n            \"SAR\": 6.67274,\n            \"ISR\": 11.3402\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5972,\n            \"SIR\": 19.4296,\n            \"SAR\": 16.1956,\n            \"ISR\": 24.7395\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.33408,\n            \"SIR\": 18.786,\n            \"SAR\": 8.0449,\n            \"ISR\": 12.1352\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3814,\n            \"SIR\": 20.1613,\n            \"SAR\": 17.2909,\n            \"ISR\": 18.7949\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.55404,\n            \"SIR\": 13.2299,\n            \"SAR\": 1.81962,\n            \"ISR\": 7.02643\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0181,\n            \"SIR\": 18.0565,\n            \"SAR\": 18.6274,\n            \"ISR\": 26.4225\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.89549,\n            \"SIR\": 18.321,\n            \"SAR\": 8.40817,\n            \"ISR\": 13.7358\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1567,\n            \"SIR\": 18.4196,\n            \"SAR\": 15.0427,\n            \"ISR\": 18.8103\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.39555,\n            \"SIR\": 20.6244,\n            \"SAR\": 6.03627,\n            \"ISR\": 9.92927\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5042,\n            \"SIR\": 16.9237,\n            \"SAR\": 15.2106,\n            \"ISR\": 21.7277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -24.9708,\n            \"SIR\": -37.1873,\n            \"SAR\": 0.228475,\n            \"ISR\": 3.20742\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4729,\n            \"SIR\": 46.5401,\n            \"SAR\": 11.515,\n            \"ISR\": 14.9476\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.66326,\n            \"SIR\": 14.2253,\n            \"SAR\": 4.92814,\n            \"ISR\": 8.64637\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3789,\n            \"SIR\": 17.3207,\n            \"SAR\": 16.958,\n            \"ISR\": 26.0429\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.48859,\n            \"SIR\": 18.4375,\n            \"SAR\": 9.18019,\n            \"ISR\": 
12.8338\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3166,\n            \"SIR\": 18.527,\n            \"SAR\": 15.9151,\n            \"ISR\": 19.4388\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.11344,\n            \"SIR\": 20.7022,\n            \"SAR\": 10.1758,\n            \"ISR\": 14.1646\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3215,\n            \"SIR\": 20.0917,\n            \"SAR\": 17.5812,\n            \"ISR\": 28.1135\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.06374,\n            \"SIR\": 20.5727,\n            \"SAR\": 6.73396,\n            \"ISR\": 10.0258\n          },\n          \"instrumental\": {\n            \"SDR\": 12.677,\n            \"SIR\": 15.9207,\n            \"SAR\": 15.8225,\n            \"ISR\": 19.4481\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.38899,\n            \"SIR\": 12.2626,\n            \"SAR\": 2.94463,\n            \"ISR\": 5.42674\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1886,\n            \"SIR\": 12.2725,\n            \"SAR\": 15.0454,\n            \"ISR\": 18.137\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.74227,\n            \"SIR\": 19.9194,\n            \"SAR\": 2.18712,\n            \"ISR\": 7.65378\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7806,\n            \"SIR\": 20.0186,\n            \"SAR\": 18.262,\n            \"ISR\": 22.63\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.93724,\n            \"SIR\": 15.3569,\n            \"SAR\": 6.10264,\n            \"ISR\": 9.87626\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8299,\n            \"SIR\": 18.0726,\n            \"SAR\": 16.7343,\n            \"ISR\": 24.7768\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.317,\n            \"SIR\": 12.3429,\n            \"SAR\": 5.98735,\n            \"ISR\": 12.7055\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0098,\n            \"SIR\": 18.4262,\n            \"SAR\": 14.3086,\n            \"ISR\": 20.2315\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.82226,\n            \"SIR\": 18.1491,\n            \"SAR\": 8.47455,\n            \"ISR\": 11.8146\n          },\n          \"instrumental\": {\n            \"SDR\": 9.70422,\n            \"SIR\": 13.4805,\n            \"SAR\": 11.733,\n            \"ISR\": 20.6292\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.1161,\n            \"SIR\": 8.73569,\n            \"SAR\": 4.05106,\n            \"ISR\": 8.94053\n          },\n          \"instrumental\": {\n            \"SDR\": 9.86386,\n            \"SIR\": 13.1755,\n            \"SAR\": 11.693,\n            \"ISR\": 17.179\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.7524,\n            \"SIR\": 20.5942,\n            \"SAR\": 8.74542,\n            \"ISR\": 12.9702\n          
},\n          \"instrumental\": {\n            \"SDR\": 12.601,\n            \"SIR\": 17.5097,\n            \"SAR\": 15.4571,\n            \"ISR\": 17.4277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9523,\n            \"SIR\": 22.9112,\n            \"SAR\": 9.60426,\n            \"ISR\": 11.903\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4931,\n            \"SIR\": 12.0426,\n            \"SAR\": 10.5119,\n            \"ISR\": 24.1665\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.36228,\n            \"SIR\": 18.5196,\n            \"SAR\": 9.28727,\n            \"ISR\": 12.8736\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8727,\n            \"SIR\": 16.3804,\n            \"SAR\": 14.023,\n            \"ISR\": 19.8458\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.68673,\n            \"SIR\": 29.9738,\n            \"SAR\": 10.5015,\n            \"ISR\": 14.4612\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2893,\n            \"SIR\": 22.5907,\n            \"SAR\": 19.287,\n            \"ISR\": 19.322\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.30439,\n            \"SIR\": 8.44296,\n            \"SAR\": 6.62538,\n            \"ISR\": 9.58403\n          },\n          \"instrumental\": {\n            \"SDR\": 9.91908,\n            \"SIR\": 15.382,\n            \"SAR\": 13.1648,\n            \"ISR\": 15.4699\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 12.8223,\n            \"SIR\": 26.1613,\n            \"SAR\": 13.3089,\n            \"ISR\": 19.6004\n          },\n          \"instrumental\": {\n            \"SDR\": 20.2388,\n            \"SIR\": 27.2012,\n            \"SAR\": 21.2514,\n            \"ISR\": 28.438\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4937,\n            \"SIR\": 19.4429,\n            \"SAR\": 11.7425,\n            \"ISR\": 17.5716\n          },\n          \"instrumental\": {\n            \"SDR\": 9.72552,\n            \"SIR\": 15.1745,\n            \"SAR\": 10.9276,\n            \"ISR\": 18.6733\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.13402,\n            \"SIR\": 16.0478,\n            \"SAR\": 6.28158,\n            \"ISR\": 10.9519\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3114,\n            \"SIR\": 15.9395,\n            \"SAR\": 13.4249,\n            \"ISR\": 22.4215\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.9839,\n            \"SIR\": 4.65898,\n            \"SAR\": 1.55571,\n            \"ISR\": 10.5974\n          },\n          \"instrumental\": {\n            \"SDR\": 18.178,\n            \"SIR\": 26.5809,\n            \"SAR\": 20.3547,\n            \"ISR\": 17.9172\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.38466,\n            \"SIR\": 21.0508,\n            \"SAR\": 7.79172,\n            \"ISR\": 12.3868\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7027,\n            \"SIR\": 
20.1125,\n            \"SAR\": 17.4991,\n            \"ISR\": 28.0891\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.35869,\n            \"SIR\": 24.3349,\n            \"SAR\": 8.66062,\n            \"ISR\": 12.0172\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9297,\n            \"SIR\": 21.0908,\n            \"SAR\": 18.7642,\n            \"ISR\": 24.095\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.75028,\n            \"SIR\": 21.6046,\n            \"SAR\": 8.55142,\n            \"ISR\": 11.6882\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0793,\n            \"SIR\": 15.4816,\n            \"SAR\": 14.0158,\n            \"ISR\": 21.6839\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.46119,\n            \"SIR\": 19.7891,\n            \"SAR\": 7.97103,\n            \"ISR\": 12.4279\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2304,\n            \"SIR\": 18.8039,\n            \"SAR\": 16.389,\n            \"ISR\": 25.723\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.46119,\n        \"SIR\": 18.5401,\n        \"SAR\": 7.97103,\n        \"ISR\": 12.0172\n      },\n      \"instrumental\": {\n        \"SDR\": 13.3166,\n        \"SIR\": 18.2858,\n        \"SAR\": 15.4571,\n        \"ISR\": 20.6292\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"11_SP-UVR-2B-32000-2.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 11_SP-UVR-2B-32000-2\",\n    \"track_scores\": [\n 
     {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.38186,\n            \"SIR\": 13.6055,\n            \"SAR\": 4.31086,\n            \"ISR\": 8.63485\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3254,\n            \"SIR\": 20.4471,\n            \"SAR\": 19.4538,\n            \"ISR\": 18.7182\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.61384,\n            \"SIR\": 10.2422,\n            \"SAR\": 4.80059,\n            \"ISR\": 12.4588\n          },\n          \"instrumental\": {\n            \"SDR\": 9.54512,\n            \"SIR\": 17.5755,\n            \"SAR\": 11.2566,\n            \"ISR\": 14.1752\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2642,\n            \"SIR\": 19.2592,\n            \"SAR\": 8.39931,\n            \"ISR\": 14.3811\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8186,\n            \"SIR\": 21.0005,\n            \"SAR\": 16.5319,\n            \"ISR\": 19.9054\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.99406,\n            \"SIR\": 6.30951,\n            \"SAR\": 3.1006,\n            \"ISR\": 11.4657\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5838,\n            \"SIR\": 19.7417,\n            \"SAR\": 13.4876,\n            \"ISR\": 17.2048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.49051,\n            \"SIR\": 20.3363,\n            \"SAR\": 8.80304,\n            \"ISR\": 11.828\n          },\n          
\"instrumental\": {\n            \"SDR\": 12.4692,\n            \"SIR\": 13.9747,\n            \"SAR\": 12.9802,\n            \"ISR\": 21.1922\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.62613,\n            \"SIR\": 15.9568,\n            \"SAR\": 9.31362,\n            \"ISR\": 13.0723\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9411,\n            \"SIR\": 15.2113,\n            \"SAR\": 13.0171,\n            \"ISR\": 19.8292\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.69432,\n            \"SIR\": 19.2793,\n            \"SAR\": 10.1191,\n            \"ISR\": 13.5519\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1343,\n            \"SIR\": 18.1194,\n            \"SAR\": 15.8495,\n            \"ISR\": 24.1207\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0848,\n            \"SIR\": 20.3658,\n            \"SAR\": 5.4703,\n            \"ISR\": 8.45097\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1645,\n            \"SIR\": 22.117,\n            \"SAR\": 22.5022,\n            \"ISR\": 21.4384\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.6596,\n            \"SIR\": 25.8751,\n            \"SAR\": 9.85267,\n            \"ISR\": 12.2801\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2606,\n            \"SIR\": 15.9448,\n            \"SAR\": 15.0563,\n            \"ISR\": 19.4169\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet 
Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68082,\n            \"SIR\": 18.2666,\n            \"SAR\": 7.24287,\n            \"ISR\": 10.005\n          },\n          \"instrumental\": {\n            \"SDR\": 12.878,\n            \"SIR\": 13.824,\n            \"SAR\": 13.847,\n            \"ISR\": 24.1875\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.84797,\n            \"SIR\": 23.4706,\n            \"SAR\": 10.4109,\n            \"ISR\": 16.0513\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0979,\n            \"SIR\": 19.5504,\n            \"SAR\": 14.4432,\n            \"ISR\": 22.6134\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.74748,\n            \"SIR\": 20.6581,\n            \"SAR\": 8.94782,\n            \"ISR\": 13.0916\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5602,\n            \"SIR\": 19.5197,\n            \"SAR\": 15.9022,\n            \"ISR\": 26.6374\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.78824,\n            \"SIR\": 17.2413,\n            \"SAR\": 6.35979,\n            \"ISR\": 11.2602\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9635,\n            \"SIR\": 19.5241,\n            \"SAR\": 16.1938,\n            \"ISR\": 24.8392\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.86097,\n            \"SIR\": 18.6092,\n            \"SAR\": 7.46524,\n            \"ISR\": 11.6822\n          },\n          \"instrumental\": 
{\n            \"SDR\": 15.6088,\n            \"SIR\": 20.1364,\n            \"SAR\": 17.89,\n            \"ISR\": 18.7545\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.42423,\n            \"SIR\": 14.7608,\n            \"SAR\": 1.7754,\n            \"ISR\": 6.74381\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8927,\n            \"SIR\": 17.723,\n            \"SAR\": 18.9541,\n            \"ISR\": 27.369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84459,\n            \"SIR\": 18.7054,\n            \"SAR\": 8.34885,\n            \"ISR\": 13.4779\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1435,\n            \"SIR\": 18.2079,\n            \"SAR\": 15.1135,\n            \"ISR\": 18.9942\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.24479,\n            \"SIR\": 19.5704,\n            \"SAR\": 5.57451,\n            \"ISR\": 9.53887\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4719,\n            \"SIR\": 16.6878,\n            \"SAR\": 15.0485,\n            \"ISR\": 21.5875\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -14.362,\n            \"SIR\": -36.8235,\n            \"SAR\": 0.05692,\n            \"ISR\": 2.97604\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5248,\n            \"SIR\": 46.5231,\n            \"SAR\": 12.1418,\n            \"ISR\": 16.2859\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.75014,\n            \"SIR\": 15.2202,\n            \"SAR\": 5.01357,\n            \"ISR\": 8.3006\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6365,\n            \"SIR\": 16.9556,\n            \"SAR\": 16.8094,\n            \"ISR\": 26.8088\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.15072,\n            \"SIR\": 18.7709,\n            \"SAR\": 8.71484,\n            \"ISR\": 12.5102\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7908,\n            \"SIR\": 18.6218,\n            \"SAR\": 16.2998,\n            \"ISR\": 19.7491\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04521,\n            \"SIR\": 20.3281,\n            \"SAR\": 9.19245,\n            \"ISR\": 13.4945\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5021,\n            \"SIR\": 20.5096,\n            \"SAR\": 18.2521,\n            \"ISR\": 28.5136\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.734605,\n            \"SIR\": 11.7582,\n            \"SAR\": 0.28468,\n            \"ISR\": 5.58422\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3042,\n            \"SIR\": 19.7121,\n            \"SAR\": 18.9849,\n            \"ISR\": 19.4355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.37459,\n            \"SIR\": 12.9288,\n            \"SAR\": 2.93736,\n            \"ISR\": 5.22264\n          },\n          \"instrumental\": 
{\n            \"SDR\": 10.1648,\n            \"SIR\": 12.0941,\n            \"SAR\": 15.4173,\n            \"ISR\": 18.5163\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.15221,\n            \"SIR\": 20.5995,\n            \"SAR\": 1.55236,\n            \"ISR\": 7.32372\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1042,\n            \"SIR\": 20.376,\n            \"SAR\": 18.058,\n            \"ISR\": 22.4219\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.89508,\n            \"SIR\": 16.155,\n            \"SAR\": 5.95612,\n            \"ISR\": 9.52464\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8018,\n            \"SIR\": 17.6361,\n            \"SAR\": 16.8416,\n            \"ISR\": 25.6079\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.85588,\n            \"SIR\": 12.0053,\n            \"SAR\": 5.62864,\n            \"ISR\": 12.298\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3336,\n            \"SIR\": 18.6347,\n            \"SAR\": 14.5938,\n            \"ISR\": 19.7336\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.60294,\n            \"SIR\": 18.4607,\n            \"SAR\": 8.30484,\n            \"ISR\": 11.435\n          },\n          \"instrumental\": {\n            \"SDR\": 9.59475,\n            \"SIR\": 13.2285,\n            \"SAR\": 11.6414,\n            \"ISR\": 20.986\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 3.97519,\n            \"SIR\": 10.2667,\n            \"SAR\": 3.59107,\n            \"ISR\": 8.33597\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1096,\n            \"SIR\": 13.0706,\n            \"SAR\": 12.1869,\n            \"ISR\": 19.1603\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.72201,\n            \"SIR\": 21.5301,\n            \"SAR\": 8.83955,\n            \"ISR\": 12.6502\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6775,\n            \"SIR\": 17.4183,\n            \"SAR\": 15.54,\n            \"ISR\": 17.6848\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9152,\n            \"SIR\": 23.0828,\n            \"SAR\": 9.54188,\n            \"ISR\": 11.7552\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4927,\n            \"SIR\": 12.0964,\n            \"SAR\": 10.566,\n            \"ISR\": 24.4476\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.40956,\n            \"SIR\": 18.7681,\n            \"SAR\": 9.27832,\n            \"ISR\": 12.8404\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8489,\n            \"SIR\": 16.354,\n            \"SAR\": 14.0441,\n            \"ISR\": 19.9596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.6756,\n            \"SIR\": 30.6905,\n            \"SAR\": 10.4859,\n            \"ISR\": 14.4171\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2941,\n            \"SIR\": 22.4912,\n        
    \"SAR\": 19.2023,\n            \"ISR\": 19.3543\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.07049,\n            \"SIR\": 8.86747,\n            \"SAR\": 5.96601,\n            \"ISR\": 9.11737\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1071,\n            \"SIR\": 15.2494,\n            \"SAR\": 13.069,\n            \"ISR\": 15.7414\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1122,\n            \"SIR\": 27.1205,\n            \"SAR\": 13.5418,\n            \"ISR\": 19.7598\n          },\n          \"instrumental\": {\n            \"SDR\": 19.4212,\n            \"SIR\": 26.8558,\n            \"SAR\": 20.5656,\n            \"ISR\": 27.3669\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.97273,\n            \"SIR\": 19.2215,\n            \"SAR\": 11.1354,\n            \"ISR\": 17.7203\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0736,\n            \"SIR\": 15.59,\n            \"SAR\": 11.0266,\n            \"ISR\": 18.8722\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.03998,\n            \"SIR\": 16.9494,\n            \"SAR\": 6.1379,\n            \"ISR\": 10.7825\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2962,\n            \"SIR\": 15.852,\n            \"SAR\": 13.5976,\n            \"ISR\": 23.4322\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.60752,\n            \"SIR\": 
5.48203,\n            \"SAR\": 1.35698,\n            \"ISR\": 10.3112\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4661,\n            \"SIR\": 26.3381,\n            \"SAR\": 18.7769,\n            \"ISR\": 17.9984\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.30248,\n            \"SIR\": 20.9118,\n            \"SAR\": 7.61325,\n            \"ISR\": 12.2099\n          },\n          \"instrumental\": {\n            \"SDR\": 15.962,\n            \"SIR\": 20.1418,\n            \"SAR\": 17.6917,\n            \"ISR\": 28.3288\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.14863,\n            \"SIR\": 24.9556,\n            \"SAR\": 8.30588,\n            \"ISR\": 11.7522\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9524,\n            \"SIR\": 20.7986,\n            \"SAR\": 18.7984,\n            \"ISR\": 24.2858\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79451,\n            \"SIR\": 22.4829,\n            \"SAR\": 8.58089,\n            \"ISR\": 11.7168\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0624,\n            \"SIR\": 15.373,\n            \"SAR\": 13.8614,\n            \"ISR\": 22.0634\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.22383,\n            \"SIR\": 20.1185,\n            \"SAR\": 7.87232,\n            \"ISR\": 12.1618\n          },\n          \"instrumental\": {\n            \"SDR\": 14.085,\n            \"SIR\": 18.6513,\n            \"SAR\": 16.4229,\n            \"ISR\": 
25.9444\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.30248,\n        \"SIR\": 18.7681,\n        \"SAR\": 7.61325,\n        \"ISR\": 11.7522\n      },\n      \"instrumental\": {\n        \"SDR\": 13.7908,\n        \"SIR\": 18.2079,\n        \"SAR\": 15.54,\n        \"ISR\": 20.986\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"12_SP-UVR-3B-44100.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 12_SP-UVR-3B-44100\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.02814,\n            \"SIR\": 13.8688,\n            \"SAR\": 3.86639,\n            \"ISR\": 7.45564\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0726,\n            \"SIR\": 18.9793,\n            \"SAR\": 19.1598,\n            \"ISR\": 18.6167\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.93034,\n            \"SIR\": 10.449,\n            \"SAR\": 5.22867,\n            \"ISR\": 12.0323\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0922,\n            \"SIR\": 17.3567,\n            \"SAR\": 11.9871,\n            \"ISR\": 14.2859\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.14578,\n            \"SIR\": 20.7771,\n            \"SAR\": 7.94933,\n            \"ISR\": 13.4559\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2385,\n            \"SIR\": 19.9489,\n            \"SAR\": 16.4486,\n            \"ISR\": 23.5334\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 2.585,\n            \"SIR\": 7.26795,\n            \"SAR\": 3.35357,\n            \"ISR\": 11.4714\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9982,\n            \"SIR\": 19.7547,\n            \"SAR\": 13.6384,\n            \"ISR\": 18.1648\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.63349,\n            \"SIR\": 20.8366,\n            \"SAR\": 8.96628,\n            \"ISR\": 11.8193\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6868,\n            \"SIR\": 13.8917,\n            \"SAR\": 13.1537,\n            \"ISR\": 24.6912\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.71592,\n            \"SIR\": 15.9061,\n            \"SAR\": 9.43897,\n            \"ISR\": 13.3975\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0122,\n            \"SIR\": 15.3479,\n            \"SAR\": 13.0051,\n            \"ISR\": 19.7634\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.92737,\n            \"SIR\": 18.6469,\n            \"SAR\": 10.3047,\n            \"ISR\": 14.545\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2452,\n            \"SIR\": 18.73,\n            \"SAR\": 15.6826,\n            \"ISR\": 24.4004\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.59732,\n            \"SIR\": 21.2289,\n            \"SAR\": 6.75736,\n            \"ISR\": 9.91152\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2441,\n            \"SIR\": 21.8697,\n          
  \"SAR\": 21.3371,\n            \"ISR\": 21.1316\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.70133,\n            \"SIR\": 25.6276,\n            \"SAR\": 9.87426,\n            \"ISR\": 12.0406\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9984,\n            \"SIR\": 15.5671,\n            \"SAR\": 15.0333,\n            \"ISR\": 19.6758\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.58659,\n            \"SIR\": 17.6204,\n            \"SAR\": 7.38273,\n            \"ISR\": 10.0118\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5841,\n            \"SIR\": 13.6179,\n            \"SAR\": 13.9716,\n            \"ISR\": 22.9587\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.69846,\n            \"SIR\": 22.1192,\n            \"SAR\": 10.1502,\n            \"ISR\": 16.334\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8085,\n            \"SIR\": 19.7623,\n            \"SAR\": 14.2077,\n            \"ISR\": 21.2653\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.84933,\n            \"SIR\": 21.2017,\n            \"SAR\": 9.0412,\n            \"ISR\": 13.4185\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5717,\n            \"SIR\": 19.3707,\n            \"SAR\": 16.11,\n            \"ISR\": 27.9267\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n         
   \"SDR\": 6.43966,\n            \"SIR\": 17.6976,\n            \"SAR\": 6.78796,\n            \"ISR\": 11.3965\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0883,\n            \"SIR\": 18.396,\n            \"SAR\": 15.4366,\n            \"ISR\": 24.944\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.72585,\n            \"SIR\": 20.7561,\n            \"SAR\": 8.28962,\n            \"ISR\": 12.0577\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7708,\n            \"SIR\": 18.1527,\n            \"SAR\": 17.1991,\n            \"ISR\": 18.1932\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.398,\n            \"SIR\": 17.7581,\n            \"SAR\": 2.82674,\n            \"ISR\": 6.59425\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7013,\n            \"SIR\": 15.9021,\n            \"SAR\": 17.8286,\n            \"ISR\": 28.0489\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.88729,\n            \"SIR\": 18.9982,\n            \"SAR\": 8.5793,\n            \"ISR\": 13.2669\n          },\n          \"instrumental\": {\n            \"SDR\": 12.974,\n            \"SIR\": 17.983,\n            \"SAR\": 15.1671,\n            \"ISR\": 18.8691\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.24364,\n            \"SIR\": 20.0766,\n            \"SAR\": 5.62216,\n            \"ISR\": 9.52464\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7571,\n            \"SIR\": 15.9233,\n            \"SAR\": 
14.8433,\n            \"ISR\": 18.1566\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -22.5095,\n            \"SIR\": -39.506,\n            \"SAR\": 0.40621,\n            \"ISR\": 0.46604\n          },\n          \"instrumental\": {\n            \"SDR\": 10.141,\n            \"SIR\": 47.6819,\n            \"SAR\": 11.0849,\n            \"ISR\": 14.3503\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.84255,\n            \"SIR\": 14.3917,\n            \"SAR\": 5.25149,\n            \"ISR\": 8.78137\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8792,\n            \"SIR\": 17.4008,\n            \"SAR\": 17.0551,\n            \"ISR\": 26.0931\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.39653,\n            \"SIR\": 19.0845,\n            \"SAR\": 8.30213,\n            \"ISR\": 12.0674\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4939,\n            \"SIR\": 17.6494,\n            \"SAR\": 16.0763,\n            \"ISR\": 19.7919\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.81424,\n            \"SIR\": 20.4989,\n            \"SAR\": 10.4545,\n            \"ISR\": 15.2281\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9992,\n            \"SIR\": 21.6924,\n            \"SAR\": 18.6378,\n            \"ISR\": 28.344\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 0.58262,\n            \"SIR\": 7.17757,\n            \"SAR\": 0.19646,\n            \"ISR\": 5.06085\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9715,\n            \"SIR\": 19.9999,\n            \"SAR\": 17.5695,\n            \"ISR\": 18.7622\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.42283,\n            \"SIR\": 13.5122,\n            \"SAR\": 3.05493,\n            \"ISR\": 5.45809\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3022,\n            \"SIR\": 12.2305,\n            \"SAR\": 15.4814,\n            \"ISR\": 18.6963\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.81387,\n            \"SIR\": 21.0838,\n            \"SAR\": 6.18649,\n            \"ISR\": 9.49702\n          },\n          \"instrumental\": {\n            \"SDR\": 16.208,\n            \"SIR\": 20.1215,\n            \"SAR\": 18.13,\n            \"ISR\": 24.1106\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91358,\n            \"SIR\": 17.3857,\n            \"SAR\": 5.88287,\n            \"ISR\": 9.31887\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8848,\n            \"SIR\": 17.3547,\n            \"SAR\": 16.8783,\n            \"ISR\": 27.1805\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.55312,\n            \"SIR\": 15.1974,\n            \"SAR\": 6.54748,\n            \"ISR\": 12.5038\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6404,\n            \"SIR\": 18.1728,\n            \"SAR\": 
14.7092,\n            \"ISR\": 17.7815\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49493,\n            \"SIR\": 18.3552,\n            \"SAR\": 8.16201,\n            \"ISR\": 11.2232\n          },\n          \"instrumental\": {\n            \"SDR\": 9.36637,\n            \"SIR\": 12.5978,\n            \"SAR\": 11.5028,\n            \"ISR\": 21.2161\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.93884,\n            \"SIR\": 9.79519,\n            \"SAR\": 3.64735,\n            \"ISR\": 8.36923\n          },\n          \"instrumental\": {\n            \"SDR\": 9.67626,\n            \"SIR\": 12.8617,\n            \"SAR\": 11.8042,\n            \"ISR\": 18.3969\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.471,\n            \"SIR\": 20.22,\n            \"SAR\": 8.97851,\n            \"ISR\": 12.29\n          },\n          \"instrumental\": {\n            \"SDR\": 12.594,\n            \"SIR\": 16.6172,\n            \"SAR\": 15.5781,\n            \"ISR\": 17.3924\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.289,\n            \"SIR\": 23.2038,\n            \"SAR\": 9.63599,\n            \"ISR\": 12.043\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0513,\n            \"SIR\": 11.6475,\n            \"SAR\": 10.055,\n            \"ISR\": 24.484\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68702,\n            \"SIR\": 19.7453,\n    
        \"SAR\": 9.74042,\n            \"ISR\": 12.7187\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6058,\n            \"SIR\": 15.213,\n            \"SAR\": 14.1023,\n            \"ISR\": 18.1782\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.42229,\n            \"SIR\": 28.8034,\n            \"SAR\": 10.2956,\n            \"ISR\": 14.5663\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5409,\n            \"SIR\": 22.5657,\n            \"SAR\": 19.2258,\n            \"ISR\": 19.5192\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.99776,\n            \"SIR\": 8.94443,\n            \"SAR\": 7.34837,\n            \"ISR\": 9.13308\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2386,\n            \"SIR\": 14.8091,\n            \"SAR\": 12.9585,\n            \"ISR\": 15.8542\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8707,\n            \"SIR\": 25.4991,\n            \"SAR\": 13.2785,\n            \"ISR\": 20.3103\n          },\n          \"instrumental\": {\n            \"SDR\": 20.1243,\n            \"SIR\": 27.6273,\n            \"SAR\": 20.9628,\n            \"ISR\": 28.1562\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.6324,\n            \"SIR\": 19.4792,\n            \"SAR\": 10.4926,\n            \"ISR\": 16.2551\n          },\n          \"instrumental\": {\n            \"SDR\": 9.94676,\n            \"SIR\": 14.6482,\n            \"SAR\": 11.0144,\n            \"ISR\": 19.2249\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.22008,\n            \"SIR\": 16.8543,\n            \"SAR\": 6.47889,\n            \"ISR\": 10.8592\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1009,\n            \"SIR\": 15.3297,\n            \"SAR\": 13.3787,\n            \"ISR\": 23.7143\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.30538,\n            \"SIR\": 14.9096,\n            \"SAR\": 5.67958,\n            \"ISR\": 11.3022\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0586,\n            \"SIR\": 19.4625,\n            \"SAR\": 16.1133,\n            \"ISR\": 18.481\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.23986,\n            \"SIR\": 23.4319,\n            \"SAR\": 8.4494,\n            \"ISR\": 12.8163\n          },\n          \"instrumental\": {\n            \"SDR\": 14.714,\n            \"SIR\": 19.1031,\n            \"SAR\": 16.8113,\n            \"ISR\": 29.9726\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.36405,\n            \"SIR\": 24.4449,\n            \"SAR\": 8.86552,\n            \"ISR\": 11.9543\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5202,\n            \"SIR\": 20.6881,\n            \"SAR\": 18.1013,\n            \"ISR\": 24.1135\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.93737,\n            \"SIR\": 22.5637,\n            \"SAR\": 8.46623,\n            \"ISR\": 11.6299\n        
  },\n          \"instrumental\": {\n            \"SDR\": 11.6523,\n            \"SIR\": 15.1138,\n            \"SAR\": 13.979,\n            \"ISR\": 21.6857\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.12085,\n            \"SIR\": 19.1023,\n            \"SAR\": 7.62759,\n            \"ISR\": 12.3058\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0382,\n            \"SIR\": 18.5704,\n            \"SAR\": 16.3556,\n            \"ISR\": 26.0105\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.49493,\n        \"SIR\": 19.0845,\n        \"SAR\": 7.94933,\n        \"ISR\": 11.9543\n      },\n      \"instrumental\": {\n        \"SDR\": 13.0586,\n        \"SIR\": 17.983,\n        \"SAR\": 15.4814,\n        \"ISR\": 21.1316\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"13_SP-UVR-4B-44100-1.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 13_SP-UVR-4B-44100-1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.41826,\n            \"SIR\": 14.6844,\n            \"SAR\": 4.2865,\n            \"ISR\": 8.28759\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8749,\n            \"SIR\": 19.2658,\n            \"SAR\": 18.7497,\n            \"ISR\": 18.731\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.98532,\n            \"SIR\": 10.0177,\n            \"SAR\": 5.29026,\n            \"ISR\": 12.1635\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0819,\n            \"SIR\": 17.5659,\n            \"SAR\": 
12.1074,\n            \"ISR\": 14.0863\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.38654,\n            \"SIR\": 20.808,\n            \"SAR\": 8.3166,\n            \"ISR\": 13.7502\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9009,\n            \"SIR\": 20.3153,\n            \"SAR\": 16.691,\n            \"ISR\": 20.6964\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.91697,\n            \"SIR\": 7.49656,\n            \"SAR\": 3.99703,\n            \"ISR\": 12.1556\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9512,\n            \"SIR\": 20.3185,\n            \"SAR\": 13.6902,\n            \"ISR\": 18.0258\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78854,\n            \"SIR\": 20.2237,\n            \"SAR\": 9.92833,\n            \"ISR\": 13.2697\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5207,\n            \"SIR\": 15.1626,\n            \"SAR\": 13.8834,\n            \"ISR\": 22.8307\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.59545,\n            \"SIR\": 16.3628,\n            \"SAR\": 9.18637,\n            \"ISR\": 12.5797\n          },\n          \"instrumental\": {\n            \"SDR\": 10.84,\n            \"SIR\": 14.7996,\n            \"SAR\": 13.0028,\n            \"ISR\": 20.122\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.6345,\n            \"SIR\": 18.6996,\n            
\"SAR\": 10.2487,\n            \"ISR\": 13.808\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2413,\n            \"SIR\": 18.4264,\n            \"SAR\": 16.0357,\n            \"ISR\": 24.1197\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.73358,\n            \"SIR\": 21.9277,\n            \"SAR\": 6.8544,\n            \"ISR\": 9.00087\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6595,\n            \"SIR\": 20.5913,\n            \"SAR\": 21.2521,\n            \"ISR\": 20.6173\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.89178,\n            \"SIR\": 25.8769,\n            \"SAR\": 9.9147,\n            \"ISR\": 12.0276\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1371,\n            \"SIR\": 15.6452,\n            \"SAR\": 15.3373,\n            \"ISR\": 18.8936\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.6614,\n            \"SIR\": 18.1423,\n            \"SAR\": 7.43352,\n            \"ISR\": 9.79423\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0206,\n            \"SIR\": 13.6758,\n            \"SAR\": 13.9994,\n            \"ISR\": 24.2626\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.98951,\n            \"SIR\": 23.3727,\n            \"SAR\": 10.4646,\n            \"ISR\": 15.7627\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9827,\n            \"SIR\": 19.2392,\n            \"SAR\": 14.6142,\n            \"ISR\": 22.2657\n          }\n    
    }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.80261,\n            \"SIR\": 20.9007,\n            \"SAR\": 9.67106,\n            \"ISR\": 14.4635\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7152,\n            \"SIR\": 20.4592,\n            \"SAR\": 16.4465,\n            \"ISR\": 27.4427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.6711,\n            \"SIR\": 18.0848,\n            \"SAR\": 6.89336,\n            \"ISR\": 11.3345\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3313,\n            \"SIR\": 18.9451,\n            \"SAR\": 15.6149,\n            \"ISR\": 25.2544\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.46491,\n            \"SIR\": 17.8794,\n            \"SAR\": 7.8588,\n            \"ISR\": 11.6634\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9786,\n            \"SIR\": 17.6716,\n            \"SAR\": 15.4965,\n            \"ISR\": 20.0721\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.5194,\n            \"SIR\": 15.8413,\n            \"SAR\": 2.27776,\n            \"ISR\": 6.87151\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1063,\n            \"SIR\": 16.7401,\n            \"SAR\": 18.3552,\n            \"ISR\": 28.7067\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.94543,\n            \"SIR\": 19.1868,\n            \"SAR\": 8.5286,\n       
     \"ISR\": 13.1133\n          },\n          \"instrumental\": {\n            \"SDR\": 13.115,\n            \"SIR\": 17.9577,\n            \"SAR\": 15.2125,\n            \"ISR\": 19.5041\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.51509,\n            \"SIR\": 19.2956,\n            \"SAR\": 6.58455,\n            \"ISR\": 10.0709\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3243,\n            \"SIR\": 16.9425,\n            \"SAR\": 15.6506,\n            \"ISR\": 20.4946\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -29.051,\n            \"SIR\": -37.8741,\n            \"SAR\": 0.70834,\n            \"ISR\": -0.90944\n          },\n          \"instrumental\": {\n            \"SDR\": 9.28536,\n            \"SIR\": 46.5568,\n            \"SAR\": 10.8375,\n            \"ISR\": 13.5086\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04273,\n            \"SIR\": 15.2185,\n            \"SAR\": 5.80012,\n            \"ISR\": 8.68761\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8014,\n            \"SIR\": 17.0753,\n            \"SAR\": 17.2208,\n            \"ISR\": 26.0686\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.06754,\n            \"SIR\": 19.8207,\n            \"SAR\": 8.18037,\n            \"ISR\": 11.5825\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4792,\n            \"SIR\": 17.1294,\n            \"SAR\": 16.4137,\n            \"ISR\": 19.8525\n          }\n        }\n      },\n 
     {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2755,\n            \"SIR\": 19.3087,\n            \"SAR\": 8.95044,\n            \"ISR\": 12.7896\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8133,\n            \"SIR\": 22.126,\n            \"SAR\": 19.4998,\n            \"ISR\": 28.5727\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.07514,\n            \"SIR\": 16.1756,\n            \"SAR\": 2.04702,\n            \"ISR\": 7.39154\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2322,\n            \"SIR\": 17.1006,\n            \"SAR\": 16.5759,\n            \"ISR\": 18.3298\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.32322,\n            \"SIR\": 12.0928,\n            \"SAR\": 3.08007,\n            \"ISR\": 5.47207\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2374,\n            \"SIR\": 12.2337,\n            \"SAR\": 15.0952,\n            \"ISR\": 18.8107\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.61471,\n            \"SIR\": 16.8314,\n            \"SAR\": 5.69008,\n            \"ISR\": 9.00259\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4581,\n            \"SIR\": 19.8594,\n            \"SAR\": 17.7709,\n            \"ISR\": 21.6855\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.09813,\n            \"SIR\": 15.6501,\n            \"SAR\": 6.25027,\n            
\"ISR\": 9.83092\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9968,\n            \"SIR\": 17.8731,\n            \"SAR\": 16.9864,\n            \"ISR\": 25.6636\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.87252,\n            \"SIR\": 15.7198,\n            \"SAR\": 7.31811,\n            \"ISR\": 12.3802\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6735,\n            \"SIR\": 17.5715,\n            \"SAR\": 14.2639,\n            \"ISR\": 20.1323\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.93513,\n            \"SIR\": 18.9585,\n            \"SAR\": 8.71966,\n            \"ISR\": 11.6686\n          },\n          \"instrumental\": {\n            \"SDR\": 9.82902,\n            \"SIR\": 13.2975,\n            \"SAR\": 12.0334,\n            \"ISR\": 21.8169\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.32691,\n            \"SIR\": 12.8676,\n            \"SAR\": 3.97944,\n            \"ISR\": 7.63503\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4932,\n            \"SIR\": 12.5298,\n            \"SAR\": 13.4283,\n            \"ISR\": 22.0864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.16301,\n            \"SIR\": 21.7809,\n            \"SAR\": 9.34315,\n            \"ISR\": 12.73\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7761,\n            \"SIR\": 17.238,\n            \"SAR\": 15.8078,\n            \"ISR\": 17.7607\n          }\n        }\n      },\n      {\n  
      \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4597,\n            \"SIR\": 23.1536,\n            \"SAR\": 10.6549,\n            \"ISR\": 13.5269\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1292,\n            \"SIR\": 12.995,\n            \"SAR\": 10.817,\n            \"ISR\": 24.6006\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.00818,\n            \"SIR\": 20.0705,\n            \"SAR\": 10.0386,\n            \"ISR\": 13.1714\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5563,\n            \"SIR\": 15.5264,\n            \"SAR\": 14.0823,\n            \"ISR\": 17.0441\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79125,\n            \"SIR\": 29.4913,\n            \"SAR\": 10.4872,\n            \"ISR\": 15.0597\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0196,\n            \"SIR\": 22.6459,\n            \"SAR\": 19.8371,\n            \"ISR\": 19.6068\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.30705,\n            \"SIR\": 8.59699,\n            \"SAR\": 6.80478,\n            \"ISR\": 9.91799\n          },\n          \"instrumental\": {\n            \"SDR\": 9.25459,\n            \"SIR\": 15.055,\n            \"SAR\": 12.046,\n            \"ISR\": 14.839\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5074,\n            \"SIR\": 26.3587,\n            \"SAR\": 13.049,\n            \"ISR\": 19.1806\n          },\n          \"instrumental\": {\n            
\"SDR\": 20.8636,\n            \"SIR\": 26.645,\n            \"SAR\": 21.9877,\n            \"ISR\": 29.1322\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2598,\n            \"SIR\": 19.377,\n            \"SAR\": 11.357,\n            \"ISR\": 17.4191\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1679,\n            \"SIR\": 15.5676,\n            \"SAR\": 11.0068,\n            \"ISR\": 18.8198\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.19851,\n            \"SIR\": 17.4315,\n            \"SAR\": 6.44975,\n            \"ISR\": 10.4084\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2299,\n            \"SIR\": 15.1968,\n            \"SAR\": 13.8059,\n            \"ISR\": 24.1121\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.51984,\n            \"SIR\": 11.638,\n            \"SAR\": 4.68858,\n            \"ISR\": 11.2223\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7783,\n            \"SIR\": 21.6232,\n            \"SAR\": 16.6126,\n            \"ISR\": 17.9909\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84301,\n            \"SIR\": 23.6197,\n            \"SAR\": 8.17242,\n            \"ISR\": 12.3188\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9459,\n            \"SIR\": 19.1077,\n            \"SAR\": 17.2316,\n            \"ISR\": 30.1052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.15174,\n            \"SIR\": 25.4632,\n            \"SAR\": 8.93095,\n            \"ISR\": 12.3378\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1115,\n            \"SIR\": 20.6906,\n            \"SAR\": 18.7004,\n            \"ISR\": 24.8482\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.02677,\n            \"SIR\": 23.2843,\n            \"SAR\": 8.57835,\n            \"ISR\": 11.4552\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8811,\n            \"SIR\": 15.1211,\n            \"SAR\": 14.086,\n            \"ISR\": 21.8536\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.6846,\n            \"SIR\": 20.7786,\n            \"SAR\": 8.4539,\n            \"ISR\": 12.2047\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2158,\n            \"SIR\": 18.655,\n            \"SAR\": 16.7755,\n            \"ISR\": 28.0416\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.84301,\n        \"SIR\": 18.9585,\n        \"SAR\": 8.17242,\n        \"ISR\": 12.0276\n      },\n      \"instrumental\": {\n        \"SDR\": 13.3243,\n        \"SIR\": 17.5715,\n        \"SAR\": 15.6149,\n        \"ISR\": 20.6964\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"14_SP-UVR-4B-44100-2.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 14_SP-UVR-4B-44100-2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.5248,\n            \"SIR\": 13.5222,\n            
\"SAR\": 4.46966,\n            \"ISR\": 8.8485\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8167,\n            \"SIR\": 19.8877,\n            \"SAR\": 18.6056,\n            \"ISR\": 18.3775\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.50422,\n            \"SIR\": 9.35119,\n            \"SAR\": 5.2104,\n            \"ISR\": 12.7287\n          },\n          \"instrumental\": {\n            \"SDR\": 9.81308,\n            \"SIR\": 18.0013,\n            \"SAR\": 11.5798,\n            \"ISR\": 13.6069\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.31222,\n            \"SIR\": 19.9216,\n            \"SAR\": 8.41086,\n            \"ISR\": 14.3203\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4006,\n            \"SIR\": 20.8796,\n            \"SAR\": 16.716,\n            \"ISR\": 22.6893\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.66562,\n            \"SIR\": 7.74842,\n            \"SAR\": 3.72953,\n            \"ISR\": 12.3266\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8215,\n            \"SIR\": 20.4704,\n            \"SAR\": 13.638,\n            \"ISR\": 18.2074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87923,\n            \"SIR\": 19.9102,\n            \"SAR\": 9.87533,\n            \"ISR\": 13.2056\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6597,\n            \"SIR\": 15.1439,\n            \"SAR\": 13.6654,\n            \"ISR\": 21.7636\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.64272,\n            \"SIR\": 15.9929,\n            \"SAR\": 9.35723,\n            \"ISR\": 12.9232\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8569,\n            \"SIR\": 15.1289,\n            \"SAR\": 13.0045,\n            \"ISR\": 19.6896\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.70688,\n            \"SIR\": 18.2884,\n            \"SAR\": 10.366,\n            \"ISR\": 14.1405\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3108,\n            \"SIR\": 18.7936,\n            \"SAR\": 16.0897,\n            \"ISR\": 23.6964\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.93587,\n            \"SIR\": 21.7611,\n            \"SAR\": 6.80393,\n            \"ISR\": 9.73137\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0138,\n            \"SIR\": 20.7423,\n            \"SAR\": 21.3503,\n            \"ISR\": 21.4343\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92399,\n            \"SIR\": 25.9582,\n            \"SAR\": 9.8791,\n            \"ISR\": 12.5412\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1403,\n            \"SIR\": 16.0969,\n            \"SAR\": 15.1034,\n            \"ISR\": 18.814\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.89648,\n            \"SIR\": 17.8489,\n            \"SAR\": 7.56405,\n            \"ISR\": 10.1354\n          },\n      
    \"instrumental\": {\n            \"SDR\": 13.0456,\n            \"SIR\": 13.8515,\n            \"SAR\": 13.9523,\n            \"ISR\": 24.3331\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1654,\n            \"SIR\": 23.8652,\n            \"SAR\": 10.5501,\n            \"ISR\": 16.4109\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0762,\n            \"SIR\": 19.7323,\n            \"SAR\": 14.6128,\n            \"ISR\": 22.4236\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87981,\n            \"SIR\": 21.0161,\n            \"SAR\": 9.57051,\n            \"ISR\": 13.7883\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8092,\n            \"SIR\": 19.9202,\n            \"SAR\": 16.2827,\n            \"ISR\": 27.503\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.25203,\n            \"SIR\": 18.8013,\n            \"SAR\": 7.45612,\n            \"ISR\": 12.4327\n          },\n          \"instrumental\": {\n            \"SDR\": 13.614,\n            \"SIR\": 18.0469,\n            \"SAR\": 15.2312,\n            \"ISR\": 25.1401\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.14154,\n            \"SIR\": 19.0885,\n            \"SAR\": 8.55902,\n            \"ISR\": 12.854\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4857,\n            \"SIR\": 18.4205,\n            \"SAR\": 15.2542,\n            \"ISR\": 21.2744\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.88173,\n            \"SIR\": 15.9683,\n            \"SAR\": 2.64208,\n            \"ISR\": 7.27436\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2268,\n            \"SIR\": 17.1532,\n            \"SAR\": 18.3193,\n            \"ISR\": 28.7752\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04085,\n            \"SIR\": 18.7211,\n            \"SAR\": 8.54688,\n            \"ISR\": 13.59\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2073,\n            \"SIR\": 18.4398,\n            \"SAR\": 15.1521,\n            \"ISR\": 20.052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.77572,\n            \"SIR\": 22.2527,\n            \"SAR\": 6.59366,\n            \"ISR\": 10.2828\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6398,\n            \"SIR\": 16.9955,\n            \"SAR\": 15.9125,\n            \"ISR\": 21.715\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -17.4213,\n            \"SIR\": -34.6844,\n            \"SAR\": 1.38369,\n            \"ISR\": -1.08296\n          },\n          \"instrumental\": {\n            \"SDR\": 9.81801,\n            \"SIR\": 45.1931,\n            \"SAR\": 11.1872,\n            \"ISR\": 13.8325\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.27353,\n            \"SIR\": 13.9943,\n            \"SAR\": 5.86874,\n            \"ISR\": 9.35191\n          },\n          
\"instrumental\": {\n            \"SDR\": 15.746,\n            \"SIR\": 17.7356,\n            \"SAR\": 17.3166,\n            \"ISR\": 24.8408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.36012,\n            \"SIR\": 19.2626,\n            \"SAR\": 8.51897,\n            \"ISR\": 12.1376\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6585,\n            \"SIR\": 17.7544,\n            \"SAR\": 16.5072,\n            \"ISR\": 19.6113\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.8139,\n            \"SIR\": 19.1961,\n            \"SAR\": 9.83911,\n            \"ISR\": 13.7534\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1823,\n            \"SIR\": 21.7665,\n            \"SAR\": 18.9064,\n            \"ISR\": 28.0227\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.72719,\n            \"SIR\": 18.3205,\n            \"SAR\": 5.55217,\n            \"ISR\": 8.45251\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5494,\n            \"SIR\": 16.8988,\n            \"SAR\": 16.1755,\n            \"ISR\": 18.2756\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.38323,\n            \"SIR\": 12.5471,\n            \"SAR\": 3.03811,\n            \"ISR\": 5.56346\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2942,\n            \"SIR\": 12.3265,\n            \"SAR\": 15.0819,\n            \"ISR\": 19.1564\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.34138,\n            \"SIR\": 20.1964,\n            \"SAR\": 6.7755,\n            \"ISR\": 9.95848\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1747,\n            \"SIR\": 16.1604,\n            \"SAR\": 16.1362,\n            \"ISR\": 21.7528\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.11604,\n            \"SIR\": 16.1178,\n            \"SAR\": 6.24741,\n            \"ISR\": 10.0044\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0279,\n            \"SIR\": 17.9615,\n            \"SAR\": 16.8884,\n            \"ISR\": 26.0156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.46436,\n            \"SIR\": 16.1781,\n            \"SAR\": 7.641,\n            \"ISR\": 12.6534\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5482,\n            \"SIR\": 17.6419,\n            \"SAR\": 14.1705,\n            \"ISR\": 20.2154\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.92851,\n            \"SIR\": 18.7334,\n            \"SAR\": 8.68737,\n            \"ISR\": 11.6045\n          },\n          \"instrumental\": {\n            \"SDR\": 9.75899,\n            \"SIR\": 13.1036,\n            \"SAR\": 11.9679,\n            \"ISR\": 21.5514\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.00262,\n            \"SIR\": 13.8508,\n            \"SAR\": 4.39134,\n            \"ISR\": 8.26591\n          },\n         
 \"instrumental\": {\n            \"SDR\": 10.1498,\n            \"SIR\": 12.2515,\n            \"SAR\": 12.7174,\n            \"ISR\": 21.3025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.17821,\n            \"SIR\": 21.4578,\n            \"SAR\": 9.43296,\n            \"ISR\": 12.9406\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9526,\n            \"SIR\": 17.6838,\n            \"SAR\": 15.6887,\n            \"ISR\": 17.6637\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3582,\n            \"SIR\": 22.5246,\n            \"SAR\": 10.5035,\n            \"ISR\": 13.4544\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3707,\n            \"SIR\": 13.2049,\n            \"SAR\": 10.7477,\n            \"ISR\": 23.9461\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.08221,\n            \"SIR\": 19.4725,\n            \"SAR\": 10.028,\n            \"ISR\": 13.4082\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7671,\n            \"SIR\": 15.8295,\n            \"SAR\": 14.0745,\n            \"ISR\": 17.8234\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.71127,\n            \"SIR\": 30.2996,\n            \"SAR\": 10.4048,\n            \"ISR\": 14.3588\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9789,\n            \"SIR\": 22.1168,\n            \"SAR\": 19.7453,\n            \"ISR\": 19.6412\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n      
    \"vocals\": {\n            \"SDR\": 0.75369,\n            \"SIR\": 9.89789,\n            \"SAR\": 4.42746,\n            \"ISR\": 7.657\n          },\n          \"instrumental\": {\n            \"SDR\": 8.85454,\n            \"SIR\": 12.4224,\n            \"SAR\": 10.9395,\n            \"ISR\": 15.9617\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7338,\n            \"SIR\": 27.9152,\n            \"SAR\": 14.0307,\n            \"ISR\": 20.0397\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7726,\n            \"SIR\": 26.7053,\n            \"SAR\": 20.5342,\n            \"ISR\": 30.8862\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4956,\n            \"SIR\": 19.617,\n            \"SAR\": 11.3343,\n            \"ISR\": 17.2533\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1916,\n            \"SIR\": 15.2461,\n            \"SAR\": 11.236,\n            \"ISR\": 19.2265\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.31051,\n            \"SIR\": 17.0441,\n            \"SAR\": 6.48147,\n            \"ISR\": 11.0137\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2954,\n            \"SIR\": 15.588,\n            \"SAR\": 13.6196,\n            \"ISR\": 23.47\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.77356,\n            \"SIR\": 12.6533,\n            \"SAR\": 5.00826,\n            \"ISR\": 11.6876\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8303,\n            \"SIR\": 
20.5646,\n            \"SAR\": 16.2602,\n            \"ISR\": 17.8629\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.01294,\n            \"SIR\": 22.6827,\n            \"SAR\": 8.21141,\n            \"ISR\": 12.8724\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1509,\n            \"SIR\": 19.7286,\n            \"SAR\": 17.1011,\n            \"ISR\": 29.1674\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.19851,\n            \"SIR\": 24.9079,\n            \"SAR\": 9.1992,\n            \"ISR\": 12.7973\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8754,\n            \"SIR\": 20.7992,\n            \"SAR\": 18.5041,\n            \"ISR\": 24.6454\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.05843,\n            \"SIR\": 23.7943,\n            \"SAR\": 8.59817,\n            \"ISR\": 11.5114\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7621,\n            \"SIR\": 15.0864,\n            \"SAR\": 14.1552,\n            \"ISR\": 22.0227\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.59345,\n            \"SIR\": 20.7982,\n            \"SAR\": 8.29149,\n            \"ISR\": 12.0623\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2786,\n            \"SIR\": 18.6844,\n            \"SAR\": 16.8824,\n            \"ISR\": 28.1603\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.01294,\n        \"SIR\": 19.0885,\n        \"SAR\": 
8.29149,\n        \"ISR\": 12.4327\n      },\n      \"instrumental\": {\n        \"SDR\": 13.5482,\n        \"SIR\": 17.7544,\n        \"SAR\": 15.2542,\n        \"ISR\": 21.5514\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"15_SP-UVR-MID-44100-1.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 15_SP-UVR-MID-44100-1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.04636,\n            \"SIR\": 14.85,\n            \"SAR\": 3.83909,\n            \"ISR\": 7.84627\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8345,\n            \"SIR\": 19.119,\n            \"SAR\": 19.62,\n            \"ISR\": 19.9563\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.03427,\n            \"SIR\": 10.1512,\n            \"SAR\": 5.33985,\n            \"ISR\": 12.5775\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3835,\n            \"SIR\": 17.9081,\n            \"SAR\": 12.3015,\n            \"ISR\": 14.015\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.99112,\n            \"SIR\": 21.4839,\n            \"SAR\": 8.19655,\n            \"ISR\": 13.376\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5252,\n            \"SIR\": 19.7386,\n            \"SAR\": 16.5328,\n            \"ISR\": 19.8799\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.6814,\n            \"SIR\": 8.03735,\n            \"SAR\": 2.83774,\n            \"ISR\": 11.2194\n          },\n          
\"instrumental\": {\n            \"SDR\": 12.4191,\n            \"SIR\": 19.3799,\n            \"SAR\": 14.1766,\n            \"ISR\": 19.2965\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1383,\n            \"SIR\": 21.7378,\n            \"SAR\": 9.52697,\n            \"ISR\": 12.0016\n          },\n          \"instrumental\": {\n            \"SDR\": 12.966,\n            \"SIR\": 13.9437,\n            \"SAR\": 13.7839,\n            \"ISR\": 23.605\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.63794,\n            \"SIR\": 16.6805,\n            \"SAR\": 9.28839,\n            \"ISR\": 12.9153\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8976,\n            \"SIR\": 14.9523,\n            \"SAR\": 12.9844,\n            \"ISR\": 20.7225\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1303,\n            \"SIR\": 19.6198,\n            \"SAR\": 11.1093,\n            \"ISR\": 13.9669\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5365,\n            \"SIR\": 18.0006,\n            \"SAR\": 16.6218,\n            \"ISR\": 25.4292\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.67131,\n            \"SIR\": 21.9459,\n            \"SAR\": 6.52327,\n            \"ISR\": 8.67098\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0086,\n            \"SIR\": 20.4128,\n            \"SAR\": 21.8534,\n            \"ISR\": 21.4844\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n   
     \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.87081,\n            \"SIR\": 26.466,\n            \"SAR\": 10.0688,\n            \"ISR\": 12.1843\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2049,\n            \"SIR\": 15.5198,\n            \"SAR\": 15.3823,\n            \"ISR\": 19.8869\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.18581,\n            \"SIR\": 19.1368,\n            \"SAR\": 7.92444,\n            \"ISR\": 10.3195\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0956,\n            \"SIR\": 14.0175,\n            \"SAR\": 14.2788,\n            \"ISR\": 25.2758\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4572,\n            \"SIR\": 23.6427,\n            \"SAR\": 11.2548,\n            \"ISR\": 17.0547\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3908,\n            \"SIR\": 20.3204,\n            \"SAR\": 15.1459,\n            \"ISR\": 22.239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.70679,\n            \"SIR\": 21.7597,\n            \"SAR\": 9.42591,\n            \"ISR\": 13.0422\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5007,\n            \"SIR\": 19.1565,\n            \"SAR\": 16.1324,\n            \"ISR\": 28.4553\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.25525,\n            \"SIR\": 19.6589,\n            \"SAR\": 7.16343,\n            \"ISR\": 11.5635\n          },\n          \"instrumental\": 
{\n            \"SDR\": 13.1225,\n            \"SIR\": 16.846,\n            \"SAR\": 15.0242,\n            \"ISR\": 26.1152\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.15397,\n            \"SIR\": 22.7034,\n            \"SAR\": 8.8693,\n            \"ISR\": 11.9382\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9533,\n            \"SIR\": 17.6081,\n            \"SAR\": 15.73,\n            \"ISR\": 19.3107\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.04418,\n            \"SIR\": 20.063,\n            \"SAR\": 3.63063,\n            \"ISR\": 6.31967\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9942,\n            \"SIR\": 14.8376,\n            \"SAR\": 17.6322,\n            \"ISR\": 29.0027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.96859,\n            \"SIR\": 19.6143,\n            \"SAR\": 8.50803,\n            \"ISR\": 12.9257\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0289,\n            \"SIR\": 17.5142,\n            \"SAR\": 15.0729,\n            \"ISR\": 19.389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.58093,\n            \"SIR\": 23.2464,\n            \"SAR\": 6.40487,\n            \"ISR\": 9.69841\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2752,\n            \"SIR\": 16.008,\n            \"SAR\": 15.5397,\n            \"ISR\": 19.8241\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": -0.017675,\n            \"SIR\": -26.9589,\n            \"SAR\": 0.06112,\n            \"ISR\": -1.50838\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8414,\n            \"SIR\": 43.8834,\n            \"SAR\": 13.9884,\n            \"ISR\": 19.2608\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.76853,\n            \"SIR\": 16.4682,\n            \"SAR\": 5.2186,\n            \"ISR\": 8.01715\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6622,\n            \"SIR\": 16.4191,\n            \"SAR\": 17.0878,\n            \"ISR\": 28.397\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.10637,\n            \"SIR\": 20.0302,\n            \"SAR\": 8.58733,\n            \"ISR\": 11.201\n          },\n          \"instrumental\": {\n            \"SDR\": 13.296,\n            \"SIR\": 16.4908,\n            \"SAR\": 16.6643,\n            \"ISR\": 19.7079\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.26583,\n            \"SIR\": 22.9489,\n            \"SAR\": 10.0914,\n            \"ISR\": 11.8722\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8583,\n            \"SIR\": 17.847,\n            \"SAR\": 17.7752,\n            \"ISR\": 30.4277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.15381,\n            \"SIR\": 21.8247,\n            \"SAR\": 7.08072,\n            \"ISR\": 10.013\n          },\n          \"instrumental\": {\n            
\"SDR\": 13.3933,\n            \"SIR\": 15.7059,\n            \"SAR\": 17.3492,\n            \"ISR\": 19.3205\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.32531,\n            \"SIR\": 14.1833,\n            \"SAR\": 2.64588,\n            \"ISR\": 4.92155\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2704,\n            \"SIR\": 11.7177,\n            \"SAR\": 15.7355,\n            \"ISR\": 18.9879\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.94258,\n            \"SIR\": 22.4066,\n            \"SAR\": 5.70052,\n            \"ISR\": 8.67636\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7886,\n            \"SIR\": 17.7085,\n            \"SAR\": 17.3128,\n            \"ISR\": 22.9239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.10597,\n            \"SIR\": 17.9811,\n            \"SAR\": 6.14729,\n            \"ISR\": 9.48078\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0803,\n            \"SIR\": 17.4199,\n            \"SAR\": 16.9376,\n            \"ISR\": 27.7988\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.08939,\n            \"SIR\": 16.8874,\n            \"SAR\": 7.18521,\n            \"ISR\": 12.1316\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0169,\n            \"SIR\": 16.8575,\n            \"SAR\": 14.306,\n            \"ISR\": 17.4292\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 7.43649,\n            \"SIR\": 19.1496,\n            \"SAR\": 8.24094,\n            \"ISR\": 10.3074\n          },\n          \"instrumental\": {\n            \"SDR\": 9.20312,\n            \"SIR\": 11.8684,\n            \"SAR\": 11.8405,\n            \"ISR\": 21.9732\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.55491,\n            \"SIR\": 12.6856,\n            \"SAR\": 3.93544,\n            \"ISR\": 7.43787\n          },\n          \"instrumental\": {\n            \"SDR\": 9.74664,\n            \"SIR\": 11.9224,\n            \"SAR\": 12.6348,\n            \"ISR\": 21.1574\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.53246,\n            \"SIR\": 22.3304,\n            \"SAR\": 9.25947,\n            \"ISR\": 11.758\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5985,\n            \"SIR\": 16.2246,\n            \"SAR\": 15.9804,\n            \"ISR\": 18.0693\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4681,\n            \"SIR\": 24.63,\n            \"SAR\": 10.5415,\n            \"ISR\": 13.0635\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8558,\n            \"SIR\": 12.3995,\n            \"SAR\": 10.8567,\n            \"ISR\": 26.4926\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.6245,\n            \"SIR\": 20.9933,\n            \"SAR\": 9.96015,\n            \"ISR\": 12.3554\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6471,\n            \"SIR\": 
14.8636,\n            \"SAR\": 14.4255,\n            \"ISR\": 17.9366\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.85662,\n            \"SIR\": 28.7719,\n            \"SAR\": 10.523,\n            \"ISR\": 14.501\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3752,\n            \"SIR\": 22.2987,\n            \"SAR\": 19.0341,\n            \"ISR\": 19.4875\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.9205,\n            \"SIR\": 11.3297,\n            \"SAR\": 7.30537,\n            \"ISR\": 9.20943\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6013,\n            \"SIR\": 14.4046,\n            \"SAR\": 13.5113,\n            \"ISR\": 17.0101\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7573,\n            \"SIR\": 28.7101,\n            \"SAR\": 14.3598,\n            \"ISR\": 19.8549\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2899,\n            \"SIR\": 26.176,\n            \"SAR\": 19.9787,\n            \"ISR\": 24.6239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0834,\n            \"SIR\": 21.0256,\n            \"SAR\": 11.167,\n            \"ISR\": 15.8358\n          },\n          \"instrumental\": {\n            \"SDR\": 9.31446,\n            \"SIR\": 13.3623,\n            \"SAR\": 10.8209,\n            \"ISR\": 20.325\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.9831,\n            \"SIR\": 
17.0047,\n            \"SAR\": 6.36112,\n            \"ISR\": 9.70996\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1064,\n            \"SIR\": 14.7686,\n            \"SAR\": 13.8125,\n            \"ISR\": 24.4958\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.24717,\n            \"SIR\": 15.2307,\n            \"SAR\": 5.72997,\n            \"ISR\": 10.2738\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5056,\n            \"SIR\": 19.3122,\n            \"SAR\": 17.8237,\n            \"ISR\": 18.9927\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.35463,\n            \"SIR\": 24.6351,\n            \"SAR\": 9.71368,\n            \"ISR\": 14.2319\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5978,\n            \"SIR\": 20.2629,\n            \"SAR\": 17.6386,\n            \"ISR\": 30.9372\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.85196,\n            \"SIR\": 25.0897,\n            \"SAR\": 9.19933,\n            \"ISR\": 11.9327\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2559,\n            \"SIR\": 20.3272,\n            \"SAR\": 18.3855,\n            \"ISR\": 24.2826\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.24832,\n            \"SIR\": 24.3849,\n            \"SAR\": 8.88159,\n            \"ISR\": 11.5826\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9588,\n            \"SIR\": 15.1944,\n            \"SAR\": 14.4311,\n            \"ISR\": 
22.3589\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.09862,\n            \"SIR\": 20.0806,\n            \"SAR\": 8.14292,\n            \"ISR\": 11.3661\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0197,\n            \"SIR\": 17.7861,\n            \"SAR\": 16.5663,\n            \"ISR\": 26.9919\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.53246,\n        \"SIR\": 20.0806,\n        \"SAR\": 8.19655,\n        \"ISR\": 11.5826\n      },\n      \"instrumental\": {\n        \"SDR\": 13.1225,\n        \"SIR\": 16.8575,\n        \"SAR\": 15.73,\n        \"ISR\": 21.1574\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"16_SP-UVR-MID-44100-2.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 16_SP-UVR-MID-44100-2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.05476,\n            \"SIR\": 14.7077,\n            \"SAR\": 3.82144,\n            \"ISR\": 7.84507\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1911,\n            \"SIR\": 19.3774,\n            \"SAR\": 19.53,\n            \"ISR\": 18.7747\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.14488,\n            \"SIR\": 10.564,\n            \"SAR\": 5.30249,\n            \"ISR\": 12.2981\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8672,\n            \"SIR\": 17.6631,\n            \"SAR\": 12.363,\n            \"ISR\": 14.2162\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n  
        \"vocals\": {\n            \"SDR\": 7.95012,\n            \"SIR\": 21.8536,\n            \"SAR\": 8.01042,\n            \"ISR\": 13.1692\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7461,\n            \"SIR\": 19.7277,\n            \"SAR\": 16.6219,\n            \"ISR\": 19.9237\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.83639,\n            \"SIR\": 7.99277,\n            \"SAR\": 3.15233,\n            \"ISR\": 11.5308\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4194,\n            \"SIR\": 19.6512,\n            \"SAR\": 14.0752,\n            \"ISR\": 19.1012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1361,\n            \"SIR\": 21.7287,\n            \"SAR\": 9.60373,\n            \"ISR\": 12.1769\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0007,\n            \"SIR\": 14.2202,\n            \"SAR\": 13.8308,\n            \"ISR\": 24.3814\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.65064,\n            \"SIR\": 16.5965,\n            \"SAR\": 9.26727,\n            \"ISR\": 12.9055\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9311,\n            \"SIR\": 15.0176,\n            \"SAR\": 12.9826,\n            \"ISR\": 20.5673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1991,\n            \"SIR\": 19.5123,\n            \"SAR\": 11.0645,\n            \"ISR\": 14.1953\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6576,\n            \"SIR\": 18.2434,\n 
           \"SAR\": 16.6393,\n            \"ISR\": 25.3089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.05712,\n            \"SIR\": 21.0804,\n            \"SAR\": 6.34704,\n            \"ISR\": 9.15949\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5455,\n            \"SIR\": 21.0241,\n            \"SAR\": 21.91,\n            \"ISR\": 21.4503\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.82975,\n            \"SIR\": 26.5625,\n            \"SAR\": 10.0465,\n            \"ISR\": 12.1293\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2174,\n            \"SIR\": 15.4893,\n            \"SAR\": 15.3898,\n            \"ISR\": 19.9156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.97059,\n            \"SIR\": 19.3854,\n            \"SAR\": 7.792,\n            \"ISR\": 10.2318\n          },\n          \"instrumental\": {\n            \"SDR\": 13.091,\n            \"SIR\": 13.99,\n            \"SAR\": 14.2777,\n            \"ISR\": 25.6792\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5424,\n            \"SIR\": 23.1182,\n            \"SAR\": 11.1848,\n            \"ISR\": 17.2389\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3612,\n            \"SIR\": 20.5086,\n            \"SAR\": 15.0249,\n            \"ISR\": 21.9469\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n           
 \"SDR\": 9.8928,\n            \"SIR\": 21.4305,\n            \"SAR\": 9.60206,\n            \"ISR\": 13.5589\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6325,\n            \"SIR\": 19.6522,\n            \"SAR\": 16.3226,\n            \"ISR\": 28.0676\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.82749,\n            \"SIR\": 19.196,\n            \"SAR\": 6.84891,\n            \"ISR\": 11.0321\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3009,\n            \"SIR\": 17.0932,\n            \"SAR\": 15.276,\n            \"ISR\": 26.1144\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.57525,\n            \"SIR\": 21.151,\n            \"SAR\": 8.08551,\n            \"ISR\": 11.508\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2093,\n            \"SIR\": 18.4946,\n            \"SAR\": 17.9074,\n            \"ISR\": 19.2926\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.80268,\n            \"SIR\": 19.7704,\n            \"SAR\": 3.38626,\n            \"ISR\": 6.39995\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3486,\n            \"SIR\": 15.3605,\n            \"SAR\": 17.7346,\n            \"ISR\": 28.9332\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79887,\n            \"SIR\": 19.6485,\n            \"SAR\": 8.32583,\n            \"ISR\": 12.7485\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9781,\n            \"SIR\": 17.4973,\n            \"SAR\": 
15.0503,\n            \"ISR\": 19.2158\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.61559,\n            \"SIR\": 23.2557,\n            \"SAR\": 6.45193,\n            \"ISR\": 9.66374\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3184,\n            \"SIR\": 16.0149,\n            \"SAR\": 15.547,\n            \"ISR\": 19.8151\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.13655,\n            \"SIR\": -29.1205,\n            \"SAR\": 0.12796,\n            \"ISR\": -1.16684\n          },\n          \"instrumental\": {\n            \"SDR\": 12.189,\n            \"SIR\": 44.4709,\n            \"SAR\": 13.5854,\n            \"ISR\": 18.2335\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.59345,\n            \"SIR\": 16.701,\n            \"SAR\": 4.99651,\n            \"ISR\": 7.80856\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1007,\n            \"SIR\": 16.3044,\n            \"SAR\": 17.007,\n            \"ISR\": 28.6554\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.06904,\n            \"SIR\": 19.7876,\n            \"SAR\": 8.45608,\n            \"ISR\": 11.273\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2783,\n            \"SIR\": 16.7505,\n            \"SAR\": 16.4972,\n            \"ISR\": 19.6366\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.77863,\n    
        \"SIR\": 21.174,\n            \"SAR\": 7.43477,\n            \"ISR\": 10.8253\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6009,\n            \"SIR\": 19.2379,\n            \"SAR\": 19.5835,\n            \"ISR\": 30.3146\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.09053,\n            \"SIR\": 12.2171,\n            \"SAR\": 0.78871,\n            \"ISR\": 5.18396\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8861,\n            \"SIR\": 20.4292,\n            \"SAR\": 20.5322,\n            \"ISR\": 19.2904\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.30994,\n            \"SIR\": 14.0454,\n            \"SAR\": 2.61719,\n            \"ISR\": 4.9673\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3195,\n            \"SIR\": 11.7643,\n            \"SAR\": 15.6419,\n            \"ISR\": 18.9673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.89648,\n            \"SIR\": 21.9296,\n            \"SAR\": 6.12896,\n            \"ISR\": 9.416\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8766,\n            \"SIR\": 17.8881,\n            \"SAR\": 16.672,\n            \"ISR\": 22.8106\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.10146,\n            \"SIR\": 17.8299,\n            \"SAR\": 6.15102,\n            \"ISR\": 9.53275\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1199,\n            \"SIR\": 17.4712,\n            \"SAR\": 16.9302,\n         
   \"ISR\": 27.6332\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.75647,\n            \"SIR\": 16.3752,\n            \"SAR\": 6.82834,\n            \"ISR\": 11.9032\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4518,\n            \"SIR\": 17.4073,\n            \"SAR\": 14.624,\n            \"ISR\": 17.916\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.44674,\n            \"SIR\": 19.1068,\n            \"SAR\": 8.27371,\n            \"ISR\": 10.4903\n          },\n          \"instrumental\": {\n            \"SDR\": 9.19093,\n            \"SIR\": 12.132,\n            \"SAR\": 11.8309,\n            \"ISR\": 21.8651\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.42356,\n            \"SIR\": 13.2458,\n            \"SAR\": 3.53249,\n            \"ISR\": 7.23216\n          },\n          \"instrumental\": {\n            \"SDR\": 9.9736,\n            \"SIR\": 12.2865,\n            \"SAR\": 12.8762,\n            \"ISR\": 22.0655\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.42881,\n            \"SIR\": 22.7991,\n            \"SAR\": 9.16099,\n            \"ISR\": 11.7041\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6543,\n            \"SIR\": 16.2558,\n            \"SAR\": 15.8203,\n            \"ISR\": 18.1911\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1782,\n            \"SIR\": 24.2228,\n 
           \"SAR\": 10.5391,\n            \"ISR\": 13.2633\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1408,\n            \"SIR\": 13.4121,\n            \"SAR\": 11.5786,\n            \"ISR\": 26.6025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.62665,\n            \"SIR\": 21.2211,\n            \"SAR\": 9.89564,\n            \"ISR\": 12.3274\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7063,\n            \"SIR\": 14.7338,\n            \"SAR\": 14.3065,\n            \"ISR\": 18.0481\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1396,\n            \"SIR\": 29.0142,\n            \"SAR\": 10.6652,\n            \"ISR\": 14.592\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3857,\n            \"SIR\": 22.1542,\n            \"SAR\": 19.1591,\n            \"ISR\": 19.5059\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.31727,\n            \"SIR\": 9.67953,\n            \"SAR\": 7.05032,\n            \"ISR\": 8.98947\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4692,\n            \"SIR\": 14.6488,\n            \"SAR\": 13.5279,\n            \"ISR\": 15.6278\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3509,\n            \"SIR\": 26.1803,\n            \"SAR\": 12.6763,\n            \"ISR\": 19.2711\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8545,\n            \"SIR\": 27.0207,\n            \"SAR\": 21.5404,\n            \"ISR\": 24.1672\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.61872,\n            \"SIR\": 20.2478,\n            \"SAR\": 10.6861,\n            \"ISR\": 15.6562\n          },\n          \"instrumental\": {\n            \"SDR\": 9.60951,\n            \"SIR\": 13.866,\n            \"SAR\": 10.9383,\n            \"ISR\": 20.3149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.97313,\n            \"SIR\": 16.1558,\n            \"SAR\": 6.32932,\n            \"ISR\": 9.75732\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0828,\n            \"SIR\": 14.8188,\n            \"SAR\": 13.7095,\n            \"ISR\": 23.6688\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.38992,\n            \"SIR\": 15.7863,\n            \"SAR\": 6.01618,\n            \"ISR\": 10.4057\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4765,\n            \"SIR\": 18.9054,\n            \"SAR\": 16.4919,\n            \"ISR\": 18.8983\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.07499,\n            \"SIR\": 24.3168,\n            \"SAR\": 9.47002,\n            \"ISR\": 14.0612\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7923,\n            \"SIR\": 20.4386,\n            \"SAR\": 17.7775,\n            \"ISR\": 30.9416\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.55129,\n            \"SIR\": 24.6962,\n            \"SAR\": 8.80227,\n            \"ISR\": 11.7298\n  
        },\n          \"instrumental\": {\n            \"SDR\": 16.3715,\n            \"SIR\": 20.5855,\n            \"SAR\": 18.6586,\n            \"ISR\": 24.1959\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.26149,\n            \"SIR\": 24.3251,\n            \"SAR\": 8.9076,\n            \"ISR\": 11.7729\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1055,\n            \"SIR\": 15.514,\n            \"SAR\": 14.5863,\n            \"ISR\": 22.3034\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.11251,\n            \"SIR\": 19.8879,\n            \"SAR\": 8.10484,\n            \"ISR\": 11.545\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9713,\n            \"SIR\": 17.9623,\n            \"SAR\": 16.5374,\n            \"ISR\": 26.8822\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.42881,\n        \"SIR\": 19.7876,\n        \"SAR\": 8.01042,\n        \"ISR\": 11.5308\n      },\n      \"instrumental\": {\n        \"SDR\": 13.3184,\n        \"SIR\": 17.4712,\n        \"SAR\": 15.6419,\n        \"ISR\": 21.4503\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"17_HP-Wind_Inst-UVR.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: 17_HP-Wind_Inst-UVR\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"no woodwinds\",\n      \"woodwinds\"\n    ],\n    \"target_stem\": \"no woodwinds\"\n  },\n  \"UVR-De-Echo-Aggressive.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-De-Echo-Aggressive by FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"no echo\",\n 
     \"echo\"\n    ],\n    \"target_stem\": \"no echo\"\n  },\n  \"UVR-De-Echo-Normal.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-De-Echo-Normal by FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"no echo\",\n      \"echo\"\n    ],\n    \"target_stem\": \"no echo\"\n  },\n  \"UVR-DeEcho-DeReverb.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-DeEcho-DeReverb by FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"no reverb\",\n      \"reverb\"\n    ],\n    \"target_stem\": \"no reverb\"\n  },\n  \"UVR-DeNoise-Lite.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-DeNoise-Lite by FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"noise\",\n      \"no noise\"\n    ],\n    \"target_stem\": \"noise\"\n  },\n  \"UVR-DeNoise.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-DeNoise by FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"noise\",\n      \"no noise\"\n    ],\n    \"target_stem\": \"noise\"\n  },\n  \"UVR-BVE-4B_SN-44100-1.pth\": {\n    \"model_name\": \"VR Arch Single Model v5: UVR-BVE-4B_SN-44100-1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.0143,\n            \"SIR\": 5.71885,\n            \"SAR\": 0.00389,\n            \"ISR\": 2.65473\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0434,\n            \"SIR\": 13.2314,\n            \"SAR\": 20.3884,\n            \"ISR\": 18.1082\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.04364,\n            \"SIR\": -0.21064,\n            \"SAR\": 1.13731,\n            \"ISR\": 4.28305\n          },\n          \"instrumental\": {\n            
\"SDR\": 5.134,\n            \"SIR\": 8.18763,\n            \"SAR\": 9.75106,\n            \"ISR\": 12.2035\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.32999,\n            \"SIR\": 13.4328,\n            \"SAR\": -0.091835,\n            \"ISR\": 1.66201\n          },\n          \"instrumental\": {\n            \"SDR\": 7.97396,\n            \"SIR\": 7.89168,\n            \"SAR\": 18.8239,\n            \"ISR\": 18.7354\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.312875,\n            \"SIR\": 8.2827,\n            \"SAR\": -1.47848,\n            \"ISR\": 1.55275\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3618,\n            \"SIR\": 10.2596,\n            \"SAR\": 18.4705,\n            \"ISR\": 29.0072\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.42935,\n            \"SIR\": 13.9102,\n            \"SAR\": -0.415185,\n            \"ISR\": 2.28143\n          },\n          \"instrumental\": {\n            \"SDR\": 3.57475,\n            \"SIR\": 3.97382,\n            \"SAR\": 12.5341,\n            \"ISR\": 24.7033\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.64078,\n            \"SIR\": 12.5758,\n            \"SAR\": 4.83381,\n            \"ISR\": 9.15469\n          },\n          \"instrumental\": {\n            \"SDR\": 7.71278,\n            \"SIR\": 10.1979,\n            \"SAR\": 9.69012,\n            \"ISR\": 14.0837\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 5.77048,\n            \"SIR\": 13.0031,\n            \"SAR\": 4.32344,\n            \"ISR\": 8.96978\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1011,\n            \"SIR\": 13.0064,\n            \"SAR\": 11.5667,\n            \"ISR\": 20.3358\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.01073,\n            \"SIR\": 17.9395,\n            \"SAR\": -0.01832,\n            \"ISR\": 0.35279\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1041,\n            \"SIR\": 10.9717,\n            \"SAR\": 34.1422,\n            \"ISR\": 19.9389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00011,\n            \"SIR\": -13.1551,\n            \"SAR\": -0.03768,\n            \"ISR\": 0.094585\n          },\n          \"instrumental\": {\n            \"SDR\": 3.638,\n            \"SIR\": 2.8822,\n            \"SAR\": 27.305,\n            \"ISR\": 19.3149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.028245,\n            \"SIR\": 3.83482,\n            \"SAR\": 0.010535,\n            \"ISR\": 1.60119\n          },\n          \"instrumental\": {\n            \"SDR\": 5.0306,\n            \"SIR\": 5.31947,\n            \"SAR\": 15.8014,\n            \"ISR\": 18.1645\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.000285,\n            \"SIR\": -7.34197,\n            \"SAR\": -0.599315,\n            \"ISR\": 0.027645\n          },\n          \"instrumental\": {\n            \"SDR\": 3.23884,\n            \"SIR\": 
2.59164,\n            \"SAR\": 37.7021,\n            \"ISR\": 20.0797\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.40459,\n            \"SIR\": 18.0403,\n            \"SAR\": 0.02858,\n            \"ISR\": 4.25442\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5983,\n            \"SIR\": 10.059,\n            \"SAR\": 13.1579,\n            \"ISR\": 23.167\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.40391,\n            \"SIR\": 13.7794,\n            \"SAR\": 0.06788,\n            \"ISR\": 4.86253\n          },\n          \"instrumental\": {\n            \"SDR\": 9.05201,\n            \"SIR\": 10.1529,\n            \"SAR\": 12.2732,\n            \"ISR\": 25.1146\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.01767,\n            \"SIR\": 5.09955,\n            \"SAR\": -0.62411,\n            \"ISR\": 0.75405\n          },\n          \"instrumental\": {\n            \"SDR\": 5.96105,\n            \"SIR\": 5.57252,\n            \"SAR\": 20.6007,\n            \"ISR\": 18.9997\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2e-05,\n            \"SIR\": -10.3795,\n            \"SAR\": 3e-05,\n            \"ISR\": 0.27913\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8787,\n            \"SIR\": 10.8785,\n            \"SAR\": 25.7465,\n            \"ISR\": 20.2301\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 5.50267,\n            \"SIR\": 14.2358,\n            \"SAR\": 4.57239,\n            \"ISR\": 9.29933\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0412,\n            \"SIR\": 13.7085,\n            \"SAR\": 12.4026,\n            \"ISR\": 17.3735\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00618,\n            \"SIR\": 12.4248,\n            \"SAR\": 0.0,\n            \"ISR\": 1.16757\n          },\n          \"instrumental\": {\n            \"SDR\": 9.40412,\n            \"SIR\": 9.00934,\n            \"SAR\": 18.8258,\n            \"ISR\": 20.5897\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -47.1281,\n            \"SIR\": -39.8072,\n            \"SAR\": 4.62926,\n            \"ISR\": -5.92184\n          },\n          \"instrumental\": {\n            \"SDR\": 3.42626,\n            \"SIR\": 39.3782,\n            \"SAR\": 6.31446,\n            \"ISR\": 5.81093\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.55326,\n            \"SIR\": 13.0587,\n            \"SAR\": 2.02905,\n            \"ISR\": 6.54441\n          },\n          \"instrumental\": {\n            \"SDR\": 11.659,\n            \"SIR\": 14.2547,\n            \"SAR\": 14.7696,\n            \"ISR\": 25.6118\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.12547,\n            \"SIR\": 14.3363,\n            \"SAR\": 0.00047,\n            \"ISR\": 1.47323\n          },\n          \"instrumental\": {\n            \"SDR\": 8.4943,\n            \"SIR\": 7.86234,\n            
\"SAR\": 19.0268,\n            \"ISR\": 19.1418\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.20213,\n            \"SIR\": 12.4245,\n            \"SAR\": 0.00065,\n            \"ISR\": 1.27989\n          },\n          \"instrumental\": {\n            \"SDR\": 9.07785,\n            \"SIR\": 7.58439,\n            \"SAR\": 21.1446,\n            \"ISR\": 19.3072\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7e-05,\n            \"SIR\": -2.77665,\n            \"SAR\": -1.71684,\n            \"ISR\": 0.793235\n          },\n          \"instrumental\": {\n            \"SDR\": 7.13026,\n            \"SIR\": 8.34966,\n            \"SAR\": 18.3008,\n            \"ISR\": 16.238\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.02167,\n            \"SIR\": 7.93674,\n            \"SAR\": 0.00422,\n            \"ISR\": 0.73612\n          },\n          \"instrumental\": {\n            \"SDR\": 7.74501,\n            \"SIR\": 7.7283,\n            \"SAR\": 25.8832,\n            \"ISR\": 19.0584\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.0,\n            \"SIR\": -2.13244,\n            \"SAR\": 0.00751,\n            \"ISR\": 1.33782\n          },\n          \"instrumental\": {\n            \"SDR\": 10.485,\n            \"SIR\": 14.1155,\n            \"SAR\": 16.39,\n            \"ISR\": 17.4384\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
0.246915,\n            \"SIR\": 10.8693,\n            \"SAR\": -0.023265,\n            \"ISR\": 1.55227\n          },\n          \"instrumental\": {\n            \"SDR\": 8.94452,\n            \"SIR\": 9.13386,\n            \"SAR\": 21.2897,\n            \"ISR\": 25.0864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -3.10508,\n            \"SIR\": -11.8376,\n            \"SAR\": 3.59199,\n            \"ISR\": 2.01564\n          },\n          \"instrumental\": {\n            \"SDR\": 2.21281,\n            \"SIR\": 4.74416,\n            \"SAR\": 7.33404,\n            \"ISR\": 6.02347\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.133265,\n            \"SIR\": 14.0034,\n            \"SAR\": -0.43571,\n            \"ISR\": 0.686895\n          },\n          \"instrumental\": {\n            \"SDR\": 3.09455,\n            \"SIR\": 2.40681,\n            \"SAR\": 19.4666,\n            \"ISR\": 18.499\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.13102,\n            \"SIR\": -3.41465,\n            \"SAR\": 0.89258,\n            \"ISR\": 4.14013\n          },\n          \"instrumental\": {\n            \"SDR\": 5.59169,\n            \"SIR\": 6.8377,\n            \"SAR\": 8.52814,\n            \"ISR\": 10.6466\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.0369,\n            \"SIR\": 8.50288,\n            \"SAR\": 0.00042,\n            \"ISR\": 2.38309\n          },\n          \"instrumental\": {\n            \"SDR\": 6.4094,\n            \"SIR\": 6.89847,\n            
\"SAR\": 15.2855,\n            \"ISR\": 18.8411\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.413795,\n            \"SIR\": 8.52527,\n            \"SAR\": -1.38646,\n            \"ISR\": 1.13693\n          },\n          \"instrumental\": {\n            \"SDR\": 0.043585,\n            \"SIR\": -0.77882,\n            \"SAR\": 9.44473,\n            \"ISR\": 16.5427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.31805,\n            \"SIR\": 11.9867,\n            \"SAR\": -0.60957,\n            \"ISR\": 2.95772\n          },\n          \"instrumental\": {\n            \"SDR\": 5.00257,\n            \"SIR\": 5.61025,\n            \"SAR\": 11.1517,\n            \"ISR\": 15.8905\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.88393,\n            \"SIR\": 27.3909,\n            \"SAR\": 0.10713,\n            \"ISR\": 4.38105\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0878,\n            \"SIR\": 11.938,\n            \"SAR\": 15.5121,\n            \"ISR\": 19.5204\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.100795,\n            \"SIR\": 0.387355,\n            \"SAR\": 0.027535,\n            \"ISR\": 1.75572\n          },\n          \"instrumental\": {\n            \"SDR\": 7.23206,\n            \"SIR\": 8.84448,\n            \"SAR\": 14.5591,\n            \"ISR\": 15.2786\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.38667,\n            \"SIR\": 18.2052,\n            
\"SAR\": -0.26025,\n            \"ISR\": 3.36353\n          },\n          \"instrumental\": {\n            \"SDR\": 7.91075,\n            \"SIR\": 8.17666,\n            \"SAR\": 13.3232,\n            \"ISR\": 20.0839\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.46791,\n            \"SIR\": 17.882,\n            \"SAR\": 0.00119,\n            \"ISR\": 1.5489\n          },\n          \"instrumental\": {\n            \"SDR\": 0.16428,\n            \"SIR\": 0.32093,\n            \"SAR\": 15.8749,\n            \"ISR\": 29.4909\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00815,\n            \"SIR\": -0.44809,\n            \"SAR\": -0.213905,\n            \"ISR\": 1.19004\n          },\n          \"instrumental\": {\n            \"SDR\": 6.70995,\n            \"SIR\": 6.15572,\n            \"SAR\": 14.9742,\n            \"ISR\": 14.7971\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.32474,\n            \"SIR\": -14.1332,\n            \"SAR\": -2.14238,\n            \"ISR\": 0.25697\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0616,\n            \"SIR\": 15.8927,\n            \"SAR\": 24.8659,\n            \"ISR\": 18.0995\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.02668,\n            \"SIR\": -1.82824,\n            \"SAR\": -0.13726,\n            \"ISR\": 0.86444\n          },\n          \"instrumental\": {\n            \"SDR\": 7.65412,\n            \"SIR\": 7.41909,\n            \"SAR\": 22.2664,\n            \"ISR\": 19.358\n          
}\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00083,\n            \"SIR\": 14.3415,\n            \"SAR\": -0.2663,\n            \"ISR\": 1.00558\n          },\n          \"instrumental\": {\n            \"SDR\": 7.58665,\n            \"SIR\": 8.01065,\n            \"SAR\": 23.6744,\n            \"ISR\": 25.0603\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.007785,\n            \"SIR\": 4.49145,\n            \"SAR\": -0.53294,\n            \"ISR\": 0.54762\n          },\n          \"instrumental\": {\n            \"SDR\": 8.96662,\n            \"SIR\": 8.24758,\n            \"SAR\": 20.1794,\n            \"ISR\": 18.2859\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.0,\n            \"SIR\": -31.3216,\n            \"SAR\": -4e-05,\n            \"ISR\": 0.02305\n          },\n          \"instrumental\": {\n            \"SDR\": 6.71564,\n            \"SIR\": 6.73194,\n            \"SAR\": 22.1283,\n            \"ISR\": 17.061\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 0.0369,\n        \"SIR\": 8.50288,\n        \"SAR\": 3e-05,\n        \"ISR\": 1.5489\n      },\n      \"instrumental\": {\n        \"SDR\": 7.71278,\n        \"SIR\": 8.17666,\n        \"SAR\": 16.39,\n        \"ISR\": 18.9997\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MGM_HIGHEND_v4.pth\": {\n    \"model_name\": \"VR Arch Single Model v4: MGM_HIGHEND_v4\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 3.90158,\n            \"SIR\": 15.9598,\n            \"SAR\": 3.37143,\n            \"ISR\": 6.56793\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4628,\n            \"SIR\": 17.3509,\n            \"SAR\": 18.4845,\n            \"ISR\": 18.8014\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.14604,\n            \"SIR\": 10.0782,\n            \"SAR\": 3.92013,\n            \"ISR\": 9.93124\n          },\n          \"instrumental\": {\n            \"SDR\": 9.75011,\n            \"SIR\": 15.2216,\n            \"SAR\": 11.5548,\n            \"ISR\": 14.6244\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.93557,\n            \"SIR\": 19.3656,\n            \"SAR\": 7.81443,\n            \"ISR\": 13.1884\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9156,\n            \"SIR\": 19.1792,\n            \"SAR\": 15.8666,\n            \"ISR\": 19.6767\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.33268,\n            \"SIR\": 6.77226,\n            \"SAR\": 3.65674,\n            \"ISR\": 11.5903\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8171,\n            \"SIR\": 19.9583,\n            \"SAR\": 13.6783,\n            \"ISR\": 17.7355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.94004,\n            \"SIR\": 20.0844,\n            \"SAR\": 8.4096,\n            \"ISR\": 11.376\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5449,\n            \"SIR\": 13.329,\n            \"SAR\": 
12.6816,\n            \"ISR\": 24.56\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.11557,\n            \"SIR\": 15.8005,\n            \"SAR\": 8.90774,\n            \"ISR\": 12.1068\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3683,\n            \"SIR\": 13.8967,\n            \"SAR\": 12.8837,\n            \"ISR\": 19.4005\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.27646,\n            \"SIR\": 18.6354,\n            \"SAR\": 9.63718,\n            \"ISR\": 13.1704\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6027,\n            \"SIR\": 17.3718,\n            \"SAR\": 15.5494,\n            \"ISR\": 25.0246\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.26177,\n            \"SIR\": 20.233,\n            \"SAR\": 6.11505,\n            \"ISR\": 9.82511\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6445,\n            \"SIR\": 20.2902,\n            \"SAR\": 20.8222,\n            \"ISR\": 22.2319\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.5773,\n            \"SIR\": 21.8232,\n            \"SAR\": 8.52451,\n            \"ISR\": 11.2414\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6901,\n            \"SIR\": 14.5104,\n            \"SAR\": 14.0378,\n            \"ISR\": 17.3448\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.4221,\n            \"SIR\": 
16.8177,\n            \"SAR\": 5.93863,\n            \"ISR\": 8.81692\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2707,\n            \"SIR\": 12.271,\n            \"SAR\": 12.696,\n            \"ISR\": 21.5427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.93582,\n            \"SIR\": 19.9157,\n            \"SAR\": 9.31265,\n            \"ISR\": 15.5439\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8457,\n            \"SIR\": 18.9899,\n            \"SAR\": 13.4012,\n            \"ISR\": 20.8239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.17047,\n            \"SIR\": 20.7378,\n            \"SAR\": 8.61907,\n            \"ISR\": 13.4811\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2668,\n            \"SIR\": 19.1495,\n            \"SAR\": 16.2434,\n            \"ISR\": 27.9942\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.02652,\n            \"SIR\": 17.354,\n            \"SAR\": 6.1273,\n            \"ISR\": 10.6004\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9385,\n            \"SIR\": 16.7924,\n            \"SAR\": 14.934,\n            \"ISR\": 25.139\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.72002,\n            \"SIR\": 18.2223,\n            \"SAR\": 7.99737,\n            \"ISR\": 12.4747\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9529,\n            \"SIR\": 17.9076,\n            \"SAR\": 15.2824,\n            
\"ISR\": 20.6921\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.59072,\n            \"SIR\": 15.4127,\n            \"SAR\": 1.90344,\n            \"ISR\": 5.62525\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0211,\n            \"SIR\": 15.3179,\n            \"SAR\": 18.0053,\n            \"ISR\": 27.8044\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.1424,\n            \"SIR\": 18.2684,\n            \"SAR\": 7.72987,\n            \"ISR\": 12.118\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3263,\n            \"SIR\": 16.6244,\n            \"SAR\": 14.5395,\n            \"ISR\": 18.7469\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.79773,\n            \"SIR\": 22.8844,\n            \"SAR\": 4.22085,\n            \"ISR\": 7.32914\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5991,\n            \"SIR\": 14.045,\n            \"SAR\": 15.1613,\n            \"ISR\": 19.8877\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -26.7485,\n            \"SIR\": -37.3465,\n            \"SAR\": 0.48318,\n            \"ISR\": -0.30463\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5748,\n            \"SIR\": 46.0469,\n            \"SAR\": 11.4619,\n            \"ISR\": 14.6779\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.29641,\n            \"SIR\": 14.8692,\n       
     \"SAR\": 4.50374,\n            \"ISR\": 7.53302\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4052,\n            \"SIR\": 16.1001,\n            \"SAR\": 16.8893,\n            \"ISR\": 27.3751\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.97189,\n            \"SIR\": 17.9441,\n            \"SAR\": 6.69787,\n            \"ISR\": 9.2965\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0515,\n            \"SIR\": 14.3845,\n            \"SAR\": 15.2945,\n            \"ISR\": 18.984\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.24282,\n            \"SIR\": 20.9396,\n            \"SAR\": 9.75113,\n            \"ISR\": 12.7221\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3228,\n            \"SIR\": 19.0966,\n            \"SAR\": 17.6149,\n            \"ISR\": 29.0029\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.38435,\n            \"SIR\": 18.8745,\n            \"SAR\": 5.34815,\n            \"ISR\": 8.91087\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5739,\n            \"SIR\": 14.452,\n            \"SAR\": 15.5478,\n            \"ISR\": 18.0254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.154,\n            \"SIR\": 13.3721,\n            \"SAR\": 2.65443,\n            \"ISR\": 5.07322\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0838,\n            \"SIR\": 11.8062,\n            \"SAR\": 15.5527,\n            
\"ISR\": 19.3115\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.5587,\n            \"SIR\": 20.8421,\n            \"SAR\": 5.74645,\n            \"ISR\": 9.17571\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6373,\n            \"SIR\": 17.6326,\n            \"SAR\": 16.5896,\n            \"ISR\": 22.2738\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.27313,\n            \"SIR\": 18.1922,\n            \"SAR\": 5.09384,\n            \"ISR\": 8.00382\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2735,\n            \"SIR\": 15.8607,\n            \"SAR\": 16.7631,\n            \"ISR\": 29.685\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.12591,\n            \"SIR\": 15.0247,\n            \"SAR\": 6.41316,\n            \"ISR\": 10.4564\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6403,\n            \"SIR\": 15.4023,\n            \"SAR\": 13.8415,\n            \"ISR\": 19.0921\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.07211,\n            \"SIR\": 17.0282,\n            \"SAR\": 7.84004,\n            \"ISR\": 10.7319\n          },\n          \"instrumental\": {\n            \"SDR\": 8.89794,\n            \"SIR\": 11.7736,\n            \"SAR\": 11.3839,\n            \"ISR\": 20.5617\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.98271,\n            \"SIR\": 15.9526,\n            
\"SAR\": 2.77588,\n            \"ISR\": 6.02627\n          },\n          \"instrumental\": {\n            \"SDR\": 8.85549,\n            \"SIR\": 9.76054,\n            \"SAR\": 12.551,\n            \"ISR\": 25.0718\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0344,\n            \"SIR\": 18.4441,\n            \"SAR\": 7.52487,\n            \"ISR\": 9.83313\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3866,\n            \"SIR\": 14.4887,\n            \"SAR\": 14.6639,\n            \"ISR\": 16.9712\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79018,\n            \"SIR\": 19.7083,\n            \"SAR\": 9.00709,\n            \"ISR\": 11.7792\n          },\n          \"instrumental\": {\n            \"SDR\": 9.9061,\n            \"SIR\": 11.8059,\n            \"SAR\": 9.81472,\n            \"ISR\": 22.7282\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.86472,\n            \"SIR\": 19.3687,\n            \"SAR\": 9.44389,\n            \"ISR\": 11.5758\n          },\n          \"instrumental\": {\n            \"SDR\": 10.948,\n            \"SIR\": 13.9173,\n            \"SAR\": 13.8063,\n            \"ISR\": 18.3349\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.23432,\n            \"SIR\": 28.5103,\n            \"SAR\": 9.78841,\n            \"ISR\": 14.6058\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2896,\n            \"SIR\": 21.6052,\n            \"SAR\": 18.9097,\n            \"ISR\": 19.4016\n          }\n        }\n      },\n      {\n     
   \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.94688,\n            \"SIR\": 10.3811,\n            \"SAR\": 4.72496,\n            \"ISR\": 7.40895\n          },\n          \"instrumental\": {\n            \"SDR\": 10.096,\n            \"SIR\": 13.317,\n            \"SAR\": 12.7108,\n            \"ISR\": 16.6326\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2862,\n            \"SIR\": 20.929,\n            \"SAR\": 11.9025,\n            \"ISR\": 19.6779\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0346,\n            \"SIR\": 26.1253,\n            \"SAR\": 18.7194,\n            \"ISR\": 27.3093\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.83779,\n            \"SIR\": 14.4903,\n            \"SAR\": 9.05255,\n            \"ISR\": 14.8196\n          },\n          \"instrumental\": {\n            \"SDR\": 7.71921,\n            \"SIR\": 12.9827,\n            \"SAR\": 9.56297,\n            \"ISR\": 15.0955\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.76004,\n            \"SIR\": 15.1232,\n            \"SAR\": 5.01898,\n            \"ISR\": 8.43216\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1513,\n            \"SIR\": 13.3561,\n            \"SAR\": 13.0138,\n            \"ISR\": 21.4655\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.24156,\n            \"SIR\": 15.9899,\n            \"SAR\": 5.29469,\n            \"ISR\": 11.2274\n          },\n          
\"instrumental\": {\n            \"SDR\": 12.2661,\n            \"SIR\": 18.222,\n            \"SAR\": 14.9786,\n            \"ISR\": 18.2719\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.33451,\n            \"SIR\": 23.5481,\n            \"SAR\": 7.67851,\n            \"ISR\": 11.4429\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7794,\n            \"SIR\": 17.3813,\n            \"SAR\": 16.1219,\n            \"ISR\": 31.4248\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.30791,\n            \"SIR\": 24.0661,\n            \"SAR\": 8.46494,\n            \"ISR\": 10.6745\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2415,\n            \"SIR\": 17.9436,\n            \"SAR\": 17.2388,\n            \"ISR\": 23.9917\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.8583,\n            \"SIR\": 20.7505,\n            \"SAR\": 7.33039,\n            \"ISR\": 11.1598\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9788,\n            \"SIR\": 14.4968,\n            \"SAR\": 13.0092,\n            \"ISR\": 20.7857\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.38656,\n            \"SIR\": 18.1913,\n            \"SAR\": 7.05589,\n            \"ISR\": 11.4799\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9472,\n            \"SIR\": 17.5936,\n            \"SAR\": 15.3739,\n            \"ISR\": 26.3415\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n   
     \"SDR\": 6.8583,\n        \"SIR\": 18.2223,\n        \"SAR\": 7.05589,\n        \"ISR\": 10.7319\n      },\n      \"instrumental\": {\n        \"SDR\": 12.2661,\n        \"SIR\": 15.8607,\n        \"SAR\": 14.9786,\n        \"ISR\": 20.6921\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"MGM_LOWEND_A_v4.pth\": {\n    \"model_name\": \"VR Arch Single Model v4: MGM_LOWEND_A_v4\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.13243,\n            \"SIR\": 17.8811,\n            \"SAR\": 3.52338,\n            \"ISR\": 6.90286\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9123,\n            \"SIR\": 18.3341,\n            \"SAR\": 19.4275,\n            \"ISR\": 18.9589\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.01049,\n            \"SIR\": 11.6772,\n            \"SAR\": 4.99365,\n            \"ISR\": 11.1504\n          },\n          \"instrumental\": {\n            \"SDR\": 9.92313,\n            \"SIR\": 16.3978,\n            \"SAR\": 11.7922,\n            \"ISR\": 15.0924\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.28529,\n            \"SIR\": 20.1487,\n            \"SAR\": 8.40583,\n            \"ISR\": 13.7556\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6941,\n            \"SIR\": 19.9227,\n            \"SAR\": 16.4249,\n            \"ISR\": 20.4547\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.01528,\n            \"SIR\": 6.60106,\n            \"SAR\": 
2.9223,\n            \"ISR\": 10.8273\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0584,\n            \"SIR\": 19.6768,\n            \"SAR\": 13.8999,\n            \"ISR\": 18.3289\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.57182,\n            \"SIR\": 21.8357,\n            \"SAR\": 8.67216,\n            \"ISR\": 11.3473\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4535,\n            \"SIR\": 13.2453,\n            \"SAR\": 13.1476,\n            \"ISR\": 26.0736\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.53133,\n            \"SIR\": 17.2874,\n            \"SAR\": 9.25447,\n            \"ISR\": 12.4147\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8303,\n            \"SIR\": 14.2099,\n            \"SAR\": 13.3712,\n            \"ISR\": 21.4583\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.56681,\n            \"SIR\": 19.7166,\n            \"SAR\": 10.0536,\n            \"ISR\": 13.2565\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1335,\n            \"SIR\": 17.5529,\n            \"SAR\": 15.9987,\n            \"ISR\": 25.523\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.6425,\n            \"SIR\": 22.1486,\n            \"SAR\": 6.80209,\n            \"ISR\": 9.2464\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4815,\n            \"SIR\": 19.8195,\n            \"SAR\": 21.4317,\n            \"ISR\": 21.3385\n          }\n        }\n      },\n      {\n   
     \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42826,\n            \"SIR\": 23.8723,\n            \"SAR\": 9.52058,\n            \"ISR\": 12.0575\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7044,\n            \"SIR\": 15.4715,\n            \"SAR\": 14.7534,\n            \"ISR\": 19.1443\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.29394,\n            \"SIR\": 19.6272,\n            \"SAR\": 6.56581,\n            \"ISR\": 8.88188\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8858,\n            \"SIR\": 12.2884,\n            \"SAR\": 13.2998,\n            \"ISR\": 24.1904\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.58396,\n            \"SIR\": 22.3749,\n            \"SAR\": 10.2593,\n            \"ISR\": 15.5784\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7627,\n            \"SIR\": 19.129,\n            \"SAR\": 14.0214,\n            \"ISR\": 22.4446\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.54848,\n            \"SIR\": 22.158,\n            \"SAR\": 8.83647,\n            \"ISR\": 12.5536\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6306,\n            \"SIR\": 18.4423,\n            \"SAR\": 16.3813,\n            \"ISR\": 28.774\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.09956,\n            \"SIR\": 18.5909,\n            \"SAR\": 6.58563,\n            
\"ISR\": 10.455\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4724,\n            \"SIR\": 18.4273,\n            \"SAR\": 16.0907,\n            \"ISR\": 26.9517\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.40484,\n            \"SIR\": 19.7856,\n            \"SAR\": 8.05628,\n            \"ISR\": 12.0102\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1953,\n            \"SIR\": 18.3612,\n            \"SAR\": 16.7126,\n            \"ISR\": 19.7958\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.78262,\n            \"SIR\": 18.0492,\n            \"SAR\": 2.43916,\n            \"ISR\": 5.81773\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3299,\n            \"SIR\": 15.4156,\n            \"SAR\": 18.2056,\n            \"ISR\": 29.0077\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84857,\n            \"SIR\": 19.6716,\n            \"SAR\": 8.56902,\n            \"ISR\": 12.9029\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0481,\n            \"SIR\": 17.496,\n            \"SAR\": 15.3852,\n            \"ISR\": 18.9905\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.02299,\n            \"SIR\": 23.418,\n            \"SAR\": 4.71773,\n            \"ISR\": 7.52461\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6876,\n            \"SIR\": 14.3308,\n            \"SAR\": 14.9073,\n            \"ISR\": 18.3955\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.114985,\n            \"SIR\": -31.3486,\n            \"SAR\": 0.006975,\n            \"ISR\": 0.24724\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2202,\n            \"SIR\": 43.1697,\n            \"SAR\": 11.8595,\n            \"ISR\": 16.5294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.44268,\n            \"SIR\": 17.1482,\n            \"SAR\": 4.85671,\n            \"ISR\": 7.38147\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5769,\n            \"SIR\": 15.7796,\n            \"SAR\": 16.8969,\n            \"ISR\": 29.5096\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.26728,\n            \"SIR\": 19.0854,\n            \"SAR\": 6.91409,\n            \"ISR\": 9.86828\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2125,\n            \"SIR\": 15.781,\n            \"SAR\": 16.5457,\n            \"ISR\": 19.6908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.03903,\n            \"SIR\": 20.7411,\n            \"SAR\": 7.77303,\n            \"ISR\": 11.2687\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9634,\n            \"SIR\": 20.7001,\n            \"SAR\": 19.1418,\n            \"ISR\": 29.3158\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00017,\n            \"SIR\": -1.63019,\n            \"SAR\": 0.00015,\n            
\"ISR\": 2.87582\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7711,\n            \"SIR\": 36.8654,\n            \"SAR\": 25.0648,\n            \"ISR\": 18.2668\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.30796,\n            \"SIR\": 13.8603,\n            \"SAR\": 2.96534,\n            \"ISR\": 5.16448\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3269,\n            \"SIR\": 11.9477,\n            \"SAR\": 15.8934,\n            \"ISR\": 19.8466\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.48658,\n            \"SIR\": 26.0277,\n            \"SAR\": 5.80744,\n            \"ISR\": 8.25719\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5514,\n            \"SIR\": 15.6632,\n            \"SAR\": 16.4748,\n            \"ISR\": 17.6172\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.73863,\n            \"SIR\": 18.2641,\n            \"SAR\": 5.61624,\n            \"ISR\": 8.5719\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6125,\n            \"SIR\": 16.3909,\n            \"SAR\": 17.0085,\n            \"ISR\": 28.3021\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.09681,\n            \"SIR\": 13.5978,\n            \"SAR\": 6.03586,\n            \"ISR\": 11.2977\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9772,\n            \"SIR\": 17.1869,\n            \"SAR\": 14.6038,\n            \"ISR\": 22.7453\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49884,\n            \"SIR\": 18.3407,\n            \"SAR\": 7.96366,\n            \"ISR\": 10.9161\n          },\n          \"instrumental\": {\n            \"SDR\": 9.24692,\n            \"SIR\": 12.0869,\n            \"SAR\": 11.5976,\n            \"ISR\": 20.9149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.00739,\n            \"SIR\": 15.0319,\n            \"SAR\": 3.45595,\n            \"ISR\": 6.72136\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1741,\n            \"SIR\": 12.034,\n            \"SAR\": 13.4391,\n            \"ISR\": 24.6053\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.40607,\n            \"SIR\": 18.1357,\n            \"SAR\": 8.03135,\n            \"ISR\": 11.1484\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6303,\n            \"SIR\": 15.6059,\n            \"SAR\": 15.0191,\n            \"ISR\": 16.6834\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.839,\n            \"SIR\": 23.5971,\n            \"SAR\": 9.11615,\n            \"ISR\": 11.1671\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1629,\n            \"SIR\": 10.6336,\n            \"SAR\": 9.45999,\n            \"ISR\": 25.1319\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.35571,\n            \"SIR\": 21.0869,\n            \"SAR\": 9.66289,\n            \"ISR\": 11.9206\n          },\n          
\"instrumental\": {\n            \"SDR\": 11.4371,\n            \"SIR\": 14.3517,\n            \"SAR\": 13.9543,\n            \"ISR\": 18.6089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.96153,\n            \"SIR\": 30.4023,\n            \"SAR\": 10.5443,\n            \"ISR\": 14.4086\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1541,\n            \"SIR\": 21.7402,\n            \"SAR\": 19.0166,\n            \"ISR\": 19.286\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.07542,\n            \"SIR\": 7.56306,\n            \"SAR\": 7.57733,\n            \"ISR\": 9.30151\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0495,\n            \"SIR\": 15.1149,\n            \"SAR\": 13.4573,\n            \"ISR\": 14.5596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4041,\n            \"SIR\": 21.3838,\n            \"SAR\": 11.4042,\n            \"ISR\": 20.0992\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3969,\n            \"SIR\": 26.9726,\n            \"SAR\": 19.9276,\n            \"ISR\": 27.2555\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.46751,\n            \"SIR\": 14.82,\n            \"SAR\": 7.83257,\n            \"ISR\": 14.2496\n          },\n          \"instrumental\": {\n            \"SDR\": 8.51406,\n            \"SIR\": 13.584,\n            \"SAR\": 9.24146,\n            \"ISR\": 15.7104\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n       
   \"vocals\": {\n            \"SDR\": 5.53377,\n            \"SIR\": 17.2831,\n            \"SAR\": 5.35722,\n            \"ISR\": 9.61377\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7176,\n            \"SIR\": 14.3966,\n            \"SAR\": 13.4694,\n            \"ISR\": 25.2204\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.55968,\n            \"SIR\": 17.0061,\n            \"SAR\": 6.07319,\n            \"ISR\": 11.2038\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3794,\n            \"SIR\": 18.0994,\n            \"SAR\": 15.3898,\n            \"ISR\": 18.3958\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.78558,\n            \"SIR\": 24.8375,\n            \"SAR\": 8.25331,\n            \"ISR\": 12.0214\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4666,\n            \"SIR\": 18.1075,\n            \"SAR\": 16.8944,\n            \"ISR\": 32.6386\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.36498,\n            \"SIR\": 24.764,\n            \"SAR\": 8.80443,\n            \"ISR\": 11.1303\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1173,\n            \"SIR\": 19.6311,\n            \"SAR\": 18.1249,\n            \"ISR\": 24.2693\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.51972,\n            \"SIR\": 23.2259,\n            \"SAR\": 8.06198,\n            \"ISR\": 11.1239\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3152,\n            
\"SIR\": 14.5681,\n            \"SAR\": 13.4831,\n            \"ISR\": 22.397\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.98197,\n            \"SIR\": 19.5055,\n            \"SAR\": 7.73423,\n            \"ISR\": 11.6913\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8241,\n            \"SIR\": 18.0304,\n            \"SAR\": 16.1106,\n            \"ISR\": 25.7671\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 6.98197,\n        \"SIR\": 19.5055,\n        \"SAR\": 7.73423,\n        \"ISR\": 11.1484\n      },\n      \"instrumental\": {\n        \"SDR\": 13.0481,\n        \"SIR\": 16.3978,\n        \"SAR\": 15.3898,\n        \"ISR\": 21.3385\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"MGM_LOWEND_B_v4.pth\": {\n    \"model_name\": \"VR Arch Single Model v4: MGM_LOWEND_B_v4\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.9675,\n            \"SIR\": 15.4909,\n            \"SAR\": 2.98769,\n            \"ISR\": 8.2918\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8575,\n            \"SIR\": 19.7225,\n            \"SAR\": 17.7783,\n            \"ISR\": 18.5521\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.18477,\n            \"SIR\": 8.98029,\n            \"SAR\": 5.10279,\n            \"ISR\": 14.4782\n          },\n          \"instrumental\": {\n            \"SDR\": 8.99737,\n            \"SIR\": 19.4874,\n            \"SAR\": 10.5674,\n            \"ISR\": 13.0905\n          }\n        }\n      },\n      {\n        
\"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.19721,\n            \"SIR\": 19.6505,\n            \"SAR\": 8.16066,\n            \"ISR\": 14.7074\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3895,\n            \"SIR\": 20.9081,\n            \"SAR\": 15.9993,\n            \"ISR\": 19.1861\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.03057,\n            \"SIR\": 5.97143,\n            \"SAR\": 3.79114,\n            \"ISR\": 13.176\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4174,\n            \"SIR\": 21.4517,\n            \"SAR\": 12.9543,\n            \"ISR\": 16.5193\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87721,\n            \"SIR\": 20.2411,\n            \"SAR\": 8.90107,\n            \"ISR\": 13.5076\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3989,\n            \"SIR\": 15.3159,\n            \"SAR\": 12.5643,\n            \"ISR\": 23.9658\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.47155,\n            \"SIR\": 15.8875,\n            \"SAR\": 8.90151,\n            \"ISR\": 14.6205\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6876,\n            \"SIR\": 16.5728,\n            \"SAR\": 12.0138,\n            \"ISR\": 19.8179\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.43083,\n            \"SIR\": 18.8175,\n            \"SAR\": 9.63775,\n            \"ISR\": 15.2819\n          },\n          \"instrumental\": 
{\n            \"SDR\": 13.8531,\n            \"SIR\": 19.3671,\n            \"SAR\": 15.0126,\n            \"ISR\": 24.4046\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.27594,\n            \"SIR\": 19.6452,\n            \"SAR\": 4.86124,\n            \"ISR\": 10.1817\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8908,\n            \"SIR\": 22.5699,\n            \"SAR\": 21.5551,\n            \"ISR\": 21.1843\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.53377,\n            \"SIR\": 18.714,\n            \"SAR\": 8.39711,\n            \"ISR\": 14.2194\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0415,\n            \"SIR\": 17.8973,\n            \"SAR\": 13.1423,\n            \"ISR\": 17.2306\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.0754,\n            \"SIR\": 17.3445,\n            \"SAR\": 6.49397,\n            \"ISR\": 9.85213\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0414,\n            \"SIR\": 13.146,\n            \"SAR\": 12.5185,\n            \"ISR\": 22.9792\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.61233,\n            \"SIR\": 21.3121,\n            \"SAR\": 9.94979,\n            \"ISR\": 18.7393\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6227,\n            \"SIR\": 22.3657,\n            \"SAR\": 13.4869,\n            \"ISR\": 21.2047\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For 
Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0013,\n            \"SIR\": 21.2563,\n            \"SAR\": 9.29221,\n            \"ISR\": 14.8023\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6,\n            \"SIR\": 20.8971,\n            \"SAR\": 15.4985,\n            \"ISR\": 27.4692\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.86578,\n            \"SIR\": 16.7447,\n            \"SAR\": 6.12781,\n            \"ISR\": 12.138\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7448,\n            \"SIR\": 20.9252,\n            \"SAR\": 15.6285,\n            \"ISR\": 24.7113\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.12226,\n            \"SIR\": 19.1395,\n            \"SAR\": 8.25136,\n            \"ISR\": 13.8451\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2615,\n            \"SIR\": 20.8518,\n            \"SAR\": 16.6748,\n            \"ISR\": 19.3368\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.5266,\n            \"SIR\": 15.0029,\n            \"SAR\": 1.55637,\n            \"ISR\": 6.61714\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2654,\n            \"SIR\": 17.2336,\n            \"SAR\": 18.2078,\n            \"ISR\": 27.7387\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.63607,\n            \"SIR\": 17.4365,\n            \"SAR\": 8.15807,\n            \"ISR\": 15.5045\n          },\n          \"instrumental\": {\n            \"SDR\": 
13.1067,\n            \"SIR\": 19.9996,\n            \"SAR\": 14.3959,\n            \"ISR\": 19.7474\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04977,\n            \"SIR\": 21.8256,\n            \"SAR\": 3.57315,\n            \"ISR\": 8.35393\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7275,\n            \"SIR\": 15.5956,\n            \"SAR\": 14.375,\n            \"ISR\": 18.0402\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -27.2884,\n            \"SIR\": -38.3822,\n            \"SAR\": 0.09529,\n            \"ISR\": 2.00992\n          },\n          \"instrumental\": {\n            \"SDR\": 9.76453,\n            \"SIR\": 45.8066,\n            \"SAR\": 10.1319,\n            \"ISR\": 13.6418\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.17251,\n            \"SIR\": 14.8962,\n            \"SAR\": 4.38671,\n            \"ISR\": 8.5191\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6384,\n            \"SIR\": 17.279,\n            \"SAR\": 16.2987,\n            \"ISR\": 27.1687\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.95101,\n            \"SIR\": 18.7905,\n            \"SAR\": 7.62096,\n            \"ISR\": 11.8259\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9312,\n            \"SIR\": 17.3149,\n            \"SAR\": 15.1922,\n            \"ISR\": 19.374\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 9.37826,\n            \"SIR\": 20.6571,\n            \"SAR\": 10.0759,\n            \"ISR\": 15.7293\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3204,\n            \"SIR\": 22.4937,\n            \"SAR\": 17.4025,\n            \"ISR\": 27.7919\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.057635,\n            \"SIR\": 6.61954,\n            \"SAR\": 0.013615,\n            \"ISR\": 5.32883\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7593,\n            \"SIR\": 24.2327,\n            \"SAR\": 22.5662,\n            \"ISR\": 18.3651\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.41033,\n            \"SIR\": 14.2618,\n            \"SAR\": 2.79349,\n            \"ISR\": 5.5282\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3692,\n            \"SIR\": 12.3214,\n            \"SAR\": 15.2233,\n            \"ISR\": 19.4368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.40881,\n            \"SIR\": 22.0028,\n            \"SAR\": 0.55808,\n            \"ISR\": 7.29613\n          },\n          \"instrumental\": {\n            \"SDR\": 15.308,\n            \"SIR\": 19.6869,\n            \"SAR\": 17.0789,\n            \"ISR\": 22.01\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.71273,\n            \"SIR\": 17.7328,\n            \"SAR\": 5.24679,\n            \"ISR\": 9.34773\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6351,\n     
       \"SIR\": 17.1453,\n            \"SAR\": 16.0849,\n            \"ISR\": 27.5017\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0391,\n            \"SIR\": 14.0093,\n            \"SAR\": 5.92357,\n            \"ISR\": 13.8404\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2006,\n            \"SIR\": 19.9166,\n            \"SAR\": 13.9145,\n            \"ISR\": 21.4495\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.97044,\n            \"SIR\": 17.3459,\n            \"SAR\": 8.20739,\n            \"ISR\": 13.0825\n          },\n          \"instrumental\": {\n            \"SDR\": 9.66595,\n            \"SIR\": 14.4362,\n            \"SAR\": 10.9794,\n            \"ISR\": 19.8647\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.66267,\n            \"SIR\": 12.2592,\n            \"SAR\": 2.98286,\n            \"ISR\": 8.7549\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3507,\n            \"SIR\": 13.8424,\n            \"SAR\": 11.6756,\n            \"ISR\": 20.9727\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.2687,\n            \"SIR\": 18.0036,\n            \"SAR\": 7.73318,\n            \"ISR\": 11.5953\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9078,\n            \"SIR\": 16.0322,\n            \"SAR\": 14.7308,\n            \"ISR\": 16.4761\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 10.4236,\n            \"SIR\": 21.0483,\n            \"SAR\": 9.07507,\n            \"ISR\": 12.3294\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3963,\n            \"SIR\": 12.7783,\n            \"SAR\": 9.89462,\n            \"ISR\": 22.8778\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2711,\n            \"SIR\": 18.9436,\n            \"SAR\": 9.15173,\n            \"ISR\": 12.8451\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7588,\n            \"SIR\": 17.0153,\n            \"SAR\": 13.8067,\n            \"ISR\": 17.7302\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.40316,\n            \"SIR\": 27.7491,\n            \"SAR\": 9.81864,\n            \"ISR\": 15.1854\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0068,\n            \"SIR\": 23.1889,\n            \"SAR\": 18.3981,\n            \"ISR\": 19.178\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.68018,\n            \"SIR\": 6.6671,\n            \"SAR\": 6.51245,\n            \"ISR\": 9.19247\n          },\n          \"instrumental\": {\n            \"SDR\": 9.51619,\n            \"SIR\": 15.1478,\n            \"SAR\": 11.8799,\n            \"ISR\": 14.3974\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6671,\n            \"SIR\": 20.7408,\n            \"SAR\": 11.6936,\n            \"ISR\": 21.6505\n          },\n          \"instrumental\": {\n            \"SDR\": 19.9837,\n            \"SIR\": 29.7667,\n            \"SAR\": 20.4837,\n            
\"ISR\": 26.127\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.45481,\n            \"SIR\": 13.4013,\n            \"SAR\": 9.46919,\n            \"ISR\": 18.6245\n          },\n          \"instrumental\": {\n            \"SDR\": 7.83593,\n            \"SIR\": 16.5367,\n            \"SAR\": 8.7138,\n            \"ISR\": 13.1285\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.11995,\n            \"SIR\": 15.5337,\n            \"SAR\": 4.99059,\n            \"ISR\": 11.1439\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7257,\n            \"SIR\": 15.8819,\n            \"SAR\": 12.3394,\n            \"ISR\": 22.7387\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.63972,\n            \"SIR\": 12.162,\n            \"SAR\": 4.69649,\n            \"ISR\": 13.5619\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8,\n            \"SIR\": 24.2831,\n            \"SAR\": 17.1415,\n            \"ISR\": 17.787\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.64777,\n            \"SIR\": 22.8761,\n            \"SAR\": 7.6792,\n            \"ISR\": 14.3182\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5046,\n            \"SIR\": 21.6225,\n            \"SAR\": 16.5703,\n            \"ISR\": 30.4884\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.92042,\n            \"SIR\": 22.5185,\n 
           \"SAR\": 8.14133,\n            \"ISR\": 11.7341\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9359,\n            \"SIR\": 21.5351,\n            \"SAR\": 17.9104,\n            \"ISR\": 23.6473\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.65147,\n            \"SIR\": 17.7581,\n            \"SAR\": 8.14693,\n            \"ISR\": 14.4044\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5437,\n            \"SIR\": 18.1257,\n            \"SAR\": 12.7496,\n            \"ISR\": 18.9138\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.05077,\n            \"SIR\": 17.7585,\n            \"SAR\": 7.3985,\n            \"ISR\": 14.2439\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7162,\n            \"SIR\": 20.4682,\n            \"SAR\": 15.4043,\n            \"ISR\": 24.2121\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.53377,\n        \"SIR\": 17.7581,\n        \"SAR\": 7.62096,\n        \"ISR\": 13.0825\n      },\n      \"instrumental\": {\n        \"SDR\": 13.1067,\n        \"SIR\": 19.4874,\n        \"SAR\": 15.0126,\n        \"ISR\": 19.8647\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"MGM_MAIN_v4.pth\": {\n    \"model_name\": \"VR Arch Single Model v4: MGM_MAIN_v4\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.72751,\n            \"SIR\": 19.3325,\n            \"SAR\": 3.02685,\n            \"ISR\": 5.7645\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.4502,\n            \"SIR\": 16.2785,\n            \"SAR\": 19.0347,\n            \"ISR\": 19.0905\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.93812,\n            \"SIR\": 11.2323,\n            \"SAR\": 4.85155,\n            \"ISR\": 11.141\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0664,\n            \"SIR\": 16.3012,\n            \"SAR\": 11.9555,\n            \"ISR\": 14.8655\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.16094,\n            \"SIR\": 20.438,\n            \"SAR\": 8.07798,\n            \"ISR\": 13.3923\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3901,\n            \"SIR\": 19.4143,\n            \"SAR\": 16.3201,\n            \"ISR\": 21.1946\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.52928,\n            \"SIR\": 6.37677,\n            \"SAR\": 2.83622,\n            \"ISR\": 11.7858\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1292,\n            \"SIR\": 20.3967,\n            \"SAR\": 13.5529,\n            \"ISR\": 17.6903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.76107,\n            \"SIR\": 21.6031,\n            \"SAR\": 7.86276,\n            \"ISR\": 10.1576\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5998,\n            \"SIR\": 12.3408,\n            \"SAR\": 12.3839,\n            \"ISR\": 21.7458\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
8.00859,\n            \"SIR\": 16.1299,\n            \"SAR\": 8.60872,\n            \"ISR\": 11.5513\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3294,\n            \"SIR\": 13.4008,\n            \"SAR\": 12.982,\n            \"ISR\": 19.5907\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.84357,\n            \"SIR\": 18.8094,\n            \"SAR\": 9.44955,\n            \"ISR\": 12.5665\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5357,\n            \"SIR\": 16.7158,\n            \"SAR\": 15.5266,\n            \"ISR\": 24.008\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.16387,\n            \"SIR\": 17.5026,\n            \"SAR\": 2.68586,\n            \"ISR\": 6.48377\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2006,\n            \"SIR\": 20.6321,\n            \"SAR\": 22.5002,\n            \"ISR\": 21.4237\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.59771,\n            \"SIR\": 21.2194,\n            \"SAR\": 9.04271,\n            \"ISR\": 11.3929\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0731,\n            \"SIR\": 14.7194,\n            \"SAR\": 14.7213,\n            \"ISR\": 17.2514\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.25806,\n            \"SIR\": 17.4758,\n            \"SAR\": 5.81589,\n            \"ISR\": 8.32099\n          },\n          \"instrumental\": {\n            \"SDR\": 11.511,\n            \"SIR\": 11.8528,\n            \"SAR\": 
13.1743,\n            \"ISR\": 24.432\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.13442,\n            \"SIR\": 21.143,\n            \"SAR\": 9.85519,\n            \"ISR\": 14.5355\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3455,\n            \"SIR\": 18.0624,\n            \"SAR\": 13.9838,\n            \"ISR\": 21.6233\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.20097,\n            \"SIR\": 21.9655,\n            \"SAR\": 8.5321,\n            \"ISR\": 11.9622\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5048,\n            \"SIR\": 17.8594,\n            \"SAR\": 16.2637,\n            \"ISR\": 29.3976\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.11146,\n            \"SIR\": 17.0105,\n            \"SAR\": 5.66823,\n            \"ISR\": 9.53944\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2027,\n            \"SIR\": 17.8198,\n            \"SAR\": 16.1186,\n            \"ISR\": 26.0773\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.42413,\n            \"SIR\": 18.3399,\n            \"SAR\": 7.00236,\n            \"ISR\": 11.0991\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3683,\n            \"SIR\": 19.6921,\n            \"SAR\": 18.3173,\n            \"ISR\": 20.5266\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.33923,\n          
  \"SIR\": 19.644,\n            \"SAR\": 2.58578,\n            \"ISR\": 5.05855\n          },\n          \"instrumental\": {\n            \"SDR\": 12.214,\n            \"SIR\": 13.4413,\n            \"SAR\": 17.8915,\n            \"ISR\": 29.7868\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.26483,\n            \"SIR\": 18.0463,\n            \"SAR\": 8.03526,\n            \"ISR\": 12.3866\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6025,\n            \"SIR\": 17.0558,\n            \"SAR\": 14.8997,\n            \"ISR\": 18.6994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.84903,\n            \"SIR\": 23.6097,\n            \"SAR\": 3.9548,\n            \"ISR\": 7.16666\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7725,\n            \"SIR\": 13.8488,\n            \"SAR\": 15.413,\n            \"ISR\": 19.9839\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -1.92375,\n            \"SIR\": -31.4327,\n            \"SAR\": 0.041725,\n            \"ISR\": -3.61074\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3915,\n            \"SIR\": 42.7212,\n            \"SAR\": 12.5109,\n            \"ISR\": 16.3128\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.34504,\n            \"SIR\": 15.6529,\n            \"SAR\": 4.47412,\n            \"ISR\": 7.24422\n          },\n          \"instrumental\": {\n            \"SDR\": 15.32,\n            \"SIR\": 15.824,\n            \"SAR\": 17.1089,\n            
\"ISR\": 28.6607\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.21079,\n            \"SIR\": 17.9775,\n            \"SAR\": 5.92651,\n            \"ISR\": 8.64818\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4077,\n            \"SIR\": 14.6662,\n            \"SAR\": 16.3658,\n            \"ISR\": 19.3588\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.46199,\n            \"SIR\": 18.9667,\n            \"SAR\": 7.11996,\n            \"ISR\": 10.8521\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7341,\n            \"SIR\": 19.7769,\n            \"SAR\": 18.7759,\n            \"ISR\": 29.2885\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.00896,\n            \"SIR\": 18.5584,\n            \"SAR\": 4.45391,\n            \"ISR\": 7.63199\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2802,\n            \"SIR\": 14.6967,\n            \"SAR\": 16.2287,\n            \"ISR\": 18.3484\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.21881,\n            \"SIR\": 14.5053,\n            \"SAR\": 2.58359,\n            \"ISR\": 4.80248\n          },\n          \"instrumental\": {\n            \"SDR\": 10.286,\n            \"SIR\": 11.5562,\n            \"SAR\": 15.9917,\n            \"ISR\": 20.3889\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.49386,\n            
\"SIR\": 24.383,\n            \"SAR\": 1.46818,\n            \"ISR\": 6.15781\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3489,\n            \"SIR\": 15.2595,\n            \"SAR\": 15.8608,\n            \"ISR\": 20.1041\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.49918,\n            \"SIR\": 17.8586,\n            \"SAR\": 5.20316,\n            \"ISR\": 8.21143\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3516,\n            \"SIR\": 15.9547,\n            \"SAR\": 16.8832,\n            \"ISR\": 29.2472\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.22036,\n            \"SIR\": 16.3489,\n            \"SAR\": 6.45505,\n            \"ISR\": 9.94314\n          },\n          \"instrumental\": {\n            \"SDR\": 12.407,\n            \"SIR\": 14.8574,\n            \"SAR\": 14.09,\n            \"ISR\": 18.219\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.0678,\n            \"SIR\": 17.7307,\n            \"SAR\": 7.52551,\n            \"ISR\": 10.0548\n          },\n          \"instrumental\": {\n            \"SDR\": 8.76977,\n            \"SIR\": 11.297,\n            \"SAR\": 11.7032,\n            \"ISR\": 21.4769\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.44285,\n            \"SIR\": 16.5906,\n            \"SAR\": 2.22127,\n            \"ISR\": 5.59199\n          },\n          \"instrumental\": {\n            \"SDR\": 9.2545,\n            \"SIR\": 10.2319,\n            \"SAR\": 13.3096,\n            \"ISR\": 
26.8723\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.10568,\n            \"SIR\": 19.2287,\n            \"SAR\": 7.50866,\n            \"ISR\": 9.68112\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7628,\n            \"SIR\": 14.5098,\n            \"SAR\": 15.2705,\n            \"ISR\": 17.1468\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.52713,\n            \"SIR\": 21.2175,\n            \"SAR\": 8.31681,\n            \"ISR\": 10.2352\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6359,\n            \"SIR\": 10.8487,\n            \"SAR\": 10.1175,\n            \"ISR\": 23.7012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.7169,\n            \"SIR\": 19.945,\n            \"SAR\": 9.04383,\n            \"ISR\": 11.1351\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2638,\n            \"SIR\": 14.0299,\n            \"SAR\": 14.1432,\n            \"ISR\": 18.1826\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.97752,\n            \"SIR\": 28.5569,\n            \"SAR\": 9.53626,\n            \"ISR\": 13.7075\n          },\n          \"instrumental\": {\n            \"SDR\": 15.598,\n            \"SIR\": 21.3313,\n            \"SAR\": 19.5473,\n            \"ISR\": 19.4505\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.49437,\n            \"SIR\": 7.96427,\n            \"SAR\": 4.70645,\n            \"ISR\": 
7.82138\n          },\n          \"instrumental\": {\n            \"SDR\": 9.70192,\n            \"SIR\": 13.8198,\n            \"SAR\": 12.1401,\n            \"ISR\": 15.7106\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6451,\n            \"SIR\": 22.6737,\n            \"SAR\": 12.9176,\n            \"ISR\": 20.3813\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5965,\n            \"SIR\": 26.5817,\n            \"SAR\": 19.2761,\n            \"ISR\": 22.1355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91792,\n            \"SIR\": 14.6471,\n            \"SAR\": 6.29621,\n            \"ISR\": 11.9181\n          },\n          \"instrumental\": {\n            \"SDR\": 8.26272,\n            \"SIR\": 11.2704,\n            \"SAR\": 9.3395,\n            \"ISR\": 16.2706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0247,\n            \"SIR\": 16.1999,\n            \"SAR\": 5.22332,\n            \"ISR\": 8.58522\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0845,\n            \"SIR\": 13.1151,\n            \"SAR\": 13.2975,\n            \"ISR\": 17.5689\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.02464,\n            \"SIR\": 15.6789,\n            \"SAR\": 5.49276,\n            \"ISR\": 10.7681\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0376,\n            \"SIR\": 18.2861,\n            \"SAR\": 15.9109,\n            \"ISR\": 18.6434\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.05676,\n            \"SIR\": 23.5922,\n            \"SAR\": 7.31123,\n            \"ISR\": 10.7171\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2453,\n            \"SIR\": 17.3097,\n            \"SAR\": 16.8613,\n            \"ISR\": 31.4261\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.55171,\n            \"SIR\": 24.4466,\n            \"SAR\": 8.02854,\n            \"ISR\": 10.3788\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6177,\n            \"SIR\": 18.6118,\n            \"SAR\": 17.9461,\n            \"ISR\": 24.1905\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.68651,\n            \"SIR\": 21.0149,\n            \"SAR\": 7.10741,\n            \"ISR\": 10.286\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0247,\n            \"SIR\": 13.8353,\n            \"SAR\": 13.196,\n            \"ISR\": 21.0399\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.41377,\n            \"SIR\": 19.0187,\n            \"SAR\": 7.49891,\n            \"ISR\": 10.5788\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4717,\n            \"SIR\": 16.9355,\n            \"SAR\": 16.4662,\n            \"ISR\": 27.2967\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 6.22036,\n        \"SIR\": 18.5584,\n        \"SAR\": 6.45505,\n        \"ISR\": 10.2352\n      },\n      \"instrumental\": {\n        \"SDR\": 12.407,\n        \"SIR\": 15.824,\n        \"SAR\": 
15.5266,\n        \"ISR\": 20.5266\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-De-Reverb-aufr33-jarredou.pth\": {\n    \"model_name\": \"VR Arch Single Model v4: UVR-De-Reverb by aufr33-jarredou\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"dry\",\n      \"no dry\"\n    ],\n    \"target_stem\": \"dry\"\n  },\n  \"UVR-MDX-NET-Inst_HQ_1.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst HQ 1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.26685,\n            \"SIR\": 19.0489,\n            \"SAR\": 5.43294,\n            \"ISR\": 9.16527\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5292,\n            \"SIR\": 23.6979,\n            \"SAR\": 19.953,\n            \"ISR\": 18.9037\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.86386,\n            \"SIR\": 13.8367,\n            \"SAR\": 7.1304,\n            \"ISR\": 12.6514\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1846,\n            \"SIR\": 21.8827,\n            \"SAR\": 13.5634,\n            \"ISR\": 15.4953\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1999,\n            \"SIR\": 23.0466,\n            \"SAR\": 10.8199,\n            \"ISR\": 15.0797\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8466,\n            \"SIR\": 26.4202,\n            \"SAR\": 18.2384,\n            \"ISR\": 18.5543\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
2.95429,\n            \"SIR\": 7.6728,\n            \"SAR\": 3.88337,\n            \"ISR\": 12.6449\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8948,\n            \"SIR\": 24.7515,\n            \"SAR\": 13.8385,\n            \"ISR\": 14.7255\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0403,\n            \"SIR\": 23.9019,\n            \"SAR\": 12.8167,\n            \"ISR\": 15.3731\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6085,\n            \"SIR\": 22.8054,\n            \"SAR\": 15.6456,\n            \"ISR\": 17.6908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0709,\n            \"SIR\": 18.4542,\n            \"SAR\": 10.8027,\n            \"ISR\": 14.7364\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8133,\n            \"SIR\": 21.9553,\n            \"SAR\": 13.3612,\n            \"ISR\": 15.9441\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5136,\n            \"SIR\": 20.4814,\n            \"SAR\": 12.7098,\n            \"ISR\": 15.0266\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0808,\n            \"SIR\": 24.0066,\n            \"SAR\": 17.1985,\n            \"ISR\": 17.9199\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.18142,\n            \"SIR\": 20.0956,\n            \"SAR\": 5.77096,\n            \"ISR\": 8.66334\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1059,\n            \"SIR\": 28.3377,\n            \"SAR\": 23.2815,\n           
 \"ISR\": 19.6012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3083,\n            \"SIR\": 30.5521,\n            \"SAR\": 13.0685,\n            \"ISR\": 16.7141\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2996,\n            \"SIR\": 28.1532,\n            \"SAR\": 17.4646,\n            \"ISR\": 18.9256\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7491,\n            \"SIR\": 23.8969,\n            \"SAR\": 9.72104,\n            \"ISR\": 11.5543\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7543,\n            \"SIR\": 17.5154,\n            \"SAR\": 15.2486,\n            \"ISR\": 18.2727\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6417,\n            \"SIR\": 34.6714,\n            \"SAR\": 16.3188,\n            \"ISR\": 18.0275\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9108,\n            \"SIR\": 31.3944,\n            \"SAR\": 19.6994,\n            \"ISR\": 19.5486\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2439,\n            \"SIR\": 22.8843,\n            \"SAR\": 11.4796,\n            \"ISR\": 15.1462\n          },\n          \"instrumental\": {\n            \"SDR\": 15.402,\n            \"SIR\": 26.161,\n            \"SAR\": 17.3659,\n            \"ISR\": 18.5137\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.62232,\n        
    \"SIR\": 20.2482,\n            \"SAR\": 7.98341,\n            \"ISR\": 12.8985\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6625,\n            \"SIR\": 25.5025,\n            \"SAR\": 17.6297,\n            \"ISR\": 18.17\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.48818,\n            \"SIR\": 22.6401,\n            \"SAR\": 11.071,\n            \"ISR\": 11.7065\n          },\n          \"instrumental\": {\n            \"SDR\": 16.183,\n            \"SIR\": 23.0783,\n            \"SAR\": 20.5218,\n            \"ISR\": 18.5062\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.38401,\n            \"SIR\": 17.3352,\n            \"SAR\": 3.26781,\n            \"ISR\": 9.07339\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2362,\n            \"SIR\": 22.1474,\n            \"SAR\": 18.6295,\n            \"ISR\": 18.7399\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.78618,\n            \"SIR\": 17.6177,\n            \"SAR\": 9.34869,\n            \"ISR\": 14.7436\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9056,\n            \"SIR\": 24.387,\n            \"SAR\": 14.516,\n            \"ISR\": 16.5355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.59081,\n            \"SIR\": 22.6088,\n            \"SAR\": 8.75701,\n            \"ISR\": 13.5122\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8771,\n            \"SIR\": 24.6355,\n            \"SAR\": 17.2345,\n            \"ISR\": 
18.7096\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -14.1935,\n            \"SIR\": -36.4567,\n            \"SAR\": 0.314075,\n            \"ISR\": 11.102\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6646,\n            \"SIR\": 57.5533,\n            \"SAR\": 12.6339,\n            \"ISR\": 12.7074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.01401,\n            \"SIR\": 15.9968,\n            \"SAR\": 6.57059,\n            \"ISR\": 10.5887\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8938,\n            \"SIR\": 21.4246,\n            \"SAR\": 17.7676,\n            \"ISR\": 17.9455\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9738,\n            \"SIR\": 24.5406,\n            \"SAR\": 11.1928,\n            \"ISR\": 14.9648\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7663,\n            \"SIR\": 25.8849,\n            \"SAR\": 17.8359,\n            \"ISR\": 18.8504\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1398,\n            \"SIR\": 22.6288,\n            \"SAR\": 10.7266,\n            \"ISR\": 15.6327\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1908,\n            \"SIR\": 30.6531,\n            \"SAR\": 20.4712,\n            \"ISR\": 19.0078\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -10.6104,\n            
\"SIR\": -7.99531,\n            \"SAR\": 0.11591,\n            \"ISR\": 8.1445\n          },\n          \"instrumental\": {\n            \"SDR\": 19.2292,\n            \"SIR\": 40.8291,\n            \"SAR\": 29.1592,\n            \"ISR\": 18.7721\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.02899,\n            \"SIR\": 15.6366,\n            \"SAR\": 3.74437,\n            \"ISR\": 6.1774\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7603,\n            \"SIR\": 13.8704,\n            \"SAR\": 14.6637,\n            \"ISR\": 17.6305\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.50677,\n            \"SIR\": 17.7859,\n            \"SAR\": 5.96917,\n            \"ISR\": 12.0982\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6279,\n            \"SIR\": 28.1053,\n            \"SAR\": 20.639,\n            \"ISR\": 19.0337\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.61954,\n            \"SIR\": 19.641,\n            \"SAR\": 7.83261,\n            \"ISR\": 11.2693\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5554,\n            \"SIR\": 21.8735,\n            \"SAR\": 17.228,\n            \"ISR\": 18.5753\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.47519,\n            \"SIR\": 17.9245,\n            \"SAR\": 8.86247,\n            \"ISR\": 14.0049\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4927,\n            \"SIR\": 25.3263,\n            \"SAR\": 16.9752,\n            \"ISR\": 
17.7823\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.526,\n            \"SIR\": 25.3446,\n            \"SAR\": 12.1187,\n            \"ISR\": 14.5452\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1298,\n            \"SIR\": 21.3466,\n            \"SAR\": 14.3429,\n            \"ISR\": 17.8006\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.34312,\n            \"SIR\": 15.5056,\n            \"SAR\": 6.22015,\n            \"ISR\": 10.7062\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4187,\n            \"SIR\": 18.2033,\n            \"SAR\": 13.5888,\n            \"ISR\": 16.0094\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.33572,\n            \"SIR\": 6.25176,\n            \"SAR\": 7.65442,\n            \"ISR\": 14.8568\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1484,\n            \"SIR\": 22.8804,\n            \"SAR\": 12.9362,\n            \"ISR\": 13.4223\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5018,\n            \"SIR\": 30.7089,\n            \"SAR\": 14.2291,\n            \"ISR\": 15.8441\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9845,\n            \"SIR\": 20.4726,\n            \"SAR\": 14.6665,\n            \"ISR\": 19.0178\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.093,\n            \"SIR\": 24.8229,\n            \"SAR\": 
11.6782,\n            \"ISR\": 15.1569\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2491,\n            \"SIR\": 23.4368,\n            \"SAR\": 15.6673,\n            \"ISR\": 17.9126\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5012,\n            \"SIR\": 26.6218,\n            \"SAR\": 12.1428,\n            \"ISR\": 14.8729\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2478,\n            \"SIR\": 27.3377,\n            \"SAR\": 21.4549,\n            \"ISR\": 18.9566\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.46905,\n            \"SIR\": 11.5357,\n            \"SAR\": 0.40599,\n            \"ISR\": 7.44792\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4402,\n            \"SIR\": 14.5972,\n            \"SAR\": 11.2779,\n            \"ISR\": 16.9035\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7932,\n            \"SIR\": 29.1664,\n            \"SAR\": 13.8629,\n            \"ISR\": 17.3101\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8257,\n            \"SIR\": 34.4168,\n            \"SAR\": 24.75,\n            \"ISR\": 19.711\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.976,\n            \"SIR\": 21.0465,\n            \"SAR\": 12.048,\n            \"ISR\": 16.0109\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5617,\n            \"SIR\": 22.1389,\n            \"SAR\": 12.4756,\n            \"ISR\": 15.6336\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.17875,\n            \"SIR\": 25.829,\n            \"SAR\": 8.23288,\n            \"ISR\": 12.9425\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9527,\n            \"SIR\": 21.8677,\n            \"SAR\": 15.3017,\n            \"ISR\": 18.5297\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.94406,\n            \"SIR\": 7.7746,\n            \"SAR\": 1.98949,\n            \"ISR\": 9.89117\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9691,\n            \"SIR\": 30.6376,\n            \"SAR\": 22.0327,\n            \"ISR\": 18.2674\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1519,\n            \"SIR\": 27.509,\n            \"SAR\": 10.5958,\n            \"ISR\": 15.5151\n          },\n          \"instrumental\": {\n            \"SDR\": 17.126,\n            \"SIR\": 30.0009,\n            \"SAR\": 20.3964,\n            \"ISR\": 19.2701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1509,\n            \"SIR\": 30.7697,\n            \"SAR\": 11.6768,\n            \"ISR\": 15.3188\n          },\n          \"instrumental\": {\n            \"SDR\": 17.924,\n            \"SIR\": 29.885,\n            \"SAR\": 21.8401,\n            \"ISR\": 19.5461\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.217,\n            \"SIR\": 30.7074,\n            \"SAR\": 13.6792,\n            \"ISR\": 16.5262\n          },\n          
\"instrumental\": {\n            \"SDR\": 15.629,\n            \"SIR\": 25.9143,\n            \"SAR\": 17.2054,\n            \"ISR\": 18.9425\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7964,\n            \"SIR\": 23.8375,\n            \"SAR\": 11.249,\n            \"ISR\": 16.2143\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1787,\n            \"SIR\": 29.1949,\n            \"SAR\": 18.98,\n            \"ISR\": 18.8066\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.78618,\n        \"SIR\": 21.0465,\n        \"SAR\": 9.72104,\n        \"ISR\": 14.5452\n      },\n      \"instrumental\": {\n        \"SDR\": 15.402,\n        \"SIR\": 24.6355,\n        \"SAR\": 17.2345,\n        \"ISR\": 18.5137\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_HQ_2.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst HQ 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.75013,\n            \"SIR\": 18.1092,\n            \"SAR\": 5.6745,\n            \"ISR\": 10.305\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5295,\n            \"SIR\": 25.0494,\n            \"SAR\": 19.9577,\n            \"ISR\": 18.7263\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.25193,\n            \"SIR\": 10.5995,\n            \"SAR\": 6.92066,\n            \"ISR\": 13.4755\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1988,\n            \"SIR\": 23.0424,\n            \"SAR\": 12.6725,\n            \"ISR\": 
13.847\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2148,\n            \"SIR\": 21.3804,\n            \"SAR\": 10.8151,\n            \"ISR\": 15.369\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6513,\n            \"SIR\": 27.1273,\n            \"SAR\": 18.1712,\n            \"ISR\": 18.0733\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.22032,\n            \"SIR\": 5.78212,\n            \"SAR\": 4.25469,\n            \"ISR\": 13.5468\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9118,\n            \"SIR\": 26.2213,\n            \"SAR\": 13.0931,\n            \"ISR\": 13.3895\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2178,\n            \"SIR\": 22.3866,\n            \"SAR\": 13.3185,\n            \"ISR\": 16.2224\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7686,\n            \"SIR\": 24.8618,\n            \"SAR\": 15.7859,\n            \"ISR\": 17.1019\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0235,\n            \"SIR\": 17.9669,\n            \"SAR\": 10.9328,\n            \"ISR\": 14.9252\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8309,\n            \"SIR\": 22.3207,\n            \"SAR\": 13.4408,\n            \"ISR\": 15.6294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.592,\n            \"SIR\": 20.4494,\n            \"SAR\": 12.7341,\n            
\"ISR\": 15.1387\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0605,\n            \"SIR\": 24.1589,\n            \"SAR\": 17.1633,\n            \"ISR\": 17.7572\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.41811,\n            \"SIR\": 19.0141,\n            \"SAR\": 5.64387,\n            \"ISR\": 8.20029\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8129,\n            \"SIR\": 28.1003,\n            \"SAR\": 23.531,\n            \"ISR\": 19.2944\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3787,\n            \"SIR\": 30.0504,\n            \"SAR\": 13.1581,\n            \"ISR\": 16.795\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3285,\n            \"SIR\": 28.4907,\n            \"SAR\": 17.5272,\n            \"ISR\": 18.8554\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8534,\n            \"SIR\": 21.4323,\n            \"SAR\": 10.5037,\n            \"ISR\": 12.5056\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5767,\n            \"SIR\": 19.1949,\n            \"SAR\": 15.7617,\n            \"ISR\": 17.6246\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7439,\n            \"SIR\": 31.6311,\n            \"SAR\": 16.5139,\n            \"ISR\": 18.1119\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5368,\n            \"SIR\": 31.9364,\n            \"SAR\": 19.5184,\n            \"ISR\": 19.0697\n          }\n        }\n      },\n      {\n   
     \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4341,\n            \"SIR\": 21.4683,\n            \"SAR\": 11.6368,\n            \"ISR\": 15.2359\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2523,\n            \"SIR\": 25.9826,\n            \"SAR\": 17.4633,\n            \"ISR\": 18.0557\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.57684,\n            \"SIR\": 19.7919,\n            \"SAR\": 8.08266,\n            \"ISR\": 13.4371\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5119,\n            \"SIR\": 26.3783,\n            \"SAR\": 17.7314,\n            \"ISR\": 17.8069\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.14174,\n            \"SIR\": 20.8594,\n            \"SAR\": 10.9994,\n            \"ISR\": 11.7026\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8888,\n            \"SIR\": 23.4503,\n            \"SAR\": 19.4485,\n            \"ISR\": 17.8789\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.3453,\n            \"SIR\": 15.9079,\n            \"SAR\": 3.26428,\n            \"ISR\": 9.49522\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8953,\n            \"SIR\": 22.7308,\n            \"SAR\": 18.2017,\n            \"ISR\": 18.3418\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.7616,\n            \"SIR\": 17.0676,\n            \"SAR\": 9.41335,\n            \"ISR\": 15.0365\n    
      },\n          \"instrumental\": {\n            \"SDR\": 12.7658,\n            \"SIR\": 25.0631,\n            \"SAR\": 14.5017,\n            \"ISR\": 16.255\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.64321,\n            \"SIR\": 21.5826,\n            \"SAR\": 8.85857,\n            \"ISR\": 13.7613\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7159,\n            \"SIR\": 25.0438,\n            \"SAR\": 17.0628,\n            \"ISR\": 18.3133\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.9311,\n            \"SIR\": -34.6845,\n            \"SAR\": 0.44717,\n            \"ISR\": 11.6692\n          },\n          \"instrumental\": {\n            \"SDR\": 14.167,\n            \"SIR\": 57.9273,\n            \"SAR\": 15.0911,\n            \"ISR\": 14.0964\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.71139,\n            \"SIR\": 14.0424,\n            \"SAR\": 6.47662,\n            \"ISR\": 11.2724\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7111,\n            \"SIR\": 22.3648,\n            \"SAR\": 17.6089,\n            \"ISR\": 17.0999\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.055,\n            \"SIR\": 23.6487,\n            \"SAR\": 11.3556,\n            \"ISR\": 15.038\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7785,\n            \"SIR\": 26.0834,\n            \"SAR\": 18.1416,\n            \"ISR\": 18.6006\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4457,\n            \"SIR\": 21.7137,\n            \"SAR\": 11.0581,\n            \"ISR\": 15.8732\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0851,\n            \"SIR\": 30.946,\n            \"SAR\": 20.6105,\n            \"ISR\": 18.6987\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -8.89141,\n            \"SIR\": -6.61302,\n            \"SAR\": -0.53875,\n            \"ISR\": 6.34526\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3864,\n            \"SIR\": 41.3587,\n            \"SAR\": 28.1256,\n            \"ISR\": 18.8042\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.13095,\n            \"SIR\": 14.4451,\n            \"SAR\": 3.91932,\n            \"ISR\": 6.58308\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7107,\n            \"SIR\": 14.3284,\n            \"SAR\": 14.5143,\n            \"ISR\": 17.0436\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.35816,\n            \"SIR\": 16.4118,\n            \"SAR\": 6.44462,\n            \"ISR\": 12.7408\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4328,\n            \"SIR\": 29.2,\n            \"SAR\": 20.3177,\n            \"ISR\": 18.6126\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.66077,\n            \"SIR\": 18.2472,\n            \"SAR\": 8.01908,\n            \"ISR\": 11.7037\n     
     },\n          \"instrumental\": {\n            \"SDR\": 14.4873,\n            \"SIR\": 22.4413,\n            \"SAR\": 17.3787,\n            \"ISR\": 18.0918\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.65901,\n            \"SIR\": 17.9711,\n            \"SAR\": 9.0448,\n            \"ISR\": 14.2412\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1546,\n            \"SIR\": 25.6346,\n            \"SAR\": 16.7448,\n            \"ISR\": 17.5825\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4329,\n            \"SIR\": 24.1807,\n            \"SAR\": 12.1309,\n            \"ISR\": 14.8873\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1586,\n            \"SIR\": 21.9866,\n            \"SAR\": 14.3689,\n            \"ISR\": 17.4333\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.71024,\n            \"SIR\": 16.6967,\n            \"SAR\": 6.41049,\n            \"ISR\": 10.7304\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9114,\n            \"SIR\": 18.192,\n            \"SAR\": 13.7687,\n            \"ISR\": 16.4708\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.35064,\n            \"SIR\": 6.27276,\n            \"SAR\": 7.74583,\n            \"ISR\": 15.2688\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1029,\n            \"SIR\": 23.6844,\n            \"SAR\": 13.0589,\n            \"ISR\": 13.3144\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5135,\n            \"SIR\": 29.3897,\n            \"SAR\": 14.199,\n            \"ISR\": 15.7632\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8891,\n            \"SIR\": 20.6328,\n            \"SAR\": 14.7286,\n            \"ISR\": 18.5527\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4348,\n            \"SIR\": 23.7673,\n            \"SAR\": 11.9227,\n            \"ISR\": 15.5388\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2276,\n            \"SIR\": 24.1166,\n            \"SAR\": 15.9378,\n            \"ISR\": 17.5278\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8745,\n            \"SIR\": 25.3945,\n            \"SAR\": 12.4421,\n            \"ISR\": 15.6904\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3909,\n            \"SIR\": 28.9113,\n            \"SAR\": 21.647,\n            \"ISR\": 18.9789\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.70417,\n            \"SIR\": 10.4575,\n            \"SAR\": 1.68583,\n            \"ISR\": 8.37743\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8265,\n            \"SIR\": 15.3969,\n            \"SAR\": 12.5611,\n            \"ISR\": 16.6539\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1129,\n            \"SIR\": 27.289,\n            \"SAR\": 14.0116,\n            \"ISR\": 17.5973\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6953,\n        
    \"SIR\": 35.3744,\n            \"SAR\": 24.7443,\n            \"ISR\": 19.3882\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8359,\n            \"SIR\": 20.1393,\n            \"SAR\": 12.178,\n            \"ISR\": 16.4394\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7308,\n            \"SIR\": 22.9517,\n            \"SAR\": 12.4309,\n            \"ISR\": 15.1176\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.25759,\n            \"SIR\": 24.3971,\n            \"SAR\": 8.56934,\n            \"ISR\": 13.2231\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0296,\n            \"SIR\": 22.3451,\n            \"SAR\": 15.3987,\n            \"ISR\": 18.1758\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.52491,\n            \"SIR\": 6.42375,\n            \"SAR\": 2.18858,\n            \"ISR\": 10.7866\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7642,\n            \"SIR\": 30.9915,\n            \"SAR\": 22.0214,\n            \"ISR\": 17.8616\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0833,\n            \"SIR\": 25.1693,\n            \"SAR\": 10.6642,\n            \"ISR\": 15.6484\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8894,\n            \"SIR\": 30.752,\n            \"SAR\": 20.563,\n            \"ISR\": 18.811\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 11.4449,\n            \"SIR\": 29.5342,\n            \"SAR\": 11.8947,\n            \"ISR\": 15.7077\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8416,\n            \"SIR\": 31.0958,\n            \"SAR\": 21.9747,\n            \"ISR\": 19.3451\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2274,\n            \"SIR\": 29.3935,\n            \"SAR\": 13.8195,\n            \"ISR\": 16.6271\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5728,\n            \"SIR\": 26.1098,\n            \"SAR\": 17.4237,\n            \"ISR\": 18.6752\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0053,\n            \"SIR\": 23.8065,\n            \"SAR\": 11.5084,\n            \"ISR\": 15.8519\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1076,\n            \"SIR\": 28.411,\n            \"SAR\": 19.1462,\n            \"ISR\": 18.6863\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.7616,\n        \"SIR\": 20.4494,\n        \"SAR\": 10.5037,\n        \"ISR\": 14.8873\n      },\n      \"instrumental\": {\n        \"SDR\": 15.2523,\n        \"SIR\": 25.0631,\n        \"SAR\": 17.4237,\n        \"ISR\": 18.0557\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_HQ_3.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst HQ 3\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.62282,\n            \"SIR\": 18.7143,\n            \"SAR\": 5.72698,\n            
\"ISR\": 9.80979\n          },\n          \"instrumental\": {\n            \"SDR\": 16.512,\n            \"SIR\": 24.6836,\n            \"SAR\": 19.931,\n            \"ISR\": 18.8168\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.9593,\n            \"SIR\": 11.6041,\n            \"SAR\": 7.30291,\n            \"ISR\": 13.0865\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8454,\n            \"SIR\": 22.4631,\n            \"SAR\": 13.3694,\n            \"ISR\": 14.4199\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2541,\n            \"SIR\": 21.9765,\n            \"SAR\": 10.8039,\n            \"ISR\": 15.3018\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6539,\n            \"SIR\": 26.9562,\n            \"SAR\": 18.3161,\n            \"ISR\": 18.2363\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.64894,\n            \"SIR\": 6.66637,\n            \"SAR\": 4.03823,\n            \"ISR\": 13.2784\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3023,\n            \"SIR\": 25.9891,\n            \"SAR\": 13.4442,\n            \"ISR\": 13.9262\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.196,\n            \"SIR\": 22.7971,\n            \"SAR\": 13.2357,\n            \"ISR\": 16.0105\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6984,\n            \"SIR\": 24.3107,\n            \"SAR\": 15.8734,\n            \"ISR\": 17.2338\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute 
Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0571,\n            \"SIR\": 17.8196,\n            \"SAR\": 10.9203,\n            \"ISR\": 14.9919\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8254,\n            \"SIR\": 22.4586,\n            \"SAR\": 13.4082,\n            \"ISR\": 15.5408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6457,\n            \"SIR\": 20.364,\n            \"SAR\": 12.6643,\n            \"ISR\": 15.1095\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9823,\n            \"SIR\": 24.0987,\n            \"SAR\": 17.3399,\n            \"ISR\": 17.7673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.09634,\n            \"SIR\": 18.929,\n            \"SAR\": 5.6633,\n            \"ISR\": 8.69877\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9722,\n            \"SIR\": 28.8237,\n            \"SAR\": 23.2407,\n            \"ISR\": 19.3447\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5304,\n            \"SIR\": 30.7205,\n            \"SAR\": 13.1792,\n            \"ISR\": 16.8838\n          },\n          \"instrumental\": {\n            \"SDR\": 15.421,\n            \"SIR\": 28.6888,\n            \"SAR\": 17.5145,\n            \"ISR\": 18.968\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8973,\n            \"SIR\": 22.9421,\n            \"SAR\": 10.0193,\n            \"ISR\": 11.9112\n          },\n          \"instrumental\": {\n            
\"SDR\": 15.7798,\n            \"SIR\": 18.045,\n            \"SAR\": 15.4133,\n            \"ISR\": 17.9321\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.8073,\n            \"SIR\": 33.1974,\n            \"SAR\": 16.5528,\n            \"ISR\": 18.1652\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7075,\n            \"SIR\": 31.9164,\n            \"SAR\": 19.6389,\n            \"ISR\": 19.2246\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4156,\n            \"SIR\": 20.7426,\n            \"SAR\": 11.7749,\n            \"ISR\": 15.7576\n          },\n          \"instrumental\": {\n            \"SDR\": 15.374,\n            \"SIR\": 27.5474,\n            \"SAR\": 17.5502,\n            \"ISR\": 17.8933\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.50874,\n            \"SIR\": 19.6127,\n            \"SAR\": 7.99139,\n            \"ISR\": 13.3774\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4521,\n            \"SIR\": 26.1681,\n            \"SAR\": 17.5354,\n            \"ISR\": 17.8234\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.3718,\n            \"SIR\": 18.2818,\n            \"SAR\": 10.6386,\n            \"ISR\": 11.8815\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9009,\n            \"SIR\": 23.6249,\n            \"SAR\": 19.2596,\n            \"ISR\": 17.6046\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.5255,\n            \"SIR\": 16.6584,\n            \"SAR\": 3.18452,\n            \"ISR\": 9.32841\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1102,\n            \"SIR\": 22.438,\n            \"SAR\": 18.449,\n            \"ISR\": 18.5154\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.75365,\n            \"SIR\": 17.091,\n            \"SAR\": 9.25502,\n            \"ISR\": 15.1685\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7034,\n            \"SIR\": 25.3411,\n            \"SAR\": 14.498,\n            \"ISR\": 16.2435\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.56603,\n            \"SIR\": 21.8152,\n            \"SAR\": 8.8896,\n            \"ISR\": 13.7838\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7809,\n            \"SIR\": 25.0254,\n            \"SAR\": 17.011,\n            \"ISR\": 18.3994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -14.5722,\n            \"SIR\": -36.5946,\n            \"SAR\": 0.593805,\n            \"ISR\": 11.7319\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1159,\n            \"SIR\": 57.9643,\n            \"SAR\": 13.1717,\n            \"ISR\": 12.9007\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.88965,\n            \"SIR\": 14.9505,\n            \"SAR\": 6.61937,\n            \"ISR\": 10.9597\n          },\n          \"instrumental\": {\n            \"SDR\": 
15.8256,\n            \"SIR\": 21.9682,\n            \"SAR\": 18.0635,\n            \"ISR\": 17.5503\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1191,\n            \"SIR\": 23.6078,\n            \"SAR\": 11.3035,\n            \"ISR\": 15.071\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6381,\n            \"SIR\": 26.1601,\n            \"SAR\": 17.9077,\n            \"ISR\": 18.6189\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.164,\n            \"SIR\": 22.422,\n            \"SAR\": 10.81,\n            \"ISR\": 15.2089\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2398,\n            \"SIR\": 30.0995,\n            \"SAR\": 20.8514,\n            \"ISR\": 18.9213\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -8.70715,\n            \"SIR\": -5.95472,\n            \"SAR\": -0.59022,\n            \"ISR\": 5.95141\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3823,\n            \"SIR\": 40.1941,\n            \"SAR\": 29.7609,\n            \"ISR\": 19.042\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.16956,\n            \"SIR\": 13.8463,\n            \"SAR\": 3.90974,\n            \"ISR\": 6.76324\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6182,\n            \"SIR\": 14.5028,\n            \"SAR\": 14.1692,\n            \"ISR\": 16.7777\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.19084,\n            \"SIR\": 16.7965,\n            \"SAR\": 6.73845,\n            \"ISR\": 12.5383\n          },\n          \"instrumental\": {\n            \"SDR\": 17.522,\n            \"SIR\": 28.6215,\n            \"SAR\": 20.1242,\n            \"ISR\": 18.7572\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79317,\n            \"SIR\": 18.5001,\n            \"SAR\": 8.07296,\n            \"ISR\": 11.7553\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4986,\n            \"SIR\": 22.6228,\n            \"SAR\": 17.2989,\n            \"ISR\": 18.1715\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.80873,\n            \"SIR\": 17.9825,\n            \"SAR\": 9.12794,\n            \"ISR\": 14.3872\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5337,\n            \"SIR\": 25.9077,\n            \"SAR\": 17.2248,\n            \"ISR\": 17.6591\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7416,\n            \"SIR\": 24.4657,\n            \"SAR\": 12.3387,\n            \"ISR\": 14.8444\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1758,\n            \"SIR\": 21.8779,\n            \"SAR\": 14.4976,\n            \"ISR\": 17.5441\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.51058,\n            \"SIR\": 12.0636,\n            \"SAR\": 6.22767,\n            \"ISR\": 11.7694\n          },\n          \"instrumental\": {\n            \"SDR\": 
11.5307,\n            \"SIR\": 19.4025,\n            \"SAR\": 13.1216,\n            \"ISR\": 14.2804\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.35679,\n            \"SIR\": 6.33969,\n            \"SAR\": 7.70771,\n            \"ISR\": 15.365\n          },\n          \"instrumental\": {\n            \"SDR\": 12.322,\n            \"SIR\": 24.0991,\n            \"SAR\": 13.0649,\n            \"ISR\": 13.4821\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5943,\n            \"SIR\": 29.7868,\n            \"SAR\": 14.1584,\n            \"ISR\": 15.7227\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8858,\n            \"SIR\": 20.528,\n            \"SAR\": 14.6905,\n            \"ISR\": 18.6634\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5352,\n            \"SIR\": 23.9252,\n            \"SAR\": 11.8444,\n            \"ISR\": 15.4427\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2474,\n            \"SIR\": 23.8096,\n            \"SAR\": 15.8224,\n            \"ISR\": 17.6323\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.509,\n            \"SIR\": 27.5619,\n            \"SAR\": 12.1901,\n            \"ISR\": 14.8521\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3269,\n            \"SIR\": 27.0156,\n            \"SAR\": 21.4601,\n            \"ISR\": 19.1336\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
3.67762,\n            \"SIR\": 10.2249,\n            \"SAR\": 4.86761,\n            \"ISR\": 10.3537\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8714,\n            \"SIR\": 19.4776,\n            \"SAR\": 12.4648,\n            \"ISR\": 15.9558\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1793,\n            \"SIR\": 27.2447,\n            \"SAR\": 14.1908,\n            \"ISR\": 17.7531\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7921,\n            \"SIR\": 35.9954,\n            \"SAR\": 24.9087,\n            \"ISR\": 19.4532\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9152,\n            \"SIR\": 20.5906,\n            \"SAR\": 12.3009,\n            \"ISR\": 16.1932\n          },\n          \"instrumental\": {\n            \"SDR\": 11.493,\n            \"SIR\": 22.5364,\n            \"SAR\": 12.4442,\n            \"ISR\": 15.316\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42637,\n            \"SIR\": 23.7342,\n            \"SAR\": 8.6461,\n            \"ISR\": 13.6363\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9191,\n            \"SIR\": 22.7885,\n            \"SAR\": 15.5622,\n            \"ISR\": 18.0473\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.87444,\n            \"SIR\": 6.34672,\n            \"SAR\": 2.03677,\n            \"ISR\": 11.0709\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8324,\n            \"SIR\": 31.4563,\n            \"SAR\": 21.8144,\n     
       \"ISR\": 17.8679\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1128,\n            \"SIR\": 25.0662,\n            \"SAR\": 10.7028,\n            \"ISR\": 15.7958\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9072,\n            \"SIR\": 30.993,\n            \"SAR\": 20.512,\n            \"ISR\": 18.8226\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4564,\n            \"SIR\": 29.7134,\n            \"SAR\": 11.8202,\n            \"ISR\": 15.539\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9628,\n            \"SIR\": 30.8865,\n            \"SAR\": 22.0774,\n            \"ISR\": 19.4061\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4108,\n            \"SIR\": 28.9683,\n            \"SAR\": 13.9727,\n            \"ISR\": 16.737\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5141,\n            \"SIR\": 26.9344,\n            \"SAR\": 17.3492,\n            \"ISR\": 18.5732\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9662,\n            \"SIR\": 24.1744,\n            \"SAR\": 11.5705,\n            \"ISR\": 15.9568\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1899,\n            \"SIR\": 28.5621,\n            \"SAR\": 19.2757,\n            \"ISR\": 18.7745\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.80873,\n        \"SIR\": 20.364,\n        \"SAR\": 10.0193,\n        \"ISR\": 14.8444\n      },\n      
\"instrumental\": {\n        \"SDR\": 15.421,\n        \"SIR\": 25.3411,\n        \"SAR\": 17.3492,\n        \"ISR\": 17.9321\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_HQ_4.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst HQ 4\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.85686,\n            \"SIR\": 19.2837,\n            \"SAR\": 5.7082,\n            \"ISR\": 10.1135\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7591,\n            \"SIR\": 25.0221,\n            \"SAR\": 20.0476,\n            \"ISR\": 19.0067\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.03511,\n            \"SIR\": 12.8602,\n            \"SAR\": 7.41324,\n            \"ISR\": 12.7271\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8894,\n            \"SIR\": 21.7858,\n            \"SAR\": 13.4914,\n            \"ISR\": 14.9551\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3066,\n            \"SIR\": 22.53,\n            \"SAR\": 10.9632,\n            \"ISR\": 15.2201\n          },\n          \"instrumental\": {\n            \"SDR\": 15.666,\n            \"SIR\": 26.8045,\n            \"SAR\": 18.1745,\n            \"ISR\": 18.3658\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.57858,\n            \"SIR\": 5.93343,\n            \"SAR\": 3.30803,\n            \"ISR\": 12.5209\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0782,\n            
\"SIR\": 25.2529,\n            \"SAR\": 12.8854,\n            \"ISR\": 13.6342\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3815,\n            \"SIR\": 22.8234,\n            \"SAR\": 13.389,\n            \"ISR\": 16.2417\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8139,\n            \"SIR\": 24.9417,\n            \"SAR\": 15.9235,\n            \"ISR\": 17.2634\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2109,\n            \"SIR\": 18.4123,\n            \"SAR\": 11.0992,\n            \"ISR\": 15.0204\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0299,\n            \"SIR\": 22.5333,\n            \"SAR\": 13.5259,\n            \"ISR\": 15.8907\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6508,\n            \"SIR\": 20.5902,\n            \"SAR\": 12.8044,\n            \"ISR\": 15.184\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2053,\n            \"SIR\": 24.4884,\n            \"SAR\": 17.3929,\n            \"ISR\": 17.866\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0413,\n            \"SIR\": 19.1728,\n            \"SAR\": 4.99867,\n            \"ISR\": 8.67852\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1427,\n            \"SIR\": 29.0412,\n            \"SAR\": 23.3745,\n            \"ISR\": 19.5464\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
12.6886,\n            \"SIR\": 29.7956,\n            \"SAR\": 13.432,\n            \"ISR\": 16.8817\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5514,\n            \"SIR\": 28.4638,\n            \"SAR\": 17.5942,\n            \"ISR\": 18.8752\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8416,\n            \"SIR\": 21.9069,\n            \"SAR\": 10.4475,\n            \"ISR\": 12.2967\n          },\n          \"instrumental\": {\n            \"SDR\": 15.918,\n            \"SIR\": 19.0178,\n            \"SAR\": 15.7752,\n            \"ISR\": 17.7321\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9312,\n            \"SIR\": 33.2777,\n            \"SAR\": 16.7505,\n            \"ISR\": 18.3166\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7249,\n            \"SIR\": 32.6605,\n            \"SAR\": 19.9528,\n            \"ISR\": 19.2149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4675,\n            \"SIR\": 22.8829,\n            \"SAR\": 11.6808,\n            \"ISR\": 15.2476\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2594,\n            \"SIR\": 25.7904,\n            \"SAR\": 17.577,\n            \"ISR\": 18.384\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.53051,\n            \"SIR\": 19.3342,\n            \"SAR\": 7.98742,\n            \"ISR\": 13.3761\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6174,\n            \"SIR\": 26.0881,\n    
        \"SAR\": 17.6295,\n            \"ISR\": 17.8043\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.61938,\n            \"SIR\": 20.1069,\n            \"SAR\": 11.4551,\n            \"ISR\": 11.8547\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0898,\n            \"SIR\": 23.5332,\n            \"SAR\": 20.3613,\n            \"ISR\": 18.1577\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.63943,\n            \"SIR\": 18.9571,\n            \"SAR\": 3.73242,\n            \"ISR\": 9.01454\n          },\n          \"instrumental\": {\n            \"SDR\": 15.401,\n            \"SIR\": 22.1897,\n            \"SAR\": 19.0159,\n            \"ISR\": 18.9372\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49789,\n            \"SIR\": 13.6184,\n            \"SAR\": 8.35197,\n            \"ISR\": 15.1273\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4018,\n            \"SIR\": 24.9895,\n            \"SAR\": 13.1216,\n            \"ISR\": 14.7764\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.72041,\n            \"SIR\": 21.4088,\n            \"SAR\": 8.84061,\n            \"ISR\": 13.9091\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6974,\n            \"SIR\": 25.3201,\n            \"SAR\": 16.9959,\n            \"ISR\": 18.4489\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -18.0316,\n    
        \"SIR\": -37.3143,\n            \"SAR\": 0.81574,\n            \"ISR\": 12.4039\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2523,\n            \"SIR\": 58.2215,\n            \"SAR\": 12.8487,\n            \"ISR\": 12.4706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.7133,\n            \"SIR\": 14.6428,\n            \"SAR\": 6.34567,\n            \"ISR\": 11.0777\n          },\n          \"instrumental\": {\n            \"SDR\": 16.077,\n            \"SIR\": 22.2259,\n            \"SAR\": 17.8465,\n            \"ISR\": 17.4422\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8521,\n            \"SIR\": 24.0705,\n            \"SAR\": 10.8754,\n            \"ISR\": 14.408\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5122,\n            \"SIR\": 24.8167,\n            \"SAR\": 17.4231,\n            \"ISR\": 18.794\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5317,\n            \"SIR\": 22.2285,\n            \"SAR\": 10.785,\n            \"ISR\": 15.7561\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2213,\n            \"SIR\": 31.8874,\n            \"SAR\": 20.5787,\n            \"ISR\": 18.9164\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.5216,\n            \"SIR\": -8.95723,\n            \"SAR\": 0.7458,\n            \"ISR\": 11.0521\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8969,\n            \"SIR\": 41.4806,\n            
\"SAR\": 29.4105,\n            \"ISR\": 18.7697\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.12438,\n            \"SIR\": 14.8239,\n            \"SAR\": 3.80795,\n            \"ISR\": 6.61489\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7753,\n            \"SIR\": 14.3432,\n            \"SAR\": 14.2971,\n            \"ISR\": 17.1639\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.63987,\n            \"SIR\": 14.0156,\n            \"SAR\": 6.36177,\n            \"ISR\": 12.6105\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9589,\n            \"SIR\": 28.1824,\n            \"SAR\": 19.4677,\n            \"ISR\": 18.3777\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.74941,\n            \"SIR\": 18.7546,\n            \"SAR\": 8.12369,\n            \"ISR\": 11.8337\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5977,\n            \"SIR\": 22.7382,\n            \"SAR\": 17.3793,\n            \"ISR\": 18.3114\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.83195,\n            \"SIR\": 18.2937,\n            \"SAR\": 9.0916,\n            \"ISR\": 14.4444\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6847,\n            \"SIR\": 26.0526,\n            \"SAR\": 17.2799,\n            \"ISR\": 17.7546\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2303,\n            
\"SIR\": 25.9265,\n            \"SAR\": 11.9281,\n            \"ISR\": 13.8312\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4551,\n            \"SIR\": 20.1209,\n            \"SAR\": 14.7372,\n            \"ISR\": 17.9224\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.99514,\n            \"SIR\": 15.9885,\n            \"SAR\": 6.8435,\n            \"ISR\": 11.2652\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0878,\n            \"SIR\": 18.9621,\n            \"SAR\": 14.139,\n            \"ISR\": 16.0843\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.33752,\n            \"SIR\": 5.78628,\n            \"SAR\": 7.50576,\n            \"ISR\": 15.4087\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0919,\n            \"SIR\": 24.0181,\n            \"SAR\": 12.7461,\n            \"ISR\": 13.2081\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4786,\n            \"SIR\": 29.1951,\n            \"SAR\": 14.8428,\n            \"ISR\": 16.5428\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7934,\n            \"SIR\": 21.9521,\n            \"SAR\": 15.1376,\n            \"ISR\": 18.2209\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3325,\n            \"SIR\": 24.7288,\n            \"SAR\": 12.0267,\n            \"ISR\": 15.33\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3362,\n            \"SIR\": 23.976,\n            \"SAR\": 15.7722,\n            \"ISR\": 
17.889\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5539,\n            \"SIR\": 26.8458,\n            \"SAR\": 10.1353,\n            \"ISR\": 11.8043\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4648,\n            \"SIR\": 22.5868,\n            \"SAR\": 20.3121,\n            \"ISR\": 19.2231\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.42588,\n            \"SIR\": 9.06719,\n            \"SAR\": 7.24233,\n            \"ISR\": 13.5398\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4037,\n            \"SIR\": 23.0744,\n            \"SAR\": 14.3059,\n            \"ISR\": 14.8809\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3501,\n            \"SIR\": 26.547,\n            \"SAR\": 14.239,\n            \"ISR\": 17.9401\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8806,\n            \"SIR\": 36.7738,\n            \"SAR\": 25.0537,\n            \"ISR\": 19.539\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1366,\n            \"SIR\": 20.3999,\n            \"SAR\": 12.428,\n            \"ISR\": 16.8285\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9251,\n            \"SIR\": 23.7458,\n            \"SAR\": 12.6606,\n            \"ISR\": 15.2826\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.57828,\n            \"SIR\": 24.6253,\n            \"SAR\": 8.77917,\n            \"ISR\": 
13.9686\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2182,\n            \"SIR\": 23.2434,\n            \"SAR\": 15.632,\n            \"ISR\": 18.3182\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.92139,\n            \"SIR\": 5.81725,\n            \"SAR\": 2.06675,\n            \"ISR\": 10.8761\n          },\n          \"instrumental\": {\n            \"SDR\": 16.808,\n            \"SIR\": 31.3864,\n            \"SAR\": 20.9209,\n            \"ISR\": 17.8701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2232,\n            \"SIR\": 24.5532,\n            \"SAR\": 10.8848,\n            \"ISR\": 15.8251\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0122,\n            \"SIR\": 30.9548,\n            \"SAR\": 20.7214,\n            \"ISR\": 18.7644\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9001,\n            \"SIR\": 30.4242,\n            \"SAR\": 12.2469,\n            \"ISR\": 15.7596\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1434,\n            \"SIR\": 31.4276,\n            \"SAR\": 22.3572,\n            \"ISR\": 19.5462\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9875,\n            \"SIR\": 29.7121,\n            \"SAR\": 14.0536,\n            \"ISR\": 16.4766\n          },\n          \"instrumental\": {\n            \"SDR\": 15.728,\n            \"SIR\": 25.7527,\n            \"SAR\": 17.5914,\n            \"ISR\": 18.9239\n          }\n        }\n      },\n      {\n        
\"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2761,\n            \"SIR\": 22.9858,\n            \"SAR\": 11.991,\n            \"ISR\": 16.9225\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2658,\n            \"SIR\": 30.6573,\n            \"SAR\": 19.3022,\n            \"ISR\": 18.6899\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.83195,\n        \"SIR\": 20.3999,\n        \"SAR\": 10.1353,\n        \"ISR\": 13.9686\n      },\n      \"instrumental\": {\n        \"SDR\": 15.5122,\n        \"SIR\": 24.9895,\n        \"SAR\": 17.4231,\n        \"ISR\": 18.2209\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR_MDXNET_Main.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Main\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.63812,\n            \"SIR\": 17.4406,\n            \"SAR\": 5.78804,\n            \"ISR\": 10.1273\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5084,\n            \"SIR\": 24.7649,\n            \"SAR\": 19.8021,\n            \"ISR\": 18.7235\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.78169,\n            \"SIR\": 8.41188,\n            \"SAR\": 6.9639,\n            \"ISR\": 14.1907\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2124,\n            \"SIR\": 24.1852,\n            \"SAR\": 12.1118,\n            \"ISR\": 12.495\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5073,\n            \"SIR\": 
22.5596,\n            \"SAR\": 11.1205,\n            \"ISR\": 15.3204\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8829,\n            \"SIR\": 26.9528,\n            \"SAR\": 18.5398,\n            \"ISR\": 18.3642\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.71298,\n            \"SIR\": 4.55526,\n            \"SAR\": 4.41841,\n            \"ISR\": 13.5779\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2904,\n            \"SIR\": 28.2159,\n            \"SAR\": 12.7908,\n            \"ISR\": 12.5596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9046,\n            \"SIR\": 23.3772,\n            \"SAR\": 13.9639,\n            \"ISR\": 16.5199\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7488,\n            \"SIR\": 25.3611,\n            \"SAR\": 16.1389,\n            \"ISR\": 17.3698\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.265,\n            \"SIR\": 18.319,\n            \"SAR\": 11.1117,\n            \"ISR\": 15.2755\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0067,\n            \"SIR\": 22.8264,\n            \"SAR\": 13.539,\n            \"ISR\": 15.7601\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9164,\n            \"SIR\": 21.0729,\n            \"SAR\": 12.941,\n            \"ISR\": 15.6019\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1779,\n            \"SIR\": 25.012,\n            \"SAR\": 17.2753,\n            \"ISR\": 17.9222\n          }\n       
 }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.56513,\n            \"SIR\": 19.403,\n            \"SAR\": 5.66866,\n            \"ISR\": 8.73723\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1455,\n            \"SIR\": 28.4962,\n            \"SAR\": 23.888,\n            \"ISR\": 19.4103\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6332,\n            \"SIR\": 33.0259,\n            \"SAR\": 13.572,\n            \"ISR\": 16.4998\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7138,\n            \"SIR\": 26.9176,\n            \"SAR\": 18.0663,\n            \"ISR\": 19.1853\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5077,\n            \"SIR\": 21.3703,\n            \"SAR\": 11.0324,\n            \"ISR\": 12.7267\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0667,\n            \"SIR\": 20.108,\n            \"SAR\": 16.2678,\n            \"ISR\": 17.8215\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6795,\n            \"SIR\": 29.4574,\n            \"SAR\": 16.584,\n            \"ISR\": 18.067\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5213,\n            \"SIR\": 30.0026,\n            \"SAR\": 19.558,\n            \"ISR\": 18.9745\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5185,\n            \"SIR\": 21.8661,\n            \"SAR\": 
11.8857,\n            \"ISR\": 15.3888\n          },\n          \"instrumental\": {\n            \"SDR\": 15.46,\n            \"SIR\": 26.1003,\n            \"SAR\": 17.7594,\n            \"ISR\": 18.21\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.02388,\n            \"SIR\": 15.9155,\n            \"SAR\": 7.54485,\n            \"ISR\": 13.6946\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3277,\n            \"SIR\": 26.8306,\n            \"SAR\": 17.273,\n            \"ISR\": 17.1336\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.64116,\n            \"SIR\": 14.8488,\n            \"SAR\": 9.28946,\n            \"ISR\": 13.3597\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9489,\n            \"SIR\": 25.3607,\n            \"SAR\": 19.6784,\n            \"ISR\": 17.5423\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.69813,\n            \"SIR\": 14.914,\n            \"SAR\": 3.4676,\n            \"ISR\": 9.62061\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9204,\n            \"SIR\": 22.9414,\n            \"SAR\": 17.844,\n            \"ISR\": 18.2578\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0159,\n            \"SIR\": 21.2854,\n            \"SAR\": 10.7858,\n            \"ISR\": 15.3422\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5904,\n            \"SIR\": 25.8051,\n            \"SAR\": 16.6506,\n            \"ISR\": 17.8355\n          }\n        }\n      },\n      {\n  
      \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.19827,\n            \"SIR\": 25.6044,\n            \"SAR\": 9.59577,\n            \"ISR\": 13.9902\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4164,\n            \"SIR\": 25.4591,\n            \"SAR\": 17.8248,\n            \"ISR\": 18.869\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -27.8978,\n            \"SIR\": -41.8016,\n            \"SAR\": 0.566545,\n            \"ISR\": 11.1463\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1163,\n            \"SIR\": 56.4843,\n            \"SAR\": 9.96975,\n            \"ISR\": 10.2864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.27639,\n            \"SIR\": 15.3395,\n            \"SAR\": 7.21208,\n            \"ISR\": 11.3865\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4489,\n            \"SIR\": 22.5843,\n            \"SAR\": 18.7069,\n            \"ISR\": 17.6203\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2641,\n            \"SIR\": 24.0605,\n            \"SAR\": 11.4962,\n            \"ISR\": 15.0843\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8652,\n            \"SIR\": 26.0981,\n            \"SAR\": 17.7729,\n            \"ISR\": 18.606\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4867,\n            \"SIR\": 21.5034,\n            \"SAR\": 11.5393,\n            \"ISR\": 
15.8784\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6973,\n            \"SIR\": 32.061,\n            \"SAR\": 21.0973,\n            \"ISR\": 18.8315\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.64666,\n            \"SIR\": 2.38322,\n            \"SAR\": -0.12859,\n            \"ISR\": 3.48881\n          },\n          \"instrumental\": {\n            \"SDR\": 19.799,\n            \"SIR\": 40.2857,\n            \"SAR\": 33.1708,\n            \"ISR\": 19.3255\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.10689,\n            \"SIR\": 13.9211,\n            \"SAR\": 4.0246,\n            \"ISR\": 6.82979\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7049,\n            \"SIR\": 14.6111,\n            \"SAR\": 14.3182,\n            \"ISR\": 16.8809\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.44271,\n            \"SIR\": 16.4442,\n            \"SAR\": 6.53233,\n            \"ISR\": 12.9763\n          },\n          \"instrumental\": {\n            \"SDR\": 17.42,\n            \"SIR\": 29.5251,\n            \"SAR\": 20.5387,\n            \"ISR\": 18.6357\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.75382,\n            \"SIR\": 18.4517,\n            \"SAR\": 8.14677,\n            \"ISR\": 11.9735\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5384,\n            \"SIR\": 22.877,\n            \"SAR\": 17.2659,\n            \"ISR\": 18.1892\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.89459,\n            \"SIR\": 16.6335,\n            \"SAR\": 9.11425,\n            \"ISR\": 14.871\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4098,\n            \"SIR\": 26.6714,\n            \"SAR\": 16.9654,\n            \"ISR\": 17.1709\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4097,\n            \"SIR\": 23.1646,\n            \"SAR\": 12.1292,\n            \"ISR\": 15.1546\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6668,\n            \"SIR\": 22.5248,\n            \"SAR\": 14.1071,\n            \"ISR\": 17.1133\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.78855,\n            \"SIR\": 8.85361,\n            \"SAR\": 6.28691,\n            \"ISR\": 13.269\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4153,\n            \"SIR\": 21.2661,\n            \"SAR\": 11.9407,\n            \"ISR\": 12.4188\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6384,\n            \"SIR\": 24.7938,\n            \"SAR\": 11.3375,\n            \"ISR\": 15.375\n          },\n          \"instrumental\": {\n            \"SDR\": 14.718,\n            \"SIR\": 24.871,\n            \"SAR\": 17.1602,\n            \"ISR\": 17.7614\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9538,\n            \"SIR\": 30.8104,\n            \"SAR\": 14.7963,\n            \"ISR\": 16.1084\n          
},\n          \"instrumental\": {\n            \"SDR\": 15.2409,\n            \"SIR\": 20.8077,\n            \"SAR\": 15.0561,\n            \"ISR\": 18.8809\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6015,\n            \"SIR\": 24.172,\n            \"SAR\": 12.2026,\n            \"ISR\": 15.511\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5572,\n            \"SIR\": 23.9537,\n            \"SAR\": 15.9465,\n            \"ISR\": 17.6982\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.989,\n            \"SIR\": 27.7185,\n            \"SAR\": 12.1442,\n            \"ISR\": 14.3559\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5362,\n            \"SIR\": 25.9949,\n            \"SAR\": 21.6727,\n            \"ISR\": 19.4336\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.50263,\n            \"SIR\": 11.4276,\n            \"SAR\": 6.23284,\n            \"ISR\": 8.55276\n          },\n          \"instrumental\": {\n            \"SDR\": 11.593,\n            \"SIR\": 16.1367,\n            \"SAR\": 14.2317,\n            \"ISR\": 16.4947\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2171,\n            \"SIR\": 27.224,\n            \"SAR\": 14.2712,\n            \"ISR\": 18.0377\n          },\n          \"instrumental\": {\n            \"SDR\": 18.694,\n            \"SIR\": 36.9321,\n            \"SAR\": 24.915,\n            \"ISR\": 19.367\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n        
  \"vocals\": {\n            \"SDR\": 11.0518,\n            \"SIR\": 19.9184,\n            \"SAR\": 12.486,\n            \"ISR\": 17.0362\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9608,\n            \"SIR\": 23.5955,\n            \"SAR\": 12.6314,\n            \"ISR\": 14.8994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.43834,\n            \"SIR\": 17.9349,\n            \"SAR\": 8.69114,\n            \"ISR\": 13.8892\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3286,\n            \"SIR\": 22.8714,\n            \"SAR\": 14.7663,\n            \"ISR\": 16.5784\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.65212,\n            \"SIR\": 5.92334,\n            \"SAR\": 2.19368,\n            \"ISR\": 11.6722\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2181,\n            \"SIR\": 31.5719,\n            \"SAR\": 20.9956,\n            \"ISR\": 17.8368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1737,\n            \"SIR\": 24.64,\n            \"SAR\": 10.9486,\n            \"ISR\": 15.794\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9259,\n            \"SIR\": 31.0506,\n            \"SAR\": 20.5252,\n            \"ISR\": 18.8426\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7848,\n            \"SIR\": 29.6406,\n            \"SAR\": 12.5111,\n            \"ISR\": 15.8471\n          },\n          \"instrumental\": {\n            \"SDR\": 17.989,\n            
\"SIR\": 32.2081,\n            \"SAR\": 22.5874,\n            \"ISR\": 19.3716\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6205,\n            \"SIR\": 27.9609,\n            \"SAR\": 13.8643,\n            \"ISR\": 14.975\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1636,\n            \"SIR\": 22.729,\n            \"SAR\": 17.4276,\n            \"ISR\": 18.4887\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.191,\n            \"SIR\": 28.8054,\n            \"SAR\": 12.1663,\n            \"ISR\": 14.8448\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5545,\n            \"SIR\": 25.6316,\n            \"SAR\": 20.3024,\n            \"ISR\": 19.1274\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1737,\n        \"SIR\": 21.0729,\n        \"SAR\": 10.9486,\n        \"ISR\": 14.8448\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4098,\n        \"SIR\": 25.4591,\n        \"SAR\": 17.4276,\n        \"ISR\": 17.9222\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET-Inst_Main.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst Main\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.94817,\n            \"SIR\": 18.4295,\n            \"SAR\": 5.567,\n            \"ISR\": 9.73235\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4933,\n            \"SIR\": 24.3046,\n            \"SAR\": 19.8225,\n            \"ISR\": 18.7949\n          }\n        }\n      },\n      {\n        
\"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.44254,\n            \"SIR\": 11.307,\n            \"SAR\": 7.01952,\n            \"ISR\": 12.9618\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3824,\n            \"SIR\": 22.155,\n            \"SAR\": 13.1415,\n            \"ISR\": 14.1184\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.83751,\n            \"SIR\": 21.7799,\n            \"SAR\": 10.5705,\n            \"ISR\": 14.8474\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4183,\n            \"SIR\": 26.1371,\n            \"SAR\": 17.9677,\n            \"ISR\": 18.159\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.13838,\n            \"SIR\": 6.23754,\n            \"SAR\": 4.39782,\n            \"ISR\": 13.4657\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0344,\n            \"SIR\": 26.3402,\n            \"SAR\": 13.2529,\n            \"ISR\": 13.5952\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0052,\n            \"SIR\": 22.5176,\n            \"SAR\": 13.0016,\n            \"ISR\": 15.9124\n          },\n          \"instrumental\": {\n            \"SDR\": 14.293,\n            \"SIR\": 23.9437,\n            \"SAR\": 15.6473,\n            \"ISR\": 17.1701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.93694,\n            \"SIR\": 17.7793,\n            \"SAR\": 10.8321,\n            \"ISR\": 14.6172\n          },\n          \"instrumental\": {\n           
 \"SDR\": 11.783,\n            \"SIR\": 21.7133,\n            \"SAR\": 13.3151,\n            \"ISR\": 15.6105\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1951,\n            \"SIR\": 20.0372,\n            \"SAR\": 12.4587,\n            \"ISR\": 14.8614\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8485,\n            \"SIR\": 23.6532,\n            \"SAR\": 16.9541,\n            \"ISR\": 17.6318\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.98997,\n            \"SIR\": 18.0469,\n            \"SAR\": 5.36005,\n            \"ISR\": 7.63234\n          },\n          \"instrumental\": {\n            \"SDR\": 17.872,\n            \"SIR\": 27.8452,\n            \"SAR\": 23.1905,\n            \"ISR\": 19.3623\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0163,\n            \"SIR\": 28.9332,\n            \"SAR\": 12.7255,\n            \"ISR\": 16.5519\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1754,\n            \"SIR\": 27.7116,\n            \"SAR\": 17.1829,\n            \"ISR\": 18.8185\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2611,\n            \"SIR\": 21.165,\n            \"SAR\": 9.96126,\n            \"ISR\": 12.0396\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5034,\n            \"SIR\": 18.4535,\n            \"SAR\": 15.2435,\n            \"ISR\": 17.6421\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.0137,\n            \"SIR\": 30.5619,\n            \"SAR\": 15.4912,\n            \"ISR\": 17.8524\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3475,\n            \"SIR\": 31.2408,\n            \"SAR\": 18.919,\n            \"ISR\": 18.957\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9797,\n            \"SIR\": 21.5805,\n            \"SAR\": 11.0086,\n            \"ISR\": 14.4182\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0596,\n            \"SIR\": 24.236,\n            \"SAR\": 17.2517,\n            \"ISR\": 18.1161\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.51026,\n            \"SIR\": 18.7185,\n            \"SAR\": 7.9299,\n            \"ISR\": 13.2187\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2734,\n            \"SIR\": 25.8918,\n            \"SAR\": 17.3458,\n            \"ISR\": 17.5253\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.6218,\n            \"SIR\": 21.2985,\n            \"SAR\": 11.1056,\n            \"ISR\": 11.8519\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0302,\n            \"SIR\": 23.4826,\n            \"SAR\": 20.3927,\n            \"ISR\": 18.1433\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.29868,\n            \"SIR\": 16.3626,\n            \"SAR\": 3.11345,\n            \"ISR\": 9.16623\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0742,\n 
           \"SIR\": 22.1344,\n            \"SAR\": 18.501,\n            \"ISR\": 18.5361\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.60394,\n            \"SIR\": 13.5139,\n            \"SAR\": 8.35486,\n            \"ISR\": 14.8775\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3918,\n            \"SIR\": 24.5422,\n            \"SAR\": 13.1835,\n            \"ISR\": 14.7393\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.11683,\n            \"SIR\": 14.1441,\n            \"SAR\": 7.58347,\n            \"ISR\": 13.3017\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1561,\n            \"SIR\": 24.1872,\n            \"SAR\": 15.811,\n            \"ISR\": 17.3148\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.8464,\n            \"SIR\": -37.1858,\n            \"SAR\": 0.398175,\n            \"ISR\": 11.1309\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3575,\n            \"SIR\": 57.2828,\n            \"SAR\": 12.3725,\n            \"ISR\": 12.3642\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.56878,\n            \"SIR\": 12.8498,\n            \"SAR\": 6.47542,\n            \"ISR\": 11.022\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7124,\n            \"SIR\": 22.042,\n            \"SAR\": 17.6579,\n            \"ISR\": 16.7143\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n      
    \"vocals\": {\n            \"SDR\": 10.5075,\n            \"SIR\": 23.2769,\n            \"SAR\": 10.979,\n            \"ISR\": 14.6454\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4744,\n            \"SIR\": 25.1678,\n            \"SAR\": 17.468,\n            \"ISR\": 18.5844\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.82938,\n            \"SIR\": 20.7475,\n            \"SAR\": 10.0275,\n            \"ISR\": 15.4137\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7736,\n            \"SIR\": 31.0207,\n            \"SAR\": 19.9806,\n            \"ISR\": 18.5768\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.0613,\n            \"SIR\": -10.0315,\n            \"SAR\": 0.54093,\n            \"ISR\": 12.6948\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0215,\n            \"SIR\": 43.3561,\n            \"SAR\": 28.0075,\n            \"ISR\": 18.7734\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.96575,\n            \"SIR\": 14.002,\n            \"SAR\": 3.85457,\n            \"ISR\": 6.55322\n          },\n          \"instrumental\": {\n            \"SDR\": 10.622,\n            \"SIR\": 14.1964,\n            \"SAR\": 14.4012,\n            \"ISR\": 16.903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.79443,\n            \"SIR\": 13.1626,\n            \"SAR\": 6.19672,\n            \"ISR\": 12.5102\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9476,\n          
  \"SIR\": 27.5731,\n            \"SAR\": 18.1376,\n            \"ISR\": 17.9342\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.5173,\n            \"SIR\": 18.5142,\n            \"SAR\": 7.95809,\n            \"ISR\": 11.4949\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4291,\n            \"SIR\": 22.0731,\n            \"SAR\": 17.249,\n            \"ISR\": 18.1816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.38432,\n            \"SIR\": 16.2402,\n            \"SAR\": 8.84748,\n            \"ISR\": 14.0807\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3516,\n            \"SIR\": 25.3745,\n            \"SAR\": 16.6057,\n            \"ISR\": 17.2088\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3867,\n            \"SIR\": 23.1903,\n            \"SAR\": 11.8005,\n            \"ISR\": 14.4095\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1207,\n            \"SIR\": 21.1422,\n            \"SAR\": 14.3661,\n            \"ISR\": 17.2945\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.37351,\n            \"SIR\": 14.9461,\n            \"SAR\": 6.3616,\n            \"ISR\": 10.737\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5155,\n            \"SIR\": 18.2638,\n            \"SAR\": 13.6869,\n            \"ISR\": 15.9633\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n 
           \"SDR\": 5.32822,\n            \"SIR\": 5.71922,\n            \"SAR\": 7.41367,\n            \"ISR\": 15.1822\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5277,\n            \"SIR\": 23.6525,\n            \"SAR\": 12.5809,\n            \"ISR\": 12.9325\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3455,\n            \"SIR\": 28.0889,\n            \"SAR\": 14.3366,\n            \"ISR\": 15.8705\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6575,\n            \"SIR\": 21.0296,\n            \"SAR\": 14.7077,\n            \"ISR\": 18.3151\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1501,\n            \"SIR\": 22.6225,\n            \"SAR\": 11.4753,\n            \"ISR\": 15.102\n          },\n          \"instrumental\": {\n            \"SDR\": 13.949,\n            \"SIR\": 23.1857,\n            \"SAR\": 15.5571,\n            \"ISR\": 17.3748\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8236,\n            \"SIR\": 22.4511,\n            \"SAR\": 10.372,\n            \"ISR\": 13.1647\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5036,\n            \"SIR\": 24.1054,\n            \"SAR\": 19.9808,\n            \"ISR\": 18.825\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.55278,\n            \"SIR\": 8.52573,\n            \"SAR\": 4.5857,\n            \"ISR\": 11.3608\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4593,\n            \"SIR\": 18.7148,\n            \"SAR\": 11.9618,\n            \"ISR\": 
15.1649\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7013,\n            \"SIR\": 26.9096,\n            \"SAR\": 13.8912,\n            \"ISR\": 17.6875\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6811,\n            \"SIR\": 35.6342,\n            \"SAR\": 24.4198,\n            \"ISR\": 19.3739\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8194,\n            \"SIR\": 19.6726,\n            \"SAR\": 12.0739,\n            \"ISR\": 16.3173\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4961,\n            \"SIR\": 22.7819,\n            \"SAR\": 12.195,\n            \"ISR\": 14.8435\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04211,\n            \"SIR\": 21.9067,\n            \"SAR\": 8.29973,\n            \"ISR\": 13.4076\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7242,\n            \"SIR\": 22.2973,\n            \"SAR\": 14.9778,\n            \"ISR\": 17.6368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.36317,\n            \"SIR\": 1.63646,\n            \"SAR\": 1.95397,\n            \"ISR\": 10.4013\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5638,\n            \"SIR\": 30.6576,\n            \"SAR\": 21.5886,\n            \"ISR\": 17.902\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0111,\n            \"SIR\": 25.8673,\n            \"SAR\": 
10.5544,\n            \"ISR\": 15.3824\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9712,\n            \"SIR\": 29.9745,\n            \"SAR\": 20.4231,\n            \"ISR\": 18.9464\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1207,\n            \"SIR\": 28.6679,\n            \"SAR\": 11.38,\n            \"ISR\": 15.5418\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7887,\n            \"SIR\": 30.805,\n            \"SAR\": 21.7902,\n            \"ISR\": 19.2728\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9362,\n            \"SIR\": 27.6798,\n            \"SAR\": 13.2721,\n            \"ISR\": 16.2644\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4658,\n            \"SIR\": 25.3853,\n            \"SAR\": 17.2575,\n            \"ISR\": 18.4928\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.50306,\n        \"SIR\": 19.1955,\n        \"SAR\": 9.40437,\n        \"ISR\": 13.7732\n      },\n      \"instrumental\": {\n        \"SDR\": 15.1248,\n        \"SIR\": 24.2116,\n        \"SAR\": 17.2159,\n        \"ISR\": 17.7721\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR_MDXNET_1_9703.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET 1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.5934,\n            \"SIR\": 17.0409,\n            \"SAR\": 5.26672,\n            \"ISR\": 10.3872\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4668,\n   
         \"SIR\": 24.9046,\n            \"SAR\": 19.569,\n            \"ISR\": 18.5774\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.22266,\n            \"SIR\": 10.9764,\n            \"SAR\": 6.91905,\n            \"ISR\": 13.215\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8234,\n            \"SIR\": 22.5692,\n            \"SAR\": 12.482,\n            \"ISR\": 13.8857\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.95372,\n            \"SIR\": 20.6799,\n            \"SAR\": 10.4783,\n            \"ISR\": 14.9469\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4007,\n            \"SIR\": 25.9509,\n            \"SAR\": 17.9162,\n            \"ISR\": 18.0109\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.9839,\n            \"SIR\": 5.33878,\n            \"SAR\": 4.41261,\n            \"ISR\": 12.9296\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6865,\n            \"SIR\": 26.9074,\n            \"SAR\": 13.0336,\n            \"ISR\": 13.1403\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2387,\n            \"SIR\": 21.8043,\n            \"SAR\": 13.1484,\n            \"ISR\": 16.3089\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0073,\n            \"SIR\": 24.1939,\n            \"SAR\": 15.3916,\n            \"ISR\": 16.8745\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.86591,\n        
    \"SIR\": 17.0423,\n            \"SAR\": 10.7696,\n            \"ISR\": 15.0679\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4418,\n            \"SIR\": 22.4582,\n            \"SAR\": 13.0084,\n            \"ISR\": 15.2095\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1776,\n            \"SIR\": 19.629,\n            \"SAR\": 12.3615,\n            \"ISR\": 14.9202\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6558,\n            \"SIR\": 23.3615,\n            \"SAR\": 16.9709,\n            \"ISR\": 17.5965\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.89478,\n            \"SIR\": 18.4425,\n            \"SAR\": 5.77097,\n            \"ISR\": 7.99601\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9592,\n            \"SIR\": 28.4961,\n            \"SAR\": 23.8189,\n            \"ISR\": 19.3995\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0085,\n            \"SIR\": 30.7608,\n            \"SAR\": 12.7072,\n            \"ISR\": 16.5786\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0111,\n            \"SIR\": 26.8237,\n            \"SAR\": 17.0764,\n            \"ISR\": 18.9015\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7102,\n            \"SIR\": 21.1019,\n            \"SAR\": 9.93077,\n            \"ISR\": 12.2174\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2329,\n            \"SIR\": 19.0691,\n            \"SAR\": 15.0804,\n            
\"ISR\": 17.7091\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.2312,\n            \"SIR\": 29.9449,\n            \"SAR\": 15.3861,\n            \"ISR\": 18.2236\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0657,\n            \"SIR\": 29.5551,\n            \"SAR\": 18.3979,\n            \"ISR\": 18.9734\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4058,\n            \"SIR\": 20.6741,\n            \"SAR\": 11.4405,\n            \"ISR\": 15.4073\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1063,\n            \"SIR\": 26.2659,\n            \"SAR\": 17.4505,\n            \"ISR\": 17.9396\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.43656,\n            \"SIR\": 18.4789,\n            \"SAR\": 7.89525,\n            \"ISR\": 13.5155\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0072,\n            \"SIR\": 26.4376,\n            \"SAR\": 17.115,\n            \"ISR\": 17.4657\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.50133,\n            \"SIR\": 14.5742,\n            \"SAR\": 7.65376,\n            \"ISR\": 12.5377\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2975,\n            \"SIR\": 24.688,\n            \"SAR\": 18.5047,\n            \"ISR\": 17.3685\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.86693,\n            \"SIR\": 
14.7055,\n            \"SAR\": 2.87049,\n            \"ISR\": 9.49421\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7684,\n            \"SIR\": 22.6585,\n            \"SAR\": 17.8551,\n            \"ISR\": 18.1329\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.56312,\n            \"SIR\": 19.8559,\n            \"SAR\": 10.1818,\n            \"ISR\": 15.221\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8904,\n            \"SIR\": 25.4695,\n            \"SAR\": 15.8216,\n            \"ISR\": 17.3503\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.44802,\n            \"SIR\": 24.1273,\n            \"SAR\": 8.76505,\n            \"ISR\": 13.2209\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9115,\n            \"SIR\": 24.0172,\n            \"SAR\": 17.327,\n            \"ISR\": 18.6788\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.84,\n            \"SIR\": -36.8657,\n            \"SAR\": 0.59314,\n            \"ISR\": 6.84835\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1057,\n            \"SIR\": 55.9667,\n            \"SAR\": 12.286,\n            \"ISR\": 12.3295\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.05944,\n            \"SIR\": 15.1203,\n            \"SAR\": 6.57126,\n            \"ISR\": 10.6478\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1087,\n            \"SIR\": 21.6227,\n            \"SAR\": 17.8489,\n            \"ISR\": 
17.6373\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6845,\n            \"SIR\": 22.558,\n            \"SAR\": 10.5986,\n            \"ISR\": 14.6012\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1599,\n            \"SIR\": 25.2379,\n            \"SAR\": 17.0471,\n            \"ISR\": 18.4966\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4205,\n            \"SIR\": 22.0125,\n            \"SAR\": 11.1156,\n            \"ISR\": 16.0842\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3912,\n            \"SIR\": 31.7735,\n            \"SAR\": 20.7251,\n            \"ISR\": 18.859\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.72346,\n            \"SIR\": -2.07454,\n            \"SAR\": -0.24123,\n            \"ISR\": 3.27877\n          },\n          \"instrumental\": {\n            \"SDR\": 19.6486,\n            \"SIR\": 40.0979,\n            \"SAR\": 28.9246,\n            \"ISR\": 18.7723\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.09011,\n            \"SIR\": 14.5775,\n            \"SAR\": 3.86788,\n            \"ISR\": 6.45496\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6832,\n            \"SIR\": 14.1482,\n            \"SAR\": 14.3912,\n            \"ISR\": 17.1427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.99333,\n            \"SIR\": 
16.3305,\n            \"SAR\": 6.49331,\n            \"ISR\": 12.3837\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2959,\n            \"SIR\": 28.9575,\n            \"SAR\": 20.1172,\n            \"ISR\": 18.6111\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49893,\n            \"SIR\": 17.7827,\n            \"SAR\": 7.80719,\n            \"ISR\": 11.6089\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3092,\n            \"SIR\": 22.2661,\n            \"SAR\": 16.9712,\n            \"ISR\": 18.0339\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.26891,\n            \"SIR\": 15.2998,\n            \"SAR\": 8.06571,\n            \"ISR\": 14.6527\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1306,\n            \"SIR\": 25.9669,\n            \"SAR\": 16.2078,\n            \"ISR\": 16.8153\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3262,\n            \"SIR\": 23.0929,\n            \"SAR\": 11.5833,\n            \"ISR\": 14.2597\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6203,\n            \"SIR\": 20.5347,\n            \"SAR\": 13.7306,\n            \"ISR\": 17.0843\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.02312,\n            \"SIR\": 12.6744,\n            \"SAR\": 6.03027,\n            \"ISR\": 11.9248\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7363,\n            \"SIR\": 19.9188,\n            \"SAR\": 13.0685,\n            \"ISR\": 
14.5742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.57363,\n            \"SIR\": 22.0864,\n            \"SAR\": 10.2347,\n            \"ISR\": 14.888\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4283,\n            \"SIR\": 23.9455,\n            \"SAR\": 16.4424,\n            \"ISR\": 17.5327\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6595,\n            \"SIR\": 28.888,\n            \"SAR\": 13.8634,\n            \"ISR\": 15.8432\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4831,\n            \"SIR\": 20.0349,\n            \"SAR\": 14.3985,\n            \"ISR\": 18.5502\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1304,\n            \"SIR\": 22.6742,\n            \"SAR\": 11.3828,\n            \"ISR\": 15.4408\n          },\n          \"instrumental\": {\n            \"SDR\": 13.99,\n            \"SIR\": 23.216,\n            \"SAR\": 15.3781,\n            \"ISR\": 17.2512\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5453,\n            \"SIR\": 30.8469,\n            \"SAR\": 11.9946,\n            \"ISR\": 14.0146\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0591,\n            \"SIR\": 25.028,\n            \"SAR\": 21.0303,\n            \"ISR\": 19.5824\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.73583,\n            \"SIR\": 9.23844,\n            \"SAR\": 8.12944,\n            \"ISR\": 
9.82866\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9026,\n            \"SIR\": 17.1661,\n            \"SAR\": 15.3949,\n            \"ISR\": 15.1906\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4777,\n            \"SIR\": 25.5473,\n            \"SAR\": 13.1611,\n            \"ISR\": 18.1669\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4218,\n            \"SIR\": 36.6375,\n            \"SAR\": 23.7677,\n            \"ISR\": 19.2077\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7988,\n            \"SIR\": 19.27,\n            \"SAR\": 12.0695,\n            \"ISR\": 17.2874\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4229,\n            \"SIR\": 24.3946,\n            \"SAR\": 12.033,\n            \"ISR\": 14.5602\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.11099,\n            \"SIR\": 20.0078,\n            \"SAR\": 8.18194,\n            \"ISR\": 13.66\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1556,\n            \"SIR\": 22.6396,\n            \"SAR\": 14.4493,\n            \"ISR\": 17.1903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.93495,\n            \"SIR\": 7.37853,\n            \"SAR\": 1.60049,\n            \"ISR\": 9.89124\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4946,\n            \"SIR\": 31.2868,\n            \"SAR\": 21.2932,\n            \"ISR\": 17.637\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.88147,\n            \"SIR\": 24.5079,\n            \"SAR\": 10.3901,\n            \"ISR\": 15.9481\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8393,\n            \"SIR\": 31.5393,\n            \"SAR\": 20.1513,\n            \"ISR\": 18.8433\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0664,\n            \"SIR\": 27.3351,\n            \"SAR\": 11.0669,\n            \"ISR\": 15.6981\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5943,\n            \"SIR\": 31.4723,\n            \"SAR\": 21.3877,\n            \"ISR\": 19.1606\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.087,\n            \"SIR\": 28.4364,\n            \"SAR\": 12.4375,\n            \"ISR\": 13.7448\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1102,\n            \"SIR\": 20.1301,\n            \"SAR\": 16.0361,\n            \"ISR\": 18.5372\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.56837,\n        \"SIR\": 19.7424,\n        \"SAR\": 10.0563,\n        \"ISR\": 13.8797\n      },\n      \"instrumental\": {\n        \"SDR\": 14.9594,\n        \"SIR\": 24.7963,\n        \"SAR\": 17.0091,\n        \"ISR\": 17.6732\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR_MDXNET_2_9682.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.46453,\n           
 \"SIR\": 17.675,\n            \"SAR\": 4.99802,\n            \"ISR\": 9.81412\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3486,\n            \"SIR\": 24.0614,\n            \"SAR\": 19.3918,\n            \"ISR\": 18.7301\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.40079,\n            \"SIR\": 12.6505,\n            \"SAR\": 6.87446,\n            \"ISR\": 12.1831\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0032,\n            \"SIR\": 20.9133,\n            \"SAR\": 12.8517,\n            \"ISR\": 14.9143\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.60884,\n            \"SIR\": 21.165,\n            \"SAR\": 10.2813,\n            \"ISR\": 14.3951\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2635,\n            \"SIR\": 25.1061,\n            \"SAR\": 17.8489,\n            \"ISR\": 18.1594\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.90264,\n            \"SIR\": 6.28893,\n            \"SAR\": 3.11339,\n            \"ISR\": 11.3278\n          },\n          \"instrumental\": {\n            \"SDR\": 11.244,\n            \"SIR\": 23.536,\n            \"SAR\": 13.3677,\n            \"ISR\": 14.0506\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0659,\n            \"SIR\": 21.9569,\n            \"SAR\": 12.9948,\n            \"ISR\": 15.9682\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0174,\n            \"SIR\": 23.7215,\n            \"SAR\": 15.3182,\n            \"ISR\": 16.9475\n          }\n        }\n   
   },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.75154,\n            \"SIR\": 17.4055,\n            \"SAR\": 10.6096,\n            \"ISR\": 14.5785\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4085,\n            \"SIR\": 21.4284,\n            \"SAR\": 12.955,\n            \"ISR\": 15.4553\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1477,\n            \"SIR\": 19.6268,\n            \"SAR\": 12.2948,\n            \"ISR\": 14.8033\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6056,\n            \"SIR\": 23.2574,\n            \"SAR\": 16.6226,\n            \"ISR\": 17.5846\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.67488,\n            \"SIR\": 18.4063,\n            \"SAR\": 5.36033,\n            \"ISR\": 7.55526\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8995,\n            \"SIR\": 27.7493,\n            \"SAR\": 23.7172,\n            \"ISR\": 19.4267\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7657,\n            \"SIR\": 30.9358,\n            \"SAR\": 12.432,\n            \"ISR\": 16.0457\n          },\n          \"instrumental\": {\n            \"SDR\": 15.089,\n            \"SIR\": 26.0163,\n            \"SAR\": 16.9656,\n            \"ISR\": 18.9324\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2912,\n            \"SIR\": 21.634,\n            \"SAR\": 9.50986,\n            \"ISR\": 
11.7108\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0294,\n            \"SIR\": 18.2526,\n            \"SAR\": 14.9823,\n            \"ISR\": 17.8543\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7488,\n            \"SIR\": 33.2106,\n            \"SAR\": 15.0597,\n            \"ISR\": 17.4611\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1127,\n            \"SIR\": 28.2339,\n            \"SAR\": 18.4692,\n            \"ISR\": 19.2661\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1442,\n            \"SIR\": 22.284,\n            \"SAR\": 11.57,\n            \"ISR\": 15.3503\n          },\n          \"instrumental\": {\n            \"SDR\": 15.196,\n            \"SIR\": 26.5929,\n            \"SAR\": 17.6819,\n            \"ISR\": 18.3769\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.38685,\n            \"SIR\": 18.5005,\n            \"SAR\": 7.50454,\n            \"ISR\": 12.8723\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0623,\n            \"SIR\": 25.4058,\n            \"SAR\": 16.9194,\n            \"ISR\": 17.551\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.054,\n            \"SIR\": 17.0651,\n            \"SAR\": 8.54909,\n            \"ISR\": 12.103\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1765,\n            \"SIR\": 24.3283,\n            \"SAR\": 20.4193,\n            \"ISR\": 18.1681\n          }\n        }\n      },\n      {\n        
\"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.55396,\n            \"SIR\": 15.8365,\n            \"SAR\": 2.44444,\n            \"ISR\": 8.43451\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9341,\n            \"SIR\": 21.3146,\n            \"SAR\": 18.3836,\n            \"ISR\": 18.5507\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.46262,\n            \"SIR\": 20.6013,\n            \"SAR\": 10.063,\n            \"ISR\": 14.745\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0998,\n            \"SIR\": 24.4914,\n            \"SAR\": 15.9391,\n            \"ISR\": 17.5833\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.47292,\n            \"SIR\": 25.2321,\n            \"SAR\": 8.76002,\n            \"ISR\": 12.701\n          },\n          \"instrumental\": {\n            \"SDR\": 15.012,\n            \"SIR\": 23.2608,\n            \"SAR\": 17.525,\n            \"ISR\": 18.8558\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.5477,\n            \"SIR\": -39.1106,\n            \"SAR\": 0.574875,\n            \"ISR\": 7.92234\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6151,\n            \"SIR\": 55.7281,\n            \"SAR\": 10.7725,\n            \"ISR\": 11.0453\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.65338,\n            \"SIR\": 16.2848,\n            \"SAR\": 6.5295,\n            \"ISR\": 9.77501\n          },\n    
      \"instrumental\": {\n            \"SDR\": 16.2414,\n            \"SIR\": 20.4103,\n            \"SAR\": 17.8276,\n            \"ISR\": 18.0219\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2847,\n            \"SIR\": 22.8679,\n            \"SAR\": 10.0144,\n            \"ISR\": 13.0653\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1428,\n            \"SIR\": 22.67,\n            \"SAR\": 17.2206,\n            \"ISR\": 18.6223\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3432,\n            \"SIR\": 21.6254,\n            \"SAR\": 11.1767,\n            \"ISR\": 15.64\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1878,\n            \"SIR\": 31.7235,\n            \"SAR\": 20.3718,\n            \"ISR\": 18.8224\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -10.6232,\n            \"SIR\": -2.82688,\n            \"SAR\": -0.54609,\n            \"ISR\": 3.24121\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3804,\n            \"SIR\": 40.0579,\n            \"SAR\": 26.7598,\n            \"ISR\": 18.7482\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.00698,\n            \"SIR\": 14.508,\n            \"SAR\": 3.74174,\n            \"ISR\": 6.30936\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6716,\n            \"SIR\": 13.9879,\n            \"SAR\": 14.4902,\n            \"ISR\": 17.1602\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.95819,\n            \"SIR\": 17.8621,\n            \"SAR\": 6.65397,\n            \"ISR\": 12.1497\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3317,\n            \"SIR\": 28.1173,\n            \"SAR\": 20.52,\n            \"ISR\": 18.8462\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.32109,\n            \"SIR\": 18.7145,\n            \"SAR\": 7.37439,\n            \"ISR\": 10.5163\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2612,\n            \"SIR\": 20.7908,\n            \"SAR\": 17.0235,\n            \"ISR\": 18.3184\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04443,\n            \"SIR\": 15.5489,\n            \"SAR\": 8.24409,\n            \"ISR\": 14.1029\n          },\n          \"instrumental\": {\n            \"SDR\": 14.967,\n            \"SIR\": 25.1598,\n            \"SAR\": 15.887,\n            \"ISR\": 16.9169\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7759,\n            \"SIR\": 22.8069,\n            \"SAR\": 11.5565,\n            \"ISR\": 14.3773\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5197,\n            \"SIR\": 21.0681,\n            \"SAR\": 13.7147,\n            \"ISR\": 17.0303\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.9586,\n            \"SIR\": 13.1811,\n            \"SAR\": 5.64439,\n            \"ISR\": 11.0181\n          },\n          
\"instrumental\": {\n            \"SDR\": 11.8985,\n            \"SIR\": 18.6698,\n            \"SAR\": 13.0894,\n            \"ISR\": 15.1756\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.28915,\n            \"SIR\": 22.2863,\n            \"SAR\": 9.89436,\n            \"ISR\": 14.4417\n          },\n          \"instrumental\": {\n            \"SDR\": 14.301,\n            \"SIR\": 22.9174,\n            \"SAR\": 16.4738,\n            \"ISR\": 17.6156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4937,\n            \"SIR\": 29.2387,\n            \"SAR\": 13.7039,\n            \"ISR\": 15.5505\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4408,\n            \"SIR\": 19.4256,\n            \"SAR\": 14.1141,\n            \"ISR\": 18.6565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0405,\n            \"SIR\": 22.8459,\n            \"SAR\": 11.1872,\n            \"ISR\": 14.972\n          },\n          \"instrumental\": {\n            \"SDR\": 14.05,\n            \"SIR\": 22.5815,\n            \"SAR\": 15.3561,\n            \"ISR\": 17.3568\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4453,\n            \"SIR\": 31.9477,\n            \"SAR\": 10.9486,\n            \"ISR\": 13.1744\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0355,\n            \"SIR\": 24.1731,\n            \"SAR\": 20.5753,\n            \"ISR\": 19.6338\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 5.08407,\n            \"SIR\": 8.46666,\n            \"SAR\": 9.3276,\n            \"ISR\": 11.4642\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6975,\n            \"SIR\": 18.1439,\n            \"SAR\": 15.5879,\n            \"ISR\": 14.6746\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3457,\n            \"SIR\": 26.4682,\n            \"SAR\": 13.1293,\n            \"ISR\": 17.8\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3587,\n            \"SIR\": 35.6293,\n            \"SAR\": 23.4796,\n            \"ISR\": 19.2539\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7189,\n            \"SIR\": 19.0807,\n            \"SAR\": 11.997,\n            \"ISR\": 17.0414\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3517,\n            \"SIR\": 23.7983,\n            \"SAR\": 11.89,\n            \"ISR\": 14.4862\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.89625,\n            \"SIR\": 21.1288,\n            \"SAR\": 7.86716,\n            \"ISR\": 12.9952\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0086,\n            \"SIR\": 21.553,\n            \"SAR\": 14.396,\n            \"ISR\": 17.5262\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.14999,\n            \"SIR\": 7.29956,\n            \"SAR\": 1.73523,\n            \"ISR\": 9.65562\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5368,\n            \"SIR\": 30.3149,\n      
      \"SAR\": 21.456,\n            \"ISR\": 17.6565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.71596,\n            \"SIR\": 24.9536,\n            \"SAR\": 10.2683,\n            \"ISR\": 15.4547\n          },\n          \"instrumental\": {\n            \"SDR\": 16.762,\n            \"SIR\": 30.2496,\n            \"SAR\": 19.9945,\n            \"ISR\": 18.9098\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7554,\n            \"SIR\": 27.5346,\n            \"SAR\": 10.7463,\n            \"ISR\": 15.0081\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6448,\n            \"SIR\": 29.5343,\n            \"SAR\": 21.0699,\n            \"ISR\": 19.1887\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7841,\n            \"SIR\": 29.4205,\n            \"SAR\": 11.9041,\n            \"ISR\": 13.1682\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9315,\n            \"SIR\": 19.1659,\n            \"SAR\": 15.4771,\n            \"ISR\": 18.6308\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.37589,\n        \"SIR\": 20.114,\n        \"SAR\": 9.70211,\n        \"ISR\": 13.1167\n      },\n      \"instrumental\": {\n        \"SDR\": 14.9895,\n        \"SIR\": 23.7599,\n        \"SAR\": 16.9425,\n        \"ISR\": 18.0907\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR_MDXNET_3_9662.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET 3\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A 
Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.42437,\n            \"SIR\": 18.0352,\n            \"SAR\": 4.82973,\n            \"ISR\": 9.64703\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4386,\n            \"SIR\": 24.5125,\n            \"SAR\": 19.4329,\n            \"ISR\": 18.7654\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.1878,\n            \"SIR\": 10.7021,\n            \"SAR\": 6.85693,\n            \"ISR\": 12.771\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7318,\n            \"SIR\": 21.7972,\n            \"SAR\": 12.39,\n            \"ISR\": 13.7819\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79669,\n            \"SIR\": 21.3539,\n            \"SAR\": 10.3349,\n            \"ISR\": 15.051\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3426,\n            \"SIR\": 25.9844,\n            \"SAR\": 17.8024,\n            \"ISR\": 18.1867\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.02459,\n            \"SIR\": 6.35295,\n            \"SAR\": 4.14652,\n            \"ISR\": 12.2428\n          },\n          \"instrumental\": {\n            \"SDR\": 11.016,\n            \"SIR\": 25.1965,\n            \"SAR\": 13.4494,\n            \"ISR\": 13.8091\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3189,\n            \"SIR\": 21.9519,\n            \"SAR\": 13.1978,\n            \"ISR\": 16.7153\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.0662,\n            \"SIR\": 24.923,\n            \"SAR\": 15.2796,\n            \"ISR\": 16.9025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.85264,\n            \"SIR\": 17.3519,\n            \"SAR\": 10.7483,\n            \"ISR\": 15.2878\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4036,\n            \"SIR\": 22.4904,\n            \"SAR\": 12.7851,\n            \"ISR\": 15.3741\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3476,\n            \"SIR\": 19.4962,\n            \"SAR\": 12.3371,\n            \"ISR\": 15.5705\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6022,\n            \"SIR\": 24.26,\n            \"SAR\": 16.721,\n            \"ISR\": 17.4832\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.74615,\n            \"SIR\": 18.4544,\n            \"SAR\": 4.71315,\n            \"ISR\": 8.64891\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9891,\n            \"SIR\": 29.0545,\n            \"SAR\": 23.6419,\n            \"ISR\": 19.3618\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9366,\n            \"SIR\": 31.2944,\n            \"SAR\": 12.575,\n            \"ISR\": 16.889\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1399,\n            \"SIR\": 27.4436,\n            \"SAR\": 16.9471,\n            \"ISR\": 18.9532\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 11.6574,\n            \"SIR\": 21.5501,\n            \"SAR\": 9.96162,\n            \"ISR\": 12.527\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1333,\n            \"SIR\": 19.1527,\n            \"SAR\": 15.0154,\n            \"ISR\": 17.7383\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9829,\n            \"SIR\": 33.2907,\n            \"SAR\": 15.0955,\n            \"ISR\": 18.1661\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0739,\n            \"SIR\": 29.1211,\n            \"SAR\": 18.3166,\n            \"ISR\": 19.2621\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3782,\n            \"SIR\": 20.6799,\n            \"SAR\": 11.6328,\n            \"ISR\": 15.896\n          },\n          \"instrumental\": {\n            \"SDR\": 15.148,\n            \"SIR\": 26.753,\n            \"SAR\": 17.1879,\n            \"ISR\": 17.9421\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.34252,\n            \"SIR\": 18.8131,\n            \"SAR\": 7.53487,\n            \"ISR\": 13.3628\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1386,\n            \"SIR\": 26.1589,\n            \"SAR\": 17.0481,\n            \"ISR\": 17.5584\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.22404,\n            \"SIR\": 14.8422,\n            \"SAR\": 7.59886,\n            \"ISR\": 12.8958\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4483,\n           
 \"SIR\": 25.2,\n            \"SAR\": 19.3591,\n            \"ISR\": 17.59\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.92619,\n            \"SIR\": 15.9804,\n            \"SAR\": 2.72476,\n            \"ISR\": 9.41203\n          },\n          \"instrumental\": {\n            \"SDR\": 15.046,\n            \"SIR\": 22.6649,\n            \"SAR\": 18.3608,\n            \"ISR\": 18.4297\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.61468,\n            \"SIR\": 20.4557,\n            \"SAR\": 10.2076,\n            \"ISR\": 15.4281\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9901,\n            \"SIR\": 25.7908,\n            \"SAR\": 15.78,\n            \"ISR\": 17.5149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.5201,\n            \"SIR\": 25.0314,\n            \"SAR\": 8.75421,\n            \"ISR\": 13.1375\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9306,\n            \"SIR\": 23.9525,\n            \"SAR\": 17.1972,\n            \"ISR\": 18.796\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -10.8954,\n            \"SIR\": -35.6572,\n            \"SAR\": 0.381915,\n            \"ISR\": 6.83652\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8081,\n            \"SIR\": 55.9319,\n            \"SAR\": 12.8086,\n            \"ISR\": 13.0739\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 6.98865,\n            \"SIR\": 15.5812,\n            \"SAR\": 6.81185,\n            \"ISR\": 10.6987\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2352,\n            \"SIR\": 21.8351,\n            \"SAR\": 17.971,\n            \"ISR\": 17.7113\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6847,\n            \"SIR\": 22.8997,\n            \"SAR\": 10.8723,\n            \"ISR\": 14.8885\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2196,\n            \"SIR\": 25.8273,\n            \"SAR\": 17.286,\n            \"ISR\": 18.5261\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6468,\n            \"SIR\": 21.6175,\n            \"SAR\": 11.4004,\n            \"ISR\": 16.8164\n          },\n          \"instrumental\": {\n            \"SDR\": 17.165,\n            \"SIR\": 32.3308,\n            \"SAR\": 20.4557,\n            \"ISR\": 18.7655\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -6.77998,\n            \"SIR\": -2.3792,\n            \"SAR\": -0.14437,\n            \"ISR\": 3.29199\n          },\n          \"instrumental\": {\n            \"SDR\": 19.5861,\n            \"SIR\": 40.1276,\n            \"SAR\": 28.5117,\n            \"ISR\": 18.7367\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.05636,\n            \"SIR\": 14.1204,\n            \"SAR\": 3.91988,\n            \"ISR\": 6.57386\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6256,\n            
\"SIR\": 14.2504,\n            \"SAR\": 14.2993,\n            \"ISR\": 16.9377\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.9036,\n            \"SIR\": 16.1735,\n            \"SAR\": 6.39386,\n            \"ISR\": 13.5548\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2797,\n            \"SIR\": 29.6737,\n            \"SAR\": 20.1652,\n            \"ISR\": 18.5442\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.44527,\n            \"SIR\": 18.378,\n            \"SAR\": 7.67087,\n            \"ISR\": 11.4873\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2948,\n            \"SIR\": 22.0472,\n            \"SAR\": 16.8807,\n            \"ISR\": 18.114\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.24201,\n            \"SIR\": 15.5535,\n            \"SAR\": 8.26726,\n            \"ISR\": 14.8712\n          },\n          \"instrumental\": {\n            \"SDR\": 14.958,\n            \"SIR\": 26.355,\n            \"SAR\": 16.1321,\n            \"ISR\": 16.9138\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7835,\n            \"SIR\": 23.1093,\n            \"SAR\": 11.5871,\n            \"ISR\": 14.8158\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5905,\n            \"SIR\": 21.3343,\n            \"SAR\": 13.7621,\n            \"ISR\": 17.0638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 5.51273,\n            \"SIR\": 11.2932,\n            \"SAR\": 5.62732,\n            \"ISR\": 12.4254\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4883,\n            \"SIR\": 20.414,\n            \"SAR\": 12.6054,\n            \"ISR\": 13.8191\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79265,\n            \"SIR\": 22.7855,\n            \"SAR\": 10.2925,\n            \"ISR\": 15.2721\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4588,\n            \"SIR\": 24.2498,\n            \"SAR\": 16.3099,\n            \"ISR\": 17.6874\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7526,\n            \"SIR\": 29.3443,\n            \"SAR\": 14.0836,\n            \"ISR\": 16.3733\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4794,\n            \"SIR\": 20.6142,\n            \"SAR\": 14.2455,\n            \"ISR\": 18.6445\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.202,\n            \"SIR\": 22.9917,\n            \"SAR\": 11.3418,\n            \"ISR\": 15.6489\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9037,\n            \"SIR\": 23.363,\n            \"SAR\": 15.2358,\n            \"ISR\": 17.3448\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4552,\n            \"SIR\": 30.8639,\n            \"SAR\": 11.3646,\n            \"ISR\": 14.0787\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0503,\n            \"SIR\": 25.1412,\n            \"SAR\": 20.8957,\n            
\"ISR\": 19.5683\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.26357,\n            \"SIR\": 11.3464,\n            \"SAR\": 4.50709,\n            \"ISR\": 7.68187\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3854,\n            \"SIR\": 15.1802,\n            \"SAR\": 13.3382,\n            \"ISR\": 17.083\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6545,\n            \"SIR\": 27.2025,\n            \"SAR\": 13.3271,\n            \"ISR\": 18.3637\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4151,\n            \"SIR\": 37.13,\n            \"SAR\": 23.6582,\n            \"ISR\": 19.3399\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6609,\n            \"SIR\": 18.8866,\n            \"SAR\": 11.9621,\n            \"ISR\": 17.7706\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3611,\n            \"SIR\": 26.058,\n            \"SAR\": 11.6222,\n            \"ISR\": 14.3553\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.0171,\n            \"SIR\": 20.6619,\n            \"SAR\": 7.98461,\n            \"ISR\": 13.5249\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0022,\n            \"SIR\": 22.9276,\n            \"SAR\": 14.4146,\n            \"ISR\": 17.2664\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.17187,\n            \"SIR\": 7.66708,\n            \"SAR\": 
2.08921,\n            \"ISR\": 9.77958\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2533,\n            \"SIR\": 30.5264,\n            \"SAR\": 20.9702,\n            \"ISR\": 17.7179\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.83565,\n            \"SIR\": 25.0138,\n            \"SAR\": 10.3837,\n            \"ISR\": 16.1516\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7805,\n            \"SIR\": 31.9866,\n            \"SAR\": 20.0389,\n            \"ISR\": 18.893\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1021,\n            \"SIR\": 27.9211,\n            \"SAR\": 11.0349,\n            \"ISR\": 16.0174\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7,\n            \"SIR\": 31.7467,\n            \"SAR\": 21.3614,\n            \"ISR\": 19.2058\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2404,\n            \"SIR\": 28.6754,\n            \"SAR\": 12.3794,\n            \"ISR\": 14.2521\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1266,\n            \"SIR\": 21.0291,\n            \"SAR\": 15.9185,\n            \"ISR\": 18.5261\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.70367,\n        \"SIR\": 19.976,\n        \"SAR\": 10.0846,\n        \"ISR\": 14.1654\n      },\n      \"instrumental\": {\n        \"SDR\": 15.002,\n        \"SIR\": 25.1688,\n        \"SAR\": 16.9139,\n        \"ISR\": 17.7281\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  
\"UVR-MDX-NET-Inst_1.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst 1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.82133,\n            \"SIR\": 18.827,\n            \"SAR\": 5.68742,\n            \"ISR\": 9.93487\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4818,\n            \"SIR\": 24.78,\n            \"SAR\": 19.7198,\n            \"ISR\": 18.7803\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.01205,\n            \"SIR\": 10.2907,\n            \"SAR\": 6.74881,\n            \"ISR\": 13.327\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7604,\n            \"SIR\": 22.6716,\n            \"SAR\": 12.6458,\n            \"ISR\": 13.4699\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3391,\n            \"SIR\": 20.8459,\n            \"SAR\": 10.9266,\n            \"ISR\": 15.4296\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5183,\n            \"SIR\": 27.185,\n            \"SAR\": 18.1243,\n            \"ISR\": 17.9021\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.40693,\n            \"SIR\": 5.84656,\n            \"SAR\": 4.74525,\n            \"ISR\": 12.9332\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1516,\n            \"SIR\": 25.5977,\n            \"SAR\": 13.5577,\n            \"ISR\": 13.456\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0761,\n            
\"SIR\": 22.3341,\n            \"SAR\": 13.3786,\n            \"ISR\": 16.2584\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5241,\n            \"SIR\": 24.5901,\n            \"SAR\": 15.7514,\n            \"ISR\": 17.0892\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0768,\n            \"SIR\": 17.8248,\n            \"SAR\": 10.9961,\n            \"ISR\": 14.9913\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9039,\n            \"SIR\": 22.2719,\n            \"SAR\": 13.4026,\n            \"ISR\": 15.5459\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6162,\n            \"SIR\": 20.5145,\n            \"SAR\": 12.7942,\n            \"ISR\": 15.097\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9626,\n            \"SIR\": 24.206,\n            \"SAR\": 17.4561,\n            \"ISR\": 17.6882\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.03948,\n            \"SIR\": 17.5927,\n            \"SAR\": 5.81181,\n            \"ISR\": 8.08107\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8406,\n            \"SIR\": 28.8281,\n            \"SAR\": 23.283,\n            \"ISR\": 19.248\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5163,\n            \"SIR\": 28.6815,\n            \"SAR\": 13.2582,\n            \"ISR\": 16.775\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2711,\n            \"SIR\": 28.1908,\n            \"SAR\": 17.7063,\n            \"ISR\": 
18.6993\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0333,\n            \"SIR\": 21.8075,\n            \"SAR\": 10.4498,\n            \"ISR\": 12.1428\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7148,\n            \"SIR\": 18.2247,\n            \"SAR\": 15.6633,\n            \"ISR\": 17.7562\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.683,\n            \"SIR\": 31.6124,\n            \"SAR\": 16.4119,\n            \"ISR\": 18.1814\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7295,\n            \"SIR\": 32.1348,\n            \"SAR\": 19.7392,\n            \"ISR\": 19.1319\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2629,\n            \"SIR\": 21.2314,\n            \"SAR\": 11.511,\n            \"ISR\": 14.6865\n          },\n          \"instrumental\": {\n            \"SDR\": 15.172,\n            \"SIR\": 24.4783,\n            \"SAR\": 17.5185,\n            \"ISR\": 18.0282\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.36114,\n            \"SIR\": 19.3901,\n            \"SAR\": 7.89158,\n            \"ISR\": 13.4168\n          },\n          \"instrumental\": {\n            \"SDR\": 15.374,\n            \"SIR\": 26.2494,\n            \"SAR\": 17.5166,\n            \"ISR\": 17.7353\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.17133,\n            \"SIR\": 17.3938,\n 
           \"SAR\": 10.5581,\n            \"ISR\": 11.822\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8198,\n            \"SIR\": 23.5432,\n            \"SAR\": 19.1527,\n            \"ISR\": 17.2943\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.45864,\n            \"SIR\": 16.7948,\n            \"SAR\": 3.45432,\n            \"ISR\": 9.16653\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0818,\n            \"SIR\": 22.2247,\n            \"SAR\": 18.6655,\n            \"ISR\": 18.5209\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.67323,\n            \"SIR\": 20.1956,\n            \"SAR\": 10.3826,\n            \"ISR\": 15.0287\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0093,\n            \"SIR\": 25.2223,\n            \"SAR\": 16.0314,\n            \"ISR\": 17.2692\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.46733,\n            \"SIR\": 14.4289,\n            \"SAR\": 8.26317,\n            \"ISR\": 13.8089\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4638,\n            \"SIR\": 24.8882,\n            \"SAR\": 16.3475,\n            \"ISR\": 17.329\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.9087,\n            \"SIR\": -36.2914,\n            \"SAR\": 0.65834,\n            \"ISR\": 11.4568\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6261,\n            \"SIR\": 57.9599,\n            \"SAR\": 13.2085,\n            \"ISR\": 13.0549\n     
     }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.95456,\n            \"SIR\": 14.2596,\n            \"SAR\": 6.75791,\n            \"ISR\": 11.4243\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7651,\n            \"SIR\": 22.4289,\n            \"SAR\": 17.9914,\n            \"ISR\": 17.1906\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9847,\n            \"SIR\": 23.6265,\n            \"SAR\": 11.0551,\n            \"ISR\": 13.7479\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5553,\n            \"SIR\": 24.1099,\n            \"SAR\": 17.5546,\n            \"ISR\": 18.5386\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87595,\n            \"SIR\": 22.2421,\n            \"SAR\": 10.56,\n            \"ISR\": 15.4275\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0812,\n            \"SIR\": 29.784,\n            \"SAR\": 20.5828,\n            \"ISR\": 18.7624\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.9397,\n            \"SIR\": -9.84927,\n            \"SAR\": -0.27485,\n            \"ISR\": 11.9883\n          },\n          \"instrumental\": {\n            \"SDR\": 19.3108,\n            \"SIR\": 42.4161,\n            \"SAR\": 30.6085,\n            \"ISR\": 19.3062\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.03531,\n            \"SIR\": 12.9645,\n   
         \"SAR\": 4.00333,\n            \"ISR\": 6.88912\n          },\n          \"instrumental\": {\n            \"SDR\": 10.565,\n            \"SIR\": 14.5892,\n            \"SAR\": 14.1943,\n            \"ISR\": 16.4317\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.1253,\n            \"SIR\": 14.9457,\n            \"SAR\": 7.07191,\n            \"ISR\": 13.9011\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3149,\n            \"SIR\": 31.5911,\n            \"SAR\": 20.4903,\n            \"ISR\": 18.4403\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.67665,\n            \"SIR\": 18.1184,\n            \"SAR\": 8.08483,\n            \"ISR\": 11.709\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5468,\n            \"SIR\": 22.421,\n            \"SAR\": 17.3646,\n            \"ISR\": 18.0213\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.76755,\n            \"SIR\": 17.0665,\n            \"SAR\": 9.24872,\n            \"ISR\": 14.4635\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4349,\n            \"SIR\": 26.0756,\n            \"SAR\": 17.3962,\n            \"ISR\": 17.2898\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4375,\n            \"SIR\": 22.9006,\n            \"SAR\": 11.8205,\n            \"ISR\": 14.441\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8928,\n            \"SIR\": 21.1601,\n            \"SAR\": 14.2726,\n            \"ISR\": 17.157\n          }\n        
}\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.74507,\n            \"SIR\": 16.9879,\n            \"SAR\": 6.50989,\n            \"ISR\": 10.7511\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9893,\n            \"SIR\": 18.3637,\n            \"SAR\": 14.0045,\n            \"ISR\": 16.4332\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.03989,\n            \"SIR\": 8.53777,\n            \"SAR\": 8.42248,\n            \"ISR\": 15.6855\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5511,\n            \"SIR\": 25.0212,\n            \"SAR\": 13.6745,\n            \"ISR\": 14.4074\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7286,\n            \"SIR\": 29.0251,\n            \"SAR\": 15.0579,\n            \"ISR\": 16.4205\n          },\n          \"instrumental\": {\n            \"SDR\": 15.021,\n            \"SIR\": 21.9776,\n            \"SAR\": 15.3902,\n            \"ISR\": 18.5802\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1669,\n            \"SIR\": 22.542,\n            \"SAR\": 11.7124,\n            \"ISR\": 15.4658\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1597,\n            \"SIR\": 24.0232,\n            \"SAR\": 15.857,\n            \"ISR\": 17.2722\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1201,\n            \"SIR\": 23.5612,\n            \"SAR\": 11.8078,\n            \"ISR\": 15.1698\n          
},\n          \"instrumental\": {\n            \"SDR\": 16.7692,\n            \"SIR\": 28.1321,\n            \"SAR\": 21.0145,\n            \"ISR\": 18.986\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.08622,\n            \"SIR\": 8.53725,\n            \"SAR\": 6.44304,\n            \"ISR\": 11.1278\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6671,\n            \"SIR\": 18.6134,\n            \"SAR\": 13.3115,\n            \"ISR\": 14.8267\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6786,\n            \"SIR\": 26.7599,\n            \"SAR\": 13.251,\n            \"ISR\": 17.786\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5981,\n            \"SIR\": 36.1545,\n            \"SAR\": 24.5599,\n            \"ISR\": 19.2087\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0023,\n            \"SIR\": 19.8247,\n            \"SAR\": 12.3262,\n            \"ISR\": 16.8419\n          },\n          \"instrumental\": {\n            \"SDR\": 11.725,\n            \"SIR\": 23.7767,\n            \"SAR\": 12.5155,\n            \"ISR\": 14.8395\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.50035,\n            \"SIR\": 21.2886,\n            \"SAR\": 8.54565,\n            \"ISR\": 13.6276\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6602,\n            \"SIR\": 22.6474,\n            \"SAR\": 15.2035,\n            \"ISR\": 17.5195\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left 
Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.21839,\n            \"SIR\": 0.91338,\n            \"SAR\": 2.08767,\n            \"ISR\": 11.102\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7142,\n            \"SIR\": 31.576,\n            \"SAR\": 21.3691,\n            \"ISR\": 17.4536\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0856,\n            \"SIR\": 25.0043,\n            \"SAR\": 10.6565,\n            \"ISR\": 15.5804\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9831,\n            \"SIR\": 30.4962,\n            \"SAR\": 20.5973,\n            \"ISR\": 18.9054\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3863,\n            \"SIR\": 27.4513,\n            \"SAR\": 11.884,\n            \"ISR\": 15.9665\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6539,\n            \"SIR\": 31.9858,\n            \"SAR\": 22.2182,\n            \"ISR\": 19.0893\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.873,\n            \"SIR\": 28.4651,\n            \"SAR\": 13.5462,\n            \"ISR\": 15.7344\n          },\n          \"instrumental\": {\n            \"SDR\": 15.303,\n            \"SIR\": 23.6916,\n            \"SAR\": 17.2506,\n            \"ISR\": 18.5393\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.22039,\n        \"SIR\": 19.6074,\n        \"SAR\": 10.4162,\n        \"ISR\": 14.1711\n      },\n      \"instrumental\": {\n        \"SDR\": 15.2216,\n        \"SIR\": 24.6851,\n        \"SAR\": 17.4261,\n        
\"ISR\": 17.7117\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_2.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.81251,\n            \"SIR\": 18.688,\n            \"SAR\": 5.91282,\n            \"ISR\": 10.5675\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6667,\n            \"SIR\": 25.3572,\n            \"SAR\": 19.8788,\n            \"ISR\": 18.9157\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.03507,\n            \"SIR\": 13.2679,\n            \"SAR\": 7.40571,\n            \"ISR\": 12.5667\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0218,\n            \"SIR\": 21.7743,\n            \"SAR\": 13.7749,\n            \"ISR\": 15.3119\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2217,\n            \"SIR\": 21.8449,\n            \"SAR\": 10.7796,\n            \"ISR\": 15.1951\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7542,\n            \"SIR\": 26.7667,\n            \"SAR\": 18.2101,\n            \"ISR\": 18.3445\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.63046,\n            \"SIR\": 6.79876,\n            \"SAR\": 4.04389,\n            \"ISR\": 12.7712\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5335,\n            \"SIR\": 25.3537,\n            \"SAR\": 13.7167,\n            \"ISR\": 14.1187\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1162,\n            \"SIR\": 23.1182,\n            \"SAR\": 13.224,\n            \"ISR\": 16.1107\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7827,\n            \"SIR\": 24.2252,\n            \"SAR\": 15.8059,\n            \"ISR\": 17.4097\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0125,\n            \"SIR\": 17.9988,\n            \"SAR\": 10.9954,\n            \"ISR\": 14.861\n          },\n          \"instrumental\": {\n            \"SDR\": 11.822,\n            \"SIR\": 21.9985,\n            \"SAR\": 13.4256,\n            \"ISR\": 15.6867\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5837,\n            \"SIR\": 20.9345,\n            \"SAR\": 12.8295,\n            \"ISR\": 14.9077\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0979,\n            \"SIR\": 24.146,\n            \"SAR\": 17.3049,\n            \"ISR\": 17.9542\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.17399,\n            \"SIR\": 18.5685,\n            \"SAR\": 5.59429,\n            \"ISR\": 8.04308\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0321,\n            \"SIR\": 28.563,\n            \"SAR\": 23.5452,\n            \"ISR\": 19.5554\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4912,\n            \"SIR\": 30.1042,\n            \"SAR\": 13.0478,\n            \"ISR\": 16.7307\n         
 },\n          \"instrumental\": {\n            \"SDR\": 15.3563,\n            \"SIR\": 28.3106,\n            \"SAR\": 17.6737,\n            \"ISR\": 19.0399\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9501,\n            \"SIR\": 23.1963,\n            \"SAR\": 9.99365,\n            \"ISR\": 11.7486\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8061,\n            \"SIR\": 17.732,\n            \"SAR\": 15.4562,\n            \"ISR\": 18.2028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7007,\n            \"SIR\": 32.2617,\n            \"SAR\": 16.3188,\n            \"ISR\": 18.1995\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7778,\n            \"SIR\": 32.0005,\n            \"SAR\": 19.7468,\n            \"ISR\": 19.3556\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3223,\n            \"SIR\": 22.0072,\n            \"SAR\": 11.7495,\n            \"ISR\": 14.9387\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3016,\n            \"SIR\": 25.4108,\n            \"SAR\": 17.6072,\n            \"ISR\": 18.3561\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.31498,\n            \"SIR\": 19.5799,\n            \"SAR\": 7.7927,\n            \"ISR\": 13.256\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3945,\n            \"SIR\": 26.171,\n            \"SAR\": 17.3545,\n            \"ISR\": 17.8872\n          }\n        }\n      },\n      {\n        
\"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.27584,\n            \"SIR\": 19.2853,\n            \"SAR\": 10.7407,\n            \"ISR\": 11.7779\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9625,\n            \"SIR\": 23.3491,\n            \"SAR\": 19.243,\n            \"ISR\": 17.7851\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.3567,\n            \"SIR\": 17.079,\n            \"SAR\": 3.28756,\n            \"ISR\": 9.18334\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9156,\n            \"SIR\": 22.2411,\n            \"SAR\": 18.2914,\n            \"ISR\": 18.7129\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.63878,\n            \"SIR\": 20.7125,\n            \"SAR\": 10.3642,\n            \"ISR\": 14.9467\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0722,\n            \"SIR\": 24.9552,\n            \"SAR\": 16.0758,\n            \"ISR\": 17.5367\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2708,\n            \"SIR\": 14.4636,\n            \"SAR\": 7.96466,\n            \"ISR\": 13.7637\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4284,\n            \"SIR\": 24.9753,\n            \"SAR\": 16.1224,\n            \"ISR\": 17.4161\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.5855,\n            \"SIR\": -35.3358,\n            \"SAR\": 0.11232,\n            \"ISR\": 11.6229\n          },\n        
  \"instrumental\": {\n            \"SDR\": 13.4063,\n            \"SIR\": 58.0105,\n            \"SAR\": 13.4108,\n            \"ISR\": 13.597\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.97203,\n            \"SIR\": 15.295,\n            \"SAR\": 6.33644,\n            \"ISR\": 10.9168\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0535,\n            \"SIR\": 22.0756,\n            \"SAR\": 18.1735,\n            \"ISR\": 17.681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9766,\n            \"SIR\": 24.6515,\n            \"SAR\": 11.6324,\n            \"ISR\": 14.9982\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8716,\n            \"SIR\": 26.0192,\n            \"SAR\": 18.0409,\n            \"ISR\": 18.8528\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3924,\n            \"SIR\": 20.9976,\n            \"SAR\": 10.6197,\n            \"ISR\": 15.6971\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3955,\n            \"SIR\": 31.3053,\n            \"SAR\": 20.5625,\n            \"ISR\": 18.7567\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.8627,\n            \"SIR\": -9.78477,\n            \"SAR\": 1.06071,\n            \"ISR\": 12.2823\n          },\n          \"instrumental\": {\n            \"SDR\": 19.1344,\n            \"SIR\": 42.4571,\n            \"SAR\": 30.3242,\n            \"ISR\": 18.8636\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.10058,\n            \"SIR\": 15.0494,\n            \"SAR\": 3.81815,\n            \"ISR\": 6.38847\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7286,\n            \"SIR\": 14.1052,\n            \"SAR\": 14.5272,\n            \"ISR\": 17.3538\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.08509,\n            \"SIR\": 15.5865,\n            \"SAR\": 6.63364,\n            \"ISR\": 12.6521\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6379,\n            \"SIR\": 28.6127,\n            \"SAR\": 20.7278,\n            \"ISR\": 18.8141\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.66062,\n            \"SIR\": 19.107,\n            \"SAR\": 7.74373,\n            \"ISR\": 11.2347\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5588,\n            \"SIR\": 21.7933,\n            \"SAR\": 17.3106,\n            \"ISR\": 18.4283\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.7153,\n            \"SIR\": 16.7471,\n            \"SAR\": 8.96247,\n            \"ISR\": 14.5521\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5425,\n            \"SIR\": 26.1372,\n            \"SAR\": 17.0841,\n            \"ISR\": 17.3688\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0092,\n            \"SIR\": 23.7219,\n            \"SAR\": 11.8801,\n            \"ISR\": 14.1161\n          },\n          
\"instrumental\": {\n            \"SDR\": 13.1992,\n            \"SIR\": 20.4075,\n            \"SAR\": 14.4947,\n            \"ISR\": 17.5465\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.6549,\n            \"SIR\": 17.0236,\n            \"SAR\": 6.30547,\n            \"ISR\": 11.0584\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9959,\n            \"SIR\": 18.7037,\n            \"SAR\": 14.0408,\n            \"ISR\": 16.4825\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.97192,\n            \"SIR\": 8.53821,\n            \"SAR\": 8.4687,\n            \"ISR\": 15.3588\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5946,\n            \"SIR\": 24.0052,\n            \"SAR\": 13.616,\n            \"ISR\": 14.4385\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6432,\n            \"SIR\": 29.9424,\n            \"SAR\": 14.5316,\n            \"ISR\": 16.0745\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0517,\n            \"SIR\": 20.8096,\n            \"SAR\": 14.995,\n            \"ISR\": 18.8446\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4802,\n            \"SIR\": 23.5743,\n            \"SAR\": 11.7385,\n            \"ISR\": 15.3427\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2596,\n            \"SIR\": 23.7544,\n            \"SAR\": 15.8844,\n            \"ISR\": 17.7275\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0342,\n            \"SIR\": 23.6754,\n            \"SAR\": 12.0753,\n            \"ISR\": 15.1376\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7604,\n            \"SIR\": 27.8391,\n            \"SAR\": 21.0371,\n            \"ISR\": 18.9115\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.28036,\n            \"SIR\": 8.85196,\n            \"SAR\": 6.18763,\n            \"ISR\": 10.6371\n          },\n          \"instrumental\": {\n            \"SDR\": 11.626,\n            \"SIR\": 18.5579,\n            \"SAR\": 12.7936,\n            \"ISR\": 15.3765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7333,\n            \"SIR\": 27.92,\n            \"SAR\": 13.3968,\n            \"ISR\": 17.6507\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8562,\n            \"SIR\": 35.5962,\n            \"SAR\": 24.6942,\n            \"ISR\": 19.5919\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9815,\n            \"SIR\": 20.5084,\n            \"SAR\": 12.2548,\n            \"ISR\": 16.6395\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8528,\n            \"SIR\": 23.1744,\n            \"SAR\": 12.4432,\n            \"ISR\": 15.2815\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.27707,\n            \"SIR\": 22.7964,\n            \"SAR\": 8.51968,\n            \"ISR\": 13.3434\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8097,\n            
\"SIR\": 22.3462,\n            \"SAR\": 15.1678,\n            \"ISR\": 17.992\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.89108,\n            \"SIR\": 1.22211,\n            \"SAR\": 2.09401,\n            \"ISR\": 10.5277\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9056,\n            \"SIR\": 31.19,\n            \"SAR\": 21.7378,\n            \"ISR\": 17.8806\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.88317,\n            \"SIR\": 25.9374,\n            \"SAR\": 10.5835,\n            \"ISR\": 15.707\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0924,\n            \"SIR\": 30.6882,\n            \"SAR\": 20.5221,\n            \"ISR\": 19.1085\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.397,\n            \"SIR\": 29.8071,\n            \"SAR\": 11.8379,\n            \"ISR\": 15.666\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9739,\n            \"SIR\": 31.2668,\n            \"SAR\": 22.1716,\n            \"ISR\": 19.4989\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0323,\n            \"SIR\": 26.8458,\n            \"SAR\": 13.6038,\n            \"ISR\": 16.1593\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3098,\n            \"SIR\": 24.8406,\n            \"SAR\": 16.8251,\n            \"ISR\": 18.4072\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.17704,\n        \"SIR\": 20.0442,\n        
\"SAR\": 10.1789,\n        \"ISR\": 14.3341\n      },\n      \"instrumental\": {\n        \"SDR\": 15.3057,\n        \"SIR\": 24.9653,\n        \"SAR\": 17.3077,\n        \"ISR\": 17.9731\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_3.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst 3\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.77412,\n            \"SIR\": 16.6923,\n            \"SAR\": 5.69646,\n            \"ISR\": 9.88816\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5002,\n            \"SIR\": 24.725,\n            \"SAR\": 19.8214,\n            \"ISR\": 18.6737\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.6432,\n            \"SIR\": 12.2482,\n            \"SAR\": 7.11848,\n            \"ISR\": 12.8001\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7117,\n            \"SIR\": 22.0246,\n            \"SAR\": 13.5846,\n            \"ISR\": 14.7927\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1246,\n            \"SIR\": 21.0051,\n            \"SAR\": 10.7916,\n            \"ISR\": 15.3012\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6591,\n            \"SIR\": 26.9529,\n            \"SAR\": 18.25,\n            \"ISR\": 18.066\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.35652,\n            \"SIR\": 5.33454,\n            \"SAR\": 4.18202,\n            \"ISR\": 13.0928\n          },\n          
\"instrumental\": {\n            \"SDR\": 10.9739,\n            \"SIR\": 26.0468,\n            \"SAR\": 13.1927,\n            \"ISR\": 13.1829\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1438,\n            \"SIR\": 23.0382,\n            \"SAR\": 13.1487,\n            \"ISR\": 15.6307\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6786,\n            \"SIR\": 23.0214,\n            \"SAR\": 15.7438,\n            \"ISR\": 17.3644\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0864,\n            \"SIR\": 17.9835,\n            \"SAR\": 11.0516,\n            \"ISR\": 14.8473\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9327,\n            \"SIR\": 22.0759,\n            \"SAR\": 13.5035,\n            \"ISR\": 15.6389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6754,\n            \"SIR\": 20.3464,\n            \"SAR\": 12.8819,\n            \"ISR\": 15.0967\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0212,\n            \"SIR\": 24.3924,\n            \"SAR\": 17.3601,\n            \"ISR\": 17.749\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.2107,\n            \"SIR\": 18.2829,\n            \"SAR\": 5.40131,\n            \"ISR\": 8.26437\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9332,\n            \"SIR\": 28.4715,\n            \"SAR\": 23.4861,\n            \"ISR\": 19.3926\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n   
     \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4826,\n            \"SIR\": 29.5294,\n            \"SAR\": 13.2369,\n            \"ISR\": 16.8931\n          },\n          \"instrumental\": {\n            \"SDR\": 15.443,\n            \"SIR\": 28.6977,\n            \"SAR\": 17.6316,\n            \"ISR\": 18.8707\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0404,\n            \"SIR\": 22.1815,\n            \"SAR\": 10.4648,\n            \"ISR\": 12.0875\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7184,\n            \"SIR\": 17.6872,\n            \"SAR\": 15.6322,\n            \"ISR\": 17.8432\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7543,\n            \"SIR\": 31.5931,\n            \"SAR\": 16.372,\n            \"ISR\": 18.2426\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5709,\n            \"SIR\": 32.1309,\n            \"SAR\": 19.5579,\n            \"ISR\": 19.1107\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.382,\n            \"SIR\": 21.2525,\n            \"SAR\": 11.7638,\n            \"ISR\": 15.0526\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3147,\n            \"SIR\": 25.443,\n            \"SAR\": 17.736,\n            \"ISR\": 18.0958\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.5724,\n            \"SIR\": 19.3888,\n            \"SAR\": 7.79543,\n            \"ISR\": 13.1371\n          },\n          \"instrumental\": {\n  
          \"SDR\": 15.4097,\n            \"SIR\": 25.8265,\n            \"SAR\": 17.4231,\n            \"ISR\": 17.8031\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.73592,\n            \"SIR\": 17.3382,\n            \"SAR\": 10.7469,\n            \"ISR\": 11.9492\n          },\n          \"instrumental\": {\n            \"SDR\": 15.915,\n            \"SIR\": 23.6542,\n            \"SAR\": 19.4073,\n            \"ISR\": 17.4486\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.42874,\n            \"SIR\": 16.5161,\n            \"SAR\": 3.26707,\n            \"ISR\": 9.28924\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2586,\n            \"SIR\": 22.4019,\n            \"SAR\": 18.6682,\n            \"ISR\": 18.5654\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.73493,\n            \"SIR\": 20.3035,\n            \"SAR\": 10.3893,\n            \"ISR\": 15.0234\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1035,\n            \"SIR\": 25.2992,\n            \"SAR\": 16.2181,\n            \"ISR\": 17.4163\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.27787,\n            \"SIR\": 14.3533,\n            \"SAR\": 8.00501,\n            \"ISR\": 13.6838\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4864,\n            \"SIR\": 24.9136,\n            \"SAR\": 16.275,\n            \"ISR\": 17.3378\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": -16.264,\n            \"SIR\": -35.0388,\n            \"SAR\": 0.564215,\n            \"ISR\": 11.1744\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7218,\n            \"SIR\": 57.8283,\n            \"SAR\": 14.3418,\n            \"ISR\": 13.9544\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.0136,\n            \"SIR\": 13.9806,\n            \"SAR\": 6.93255,\n            \"ISR\": 11.3927\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9361,\n            \"SIR\": 22.7253,\n            \"SAR\": 18.1929,\n            \"ISR\": 17.1884\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9558,\n            \"SIR\": 23.735,\n            \"SAR\": 11.5231,\n            \"ISR\": 14.8201\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6384,\n            \"SIR\": 25.9013,\n            \"SAR\": 18.0755,\n            \"ISR\": 18.6616\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7621,\n            \"SIR\": 21.5123,\n            \"SAR\": 11.5013,\n            \"ISR\": 15.7124\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2562,\n            \"SIR\": 30.9834,\n            \"SAR\": 20.9351,\n            \"ISR\": 18.767\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.9306,\n            \"SIR\": -9.79013,\n            \"SAR\": -0.79942,\n            \"ISR\": 11.6277\n          },\n          \"instrumental\": {\n           
 \"SDR\": 19.4271,\n            \"SIR\": 41.7745,\n            \"SAR\": 29.8428,\n            \"ISR\": 19.1896\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.00125,\n            \"SIR\": 13.5107,\n            \"SAR\": 3.90179,\n            \"ISR\": 6.47348\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5122,\n            \"SIR\": 14.1438,\n            \"SAR\": 14.2941,\n            \"ISR\": 16.7864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.38178,\n            \"SIR\": 15.6241,\n            \"SAR\": 6.94665,\n            \"ISR\": 12.8522\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3635,\n            \"SIR\": 28.9353,\n            \"SAR\": 20.5629,\n            \"ISR\": 18.6211\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.70257,\n            \"SIR\": 18.1048,\n            \"SAR\": 8.07648,\n            \"ISR\": 11.6728\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5529,\n            \"SIR\": 22.4592,\n            \"SAR\": 17.4272,\n            \"ISR\": 18.1521\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.40232,\n            \"SIR\": 16.5069,\n            \"SAR\": 8.71018,\n            \"ISR\": 14.3452\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4161,\n            \"SIR\": 25.7865,\n            \"SAR\": 17.1866,\n            \"ISR\": 17.2058\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n        
  \"vocals\": {\n            \"SDR\": 11.428,\n            \"SIR\": 23.3453,\n            \"SAR\": 11.8697,\n            \"ISR\": 14.1061\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9237,\n            \"SIR\": 20.452,\n            \"SAR\": 14.3812,\n            \"ISR\": 17.3982\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.51638,\n            \"SIR\": 15.9289,\n            \"SAR\": 6.18442,\n            \"ISR\": 10.6903\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7455,\n            \"SIR\": 18.0861,\n            \"SAR\": 13.8035,\n            \"ISR\": 15.9961\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.15579,\n            \"SIR\": 8.54681,\n            \"SAR\": 8.55723,\n            \"ISR\": 15.4401\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5769,\n            \"SIR\": 24.2447,\n            \"SAR\": 13.784,\n            \"ISR\": 14.4954\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6892,\n            \"SIR\": 29.3146,\n            \"SAR\": 14.955,\n            \"ISR\": 16.2879\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9736,\n            \"SIR\": 21.9336,\n            \"SAR\": 15.411,\n            \"ISR\": 18.6156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1469,\n            \"SIR\": 22.6559,\n            \"SAR\": 11.5175,\n            \"ISR\": 15.5156\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1849,\n            \"SIR\": 
24.3213,\n            \"SAR\": 15.8428,\n            \"ISR\": 17.369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2579,\n            \"SIR\": 23.7354,\n            \"SAR\": 12.0039,\n            \"ISR\": 15.6007\n          },\n          \"instrumental\": {\n            \"SDR\": 16.936,\n            \"SIR\": 28.954,\n            \"SAR\": 21.1182,\n            \"ISR\": 19.1507\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.33279,\n            \"SIR\": 9.47367,\n            \"SAR\": 6.30509,\n            \"ISR\": 10.6033\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9254,\n            \"SIR\": 18.8455,\n            \"SAR\": 13.2567,\n            \"ISR\": 15.6338\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5684,\n            \"SIR\": 27.0175,\n            \"SAR\": 13.4392,\n            \"ISR\": 17.6309\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7265,\n            \"SIR\": 35.5074,\n            \"SAR\": 24.6931,\n            \"ISR\": 19.4144\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.879,\n            \"SIR\": 20.8992,\n            \"SAR\": 12.2585,\n            \"ISR\": 16.2141\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6991,\n            \"SIR\": 21.9277,\n            \"SAR\": 12.6751,\n            \"ISR\": 15.4842\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.21066,\n            \"SIR\": 
21.9736,\n            \"SAR\": 8.32494,\n            \"ISR\": 13.4856\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8897,\n            \"SIR\": 22.3811,\n            \"SAR\": 15.2035,\n            \"ISR\": 17.7277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.27717,\n            \"SIR\": 1.10583,\n            \"SAR\": 1.99922,\n            \"ISR\": 10.8617\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8766,\n            \"SIR\": 31.135,\n            \"SAR\": 21.5214,\n            \"ISR\": 17.7908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1355,\n            \"SIR\": 24.331,\n            \"SAR\": 10.7942,\n            \"ISR\": 15.6704\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8695,\n            \"SIR\": 30.7303,\n            \"SAR\": 20.4932,\n            \"ISR\": 18.7683\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4206,\n            \"SIR\": 28.1421,\n            \"SAR\": 12.0984,\n            \"ISR\": 15.8227\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9059,\n            \"SIR\": 31.6985,\n            \"SAR\": 22.2949,\n            \"ISR\": 19.344\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8826,\n            \"SIR\": 28.5669,\n            \"SAR\": 13.7229,\n            \"ISR\": 15.697\n          },\n          \"instrumental\": {\n            \"SDR\": 15.32,\n            \"SIR\": 23.6641,\n            \"SAR\": 17.2697,\n            \"ISR\": 
18.6702\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.23542,\n        \"SIR\": 19.8462,\n        \"SAR\": 10.4271,\n        \"ISR\": 14.2256\n      },\n      \"instrumental\": {\n        \"SDR\": 15.3174,\n        \"SIR\": 24.8193,\n        \"SAR\": 17.3916,\n        \"ISR\": 17.797\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR_MDXNET_KARA.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Karaoke\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.5289,\n            \"SIR\": 11.2152,\n            \"SAR\": 2.35093,\n            \"ISR\": 5.35324\n          },\n          \"instrumental\": {\n            \"SDR\": 15.417,\n            \"SIR\": 19.1696,\n            \"SAR\": 18.0835,\n            \"ISR\": 18.3171\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0564,\n            \"SIR\": 20.8422,\n            \"SAR\": 3.04063,\n            \"ISR\": 5.94235\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0574,\n            \"SIR\": 12.3998,\n            \"SAR\": 13.613,\n            \"ISR\": 18.6982\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42771,\n            \"SIR\": 21.4445,\n            \"SAR\": 7.27498,\n            \"ISR\": 9.48009\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2172,\n            \"SIR\": 17.0912,\n            \"SAR\": 15.8817,\n            \"ISR\": 18.4665\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": -0.2417,\n            \"SIR\": 4.58352,\n            \"SAR\": 0.58045,\n            \"ISR\": 7.36285\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8566,\n            \"SIR\": 17.7441,\n            \"SAR\": 12.9671,\n            \"ISR\": 14.3204\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.05704,\n            \"SIR\": 25.0258,\n            \"SAR\": 6.52112,\n            \"ISR\": 8.94029\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7104,\n            \"SIR\": 12.4674,\n            \"SAR\": 11.6818,\n            \"ISR\": 18.4151\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.22703,\n            \"SIR\": 16.9934,\n            \"SAR\": 3.32207,\n            \"ISR\": 5.83474\n          },\n          \"instrumental\": {\n            \"SDR\": 6.8695,\n            \"SIR\": 8.37493,\n            \"SAR\": 10.1326,\n            \"ISR\": 16.744\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00129,\n            \"SIR\": 19.8825,\n            \"SAR\": -0.00767,\n            \"ISR\": 0.28846\n          },\n          \"instrumental\": {\n            \"SDR\": 5.40708,\n            \"SIR\": 4.76193,\n            \"SAR\": 26.5519,\n            \"ISR\": 19.7353\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.11186,\n            \"SIR\": 18.2667,\n            \"SAR\": 1.48356,\n            \"ISR\": 5.73909\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6823,\n            \"SIR\": 23.9893,\n            
\"SAR\": 22.9554,\n            \"ISR\": 19.4904\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.797,\n            \"SIR\": 31.2002,\n            \"SAR\": 12.566,\n            \"ISR\": 15.543\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0166,\n            \"SIR\": 24.901,\n            \"SAR\": 17.0767,\n            \"ISR\": 18.9879\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9809,\n            \"SIR\": 26.5403,\n            \"SAR\": 5.89875,\n            \"ISR\": 8.22557\n          },\n          \"instrumental\": {\n            \"SDR\": 14.935,\n            \"SIR\": 12.1093,\n            \"SAR\": 12.8441,\n            \"ISR\": 18.8993\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5143,\n            \"SIR\": 32.4218,\n            \"SAR\": 15.1878,\n            \"ISR\": 16.9864\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0623,\n            \"SIR\": 27.3009,\n            \"SAR\": 18.555,\n            \"ISR\": 19.2452\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.85284,\n            \"SIR\": 24.3374,\n            \"SAR\": 8.20191,\n            \"ISR\": 10.2457\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7617,\n            \"SIR\": 16.2322,\n            \"SAR\": 15.6337,\n            \"ISR\": 18.9877\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 0.90552,\n            \"SIR\": 20.4224,\n            \"SAR\": 1.03737,\n            \"ISR\": 4.32479\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8752,\n            \"SIR\": 13.851,\n            \"SAR\": 14.6467,\n            \"ISR\": 18.7664\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.50783,\n            \"SIR\": 20.7911,\n            \"SAR\": 5.41322,\n            \"ISR\": 8.66734\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8127,\n            \"SIR\": 21.8516,\n            \"SAR\": 18.7515,\n            \"ISR\": 19.1797\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.65378,\n            \"SIR\": 17.9674,\n            \"SAR\": 1.90169,\n            \"ISR\": 7.14682\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7536,\n            \"SIR\": 19.723,\n            \"SAR\": 18.36,\n            \"ISR\": 18.8948\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.67484,\n            \"SIR\": 20.7702,\n            \"SAR\": 6.83407,\n            \"ISR\": 8.76928\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3704,\n            \"SIR\": 15.2497,\n            \"SAR\": 14.0335,\n            \"ISR\": 18.155\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.22372,\n            \"SIR\": 25.288,\n            \"SAR\": 5.07355,\n            \"ISR\": 8.02649\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8656,\n            \"SIR\": 16.5137,\n            \"SAR\": 
15.2239,\n            \"ISR\": 19.1731\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.45822,\n            \"SIR\": -21.5837,\n            \"SAR\": 0.136965,\n            \"ISR\": 2.6042\n          },\n          \"instrumental\": {\n            \"SDR\": 19.9325,\n            \"SIR\": 49.8401,\n            \"SAR\": 31.6203,\n            \"ISR\": 18.7976\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.01196,\n            \"SIR\": 19.1251,\n            \"SAR\": -0.41595,\n            \"ISR\": 1.09576\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8854,\n            \"SIR\": 10.4612,\n            \"SAR\": 22.2658,\n            \"ISR\": 19.704\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.09551,\n            \"SIR\": 24.5233,\n            \"SAR\": 7.80483,\n            \"ISR\": 9.33324\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9395,\n            \"SIR\": 16.8294,\n            \"SAR\": 15.563,\n            \"ISR\": 18.9361\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.5556,\n            \"SIR\": 23.7006,\n            \"SAR\": 7.94631,\n            \"ISR\": 10.6868\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3188,\n            \"SIR\": 21.3961,\n            \"SAR\": 19.2505,\n            \"ISR\": 19.2721\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": -7.62319,\n            \"SIR\": 1.91057,\n            \"SAR\": -0.56588,\n            \"ISR\": 3.09224\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7168,\n            \"SIR\": 38.9092,\n            \"SAR\": 30.8064,\n            \"ISR\": 19.2763\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.54567,\n            \"SIR\": 15.3412,\n            \"SAR\": 2.60552,\n            \"ISR\": 5.09136\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5507,\n            \"SIR\": 12.6737,\n            \"SAR\": 14.7341,\n            \"ISR\": 17.7351\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.002335,\n            \"SIR\": 19.0673,\n            \"SAR\": 0.491215,\n            \"ISR\": 3.59767\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4957,\n            \"SIR\": 16.2419,\n            \"SAR\": 17.5179,\n            \"ISR\": 19.493\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.58333,\n            \"SIR\": 20.5853,\n            \"SAR\": 5.18569,\n            \"ISR\": 7.65773\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8154,\n            \"SIR\": 16.7797,\n            \"SAR\": 16.1842,\n            \"ISR\": 18.8164\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.52231,\n            \"SIR\": 20.8251,\n            \"SAR\": 5.04768,\n            \"ISR\": 7.58749\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8273,\n            \"SIR\": 17.189,\n            \"SAR\": 
15.6916,\n            \"ISR\": 18.6621\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.31775,\n            \"SIR\": 25.9117,\n            \"SAR\": 8.49792,\n            \"ISR\": 9.32783\n          },\n          \"instrumental\": {\n            \"SDR\": 9.74533,\n            \"SIR\": 12.3206,\n            \"SAR\": 11.7556,\n            \"ISR\": 18.3058\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.33631,\n            \"SIR\": 22.5779,\n            \"SAR\": 2.94582,\n            \"ISR\": 5.59719\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7145,\n            \"SIR\": 11.6574,\n            \"SAR\": 13.528,\n            \"ISR\": 18.4936\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.65104,\n            \"SIR\": 21.2336,\n            \"SAR\": 4.28864,\n            \"ISR\": 7.10642\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2511,\n            \"SIR\": 13.8117,\n            \"SAR\": 14.0231,\n            \"ISR\": 18.0858\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1986,\n            \"SIR\": 28.509,\n            \"SAR\": 10.741,\n            \"ISR\": 12.354\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6971,\n            \"SIR\": 15.8368,\n            \"SAR\": 12.7692,\n            \"ISR\": 18.6517\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.57878,\n            \"SIR\": 
24.4588,\n            \"SAR\": 6.62173,\n            \"ISR\": 8.29279\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4588,\n            \"SIR\": 13.0202,\n            \"SAR\": 12.5844,\n            \"ISR\": 18.3553\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.0198,\n            \"SIR\": 32.5431,\n            \"SAR\": 0.6603,\n            \"ISR\": 3.93775\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6696,\n            \"SIR\": 13.3786,\n            \"SAR\": 15.1539,\n            \"ISR\": 19.7558\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00014,\n            \"SIR\": -20.6212,\n            \"SAR\": -0.45702,\n            \"ISR\": 0.01726\n          },\n          \"instrumental\": {\n            \"SDR\": 6.57204,\n            \"SIR\": 5.8997,\n            \"SAR\": 35.7573,\n            \"ISR\": 19.3374\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.41611,\n            \"SIR\": 20.7799,\n            \"SAR\": 9.15099,\n            \"ISR\": 13.4741\n          },\n          \"instrumental\": {\n            \"SDR\": 17.591,\n            \"SIR\": 25.573,\n            \"SAR\": 20.5776,\n            \"ISR\": 18.8046\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.85247,\n            \"SIR\": 20.7196,\n            \"SAR\": 8.5456,\n            \"ISR\": 11.3854\n          },\n          \"instrumental\": {\n            \"SDR\": 9.95266,\n            \"SIR\": 14.2309,\n            \"SAR\": 10.9413,\n            \"ISR\": 15.8568\n          }\n        }\n      },\n      
{\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.64485,\n            \"SIR\": 25.3723,\n            \"SAR\": 4.9917,\n            \"ISR\": 9.26667\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7043,\n            \"SIR\": 16.7264,\n            \"SAR\": 13.8338,\n            \"ISR\": 18.5124\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.47455,\n            \"SIR\": 7.92686,\n            \"SAR\": 2.24785,\n            \"ISR\": 9.32628\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4367,\n            \"SIR\": 29.5998,\n            \"SAR\": 21.6625,\n            \"ISR\": 17.9467\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.55973,\n            \"SIR\": 24.4062,\n            \"SAR\": 9.26289,\n            \"ISR\": 13.9007\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7302,\n            \"SIR\": 26.6979,\n            \"SAR\": 20.0271,\n            \"ISR\": 19.0294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.51993,\n            \"SIR\": 28.8807,\n            \"SAR\": 5.99657,\n            \"ISR\": 9.43659\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3484,\n            \"SIR\": 19.858,\n            \"SAR\": 18.2318,\n            \"ISR\": 19.3998\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1749,\n            \"SIR\": 31.3892,\n            \"SAR\": 11.1233,\n            \"ISR\": 
11.9124\n          },\n          \"instrumental\": {\n            \"SDR\": 13.652,\n            \"SIR\": 17.0774,\n            \"SAR\": 15.6846,\n            \"ISR\": 19.0312\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 5.58549,\n        \"SIR\": 20.8336,\n        \"SAR\": 5.06062,\n        \"ISR\": 8.12603\n      },\n      \"instrumental\": {\n        \"SDR\": 14.0784,\n        \"SIR\": 16.62,\n        \"SAR\": 15.6881,\n        \"ISR\": 18.8105\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR_MDXNET_KARA_2.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Karaoke 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.00209,\n            \"SIR\": 14.6178,\n            \"SAR\": 4.02018,\n            \"ISR\": 6.89909\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0738,\n            \"SIR\": 21.7738,\n            \"SAR\": 19.382,\n            \"ISR\": 18.5912\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.61701,\n            \"SIR\": 15.9976,\n            \"SAR\": 3.58878,\n            \"ISR\": 7.10318\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1291,\n            \"SIR\": 13.1536,\n            \"SAR\": 12.9738,\n            \"ISR\": 17.3359\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.4403,\n            \"SIR\": 20.9256,\n            \"SAR\": 6.77748,\n            \"ISR\": 9.65391\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4604,\n            \"SIR\": 17.2931,\n            \"SAR\": 15.5827,\n  
          \"ISR\": 18.4917\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.05866,\n            \"SIR\": 6.64083,\n            \"SAR\": 0.42683,\n            \"ISR\": 8.25522\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1908,\n            \"SIR\": 18.6576,\n            \"SAR\": 13.2026,\n            \"ISR\": 15.1118\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.74512,\n            \"SIR\": 23.034,\n            \"SAR\": 6.60947,\n            \"ISR\": 9.47989\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0425,\n            \"SIR\": 13.0028,\n            \"SAR\": 11.5975,\n            \"ISR\": 18.0243\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.85386,\n            \"SIR\": 17.1679,\n            \"SAR\": 3.48227,\n            \"ISR\": 5.91749\n          },\n          \"instrumental\": {\n            \"SDR\": 6.70028,\n            \"SIR\": 8.54596,\n            \"SAR\": 10.0753,\n            \"ISR\": 16.7695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.23268,\n            \"SIR\": 9.92286,\n            \"SAR\": -0.03788,\n            \"ISR\": 0.49938\n          },\n          \"instrumental\": {\n            \"SDR\": 5.60307,\n            \"SIR\": 5.0142,\n            \"SAR\": 25.1527,\n            \"ISR\": 19.4889\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.13846,\n            \"SIR\": 15.7682,\n            \"SAR\": 
0.2034,\n            \"ISR\": 5.63615\n          },\n          \"instrumental\": {\n            \"SDR\": 17.601,\n            \"SIR\": 23.7892,\n            \"SAR\": 21.2591,\n            \"ISR\": 19.6766\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1373,\n            \"SIR\": 27.2753,\n            \"SAR\": 12.886,\n            \"ISR\": 16.4542\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0802,\n            \"SIR\": 27.6426,\n            \"SAR\": 17.2078,\n            \"ISR\": 18.7498\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0084,\n            \"SIR\": 22.9223,\n            \"SAR\": 5.92398,\n            \"ISR\": 8.46033\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1456,\n            \"SIR\": 12.3467,\n            \"SAR\": 12.6568,\n            \"ISR\": 18.3492\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.328,\n            \"SIR\": 27.8906,\n            \"SAR\": 16.108,\n            \"ISR\": 18.0169\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2058,\n            \"SIR\": 31.5902,\n            \"SAR\": 19.2708,\n            \"ISR\": 18.7522\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.32362,\n            \"SIR\": 21.663,\n            \"SAR\": 7.618,\n            \"ISR\": 10.1983\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5594,\n            \"SIR\": 15.9231,\n            \"SAR\": 15.037,\n            \"ISR\": 18.656\n         
 }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.17281,\n            \"SIR\": 19.7962,\n            \"SAR\": 2.65205,\n            \"ISR\": 5.91568\n          },\n          \"instrumental\": {\n            \"SDR\": 15.314,\n            \"SIR\": 15.6859,\n            \"SAR\": 14.8395,\n            \"ISR\": 18.6209\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.37859,\n            \"SIR\": 22.4801,\n            \"SAR\": 7.87591,\n            \"ISR\": 8.1133\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1582,\n            \"SIR\": 19.3483,\n            \"SAR\": 17.7539,\n            \"ISR\": 18.6317\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.80189,\n            \"SIR\": 17.9656,\n            \"SAR\": 2.43705,\n            \"ISR\": 7.29403\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1775,\n            \"SIR\": 20.1025,\n            \"SAR\": 18.9576,\n            \"ISR\": 19.0825\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.71807,\n            \"SIR\": 11.2673,\n            \"SAR\": 4.22871,\n            \"ISR\": 8.25058\n          },\n          \"instrumental\": {\n            \"SDR\": 10.511,\n            \"SIR\": 14.375,\n            \"SAR\": 11.6415,\n            \"ISR\": 14.8821\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.41812,\n            \"SIR\": 11.8352,\n            \"SAR\": 3.96709,\n           
 \"ISR\": 8.75844\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4304,\n            \"SIR\": 17.374,\n            \"SAR\": 14.3986,\n            \"ISR\": 17.683\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -23.1605,\n            \"SIR\": -28.9331,\n            \"SAR\": 3.25109,\n            \"ISR\": 3.44229\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6102,\n            \"SIR\": 51.8458,\n            \"SAR\": 26.2953,\n            \"ISR\": 17.5994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.02973,\n            \"SIR\": 8.93427,\n            \"SAR\": -2.39192,\n            \"ISR\": 1.21133\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4176,\n            \"SIR\": 10.5108,\n            \"SAR\": 21.0039,\n            \"ISR\": 19.1387\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.55331,\n            \"SIR\": 22.6488,\n            \"SAR\": 8.06089,\n            \"ISR\": 10.1118\n          },\n          \"instrumental\": {\n            \"SDR\": 14.292,\n            \"SIR\": 17.3233,\n            \"SAR\": 15.6319,\n            \"ISR\": 19.0169\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.55592,\n            \"SIR\": 20.5024,\n            \"SAR\": 7.03764,\n            \"ISR\": 11.308\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0765,\n            \"SIR\": 22.1445,\n            \"SAR\": 18.6733,\n            \"ISR\": 19.0655\n          }\n        }\n      },\n      
{\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.3401,\n            \"SIR\": -10.285,\n            \"SAR\": 2.02631,\n            \"ISR\": 13.1445\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5384,\n            \"SIR\": 44.2884,\n            \"SAR\": 27.7559,\n            \"ISR\": 18.5052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.60041,\n            \"SIR\": 13.9902,\n            \"SAR\": 2.79028,\n            \"ISR\": 5.53478\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5275,\n            \"SIR\": 13.1438,\n            \"SAR\": 14.2331,\n            \"ISR\": 17.2593\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.06378,\n            \"SIR\": 14.0861,\n            \"SAR\": 2.22556,\n            \"ISR\": 3.74632\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6304,\n            \"SIR\": 17.1904,\n            \"SAR\": 17.7502,\n            \"ISR\": 18.9147\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.90428,\n            \"SIR\": 18.6378,\n            \"SAR\": 5.25325,\n            \"ISR\": 7.95948\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9243,\n            \"SIR\": 17.2284,\n            \"SAR\": 16.0703,\n            \"ISR\": 18.7996\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.9228,\n            \"SIR\": 20.4127,\n            \"SAR\": 5.21735,\n            
\"ISR\": 8.13422\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1717,\n            \"SIR\": 17.8859,\n            \"SAR\": 15.7567,\n            \"ISR\": 18.7401\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.53657,\n            \"SIR\": 24.2046,\n            \"SAR\": 8.55696,\n            \"ISR\": 9.67778\n          },\n          \"instrumental\": {\n            \"SDR\": 9.88331,\n            \"SIR\": 12.7892,\n            \"SAR\": 11.6531,\n            \"ISR\": 18.2938\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.02634,\n            \"SIR\": 18.2885,\n            \"SAR\": 3.03615,\n            \"ISR\": 6.154\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1282,\n            \"SIR\": 12.3596,\n            \"SAR\": 13.3898,\n            \"ISR\": 17.6122\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.97955,\n            \"SIR\": 3.029,\n            \"SAR\": 3.86532,\n            \"ISR\": 7.06799\n          },\n          \"instrumental\": {\n            \"SDR\": 9.17006,\n            \"SIR\": 12.6082,\n            \"SAR\": 9.8683,\n            \"ISR\": 13.2213\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.895,\n            \"SIR\": 25.8397,\n            \"SAR\": 12.9456,\n            \"ISR\": 14.951\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4513,\n            \"SIR\": 18.7673,\n            \"SAR\": 13.6884,\n            \"ISR\": 17.4373\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.73761,\n            \"SIR\": 22.2739,\n            \"SAR\": 6.93508,\n            \"ISR\": 8.94096\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3558,\n            \"SIR\": 14.0053,\n            \"SAR\": 12.5062,\n            \"ISR\": 18.0781\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.24876,\n            \"SIR\": 15.1884,\n            \"SAR\": -0.06719,\n            \"ISR\": 3.41222\n          },\n          \"instrumental\": {\n            \"SDR\": 15.939,\n            \"SIR\": 12.4749,\n            \"SAR\": 14.844,\n            \"ISR\": 19.4028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.4164,\n            \"SIR\": 2.98322,\n            \"SAR\": 2.62964,\n            \"ISR\": 0.79752\n          },\n          \"instrumental\": {\n            \"SDR\": 7.02398,\n            \"SIR\": 6.94421,\n            \"SAR\": 21.624,\n            \"ISR\": 16.9634\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.0089,\n            \"SIR\": 19.5793,\n            \"SAR\": 9.74325,\n            \"ISR\": 15.2366\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0604,\n            \"SIR\": 28.919,\n            \"SAR\": 21.2845,\n            \"ISR\": 18.9835\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.71089,\n            \"SIR\": 19.7459,\n            \"SAR\": 9.98765,\n            \"ISR\": 13.4251\n          },\n          \"instrumental\": {\n            \"SDR\": 
10.6064,\n            \"SIR\": 17.4769,\n            \"SAR\": 11.4112,\n            \"ISR\": 15.0596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.44735,\n            \"SIR\": 23.9016,\n            \"SAR\": 6.00637,\n            \"ISR\": 10.073\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0476,\n            \"SIR\": 18.3528,\n            \"SAR\": 14.0672,\n            \"ISR\": 18.5943\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.28806,\n            \"SIR\": 1.58512,\n            \"SAR\": 2.15331,\n            \"ISR\": 10.1362\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2467,\n            \"SIR\": 30.7199,\n            \"SAR\": 21.4521,\n            \"ISR\": 17.9494\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.06921,\n            \"SIR\": 21.0888,\n            \"SAR\": 9.64567,\n            \"ISR\": 14.8451\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4757,\n            \"SIR\": 28.7963,\n            \"SAR\": 20.1996,\n            \"ISR\": 18.3311\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.71127,\n            \"SIR\": 28.61,\n            \"SAR\": 7.06729,\n            \"ISR\": 10.4639\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9043,\n            \"SIR\": 21.4157,\n            \"SAR\": 19.0342,\n            \"ISR\": 19.7535\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n   
       \"vocals\": {\n            \"SDR\": 12.6962,\n            \"SIR\": 26.9905,\n            \"SAR\": 12.4498,\n            \"ISR\": 14.6145\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0861,\n            \"SIR\": 20.8174,\n            \"SAR\": 16.1717,\n            \"ISR\": 19.0106\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 5.43274,\n        \"SIR\": 19.1086,\n        \"SAR\": 4.72303,\n        \"ISR\": 8.35778\n      },\n      \"instrumental\": {\n        \"SDR\": 14.7703,\n        \"SIR\": 17.3486,\n        \"SAR\": 15.6943,\n        \"ISR\": 18.5482\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR_MDXNET_9482.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR_MDXNET_9482\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.70393,\n            \"SIR\": 15.5526,\n            \"SAR\": 4.81548,\n            \"ISR\": 9.38948\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1586,\n            \"SIR\": 23.716,\n            \"SAR\": 19.3851,\n            \"ISR\": 18.5291\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.07234,\n            \"SIR\": 11.5188,\n            \"SAR\": 6.46524,\n            \"ISR\": 12.3028\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0337,\n            \"SIR\": 21.0694,\n            \"SAR\": 12.6453,\n            \"ISR\": 14.3454\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.57444,\n            \"SIR\": 20.8463,\n            \"SAR\": 10.1759,\n            \"ISR\": 14.1441\n  
        },\n          \"instrumental\": {\n            \"SDR\": 15.3294,\n            \"SIR\": 24.6644,\n            \"SAR\": 17.8505,\n            \"ISR\": 18.0982\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.61298,\n            \"SIR\": 6.81609,\n            \"SAR\": 4.26762,\n            \"ISR\": 11.5903\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5345,\n            \"SIR\": 24.1792,\n            \"SAR\": 13.814,\n            \"ISR\": 14.2408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7806,\n            \"SIR\": 22.7088,\n            \"SAR\": 12.4303,\n            \"ISR\": 14.4563\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9472,\n            \"SIR\": 20.8437,\n            \"SAR\": 14.8862,\n            \"ISR\": 17.3182\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.61124,\n            \"SIR\": 17.1145,\n            \"SAR\": 10.6343,\n            \"ISR\": 14.5098\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3793,\n            \"SIR\": 21.2875,\n            \"SAR\": 12.9182,\n            \"ISR\": 15.3201\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0505,\n            \"SIR\": 19.4148,\n            \"SAR\": 12.0564,\n            \"ISR\": 14.782\n          },\n          \"instrumental\": {\n            \"SDR\": 14.514,\n            \"SIR\": 22.9755,\n            \"SAR\": 16.5918,\n            \"ISR\": 17.5083\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - 
Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.81049,\n            \"SIR\": 18.6439,\n            \"SAR\": 5.42254,\n            \"ISR\": 7.35925\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9928,\n            \"SIR\": 27.498,\n            \"SAR\": 23.8777,\n            \"ISR\": 19.4397\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.593,\n            \"SIR\": 30.4032,\n            \"SAR\": 12.274,\n            \"ISR\": 15.7601\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9079,\n            \"SIR\": 25.407,\n            \"SAR\": 16.8949,\n            \"ISR\": 18.8801\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0795,\n            \"SIR\": 22.2326,\n            \"SAR\": 9.17702,\n            \"ISR\": 11.1506\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9824,\n            \"SIR\": 17.1881,\n            \"SAR\": 14.7807,\n            \"ISR\": 18.0436\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7058,\n            \"SIR\": 31.0123,\n            \"SAR\": 15.1995,\n            \"ISR\": 17.2948\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9551,\n            \"SIR\": 28.0224,\n            \"SAR\": 18.3023,\n            \"ISR\": 19.094\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.075,\n            \"SIR\": 20.6309,\n            \"SAR\": 11.4508,\n            \"ISR\": 14.7976\n          },\n          
\"instrumental\": {\n            \"SDR\": 15.1607,\n            \"SIR\": 25.3332,\n            \"SAR\": 17.4097,\n            \"ISR\": 18.0006\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.27385,\n            \"SIR\": 19.1073,\n            \"SAR\": 7.71731,\n            \"ISR\": 12.466\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2578,\n            \"SIR\": 24.758,\n            \"SAR\": 17.3261,\n            \"ISR\": 17.7504\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.07183,\n            \"SIR\": 15.6675,\n            \"SAR\": 8.38538,\n            \"ISR\": 12.2443\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9784,\n            \"SIR\": 24.0664,\n            \"SAR\": 19.6213,\n            \"ISR\": 17.8618\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.85222,\n            \"SIR\": 16.1512,\n            \"SAR\": 2.51033,\n            \"ISR\": 8.36815\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9263,\n            \"SIR\": 21.5584,\n            \"SAR\": 18.2906,\n            \"ISR\": 18.5949\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.34075,\n            \"SIR\": 20.4862,\n            \"SAR\": 10.0069,\n            \"ISR\": 14.4871\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0295,\n            \"SIR\": 24.0439,\n            \"SAR\": 15.9236,\n            \"ISR\": 17.6072\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want 
Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.03683,\n            \"SIR\": 25.3606,\n            \"SAR\": 8.27512,\n            \"ISR\": 11.7679\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5411,\n            \"SIR\": 21.6575,\n            \"SAR\": 17.0623,\n            \"ISR\": 18.9216\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -10.3898,\n            \"SIR\": -34.4487,\n            \"SAR\": 0.35722,\n            \"ISR\": 5.99467\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8776,\n            \"SIR\": 55.2309,\n            \"SAR\": 13.3314,\n            \"ISR\": 13.8653\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.84294,\n            \"SIR\": 15.1574,\n            \"SAR\": 6.63296,\n            \"ISR\": 10.3164\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1842,\n            \"SIR\": 21.1175,\n            \"SAR\": 17.9219,\n            \"ISR\": 17.6173\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2859,\n            \"SIR\": 22.9345,\n            \"SAR\": 10.2,\n            \"ISR\": 13.6116\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1128,\n            \"SIR\": 23.2923,\n            \"SAR\": 17.0267,\n            \"ISR\": 18.5958\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2154,\n            \"SIR\": 21.7924,\n            \"SAR\": 10.8697,\n            \"ISR\": 15.1649\n          },\n          \"instrumental\": 
{\n            \"SDR\": 17.2812,\n            \"SIR\": 30.4496,\n            \"SAR\": 20.6088,\n            \"ISR\": 18.8924\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -6.82274,\n            \"SIR\": -0.63483,\n            \"SAR\": -0.25581,\n            \"ISR\": 3.20113\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7076,\n            \"SIR\": 39.9579,\n            \"SAR\": 29.8587,\n            \"ISR\": 18.9095\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.90095,\n            \"SIR\": 14.4138,\n            \"SAR\": 3.6033,\n            \"ISR\": 6.06761\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5705,\n            \"SIR\": 13.7069,\n            \"SAR\": 14.531,\n            \"ISR\": 17.2314\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.33652,\n            \"SIR\": 16.4959,\n            \"SAR\": 5.90402,\n            \"ISR\": 12.0358\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0529,\n            \"SIR\": 28.1791,\n            \"SAR\": 19.4823,\n            \"ISR\": 18.6711\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.28505,\n            \"SIR\": 18.7056,\n            \"SAR\": 7.40894,\n            \"ISR\": 10.6887\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2056,\n            \"SIR\": 20.9492,\n            \"SAR\": 17.0286,\n            \"ISR\": 18.3318\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy 
Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.95278,\n            \"SIR\": 15.3467,\n            \"SAR\": 8.00745,\n            \"ISR\": 13.8601\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0814,\n            \"SIR\": 24.6767,\n            \"SAR\": 16.214,\n            \"ISR\": 17.0421\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0594,\n            \"SIR\": 23.5381,\n            \"SAR\": 11.2134,\n            \"ISR\": 13.524\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5717,\n            \"SIR\": 19.4589,\n            \"SAR\": 13.7668,\n            \"ISR\": 17.2927\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.19798,\n            \"SIR\": 13.9414,\n            \"SAR\": 5.74761,\n            \"ISR\": 10.9749\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4426,\n            \"SIR\": 18.7032,\n            \"SAR\": 13.481,\n            \"ISR\": 15.7538\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.22553,\n            \"SIR\": 21.7759,\n            \"SAR\": 9.85403,\n            \"ISR\": 14.1877\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3882,\n            \"SIR\": 22.6655,\n            \"SAR\": 16.4587,\n            \"ISR\": 17.5904\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9983,\n            \"SIR\": 28.7298,\n            \"SAR\": 13.6195,\n            \"ISR\": 15.1819\n          },\n          \"instrumental\": {\n            
\"SDR\": 14.3429,\n            \"SIR\": 19.4762,\n            \"SAR\": 14.2158,\n            \"ISR\": 18.6116\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6462,\n            \"SIR\": 23.0367,\n            \"SAR\": 11.1154,\n            \"ISR\": 14.5782\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8868,\n            \"SIR\": 22.0221,\n            \"SAR\": 15.4119,\n            \"ISR\": 17.4377\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.80325,\n            \"SIR\": 30.5478,\n            \"SAR\": 10.7027,\n            \"ISR\": 12.4313\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9112,\n            \"SIR\": 23.2796,\n            \"SAR\": 20.2875,\n            \"ISR\": 19.5943\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.50833,\n            \"SIR\": 9.98863,\n            \"SAR\": 6.62481,\n            \"ISR\": 9.32092\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1524,\n            \"SIR\": 16.4919,\n            \"SAR\": 13.8134,\n            \"ISR\": 16.0344\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8572,\n            \"SIR\": 25.6005,\n            \"SAR\": 12.9112,\n            \"ISR\": 17.098\n          },\n          \"instrumental\": {\n            \"SDR\": 18.313,\n            \"SIR\": 33.4432,\n            \"SAR\": 23.3386,\n            \"ISR\": 19.2252\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
10.6872,\n            \"SIR\": 19.2425,\n            \"SAR\": 11.8296,\n            \"ISR\": 16.353\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2338,\n            \"SIR\": 22.4637,\n            \"SAR\": 11.8464,\n            \"ISR\": 14.5815\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.66394,\n            \"SIR\": 22.5477,\n            \"SAR\": 7.65051,\n            \"ISR\": 12.2621\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1064,\n            \"SIR\": 20.8688,\n            \"SAR\": 14.572,\n            \"ISR\": 17.8358\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.79162,\n            \"SIR\": 7.26558,\n            \"SAR\": 1.97903,\n            \"ISR\": 9.30672\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4457,\n            \"SIR\": 29.8962,\n            \"SAR\": 21.4895,\n            \"ISR\": 17.7156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.64099,\n            \"SIR\": 24.9404,\n            \"SAR\": 10.27,\n            \"ISR\": 15.2421\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7311,\n            \"SIR\": 29.6512,\n            \"SAR\": 20.1109,\n            \"ISR\": 18.9025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5596,\n            \"SIR\": 27.3402,\n            \"SAR\": 10.8957,\n            \"ISR\": 14.6915\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6392,\n            \"SIR\": 29.0337,\n            
\"SAR\": 21.2428,\n            \"ISR\": 19.1831\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8846,\n            \"SIR\": 28.7445,\n            \"SAR\": 11.8919,\n            \"ISR\": 13.1481\n          },\n          \"instrumental\": {\n            \"SDR\": 13.93,\n            \"SIR\": 19.2918,\n            \"SAR\": 15.7187,\n            \"ISR\": 18.5865\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.28314,\n        \"SIR\": 19.9505,\n        \"SAR\": 9.51553,\n        \"ISR\": 12.8071\n      },\n      \"instrumental\": {\n        \"SDR\": 14.9171,\n        \"SIR\": 23.2859,\n        \"SAR\": 16.9608,\n        \"ISR\": 17.9312\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET-Voc_FT.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Voc FT\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.85247,\n            \"SIR\": 18.6162,\n            \"SAR\": 5.85353,\n            \"ISR\": 9.82122\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5346,\n            \"SIR\": 24.6446,\n            \"SAR\": 19.9257,\n            \"ISR\": 18.9024\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.96594,\n            \"SIR\": 17.5106,\n            \"SAR\": 8.00501,\n            \"ISR\": 12.1838\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2981,\n            \"SIR\": 21.2883,\n            \"SAR\": 15.182,\n            \"ISR\": 17.0256\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy 
Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6445,\n            \"SIR\": 23.8935,\n            \"SAR\": 11.2081,\n            \"ISR\": 15.0075\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0164,\n            \"SIR\": 26.5424,\n            \"SAR\": 18.7357,\n            \"ISR\": 18.5959\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.58071,\n            \"SIR\": 6.48431,\n            \"SAR\": 4.27237,\n            \"ISR\": 12.6095\n          },\n          \"instrumental\": {\n            \"SDR\": 11.17,\n            \"SIR\": 26.3101,\n            \"SAR\": 13.6593,\n            \"ISR\": 13.8797\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8469,\n            \"SIR\": 23.7856,\n            \"SAR\": 13.9115,\n            \"ISR\": 16.3771\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8464,\n            \"SIR\": 24.8091,\n            \"SAR\": 16.2189,\n            \"ISR\": 17.499\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.269,\n            \"SIR\": 18.9392,\n            \"SAR\": 11.0698,\n            \"ISR\": 14.838\n          },\n          \"instrumental\": {\n            \"SDR\": 12.134,\n            \"SIR\": 22.0301,\n            \"SAR\": 13.578,\n            \"ISR\": 16.1432\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7642,\n            \"SIR\": 21.2362,\n            \"SAR\": 12.8601,\n            \"ISR\": 15.3088\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1891,\n    
        \"SIR\": 24.2813,\n            \"SAR\": 17.2975,\n            \"ISR\": 18.0099\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.70382,\n            \"SIR\": 20.1665,\n            \"SAR\": 5.35759,\n            \"ISR\": 9.07049\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2045,\n            \"SIR\": 29.2438,\n            \"SAR\": 24.3075,\n            \"ISR\": 19.4682\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8063,\n            \"SIR\": 33.684,\n            \"SAR\": 13.7955,\n            \"ISR\": 16.5452\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7797,\n            \"SIR\": 26.942,\n            \"SAR\": 18.1452,\n            \"ISR\": 19.2421\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6163,\n            \"SIR\": 23.9791,\n            \"SAR\": 10.8092,\n            \"ISR\": 12.2521\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2699,\n            \"SIR\": 18.8515,\n            \"SAR\": 16.0674,\n            \"ISR\": 18.3353\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7031,\n            \"SIR\": 36.3939,\n            \"SAR\": 16.9067,\n            \"ISR\": 18.0324\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8544,\n            \"SIR\": 29.9331,\n            \"SAR\": 20.0558,\n            \"ISR\": 19.479\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n    
      \"vocals\": {\n            \"SDR\": 11.7748,\n            \"SIR\": 22.6663,\n            \"SAR\": 12.1794,\n            \"ISR\": 15.498\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7036,\n            \"SIR\": 26.7618,\n            \"SAR\": 17.9489,\n            \"ISR\": 18.3876\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79256,\n            \"SIR\": 19.3248,\n            \"SAR\": 8.42329,\n            \"ISR\": 13.4326\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3574,\n            \"SIR\": 26.2987,\n            \"SAR\": 17.5672,\n            \"ISR\": 17.6025\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.02421,\n            \"SIR\": 18.1668,\n            \"SAR\": 9.81481,\n            \"ISR\": 13.2326\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9897,\n            \"SIR\": 25.1985,\n            \"SAR\": 22.189,\n            \"ISR\": 18.4285\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.75123,\n            \"SIR\": 16.3072,\n            \"SAR\": 3.78565,\n            \"ISR\": 9.39513\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9584,\n            \"SIR\": 22.6741,\n            \"SAR\": 18.184,\n            \"ISR\": 18.5465\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0409,\n            \"SIR\": 21.9385,\n            \"SAR\": 10.8427,\n            \"ISR\": 15.0522\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7408,\n            \"SIR\": 
25.2352,\n            \"SAR\": 16.7092,\n            \"ISR\": 18.0604\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.16098,\n            \"SIR\": 26.3273,\n            \"SAR\": 9.517,\n            \"ISR\": 13.7031\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4281,\n            \"SIR\": 24.9875,\n            \"SAR\": 17.9108,\n            \"ISR\": 18.9762\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -5.16032,\n            \"SIR\": -33.5111,\n            \"SAR\": 0.18526,\n            \"ISR\": 10.7655\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3143,\n            \"SIR\": 57.3696,\n            \"SAR\": 15.3099,\n            \"ISR\": 15.1747\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.40511,\n            \"SIR\": 16.6976,\n            \"SAR\": 7.30403,\n            \"ISR\": 10.9417\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4242,\n            \"SIR\": 22.1563,\n            \"SAR\": 18.7711,\n            \"ISR\": 18.0463\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0178,\n            \"SIR\": 24.8346,\n            \"SAR\": 11.2068,\n            \"ISR\": 14.8051\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6918,\n            \"SIR\": 25.6284,\n            \"SAR\": 17.9008,\n            \"ISR\": 18.8346\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n  
          \"SDR\": 11.1381,\n            \"SIR\": 24.3847,\n            \"SAR\": 12.0468,\n            \"ISR\": 15.701\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7625,\n            \"SIR\": 32.2718,\n            \"SAR\": 22.2098,\n            \"ISR\": 19.159\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.33647,\n            \"SIR\": 1.85708,\n            \"SAR\": -0.10501,\n            \"ISR\": 3.53402\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7916,\n            \"SIR\": 40.1614,\n            \"SAR\": 33.0704,\n            \"ISR\": 19.3178\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.17661,\n            \"SIR\": 14.7964,\n            \"SAR\": 4.01796,\n            \"ISR\": 6.64208\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8177,\n            \"SIR\": 14.4024,\n            \"SAR\": 14.6059,\n            \"ISR\": 17.2214\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.78965,\n            \"SIR\": 17.8963,\n            \"SAR\": 7.29069,\n            \"ISR\": 12.9973\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6829,\n            \"SIR\": 29.2335,\n            \"SAR\": 20.8738,\n            \"ISR\": 18.886\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.76178,\n            \"SIR\": 19.1443,\n            \"SAR\": 8.14042,\n            \"ISR\": 11.734\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6389,\n            \"SIR\": 22.6177,\n    
        \"SAR\": 17.3795,\n            \"ISR\": 18.3685\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.80396,\n            \"SIR\": 18.0042,\n            \"SAR\": 9.31483,\n            \"ISR\": 14.4906\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6034,\n            \"SIR\": 26.18,\n            \"SAR\": 17.5161,\n            \"ISR\": 17.5853\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4557,\n            \"SIR\": 25.8265,\n            \"SAR\": 12.1438,\n            \"ISR\": 14.2183\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6163,\n            \"SIR\": 20.5352,\n            \"SAR\": 14.8004,\n            \"ISR\": 17.8439\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.47755,\n            \"SIR\": 13.8203,\n            \"SAR\": 6.51348,\n            \"ISR\": 11.9656\n          },\n          \"instrumental\": {\n            \"SDR\": 12.312,\n            \"SIR\": 20.0225,\n            \"SAR\": 13.7489,\n            \"ISR\": 15.0913\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7897,\n            \"SIR\": 25.4714,\n            \"SAR\": 11.2195,\n            \"ISR\": 15.6618\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8927,\n            \"SIR\": 25.869,\n            \"SAR\": 17.2579,\n            \"ISR\": 17.8019\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
15.0109,\n            \"SIR\": 31.465,\n            \"SAR\": 14.6727,\n            \"ISR\": 15.9151\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4396,\n            \"SIR\": 20.6849,\n            \"SAR\": 15.0187,\n            \"ISR\": 18.9824\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3948,\n            \"SIR\": 25.0985,\n            \"SAR\": 12.1072,\n            \"ISR\": 15.4371\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5233,\n            \"SIR\": 23.886,\n            \"SAR\": 16.0168,\n            \"ISR\": 17.9571\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0623,\n            \"SIR\": 29.4646,\n            \"SAR\": 12.3561,\n            \"ISR\": 14.13\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6522,\n            \"SIR\": 25.6851,\n            \"SAR\": 21.8034,\n            \"ISR\": 19.5396\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.12546,\n            \"SIR\": 10.2078,\n            \"SAR\": 5.63472,\n            \"ISR\": 8.10285\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0255,\n            \"SIR\": 15.7903,\n            \"SAR\": 13.9319,\n            \"ISR\": 16.1243\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2019,\n            \"SIR\": 26.8411,\n            \"SAR\": 14.4558,\n            \"ISR\": 17.7664\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7103,\n            \"SIR\": 36.0611,\n            \"SAR\": 24.9146,\n            \"ISR\": 19.3984\n          }\n      
  }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6305,\n            \"SIR\": 21.4772,\n            \"SAR\": 11.8067,\n            \"ISR\": 15.6618\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8029,\n            \"SIR\": 21.4342,\n            \"SAR\": 12.4832,\n            \"ISR\": 15.7957\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.59622,\n            \"SIR\": 21.5607,\n            \"SAR\": 8.76544,\n            \"ISR\": 13.634\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0218,\n            \"SIR\": 22.9615,\n            \"SAR\": 15.4146,\n            \"ISR\": 17.6536\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.58659,\n            \"SIR\": 5.70285,\n            \"SAR\": 2.34938,\n            \"ISR\": 11.1776\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2975,\n            \"SIR\": 31.1511,\n            \"SAR\": 21.33,\n            \"ISR\": 17.9352\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2741,\n            \"SIR\": 25.9903,\n            \"SAR\": 10.9464,\n            \"ISR\": 15.5489\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1617,\n            \"SIR\": 30.4592,\n            \"SAR\": 20.6779,\n            \"ISR\": 19.0455\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9382,\n            \"SIR\": 30.5499,\n            \"SAR\": 12.7034,\n  
          \"ISR\": 15.9186\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1179,\n            \"SIR\": 32.4492,\n            \"SAR\": 22.9616,\n            \"ISR\": 19.4414\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1645,\n            \"SIR\": 30.7798,\n            \"SAR\": 14.2288,\n            \"ISR\": 14.6103\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2234,\n            \"SIR\": 21.7717,\n            \"SAR\": 17.5695,\n            \"ISR\": 18.9104\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1549,\n        \"SIR\": 21.519,\n        \"SAR\": 10.826,\n        \"ISR\": 14.1742\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4338,\n        \"SIR\": 25.2169,\n        \"SAR\": 17.5684,\n        \"ISR\": 18.3519\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"Kim_Vocal_1.onnx\": {\n    \"model_name\": \"MDX-Net Model: Kim Vocal 1\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.7787,\n            \"SIR\": 19.3037,\n            \"SAR\": 5.83443,\n            \"ISR\": 9.62516\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6193,\n            \"SIR\": 24.5137,\n            \"SAR\": 19.9734,\n            \"ISR\": 18.9958\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.72532,\n            \"SIR\": 16.6197,\n            \"SAR\": 7.91073,\n            \"ISR\": 11.9734\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1701,\n            \"SIR\": 20.8569,\n            \"SAR\": 
15.0532,\n            \"ISR\": 16.7519\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5123,\n            \"SIR\": 24.3538,\n            \"SAR\": 11.0592,\n            \"ISR\": 14.8854\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9999,\n            \"SIR\": 26.1584,\n            \"SAR\": 18.6098,\n            \"ISR\": 18.6795\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.3479,\n            \"SIR\": 6.26056,\n            \"SAR\": 4.23228,\n            \"ISR\": 12.5585\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2788,\n            \"SIR\": 26.2322,\n            \"SAR\": 13.6273,\n            \"ISR\": 13.7472\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8776,\n            \"SIR\": 24.3981,\n            \"SAR\": 13.4949,\n            \"ISR\": 15.7198\n          },\n          \"instrumental\": {\n            \"SDR\": 14.863,\n            \"SIR\": 23.2041,\n            \"SAR\": 15.9946,\n            \"ISR\": 17.7149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2076,\n            \"SIR\": 19.1108,\n            \"SAR\": 11.0353,\n            \"ISR\": 14.8802\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1313,\n            \"SIR\": 22.1319,\n            \"SAR\": 13.5355,\n            \"ISR\": 16.1979\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8044,\n            \"SIR\": 21.2777,\n            
\"SAR\": 12.7373,\n            \"ISR\": 15.3939\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1199,\n            \"SIR\": 24.2453,\n            \"SAR\": 17.1488,\n            \"ISR\": 18.0283\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.53386,\n            \"SIR\": 20.4466,\n            \"SAR\": 5.48217,\n            \"ISR\": 9.04914\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1696,\n            \"SIR\": 28.9068,\n            \"SAR\": 24.204,\n            \"ISR\": 19.4861\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7225,\n            \"SIR\": 34.019,\n            \"SAR\": 13.7318,\n            \"ISR\": 16.4829\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7506,\n            \"SIR\": 26.8203,\n            \"SAR\": 17.9955,\n            \"ISR\": 19.2422\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6082,\n            \"SIR\": 24.6136,\n            \"SAR\": 10.4236,\n            \"ISR\": 11.8828\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2434,\n            \"SIR\": 18.1343,\n            \"SAR\": 15.5907,\n            \"ISR\": 18.4772\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7653,\n            \"SIR\": 37.0024,\n            \"SAR\": 16.72,\n            \"ISR\": 18.0918\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8061,\n            \"SIR\": 30.0302,\n            \"SAR\": 19.9635,\n            \"ISR\": 19.5142\n          }\n    
    }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6722,\n            \"SIR\": 23.7029,\n            \"SAR\": 12.1194,\n            \"ISR\": 15.2511\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7233,\n            \"SIR\": 26.1193,\n            \"SAR\": 17.9673,\n            \"ISR\": 18.61\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.81356,\n            \"SIR\": 19.7831,\n            \"SAR\": 8.39421,\n            \"ISR\": 13.3889\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4745,\n            \"SIR\": 26.3038,\n            \"SAR\": 17.7676,\n            \"ISR\": 17.8354\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.8218,\n            \"SIR\": 18.7611,\n            \"SAR\": 9.515,\n            \"ISR\": 13.2065\n          },\n          \"instrumental\": {\n            \"SDR\": 18.211,\n            \"SIR\": 25.1118,\n            \"SAR\": 22.173,\n            \"ISR\": 18.5333\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.88415,\n            \"SIR\": 18.0369,\n            \"SAR\": 4.06765,\n            \"ISR\": 9.15582\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1837,\n            \"SIR\": 22.4004,\n            \"SAR\": 18.4696,\n            \"ISR\": 18.7867\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0277,\n            \"SIR\": 21.9748,\n            \"SAR\": 10.7622,\n          
  \"ISR\": 15.1361\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7132,\n            \"SIR\": 25.3496,\n            \"SAR\": 16.6553,\n            \"ISR\": 18.0576\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.12554,\n            \"SIR\": 26.566,\n            \"SAR\": 9.46116,\n            \"ISR\": 13.6395\n          },\n          \"instrumental\": {\n            \"SDR\": 15.484,\n            \"SIR\": 24.8508,\n            \"SAR\": 18.0095,\n            \"ISR\": 18.9962\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -5.90419,\n            \"SIR\": -33.1553,\n            \"SAR\": 0.318885,\n            \"ISR\": 9.98123\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6152,\n            \"SIR\": 56.8769,\n            \"SAR\": 15.6243,\n            \"ISR\": 15.3254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.34323,\n            \"SIR\": 16.9046,\n            \"SAR\": 7.19502,\n            \"ISR\": 10.8001\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4855,\n            \"SIR\": 21.9075,\n            \"SAR\": 18.4947,\n            \"ISR\": 18.1059\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0283,\n            \"SIR\": 25.1683,\n            \"SAR\": 11.1416,\n            \"ISR\": 14.676\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5838,\n            \"SIR\": 25.7431,\n            \"SAR\": 17.7343,\n            \"ISR\": 18.8768\n          }\n        }\n      },\n     
 {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8575,\n            \"SIR\": 24.4099,\n            \"SAR\": 11.7502,\n            \"ISR\": 15.632\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6651,\n            \"SIR\": 32.0456,\n            \"SAR\": 21.6783,\n            \"ISR\": 19.1703\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.62992,\n            \"SIR\": 1.47867,\n            \"SAR\": -0.12885,\n            \"ISR\": 3.53572\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7668,\n            \"SIR\": 40.5453,\n            \"SAR\": 32.6216,\n            \"ISR\": 19.293\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.12934,\n            \"SIR\": 14.6188,\n            \"SAR\": 3.96225,\n            \"ISR\": 6.55847\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8107,\n            \"SIR\": 14.2916,\n            \"SAR\": 14.66,\n            \"ISR\": 17.1994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.48564,\n            \"SIR\": 17.1307,\n            \"SAR\": 6.91522,\n            \"ISR\": 13.4063\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6587,\n            \"SIR\": 29.5261,\n            \"SAR\": 20.6854,\n            \"ISR\": 18.7757\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.68942,\n            \"SIR\": 19.4542,\n            \"SAR\": 8.09418,\n            \"ISR\": 
11.6124\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6184,\n            \"SIR\": 22.4699,\n            \"SAR\": 17.2804,\n            \"ISR\": 18.4301\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.6663,\n            \"SIR\": 18.6032,\n            \"SAR\": 9.33868,\n            \"ISR\": 14.3162\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5574,\n            \"SIR\": 25.9722,\n            \"SAR\": 17.2718,\n            \"ISR\": 17.7623\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3979,\n            \"SIR\": 26.3154,\n            \"SAR\": 11.9998,\n            \"ISR\": 14.2104\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6197,\n            \"SIR\": 20.5099,\n            \"SAR\": 14.7268,\n            \"ISR\": 17.9731\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.72467,\n            \"SIR\": 16.3565,\n            \"SAR\": 6.43715,\n            \"ISR\": 11.2926\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8279,\n            \"SIR\": 19.1357,\n            \"SAR\": 14.1054,\n            \"ISR\": 16.1796\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.682,\n            \"SIR\": 25.6488,\n            \"SAR\": 11.2064,\n            \"ISR\": 15.7281\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9126,\n            \"SIR\": 25.8945,\n            \"SAR\": 17.1637,\n            \"ISR\": 17.7615\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.0262,\n            \"SIR\": 31.8946,\n            \"SAR\": 14.5439,\n            \"ISR\": 15.885\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3924,\n            \"SIR\": 20.5674,\n            \"SAR\": 14.8726,\n            \"ISR\": 19.0238\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4785,\n            \"SIR\": 25.2074,\n            \"SAR\": 12.1125,\n            \"ISR\": 15.435\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5186,\n            \"SIR\": 23.9035,\n            \"SAR\": 16.0137,\n            \"ISR\": 17.9791\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8562,\n            \"SIR\": 29.5296,\n            \"SAR\": 12.0968,\n            \"ISR\": 14.2107\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5601,\n            \"SIR\": 25.6694,\n            \"SAR\": 21.7747,\n            \"ISR\": 19.5271\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.12053,\n            \"SIR\": 9.82821,\n            \"SAR\": 6.35754,\n            \"ISR\": 9.48596\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9955,\n            \"SIR\": 16.6271,\n            \"SAR\": 13.7444,\n            \"ISR\": 15.8035\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1288,\n            \"SIR\": 26.5795,\n            \"SAR\": 14.2956,\n            \"ISR\": 17.8476\n          },\n          \"instrumental\": {\n            \"SDR\": 
18.7061,\n            \"SIR\": 36.3053,\n            \"SAR\": 24.6828,\n            \"ISR\": 19.3872\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1856,\n            \"SIR\": 22.2052,\n            \"SAR\": 11.3637,\n            \"ISR\": 14.8685\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4526,\n            \"SIR\": 19.9352,\n            \"SAR\": 12.3851,\n            \"ISR\": 16.0649\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.3438,\n            \"SIR\": 21.7652,\n            \"SAR\": 8.52415,\n            \"ISR\": 13.4414\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8409,\n            \"SIR\": 22.5935,\n            \"SAR\": 15.243,\n            \"ISR\": 17.7323\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.63183,\n            \"SIR\": 5.93662,\n            \"SAR\": 2.39649,\n            \"ISR\": 11.1103\n          },\n          \"instrumental\": {\n            \"SDR\": 17.329,\n            \"SIR\": 30.7894,\n            \"SAR\": 21.3316,\n            \"ISR\": 17.9891\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2666,\n            \"SIR\": 26.4786,\n            \"SAR\": 10.8872,\n            \"ISR\": 15.5374\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1934,\n            \"SIR\": 30.4887,\n            \"SAR\": 20.5983,\n            \"ISR\": 19.0959\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 11.9033,\n            \"SIR\": 30.7985,\n            \"SAR\": 12.6202,\n            \"ISR\": 16.0502\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1294,\n            \"SIR\": 32.5599,\n            \"SAR\": 22.7484,\n            \"ISR\": 19.452\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0683,\n            \"SIR\": 30.8377,\n            \"SAR\": 14.0395,\n            \"ISR\": 14.3655\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9496,\n            \"SIR\": 21.0722,\n            \"SAR\": 17.4033,\n            \"ISR\": 18.921\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1067,\n        \"SIR\": 21.87,\n        \"SAR\": 10.5929,\n        \"ISR\": 14.2105\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4793,\n        \"SIR\": 25.2307,\n        \"SAR\": 17.5688,\n        \"ISR\": 18.4536\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"Kim_Vocal_2.onnx\": {\n    \"model_name\": \"MDX-Net Model: Kim Vocal 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.78748,\n            \"SIR\": 18.1983,\n            \"SAR\": 5.91704,\n            \"ISR\": 10.1435\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5766,\n            \"SIR\": 24.9934,\n            \"SAR\": 19.8726,\n            \"ISR\": 18.7919\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.03279,\n            \"SIR\": 17.5348,\n            \"SAR\": 8.07367,\n            \"ISR\": 12.4061\n      
    },\n          \"instrumental\": {\n            \"SDR\": 13.2508,\n            \"SIR\": 21.5817,\n            \"SAR\": 15.168,\n            \"ISR\": 16.9983\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6233,\n            \"SIR\": 23.2434,\n            \"SAR\": 11.1794,\n            \"ISR\": 15.2452\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9744,\n            \"SIR\": 26.8562,\n            \"SAR\": 18.6375,\n            \"ISR\": 18.4786\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.30705,\n            \"SIR\": 6.44751,\n            \"SAR\": 4.3895,\n            \"ISR\": 12.6803\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2415,\n            \"SIR\": 26.6465,\n            \"SAR\": 13.586,\n            \"ISR\": 13.8498\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8263,\n            \"SIR\": 23.5513,\n            \"SAR\": 14.1046,\n            \"ISR\": 16.6691\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8004,\n            \"SIR\": 25.2006,\n            \"SAR\": 16.2002,\n            \"ISR\": 17.3963\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2559,\n            \"SIR\": 18.7186,\n            \"SAR\": 11.1054,\n            \"ISR\": 14.9016\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0543,\n            \"SIR\": 22.1427,\n            \"SAR\": 13.5384,\n            \"ISR\": 16.0391\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n  
      \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7289,\n            \"SIR\": 21.057,\n            \"SAR\": 12.8999,\n            \"ISR\": 15.2477\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2356,\n            \"SIR\": 24.3255,\n            \"SAR\": 17.3425,\n            \"ISR\": 17.9555\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.54312,\n            \"SIR\": 19.8927,\n            \"SAR\": 5.29757,\n            \"ISR\": 9.15676\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1248,\n            \"SIR\": 29.5977,\n            \"SAR\": 24.2145,\n            \"ISR\": 19.441\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9178,\n            \"SIR\": 33.076,\n            \"SAR\": 13.778,\n            \"ISR\": 16.8409\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6294,\n            \"SIR\": 27.5723,\n            \"SAR\": 18.1363,\n            \"ISR\": 19.1667\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6482,\n            \"SIR\": 23.1401,\n            \"SAR\": 10.9009,\n            \"ISR\": 12.5658\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2588,\n            \"SIR\": 19.4777,\n            \"SAR\": 16.268,\n            \"ISR\": 18.138\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7466,\n            \"SIR\": 35.6205,\n            \"SAR\": 16.9331,\n            \"ISR\": 18.2321\n          },\n          \"instrumental\": {\n            \"SDR\": 
16.8514,\n            \"SIR\": 30.4856,\n            \"SAR\": 20.2734,\n            \"ISR\": 19.4275\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7835,\n            \"SIR\": 21.6868,\n            \"SAR\": 12.2327,\n            \"ISR\": 15.9921\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5987,\n            \"SIR\": 28.1843,\n            \"SAR\": 17.7422,\n            \"ISR\": 18.1653\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.78312,\n            \"SIR\": 16.2525,\n            \"SAR\": 8.30285,\n            \"ISR\": 13.6426\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5191,\n            \"SIR\": 26.6244,\n            \"SAR\": 16.7724,\n            \"ISR\": 16.9445\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.03621,\n            \"SIR\": 17.1699,\n            \"SAR\": 9.78009,\n            \"ISR\": 13.3789\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5654,\n            \"SIR\": 25.3953,\n            \"SAR\": 21.2052,\n            \"ISR\": 18.2209\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.62285,\n            \"SIR\": 15.4851,\n            \"SAR\": 3.48634,\n            \"ISR\": 9.60032\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9442,\n            \"SIR\": 22.9515,\n            \"SAR\": 18.184,\n            \"ISR\": 18.397\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n    
      \"vocals\": {\n            \"SDR\": 10.1029,\n            \"SIR\": 21.6369,\n            \"SAR\": 10.8482,\n            \"ISR\": 15.1901\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6988,\n            \"SIR\": 25.5247,\n            \"SAR\": 16.7924,\n            \"ISR\": 17.9737\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.22076,\n            \"SIR\": 25.928,\n            \"SAR\": 9.53879,\n            \"ISR\": 13.8432\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3079,\n            \"SIR\": 25.2327,\n            \"SAR\": 17.8101,\n            \"ISR\": 18.9256\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.82072,\n            \"SIR\": -34.0399,\n            \"SAR\": 0.271745,\n            \"ISR\": 11.184\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0513,\n            \"SIR\": 57.7268,\n            \"SAR\": 14.9775,\n            \"ISR\": 14.7599\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.32928,\n            \"SIR\": 16.3347,\n            \"SAR\": 7.1888,\n            \"ISR\": 11.0469\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3661,\n            \"SIR\": 22.2271,\n            \"SAR\": 18.7206,\n            \"ISR\": 17.9562\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1147,\n            \"SIR\": 24.3264,\n            \"SAR\": 11.3121,\n            \"ISR\": 14.9408\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8885,\n       
     \"SIR\": 25.9411,\n            \"SAR\": 18.1125,\n            \"ISR\": 18.7605\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0218,\n            \"SIR\": 21.9526,\n            \"SAR\": 12.0348,\n            \"ISR\": 16.3209\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7501,\n            \"SIR\": 33.344,\n            \"SAR\": 21.7712,\n            \"ISR\": 18.8814\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.12617,\n            \"SIR\": 1.41337,\n            \"SAR\": -0.07959,\n            \"ISR\": 3.5151\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7915,\n            \"SIR\": 40.0327,\n            \"SAR\": 32.9356,\n            \"ISR\": 19.3013\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.21663,\n            \"SIR\": 14.4897,\n            \"SAR\": 4.0762,\n            \"ISR\": 6.78415\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7745,\n            \"SIR\": 14.5625,\n            \"SAR\": 14.415,\n            \"ISR\": 17.0671\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.43246,\n            \"SIR\": 19.2882,\n            \"SAR\": 7.11181,\n            \"ISR\": 12.7918\n          },\n          \"instrumental\": {\n            \"SDR\": 17.617,\n            \"SIR\": 28.6754,\n            \"SAR\": 20.4346,\n            \"ISR\": 18.9706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 7.80663,\n            \"SIR\": 18.6989,\n            \"SAR\": 8.16437,\n            \"ISR\": 11.9199\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6036,\n            \"SIR\": 22.8508,\n            \"SAR\": 17.3628,\n            \"ISR\": 18.2662\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68653,\n            \"SIR\": 16.4102,\n            \"SAR\": 8.77651,\n            \"ISR\": 14.8132\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6596,\n            \"SIR\": 26.7008,\n            \"SAR\": 17.2505,\n            \"ISR\": 17.1268\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.531,\n            \"SIR\": 24.9249,\n            \"SAR\": 12.2922,\n            \"ISR\": 14.5638\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2538,\n            \"SIR\": 21.2963,\n            \"SAR\": 14.6618,\n            \"ISR\": 17.5915\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.21945,\n            \"SIR\": 11.5096,\n            \"SAR\": 6.22666,\n            \"ISR\": 12.6766\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5724,\n            \"SIR\": 20.7357,\n            \"SAR\": 13.0614,\n            \"ISR\": 13.9142\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8238,\n            \"SIR\": 25.0172,\n            \"SAR\": 11.2914,\n            \"ISR\": 15.8788\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9404,\n            
\"SIR\": 26.4337,\n            \"SAR\": 17.2136,\n            \"ISR\": 17.7954\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.0445,\n            \"SIR\": 30.6778,\n            \"SAR\": 14.4596,\n            \"ISR\": 15.7824\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4118,\n            \"SIR\": 20.3336,\n            \"SAR\": 15.0035,\n            \"ISR\": 18.8308\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4697,\n            \"SIR\": 24.7364,\n            \"SAR\": 12.0661,\n            \"ISR\": 15.5975\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4492,\n            \"SIR\": 24.1199,\n            \"SAR\": 15.9711,\n            \"ISR\": 17.8576\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1373,\n            \"SIR\": 29.079,\n            \"SAR\": 12.2703,\n            \"ISR\": 14.0694\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5817,\n            \"SIR\": 25.559,\n            \"SAR\": 21.7436,\n            \"ISR\": 19.4994\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.68875,\n            \"SIR\": 9.88833,\n            \"SAR\": 3.91378,\n            \"ISR\": 6.62887\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5589,\n            \"SIR\": 14.5512,\n            \"SAR\": 14.0416,\n            \"ISR\": 16.4449\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1823,\n            \"SIR\": 
26.6126,\n            \"SAR\": 14.2855,\n            \"ISR\": 17.9611\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6893,\n            \"SIR\": 36.7055,\n            \"SAR\": 24.7931,\n            \"ISR\": 19.3679\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.027,\n            \"SIR\": 20.2644,\n            \"SAR\": 12.4025,\n            \"ISR\": 16.5225\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7852,\n            \"SIR\": 22.7434,\n            \"SAR\": 12.4544,\n            \"ISR\": 15.2133\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.58457,\n            \"SIR\": 21.0019,\n            \"SAR\": 8.81202,\n            \"ISR\": 13.8718\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9735,\n            \"SIR\": 23.3021,\n            \"SAR\": 15.4052,\n            \"ISR\": 17.4838\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.21154,\n            \"SIR\": 5.45196,\n            \"SAR\": 2.2815,\n            \"ISR\": 11.2524\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8901,\n            \"SIR\": 31.5901,\n            \"SAR\": 20.9381,\n            \"ISR\": 17.8407\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2706,\n            \"SIR\": 25.4344,\n            \"SAR\": 10.9862,\n            \"ISR\": 15.8282\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0911,\n            \"SIR\": 31.0946,\n            \"SAR\": 20.6592,\n            \"ISR\": 
18.9748\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9377,\n            \"SIR\": 30.1603,\n            \"SAR\": 12.7632,\n            \"ISR\": 15.9392\n          },\n          \"instrumental\": {\n            \"SDR\": 18.037,\n            \"SIR\": 32.3938,\n            \"SAR\": 22.8888,\n            \"ISR\": 19.4167\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2278,\n            \"SIR\": 30.3828,\n            \"SAR\": 14.2931,\n            \"ISR\": 14.9457\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2429,\n            \"SIR\": 22.5973,\n            \"SAR\": 17.6381,\n            \"ISR\": 18.7928\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1794,\n        \"SIR\": 21.0294,\n        \"SAR\": 10.8745,\n        \"ISR\": 14.3166\n      },\n      \"instrumental\": {\n        \"SDR\": 15.3598,\n        \"SIR\": 25.46,\n        \"SAR\": 17.5005,\n        \"ISR\": 18.1517\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"Kim_Inst.onnx\": {\n    \"model_name\": \"MDX-Net Model: Kim Inst\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.26277,\n            \"SIR\": 19.5259,\n            \"SAR\": 4.96245,\n            \"ISR\": 8.62868\n          },\n          \"instrumental\": {\n            \"SDR\": 16.55,\n            \"SIR\": 23.7871,\n            \"SAR\": 19.7671,\n            \"ISR\": 18.9284\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 6.11105,\n            \"SIR\": 12.189,\n            \"SAR\": 6.56294,\n            \"ISR\": 12.4095\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3977,\n            \"SIR\": 21.3127,\n            \"SAR\": 12.8984,\n            \"ISR\": 14.6754\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0759,\n            \"SIR\": 22.0085,\n            \"SAR\": 10.7607,\n            \"ISR\": 15.149\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7018,\n            \"SIR\": 26.6682,\n            \"SAR\": 18.2456,\n            \"ISR\": 18.2541\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.01943,\n            \"SIR\": 9.38702,\n            \"SAR\": 3.55312,\n            \"ISR\": 11.7268\n          },\n          \"instrumental\": {\n            \"SDR\": 12.045,\n            \"SIR\": 23.5869,\n            \"SAR\": 14.0357,\n            \"ISR\": 15.6441\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1252,\n            \"SIR\": 23.3204,\n            \"SAR\": 13.2023,\n            \"ISR\": 15.8791\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6767,\n            \"SIR\": 23.8644,\n            \"SAR\": 15.7025,\n            \"ISR\": 17.4144\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.98956,\n            \"SIR\": 18.245,\n            \"SAR\": 10.8498,\n            \"ISR\": 14.6787\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8149,\n            \"SIR\": 21.671,\n            \"SAR\": 
13.3198,\n            \"ISR\": 15.7917\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5222,\n            \"SIR\": 20.7305,\n            \"SAR\": 12.7164,\n            \"ISR\": 14.8499\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0181,\n            \"SIR\": 23.7438,\n            \"SAR\": 17.3564,\n            \"ISR\": 17.8586\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.55708,\n            \"SIR\": 18.6162,\n            \"SAR\": 5.538,\n            \"ISR\": 9.08567\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8733,\n            \"SIR\": 29.2467,\n            \"SAR\": 23.5318,\n            \"ISR\": 19.289\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5041,\n            \"SIR\": 30.2399,\n            \"SAR\": 13.2887,\n            \"ISR\": 16.8311\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5184,\n            \"SIR\": 28.5678,\n            \"SAR\": 17.6463,\n            \"ISR\": 18.9643\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1289,\n            \"SIR\": 22.9826,\n            \"SAR\": 10.0506,\n            \"ISR\": 11.8367\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8138,\n            \"SIR\": 17.152,\n            \"SAR\": 15.2558,\n            \"ISR\": 17.9899\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7009,\n            
\"SIR\": 35.0287,\n            \"SAR\": 16.5611,\n            \"ISR\": 18.2066\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7921,\n            \"SIR\": 31.6557,\n            \"SAR\": 19.7944,\n            \"ISR\": 19.327\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4857,\n            \"SIR\": 22.4695,\n            \"SAR\": 12.0572,\n            \"ISR\": 15.6258\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3689,\n            \"SIR\": 27.2182,\n            \"SAR\": 17.8018,\n            \"ISR\": 18.3432\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.44753,\n            \"SIR\": 19.2903,\n            \"SAR\": 7.78604,\n            \"ISR\": 13.4492\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4279,\n            \"SIR\": 26.3707,\n            \"SAR\": 17.4098,\n            \"ISR\": 17.7376\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.5805,\n            \"SIR\": 17.472,\n            \"SAR\": 10.8827,\n            \"ISR\": 12.0734\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0654,\n            \"SIR\": 24.0787,\n            \"SAR\": 19.8907,\n            \"ISR\": 17.676\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.559,\n            \"SIR\": 19.6837,\n            \"SAR\": 3.81748,\n            \"ISR\": 8.79254\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4527,\n            \"SIR\": 21.7783,\n            \"SAR\": 19.1474,\n            \"ISR\": 
18.9617\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.63147,\n            \"SIR\": 20.5054,\n            \"SAR\": 10.368,\n            \"ISR\": 14.9922\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0653,\n            \"SIR\": 25.0996,\n            \"SAR\": 16.1256,\n            \"ISR\": 17.4226\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.46004,\n            \"SIR\": 14.5425,\n            \"SAR\": 8.18075,\n            \"ISR\": 13.6921\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6892,\n            \"SIR\": 24.9309,\n            \"SAR\": 16.3479,\n            \"ISR\": 17.5871\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.5222,\n            \"SIR\": -31.6784,\n            \"SAR\": 0.457015,\n            \"ISR\": 11.4212\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2507,\n            \"SIR\": 58.4133,\n            \"SAR\": 16.6646,\n            \"ISR\": 15.4292\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.04199,\n            \"SIR\": 15.8402,\n            \"SAR\": 6.84512,\n            \"ISR\": 10.8884\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9707,\n            \"SIR\": 22.021,\n            \"SAR\": 18.0063,\n            \"ISR\": 17.866\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0904,\n            \"SIR\": 
24.4226,\n            \"SAR\": 11.5114,\n            \"ISR\": 15.2134\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6601,\n            \"SIR\": 26.7623,\n            \"SAR\": 18.0982,\n            \"ISR\": 18.7246\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.52,\n            \"SIR\": 22.1288,\n            \"SAR\": 11.0857,\n            \"ISR\": 15.8421\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2587,\n            \"SIR\": 31.3245,\n            \"SAR\": 21.0856,\n            \"ISR\": 18.7977\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.496,\n            \"SIR\": -9.62789,\n            \"SAR\": 0.5246,\n            \"ISR\": 11.303\n          },\n          \"instrumental\": {\n            \"SDR\": 19.295,\n            \"SIR\": 40.9617,\n            \"SAR\": 29.5361,\n            \"ISR\": 19.0167\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.9009,\n            \"SIR\": 14.5612,\n            \"SAR\": 3.35668,\n            \"ISR\": 6.01042\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6492,\n            \"SIR\": 13.5777,\n            \"SAR\": 14.525,\n            \"ISR\": 17.2589\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.3612,\n            \"SIR\": 17.7706,\n            \"SAR\": 5.36505,\n            \"ISR\": 10.1516\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5383,\n            \"SIR\": 25.132,\n            \"SAR\": 20.5485,\n            \"ISR\": 
19.1151\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.58855,\n            \"SIR\": 18.7821,\n            \"SAR\": 7.72783,\n            \"ISR\": 11.0853\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5183,\n            \"SIR\": 21.6345,\n            \"SAR\": 17.271,\n            \"ISR\": 18.2507\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.65464,\n            \"SIR\": 15.9043,\n            \"SAR\": 8.70431,\n            \"ISR\": 14.5067\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5429,\n            \"SIR\": 25.9075,\n            \"SAR\": 17.0502,\n            \"ISR\": 17.1881\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4087,\n            \"SIR\": 23.622,\n            \"SAR\": 12.1198,\n            \"ISR\": 14.4197\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4511,\n            \"SIR\": 20.8778,\n            \"SAR\": 14.6288,\n            \"ISR\": 17.4228\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.53221,\n            \"SIR\": 15.5778,\n            \"SAR\": 6.18376,\n            \"ISR\": 11.4715\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9354,\n            \"SIR\": 19.3045,\n            \"SAR\": 13.8164,\n            \"ISR\": 15.8692\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.1846,\n            \"SIR\": 8.49418,\n           
 \"SAR\": 8.53703,\n            \"ISR\": 15.979\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6274,\n            \"SIR\": 25.8583,\n            \"SAR\": 13.6298,\n            \"ISR\": 14.3399\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7406,\n            \"SIR\": 29.7598,\n            \"SAR\": 14.8589,\n            \"ISR\": 16.4227\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0167,\n            \"SIR\": 21.856,\n            \"SAR\": 15.2401,\n            \"ISR\": 18.6284\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.395,\n            \"SIR\": 23.4172,\n            \"SAR\": 11.9037,\n            \"ISR\": 15.5138\n          },\n          \"instrumental\": {\n            \"SDR\": 14.309,\n            \"SIR\": 24.0715,\n            \"SAR\": 15.9119,\n            \"ISR\": 17.5826\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.224,\n            \"SIR\": 23.6275,\n            \"SAR\": 12.1848,\n            \"ISR\": 15.7146\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7548,\n            \"SIR\": 29.1013,\n            \"SAR\": 21.0587,\n            \"ISR\": 18.9483\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.11952,\n            \"SIR\": 7.85644,\n            \"SAR\": 8.428,\n            \"ISR\": 13.1743\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9198,\n            \"SIR\": 21.2196,\n            \"SAR\": 14.4782,\n            \"ISR\": 14.4159\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7367,\n            \"SIR\": 27.1822,\n            \"SAR\": 13.4723,\n            \"ISR\": 17.6164\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6489,\n            \"SIR\": 35.3411,\n            \"SAR\": 24.4767,\n            \"ISR\": 19.3661\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3861,\n            \"SIR\": 21.3924,\n            \"SAR\": 11.227,\n            \"ISR\": 13.6416\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4481,\n            \"SIR\": 18.0478,\n            \"SAR\": 12.2151,\n            \"ISR\": 15.8016\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.06744,\n            \"SIR\": 23.3958,\n            \"SAR\": 8.29173,\n            \"ISR\": 13.4451\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9644,\n            \"SIR\": 22.9701,\n            \"SAR\": 15.397,\n            \"ISR\": 17.9973\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.70047,\n            \"SIR\": 1.66527,\n            \"SAR\": 2.16292,\n            \"ISR\": 10.2113\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8065,\n            \"SIR\": 30.336,\n            \"SAR\": 21.7829,\n            \"ISR\": 17.9469\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1107,\n            \"SIR\": 25.7741,\n            \"SAR\": 10.7008,\n            \"ISR\": 15.7567\n          },\n          \"instrumental\": 
{\n            \"SDR\": 16.9696,\n            \"SIR\": 30.6441,\n            \"SAR\": 20.4904,\n            \"ISR\": 18.8756\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.619,\n            \"SIR\": 29.2429,\n            \"SAR\": 11.9158,\n            \"ISR\": 15.8813\n          },\n          \"instrumental\": {\n            \"SDR\": 17.896,\n            \"SIR\": 31.7735,\n            \"SAR\": 22.3261,\n            \"ISR\": 19.3388\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2398,\n            \"SIR\": 29.0538,\n            \"SAR\": 13.7428,\n            \"ISR\": 15.9896\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4585,\n            \"SIR\": 24.0575,\n            \"SAR\": 17.4053,\n            \"ISR\": 18.7139\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.14306,\n        \"SIR\": 20.0946,\n        \"SAR\": 10.2093,\n        \"ISR\": 14.0559\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4556,\n        \"SIR\": 24.5048,\n        \"SAR\": 17.3809,\n        \"ISR\": 17.9684\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"Reverb_HQ_By_FoxJoy.onnx\": {\n    \"model_name\": \"MDX-Net Model: Reverb HQ By FoxJoy\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"reverb\",\n      \"no reverb\"\n    ],\n    \"target_stem\": \"reverb\"\n  },\n  \"UVR-MDX-NET_Crowd_HQ_1.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Crowd HQ 1 By Aufr33\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"no crowd\",\n      \"crowd\"\n    ],\n    \"target_stem\": \"no 
crowd\"\n  },\n  \"kuielab_a_vocals.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_a_vocals\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.03051,\n            \"SIR\": 19.0287,\n            \"SAR\": 6.30829,\n            \"ISR\": 10.7322\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4026,\n            \"SIR\": 25.2452,\n            \"SAR\": 20.0441,\n            \"ISR\": 18.8\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79108,\n            \"SIR\": 19.5485,\n            \"SAR\": 8.18339,\n            \"ISR\": 12.0497\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1751,\n            \"SIR\": 21.2396,\n            \"SAR\": 15.3596,\n            \"ISR\": 17.6468\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0706,\n            \"SIR\": 20.7343,\n            \"SAR\": 10.8404,\n            \"ISR\": 14.2627\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3395,\n            \"SIR\": 24.9303,\n            \"SAR\": 18.147,\n            \"ISR\": 18.0621\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.3808,\n            \"SIR\": 14.5351,\n            \"SAR\": 6.81839,\n            \"ISR\": 12.0365\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2029,\n            \"SIR\": 24.1477,\n            \"SAR\": 17.9478,\n            \"ISR\": 17.4357\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
12.0981,\n            \"SIR\": 22.6637,\n            \"SAR\": 13.4578,\n            \"ISR\": 15.4502\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0664,\n            \"SIR\": 23.0025,\n            \"SAR\": 15.9212,\n            \"ISR\": 17.1644\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.75249,\n            \"SIR\": 17.5292,\n            \"SAR\": 10.8363,\n            \"ISR\": 14.0858\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6948,\n            \"SIR\": 20.532,\n            \"SAR\": 13.419,\n            \"ISR\": 15.6081\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3404,\n            \"SIR\": 19.7158,\n            \"SAR\": 12.6517,\n            \"ISR\": 14.8139\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8315,\n            \"SIR\": 23.9257,\n            \"SAR\": 17.2894,\n            \"ISR\": 17.6543\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.43439,\n            \"SIR\": 18.4447,\n            \"SAR\": 5.70884,\n            \"ISR\": 8.44233\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2111,\n            \"SIR\": 29.1751,\n            \"SAR\": 24.2738,\n            \"ISR\": 19.3949\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7921,\n            \"SIR\": 31.2858,\n            \"SAR\": 12.9965,\n            \"ISR\": 15.1122\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1792,\n            \"SIR\": 24.1135,\n            \"SAR\": 17.7085,\n    
        \"ISR\": 19.0217\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4438,\n            \"SIR\": 18.6393,\n            \"SAR\": 11.2144,\n            \"ISR\": 12.8695\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5006,\n            \"SIR\": 20.3331,\n            \"SAR\": 16.451,\n            \"ISR\": 16.8977\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7353,\n            \"SIR\": 34.862,\n            \"SAR\": 15.9299,\n            \"ISR\": 16.1708\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2984,\n            \"SIR\": 25.7474,\n            \"SAR\": 19.4356,\n            \"ISR\": 19.3742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0085,\n            \"SIR\": 21.773,\n            \"SAR\": 11.7101,\n            \"ISR\": 14.7899\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2654,\n            \"SIR\": 25.6076,\n            \"SAR\": 17.9433,\n            \"ISR\": 18.335\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.52816,\n            \"SIR\": 18.5755,\n            \"SAR\": 7.56713,\n            \"ISR\": 12.9584\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3151,\n            \"SIR\": 25.4701,\n            \"SAR\": 17.5584,\n            \"ISR\": 17.5639\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.29306,\n            
\"SIR\": 19.0025,\n            \"SAR\": 10.2181,\n            \"ISR\": 12.819\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8453,\n            \"SIR\": 24.5982,\n            \"SAR\": 22.5411,\n            \"ISR\": 18.5911\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.2525,\n            \"SIR\": 16.6621,\n            \"SAR\": 4.67262,\n            \"ISR\": 10.2596\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8254,\n            \"SIR\": 23.9096,\n            \"SAR\": 19.5432,\n            \"ISR\": 18.4999\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.58318,\n            \"SIR\": 19.7898,\n            \"SAR\": 10.2908,\n            \"ISR\": 14.4709\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1312,\n            \"SIR\": 24.1626,\n            \"SAR\": 16.3298,\n            \"ISR\": 17.4778\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.75829,\n            \"SIR\": 25.3512,\n            \"SAR\": 9.23667,\n            \"ISR\": 12.7865\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1954,\n            \"SIR\": 23.5701,\n            \"SAR\": 17.8758,\n            \"ISR\": 18.8842\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.79323,\n            \"SIR\": -20.6862,\n            \"SAR\": 0.13322,\n            \"ISR\": 5.71155\n          },\n          \"instrumental\": {\n            \"SDR\": 19.9488,\n            \"SIR\": 55.6836,\n            \"SAR\": 32.0945,\n            
\"ISR\": 18.7377\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.64835,\n            \"SIR\": 15.4573,\n            \"SAR\": 7.82235,\n            \"ISR\": 11.4052\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5112,\n            \"SIR\": 22.7866,\n            \"SAR\": 19.0485,\n            \"ISR\": 17.8047\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6602,\n            \"SIR\": 22.8814,\n            \"SAR\": 11.2883,\n            \"ISR\": 14.5295\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4315,\n            \"SIR\": 25.42,\n            \"SAR\": 18.1247,\n            \"ISR\": 18.4917\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8027,\n            \"SIR\": 21.3173,\n            \"SAR\": 11.8299,\n            \"ISR\": 15.3935\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2968,\n            \"SIR\": 30.851,\n            \"SAR\": 20.9412,\n            \"ISR\": 18.7896\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -1.46489,\n            \"SIR\": -0.29017,\n            \"SAR\": -0.01044,\n            \"ISR\": 3.3685\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7483,\n            \"SIR\": 40.1666,\n            \"SAR\": 31.2469,\n            \"ISR\": 19.0765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.43716,\n            
\"SIR\": 14.2186,\n            \"SAR\": 4.24093,\n            \"ISR\": 7.61527\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7555,\n            \"SIR\": 15.557,\n            \"SAR\": 14.2806,\n            \"ISR\": 16.6573\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.98534,\n            \"SIR\": 18.464,\n            \"SAR\": 6.95737,\n            \"ISR\": 13.5799\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1728,\n            \"SIR\": 31.099,\n            \"SAR\": 20.9328,\n            \"ISR\": 18.796\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.45018,\n            \"SIR\": 16.9634,\n            \"SAR\": 7.89225,\n            \"ISR\": 11.6913\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2052,\n            \"SIR\": 22.4137,\n            \"SAR\": 17.09,\n            \"ISR\": 17.8771\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.45454,\n            \"SIR\": 17.7008,\n            \"SAR\": 9.08884,\n            \"ISR\": 13.6135\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3463,\n            \"SIR\": 24.6442,\n            \"SAR\": 17.3824,\n            \"ISR\": 17.5351\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7096,\n            \"SIR\": 23.1149,\n            \"SAR\": 12.5639,\n            \"ISR\": 15.2631\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0782,\n            \"SIR\": 22.643,\n            \"SAR\": 14.6624,\n            \"ISR\": 17.1022\n   
       }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.0811,\n            \"SIR\": 18.6792,\n            \"SAR\": 6.92075,\n            \"ISR\": 11.0878\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2517,\n            \"SIR\": 18.87,\n            \"SAR\": 14.6445,\n            \"ISR\": 17.0008\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0479,\n            \"SIR\": 24.0152,\n            \"SAR\": 10.809,\n            \"ISR\": 14.5945\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0018,\n            \"SIR\": 24.5053,\n            \"SAR\": 17.1403,\n            \"ISR\": 18.0614\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9579,\n            \"SIR\": 28.2774,\n            \"SAR\": 14.6901,\n            \"ISR\": 15.2356\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4458,\n            \"SIR\": 20.4421,\n            \"SAR\": 15.2414,\n            \"ISR\": 18.5656\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0729,\n            \"SIR\": 23.1717,\n            \"SAR\": 11.6019,\n            \"ISR\": 14.5734\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2532,\n            \"SIR\": 22.0329,\n            \"SAR\": 15.8748,\n            \"ISR\": 17.5089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4038,\n            \"SIR\": 31.8697,\n            \"SAR\": 12.1474,\n            \"ISR\": 
13.7618\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1564,\n            \"SIR\": 25.0477,\n            \"SAR\": 21.3983,\n            \"ISR\": 19.6156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.37318,\n            \"SIR\": 11.789,\n            \"SAR\": 8.43184,\n            \"ISR\": 11.1446\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8799,\n            \"SIR\": 18.1667,\n            \"SAR\": 15.418,\n            \"ISR\": 16.803\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4004,\n            \"SIR\": 26.2568,\n            \"SAR\": 13.908,\n            \"ISR\": 16.2541\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3974,\n            \"SIR\": 31.3873,\n            \"SAR\": 24.1561,\n            \"ISR\": 19.2368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0247,\n            \"SIR\": 19.5053,\n            \"SAR\": 12.531,\n            \"ISR\": 16.2811\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5551,\n            \"SIR\": 22.5694,\n            \"SAR\": 12.6582,\n            \"ISR\": 14.8534\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.33481,\n            \"SIR\": 22.1772,\n            \"SAR\": 8.55771,\n            \"ISR\": 13.4603\n          },\n          \"instrumental\": {\n            \"SDR\": 13.551,\n            \"SIR\": 22.5918,\n            \"SAR\": 15.4703,\n            \"ISR\": 17.6572\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground 
- Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.37702,\n            \"SIR\": 7.8154,\n            \"SAR\": 2.33474,\n            \"ISR\": 9.81231\n          },\n          \"instrumental\": {\n            \"SDR\": 17.71,\n            \"SIR\": 30.6302,\n            \"SAR\": 22.2112,\n            \"ISR\": 18.0183\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.56022,\n            \"SIR\": 24.2622,\n            \"SAR\": 10.3906,\n            \"ISR\": 14.4689\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8912,\n            \"SIR\": 27.8564,\n            \"SAR\": 20.677,\n            \"ISR\": 18.8667\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1507,\n            \"SIR\": 27.8262,\n            \"SAR\": 11.9373,\n            \"ISR\": 14.8946\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7739,\n            \"SIR\": 29.4481,\n            \"SAR\": 22.1916,\n            \"ISR\": 19.2363\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5462,\n            \"SIR\": 27.662,\n            \"SAR\": 13.3142,\n            \"ISR\": 14.3879\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9565,\n            \"SIR\": 22.388,\n            \"SAR\": 17.7091,\n            \"ISR\": 18.4978\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.5717,\n        \"SIR\": 19.6322,\n        \"SAR\": 10.3407,\n        \"ISR\": 13.6876\n      },\n      \"instrumental\": {\n        \"SDR\": 15.2903,\n        \"SIR\": 24.1551,\n        \"SAR\": 17.7925,\n       
 \"ISR\": 18.0617\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"kuielab_a_other.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_a_other\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.2985,\n            \"SIR\": 23.2364,\n            \"SAR\": 10.8844,\n            \"ISR\": 15.4433\n          },\n          \"instrumental\": {\n            \"SDR\": 5.0566,\n            \"SIR\": 13.1137,\n            \"SAR\": 10.1327,\n            \"ISR\": 6.35631\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.61851,\n            \"SIR\": 15.8185,\n            \"SAR\": 8.05757,\n            \"ISR\": 13.0745\n          },\n          \"instrumental\": {\n            \"SDR\": 2.76622,\n            \"SIR\": 10.1729,\n            \"SAR\": 3.73319,\n            \"ISR\": 5.413\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.49128,\n            \"SIR\": 14.4941,\n            \"SAR\": 4.77523,\n            \"ISR\": 10.9684\n          },\n          \"instrumental\": {\n            \"SDR\": 3.74514,\n            \"SIR\": 8.76704,\n            \"SAR\": 3.87568,\n            \"ISR\": 9.0393\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.50656,\n            \"SIR\": 12.7829,\n            \"SAR\": 3.94946,\n            \"ISR\": 10.6515\n          },\n          \"instrumental\": {\n            \"SDR\": 6.48677,\n            \"SIR\": 14.9738,\n            \"SAR\": 10.1263,\n            \"ISR\": 10.4038\n          }\n        }\n      },\n      {\n       
 \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.14388,\n            \"SIR\": 16.9658,\n            \"SAR\": 8.37129,\n            \"ISR\": 12.216\n          },\n          \"instrumental\": {\n            \"SDR\": 1.18375,\n            \"SIR\": 8.94635,\n            \"SAR\": 0.019145,\n            \"ISR\": 5.60027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.4545,\n            \"SIR\": 16.4952,\n            \"SAR\": 7.75036,\n            \"ISR\": 11.9116\n          },\n          \"instrumental\": {\n            \"SDR\": 0.47187,\n            \"SIR\": 7.23714,\n            \"SAR\": -1.49364,\n            \"ISR\": 4.59639\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.78665,\n            \"SIR\": 17.3403,\n            \"SAR\": 8.25838,\n            \"ISR\": 12.1739\n          },\n          \"instrumental\": {\n            \"SDR\": 2.79513,\n            \"SIR\": 10.4054,\n            \"SAR\": 2.56792,\n            \"ISR\": 7.6092\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.6235,\n            \"SIR\": 16.3783,\n            \"SAR\": 7.86388,\n            \"ISR\": 12.2587\n          },\n          \"instrumental\": {\n            \"SDR\": 7.38504,\n            \"SIR\": 13.2484,\n            \"SAR\": 12.3593,\n            \"ISR\": 10.5543\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.9138,\n            \"SIR\": 17.6851,\n            \"SAR\": 12.1946,\n            \"ISR\": 16.0613\n          },\n          
\"instrumental\": {\n            \"SDR\": 1.02427,\n            \"SIR\": 4.48848,\n            \"SAR\": 5.51297,\n            \"ISR\": 2.07418\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.7856,\n            \"SIR\": 17.8252,\n            \"SAR\": 10.8393,\n            \"ISR\": 13.8617\n          },\n          \"instrumental\": {\n            \"SDR\": 1.38582,\n            \"SIR\": 7.33442,\n            \"SAR\": 0.59428,\n            \"ISR\": 4.54566\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.1912,\n            \"SIR\": 16.5443,\n            \"SAR\": 11.2273,\n            \"ISR\": 15.5455\n          },\n          \"instrumental\": {\n            \"SDR\": 1.61645,\n            \"SIR\": 6.12899,\n            \"SAR\": 3.15946,\n            \"ISR\": 3.88359\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.65245,\n            \"SIR\": 13.2616,\n            \"SAR\": 5.88494,\n            \"ISR\": 9.37478\n          },\n          \"instrumental\": {\n            \"SDR\": 4.05419,\n            \"SIR\": 11.3487,\n            \"SAR\": 4.16195,\n            \"ISR\": 8.64173\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 9.56849,\n            \"SIR\": 20.5174,\n            \"SAR\": 8.86627,\n            \"ISR\": 13.2972\n          },\n          \"instrumental\": {\n            \"SDR\": 3.06098,\n            \"SIR\": 9.66555,\n            \"SAR\": 3.64654,\n            \"ISR\": 5.78672\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.01072,\n            \"SIR\": 17.1826,\n            \"SAR\": 7.19506,\n            \"ISR\": 11.3628\n          },\n          \"instrumental\": {\n            \"SDR\": 4.14377,\n            \"SIR\": 12.5413,\n            \"SAR\": 5.04966,\n            \"ISR\": 8.30788\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.48973,\n            \"SIR\": 17.8002,\n            \"SAR\": 9.12187,\n            \"ISR\": 14.1446\n          },\n          \"instrumental\": {\n            \"SDR\": 5.32877,\n            \"SIR\": 12.6329,\n            \"SAR\": 10.8281,\n            \"ISR\": 7.18892\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.77581,\n            \"SIR\": 17.1362,\n            \"SAR\": 9.10962,\n            \"ISR\": 13.579\n          },\n          \"instrumental\": {\n            \"SDR\": 0.866335,\n            \"SIR\": 5.60169,\n            \"SAR\": -0.348145,\n            \"ISR\": 4.42851\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.69857,\n            \"SIR\": 18.6912,\n            \"SAR\": 8.97592,\n            \"ISR\": 13.5084\n          },\n          \"instrumental\": {\n            \"SDR\": 4.4767,\n            \"SIR\": 11.7458,\n            \"SAR\": 8.81959,\n            \"ISR\": 7.15279\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 15.7999,\n            \"SIR\": 27.0755,\n            \"SAR\": 18.1,\n            \"ISR\": 17.3565\n          },\n          \"instrumental\": 
{\n            \"SDR\": 1.53496,\n            \"SIR\": 8.88451,\n            \"SAR\": 15.9003,\n            \"ISR\": 2.61132\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.84939,\n            \"SIR\": 12.5711,\n            \"SAR\": 6.71769,\n            \"ISR\": 10.8188\n          },\n          \"instrumental\": {\n            \"SDR\": 4.89066,\n            \"SIR\": 8.75295,\n            \"SAR\": 7.49685,\n            \"ISR\": 6.0601\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.00353,\n            \"SIR\": 19.6584,\n            \"SAR\": 8.80098,\n            \"ISR\": 12.3314\n          },\n          \"instrumental\": {\n            \"SDR\": 3.32798,\n            \"SIR\": 10.5997,\n            \"SAR\": 2.7358,\n            \"ISR\": 7.64894\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.30968,\n            \"SIR\": 16.9177,\n            \"SAR\": 7.62686,\n            \"ISR\": 11.4196\n          },\n          \"instrumental\": {\n            \"SDR\": 5.60845,\n            \"SIR\": 14.7983,\n            \"SAR\": 6.53752,\n            \"ISR\": 10.2596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 14.0685,\n            \"SIR\": 21.6993,\n            \"SAR\": 15.715,\n            \"ISR\": 18.4531\n          },\n          \"instrumental\": {\n            \"SDR\": 0.80932,\n            \"SIR\": 3.89792,\n            \"SAR\": 7.87141,\n            \"ISR\": 1.66903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - 
Bannockburn\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.86678,\n            \"SIR\": 11.5179,\n            \"SAR\": 6.38553,\n            \"ISR\": 10.4474\n          },\n          \"instrumental\": {\n            \"SDR\": 4.60198,\n            \"SIR\": 10.4515,\n            \"SAR\": 6.37081,\n            \"ISR\": 6.21048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.84029,\n            \"SIR\": 13.6954,\n            \"SAR\": 7.5487,\n            \"ISR\": 11.2578\n          },\n          \"instrumental\": {\n            \"SDR\": 4.30509,\n            \"SIR\": 13.7435,\n            \"SAR\": 9.24794,\n            \"ISR\": 8.15643\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.59759,\n            \"SIR\": 9.55487,\n            \"SAR\": 5.2649,\n            \"ISR\": 9.49235\n          },\n          \"instrumental\": {\n            \"SDR\": 6.06816,\n            \"SIR\": 16.8876,\n            \"SAR\": 6.83275,\n            \"ISR\": 9.42662\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 13.6861,\n            \"SIR\": 28.1709,\n            \"SAR\": 14.6368,\n            \"ISR\": 16.7043\n          },\n          \"instrumental\": {\n            \"SDR\": 0.623135,\n            \"SIR\": 9.19617,\n            \"SAR\": -0.19074,\n            \"ISR\": 2.6002\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.59078,\n            \"SIR\": 19.8978,\n            \"SAR\": 7.9918,\n            \"ISR\": 13.0866\n          },\n          \"instrumental\": {\n            
\"SDR\": 0.47774,\n            \"SIR\": 8.01838,\n            \"SAR\": -0.81359,\n            \"ISR\": 4.63586\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 11.8791,\n            \"SIR\": 23.4795,\n            \"SAR\": 12.433,\n            \"ISR\": 16.2657\n          },\n          \"instrumental\": {\n            \"SDR\": -0.02304,\n            \"SIR\": 3.81006,\n            \"SAR\": -5.00733,\n            \"ISR\": 0.97585\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.91592,\n            \"SIR\": 15.6572,\n            \"SAR\": 4.71229,\n            \"ISR\": 7.94667\n          },\n          \"instrumental\": {\n            \"SDR\": 4.80295,\n            \"SIR\": 16.5897,\n            \"SAR\": 4.56007,\n            \"ISR\": 12.1643\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 11.6946,\n            \"SIR\": 22.2304,\n            \"SAR\": 12.5556,\n            \"ISR\": 15.0223\n          },\n          \"instrumental\": {\n            \"SDR\": -0.10172,\n            \"SIR\": 6.20824,\n            \"SAR\": -1.8584,\n            \"ISR\": 3.35012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.32112,\n            \"SIR\": 11.2157,\n            \"SAR\": 8.51001,\n            \"ISR\": 13.4655\n          },\n          \"instrumental\": {\n            \"SDR\": 0.23892,\n            \"SIR\": 5.01115,\n            \"SAR\": -2.93014,\n            \"ISR\": 3.07836\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"other\": 
{\n            \"SDR\": 3.2767,\n            \"SIR\": 12.641,\n            \"SAR\": 2.06731,\n            \"ISR\": 5.88696\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5314,\n            \"SIR\": 19.9448,\n            \"SAR\": 14.8501,\n            \"ISR\": 16.4615\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.19205,\n            \"SIR\": 21.3224,\n            \"SAR\": 7.25572,\n            \"ISR\": 10.7759\n          },\n          \"instrumental\": {\n            \"SDR\": 1.34502,\n            \"SIR\": 7.19421,\n            \"SAR\": 0.986595,\n            \"ISR\": 5.66761\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.29795,\n            \"SIR\": 19.7123,\n            \"SAR\": 4.92955,\n            \"ISR\": 13.8048\n          },\n          \"instrumental\": {\n            \"SDR\": 5.94046,\n            \"SIR\": 12.3227,\n            \"SAR\": 5.77602,\n            \"ISR\": 10.8548\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.89376,\n            \"SIR\": 18.0495,\n            \"SAR\": 4.30901,\n            \"ISR\": 9.86718\n          },\n          \"instrumental\": {\n            \"SDR\": -0.04307,\n            \"SIR\": 14.8438,\n            \"SAR\": -1.44964,\n            \"ISR\": 11.0345\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.3818,\n            \"SIR\": 19.5252,\n            \"SAR\": 10.8809,\n            \"ISR\": 15.5455\n          },\n          \"instrumental\": {\n            \"SDR\": 0.87812,\n            \"SIR\": 6.53023,\n            \"SAR\": 
0.28347,\n            \"ISR\": 2.71706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.25805,\n            \"SIR\": 9.44667,\n            \"SAR\": 6.65279,\n            \"ISR\": 10.3286\n          },\n          \"instrumental\": {\n            \"SDR\": 3.86658,\n            \"SIR\": 10.9645,\n            \"SAR\": 6.21744,\n            \"ISR\": 6.06446\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 9.36939,\n            \"SIR\": 14.2856,\n            \"SAR\": 11.1927,\n            \"ISR\": 14.0695\n          },\n          \"instrumental\": {\n            \"SDR\": 2.18852,\n            \"SIR\": 10.0559,\n            \"SAR\": 3.11571,\n            \"ISR\": 3.73256\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.63463,\n            \"SIR\": 17.4556,\n            \"SAR\": 7.86379,\n            \"ISR\": 13.5296\n          },\n          \"instrumental\": {\n            \"SDR\": 5.39697,\n            \"SIR\": 14.6953,\n            \"SAR\": 6.30495,\n            \"ISR\": 11.3929\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.81779,\n            \"SIR\": 18.4801,\n            \"SAR\": 7.6782,\n            \"ISR\": 11.9292\n          },\n          \"instrumental\": {\n            \"SDR\": 1.70499,\n            \"SIR\": 8.65265,\n            \"SAR\": 2.78163,\n            \"ISR\": 6.66371\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"instrumental\": {\n        \"SDR\": 2.92805,\n        \"SIR\": 10.1144,\n        \"SAR\": 4.01882,\n        \"ISR\": 
6.13747\n      }\n    },\n    \"stems\": [\n      \"other\",\n      \"no other\"\n    ],\n    \"target_stem\": \"other\"\n  },\n  \"kuielab_a_bass.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_a_bass\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 16.0732,\n            \"SIR\": 23.7351,\n            \"SAR\": 18.5785,\n            \"ISR\": 18.8732\n          },\n          \"instrumental\": {\n            \"SDR\": 0.77436,\n            \"SIR\": 6.81879,\n            \"SAR\": 4.43829,\n            \"ISR\": 1.74334\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.4234,\n            \"SIR\": 21.1994,\n            \"SAR\": 12.4414,\n            \"ISR\": 13.555\n          },\n          \"instrumental\": {\n            \"SDR\": 2.16512,\n            \"SIR\": 5.57842,\n            \"SAR\": 2.94353,\n            \"ISR\": 3.86083\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.56121,\n            \"SIR\": 17.6101,\n            \"SAR\": 8.99326,\n            \"ISR\": 11.8569\n          },\n          \"instrumental\": {\n            \"SDR\": 2.66124,\n            \"SIR\": 6.03571,\n            \"SAR\": 3.04769,\n            \"ISR\": 6.3789\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.4728,\n            \"SIR\": 20.4749,\n            \"SAR\": 11.6721,\n            \"ISR\": 13.8727\n          },\n          \"instrumental\": {\n            \"SDR\": 2.04131,\n            \"SIR\": 4.08179,\n            \"SAR\": 3.77969,\n            \"ISR\": 4.65636\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Actions - Devil's Words\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.3345,\n            \"SIR\": 21.1304,\n            \"SAR\": 10.5167,\n            \"ISR\": 14.1136\n          },\n          \"instrumental\": {\n            \"SDR\": 1.36093,\n            \"SIR\": 7.13739,\n            \"SAR\": 0.76338,\n            \"ISR\": 7.76997\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.02994,\n            \"SIR\": 15.4654,\n            \"SAR\": 6.98974,\n            \"ISR\": 12.0058\n          },\n          \"instrumental\": {\n            \"SDR\": 2.12083,\n            \"SIR\": 12.1217,\n            \"SAR\": 1.22223,\n            \"ISR\": 12.6578\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.8816,\n            \"SIR\": 24.8207,\n            \"SAR\": 14.1756,\n            \"ISR\": 15.9029\n          },\n          \"instrumental\": {\n            \"SDR\": 0.914555,\n            \"SIR\": 4.78908,\n            \"SAR\": 0.974005,\n            \"ISR\": 3.62381\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.7034,\n            \"SIR\": 24.6163,\n            \"SAR\": 14.436,\n            \"ISR\": 16.2823\n          },\n          \"instrumental\": {\n            \"SDR\": 1.5507,\n            \"SIR\": 2.32283,\n            \"SAR\": 5.43508,\n            \"ISR\": 2.79115\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.69524,\n            \"SIR\": 20.9984,\n            \"SAR\": 8.73451,\n            \"ISR\": 11.4699\n          },\n          \"instrumental\": {\n      
      \"SDR\": 7.89635,\n            \"SIR\": 11.1789,\n            \"SAR\": 16.1208,\n            \"ISR\": 12.125\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.9338,\n            \"SIR\": 20.8366,\n            \"SAR\": 11.0176,\n            \"ISR\": 13.4131\n          },\n          \"instrumental\": {\n            \"SDR\": 3.07882,\n            \"SIR\": 7.71596,\n            \"SAR\": 3.94259,\n            \"ISR\": 9.66354\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.2932,\n            \"SIR\": 22.0512,\n            \"SAR\": 7.5089,\n            \"ISR\": 9.02483\n          },\n          \"instrumental\": {\n            \"SDR\": 5.33401,\n            \"SIR\": 8.1387,\n            \"SAR\": 4.32716,\n            \"ISR\": 10.3974\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.23296,\n            \"SIR\": 19.0919,\n            \"SAR\": 9.80712,\n            \"ISR\": 14.1654\n          },\n          \"instrumental\": {\n            \"SDR\": 3.55055,\n            \"SIR\": 7.82103,\n            \"SAR\": 4.57447,\n            \"ISR\": 9.43293\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 12.2386,\n            \"SIR\": 21.113,\n            \"SAR\": 13.6184,\n            \"ISR\": 17.0982\n          },\n          \"instrumental\": {\n            \"SDR\": 1.81564,\n            \"SIR\": 5.93183,\n            \"SAR\": 3.1888,\n            \"ISR\": 3.77541\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        
\"scores\": {\n          \"bass\": {\n            \"SDR\": 9.84294,\n            \"SIR\": 19.1712,\n            \"SAR\": 10.6262,\n            \"ISR\": 14.9867\n          },\n          \"instrumental\": {\n            \"SDR\": 2.73447,\n            \"SIR\": 4.60121,\n            \"SAR\": 5.46088,\n            \"ISR\": 5.66416\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.637,\n            \"SIR\": 22.6788,\n            \"SAR\": 14.923,\n            \"ISR\": 17.2898\n          },\n          \"instrumental\": {\n            \"SDR\": 1.48216,\n            \"SIR\": 4.07368,\n            \"SAR\": 5.69331,\n            \"ISR\": 2.5765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 6.76663,\n            \"SIR\": 11.9132,\n            \"SAR\": 2.57693,\n            \"ISR\": 2.15574\n          },\n          \"instrumental\": {\n            \"SDR\": 3.63085,\n            \"SIR\": 5.80337,\n            \"SAR\": 2.23456,\n            \"ISR\": 10.5938\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.5897,\n            \"SIR\": 16.4196,\n            \"SAR\": 8.55788,\n            \"ISR\": 12.8794\n          },\n          \"instrumental\": {\n            \"SDR\": 5.77935,\n            \"SIR\": 9.62361,\n            \"SAR\": 10.3646,\n            \"ISR\": 10.4156\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.7619,\n            \"SIR\": 20.9739,\n            \"SAR\": 15.1435,\n            \"ISR\": 16.5327\n          },\n          \"instrumental\": {\n            \"SDR\": 3.34718,\n        
    \"SIR\": 10.9225,\n            \"SAR\": 16.2328,\n            \"ISR\": 4.80444\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.2467,\n            \"SIR\": 22.2041,\n            \"SAR\": 13.9468,\n            \"ISR\": 15.8338\n          },\n          \"instrumental\": {\n            \"SDR\": 2.47201,\n            \"SIR\": 7.27829,\n            \"SAR\": 6.67825,\n            \"ISR\": 4.46604\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.6764,\n            \"SIR\": 20.5468,\n            \"SAR\": 11.5618,\n            \"ISR\": 15.7534\n          },\n          \"instrumental\": {\n            \"SDR\": 1.37206,\n            \"SIR\": 2.91497,\n            \"SAR\": 2.12287,\n            \"ISR\": 4.2682\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.5687,\n            \"SIR\": 22.5036,\n            \"SAR\": 15.5006,\n            \"ISR\": 17.2629\n          },\n          \"instrumental\": {\n            \"SDR\": 0.46533,\n            \"SIR\": 1.29338,\n            \"SAR\": 0.56047,\n            \"ISR\": 1.74435\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.0284,\n            \"SIR\": 24.4635,\n            \"SAR\": 11.94,\n            \"ISR\": 14.4842\n          },\n          \"instrumental\": {\n            \"SDR\": 6.59616,\n            \"SIR\": 9.50216,\n            \"SAR\": 15.2833,\n            \"ISR\": 9.3666\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          
\"bass\": {\n            \"SDR\": 4.41786,\n            \"SIR\": 12.8054,\n            \"SAR\": 4.12871,\n            \"ISR\": 7.29088\n          },\n          \"instrumental\": {\n            \"SDR\": 6.99912,\n            \"SIR\": 13.3873,\n            \"SAR\": 8.82188,\n            \"ISR\": 11.9956\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.206,\n            \"SIR\": 13.0449,\n            \"SAR\": 10.1204,\n            \"ISR\": 12.5577\n          },\n          \"instrumental\": {\n            \"SDR\": 1.70888,\n            \"SIR\": 4.51149,\n            \"SAR\": 3.62472,\n            \"ISR\": 3.40736\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.31776,\n            \"SIR\": 15.8755,\n            \"SAR\": 8.77232,\n            \"ISR\": 12.411\n          },\n          \"instrumental\": {\n            \"SDR\": 4.06327,\n            \"SIR\": 9.55606,\n            \"SAR\": 5.51186,\n            \"ISR\": 7.25932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 7.55078,\n            \"SIR\": 22.3229,\n            \"SAR\": 8.43863,\n            \"ISR\": 13.9534\n          },\n          \"instrumental\": {\n            \"SDR\": 5.34112,\n            \"SIR\": 10.3105,\n            \"SAR\": 6.56292,\n            \"ISR\": 10.5833\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.3498,\n            \"SIR\": 20.3274,\n            \"SAR\": 12.8324,\n            \"ISR\": 15.2372\n          },\n          \"instrumental\": {\n            \"SDR\": 0.51499,\n            \"SIR\": 7.12556,\n           
 \"SAR\": -0.36919,\n            \"ISR\": 4.43316\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.6693,\n            \"SIR\": 23.9978,\n            \"SAR\": 10.4741,\n            \"ISR\": 13.2995\n          },\n          \"instrumental\": {\n            \"SDR\": 5.38955,\n            \"SIR\": 17.6591,\n            \"SAR\": 5.16155,\n            \"ISR\": 14.0096\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.3858,\n            \"SIR\": 23.034,\n            \"SAR\": 12.2837,\n            \"ISR\": 16.3577\n          },\n          \"instrumental\": {\n            \"SDR\": 2.70918,\n            \"SIR\": 5.02455,\n            \"SAR\": 3.89302,\n            \"ISR\": 6.93987\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.31564,\n            \"SIR\": 22.3612,\n            \"SAR\": 8.39025,\n            \"ISR\": 11.9301\n          },\n          \"instrumental\": {\n            \"SDR\": 1.75347,\n            \"SIR\": 10.2886,\n            \"SAR\": 0.91249,\n            \"ISR\": 11.9752\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 4.06815,\n            \"SIR\": 13.8011,\n            \"SAR\": 2.23144,\n            \"ISR\": 5.98818\n          },\n          \"instrumental\": {\n            \"SDR\": 3.0792,\n            \"SIR\": 14.3801,\n            \"SAR\": 2.26075,\n            \"ISR\": 12.6147\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.036,\n            \"SIR\": 25.3242,\n         
   \"SAR\": 1.71184,\n            \"ISR\": -0.28736\n          },\n          \"instrumental\": {\n            \"SDR\": 5.7284,\n            \"SIR\": 4.49601,\n            \"SAR\": 4.01895,\n            \"ISR\": 12.3067\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.3982,\n            \"SIR\": 29.8156,\n            \"SAR\": 11.7944,\n            \"ISR\": 14.1614\n          },\n          \"instrumental\": {\n            \"SDR\": 3.00312,\n            \"SIR\": 7.01646,\n            \"SAR\": 5.97162,\n            \"ISR\": 7.55067\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 15.1121,\n            \"SIR\": 29.4922,\n            \"SAR\": 17.2961,\n            \"ISR\": 18.2611\n          },\n          \"instrumental\": {\n            \"SDR\": 0.4544,\n            \"SIR\": 2.09194,\n            \"SAR\": 0.969195,\n            \"ISR\": 3.11978\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.9887,\n            \"SIR\": 22.3777,\n            \"SAR\": 11.9519,\n            \"ISR\": 15.9469\n          },\n          \"instrumental\": {\n            \"SDR\": -1.89782,\n            \"SIR\": 1.2986,\n            \"SAR\": -6.09063,\n            \"ISR\": 2.9936\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.05815,\n            \"SIR\": 18.1841,\n            \"SAR\": 9.15806,\n            \"ISR\": 13.3938\n          },\n          \"instrumental\": {\n            \"SDR\": 3.70975,\n            \"SIR\": 10.6943,\n            \"SAR\": 3.84451,\n            \"ISR\": 10.0631\n          }\n        }\n      },\n      
{\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.93678,\n            \"SIR\": 11.8764,\n            \"SAR\": 0.5732,\n            \"ISR\": -4.58055\n          },\n          \"instrumental\": {\n            \"SDR\": 3.03604,\n            \"SIR\": -3.3435,\n            \"SAR\": 1.05938,\n            \"ISR\": 5.61012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 12.2636,\n            \"SIR\": 20.7013,\n            \"SAR\": 6.50702,\n            \"ISR\": 6.90626\n          },\n          \"instrumental\": {\n            \"SDR\": 2.49143,\n            \"SIR\": 3.84323,\n            \"SAR\": 2.59421,\n            \"ISR\": 4.77818\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.3077,\n            \"SIR\": 15.372,\n            \"SAR\": 13.4569,\n            \"ISR\": 17.2716\n          },\n          \"instrumental\": {\n            \"SDR\": 0.45525,\n            \"SIR\": 1.8495,\n            \"SAR\": -0.921975,\n            \"ISR\": 1.78777\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.99375,\n            \"SIR\": 20.7437,\n            \"SAR\": 2.10514,\n            \"ISR\": 0.89566\n          },\n          \"instrumental\": {\n            \"SDR\": 2.68674,\n            \"SIR\": -0.00767,\n            \"SAR\": 0.97192,\n            \"ISR\": 7.95314\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"bass\": {\n        \"SDR\": 10.4481,\n        \"SIR\": 20.9862,\n        \"SAR\": 10.5715,\n        \"ISR\": 13.9131\n      },\n      \"instrumental\": {\n        \"SDR\": 2.67399,\n        \"SIR\": 
6.42725,\n        \"SAR\": 3.8121,\n        \"ISR\": 6.65939\n      }\n    },\n    \"stems\": [\n      \"bass\",\n      \"no bass\"\n    ],\n    \"target_stem\": \"bass\"\n  },\n  \"kuielab_a_drums.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_a_drums\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 2.18069,\n            \"SIR\": 12.4405,\n            \"SAR\": 0.482,\n            \"ISR\": 3.95218\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5281,\n            \"SIR\": 24.8726,\n            \"SAR\": 10.707,\n            \"ISR\": 18.0658\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.09482,\n            \"SIR\": 15.3373,\n            \"SAR\": 5.69309,\n            \"ISR\": 9.71032\n          },\n          \"instrumental\": {\n            \"SDR\": 5.95259,\n            \"SIR\": 12.4937,\n            \"SAR\": 6.345,\n            \"ISR\": 13.3849\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.65583,\n            \"SIR\": 13.2083,\n            \"SAR\": 8.22115,\n            \"ISR\": 15.202\n          },\n          \"instrumental\": {\n            \"SDR\": 1.96653,\n            \"SIR\": 4.93443,\n            \"SAR\": 1.95511,\n            \"ISR\": 5.13672\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.74451,\n            \"SIR\": 16.8515,\n            \"SAR\": 8.00592,\n            \"ISR\": 11.9181\n          },\n          \"instrumental\": {\n            \"SDR\": 3.37901,\n            \"SIR\": 5.6739,\n            \"SAR\": 5.9064,\n            \"ISR\": 6.87746\n          }\n       
 }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.3774,\n            \"SIR\": 17.358,\n            \"SAR\": 8.9703,\n            \"ISR\": 13.4097\n          },\n          \"instrumental\": {\n            \"SDR\": 1.13225,\n            \"SIR\": 5.78609,\n            \"SAR\": -0.02352,\n            \"ISR\": 7.20048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.92681,\n            \"SIR\": 17.9665,\n            \"SAR\": 9.83928,\n            \"ISR\": 13.4112\n          },\n          \"instrumental\": {\n            \"SDR\": 0.60597,\n            \"SIR\": 6.57224,\n            \"SAR\": -0.71987,\n            \"ISR\": 5.69672\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.79356,\n            \"SIR\": 18.9857,\n            \"SAR\": 9.26457,\n            \"ISR\": 14.0863\n          },\n          \"instrumental\": {\n            \"SDR\": 3.21713,\n            \"SIR\": 9.42799,\n            \"SAR\": 3.47756,\n            \"ISR\": 8.76188\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.5485,\n            \"SIR\": 14.2432,\n            \"SAR\": 7.06405,\n            \"ISR\": 11.4376\n          },\n          \"instrumental\": {\n            \"SDR\": 6.09861,\n            \"SIR\": 8.81919,\n            \"SAR\": 8.56361,\n            \"ISR\": 9.31477\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.91834,\n            \"SIR\": 13.4484,\n            \"SAR\": 5.95597,\n            \"ISR\": 11.5345\n 
         },\n          \"instrumental\": {\n            \"SDR\": 5.67263,\n            \"SIR\": 11.0306,\n            \"SAR\": 10.8119,\n            \"ISR\": 9.83723\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.0039,\n            \"SIR\": 12.054,\n            \"SAR\": 6.81685,\n            \"ISR\": 13.0093\n          },\n          \"instrumental\": {\n            \"SDR\": 3.0156,\n            \"SIR\": 6.44809,\n            \"SAR\": 3.29406,\n            \"ISR\": 6.11917\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.15806,\n            \"SIR\": 15.13,\n            \"SAR\": 10.096,\n            \"ISR\": 15.7347\n          },\n          \"instrumental\": {\n            \"SDR\": 2.78024,\n            \"SIR\": 7.20549,\n            \"SAR\": 5.52393,\n            \"ISR\": 6.07613\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.31067,\n            \"SIR\": 17.63,\n            \"SAR\": 10.2502,\n            \"ISR\": 13.9311\n          },\n          \"instrumental\": {\n            \"SDR\": 1.75991,\n            \"SIR\": 3.09722,\n            \"SAR\": 2.26884,\n            \"ISR\": 4.17393\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.83538,\n            \"SIR\": 13.4238,\n            \"SAR\": 4.36491,\n            \"ISR\": 10.2342\n          },\n          \"instrumental\": {\n            \"SDR\": 5.97447,\n            \"SIR\": 12.2097,\n            \"SAR\": 5.75242,\n            \"ISR\": 12.1273\n          }\n        }\n      },\n      {\n        
\"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.28863,\n            \"SIR\": 14.0002,\n            \"SAR\": 7.68004,\n            \"ISR\": 13.8658\n          },\n          \"instrumental\": {\n            \"SDR\": 2.74748,\n            \"SIR\": 5.41681,\n            \"SAR\": 3.99022,\n            \"ISR\": 6.19914\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.78078,\n            \"SIR\": 14.9289,\n            \"SAR\": 5.43252,\n            \"ISR\": 10.7921\n          },\n          \"instrumental\": {\n            \"SDR\": 7.69169,\n            \"SIR\": 11.3176,\n            \"SAR\": 9.49023,\n            \"ISR\": 12.6742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.24303,\n            \"SIR\": 19.0052,\n            \"SAR\": 5.74591,\n            \"ISR\": 7.47942\n          },\n          \"instrumental\": {\n            \"SDR\": 2.78144,\n            \"SIR\": 8.07551,\n            \"SAR\": 2.13371,\n            \"ISR\": 8.10865\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.6149,\n            \"SIR\": 19.3239,\n            \"SAR\": 11.4845,\n            \"ISR\": 16.2255\n          },\n          \"instrumental\": {\n            \"SDR\": 1.56215,\n            \"SIR\": 3.60926,\n            \"SAR\": 4.0495,\n            \"ISR\": 3.04256\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 1.03858,\n            \"SIR\": 9.65585,\n            \"SAR\": -1.85758,\n            \"ISR\": 2.59726\n          },\n          
\"instrumental\": {\n            \"SDR\": 16.7024,\n            \"SIR\": 29.8033,\n            \"SAR\": 22.5688,\n            \"ISR\": 18.8863\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.99062,\n            \"SIR\": 12.4202,\n            \"SAR\": 6.89015,\n            \"ISR\": 11.2594\n          },\n          \"instrumental\": {\n            \"SDR\": 4.55727,\n            \"SIR\": 10.2386,\n            \"SAR\": 6.65771,\n            \"ISR\": 8.4532\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.62737,\n            \"SIR\": 12.3577,\n            \"SAR\": 6.15831,\n            \"ISR\": 12.8649\n          },\n          \"instrumental\": {\n            \"SDR\": 3.41299,\n            \"SIR\": 7.60883,\n            \"SAR\": 3.7214,\n            \"ISR\": 8.48702\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.11632,\n            \"SIR\": 15.3467,\n            \"SAR\": 4.99825,\n            \"ISR\": 10.9375\n          },\n          \"instrumental\": {\n            \"SDR\": 6.29898,\n            \"SIR\": 12.667,\n            \"SAR\": 7.13854,\n            \"ISR\": 13.4354\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 3.57077,\n            \"SIR\": 11.2833,\n            \"SAR\": 3.1214,\n            \"ISR\": 9.30502\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1914,\n            \"SIR\": 15.2058,\n            \"SAR\": 16.1805,\n            \"ISR\": 14.0507\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.93091,\n            \"SIR\": 11.5098,\n            \"SAR\": 8.36715,\n            \"ISR\": 12.9866\n          },\n          \"instrumental\": {\n            \"SDR\": 2.57479,\n            \"SIR\": 6.441,\n            \"SAR\": 5.0244,\n            \"ISR\": 3.94392\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.96706,\n            \"SIR\": 10.6288,\n            \"SAR\": 4.86976,\n            \"ISR\": 11.0817\n          },\n          \"instrumental\": {\n            \"SDR\": 4.94332,\n            \"SIR\": 9.20077,\n            \"SAR\": 6.34731,\n            \"ISR\": 8.51785\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.73033,\n            \"SIR\": 15.4582,\n            \"SAR\": 11.0858,\n            \"ISR\": 15.3206\n          },\n          \"instrumental\": {\n            \"SDR\": 1.63607,\n            \"SIR\": 5.84164,\n            \"SAR\": 3.04934,\n            \"ISR\": 3.18988\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.76085,\n            \"SIR\": 18.638,\n            \"SAR\": 9.21645,\n            \"ISR\": 15.8884\n          },\n          \"instrumental\": {\n            \"SDR\": 2.69699,\n            \"SIR\": 9.48389,\n            \"SAR\": 5.05059,\n            \"ISR\": 7.81328\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.20553,\n            \"SIR\": 15.6011,\n            \"SAR\": 5.58087,\n            \"ISR\": 10.4822\n          },\n          \"instrumental\": {\n   
         \"SDR\": 2.12568,\n            \"SIR\": 10.77,\n            \"SAR\": 1.21319,\n            \"ISR\": 10.8006\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.8125,\n            \"SIR\": 15.4991,\n            \"SAR\": 5.55885,\n            \"ISR\": 7.61156\n          },\n          \"instrumental\": {\n            \"SDR\": 5.71126,\n            \"SIR\": 16.0341,\n            \"SAR\": 5.10574,\n            \"ISR\": 14.3044\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.64094,\n            \"SIR\": 19.0624,\n            \"SAR\": 7.86968,\n            \"ISR\": 10.2876\n          },\n          \"instrumental\": {\n            \"SDR\": 1.55153,\n            \"SIR\": 4.16771,\n            \"SAR\": 1.05764,\n            \"ISR\": 5.15522\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.85053,\n            \"SIR\": 17.5839,\n            \"SAR\": 9.36337,\n            \"ISR\": 14.1257\n          },\n          \"instrumental\": {\n            \"SDR\": 0.38816,\n            \"SIR\": 6.15087,\n            \"SAR\": -0.90502,\n            \"ISR\": 5.6844\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.00775,\n            \"SIR\": 15.0546,\n            \"SAR\": 7.24981,\n            \"ISR\": 11.6443\n          },\n          \"instrumental\": {\n            \"SDR\": 2.00295,\n            \"SIR\": 8.67782,\n            \"SAR\": 1.303,\n            \"ISR\": 8.34425\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"drums\": 
{\n            \"SDR\": 14.2848,\n            \"SIR\": 21.1582,\n            \"SAR\": 15.7269,\n            \"ISR\": 20.1131\n          },\n          \"instrumental\": {\n            \"SDR\": -0.08024,\n            \"SIR\": -0.94982,\n            \"SAR\": -0.27963,\n            \"ISR\": 0.84907\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.3476,\n            \"SIR\": 22.8908,\n            \"SAR\": 10.7892,\n            \"ISR\": 15.3912\n          },\n          \"instrumental\": {\n            \"SDR\": 2.17779,\n            \"SIR\": 5.26827,\n            \"SAR\": 3.96689,\n            \"ISR\": 6.6059\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 11.1725,\n            \"SIR\": 25.784,\n            \"SAR\": 11.1362,\n            \"ISR\": 16.7703\n          },\n          \"instrumental\": {\n            \"SDR\": 3.75723,\n            \"SIR\": 8.86305,\n            \"SAR\": 4.90081,\n            \"ISR\": 9.77396\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.34164,\n            \"SIR\": 12.5832,\n            \"SAR\": 4.34374,\n            \"ISR\": 9.81655\n          },\n          \"instrumental\": {\n            \"SDR\": -0.04514,\n            \"SIR\": 10.6225,\n            \"SAR\": -1.59781,\n            \"ISR\": 10.5392\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.46053,\n            \"SIR\": 18.3525,\n            \"SAR\": 7.97954,\n            \"ISR\": 11.9513\n          },\n          \"instrumental\": {\n            \"SDR\": 3.82152,\n            \"SIR\": 10.0609,\n            \"SAR\": 
3.93938,\n            \"ISR\": 9.25875\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.57404,\n            \"SIR\": 11.7291,\n            \"SAR\": 5.93683,\n            \"ISR\": 10.71\n          },\n          \"instrumental\": {\n            \"SDR\": 4.46425,\n            \"SIR\": 9.35077,\n            \"SAR\": 7.31786,\n            \"ISR\": 7.47718\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 3.47832,\n            \"SIR\": 9.67822,\n            \"SAR\": 2.87043,\n            \"ISR\": 7.64742\n          },\n          \"instrumental\": {\n            \"SDR\": 6.24888,\n            \"SIR\": 14.4736,\n            \"SAR\": 5.97047,\n            \"ISR\": 13.5767\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.91571,\n            \"SIR\": 16.725,\n            \"SAR\": 5.15543,\n            \"ISR\": 10.7169\n          },\n          \"instrumental\": {\n            \"SDR\": 5.68382,\n            \"SIR\": 10.4048,\n            \"SAR\": 6.52787,\n            \"ISR\": 11.3673\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.6703,\n            \"SIR\": 20.4944,\n            \"SAR\": 10.7375,\n            \"ISR\": 15.5134\n          },\n          \"instrumental\": {\n            \"SDR\": 1.96625,\n            \"SIR\": 5.47078,\n            \"SAR\": 3.88715,\n            \"ISR\": 6.46382\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"drums\": {\n        \"SDR\": 6.99726,\n        \"SIR\": 15.342,\n        \"SAR\": 7.15693,\n        \"ISR\": 11.7812\n    
  },\n      \"instrumental\": {\n        \"SDR\": 3.11637,\n        \"SIR\": 8.84112,\n        \"SAR\": 4.47515,\n        \"ISR\": 8.39873\n      }\n    },\n    \"stems\": [\n      \"drums\",\n      \"no drums\"\n    ],\n    \"target_stem\": \"drums\"\n  },\n  \"kuielab_b_vocals.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_b_vocals\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.30456,\n            \"SIR\": 21.1354,\n            \"SAR\": 4.92979,\n            \"ISR\": 8.8335\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2972,\n            \"SIR\": 23.2941,\n            \"SAR\": 19.5481,\n            \"ISR\": 19.0885\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.5451,\n            \"SIR\": 19.2118,\n            \"SAR\": 7.6691,\n            \"ISR\": 11.8577\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8641,\n            \"SIR\": 20.8663,\n            \"SAR\": 14.6975,\n            \"ISR\": 17.5673\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.57259,\n            \"SIR\": 20.0866,\n            \"SAR\": 10.3754,\n            \"ISR\": 14.5497\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0904,\n            \"SIR\": 25.4261,\n            \"SAR\": 17.5226,\n            \"ISR\": 17.8968\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.16702,\n            \"SIR\": 5.81915,\n            \"SAR\": 4.30808,\n            \"ISR\": 12.0934\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9453,\n            
\"SIR\": 25.1263,\n            \"SAR\": 13.5379,\n            \"ISR\": 13.5175\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0609,\n            \"SIR\": 22.2485,\n            \"SAR\": 13.2409,\n            \"ISR\": 15.8491\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8384,\n            \"SIR\": 23.4062,\n            \"SAR\": 15.3105,\n            \"ISR\": 17.0161\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.72609,\n            \"SIR\": 17.9826,\n            \"SAR\": 9.516,\n            \"ISR\": 12.4338\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9397,\n            \"SIR\": 17.4942,\n            \"SAR\": 12.5101,\n            \"ISR\": 15.8853\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8537,\n            \"SIR\": 19.7906,\n            \"SAR\": 11.9351,\n            \"ISR\": 14.259\n          },\n          \"instrumental\": {\n            \"SDR\": 14.508,\n            \"SIR\": 22.9531,\n            \"SAR\": 16.7366,\n            \"ISR\": 17.6048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.30976,\n            \"SIR\": 17.827,\n            \"SAR\": 6.19641,\n            \"ISR\": 10.3881\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2253,\n            \"SIR\": 29.9346,\n            \"SAR\": 23.9521,\n            \"ISR\": 19.3701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
11.5207,\n            \"SIR\": 29.5724,\n            \"SAR\": 12.1356,\n            \"ISR\": 15.5681\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7669,\n            \"SIR\": 25.0363,\n            \"SAR\": 16.6717,\n            \"ISR\": 18.7621\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2081,\n            \"SIR\": 19.685,\n            \"SAR\": 9.90594,\n            \"ISR\": 12.1775\n          },\n          \"instrumental\": {\n            \"SDR\": 15.188,\n            \"SIR\": 18.9637,\n            \"SAR\": 15.3158,\n            \"ISR\": 17.3308\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7991,\n            \"SIR\": 33.1809,\n            \"SAR\": 15.2835,\n            \"ISR\": 17.1035\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1131,\n            \"SIR\": 27.4016,\n            \"SAR\": 18.6377,\n            \"ISR\": 19.297\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8648,\n            \"SIR\": 21.542,\n            \"SAR\": 11.5209,\n            \"ISR\": 14.9481\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0247,\n            \"SIR\": 25.8077,\n            \"SAR\": 17.4885,\n            \"ISR\": 18.2814\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.07043,\n            \"SIR\": 18.7006,\n            \"SAR\": 7.42649,\n            \"ISR\": 12.8412\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3272,\n            \"SIR\": 25.0967,\n    
        \"SAR\": 17.139,\n            \"ISR\": 17.6419\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.13561,\n            \"SIR\": 19.2962,\n            \"SAR\": 9.17369,\n            \"ISR\": 12.3769\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7564,\n            \"SIR\": 24.1918,\n            \"SAR\": 22.189,\n            \"ISR\": 18.7132\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.58865,\n            \"SIR\": 17.1991,\n            \"SAR\": 3.28129,\n            \"ISR\": 9.40123\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5546,\n            \"SIR\": 22.6272,\n            \"SAR\": 19.1176,\n            \"ISR\": 18.607\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.21084,\n            \"SIR\": 19.9805,\n            \"SAR\": 9.84155,\n            \"ISR\": 14.2361\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9441,\n            \"SIR\": 23.4221,\n            \"SAR\": 15.9075,\n            \"ISR\": 17.5219\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.43341,\n            \"SIR\": 25.4898,\n            \"SAR\": 8.79153,\n            \"ISR\": 12.646\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9751,\n            \"SIR\": 23.1067,\n            \"SAR\": 17.4988,\n            \"ISR\": 18.9181\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -3.9442,\n        
    \"SIR\": -19.9831,\n            \"SAR\": 0.177055,\n            \"ISR\": 5.69437\n          },\n          \"instrumental\": {\n            \"SDR\": 19.9266,\n            \"SIR\": 54.9052,\n            \"SAR\": 32.5103,\n            \"ISR\": 18.7894\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.12483,\n            \"SIR\": 15.5136,\n            \"SAR\": 7.31829,\n            \"ISR\": 11.298\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2242,\n            \"SIR\": 22.4601,\n            \"SAR\": 18.4828,\n            \"ISR\": 17.7294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2207,\n            \"SIR\": 22.5102,\n            \"SAR\": 10.638,\n            \"ISR\": 14.2634\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1849,\n            \"SIR\": 24.797,\n            \"SAR\": 17.2016,\n            \"ISR\": 18.4404\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5309,\n            \"SIR\": 22.6299,\n            \"SAR\": 10.9705,\n            \"ISR\": 14.919\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2178,\n            \"SIR\": 29.3981,\n            \"SAR\": 21.1577,\n            \"ISR\": 18.9597\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.32653,\n            \"SIR\": -0.30444,\n            \"SAR\": -0.08732,\n            \"ISR\": 3.13562\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7604,\n            \"SIR\": 39.3107,\n            
\"SAR\": 31.6501,\n            \"ISR\": 19.1183\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.18968,\n            \"SIR\": 15.2985,\n            \"SAR\": 3.81692,\n            \"ISR\": 6.51706\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8661,\n            \"SIR\": 14.3243,\n            \"SAR\": 14.6505,\n            \"ISR\": 17.2804\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.04885,\n            \"SIR\": 17.7424,\n            \"SAR\": 6.3453,\n            \"ISR\": 13.2417\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1538,\n            \"SIR\": 28.759,\n            \"SAR\": 20.3809,\n            \"ISR\": 18.7932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.11171,\n            \"SIR\": 17.8422,\n            \"SAR\": 7.33663,\n            \"ISR\": 10.8895\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0838,\n            \"SIR\": 21.0175,\n            \"SAR\": 16.9227,\n            \"ISR\": 18.1265\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.13028,\n            \"SIR\": 18.1819,\n            \"SAR\": 8.53843,\n            \"ISR\": 13.5881\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1421,\n            \"SIR\": 24.3556,\n            \"SAR\": 16.987,\n            \"ISR\": 17.7103\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3791,\n            
\"SIR\": 23.101,\n            \"SAR\": 12.2295,\n            \"ISR\": 14.8481\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7279,\n            \"SIR\": 21.7571,\n            \"SAR\": 14.1367,\n            \"ISR\": 17.0832\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.60102,\n            \"SIR\": 19.4187,\n            \"SAR\": 6.18339,\n            \"ISR\": 10.1852\n          },\n          \"instrumental\": {\n            \"SDR\": 13.086,\n            \"SIR\": 17.5006,\n            \"SAR\": 14.2845,\n            \"ISR\": 17.3025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.73211,\n            \"SIR\": 23.2544,\n            \"SAR\": 9.96806,\n            \"ISR\": 14.911\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5184,\n            \"SIR\": 24.6816,\n            \"SAR\": 16.6041,\n            \"ISR\": 17.9132\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7085,\n            \"SIR\": 27.9944,\n            \"SAR\": 12.9552,\n            \"ISR\": 14.6582\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1468,\n            \"SIR\": 18.97,\n            \"SAR\": 13.8095,\n            \"ISR\": 18.5127\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9392,\n            \"SIR\": 23.0397,\n            \"SAR\": 11.094,\n            \"ISR\": 14.6053\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9582,\n            \"SIR\": 21.9524,\n            \"SAR\": 15.2822,\n            \"ISR\": 
17.4525\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4632,\n            \"SIR\": 32.1246,\n            \"SAR\": 11.758,\n            \"ISR\": 13.806\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9755,\n            \"SIR\": 24.8777,\n            \"SAR\": 21.054,\n            \"ISR\": 19.6166\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.81324,\n            \"SIR\": 12.9986,\n            \"SAR\": 6.44942,\n            \"ISR\": 9.75713\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0588,\n            \"SIR\": 17.2469,\n            \"SAR\": 14.0433,\n            \"ISR\": 17.3272\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0319,\n            \"SIR\": 25.9071,\n            \"SAR\": 13.1343,\n            \"ISR\": 16.9845\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2416,\n            \"SIR\": 33.2648,\n            \"SAR\": 23.3783,\n            \"ISR\": 19.1745\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9654,\n            \"SIR\": 19.0299,\n            \"SAR\": 12.2731,\n            \"ISR\": 17.1389\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4665,\n            \"SIR\": 23.8385,\n            \"SAR\": 12.02,\n            \"ISR\": 14.5321\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.89718,\n            \"SIR\": 22.9178,\n            \"SAR\": 8.12822,\n            \"ISR\": 
12.7305\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3542,\n            \"SIR\": 21.5011,\n            \"SAR\": 14.9012,\n            \"ISR\": 17.8808\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.49519,\n            \"SIR\": 7.84096,\n            \"SAR\": 2.31508,\n            \"ISR\": 9.60146\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4841,\n            \"SIR\": 30.6488,\n            \"SAR\": 21.8172,\n            \"ISR\": 17.9845\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.46895,\n            \"SIR\": 24.8831,\n            \"SAR\": 10.2708,\n            \"ISR\": 14.8141\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7038,\n            \"SIR\": 28.8329,\n            \"SAR\": 19.9919,\n            \"ISR\": 18.9152\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7288,\n            \"SIR\": 26.9895,\n            \"SAR\": 10.9454,\n            \"ISR\": 15.3945\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5598,\n            \"SIR\": 30.5208,\n            \"SAR\": 21.2514,\n            \"ISR\": 19.1324\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8685,\n            \"SIR\": 27.7549,\n            \"SAR\": 12.5085,\n            \"ISR\": 13.9922\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2331,\n            \"SIR\": 21.0278,\n            \"SAR\": 16.4572,\n            \"ISR\": 18.4063\n          }\n        }\n      }\n    ],\n    
\"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.96846,\n        \"SIR\": 19.8856,\n        \"SAR\": 9.67877,\n        \"ISR\": 13.4149\n      },\n      \"instrumental\": {\n        \"SDR\": 15.0575,\n        \"SIR\": 24.0151,\n        \"SAR\": 17.063,\n        \"ISR\": 18.0555\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"kuielab_b_other.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_b_other\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 9.64939,\n            \"SIR\": 22.6796,\n            \"SAR\": 10.5282,\n            \"ISR\": 13.6929\n          },\n          \"instrumental\": {\n            \"SDR\": 5.30899,\n            \"SIR\": 13.7586,\n            \"SAR\": 10.6078,\n            \"ISR\": 6.64791\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.2477,\n            \"SIR\": 15.4396,\n            \"SAR\": 7.63335,\n            \"ISR\": 11.639\n          },\n          \"instrumental\": {\n            \"SDR\": 3.03107,\n            \"SIR\": 10.8778,\n            \"SAR\": 4.0042,\n            \"ISR\": 5.69864\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.62974,\n            \"SIR\": 12.6399,\n            \"SAR\": 3.78679,\n            \"ISR\": 9.55436\n          },\n          \"instrumental\": {\n            \"SDR\": 4.20871,\n            \"SIR\": 9.70324,\n            \"SAR\": 4.30325,\n            \"ISR\": 9.23504\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 0.4693,\n            \"SIR\": 0.85681,\n     
       \"SAR\": -1.61364,\n            \"ISR\": 2.07061\n          },\n          \"instrumental\": {\n            \"SDR\": 8.0243,\n            \"SIR\": 27.0428,\n            \"SAR\": 8.50187,\n            \"ISR\": 13.9945\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.54582,\n            \"SIR\": 16.0865,\n            \"SAR\": 7.94442,\n            \"ISR\": 10.7807\n          },\n          \"instrumental\": {\n            \"SDR\": 1.41065,\n            \"SIR\": 9.83694,\n            \"SAR\": 0.046955,\n            \"ISR\": 5.96283\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.94498,\n            \"SIR\": 15.6722,\n            \"SAR\": 5.37025,\n            \"ISR\": 8.2591\n          },\n          \"instrumental\": {\n            \"SDR\": 0.95701,\n            \"SIR\": 10.5176,\n            \"SAR\": -0.81633,\n            \"ISR\": 5.7793\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.03697,\n            \"SIR\": 15.627,\n            \"SAR\": 7.49614,\n            \"ISR\": 10.7276\n          },\n          \"instrumental\": {\n            \"SDR\": 3.03031,\n            \"SIR\": 11.1621,\n            \"SAR\": 2.57803,\n            \"ISR\": 7.89506\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.99408,\n            \"SIR\": 15.0626,\n            \"SAR\": 7.01234,\n            \"ISR\": 11.1562\n          },\n          \"instrumental\": {\n            \"SDR\": 7.57224,\n            \"SIR\": 13.75,\n            \"SAR\": 12.5566,\n            \"ISR\": 10.7771\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.65245,\n            \"SIR\": 17.3684,\n            \"SAR\": 9.95854,\n            \"ISR\": 11.2087\n          },\n          \"instrumental\": {\n            \"SDR\": 1.7757,\n            \"SIR\": 7.2058,\n            \"SAR\": 5.30094,\n            \"ISR\": 3.13038\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 9.64301,\n            \"SIR\": 16.7565,\n            \"SAR\": 9.93033,\n            \"ISR\": 11.6496\n          },\n          \"instrumental\": {\n            \"SDR\": 1.7586,\n            \"SIR\": 8.82671,\n            \"SAR\": 1.64525,\n            \"ISR\": 4.98343\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 9.22624,\n            \"SIR\": 16.3695,\n            \"SAR\": 10.7097,\n            \"ISR\": 13.1194\n          },\n          \"instrumental\": {\n            \"SDR\": 1.96556,\n            \"SIR\": 7.01307,\n            \"SAR\": 3.22011,\n            \"ISR\": 4.33743\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.41656,\n            \"SIR\": 12.4231,\n            \"SAR\": 5.63976,\n            \"ISR\": 8.89898\n          },\n          \"instrumental\": {\n            \"SDR\": 4.07683,\n            \"SIR\": 11.7097,\n            \"SAR\": 4.2601,\n            \"ISR\": 8.76732\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.92765,\n            \"SIR\": 19.5166,\n            \"SAR\": 8.54674,\n  
          \"ISR\": 11.6073\n          },\n          \"instrumental\": {\n            \"SDR\": 3.4921,\n            \"SIR\": 10.4253,\n            \"SAR\": 3.85245,\n            \"ISR\": 6.17409\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.19542,\n            \"SIR\": 17.363,\n            \"SAR\": 6.61771,\n            \"ISR\": 9.61574\n          },\n          \"instrumental\": {\n            \"SDR\": 4.35067,\n            \"SIR\": 13.7201,\n            \"SAR\": 4.96223,\n            \"ISR\": 8.76094\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.09068,\n            \"SIR\": 17.6092,\n            \"SAR\": 8.98422,\n            \"ISR\": 12.3712\n          },\n          \"instrumental\": {\n            \"SDR\": 5.59829,\n            \"SIR\": 13.2661,\n            \"SAR\": 11.1781,\n            \"ISR\": 7.56892\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.31856,\n            \"SIR\": 16.4479,\n            \"SAR\": 8.85628,\n            \"ISR\": 12.1086\n          },\n          \"instrumental\": {\n            \"SDR\": 1.09208,\n            \"SIR\": 6.40031,\n            \"SAR\": 1.4028,\n            \"ISR\": 4.74192\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.92887,\n            \"SIR\": 17.3064,\n            \"SAR\": 8.32305,\n            \"ISR\": 12.2517\n          },\n          \"instrumental\": {\n            \"SDR\": 4.65095,\n            \"SIR\": 12.1399,\n            \"SAR\": 7.77261,\n            \"ISR\": 7.42832\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 14.7176,\n            \"SIR\": 26.3088,\n            \"SAR\": 18.1685,\n            \"ISR\": 15.5724\n          },\n          \"instrumental\": {\n            \"SDR\": 1.81029,\n            \"SIR\": 9.50112,\n            \"SAR\": 16.0224,\n            \"ISR\": 2.87091\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.11759,\n            \"SIR\": 11.4394,\n            \"SAR\": 6.11227,\n            \"ISR\": 9.88093\n          },\n          \"instrumental\": {\n            \"SDR\": 5.39452,\n            \"SIR\": 9.51154,\n            \"SAR\": 7.74441,\n            \"ISR\": 6.25688\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.35569,\n            \"SIR\": 18.0482,\n            \"SAR\": 8.00892,\n            \"ISR\": 11.0704\n          },\n          \"instrumental\": {\n            \"SDR\": 3.34213,\n            \"SIR\": 11.1823,\n            \"SAR\": 2.85,\n            \"ISR\": 7.93734\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.51418,\n            \"SIR\": 15.8501,\n            \"SAR\": 6.8566,\n            \"ISR\": 9.71067\n          },\n          \"instrumental\": {\n            \"SDR\": 5.60238,\n            \"SIR\": 15.7301,\n            \"SAR\": 6.51614,\n            \"ISR\": 10.6595\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 11.8816,\n            \"SIR\": 22.2792,\n            \"SAR\": 14.7422,\n            \"ISR\": 
14.3541\n          },\n          \"instrumental\": {\n            \"SDR\": 1.25678,\n            \"SIR\": 5.84482,\n            \"SAR\": 7.69005,\n            \"ISR\": 2.15879\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.13684,\n            \"SIR\": 9.7413,\n            \"SAR\": 5.93986,\n            \"ISR\": 9.53541\n          },\n          \"instrumental\": {\n            \"SDR\": 4.71079,\n            \"SIR\": 11.1188,\n            \"SAR\": 6.19342,\n            \"ISR\": 6.3735\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.86376,\n            \"SIR\": 12.5694,\n            \"SAR\": 6.73528,\n            \"ISR\": 10.0457\n          },\n          \"instrumental\": {\n            \"SDR\": 4.55425,\n            \"SIR\": 14.1528,\n            \"SAR\": 9.23272,\n            \"ISR\": 8.42971\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.0522,\n            \"SIR\": 7.53218,\n            \"SAR\": 4.68118,\n            \"ISR\": 9.03075\n          },\n          \"instrumental\": {\n            \"SDR\": 5.9781,\n            \"SIR\": 17.3417,\n            \"SAR\": 6.62162,\n            \"ISR\": 9.33317\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 12.6976,\n            \"SIR\": 27.5416,\n            \"SAR\": 14.0998,\n            \"ISR\": 14.8159\n          },\n          \"instrumental\": {\n            \"SDR\": 0.91939,\n            \"SIR\": 9.68124,\n            \"SAR\": 0.85521,\n            \"ISR\": 2.94146\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.8383,\n            \"SIR\": 18.6822,\n            \"SAR\": 7.26491,\n            \"ISR\": 11.4248\n          },\n          \"instrumental\": {\n            \"SDR\": 0.67544,\n            \"SIR\": 8.92284,\n            \"SAR\": -0.93616,\n            \"ISR\": 5.03267\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 11.1212,\n            \"SIR\": 21.7538,\n            \"SAR\": 12.0377,\n            \"ISR\": 14.7442\n          },\n          \"instrumental\": {\n            \"SDR\": 0.26136,\n            \"SIR\": 5.31244,\n            \"SAR\": -4.16131,\n            \"ISR\": 1.22848\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 3.0719,\n            \"SIR\": 5.51981,\n            \"SAR\": 2.40327,\n            \"ISR\": 7.20397\n          },\n          \"instrumental\": {\n            \"SDR\": 4.59113,\n            \"SIR\": 17.0407,\n            \"SAR\": 3.84494,\n            \"ISR\": 11.16\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 10.3447,\n            \"SIR\": 20.6822,\n            \"SAR\": 11.2742,\n            \"ISR\": 12.9132\n          },\n          \"instrumental\": {\n            \"SDR\": 0.23595,\n            \"SIR\": 7.00732,\n            \"SAR\": -1.69078,\n            \"ISR\": 3.77278\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.5955,\n            \"SIR\": 9.80763,\n            \"SAR\": 8.03074,\n            \"ISR\": 12.1842\n          },\n          \"instrumental\": {\n 
           \"SDR\": 0.3194,\n            \"SIR\": 6.1072,\n            \"SAR\": -3.28883,\n            \"ISR\": 3.25316\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 2.40989,\n            \"SIR\": 11.5116,\n            \"SAR\": 1.3879,\n            \"ISR\": 4.99395\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1713,\n            \"SIR\": 20.8102,\n            \"SAR\": 14.7704,\n            \"ISR\": 16.5908\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 6.41997,\n            \"SIR\": 18.8085,\n            \"SAR\": 6.83169,\n            \"ISR\": 10.6595\n          },\n          \"instrumental\": {\n            \"SDR\": 1.63646,\n            \"SIR\": 7.35046,\n            \"SAR\": 1.3573,\n            \"ISR\": 5.55991\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.53491,\n            \"SIR\": 19.7521,\n            \"SAR\": 3.94093,\n            \"ISR\": 12.4326\n          },\n          \"instrumental\": {\n            \"SDR\": 5.95565,\n            \"SIR\": 12.5336,\n            \"SAR\": 5.7771,\n            \"ISR\": 10.9417\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 5.07085,\n            \"SIR\": 18.6145,\n            \"SAR\": 4.35329,\n            \"ISR\": 8.53857\n          },\n          \"instrumental\": {\n            \"SDR\": 0.21596,\n            \"SIR\": 15.6609,\n            \"SAR\": -1.28946,\n            \"ISR\": 11.5589\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"other\": {\n        
    \"SDR\": 9.11662,\n            \"SIR\": 17.3152,\n            \"SAR\": 9.60333,\n            \"ISR\": 13.0907\n          },\n          \"instrumental\": {\n            \"SDR\": 1.21533,\n            \"SIR\": 7.37947,\n            \"SAR\": -0.01784,\n            \"ISR\": 3.12771\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 4.65643,\n            \"SIR\": 8.08793,\n            \"SAR\": 6.06734,\n            \"ISR\": 9.4626\n          },\n          \"instrumental\": {\n            \"SDR\": 4.0963,\n            \"SIR\": 11.9099,\n            \"SAR\": 6.42775,\n            \"ISR\": 6.33339\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 8.7823,\n            \"SIR\": 13.5396,\n            \"SAR\": 10.9588,\n            \"ISR\": 12.4886\n          },\n          \"instrumental\": {\n            \"SDR\": 2.48424,\n            \"SIR\": 10.9439,\n            \"SAR\": 3.17304,\n            \"ISR\": 4.04116\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.74916,\n            \"SIR\": 15.7468,\n            \"SAR\": 6.68936,\n            \"ISR\": 11.8915\n          },\n          \"instrumental\": {\n            \"SDR\": 5.65419,\n            \"SIR\": 15.2447,\n            \"SAR\": 6.35984,\n            \"ISR\": 11.7118\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"other\": {\n            \"SDR\": 7.09594,\n            \"SIR\": 16.9553,\n            \"SAR\": 7.40809,\n            \"ISR\": 10.836\n          },\n          \"instrumental\": {\n            \"SDR\": 1.8609,\n            \"SIR\": 9.34897,\n            
\"SAR\": 2.79366,\n            \"ISR\": 6.88979\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"instrumental\": {\n        \"SDR\": 3.1866,\n        \"SIR\": 10.9108,\n        \"SAR\": 4.13215,\n        \"ISR\": 6.35344\n      }\n    },\n    \"stems\": [\n      \"other\",\n      \"no other\"\n    ],\n    \"target_stem\": \"other\"\n  },\n  \"kuielab_b_bass.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_b_bass\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 15.7548,\n            \"SIR\": 23.7323,\n            \"SAR\": 18.4123,\n            \"ISR\": 18.2626\n          },\n          \"instrumental\": {\n            \"SDR\": 0.81314,\n            \"SIR\": 6.97275,\n            \"SAR\": 4.42315,\n            \"ISR\": 1.81282\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.2335,\n            \"SIR\": 21.1984,\n            \"SAR\": 12.3839,\n            \"ISR\": 13.3543\n          },\n          \"instrumental\": {\n            \"SDR\": 2.2203,\n            \"SIR\": 5.75124,\n            \"SAR\": 2.93622,\n            \"ISR\": 3.89337\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.61002,\n            \"SIR\": 18.8263,\n            \"SAR\": 8.78097,\n            \"ISR\": 11.4529\n          },\n          \"instrumental\": {\n            \"SDR\": 2.7386,\n            \"SIR\": 6.12097,\n            \"SAR\": 3.13034,\n            \"ISR\": 6.54165\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 3.08289,\n            \"SIR\": 25.5082,\n            \"SAR\": 3.01361,\n            \"ISR\": 
4.23416\n          },\n          \"instrumental\": {\n            \"SDR\": 4.89626,\n            \"SIR\": 12.1984,\n            \"SAR\": 4.40366,\n            \"ISR\": 9.55751\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.8357,\n            \"SIR\": 20.3337,\n            \"SAR\": 10.1956,\n            \"ISR\": 13.4856\n          },\n          \"instrumental\": {\n            \"SDR\": 1.42536,\n            \"SIR\": 7.40689,\n            \"SAR\": 0.7136,\n            \"ISR\": 7.82504\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 5.61949,\n            \"SIR\": 9.35277,\n            \"SAR\": 5.27928,\n            \"ISR\": 10.6836\n          },\n          \"instrumental\": {\n            \"SDR\": 2.07579,\n            \"SIR\": 12.7337,\n            \"SAR\": 0.95736,\n            \"ISR\": 11.9567\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.2638,\n            \"SIR\": 23.7722,\n            \"SAR\": 13.8163,\n            \"ISR\": 15.413\n          },\n          \"instrumental\": {\n            \"SDR\": 0.94812,\n            \"SIR\": 4.98609,\n            \"SAR\": 0.919005,\n            \"ISR\": 3.65536\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.4702,\n            \"SIR\": 23.2104,\n            \"SAR\": 14.2076,\n            \"ISR\": 16.4926\n          },\n          \"instrumental\": {\n            \"SDR\": 1.55335,\n            \"SIR\": 2.39983,\n            \"SAR\": 5.29428,\n            \"ISR\": 2.76656\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander 
Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 7.70288,\n            \"SIR\": 20.2693,\n            \"SAR\": 7.73998,\n            \"ISR\": 10.0089\n          },\n          \"instrumental\": {\n            \"SDR\": 8.12446,\n            \"SIR\": 11.7788,\n            \"SAR\": 15.7118,\n            \"ISR\": 12.3598\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.2519,\n            \"SIR\": 19.9879,\n            \"SAR\": 10.3312,\n            \"ISR\": 12.7349\n          },\n          \"instrumental\": {\n            \"SDR\": 3.09573,\n            \"SIR\": 7.95487,\n            \"SAR\": 3.87341,\n            \"ISR\": 9.70361\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.74868,\n            \"SIR\": 21.1154,\n            \"SAR\": 7.09004,\n            \"ISR\": 8.64697\n          },\n          \"instrumental\": {\n            \"SDR\": 5.17796,\n            \"SIR\": 8.194,\n            \"SAR\": 4.2851,\n            \"ISR\": 10.403\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.41547,\n            \"SIR\": 18.8555,\n            \"SAR\": 8.9441,\n            \"ISR\": 12.7429\n          },\n          \"instrumental\": {\n            \"SDR\": 3.71434,\n            \"SIR\": 8.18723,\n            \"SAR\": 4.59536,\n            \"ISR\": 9.64957\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.793,\n            \"SIR\": 20.8977,\n            \"SAR\": 13.1418,\n            \"ISR\": 16.3052\n          },\n          
\"instrumental\": {\n            \"SDR\": 2.00303,\n            \"SIR\": 6.23416,\n            \"SAR\": 3.07668,\n            \"ISR\": 3.88179\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.92261,\n            \"SIR\": 18.6214,\n            \"SAR\": 9.8512,\n            \"ISR\": 13.9627\n          },\n          \"instrumental\": {\n            \"SDR\": 2.73232,\n            \"SIR\": 4.95466,\n            \"SAR\": 4.76885,\n            \"ISR\": 5.78991\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.3993,\n            \"SIR\": 24.0591,\n            \"SAR\": 14.7255,\n            \"ISR\": 16.4459\n          },\n          \"instrumental\": {\n            \"SDR\": 1.62143,\n            \"SIR\": 4.18037,\n            \"SAR\": 5.74988,\n            \"ISR\": 2.73077\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 6.20341,\n            \"SIR\": 10.7066,\n            \"SAR\": 3.18892,\n            \"ISR\": 3.95547\n          },\n          \"instrumental\": {\n            \"SDR\": 3.76185,\n            \"SIR\": 7.26743,\n            \"SAR\": 2.55489,\n            \"ISR\": 10.6071\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.21082,\n            \"SIR\": 17.0031,\n            \"SAR\": 7.67154,\n            \"ISR\": 11.3789\n          },\n          \"instrumental\": {\n            \"SDR\": 5.98539,\n            \"SIR\": 10.1891,\n            \"SAR\": 10.3468,\n            \"ISR\": 10.869\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n      
  \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.601,\n            \"SIR\": 21.0477,\n            \"SAR\": 15.0617,\n            \"ISR\": 16.0498\n          },\n          \"instrumental\": {\n            \"SDR\": 3.43685,\n            \"SIR\": 11.0703,\n            \"SAR\": 16.0928,\n            \"ISR\": 4.84765\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.3268,\n            \"SIR\": 21.4642,\n            \"SAR\": 13.7382,\n            \"ISR\": 16.673\n          },\n          \"instrumental\": {\n            \"SDR\": 2.35495,\n            \"SIR\": 7.20103,\n            \"SAR\": 6.58162,\n            \"ISR\": 4.26424\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.4218,\n            \"SIR\": 19.9021,\n            \"SAR\": 11.0927,\n            \"ISR\": 14.9817\n          },\n          \"instrumental\": {\n            \"SDR\": 1.42512,\n            \"SIR\": 3.21367,\n            \"SAR\": 2.1126,\n            \"ISR\": 4.37602\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.2073,\n            \"SIR\": 21.8536,\n            \"SAR\": 15.4168,\n            \"ISR\": 17.2479\n          },\n          \"instrumental\": {\n            \"SDR\": 0.44343,\n            \"SIR\": 1.27262,\n            \"SAR\": 0.23863,\n            \"ISR\": 1.71844\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.5464,\n            \"SIR\": 22.2233,\n            \"SAR\": 11.318,\n            \"ISR\": 13.8509\n          },\n          \"instrumental\": {\n            
\"SDR\": 6.46552,\n            \"SIR\": 9.57089,\n            \"SAR\": 15.1044,\n            \"ISR\": 9.30445\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 3.21515,\n            \"SIR\": 13.3912,\n            \"SAR\": 1.26532,\n            \"ISR\": 4.40634\n          },\n          \"instrumental\": {\n            \"SDR\": 7.77384,\n            \"SIR\": 16.7379,\n            \"SAR\": 8.42431,\n            \"ISR\": 13.9576\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.98494,\n            \"SIR\": 14.1989,\n            \"SAR\": 9.7302,\n            \"ISR\": 11.6097\n          },\n          \"instrumental\": {\n            \"SDR\": 1.90551,\n            \"SIR\": 4.7302,\n            \"SAR\": 3.87369,\n            \"ISR\": 3.64729\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 7.84134,\n            \"SIR\": 16.0192,\n            \"SAR\": 8.57408,\n            \"ISR\": 11.4806\n          },\n          \"instrumental\": {\n            \"SDR\": 4.29513,\n            \"SIR\": 9.94399,\n            \"SAR\": 5.52655,\n            \"ISR\": 7.5392\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 5.38987,\n            \"SIR\": 21.7443,\n            \"SAR\": 6.15124,\n            \"ISR\": 13.3924\n          },\n          \"instrumental\": {\n            \"SDR\": 5.36153,\n            \"SIR\": 10.4647,\n            \"SAR\": 6.58048,\n            \"ISR\": 10.6227\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"bass\": 
{\n            \"SDR\": 10.5593,\n            \"SIR\": 19.1382,\n            \"SAR\": 12.3122,\n            \"ISR\": 14.6797\n          },\n          \"instrumental\": {\n            \"SDR\": 0.47325,\n            \"SIR\": 7.47173,\n            \"SAR\": -0.43683,\n            \"ISR\": 4.4767\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.9461,\n            \"SIR\": 19.7225,\n            \"SAR\": 8.8104,\n            \"ISR\": 12.9769\n          },\n          \"instrumental\": {\n            \"SDR\": 5.32202,\n            \"SIR\": 17.9164,\n            \"SAR\": 5.10667,\n            \"ISR\": 14.0013\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.2725,\n            \"SIR\": 23.0555,\n            \"SAR\": 11.9542,\n            \"ISR\": 15.3665\n          },\n          \"instrumental\": {\n            \"SDR\": 2.82989,\n            \"SIR\": 5.18172,\n            \"SAR\": 3.90687,\n            \"ISR\": 7.06041\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.39465,\n            \"SIR\": 20.2563,\n            \"SAR\": 7.90997,\n            \"ISR\": 11.4464\n          },\n          \"instrumental\": {\n            \"SDR\": 1.75367,\n            \"SIR\": 10.3253,\n            \"SAR\": 0.830015,\n            \"ISR\": 11.9748\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 3.7596,\n            \"SIR\": 13.1369,\n            \"SAR\": 1.92571,\n            \"ISR\": 5.65045\n          },\n          \"instrumental\": {\n            \"SDR\": 3.18183,\n            \"SIR\": 14.8768,\n            
\"SAR\": 1.87619,\n            \"ISR\": 12.772\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 9.31692,\n            \"SIR\": 27.2579,\n            \"SAR\": 0.75021,\n            \"ISR\": -4.33117\n          },\n          \"instrumental\": {\n            \"SDR\": 5.62643,\n            \"SIR\": 2.76063,\n            \"SAR\": 3.00181,\n            \"ISR\": 12.2923\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 13.0351,\n            \"SIR\": 26.1968,\n            \"SAR\": 10.7765,\n            \"ISR\": 13.2642\n          },\n          \"instrumental\": {\n            \"SDR\": 3.22029,\n            \"SIR\": 7.44384,\n            \"SAR\": 5.7559,\n            \"ISR\": 7.75851\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 14.7969,\n            \"SIR\": 26.6912,\n            \"SAR\": 16.9395,\n            \"ISR\": 17.9366\n          },\n          \"instrumental\": {\n            \"SDR\": 0.484755,\n            \"SIR\": 2.37826,\n            \"SAR\": 0.83404,\n            \"ISR\": 3.05398\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 10.7409,\n            \"SIR\": 22.12,\n            \"SAR\": 11.5083,\n            \"ISR\": 15.4611\n          },\n          \"instrumental\": {\n            \"SDR\": -1.86122,\n            \"SIR\": 1.53389,\n            \"SAR\": -6.10625,\n            \"ISR\": 3.03968\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.90883,\n            \"SIR\": 16.9371,\n            \"SAR\": 
9.06707,\n            \"ISR\": 13.4879\n          },\n          \"instrumental\": {\n            \"SDR\": 3.68763,\n            \"SIR\": 10.6277,\n            \"SAR\": 3.85806,\n            \"ISR\": 10.0287\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.44492,\n            \"SIR\": 12.3766,\n            \"SAR\": 0.42707,\n            \"ISR\": -5.51573\n          },\n          \"instrumental\": {\n            \"SDR\": 3.2009,\n            \"SIR\": -4.69864,\n            \"SAR\": 0.80892,\n            \"ISR\": 5.93919\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.8977,\n            \"SIR\": 21.4119,\n            \"SAR\": 4.74189,\n            \"ISR\": 4.5358\n          },\n          \"instrumental\": {\n            \"SDR\": 2.56857,\n            \"SIR\": 2.73476,\n            \"SAR\": 2.2893,\n            \"ISR\": 4.96208\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 11.7981,\n            \"SIR\": 17.4025,\n            \"SAR\": 13.7179,\n            \"ISR\": 16.5175\n          },\n          \"instrumental\": {\n            \"SDR\": 0.6164,\n            \"SIR\": 2.0046,\n            \"SAR\": -0.39888,\n            \"ISR\": 1.97171\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"bass\": {\n            \"SDR\": 8.63686,\n            \"SIR\": 18.5065,\n            \"SAR\": 3.06041,\n            \"ISR\": 3.07377\n          },\n          \"instrumental\": {\n            \"SDR\": 2.57565,\n            \"SIR\": 0.80275,\n            \"SAR\": 1.14688,\n            \"ISR\": 7.67727\n          }\n        }\n      
}\n    ],\n    \"median_scores\": {\n      \"bass\": {\n        \"SDR\": 9.91032,\n        \"SIR\": 20.3015,\n        \"SAR\": 9.7907,\n        \"ISR\": 13.3093\n      },\n      \"instrumental\": {\n        \"SDR\": 2.73546,\n        \"SIR\": 7.23423,\n        \"SAR\": 3.86573,\n        \"ISR\": 7.29981\n      }\n    },\n    \"stems\": [\n      \"bass\",\n      \"no bass\"\n    ],\n    \"target_stem\": \"bass\"\n  },\n  \"kuielab_b_drums.onnx\": {\n    \"model_name\": \"MDX-Net Model: kuielab_b_drums\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 2.5684,\n            \"SIR\": 15.1012,\n            \"SAR\": 0.54169,\n            \"ISR\": 3.75755\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5504,\n            \"SIR\": 25.3366,\n            \"SAR\": 10.7179,\n            \"ISR\": 18.4031\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.2011,\n            \"SIR\": 19.1452,\n            \"SAR\": 5.91363,\n            \"ISR\": 9.29231\n          },\n          \"instrumental\": {\n            \"SDR\": 6.00811,\n            \"SIR\": 12.5654,\n            \"SAR\": 6.30249,\n            \"ISR\": 14.0578\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.19709,\n            \"SIR\": 14.5978,\n            \"SAR\": 8.67332,\n            \"ISR\": 14.11\n          },\n          \"instrumental\": {\n            \"SDR\": 2.25324,\n            \"SIR\": 5.2431,\n            \"SAR\": 2.28704,\n            \"ISR\": 5.50651\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.10067,\n            \"SIR\": 16.2967,\n  
          \"SAR\": 7.89512,\n            \"ISR\": 10.2504\n          },\n          \"instrumental\": {\n            \"SDR\": 3.56954,\n            \"SIR\": 6.40193,\n            \"SAR\": 5.72985,\n            \"ISR\": 7.30478\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.63067,\n            \"SIR\": 18.6,\n            \"SAR\": 9.37125,\n            \"ISR\": 12.8421\n          },\n          \"instrumental\": {\n            \"SDR\": 1.17916,\n            \"SIR\": 5.92972,\n            \"SAR\": 0.17392,\n            \"ISR\": 7.40509\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.49692,\n            \"SIR\": 18.744,\n            \"SAR\": 9.72448,\n            \"ISR\": 11.6189\n          },\n          \"instrumental\": {\n            \"SDR\": 0.76547,\n            \"SIR\": 7.22604,\n            \"SAR\": -0.55118,\n            \"ISR\": 6.21183\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.75381,\n            \"SIR\": 19.6976,\n            \"SAR\": 9.48677,\n            \"ISR\": 13.153\n          },\n          \"instrumental\": {\n            \"SDR\": 3.27906,\n            \"SIR\": 9.50116,\n            \"SAR\": 3.46903,\n            \"ISR\": 8.97561\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.91812,\n            \"SIR\": 15.559,\n            \"SAR\": 7.41762,\n            \"ISR\": 11.2848\n          },\n          \"instrumental\": {\n            \"SDR\": 6.16064,\n            \"SIR\": 8.82203,\n            \"SAR\": 8.57605,\n            \"ISR\": 9.51061\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.84413,\n            \"SIR\": 15.0591,\n            \"SAR\": 5.80223,\n            \"ISR\": 9.94096\n          },\n          \"instrumental\": {\n            \"SDR\": 5.82193,\n            \"SIR\": 11.3604,\n            \"SAR\": 10.9842,\n            \"ISR\": 10.4279\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.06685,\n            \"SIR\": 13.2933,\n            \"SAR\": 7.03635,\n            \"ISR\": 12.0527\n          },\n          \"instrumental\": {\n            \"SDR\": 3.0867,\n            \"SIR\": 6.60793,\n            \"SAR\": 3.4161,\n            \"ISR\": 6.36216\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.58602,\n            \"SIR\": 16.7514,\n            \"SAR\": 10.5066,\n            \"ISR\": 14.8305\n          },\n          \"instrumental\": {\n            \"SDR\": 2.9283,\n            \"SIR\": 7.34443,\n            \"SAR\": 5.63948,\n            \"ISR\": 6.3464\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.68661,\n            \"SIR\": 19.8937,\n            \"SAR\": 10.6275,\n            \"ISR\": 13.8978\n          },\n          \"instrumental\": {\n            \"SDR\": 1.67553,\n            \"SIR\": 2.99451,\n            \"SAR\": 2.43189,\n            \"ISR\": 4.20413\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.95161,\n            \"SIR\": 14.8377,\n            \"SAR\": 4.58704,\n  
          \"ISR\": 9.44919\n          },\n          \"instrumental\": {\n            \"SDR\": 6.00352,\n            \"SIR\": 12.5359,\n            \"SAR\": 5.78378,\n            \"ISR\": 12.5975\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.12964,\n            \"SIR\": 15.8945,\n            \"SAR\": 8.43014,\n            \"ISR\": 12.9855\n          },\n          \"instrumental\": {\n            \"SDR\": 2.90031,\n            \"SIR\": 5.78233,\n            \"SAR\": 4.61964,\n            \"ISR\": 6.59501\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.12447,\n            \"SIR\": 15.6269,\n            \"SAR\": 5.94624,\n            \"ISR\": 9.98507\n          },\n          \"instrumental\": {\n            \"SDR\": 7.84515,\n            \"SIR\": 11.7485,\n            \"SAR\": 9.56791,\n            \"ISR\": 13.0054\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.13615,\n            \"SIR\": 20.2134,\n            \"SAR\": 5.83405,\n            \"ISR\": 7.15399\n          },\n          \"instrumental\": {\n            \"SDR\": 2.88148,\n            \"SIR\": 8.20722,\n            \"SAR\": 2.22197,\n            \"ISR\": 8.34761\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.6933,\n            \"SIR\": 20.478,\n            \"SAR\": 11.7371,\n            \"ISR\": 15.3026\n          },\n          \"instrumental\": {\n            \"SDR\": 1.69627,\n            \"SIR\": 3.94051,\n            \"SAR\": 4.28544,\n            \"ISR\": 3.24043\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 1.47191,\n            \"SIR\": 9.95266,\n            \"SAR\": -1.52858,\n            \"ISR\": 3.29303\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6898,\n            \"SIR\": 28.0153,\n            \"SAR\": 22.898,\n            \"ISR\": 18.7676\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.95376,\n            \"SIR\": 12.495,\n            \"SAR\": 7.27907,\n            \"ISR\": 11.1508\n          },\n          \"instrumental\": {\n            \"SDR\": 4.53461,\n            \"SIR\": 10.2303,\n            \"SAR\": 6.73113,\n            \"ISR\": 8.50964\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.97956,\n            \"SIR\": 14.9381,\n            \"SAR\": 6.38333,\n            \"ISR\": 12.0979\n          },\n          \"instrumental\": {\n            \"SDR\": 3.54066,\n            \"SIR\": 7.78204,\n            \"SAR\": 3.96349,\n            \"ISR\": 8.92341\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.00642,\n            \"SIR\": 17.4592,\n            \"SAR\": 4.79745,\n            \"ISR\": 10.9245\n          },\n          \"instrumental\": {\n            \"SDR\": 6.15955,\n            \"SIR\": 12.7043,\n            \"SAR\": 7.43093,\n            \"ISR\": 13.594\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 2.60555,\n            \"SIR\": 10.215,\n            \"SAR\": 2.16607,\n            \"ISR\": 
8.54691\n          },\n          \"instrumental\": {\n            \"SDR\": 9.77316,\n            \"SIR\": 15.6496,\n            \"SAR\": 15.6572,\n            \"ISR\": 14.2025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.1572,\n            \"SIR\": 14.3231,\n            \"SAR\": 8.26936,\n            \"ISR\": 11.2281\n          },\n          \"instrumental\": {\n            \"SDR\": 3.04591,\n            \"SIR\": 6.86055,\n            \"SAR\": 5.6269,\n            \"ISR\": 4.71269\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.2755,\n            \"SIR\": 11.665,\n            \"SAR\": 5.11633,\n            \"ISR\": 9.90275\n          },\n          \"instrumental\": {\n            \"SDR\": 5.41017,\n            \"SIR\": 9.84848,\n            \"SAR\": 6.9734,\n            \"ISR\": 9.11239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.2365,\n            \"SIR\": 17.8736,\n            \"SAR\": 11.6629,\n            \"ISR\": 14.0066\n          },\n          \"instrumental\": {\n            \"SDR\": 1.81437,\n            \"SIR\": 6.20197,\n            \"SAR\": 3.37121,\n            \"ISR\": 3.46865\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 9.18064,\n            \"SIR\": 19.4174,\n            \"SAR\": 9.81886,\n            \"ISR\": 15.2134\n          },\n          \"instrumental\": {\n            \"SDR\": 2.73682,\n            \"SIR\": 9.58862,\n            \"SAR\": 5.68188,\n            \"ISR\": 7.96164\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.73973,\n            \"SIR\": 16.9175,\n            \"SAR\": 6.59046,\n            \"ISR\": 11.0466\n          },\n          \"instrumental\": {\n            \"SDR\": 2.11844,\n            \"SIR\": 10.6467,\n            \"SAR\": 1.28349,\n            \"ISR\": 10.8954\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.76715,\n            \"SIR\": 13.867,\n            \"SAR\": 5.5621,\n            \"ISR\": 7.68594\n          },\n          \"instrumental\": {\n            \"SDR\": 5.63864,\n            \"SIR\": 16.103,\n            \"SAR\": 5.01904,\n            \"ISR\": 14.2624\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.81418,\n            \"SIR\": 21.1299,\n            \"SAR\": 8.06179,\n            \"ISR\": 10.0182\n          },\n          \"instrumental\": {\n            \"SDR\": 1.72096,\n            \"SIR\": 4.33378,\n            \"SAR\": 1.26953,\n            \"ISR\": 5.43391\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 8.46906,\n            \"SIR\": 19.6227,\n            \"SAR\": 8.81233,\n            \"ISR\": 12.7851\n          },\n          \"instrumental\": {\n            \"SDR\": 0.599795,\n            \"SIR\": 6.38856,\n            \"SAR\": -0.752025,\n            \"ISR\": 6.15902\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 7.2038,\n            \"SIR\": 16.1349,\n            \"SAR\": 7.35825,\n            \"ISR\": 11.4774\n          },\n          \"instrumental\": 
{\n            \"SDR\": 2.16539,\n            \"SIR\": 8.64191,\n            \"SAR\": 1.35815,\n            \"ISR\": 8.51987\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 14.4814,\n            \"SIR\": 22.7441,\n            \"SAR\": 16.0829,\n            \"ISR\": 19.0982\n          },\n          \"instrumental\": {\n            \"SDR\": 0.02236,\n            \"SIR\": -0.68008,\n            \"SAR\": -0.05878,\n            \"ISR\": 0.97669\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.7839,\n            \"SIR\": 24.1476,\n            \"SAR\": 11.7829,\n            \"ISR\": 14.7781\n          },\n          \"instrumental\": {\n            \"SDR\": 2.26446,\n            \"SIR\": 5.40597,\n            \"SAR\": 4.09761,\n            \"ISR\": 6.73183\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 11.3648,\n            \"SIR\": 26.6375,\n            \"SAR\": 11.5932,\n            \"ISR\": 15.7057\n          },\n          \"instrumental\": {\n            \"SDR\": 3.93787,\n            \"SIR\": 9.09541,\n            \"SAR\": 4.92127,\n            \"ISR\": 9.93224\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 4.07266,\n            \"SIR\": 10.9354,\n            \"SAR\": 4.32409,\n            \"ISR\": 9.951\n          },\n          \"instrumental\": {\n            \"SDR\": -0.09054,\n            \"SIR\": 10.5061,\n            \"SAR\": -1.76441,\n            \"ISR\": 10.232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"drums\": 
{\n            \"SDR\": 7.61546,\n            \"SIR\": 20.9245,\n            \"SAR\": 8.2853,\n            \"ISR\": 11.09\n          },\n          \"instrumental\": {\n            \"SDR\": 3.77759,\n            \"SIR\": 10.3551,\n            \"SAR\": 3.92625,\n            \"ISR\": 9.61456\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 5.90095,\n            \"SIR\": 14.3915,\n            \"SAR\": 6.44924,\n            \"ISR\": 9.37\n          },\n          \"instrumental\": {\n            \"SDR\": 4.79998,\n            \"SIR\": 9.9591,\n            \"SAR\": 7.99268,\n            \"ISR\": 8.1511\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 3.68085,\n            \"SIR\": 11.6621,\n            \"SAR\": 3.02148,\n            \"ISR\": 7.01052\n          },\n          \"instrumental\": {\n            \"SDR\": 6.35439,\n            \"SIR\": 14.9113,\n            \"SAR\": 6.02898,\n            \"ISR\": 14.1714\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 6.29074,\n            \"SIR\": 17.3678,\n            \"SAR\": 5.75812,\n            \"ISR\": 11.1898\n          },\n          \"instrumental\": {\n            \"SDR\": 5.41089,\n            \"SIR\": 10.0938,\n            \"SAR\": 6.39544,\n            \"ISR\": 11.4617\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"drums\": {\n            \"SDR\": 10.6079,\n            \"SIR\": 21.1037,\n            \"SAR\": 11.1041,\n            \"ISR\": 14.5222\n          },\n          \"instrumental\": {\n            \"SDR\": 2.03495,\n            \"SIR\": 5.70979,\n         
   \"SAR\": 3.96897,\n            \"ISR\": 6.62577\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"drums\": {\n        \"SDR\": 7.08376,\n        \"SIR\": 16.5241,\n        \"SAR\": 7.38794,\n        \"ISR\": 11.2089\n      },\n      \"instrumental\": {\n        \"SDR\": 3.18288,\n        \"SIR\": 8.95872,\n        \"SAR\": 4.77046,\n        \"ISR\": 8.51475\n      }\n    },\n    \"stems\": [\n      \"drums\",\n      \"no drums\"\n    ],\n    \"target_stem\": \"drums\"\n  },\n  \"UVR-MDX-NET_Main_340.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Main_340\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.59047,\n            \"SIR\": 19.3254,\n            \"SAR\": 5.54795,\n            \"ISR\": 9.92554\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5772,\n            \"SIR\": 24.8953,\n            \"SAR\": 19.8876,\n            \"ISR\": 18.9964\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.54666,\n            \"SIR\": 11.9182,\n            \"SAR\": 7.21952,\n            \"ISR\": 13.6981\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8166,\n            \"SIR\": 23.4527,\n            \"SAR\": 13.2295,\n            \"ISR\": 14.4353\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4048,\n            \"SIR\": 23.0031,\n            \"SAR\": 10.8814,\n            \"ISR\": 15.7745\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8797,\n            \"SIR\": 27.5331,\n            \"SAR\": 18.3232,\n            \"ISR\": 18.4226\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n  
      \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.39553,\n            \"SIR\": 5.95349,\n            \"SAR\": 4.57037,\n            \"ISR\": 13.2196\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8805,\n            \"SIR\": 27.9049,\n            \"SAR\": 13.207,\n            \"ISR\": 13.3993\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.879,\n            \"SIR\": 23.8183,\n            \"SAR\": 13.7653,\n            \"ISR\": 16.8547\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7291,\n            \"SIR\": 25.3035,\n            \"SAR\": 15.8782,\n            \"ISR\": 17.4843\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3303,\n            \"SIR\": 19.1643,\n            \"SAR\": 11.0204,\n            \"ISR\": 15.3374\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0085,\n            \"SIR\": 22.9093,\n            \"SAR\": 13.3492,\n            \"ISR\": 16.1573\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.921,\n            \"SIR\": 21.2136,\n            \"SAR\": 12.7544,\n            \"ISR\": 16.0009\n          },\n          \"instrumental\": {\n            \"SDR\": 15.08,\n            \"SIR\": 25.275,\n            \"SAR\": 17.0213,\n            \"ISR\": 17.9508\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.40857,\n            \"SIR\": 20.0864,\n            \"SAR\": 5.57557,\n            \"ISR\": 9.19211\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0019,\n           
 \"SIR\": 28.5227,\n            \"SAR\": 24.0548,\n            \"ISR\": 19.4788\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8817,\n            \"SIR\": 32.9888,\n            \"SAR\": 13.5102,\n            \"ISR\": 17.6065\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5718,\n            \"SIR\": 28.593,\n            \"SAR\": 17.6771,\n            \"ISR\": 19.1346\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7265,\n            \"SIR\": 23.5946,\n            \"SAR\": 10.3781,\n            \"ISR\": 12.5704\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8452,\n            \"SIR\": 19.218,\n            \"SAR\": 15.4081,\n            \"ISR\": 18.2195\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.2035,\n            \"SIR\": 32.4611,\n            \"SAR\": 16.6814,\n            \"ISR\": 19.2902\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6764,\n            \"SIR\": 31.1089,\n            \"SAR\": 19.5272,\n            \"ISR\": 19.2673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8368,\n            \"SIR\": 23.9672,\n            \"SAR\": 11.9723,\n            \"ISR\": 15.4875\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6607,\n            \"SIR\": 25.7476,\n            \"SAR\": 17.8226,\n            \"ISR\": 18.6773\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n  
        \"vocals\": {\n            \"SDR\": 7.24528,\n            \"SIR\": 19.8362,\n            \"SAR\": 7.67928,\n            \"ISR\": 13.626\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6018,\n            \"SIR\": 26.728,\n            \"SAR\": 17.6935,\n            \"ISR\": 17.8728\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.40925,\n            \"SIR\": 19.8111,\n            \"SAR\": 9.28753,\n            \"ISR\": 13.4965\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0223,\n            \"SIR\": 25.8298,\n            \"SAR\": 21.8279,\n            \"ISR\": 18.6861\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.53122,\n            \"SIR\": 19.1035,\n            \"SAR\": 3.71645,\n            \"ISR\": 9.26805\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3034,\n            \"SIR\": 22.444,\n            \"SAR\": 18.6108,\n            \"ISR\": 18.8866\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0275,\n            \"SIR\": 21.7959,\n            \"SAR\": 10.6982,\n            \"ISR\": 15.5935\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5324,\n            \"SIR\": 26.1843,\n            \"SAR\": 16.4632,\n            \"ISR\": 17.9692\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.07272,\n            \"SIR\": 25.9994,\n            \"SAR\": 9.29185,\n            \"ISR\": 14.2574\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3648,\n            \"SIR\": 
25.7787,\n            \"SAR\": 17.676,\n            \"ISR\": 18.9055\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -5.84287,\n            \"SIR\": -36.995,\n            \"SAR\": 0.355675,\n            \"ISR\": 10.3781\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6599,\n            \"SIR\": 57.4676,\n            \"SAR\": 12.5332,\n            \"ISR\": 12.5977\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.24907,\n            \"SIR\": 16.7769,\n            \"SAR\": 6.99725,\n            \"ISR\": 10.9783\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4862,\n            \"SIR\": 21.9252,\n            \"SAR\": 18.2315,\n            \"ISR\": 18.015\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2983,\n            \"SIR\": 24.5958,\n            \"SAR\": 11.0713,\n            \"ISR\": 14.5942\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5101,\n            \"SIR\": 25.0115,\n            \"SAR\": 17.4846,\n            \"ISR\": 18.8203\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9704,\n            \"SIR\": 24.0405,\n            \"SAR\": 11.6902,\n            \"ISR\": 16.1945\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5229,\n            \"SIR\": 32.6163,\n            \"SAR\": 21.5149,\n            \"ISR\": 19.1371\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": -2.86038,\n            \"SIR\": 2.39004,\n            \"SAR\": -0.09909,\n            \"ISR\": 3.6875\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7922,\n            \"SIR\": 40.3354,\n            \"SAR\": 32.3835,\n            \"ISR\": 19.2675\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.20095,\n            \"SIR\": 14.6732,\n            \"SAR\": 4.03773,\n            \"ISR\": 6.83048\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7884,\n            \"SIR\": 14.6335,\n            \"SAR\": 14.3113,\n            \"ISR\": 17.1369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91478,\n            \"SIR\": 17.751,\n            \"SAR\": 6.40324,\n            \"ISR\": 13.1966\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5248,\n            \"SIR\": 29.5362,\n            \"SAR\": 20.2376,\n            \"ISR\": 18.8478\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.78787,\n            \"SIR\": 19.4971,\n            \"SAR\": 8.06298,\n            \"ISR\": 11.8468\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5291,\n            \"SIR\": 22.5169,\n            \"SAR\": 17.1153,\n            \"ISR\": 18.4195\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.86539,\n            \"SIR\": 17.5331,\n            \"SAR\": 9.29978,\n            \"ISR\": 15.2815\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4407,\n            \"SIR\": 
27.2567,\n            \"SAR\": 16.9712,\n            \"ISR\": 17.4028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1748,\n            \"SIR\": 25.1553,\n            \"SAR\": 11.9515,\n            \"ISR\": 14.7653\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1729,\n            \"SIR\": 21.3009,\n            \"SAR\": 14.1761,\n            \"ISR\": 17.6395\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.59262,\n            \"SIR\": 14.0947,\n            \"SAR\": 6.45711,\n            \"ISR\": 12.3968\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2318,\n            \"SIR\": 20.5671,\n            \"SAR\": 13.4724,\n            \"ISR\": 15.1727\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5502,\n            \"SIR\": 25.3408,\n            \"SAR\": 11.1519,\n            \"ISR\": 16.221\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6137,\n            \"SIR\": 26.2834,\n            \"SAR\": 17.0856,\n            \"ISR\": 17.5627\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.3102,\n            \"SIR\": 31.1403,\n            \"SAR\": 14.7674,\n            \"ISR\": 16.8947\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1248,\n            \"SIR\": 21.258,\n            \"SAR\": 14.7674,\n            \"ISR\": 18.901\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
11.2346,\n            \"SIR\": 24.6397,\n            \"SAR\": 11.7021,\n            \"ISR\": 15.9209\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2231,\n            \"SIR\": 24.0241,\n            \"SAR\": 15.5871,\n            \"ISR\": 17.7846\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6508,\n            \"SIR\": 33.3165,\n            \"SAR\": 9.21143,\n            \"ISR\": 10.2997\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5079,\n            \"SIR\": 21.4968,\n            \"SAR\": 19.6272,\n            \"ISR\": 19.6975\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.22097,\n            \"SIR\": 11.5764,\n            \"SAR\": 6.82513,\n            \"ISR\": 10.3243\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7208,\n            \"SIR\": 17.2716,\n            \"SAR\": 14.1359,\n            \"ISR\": 16.408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2105,\n            \"SIR\": 28.0098,\n            \"SAR\": 13.9937,\n            \"ISR\": 18.9055\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6625,\n            \"SIR\": 38.7813,\n            \"SAR\": 24.4699,\n            \"ISR\": 19.3791\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.327,\n            \"SIR\": 20.6429,\n            \"SAR\": 12.526,\n            \"ISR\": 17.7893\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7535,\n            \"SIR\": 25.5315,\n            \"SAR\": 12.476,\n            \"ISR\": 15.1966\n         
 }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.54308,\n            \"SIR\": 22.4156,\n            \"SAR\": 8.61041,\n            \"ISR\": 14.1128\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7766,\n            \"SIR\": 23.6513,\n            \"SAR\": 15.1867,\n            \"ISR\": 17.8067\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.8775,\n            \"SIR\": 6.24075,\n            \"SAR\": 2.0346,\n            \"ISR\": 11.7342\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7803,\n            \"SIR\": 32.2031,\n            \"SAR\": 21.6552,\n            \"ISR\": 17.7895\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3215,\n            \"SIR\": 26.0265,\n            \"SAR\": 10.7873,\n            \"ISR\": 16.5515\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0727,\n            \"SIR\": 33.0414,\n            \"SAR\": 20.3793,\n            \"ISR\": 19.0383\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8412,\n            \"SIR\": 30.2713,\n            \"SAR\": 12.2894,\n            \"ISR\": 16.7455\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9839,\n            \"SIR\": 33.6239,\n            \"SAR\": 22.1693,\n            \"ISR\": 19.4058\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7992,\n            \"SIR\": 31.4797,\n            \"SAR\": 
13.4691,\n            \"ISR\": 14.8236\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7789,\n            \"SIR\": 22.2598,\n            \"SAR\": 17.0289,\n            \"ISR\": 18.8969\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1745,\n        \"SIR\": 21.5048,\n        \"SAR\": 9.83894,\n        \"ISR\": 14.4258\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4028,\n        \"SIR\": 25.6395,\n        \"SAR\": 17.3,\n        \"ISR\": 18.3195\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET_Main_390.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Main_390\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.20541,\n            \"SIR\": 16.7913,\n            \"SAR\": 5.91045,\n            \"ISR\": 9.75897\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3509,\n            \"SIR\": 18.2773,\n            \"SAR\": 15.1963,\n            \"ISR\": 17.5234\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.78109,\n            \"SIR\": 13.0325,\n            \"SAR\": 7.55413,\n            \"ISR\": 9.30404\n          },\n          \"instrumental\": {\n            \"SDR\": 6.75573,\n            \"SIR\": 10.8264,\n            \"SAR\": 9.24601,\n            \"ISR\": 12.5868\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.73706,\n            \"SIR\": 22.6711,\n            \"SAR\": 11.0354,\n            \"ISR\": 7.1558\n          },\n          \"instrumental\": {\n            \"SDR\": 9.32083,\n            \"SIR\": 
10.4807,\n            \"SAR\": 14.2313,\n            \"ISR\": 17.0003\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -1.14132,\n            \"SIR\": 5.385,\n            \"SAR\": 4.08277,\n            \"ISR\": 9.07427\n          },\n          \"instrumental\": {\n            \"SDR\": 6.62126,\n            \"SIR\": 14.4152,\n            \"SAR\": 8.91041,\n            \"ISR\": 10.2967\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.51178,\n            \"SIR\": 23.0842,\n            \"SAR\": 13.928,\n            \"ISR\": 9.99188\n          },\n          \"instrumental\": {\n            \"SDR\": 7.55177,\n            \"SIR\": 8.35106,\n            \"SAR\": 13.3653,\n            \"ISR\": 15.9889\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.44664,\n            \"SIR\": 18.507,\n            \"SAR\": 11.1041,\n            \"ISR\": 7.47867\n          },\n          \"instrumental\": {\n            \"SDR\": 5.01169,\n            \"SIR\": 5.90006,\n            \"SAR\": 9.60692,\n            \"ISR\": 13.5728\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.97685,\n            \"SIR\": 20.8348,\n            \"SAR\": 12.9264,\n            \"ISR\": 7.23117\n          },\n          \"instrumental\": {\n            \"SDR\": 7.48743,\n            \"SIR\": 8.45608,\n            \"SAR\": 13.4337,\n            \"ISR\": 16.3188\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.64126,\n            
\"SIR\": 20.3077,\n            \"SAR\": 5.74439,\n            \"ISR\": 7.24166\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7226,\n            \"SIR\": 19.1607,\n            \"SAR\": 19.5924,\n            \"ISR\": 19.1367\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.85057,\n            \"SIR\": 32.7318,\n            \"SAR\": 13.652,\n            \"ISR\": 6.38448\n          },\n          \"instrumental\": {\n            \"SDR\": 6.47859,\n            \"SIR\": 7.01326,\n            \"SAR\": 14.1868,\n            \"ISR\": 18.5835\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.97743,\n            \"SIR\": 22.3569,\n            \"SAR\": 10.9353,\n            \"ISR\": 9.21187\n          },\n          \"instrumental\": {\n            \"SDR\": 7.29254,\n            \"SIR\": 9.85028,\n            \"SAR\": 11.898,\n            \"ISR\": 16.7331\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.28759,\n            \"SIR\": 31.9377,\n            \"SAR\": 16.6604,\n            \"ISR\": 6.00972\n          },\n          \"instrumental\": {\n            \"SDR\": 6.20834,\n            \"SIR\": 6.00502,\n            \"SAR\": 16.0185,\n            \"ISR\": 18.6868\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.8586,\n            \"SIR\": 22.3043,\n            \"SAR\": 12.0324,\n            \"ISR\": 7.05104\n          },\n          \"instrumental\": {\n            \"SDR\": 8.44329,\n            \"SIR\": 10.2969,\n            \"SAR\": 
13.6717,\n            \"ISR\": 17.1684\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.29407,\n            \"SIR\": 17.9866,\n            \"SAR\": 7.63681,\n            \"ISR\": 16.6722\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5799,\n            \"SIR\": 19.1996,\n            \"SAR\": 15.0056,\n            \"ISR\": 16.8145\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.02635,\n            \"SIR\": 17.5912,\n            \"SAR\": 9.54704,\n            \"ISR\": 6.47959\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3714,\n            \"SIR\": 11.6303,\n            \"SAR\": 16.1434,\n            \"ISR\": 16.7441\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.99104,\n            \"SIR\": 18.3385,\n            \"SAR\": 3.55722,\n            \"ISR\": 10.0413\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8436,\n            \"SIR\": 19.2623,\n            \"SAR\": 14.3082,\n            \"ISR\": 17.9463\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.52258,\n            \"SIR\": 21.3506,\n            \"SAR\": 10.7605,\n            \"ISR\": 7.41085\n          },\n          \"instrumental\": {\n            \"SDR\": 7.60552,\n            \"SIR\": 9.12738,\n            \"SAR\": 12.3912,\n            \"ISR\": 16.3596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.33745,\n            
\"SIR\": 25.7323,\n            \"SAR\": 9.52587,\n            \"ISR\": 8.2687\n          },\n          \"instrumental\": {\n            \"SDR\": 9.72218,\n            \"SIR\": 12.2565,\n            \"SAR\": 13.2566,\n            \"ISR\": 18.163\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -9.69914,\n            \"SIR\": -35.7775,\n            \"SAR\": 0.193735,\n            \"ISR\": 11.1668\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6326,\n            \"SIR\": 52.659,\n            \"SAR\": 11.2901,\n            \"ISR\": 12.4895\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.38655,\n            \"SIR\": 15.4574,\n            \"SAR\": 7.04866,\n            \"ISR\": 11.0792\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4229,\n            \"SIR\": 17.1729,\n            \"SAR\": 14.3861,\n            \"ISR\": 16.2388\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.50584,\n            \"SIR\": 24.1687,\n            \"SAR\": 11.139,\n            \"ISR\": 7.11991\n          },\n          \"instrumental\": {\n            \"SDR\": 9.17302,\n            \"SIR\": 11.6277,\n            \"SAR\": 14.092,\n            \"ISR\": 17.9407\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.70898,\n            \"SIR\": 23.8028,\n            \"SAR\": 12.076,\n            \"ISR\": 6.15867\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2055,\n            \"SIR\": 12.765,\n            \"SAR\": 18.0373,\n            
\"ISR\": 18.3785\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -6.21867,\n            \"SIR\": 1.38433,\n            \"SAR\": -0.13684,\n            \"ISR\": 3.27683\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7692,\n            \"SIR\": 36.7626,\n            \"SAR\": 27.6018,\n            \"ISR\": 18.8359\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.19115,\n            \"SIR\": 14.0899,\n            \"SAR\": 4.05779,\n            \"ISR\": 11.3656\n          },\n          \"instrumental\": {\n            \"SDR\": 8.43892,\n            \"SIR\": 16.6608,\n            \"SAR\": 9.45778,\n            \"ISR\": 15.0596\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.81349,\n            \"SIR\": 16.3556,\n            \"SAR\": 6.57123,\n            \"ISR\": 7.26133\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6754,\n            \"SIR\": 17.0337,\n            \"SAR\": 16.2936,\n            \"ISR\": 17.6028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.07193,\n            \"SIR\": 18.6844,\n            \"SAR\": 8.10956,\n            \"ISR\": 9.97511\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0706,\n            \"SIR\": 14.2036,\n            \"SAR\": 12.6793,\n            \"ISR\": 16.9427\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.19343,\n            \"SIR\": 
17.6906,\n            \"SAR\": 9.15414,\n            \"ISR\": 14.2397\n          },\n          \"instrumental\": {\n            \"SDR\": 11.996,\n            \"SIR\": 16.8618,\n            \"SAR\": 14.4066,\n            \"ISR\": 16.5471\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.7088,\n            \"SIR\": 24.3518,\n            \"SAR\": 11.8705,\n            \"ISR\": 11.2351\n          },\n          \"instrumental\": {\n            \"SDR\": 7.4305,\n            \"SIR\": 9.23594,\n            \"SAR\": 11.3759,\n            \"ISR\": 16.359\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.00809,\n            \"SIR\": 11.441,\n            \"SAR\": 6.14318,\n            \"ISR\": 8.95125\n          },\n          \"instrumental\": {\n            \"SDR\": 6.06875,\n            \"SIR\": 10.1106,\n            \"SAR\": 8.1383,\n            \"ISR\": 11.1587\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.63523,\n            \"SIR\": 24.5942,\n            \"SAR\": 11.1104,\n            \"ISR\": 6.72897\n          },\n          \"instrumental\": {\n            \"SDR\": 8.22452,\n            \"SIR\": 9.4809,\n            \"SAR\": 13.2378,\n            \"ISR\": 14.4119\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.23038,\n            \"SIR\": 30.9219,\n            \"SAR\": 14.7764,\n            \"ISR\": 5.51033\n          },\n          \"instrumental\": {\n            \"SDR\": 3.07409,\n            \"SIR\": 2.29799,\n            \"SAR\": 12.9081,\n            \"ISR\": 18.0712\n     
     }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.51619,\n            \"SIR\": 24.1531,\n            \"SAR\": 12.1408,\n            \"ISR\": 9.11816\n          },\n          \"instrumental\": {\n            \"SDR\": 7.51586,\n            \"SIR\": 8.98796,\n            \"SAR\": 12.7079,\n            \"ISR\": 16.552\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.80022,\n            \"SIR\": 27.8756,\n            \"SAR\": 11.031,\n            \"ISR\": 7.66426\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9353,\n            \"SIR\": 13.2876,\n            \"SAR\": 16.7817,\n            \"ISR\": 19.0614\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.90267,\n            \"SIR\": 11.1777,\n            \"SAR\": 1.57497,\n            \"ISR\": 7.87925\n          },\n          \"instrumental\": {\n            \"SDR\": 7.81953,\n            \"SIR\": 12.8057,\n            \"SAR\": 8.54483,\n            \"ISR\": 14.8816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.29717,\n            \"SIR\": 27.7512,\n            \"SAR\": 14.1791,\n            \"ISR\": 10.4651\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1779,\n            \"SIR\": 14.5176,\n            \"SAR\": 21.6159,\n            \"ISR\": 19.0704\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.94701,\n            \"SIR\": 20.3444,\n            \"SAR\": 12.4755,\n            \"ISR\": 16.7659\n          },\n   
       \"instrumental\": {\n            \"SDR\": 7.55356,\n            \"SIR\": 9.95375,\n            \"SAR\": 10.5917,\n            \"ISR\": 13.9203\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.93671,\n            \"SIR\": 20.2014,\n            \"SAR\": 8.68515,\n            \"ISR\": 7.75969\n          },\n          \"instrumental\": {\n            \"SDR\": 7.34594,\n            \"SIR\": 9.64287,\n            \"SAR\": 10.9994,\n            \"ISR\": 15.6019\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.48102,\n            \"SIR\": 5.92901,\n            \"SAR\": 2.24504,\n            \"ISR\": 8.85133\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3262,\n            \"SIR\": 21.5113,\n            \"SAR\": 16.6025,\n            \"ISR\": 16.3706\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.38844,\n            \"SIR\": 25.9631,\n            \"SAR\": 10.9852,\n            \"ISR\": 9.69793\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5178,\n            \"SIR\": 13.6892,\n            \"SAR\": 17.0456,\n            \"ISR\": 18.5194\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.77645,\n            \"SIR\": 29.6897,\n            \"SAR\": 12.5191,\n            \"ISR\": 6.72845\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6968,\n            \"SIR\": 12.1384,\n            \"SAR\": 18.1899,\n            \"ISR\": 18.9801\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.56993,\n            \"SIR\": 30.0454,\n            \"SAR\": 13.3417,\n            \"ISR\": 6.34176\n          },\n          \"instrumental\": {\n            \"SDR\": 6.42373,\n            \"SIR\": 6.57884,\n            \"SAR\": 13.6072,\n            \"ISR\": 17.8828\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 4.54626,\n        \"SIR\": 20.5896,\n        \"SAR\": 10.8479,\n        \"ISR\": 8.07398\n      },\n      \"instrumental\": {\n        \"SDR\": 8.80815,\n        \"SIR\": 11.8844,\n        \"SAR\": 13.6395,\n        \"ISR\": 16.7793\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET_Main_406.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Main_406\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.85269,\n            \"SIR\": 17.2346,\n            \"SAR\": 5.75262,\n            \"ISR\": 10.9493\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5944,\n            \"SIR\": 26.387,\n            \"SAR\": 19.5127,\n            \"ISR\": 18.6513\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.92379,\n            \"SIR\": 12.2082,\n            \"SAR\": 7.45387,\n            \"ISR\": 14.543\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0279,\n            \"SIR\": 24.5345,\n            \"SAR\": 13.3392,\n            \"ISR\": 14.4247\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7074,\n            \"SIR\": 
22.5645,\n            \"SAR\": 11.1273,\n            \"ISR\": 16.944\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7911,\n            \"SIR\": 29.0133,\n            \"SAR\": 18.1774,\n            \"ISR\": 18.2854\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.08184,\n            \"SIR\": 5.41147,\n            \"SAR\": 4.33557,\n            \"ISR\": 14.4986\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4945,\n            \"SIR\": 30.1395,\n            \"SAR\": 12.777,\n            \"ISR\": 12.8862\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2082,\n            \"SIR\": 23.4667,\n            \"SAR\": 13.8813,\n            \"ISR\": 18.7923\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3165,\n            \"SIR\": 27.5545,\n            \"SAR\": 15.7939,\n            \"ISR\": 17.309\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5437,\n            \"SIR\": 18.7618,\n            \"SAR\": 11.1092,\n            \"ISR\": 16.9324\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9488,\n            \"SIR\": 25.3293,\n            \"SAR\": 13.1192,\n            \"ISR\": 15.8457\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.165,\n            \"SIR\": 21.0816,\n            \"SAR\": 12.9775,\n            \"ISR\": 17.1211\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0218,\n            \"SIR\": 26.8968,\n            \"SAR\": 16.8997,\n            \"ISR\": 17.8594\n          }\n      
  }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.5595,\n            \"SIR\": 19.7854,\n            \"SAR\": 5.67036,\n            \"ISR\": 9.44396\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0762,\n            \"SIR\": 30.7333,\n            \"SAR\": 23.8319,\n            \"ISR\": 19.434\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2822,\n            \"SIR\": 33.179,\n            \"SAR\": 13.6017,\n            \"ISR\": 18.8632\n          },\n          \"instrumental\": {\n            \"SDR\": 15.61,\n            \"SIR\": 28.717,\n            \"SAR\": 17.6275,\n            \"ISR\": 19.1609\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9462,\n            \"SIR\": 22.4425,\n            \"SAR\": 10.9985,\n            \"ISR\": 14.0326\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7365,\n            \"SIR\": 21.8904,\n            \"SAR\": 15.8364,\n            \"ISR\": 17.9213\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.6493,\n            \"SIR\": 32.1988,\n            \"SAR\": 16.6339,\n            \"ISR\": 21.1733\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4421,\n            \"SIR\": 30.1763,\n            \"SAR\": 19.1576,\n            \"ISR\": 19.1986\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9561,\n            \"SIR\": 22.6627,\n            \"SAR\": 
12.0422,\n            \"ISR\": 17.1639\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4984,\n            \"SIR\": 27.7582,\n            \"SAR\": 17.5769,\n            \"ISR\": 18.3379\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.40762,\n            \"SIR\": 17.921,\n            \"SAR\": 7.64786,\n            \"ISR\": 15.0966\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4057,\n            \"SIR\": 29.5912,\n            \"SAR\": 17.0214,\n            \"ISR\": 17.4585\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.14378,\n            \"SIR\": 17.705,\n            \"SAR\": 9.4558,\n            \"ISR\": 14.1433\n          },\n          \"instrumental\": {\n            \"SDR\": 17.337,\n            \"SIR\": 26.837,\n            \"SAR\": 20.9095,\n            \"ISR\": 18.2032\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.43038,\n            \"SIR\": 15.8658,\n            \"SAR\": 3.14604,\n            \"ISR\": 9.85908\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9572,\n            \"SIR\": 23.4585,\n            \"SAR\": 17.7611,\n            \"ISR\": 18.3835\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3386,\n            \"SIR\": 21.4479,\n            \"SAR\": 10.8039,\n            \"ISR\": 17.2099\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4504,\n            \"SIR\": 29.2352,\n            \"SAR\": 16.2578,\n            \"ISR\": 17.7763\n          }\n        }\n      },\n      
{\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.29467,\n            \"SIR\": 25.0927,\n            \"SAR\": 9.37852,\n            \"ISR\": 15.4085\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2638,\n            \"SIR\": 27.8164,\n            \"SAR\": 17.1557,\n            \"ISR\": 18.7649\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.95716,\n            \"SIR\": -33.6119,\n            \"SAR\": 0.169185,\n            \"ISR\": 11.6943\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2022,\n            \"SIR\": 58.3928,\n            \"SAR\": 14.678,\n            \"ISR\": 14.6233\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.3409,\n            \"SIR\": 16.5622,\n            \"SAR\": 7.12643,\n            \"ISR\": 11.9651\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5531,\n            \"SIR\": 23.4749,\n            \"SAR\": 18.3173,\n            \"ISR\": 17.8918\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.497,\n            \"SIR\": 24.3892,\n            \"SAR\": 11.3564,\n            \"ISR\": 15.9544\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6969,\n            \"SIR\": 26.5967,\n            \"SAR\": 17.4343,\n            \"ISR\": 18.7886\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4287,\n            \"SIR\": 23.7075,\n            \"SAR\": 12.0748,\n            
\"ISR\": 18.0169\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5176,\n            \"SIR\": 33.1457,\n            \"SAR\": 21.1284,\n            \"ISR\": 19.0461\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -6.02913,\n            \"SIR\": 1.16806,\n            \"SAR\": -0.31571,\n            \"ISR\": 3.63733\n          },\n          \"instrumental\": {\n            \"SDR\": 19.5723,\n            \"SIR\": 39.946,\n            \"SAR\": 30.3347,\n            \"ISR\": 19.2551\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.03064,\n            \"SIR\": 13.7042,\n            \"SAR\": 4.00465,\n            \"ISR\": 7.20661\n          },\n          \"instrumental\": {\n            \"SDR\": 10.65,\n            \"SIR\": 15.0243,\n            \"SAR\": 13.8727,\n            \"ISR\": 16.6632\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.66973,\n            \"SIR\": 16.1513,\n            \"SAR\": 6.45549,\n            \"ISR\": 14.9347\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3235,\n            \"SIR\": 32.4516,\n            \"SAR\": 20.2595,\n            \"ISR\": 18.5398\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.91968,\n            \"SIR\": 19.0506,\n            \"SAR\": 8.03322,\n            \"ISR\": 12.8172\n          },\n          \"instrumental\": {\n            \"SDR\": 14.581,\n            \"SIR\": 24.0471,\n            \"SAR\": 16.9676,\n            \"ISR\": 18.2565\n          }\n        }\n      },\n      {\n   
     \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92466,\n            \"SIR\": 17.1233,\n            \"SAR\": 9.22667,\n            \"ISR\": 16.3965\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2912,\n            \"SIR\": 29.6159,\n            \"SAR\": 16.7509,\n            \"ISR\": 17.2251\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1146,\n            \"SIR\": 24.3619,\n            \"SAR\": 12.526,\n            \"ISR\": 17.1307\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0685,\n            \"SIR\": 24.6799,\n            \"SAR\": 14.3078,\n            \"ISR\": 17.3545\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.12086,\n            \"SIR\": 10.2634,\n            \"SAR\": 6.11878,\n            \"ISR\": 13.9205\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1575,\n            \"SIR\": 22.3877,\n            \"SAR\": 11.9012,\n            \"ISR\": 13.0429\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8228,\n            \"SIR\": 24.6827,\n            \"SAR\": 11.0189,\n            \"ISR\": 17.8965\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4318,\n            \"SIR\": 27.9047,\n            \"SAR\": 16.7393,\n            \"ISR\": 17.3327\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.7378,\n            \"SIR\": 30.6372,\n            \"SAR\": 14.5455,\n            \"ISR\": 17.2935\n    
      },\n          \"instrumental\": {\n            \"SDR\": 14.8449,\n            \"SIR\": 19.6048,\n            \"SAR\": 14.6099,\n            \"ISR\": 18.8052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.839,\n            \"SIR\": 24.2648,\n            \"SAR\": 11.9585,\n            \"ISR\": 17.3835\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2763,\n            \"SIR\": 25.312,\n            \"SAR\": 15.5273,\n            \"ISR\": 17.6477\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2276,\n            \"SIR\": 27.5034,\n            \"SAR\": 12.0039,\n            \"ISR\": 16.1987\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3479,\n            \"SIR\": 27.3928,\n            \"SAR\": 21.2889,\n            \"ISR\": 19.3708\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.17105,\n            \"SIR\": 12.5917,\n            \"SAR\": 2.85984,\n            \"ISR\": 7.54104\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7638,\n            \"SIR\": 14.8659,\n            \"SAR\": 12.0567,\n            \"ISR\": 17.5165\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7243,\n            \"SIR\": 27.3897,\n            \"SAR\": 14.1531,\n            \"ISR\": 20.945\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5717,\n            \"SIR\": 37.0665,\n            \"SAR\": 24.5974,\n            \"ISR\": 19.3182\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 11.3548,\n            \"SIR\": 20.7762,\n            \"SAR\": 12.3786,\n            \"ISR\": 18.9285\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5262,\n            \"SIR\": 26.2417,\n            \"SAR\": 12.1687,\n            \"ISR\": 15.1975\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.63999,\n            \"SIR\": 21.077,\n            \"SAR\": 8.69383,\n            \"ISR\": 15.396\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7364,\n            \"SIR\": 26.0014,\n            \"SAR\": 14.8899,\n            \"ISR\": 17.3915\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.94074,\n            \"SIR\": 5.62562,\n            \"SAR\": 2.23153,\n            \"ISR\": 12.3325\n          },\n          \"instrumental\": {\n            \"SDR\": 17.199,\n            \"SIR\": 33.4162,\n            \"SAR\": 20.702,\n            \"ISR\": 17.7513\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5791,\n            \"SIR\": 25.3579,\n            \"SAR\": 10.9447,\n            \"ISR\": 17.8893\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9572,\n            \"SIR\": 35.2834,\n            \"SAR\": 20.2154,\n            \"ISR\": 18.9272\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2134,\n            \"SIR\": 29.4645,\n            \"SAR\": 12.5179,\n            \"ISR\": 18.5048\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7932,\n  
          \"SIR\": 36.5564,\n            \"SAR\": 22.0422,\n            \"ISR\": 19.3503\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1006,\n            \"SIR\": 30.2437,\n            \"SAR\": 14.1278,\n            \"ISR\": 15.6777\n          },\n          \"instrumental\": {\n            \"SDR\": 14.791,\n            \"SIR\": 22.7587,\n            \"SAR\": 16.8571,\n            \"ISR\": 18.7422\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.4412,\n        \"SIR\": 21.0793,\n        \"SAR\": 10.8743,\n        \"ISR\": 15.8161\n      },\n      \"instrumental\": {\n        \"SDR\": 15.2775,\n        \"SIR\": 27.4736,\n        \"SAR\": 16.9945,\n        \"ISR\": 18.0622\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET_Main_427.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Main_427\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.82906,\n            \"SIR\": 17.6647,\n            \"SAR\": 5.943,\n            \"ISR\": 10.1517\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5208,\n            \"SIR\": 25.1407,\n            \"SAR\": 20.0019,\n            \"ISR\": 18.7475\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.18473,\n            \"SIR\": 13.6405,\n            \"SAR\": 7.63287,\n            \"ISR\": 13.145\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5536,\n            \"SIR\": 22.5558,\n            \"SAR\": 14.1371,\n            \"ISR\": 15.4313\n          }\n        }\n      },\n      {\n  
      \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5803,\n            \"SIR\": 23.0794,\n            \"SAR\": 11.0791,\n            \"ISR\": 15.2334\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9408,\n            \"SIR\": 26.9283,\n            \"SAR\": 18.6528,\n            \"ISR\": 18.4442\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.43239,\n            \"SIR\": 5.86112,\n            \"SAR\": 4.12099,\n            \"ISR\": 12.7795\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1052,\n            \"SIR\": 26.2659,\n            \"SAR\": 13.3815,\n            \"ISR\": 13.5054\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.751,\n            \"SIR\": 23.5242,\n            \"SAR\": 13.8815,\n            \"ISR\": 16.3623\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6235,\n            \"SIR\": 24.7496,\n            \"SAR\": 16.1266,\n            \"ISR\": 17.4145\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.263,\n            \"SIR\": 18.6007,\n            \"SAR\": 11.1831,\n            \"ISR\": 15.163\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0269,\n            \"SIR\": 22.64,\n            \"SAR\": 13.5634,\n            \"ISR\": 15.9108\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8092,\n            \"SIR\": 20.7821,\n            \"SAR\": 12.9431,\n            \"ISR\": 15.5305\n          },\n          
\"instrumental\": {\n            \"SDR\": 15.1739,\n            \"SIR\": 25.0044,\n            \"SAR\": 17.2768,\n            \"ISR\": 17.8643\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.48335,\n            \"SIR\": 20.4452,\n            \"SAR\": 6.01091,\n            \"ISR\": 9.33015\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1436,\n            \"SIR\": 28.7473,\n            \"SAR\": 24.1441,\n            \"ISR\": 19.4861\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9246,\n            \"SIR\": 32.88,\n            \"SAR\": 13.6859,\n            \"ISR\": 17.0253\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7651,\n            \"SIR\": 28.1523,\n            \"SAR\": 18.0615,\n            \"ISR\": 19.1234\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4911,\n            \"SIR\": 22.7069,\n            \"SAR\": 11.0089,\n            \"ISR\": 12.7886\n          },\n          \"instrumental\": {\n            \"SDR\": 16.108,\n            \"SIR\": 19.7213,\n            \"SAR\": 16.2973,\n            \"ISR\": 18.0665\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7035,\n            \"SIR\": 32.9433,\n            \"SAR\": 16.7267,\n            \"ISR\": 18.1318\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6874,\n            \"SIR\": 30.274,\n            \"SAR\": 19.6895,\n            \"ISR\": 19.3017\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It 
Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7368,\n            \"SIR\": 21.6175,\n            \"SAR\": 11.922,\n            \"ISR\": 14.983\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5755,\n            \"SIR\": 25.2078,\n            \"SAR\": 17.8753,\n            \"ISR\": 18.1936\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84444,\n            \"SIR\": 19.9605,\n            \"SAR\": 8.05998,\n            \"ISR\": 13.6508\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6069,\n            \"SIR\": 26.5375,\n            \"SAR\": 17.6803,\n            \"ISR\": 17.9367\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.11076,\n            \"SIR\": 17.8021,\n            \"SAR\": 9.79032,\n            \"ISR\": 13.1461\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1209,\n            \"SIR\": 24.9091,\n            \"SAR\": 21.204,\n            \"ISR\": 18.3423\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.70765,\n            \"SIR\": 18.4609,\n            \"SAR\": 3.59805,\n            \"ISR\": 9.37758\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3403,\n            \"SIR\": 22.5583,\n            \"SAR\": 18.7607,\n            \"ISR\": 18.8189\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1157,\n            \"SIR\": 21.3878,\n            \"SAR\": 10.8579,\n            \"ISR\": 15.354\n          },\n          \"instrumental\": {\n    
        \"SDR\": 14.5969,\n            \"SIR\": 25.8581,\n            \"SAR\": 16.6269,\n            \"ISR\": 17.8575\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.00868,\n            \"SIR\": 26.4189,\n            \"SAR\": 9.29668,\n            \"ISR\": 13.3758\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3278,\n            \"SIR\": 24.2921,\n            \"SAR\": 17.6984,\n            \"ISR\": 19.0027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.76577,\n            \"SIR\": -33.1414,\n            \"SAR\": 0.178235,\n            \"ISR\": 10.9973\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3093,\n            \"SIR\": 57.9346,\n            \"SAR\": 14.9751,\n            \"ISR\": 15.1469\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.39028,\n            \"SIR\": 15.5989,\n            \"SAR\": 7.22268,\n            \"ISR\": 11.5129\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5327,\n            \"SIR\": 22.7365,\n            \"SAR\": 18.8445,\n            \"ISR\": 17.692\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2435,\n            \"SIR\": 24.4722,\n            \"SAR\": 11.2163,\n            \"ISR\": 14.2944\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7269,\n            \"SIR\": 24.5448,\n            \"SAR\": 17.5491,\n            \"ISR\": 18.8246\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - 
Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1382,\n            \"SIR\": 23.9968,\n            \"SAR\": 11.9879,\n            \"ISR\": 15.8671\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6451,\n            \"SIR\": 31.9426,\n            \"SAR\": 22.0543,\n            \"ISR\": 19.1177\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -2.49158,\n            \"SIR\": 2.15807,\n            \"SAR\": -0.12965,\n            \"ISR\": 3.57837\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8011,\n            \"SIR\": 40.3989,\n            \"SAR\": 32.996,\n            \"ISR\": 19.3244\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.19905,\n            \"SIR\": 14.5709,\n            \"SAR\": 3.97645,\n            \"ISR\": 6.73817\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8736,\n            \"SIR\": 14.5301,\n            \"SAR\": 14.3756,\n            \"ISR\": 17.1284\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.22215,\n            \"SIR\": 17.7075,\n            \"SAR\": 6.92583,\n            \"ISR\": 12.9828\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4981,\n            \"SIR\": 28.5646,\n            \"SAR\": 20.676,\n            \"ISR\": 18.837\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.73954,\n            \"SIR\": 18.77,\n            \"SAR\": 8.08825,\n            \"ISR\": 11.792\n          },\n          \"instrumental\": {\n       
     \"SDR\": 14.5828,\n            \"SIR\": 22.6464,\n            \"SAR\": 17.3255,\n            \"ISR\": 18.2484\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.99932,\n            \"SIR\": 17.9972,\n            \"SAR\": 9.25604,\n            \"ISR\": 14.6958\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6511,\n            \"SIR\": 26.6695,\n            \"SAR\": 17.4702,\n            \"ISR\": 17.5615\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2289,\n            \"SIR\": 24.2029,\n            \"SAR\": 12.0261,\n            \"ISR\": 14.5605\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0242,\n            \"SIR\": 21.1282,\n            \"SAR\": 14.2652,\n            \"ISR\": 17.4035\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.59393,\n            \"SIR\": 12.9057,\n            \"SAR\": 6.51486,\n            \"ISR\": 12.3617\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0193,\n            \"SIR\": 20.6749,\n            \"SAR\": 13.3775,\n            \"ISR\": 14.6897\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6517,\n            \"SIR\": 24.7357,\n            \"SAR\": 11.016,\n            \"ISR\": 15.8507\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7927,\n            \"SIR\": 26.1829,\n            \"SAR\": 17.2063,\n            \"ISR\": 17.816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.973,\n            \"SIR\": 31.205,\n            \"SAR\": 15.0679,\n            \"ISR\": 16.4571\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2341,\n            \"SIR\": 21.4016,\n            \"SAR\": 15.3197,\n            \"ISR\": 18.9314\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5918,\n            \"SIR\": 24.2749,\n            \"SAR\": 11.9896,\n            \"ISR\": 15.7704\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3706,\n            \"SIR\": 24.4454,\n            \"SAR\": 15.8943,\n            \"ISR\": 17.736\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8173,\n            \"SIR\": 29.9201,\n            \"SAR\": 11.588,\n            \"ISR\": 13.3619\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4738,\n            \"SIR\": 24.6547,\n            \"SAR\": 21.5141,\n            \"ISR\": 19.5573\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.19626,\n            \"SIR\": 11.4512,\n            \"SAR\": 4.88464,\n            \"ISR\": 7.36752\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2192,\n            \"SIR\": 14.7938,\n            \"SAR\": 13.9812,\n            \"ISR\": 16.77\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1682,\n            \"SIR\": 28.0545,\n            \"SAR\": 14.3753,\n            \"ISR\": 17.8069\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7426,\n            \"SIR\": 36.419,\n            \"SAR\": 
24.9575,\n            \"ISR\": 19.404\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9274,\n            \"SIR\": 20.7557,\n            \"SAR\": 12.1661,\n            \"ISR\": 16.3913\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0172,\n            \"SIR\": 23.0026,\n            \"SAR\": 12.671,\n            \"ISR\": 15.3927\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.61115,\n            \"SIR\": 20.2167,\n            \"SAR\": 8.79476,\n            \"ISR\": 14.1855\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9395,\n            \"SIR\": 23.8276,\n            \"SAR\": 15.2305,\n            \"ISR\": 17.2746\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.34018,\n            \"SIR\": 5.69531,\n            \"SAR\": 2.21021,\n            \"ISR\": 11.6237\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8566,\n            \"SIR\": 31.7886,\n            \"SAR\": 18.5851,\n            \"ISR\": 17.7667\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3832,\n            \"SIR\": 26.045,\n            \"SAR\": 10.9394,\n            \"ISR\": 15.9365\n          },\n          \"instrumental\": {\n            \"SDR\": 17.13,\n            \"SIR\": 31.0683,\n            \"SAR\": 20.6198,\n            \"ISR\": 19.0695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9351,\n       
     \"SIR\": 29.8557,\n            \"SAR\": 12.5938,\n            \"ISR\": 16.4933\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0269,\n            \"SIR\": 33.1436,\n            \"SAR\": 22.7178,\n            \"ISR\": 19.3754\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7682,\n            \"SIR\": 31.7154,\n            \"SAR\": 13.3831,\n            \"ISR\": 14.4392\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8062,\n            \"SIR\": 21.1557,\n            \"SAR\": 17.1792,\n            \"ISR\": 18.9236\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1894,\n        \"SIR\": 20.7689,\n        \"SAR\": 10.8986,\n        \"ISR\": 14.24\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4579,\n        \"SIR\": 24.9567,\n        \"SAR\": 17.5097,\n        \"ISR\": 18.1301\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET_Main_438.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Main_438\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.53862,\n            \"SIR\": 16.0907,\n            \"SAR\": 5.62231,\n            \"ISR\": 9.7587\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5526,\n            \"SIR\": 24.6013,\n            \"SAR\": 19.8913,\n            \"ISR\": 18.7021\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.87137,\n            \"SIR\": 10.954,\n            \"SAR\": 7.51221,\n            \"ISR\": 13.1776\n          },\n          \"instrumental\": {\n     
       \"SDR\": 11.8205,\n            \"SIR\": 22.5439,\n            \"SAR\": 13.5629,\n            \"ISR\": 13.9922\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4693,\n            \"SIR\": 22.3815,\n            \"SAR\": 11.0578,\n            \"ISR\": 15.069\n          },\n          \"instrumental\": {\n            \"SDR\": 15.908,\n            \"SIR\": 26.5451,\n            \"SAR\": 18.5679,\n            \"ISR\": 18.2995\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.19079,\n            \"SIR\": 5.03508,\n            \"SAR\": 4.19754,\n            \"ISR\": 13.1544\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7231,\n            \"SIR\": 27.2728,\n            \"SAR\": 12.9448,\n            \"ISR\": 12.9452\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5596,\n            \"SIR\": 23.0696,\n            \"SAR\": 13.9336,\n            \"ISR\": 16.2919\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8041,\n            \"SIR\": 24.9615,\n            \"SAR\": 16.2353,\n            \"ISR\": 17.305\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1787,\n            \"SIR\": 18.5406,\n            \"SAR\": 11.179,\n            \"ISR\": 14.7971\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0469,\n            \"SIR\": 22.015,\n            \"SAR\": 13.6313,\n            \"ISR\": 15.8742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 11.8322,\n            \"SIR\": 20.9027,\n            \"SAR\": 13.0488,\n            \"ISR\": 15.241\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2066,\n            \"SIR\": 24.583,\n            \"SAR\": 17.515,\n            \"ISR\": 17.8805\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.62374,\n            \"SIR\": 20.1042,\n            \"SAR\": 5.52221,\n            \"ISR\": 9.49657\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1448,\n            \"SIR\": 28.4615,\n            \"SAR\": 24.2907,\n            \"ISR\": 19.4531\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7146,\n            \"SIR\": 32.5973,\n            \"SAR\": 13.7003,\n            \"ISR\": 16.5219\n          },\n          \"instrumental\": {\n            \"SDR\": 15.627,\n            \"SIR\": 27.2061,\n            \"SAR\": 18.0835,\n            \"ISR\": 19.1334\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3636,\n            \"SIR\": 23.6057,\n            \"SAR\": 10.8339,\n            \"ISR\": 12.3016\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0353,\n            \"SIR\": 18.7557,\n            \"SAR\": 16.0968,\n            \"ISR\": 18.2179\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.6332,\n            \"SIR\": 31.4352,\n            \"SAR\": 16.6358,\n            \"ISR\": 17.7667\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6191,\n            \"SIR\": 29.6305,\n 
           \"SAR\": 19.7205,\n            \"ISR\": 19.1658\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5553,\n            \"SIR\": 20.4857,\n            \"SAR\": 11.9005,\n            \"ISR\": 15.2388\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4347,\n            \"SIR\": 25.9966,\n            \"SAR\": 17.4954,\n            \"ISR\": 17.9408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.69254,\n            \"SIR\": 19.7088,\n            \"SAR\": 8.24542,\n            \"ISR\": 13.385\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6626,\n            \"SIR\": 26.1541,\n            \"SAR\": 17.7936,\n            \"ISR\": 17.8871\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.23499,\n            \"SIR\": 18.6885,\n            \"SAR\": 9.92657,\n            \"ISR\": 12.988\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5952,\n            \"SIR\": 24.8591,\n            \"SAR\": 21.7264,\n            \"ISR\": 18.4903\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.36084,\n            \"SIR\": 15.7868,\n            \"SAR\": 3.15721,\n            \"ISR\": 9.26881\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9717,\n            \"SIR\": 22.4424,\n            \"SAR\": 18.2432,\n            \"ISR\": 18.4725\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
10.0448,\n            \"SIR\": 21.0299,\n            \"SAR\": 10.8293,\n            \"ISR\": 15.0617\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6096,\n            \"SIR\": 25.4094,\n            \"SAR\": 16.7105,\n            \"ISR\": 17.7909\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.03938,\n            \"SIR\": 25.0834,\n            \"SAR\": 9.31471,\n            \"ISR\": 13.6026\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3736,\n            \"SIR\": 24.734,\n            \"SAR\": 17.9873,\n            \"ISR\": 18.8247\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -7.13392,\n            \"SIR\": -35.1355,\n            \"SAR\": 0.099835,\n            \"ISR\": 10.8927\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8364,\n            \"SIR\": 57.6771,\n            \"SAR\": 13.916,\n            \"ISR\": 13.8715\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.33598,\n            \"SIR\": 15.8023,\n            \"SAR\": 6.96489,\n            \"ISR\": 11.018\n          },\n          \"instrumental\": {\n            \"SDR\": 16.48,\n            \"SIR\": 21.8868,\n            \"SAR\": 18.6566,\n            \"ISR\": 17.7627\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2553,\n            \"SIR\": 24.1021,\n            \"SAR\": 11.4967,\n            \"ISR\": 14.5851\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8207,\n            \"SIR\": 25.361,\n            \"SAR\": 
18.0551,\n            \"ISR\": 18.743\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9964,\n            \"SIR\": 23.7389,\n            \"SAR\": 12.0271,\n            \"ISR\": 15.8025\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7773,\n            \"SIR\": 31.8743,\n            \"SAR\": 22.2261,\n            \"ISR\": 19.0802\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -4.21508,\n            \"SIR\": 0.4978,\n            \"SAR\": -0.15217,\n            \"ISR\": 3.50991\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7313,\n            \"SIR\": 39.967,\n            \"SAR\": 31.1834,\n            \"ISR\": 19.1955\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.16299,\n            \"SIR\": 14.0534,\n            \"SAR\": 4.11577,\n            \"ISR\": 6.92901\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7203,\n            \"SIR\": 14.7547,\n            \"SAR\": 14.3836,\n            \"ISR\": 16.9152\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.46138,\n            \"SIR\": 17.4229,\n            \"SAR\": 6.89396,\n            \"ISR\": 13.328\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6482,\n            \"SIR\": 29.2913,\n            \"SAR\": 20.7738,\n            \"ISR\": 18.7852\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.77107,\n       
     \"SIR\": 18.6343,\n            \"SAR\": 8.18181,\n            \"ISR\": 11.6926\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5319,\n            \"SIR\": 22.3178,\n            \"SAR\": 17.3748,\n            \"ISR\": 18.2355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.77257,\n            \"SIR\": 16.8142,\n            \"SAR\": 8.93113,\n            \"ISR\": 14.4552\n          },\n          \"instrumental\": {\n            \"SDR\": 15.577,\n            \"SIR\": 26.0336,\n            \"SAR\": 17.1291,\n            \"ISR\": 17.2565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9914,\n            \"SIR\": 24.9799,\n            \"SAR\": 11.9488,\n            \"ISR\": 14.015\n          },\n          \"instrumental\": {\n            \"SDR\": 13.379,\n            \"SIR\": 20.3048,\n            \"SAR\": 14.5722,\n            \"ISR\": 17.6497\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.13499,\n            \"SIR\": 11.7891,\n            \"SAR\": 6.21354,\n            \"ISR\": 12.3758\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7913,\n            \"SIR\": 20.1435,\n            \"SAR\": 13.1917,\n            \"ISR\": 14.1052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5549,\n            \"SIR\": 24.6736,\n            \"SAR\": 11.1417,\n            \"ISR\": 15.5767\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6963,\n            \"SIR\": 25.8794,\n            \"SAR\": 17.3203,\n        
    \"ISR\": 17.4034\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7173,\n            \"SIR\": 30.8442,\n            \"SAR\": 14.4554,\n            \"ISR\": 15.5749\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1525,\n            \"SIR\": 20.2115,\n            \"SAR\": 15.065,\n            \"ISR\": 18.8932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3401,\n            \"SIR\": 24.1123,\n            \"SAR\": 11.9423,\n            \"ISR\": 15.3852\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3629,\n            \"SIR\": 23.9114,\n            \"SAR\": 15.8269,\n            \"ISR\": 17.6701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0658,\n            \"SIR\": 29.4224,\n            \"SAR\": 12.1483,\n            \"ISR\": 14.5367\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5967,\n            \"SIR\": 26.2081,\n            \"SAR\": 21.8042,\n            \"ISR\": 19.5187\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.19327,\n            \"SIR\": 12.3334,\n            \"SAR\": 5.97236,\n            \"ISR\": 8.99335\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7635,\n            \"SIR\": 15.8535,\n            \"SAR\": 14.1537,\n            \"ISR\": 16.9865\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9861,\n            \"SIR\": 27.2632,\n            \"SAR\": 14.1549,\n            \"ISR\": 
17.4853\n          },\n          \"instrumental\": {\n            \"SDR\": 18.664,\n            \"SIR\": 35.1575,\n            \"SAR\": 24.8635,\n            \"ISR\": 19.3491\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2005,\n            \"SIR\": 20.7725,\n            \"SAR\": 12.6536,\n            \"ISR\": 16.538\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9446,\n            \"SIR\": 23.3534,\n            \"SAR\": 12.7453,\n            \"ISR\": 15.3119\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.51859,\n            \"SIR\": 19.8307,\n            \"SAR\": 8.75184,\n            \"ISR\": 13.6763\n          },\n          \"instrumental\": {\n            \"SDR\": 13.776,\n            \"SIR\": 22.8631,\n            \"SAR\": 15.2102,\n            \"ISR\": 17.2252\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.04433,\n            \"SIR\": 4.95162,\n            \"SAR\": 2.05809,\n            \"ISR\": 11.4678\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2729,\n            \"SIR\": 31.2839,\n            \"SAR\": 18.0395,\n            \"ISR\": 17.6568\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2762,\n            \"SIR\": 25.668,\n            \"SAR\": 11.0193,\n            \"ISR\": 15.543\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1582,\n            \"SIR\": 30.2847,\n            \"SAR\": 20.7154,\n            \"ISR\": 19.0361\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8025,\n            \"SIR\": 29.3939,\n            \"SAR\": 12.6197,\n            \"ISR\": 15.8903\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0312,\n            \"SIR\": 31.8475,\n            \"SAR\": 22.7848,\n            \"ISR\": 19.3608\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8863,\n            \"SIR\": 30.6522,\n            \"SAR\": 13.9807,\n            \"ISR\": 14.6258\n          },\n          \"instrumental\": {\n            \"SDR\": 15.214,\n            \"SIR\": 21.9328,\n            \"SAR\": 17.5349,\n            \"ISR\": 18.838\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.1118,\n        \"SIR\": 20.6291,\n        \"SAR\": 10.8316,\n        \"ISR\": 14.2351\n      },\n      \"instrumental\": {\n        \"SDR\": 15.3232,\n        \"SIR\": 25.1613,\n        \"SAR\": 17.525,\n        \"ISR\": 18.0793\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"UVR-MDX-NET_Inst_82_beta.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Inst_82_beta\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.97868,\n            \"SIR\": 15.9098,\n            \"SAR\": 4.93943,\n            \"ISR\": 8.46945\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8315,\n            \"SIR\": 22.5876,\n            \"SAR\": 19.2636,\n            \"ISR\": 18.0554\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 6.33976,\n            \"SIR\": 12.3039,\n            \"SAR\": 6.87203,\n            \"ISR\": 12.0359\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2058,\n            \"SIR\": 20.7697,\n            \"SAR\": 13.0201,\n            \"ISR\": 14.6704\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.44618,\n            \"SIR\": 19.41,\n            \"SAR\": 10.0434,\n            \"ISR\": 14.0118\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8485,\n            \"SIR\": 24.4099,\n            \"SAR\": 17.4376,\n            \"ISR\": 17.388\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.47122,\n            \"SIR\": 6.56028,\n            \"SAR\": 4.05764,\n            \"ISR\": 12.2868\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2159,\n            \"SIR\": 24.729,\n            \"SAR\": 13.5021,\n            \"ISR\": 13.814\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5528,\n            \"SIR\": 20.6169,\n            \"SAR\": 12.421,\n            \"ISR\": 14.8839\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6366,\n            \"SIR\": 21.5589,\n            \"SAR\": 15.0913,\n            \"ISR\": 16.3547\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.46487,\n            \"SIR\": 16.7948,\n            \"SAR\": 10.5931,\n            \"ISR\": 13.8946\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3233,\n            \"SIR\": 20.1273,\n            \"SAR\": 13.0235,\n            
\"ISR\": 15.1119\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6629,\n            \"SIR\": 18.7702,\n            \"SAR\": 11.9801,\n            \"ISR\": 13.9944\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1613,\n            \"SIR\": 21.8537,\n            \"SAR\": 16.4255,\n            \"ISR\": 16.9025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.31024,\n            \"SIR\": 14.0898,\n            \"SAR\": 4.93407,\n            \"ISR\": 6.31193\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0418,\n            \"SIR\": 26.8198,\n            \"SAR\": 22.776,\n            \"ISR\": 18.3882\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5808,\n            \"SIR\": 25.1633,\n            \"SAR\": 12.3084,\n            \"ISR\": 16.1251\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2692,\n            \"SIR\": 26.6604,\n            \"SAR\": 16.65,\n            \"ISR\": 17.7963\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7913,\n            \"SIR\": 20.3711,\n            \"SAR\": 8.6833,\n            \"ISR\": 10.9428\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5405,\n            \"SIR\": 16.5369,\n            \"SAR\": 14.3339,\n            \"ISR\": 17.1808\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4825,\n            \"SIR\": 26.7907,\n    
        \"SAR\": 15.0602,\n            \"ISR\": 17.5295\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4544,\n            \"SIR\": 30.0342,\n            \"SAR\": 18.1195,\n            \"ISR\": 18.1055\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8654,\n            \"SIR\": 20.077,\n            \"SAR\": 11.1553,\n            \"ISR\": 14.0203\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4459,\n            \"SIR\": 24.0598,\n            \"SAR\": 16.9315,\n            \"ISR\": 17.5409\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.02807,\n            \"SIR\": 17.5502,\n            \"SAR\": 7.50222,\n            \"ISR\": 12.2748\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8281,\n            \"SIR\": 24.3257,\n            \"SAR\": 17.0102,\n            \"ISR\": 16.9566\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.80679,\n            \"SIR\": 19.3428,\n            \"SAR\": 10.2089,\n            \"ISR\": 10.8592\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1631,\n            \"SIR\": 22.2432,\n            \"SAR\": 19.3452,\n            \"ISR\": 17.2753\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.11965,\n            \"SIR\": 14.4788,\n            \"SAR\": 3.33064,\n            \"ISR\": 8.66795\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6763,\n            \"SIR\": 21.5402,\n            \"SAR\": 18.8246,\n            \"ISR\": 17.7569\n          
}\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.04124,\n            \"SIR\": 18.1292,\n            \"SAR\": 9.78163,\n            \"ISR\": 14.396\n          },\n          \"instrumental\": {\n            \"SDR\": 13.398,\n            \"SIR\": 23.7996,\n            \"SAR\": 15.486,\n            \"ISR\": 16.5433\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.63307,\n            \"SIR\": 13.4695,\n            \"SAR\": 7.31423,\n            \"ISR\": 11.2282\n          },\n          \"instrumental\": {\n            \"SDR\": 13.689,\n            \"SIR\": 20.9781,\n            \"SAR\": 15.5905,\n            \"ISR\": 17.0138\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -19.243,\n            \"SIR\": -39.0255,\n            \"SAR\": 0.66646,\n            \"ISR\": 10.4374\n          },\n          \"instrumental\": {\n            \"SDR\": 11.92,\n            \"SIR\": 53.7986,\n            \"SAR\": 11.236,\n            \"ISR\": 11.0034\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.46186,\n            \"SIR\": 14.8059,\n            \"SAR\": 6.06655,\n            \"ISR\": 9.46849\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2147,\n            \"SIR\": 19.9269,\n            \"SAR\": 17.2396,\n            \"ISR\": 17.3638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87192,\n            \"SIR\": 20.6308,\n            \"SAR\": 
9.2786,\n            \"ISR\": 11.64\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2598,\n            \"SIR\": 19.9639,\n            \"SAR\": 16.8666,\n            \"ISR\": 17.8177\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.80995,\n            \"SIR\": 17.973,\n            \"SAR\": 9.84104,\n            \"ISR\": 14.1602\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3427,\n            \"SIR\": 28.1697,\n            \"SAR\": 19.6022,\n            \"ISR\": 17.7344\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -14.2101,\n            \"SIR\": -10.2625,\n            \"SAR\": 2.13489,\n            \"ISR\": 12.8683\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6652,\n            \"SIR\": 44.0013,\n            \"SAR\": 26.785,\n            \"ISR\": 18.3033\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.80457,\n            \"SIR\": 12.6507,\n            \"SAR\": 3.66269,\n            \"ISR\": 6.03163\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3415,\n            \"SIR\": 13.5011,\n            \"SAR\": 14.4417,\n            \"ISR\": 16.3849\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.9181,\n            \"SIR\": 15.5006,\n            \"SAR\": 4.01626,\n            \"ISR\": 8.32992\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5603,\n            \"SIR\": 23.4027,\n            \"SAR\": 19.8952,\n            \"ISR\": 18.1015\n          }\n        
}\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.07668,\n            \"SIR\": 16.6664,\n            \"SAR\": 7.43408,\n            \"ISR\": 10.731\n          },\n          \"instrumental\": {\n            \"SDR\": 13.866,\n            \"SIR\": 20.8703,\n            \"SAR\": 16.8175,\n            \"ISR\": 17.435\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.62121,\n            \"SIR\": 14.5528,\n            \"SAR\": 8.26119,\n            \"ISR\": 13.5425\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6368,\n            \"SIR\": 24.172,\n            \"SAR\": 16.2977,\n            \"ISR\": 16.4566\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.217,\n            \"SIR\": 21.4823,\n            \"SAR\": 11.3837,\n            \"ISR\": 13.3821\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3735,\n            \"SIR\": 19.0836,\n            \"SAR\": 13.7436,\n            \"ISR\": 16.5025\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.00672,\n            \"SIR\": 16.3591,\n            \"SAR\": 5.80263,\n            \"ISR\": 9.39829\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5378,\n            \"SIR\": 16.5914,\n            \"SAR\": 13.588,\n            \"ISR\": 16.3429\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.2233,\n            \"SIR\": 7.77928,\n            \"SAR\": 7.96899,\n            
\"ISR\": 14.4686\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7616,\n            \"SIR\": 21.9019,\n            \"SAR\": 12.9497,\n            \"ISR\": 13.7031\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.2107,\n            \"SIR\": 26.3023,\n            \"SAR\": 13.2331,\n            \"ISR\": 14.8089\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2371,\n            \"SIR\": 19.0771,\n            \"SAR\": 13.9072,\n            \"ISR\": 17.6365\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5285,\n            \"SIR\": 21.4459,\n            \"SAR\": 11.2219,\n            \"ISR\": 14.132\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5597,\n            \"SIR\": 21.5254,\n            \"SAR\": 15.2236,\n            \"ISR\": 16.8931\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1928,\n            \"SIR\": 21.6071,\n            \"SAR\": 9.88645,\n            \"ISR\": 11.775\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0795,\n            \"SIR\": 22.3702,\n            \"SAR\": 19.4496,\n            \"ISR\": 18.6247\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.21492,\n            \"SIR\": 9.27714,\n            \"SAR\": 2.07043,\n            \"ISR\": 7.40248\n          },\n          \"instrumental\": {\n            \"SDR\": 10.5473,\n            \"SIR\": 14.3239,\n            \"SAR\": 11.8278,\n            \"ISR\": 15.839\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n  
      \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5194,\n            \"SIR\": 21.2608,\n            \"SAR\": 12.1847,\n            \"ISR\": 17.3034\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5868,\n            \"SIR\": 34.249,\n            \"SAR\": 23.1577,\n            \"ISR\": 18.1323\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6511,\n            \"SIR\": 18.5914,\n            \"SAR\": 11.9932,\n            \"ISR\": 16.5261\n          },\n          \"instrumental\": {\n            \"SDR\": 11.0901,\n            \"SIR\": 22.6392,\n            \"SAR\": 11.6467,\n            \"ISR\": 14.1197\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.09959,\n            \"SIR\": 19.2745,\n            \"SAR\": 7.55956,\n            \"ISR\": 11.8409\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6853,\n            \"SIR\": 20.0304,\n            \"SAR\": 14.3729,\n            \"ISR\": 16.9422\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.72921,\n            \"SIR\": 0.83863,\n            \"SAR\": 1.55131,\n            \"ISR\": 9.45459\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7747,\n            \"SIR\": 30.0314,\n            \"SAR\": 21.0191,\n            \"ISR\": 16.8437\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.3368,\n            \"SIR\": 21.9294,\n            \"SAR\": 10.1128,\n            \"ISR\": 14.9206\n          },\n          \"instrumental\": {\n            \"SDR\": 
16.1781,\n            \"SIR\": 28.9311,\n            \"SAR\": 19.9618,\n            \"ISR\": 18.0022\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3285,\n            \"SIR\": 23.1891,\n            \"SAR\": 10.7884,\n            \"ISR\": 14.3528\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9151,\n            \"SIR\": 28.4261,\n            \"SAR\": 21.1221,\n            \"ISR\": 18.2953\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6637,\n            \"SIR\": 24.5604,\n            \"SAR\": 12.3422,\n            \"ISR\": 14.9468\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3565,\n            \"SIR\": 22.0978,\n            \"SAR\": 16.0513,\n            \"ISR\": 17.5292\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.22151,\n        \"SIR\": 18.0511,\n        \"SAR\": 8.98095,\n        \"ISR\": 12.5775\n      },\n      \"instrumental\": {\n        \"SDR\": 14.3129,\n        \"SIR\": 22.3067,\n        \"SAR\": 16.5377,\n        \"ISR\": 17.2281\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET_Inst_90_beta.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Inst_90_beta\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.20267,\n            \"SIR\": 12.2383,\n            \"SAR\": 6.46048,\n            \"ISR\": 9.42509\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2004,\n            \"SIR\": 23.5832,\n            \"SAR\": 19.5653,\n            \"ISR\": 
16.7009\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.57133,\n            \"SIR\": 9.09481,\n            \"SAR\": 6.93344,\n            \"ISR\": 12.0644\n          },\n          \"instrumental\": {\n            \"SDR\": 10.2586,\n            \"SIR\": 20.2755,\n            \"SAR\": 12.5757,\n            \"ISR\": 12.9299\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.38302,\n            \"SIR\": 16.3522,\n            \"SAR\": 10.5141,\n            \"ISR\": 14.2015\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1767,\n            \"SIR\": 24.6142,\n            \"SAR\": 17.499,\n            \"ISR\": 16.1413\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.67459,\n            \"SIR\": 4.69882,\n            \"SAR\": 4.86601,\n            \"ISR\": 13.0126\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4524,\n            \"SIR\": 25.6892,\n            \"SAR\": 13.266,\n            \"ISR\": 12.3422\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1491,\n            \"SIR\": 18.1338,\n            \"SAR\": 12.8584,\n            \"ISR\": 15.4072\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1982,\n            \"SIR\": 22.6092,\n            \"SAR\": 15.1417,\n            \"ISR\": 15.1294\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.46007,\n            \"SIR\": 15.0668,\n            \"SAR\": 11.0231,\n            \"ISR\": 
14.3746\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1035,\n            \"SIR\": 20.7546,\n            \"SAR\": 13.0315,\n            \"ISR\": 13.9626\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7217,\n            \"SIR\": 16.615,\n            \"SAR\": 12.4598,\n            \"ISR\": 14.4201\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6973,\n            \"SIR\": 22.5849,\n            \"SAR\": 16.6048,\n            \"ISR\": 15.7591\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.21906,\n            \"SIR\": 9.85829,\n            \"SAR\": 6.62323,\n            \"ISR\": 6.96987\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1991,\n            \"SIR\": 27.1252,\n            \"SAR\": 23.1972,\n            \"ISR\": 17.1665\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3584,\n            \"SIR\": 20.1837,\n            \"SAR\": 12.5645,\n            \"ISR\": 16.2482\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7105,\n            \"SIR\": 26.7251,\n            \"SAR\": 16.638,\n            \"ISR\": 16.2788\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6352,\n            \"SIR\": 16.3377,\n            \"SAR\": 9.74293,\n            \"ISR\": 11.9466\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0272,\n            \"SIR\": 18.334,\n            \"SAR\": 14.8331,\n            \"ISR\": 15.5219\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.292,\n            \"SIR\": 21.7226,\n            \"SAR\": 15.1109,\n            \"ISR\": 17.633\n          },\n          \"instrumental\": {\n            \"SDR\": 14.707,\n            \"SIR\": 30.2118,\n            \"SAR\": 18.1144,\n            \"ISR\": 16.6277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6523,\n            \"SIR\": 17.3253,\n            \"SAR\": 11.5388,\n            \"ISR\": 14.6133\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9999,\n            \"SIR\": 24.8454,\n            \"SAR\": 17.0806,\n            \"ISR\": 16.2674\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.1097,\n            \"SIR\": 14.5728,\n            \"SAR\": 7.85878,\n            \"ISR\": 12.839\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1288,\n            \"SIR\": 24.9779,\n            \"SAR\": 17.2192,\n            \"ISR\": 15.4992\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.57967,\n            \"SIR\": 15.6643,\n            \"SAR\": 10.7271,\n            \"ISR\": 11.4946\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5064,\n            \"SIR\": 22.6226,\n            \"SAR\": 18.8388,\n            \"ISR\": 15.9312\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.93704,\n            \"SIR\": 10.894,\n            \"SAR\": 5.66662,\n            \"ISR\": 8.86966\n          
},\n          \"instrumental\": {\n            \"SDR\": 14.1478,\n            \"SIR\": 21.6014,\n            \"SAR\": 18.5365,\n            \"ISR\": 16.528\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.98175,\n            \"SIR\": 16.154,\n            \"SAR\": 10.0676,\n            \"ISR\": 14.4547\n          },\n          \"instrumental\": {\n            \"SDR\": 12.7934,\n            \"SIR\": 23.7011,\n            \"SAR\": 15.5471,\n            \"ISR\": 15.4652\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.74832,\n            \"SIR\": 12.4927,\n            \"SAR\": 7.89414,\n            \"ISR\": 12.4142\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3587,\n            \"SIR\": 22.2747,\n            \"SAR\": 15.8602,\n            \"ISR\": 15.7992\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -27.4341,\n            \"SIR\": -41.7215,\n            \"SAR\": 2.36152,\n            \"ISR\": 12.0074\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3352,\n            \"SIR\": 54.3477,\n            \"SAR\": 9.52361,\n            \"ISR\": 8.93\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.16758,\n            \"SIR\": 11.4117,\n            \"SAR\": 6.74364,\n            \"ISR\": 9.93892\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5351,\n            \"SIR\": 20.3294,\n            \"SAR\": 17.5407,\n            \"ISR\": 15.8813\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara 
Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.86415,\n            \"SIR\": 17.0683,\n            \"SAR\": 10.232,\n            \"ISR\": 12.356\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9609,\n            \"SIR\": 21.2519,\n            \"SAR\": 16.9743,\n            \"ISR\": 16.5743\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.53017,\n            \"SIR\": 14.9536,\n            \"SAR\": 10.1498,\n            \"ISR\": 14.872\n          },\n          \"instrumental\": {\n            \"SDR\": 15.484,\n            \"SIR\": 29.1,\n            \"SAR\": 19.5157,\n            \"ISR\": 16.5301\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -13.0368,\n            \"SIR\": -10.6564,\n            \"SAR\": 7.59445,\n            \"ISR\": 13.2938\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3802,\n            \"SIR\": 44.1642,\n            \"SAR\": 27.1096,\n            \"ISR\": 17.0368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.87647,\n            \"SIR\": 10.8503,\n            \"SAR\": 4.26633,\n            \"ISR\": 6.29382\n          },\n          \"instrumental\": {\n            \"SDR\": 10.1952,\n            \"SIR\": 13.6309,\n            \"SAR\": 14.5818,\n            \"ISR\": 15.3893\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.5467,\n            \"SIR\": 11.5149,\n            \"SAR\": 6.95337,\n            \"ISR\": 11.755\n          },\n          
\"instrumental\": {\n            \"SDR\": 15.7581,\n            \"SIR\": 27.6261,\n            \"SAR\": 20.5288,\n            \"ISR\": 16.6283\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.99209,\n            \"SIR\": 14.0758,\n            \"SAR\": 7.92927,\n            \"ISR\": 10.9793\n          },\n          \"instrumental\": {\n            \"SDR\": 13.387,\n            \"SIR\": 21.145,\n            \"SAR\": 17.0217,\n            \"ISR\": 16.2149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.59957,\n            \"SIR\": 12.3984,\n            \"SAR\": 8.52245,\n            \"ISR\": 13.7444\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1692,\n            \"SIR\": 24.3327,\n            \"SAR\": 16.2307,\n            \"ISR\": 15.3322\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.032,\n            \"SIR\": 18.9589,\n            \"SAR\": 12.2025,\n            \"ISR\": 14.7923\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0707,\n            \"SIR\": 21.6104,\n            \"SAR\": 13.967,\n            \"ISR\": 15.2024\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.76981,\n            \"SIR\": 11.0752,\n            \"SAR\": 6.32747,\n            \"ISR\": 10.4228\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6102,\n            \"SIR\": 17.3353,\n            \"SAR\": 13.2431,\n            \"ISR\": 13.9176\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From 
The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.99845,\n            \"SIR\": 6.98281,\n            \"SAR\": 8.44448,\n            \"ISR\": 14.7222\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4324,\n            \"SIR\": 22.2574,\n            \"SAR\": 13.0607,\n            \"ISR\": 13.0165\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.072,\n            \"SIR\": 23.2516,\n            \"SAR\": 13.545,\n            \"ISR\": 14.9129\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6541,\n            \"SIR\": 19.3555,\n            \"SAR\": 14.1474,\n            \"ISR\": 16.3782\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3247,\n            \"SIR\": 18.5579,\n            \"SAR\": 11.3553,\n            \"ISR\": 14.5035\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0946,\n            \"SIR\": 21.7402,\n            \"SAR\": 15.3277,\n            \"ISR\": 15.6151\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.93231,\n            \"SIR\": 17.0268,\n            \"SAR\": 11.1058,\n            \"ISR\": 13.3245\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3262,\n            \"SIR\": 24.0168,\n            \"SAR\": 19.8755,\n            \"ISR\": 17.1591\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.97279,\n            \"SIR\": 7.16759,\n            \"SAR\": 6.58768,\n            \"ISR\": 10.5582\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8523,\n            
\"SIR\": 18.3732,\n            \"SAR\": 13.0302,\n            \"ISR\": 13.4754\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4563,\n            \"SIR\": 18.5555,\n            \"SAR\": 12.8828,\n            \"ISR\": 17.2104\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6174,\n            \"SIR\": 33.7264,\n            \"SAR\": 23.5436,\n            \"ISR\": 17.0111\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1738,\n            \"SIR\": 17.3697,\n            \"SAR\": 11.9402,\n            \"ISR\": 15.8828\n          },\n          \"instrumental\": {\n            \"SDR\": 10.6332,\n            \"SIR\": 21.7943,\n            \"SAR\": 11.6937,\n            \"ISR\": 13.4713\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.38409,\n            \"SIR\": 13.9087,\n            \"SAR\": 8.13109,\n            \"ISR\": 12.8174\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2273,\n            \"SIR\": 20.9835,\n            \"SAR\": 13.9579,\n            \"ISR\": 14.7378\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.39374,\n            \"SIR\": 0.70406,\n            \"SAR\": 3.9578,\n            \"ISR\": 9.18104\n          },\n          \"instrumental\": {\n            \"SDR\": 14.674,\n            \"SIR\": 30.4623,\n            \"SAR\": 18.3033,\n            \"ISR\": 15.6645\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 9.20524,\n            \"SIR\": 17.4444,\n            \"SAR\": 10.5626,\n            \"ISR\": 15.0667\n          },\n          \"instrumental\": {\n            \"SDR\": 15.332,\n            \"SIR\": 29.0095,\n            \"SAR\": 19.9754,\n            \"ISR\": 16.6886\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0138,\n            \"SIR\": 17.9647,\n            \"SAR\": 11.1812,\n            \"ISR\": 14.7069\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9754,\n            \"SIR\": 28.285,\n            \"SAR\": 21.0298,\n            \"ISR\": 16.9922\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6604,\n            \"SIR\": 18.9153,\n            \"SAR\": 12.9875,\n            \"ISR\": 15.9069\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8227,\n            \"SIR\": 25.1864,\n            \"SAR\": 16.1632,\n            \"ISR\": 15.4374\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.13924,\n        \"SIR\": 15.0102,\n        \"SAR\": 9.90526,\n        \"ISR\": 13.3091\n      },\n      \"instrumental\": {\n        \"SDR\": 13.8918,\n        \"SIR\": 23.1029,\n        \"SAR\": 16.6214,\n        \"ISR\": 15.7792\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET_Inst_187_beta.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET_Inst_187_beta\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.36936,\n            \"SIR\": 13.026,\n            \"SAR\": 7.20428,\n            
\"ISR\": 9.46774\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2983,\n            \"SIR\": 23.7378,\n            \"SAR\": 19.7604,\n            \"ISR\": 16.8152\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.51237,\n            \"SIR\": 8.68336,\n            \"SAR\": 7.02625,\n            \"ISR\": 13.4352\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0231,\n            \"SIR\": 22.6724,\n            \"SAR\": 12.2508,\n            \"ISR\": 12.4554\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.79824,\n            \"SIR\": 16.4021,\n            \"SAR\": 10.9665,\n            \"ISR\": 15.019\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3635,\n            \"SIR\": 26.1217,\n            \"SAR\": 17.8836,\n            \"ISR\": 16.0959\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.28764,\n            \"SIR\": 4.63104,\n            \"SAR\": 4.53183,\n            \"ISR\": 13.1727\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4408,\n            \"SIR\": 26.0525,\n            \"SAR\": 13.1905,\n            \"ISR\": 12.3168\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8284,\n            \"SIR\": 18.846,\n            \"SAR\": 13.3086,\n            \"ISR\": 16.0081\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5657,\n            \"SIR\": 23.8069,\n            \"SAR\": 15.5869,\n            \"ISR\": 15.3798\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One 
Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.7213,\n            \"SIR\": 15.6196,\n            \"SAR\": 11.2403,\n            \"ISR\": 14.6775\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1598,\n            \"SIR\": 21.3468,\n            \"SAR\": 13.2475,\n            \"ISR\": 14.1823\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9998,\n            \"SIR\": 17.2269,\n            \"SAR\": 12.8645,\n            \"ISR\": 14.7168\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9219,\n            \"SIR\": 23.188,\n            \"SAR\": 17.0147,\n            \"ISR\": 15.8638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.76283,\n            \"SIR\": 10.1612,\n            \"SAR\": 7.41427,\n            \"ISR\": 7.76716\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1556,\n            \"SIR\": 28.0721,\n            \"SAR\": 23.101,\n            \"ISR\": 17.0586\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9189,\n            \"SIR\": 21.6511,\n            \"SAR\": 12.9066,\n            \"ISR\": 16.7239\n          },\n          \"instrumental\": {\n            \"SDR\": 14.078,\n            \"SIR\": 27.9904,\n            \"SAR\": 17.1271,\n            \"ISR\": 16.6125\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.119,\n            \"SIR\": 17.5241,\n            \"SAR\": 10.1873,\n            \"ISR\": 12.0741\n          },\n          \"instrumental\": {\n         
   \"SDR\": 14.4694,\n            \"SIR\": 18.2688,\n            \"SAR\": 15.2649,\n            \"ISR\": 15.8977\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9378,\n            \"SIR\": 22.6192,\n            \"SAR\": 16.0801,\n            \"ISR\": 18.03\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0557,\n            \"SIR\": 30.9244,\n            \"SAR\": 18.6909,\n            \"ISR\": 16.8214\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9621,\n            \"SIR\": 17.2045,\n            \"SAR\": 12.1515,\n            \"ISR\": 15.504\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2716,\n            \"SIR\": 26.8404,\n            \"SAR\": 17.397,\n            \"ISR\": 16.1551\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49748,\n            \"SIR\": 15.7191,\n            \"SAR\": 8.18195,\n            \"ISR\": 13.005\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4973,\n            \"SIR\": 25.2574,\n            \"SAR\": 17.5821,\n            \"ISR\": 16.0322\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.10643,\n            \"SIR\": 16.0526,\n            \"SAR\": 11.3911,\n            \"ISR\": 11.7472\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6982,\n            \"SIR\": 23.1172,\n            \"SAR\": 19.0789,\n            \"ISR\": 16.0708\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.98627,\n            \"SIR\": 11.3211,\n            \"SAR\": 6.16117,\n            \"ISR\": 8.96469\n          },\n          \"instrumental\": {\n            \"SDR\": 14.277,\n            \"SIR\": 21.6327,\n            \"SAR\": 18.9127,\n            \"ISR\": 16.6113\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.33747,\n            \"SIR\": 16.4205,\n            \"SAR\": 10.4907,\n            \"ISR\": 14.9317\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0885,\n            \"SIR\": 24.6582,\n            \"SAR\": 15.8527,\n            \"ISR\": 15.4861\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.88963,\n            \"SIR\": 12.7681,\n            \"SAR\": 8.13672,\n            \"ISR\": 13.1047\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3498,\n            \"SIR\": 23.4772,\n            \"SAR\": 15.9515,\n            \"ISR\": 15.7655\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -25.1384,\n            \"SIR\": -38.8244,\n            \"SAR\": 1.65476,\n            \"ISR\": 11.6135\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3747,\n            \"SIR\": 56.5735,\n            \"SAR\": 12.5153,\n            \"ISR\": 11.3759\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.50226,\n            \"SIR\": 11.7269,\n            \"SAR\": 7.02281,\n            \"ISR\": 10.7734\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.6904,\n            \"SIR\": 21.3717,\n            \"SAR\": 17.9454,\n            \"ISR\": 15.8412\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2548,\n            \"SIR\": 17.5972,\n            \"SAR\": 11.2889,\n            \"ISR\": 13.9862\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4079,\n            \"SIR\": 23.8972,\n            \"SAR\": 17.2697,\n            \"ISR\": 16.5928\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.99252,\n            \"SIR\": 15.1268,\n            \"SAR\": 10.439,\n            \"ISR\": 15.2382\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6408,\n            \"SIR\": 29.1007,\n            \"SAR\": 20.2419,\n            \"ISR\": 16.5109\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.3581,\n            \"SIR\": -10.5771,\n            \"SAR\": 9.06836,\n            \"ISR\": 12.773\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2892,\n            \"SIR\": 43.0899,\n            \"SAR\": 28.9179,\n            \"ISR\": 16.9788\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.04467,\n            \"SIR\": 11.1306,\n            \"SAR\": 4.42455,\n            \"ISR\": 6.55434\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3007,\n            \"SIR\": 13.933,\n            \"SAR\": 14.5828,\n            \"ISR\": 15.4258\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.84179,\n            \"SIR\": 12.1022,\n            \"SAR\": 7.61948,\n            \"ISR\": 12.0636\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8385,\n            \"SIR\": 27.6706,\n            \"SAR\": 20.4044,\n            \"ISR\": 16.6957\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.20076,\n            \"SIR\": 14.6473,\n            \"SAR\": 8.17887,\n            \"ISR\": 11.1653\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6283,\n            \"SIR\": 21.4367,\n            \"SAR\": 17.19,\n            \"ISR\": 16.3149\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.0528,\n            \"SIR\": 12.9967,\n            \"SAR\": 8.89459,\n            \"ISR\": 14.0636\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3749,\n            \"SIR\": 25.005,\n            \"SAR\": 16.849,\n            \"ISR\": 15.4946\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3206,\n            \"SIR\": 19.6451,\n            \"SAR\": 12.4908,\n            \"ISR\": 14.9285\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2574,\n            \"SIR\": 21.7951,\n            \"SAR\": 14.2051,\n            \"ISR\": 15.4701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.13154,\n            \"SIR\": 13.0094,\n            \"SAR\": 6.66814,\n            \"ISR\": 10.7435\n          },\n          \"instrumental\": {\n            \"SDR\": 
12.1288,\n            \"SIR\": 18.1541,\n            \"SAR\": 13.6063,\n            \"ISR\": 14.6017\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.55034,\n            \"SIR\": 7.36739,\n            \"SAR\": 8.59668,\n            \"ISR\": 14.8697\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5973,\n            \"SIR\": 22.5521,\n            \"SAR\": 13.3371,\n            \"ISR\": 13.2887\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.2929,\n            \"SIR\": 24.0298,\n            \"SAR\": 14.2954,\n            \"ISR\": 15.3965\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8998,\n            \"SIR\": 20.3896,\n            \"SAR\": 14.6298,\n            \"ISR\": 16.5022\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9149,\n            \"SIR\": 19.0512,\n            \"SAR\": 12.0628,\n            \"ISR\": 15.0942\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4183,\n            \"SIR\": 22.9578,\n            \"SAR\": 15.7083,\n            \"ISR\": 15.7118\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4199,\n            \"SIR\": 17.0238,\n            \"SAR\": 11.9322,\n            \"ISR\": 14.4594\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3481,\n            \"SIR\": 26.2851,\n            \"SAR\": 20.2876,\n            \"ISR\": 17.0469\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
1.56634,\n            \"SIR\": 8.85607,\n            \"SAR\": 2.53657,\n            \"ISR\": 7.47159\n          },\n          \"instrumental\": {\n            \"SDR\": 10.4968,\n            \"SIR\": 13.9847,\n            \"SAR\": 12.7539,\n            \"ISR\": 14.9708\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3323,\n            \"SIR\": 18.5174,\n            \"SAR\": 13.25,\n            \"ISR\": 17.7224\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6316,\n            \"SIR\": 35.4979,\n            \"SAR\": 23.8839,\n            \"ISR\": 16.9231\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8046,\n            \"SIR\": 17.8196,\n            \"SAR\": 12.5165,\n            \"ISR\": 16.572\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1833,\n            \"SIR\": 22.7637,\n            \"SAR\": 12.2072,\n            \"ISR\": 13.6263\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.101,\n            \"SIR\": 16.0713,\n            \"SAR\": 8.70599,\n            \"ISR\": 13.1482\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8315,\n            \"SIR\": 21.4532,\n            \"SAR\": 14.8154,\n            \"ISR\": 15.5388\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.50855,\n            \"SIR\": 0.67222,\n            \"SAR\": 4.61358,\n            \"ISR\": 10.3423\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0653,\n            \"SIR\": 30.3329,\n            \"SAR\": 21.1636,\n       
     \"ISR\": 15.8941\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.60538,\n            \"SIR\": 17.9619,\n            \"SAR\": 10.8293,\n            \"ISR\": 15.4658\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4963,\n            \"SIR\": 29.9486,\n            \"SAR\": 20.3135,\n            \"ISR\": 16.7951\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2757,\n            \"SIR\": 18.1564,\n            \"SAR\": 11.5929,\n            \"ISR\": 15.1745\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9764,\n            \"SIR\": 29.5843,\n            \"SAR\": 21.4453,\n            \"ISR\": 16.9758\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4525,\n            \"SIR\": 21.1111,\n            \"SAR\": 13.6101,\n            \"ISR\": 15.9529\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2756,\n            \"SIR\": 24.5536,\n            \"SAR\": 17.1151,\n            \"ISR\": 16.3198\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.54948,\n        \"SIR\": 15.8859,\n        \"SAR\": 10.3132,\n        \"ISR\": 14.0249\n      },\n      \"instrumental\": {\n        \"SDR\": 14.2736,\n        \"SIR\": 23.852,\n        \"SAR\": 17.1211,\n        \"ISR\": 15.9649\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_full_292.onnx\": {\n    \"model_name\": \"MDX-Net Model VIP: UVR-MDX-NET-Inst_full_292\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A 
Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.54252,\n            \"SIR\": 15.7078,\n            \"SAR\": 5.50417,\n            \"ISR\": 10.0188\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1056,\n            \"SIR\": 24.6499,\n            \"SAR\": 19.6085,\n            \"ISR\": 18.3314\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.47415,\n            \"SIR\": 10.7711,\n            \"SAR\": 7.08723,\n            \"ISR\": 13.123\n          },\n          \"instrumental\": {\n            \"SDR\": 11.317,\n            \"SIR\": 22.4021,\n            \"SAR\": 12.8289,\n            \"ISR\": 13.9143\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1643,\n            \"SIR\": 21.0724,\n            \"SAR\": 10.6419,\n            \"ISR\": 15.1426\n          },\n          \"instrumental\": {\n            \"SDR\": 15.558,\n            \"SIR\": 26.565,\n            \"SAR\": 18.1709,\n            \"ISR\": 18.0106\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.39747,\n            \"SIR\": 5.51887,\n            \"SAR\": 4.56685,\n            \"ISR\": 13.7659\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8682,\n            \"SIR\": 27.211,\n            \"SAR\": 13.0723,\n            \"ISR\": 13.1838\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.987,\n            \"SIR\": 22.0627,\n            \"SAR\": 12.9677,\n            \"ISR\": 15.9777\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.3953,\n            \"SIR\": 24.3051,\n            \"SAR\": 15.6619,\n            \"ISR\": 17.0268\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.81381,\n            \"SIR\": 17.1199,\n            \"SAR\": 10.742,\n            \"ISR\": 14.8021\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5654,\n            \"SIR\": 21.8976,\n            \"SAR\": 13.0929,\n            \"ISR\": 15.2299\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3714,\n            \"SIR\": 20.038,\n            \"SAR\": 12.587,\n            \"ISR\": 14.8939\n          },\n          \"instrumental\": {\n            \"SDR\": 14.891,\n            \"SIR\": 23.6299,\n            \"SAR\": 17.0918,\n            \"ISR\": 17.5681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.15152,\n            \"SIR\": 17.8522,\n            \"SAR\": 5.08817,\n            \"ISR\": 8.532\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8187,\n            \"SIR\": 28.61,\n            \"SAR\": 23.4221,\n            \"ISR\": 19.1368\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9932,\n            \"SIR\": 28.4073,\n            \"SAR\": 12.853,\n            \"ISR\": 16.4699\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0332,\n            \"SIR\": 27.6154,\n            \"SAR\": 17.1807,\n            \"ISR\": 18.6237\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 11.7741,\n            \"SIR\": 20.7192,\n            \"SAR\": 10.2936,\n            \"ISR\": 12.4045\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4307,\n            \"SIR\": 19.0908,\n            \"SAR\": 15.7516,\n            \"ISR\": 17.4509\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1183,\n            \"SIR\": 30.6671,\n            \"SAR\": 15.9555,\n            \"ISR\": 17.8021\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4406,\n            \"SIR\": 30.4041,\n            \"SAR\": 19.1185,\n            \"ISR\": 19.0291\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2107,\n            \"SIR\": 20.4259,\n            \"SAR\": 11.5713,\n            \"ISR\": 15.6806\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0826,\n            \"SIR\": 27.3436,\n            \"SAR\": 17.3563,\n            \"ISR\": 17.8239\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.6928,\n            \"SIR\": 19.2459,\n            \"SAR\": 8.11505,\n            \"ISR\": 13.2198\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4035,\n            \"SIR\": 25.9502,\n            \"SAR\": 17.386,\n            \"ISR\": 17.6483\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.02842,\n            \"SIR\": 19.9925,\n            \"SAR\": 10.7817,\n            \"ISR\": 11.7508\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7368,\n         
   \"SIR\": 23.3707,\n            \"SAR\": 19.1946,\n            \"ISR\": 17.7475\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.51616,\n            \"SIR\": 16.1912,\n            \"SAR\": 3.34447,\n            \"ISR\": 9.09535\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9567,\n            \"SIR\": 22.2559,\n            \"SAR\": 18.4926,\n            \"ISR\": 18.4256\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.61379,\n            \"SIR\": 17.0044,\n            \"SAR\": 9.31084,\n            \"ISR\": 14.7581\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6567,\n            \"SIR\": 24.5011,\n            \"SAR\": 14.3967,\n            \"ISR\": 16.2093\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.29479,\n            \"SIR\": 20.8529,\n            \"SAR\": 8.52881,\n            \"ISR\": 13.5185\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5113,\n            \"SIR\": 24.4933,\n            \"SAR\": 16.7242,\n            \"ISR\": 18.1657\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -17.7931,\n            \"SIR\": -38.9964,\n            \"SAR\": 0.26585,\n            \"ISR\": 11.6109\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8178,\n            \"SIR\": 57.5807,\n            \"SAR\": 11.1146,\n            \"ISR\": 11.2028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n  
          \"SDR\": 6.61161,\n            \"SIR\": 13.6788,\n            \"SAR\": 6.68038,\n            \"ISR\": 11.0572\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5741,\n            \"SIR\": 22.0444,\n            \"SAR\": 17.6169,\n            \"ISR\": 17.0856\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.768,\n            \"SIR\": 22.2074,\n            \"SAR\": 11.09,\n            \"ISR\": 15.0508\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3527,\n            \"SIR\": 26.0894,\n            \"SAR\": 17.6055,\n            \"ISR\": 18.2836\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0469,\n            \"SIR\": 20.9816,\n            \"SAR\": 10.4421,\n            \"ISR\": 15.483\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9409,\n            \"SIR\": 31.2262,\n            \"SAR\": 20.2999,\n            \"ISR\": 18.5831\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.9116,\n            \"SIR\": -8.97781,\n            \"SAR\": -0.96472,\n            \"ISR\": 10.0981\n          },\n          \"instrumental\": {\n            \"SDR\": 19.4743,\n            \"SIR\": 40.7421,\n            \"SAR\": 29.1881,\n            \"ISR\": 18.9728\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.10791,\n            \"SIR\": 14.5987,\n            \"SAR\": 3.90009,\n            \"ISR\": 6.39505\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7603,\n          
  \"SIR\": 14.0662,\n            \"SAR\": 14.5859,\n            \"ISR\": 17.1264\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0496,\n            \"SIR\": 16.0767,\n            \"SAR\": 6.23946,\n            \"ISR\": 12.7454\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2552,\n            \"SIR\": 28.4409,\n            \"SAR\": 19.7706,\n            \"ISR\": 18.5276\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.537,\n            \"SIR\": 18.3375,\n            \"SAR\": 7.87007,\n            \"ISR\": 11.4149\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3855,\n            \"SIR\": 22.0259,\n            \"SAR\": 17.2115,\n            \"ISR\": 18.1118\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.4672,\n            \"SIR\": 16.6646,\n            \"SAR\": 8.71142,\n            \"ISR\": 14.2952\n          },\n          \"instrumental\": {\n            \"SDR\": 15.286,\n            \"SIR\": 25.8262,\n            \"SAR\": 17.1067,\n            \"ISR\": 17.1934\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5318,\n            \"SIR\": 23.4127,\n            \"SAR\": 11.874,\n            \"ISR\": 14.7266\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9211,\n            \"SIR\": 21.74,\n            \"SAR\": 14.3134,\n            \"ISR\": 17.1976\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 5.6753,\n            \"SIR\": 9.97184,\n            \"SAR\": 5.78856,\n            \"ISR\": 12.0798\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1482,\n            \"SIR\": 19.6303,\n            \"SAR\": 12.2594,\n            \"ISR\": 13.2331\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.30236,\n            \"SIR\": 6.23026,\n            \"SAR\": 7.5286,\n            \"ISR\": 15.1277\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8388,\n            \"SIR\": 23.4107,\n            \"SAR\": 12.8049,\n            \"ISR\": 13.2665\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4394,\n            \"SIR\": 28.6187,\n            \"SAR\": 14.7549,\n            \"ISR\": 16.262\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8457,\n            \"SIR\": 21.7121,\n            \"SAR\": 15.1061,\n            \"ISR\": 18.5389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2095,\n            \"SIR\": 22.9471,\n            \"SAR\": 11.6402,\n            \"ISR\": 15.3509\n          },\n          \"instrumental\": {\n            \"SDR\": 14.095,\n            \"SIR\": 23.6691,\n            \"SAR\": 15.6341,\n            \"ISR\": 17.3565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3092,\n            \"SIR\": 24.8561,\n            \"SAR\": 10.2104,\n            \"ISR\": 12.0652\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1074,\n            \"SIR\": 23.3349,\n            \"SAR\": 20.2268,\n            
\"ISR\": 18.8935\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.46394,\n            \"SIR\": 8.40823,\n            \"SAR\": 5.5441,\n            \"ISR\": 11.4008\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8089,\n            \"SIR\": 19.5895,\n            \"SAR\": 11.8569,\n            \"ISR\": 14.7945\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7168,\n            \"SIR\": 25.8381,\n            \"SAR\": 13.5305,\n            \"ISR\": 17.6718\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5516,\n            \"SIR\": 35.4711,\n            \"SAR\": 24.2257,\n            \"ISR\": 19.1836\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6445,\n            \"SIR\": 19.4317,\n            \"SAR\": 12.2014,\n            \"ISR\": 16.6169\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2919,\n            \"SIR\": 23.2338,\n            \"SAR\": 12.1039,\n            \"ISR\": 14.6652\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.34852,\n            \"SIR\": 22.1235,\n            \"SAR\": 8.43358,\n            \"ISR\": 13.3752\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5486,\n            \"SIR\": 22.2081,\n            \"SAR\": 15.0949,\n            \"ISR\": 17.6535\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.65055,\n            \"SIR\": 6.04193,\n            \"SAR\": 
2.11584,\n            \"ISR\": 10.596\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4938,\n            \"SIR\": 30.5691,\n            \"SAR\": 20.1651,\n            \"ISR\": 17.8085\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.87268,\n            \"SIR\": 24.9939,\n            \"SAR\": 10.489,\n            \"ISR\": 15.3587\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8517,\n            \"SIR\": 30.1136,\n            \"SAR\": 20.3862,\n            \"ISR\": 18.8673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1302,\n            \"SIR\": 27.8748,\n            \"SAR\": 11.5186,\n            \"ISR\": 15.5401\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7366,\n            \"SIR\": 30.7493,\n            \"SAR\": 21.875,\n            \"ISR\": 19.1881\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9939,\n            \"SIR\": 26.5542,\n            \"SAR\": 13.726,\n            \"ISR\": 16.7207\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4001,\n            \"SIR\": 26.7834,\n            \"SAR\": 17.2413,\n            \"ISR\": 18.1632\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.54049,\n        \"SIR\": 19.7121,\n        \"SAR\": 9.76062,\n        \"ISR\": 14.0305\n      },\n      \"instrumental\": {\n        \"SDR\": 15.0579,\n        \"SIR\": 24.4972,\n        \"SAR\": 17.1961,\n        \"ISR\": 17.778\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  
},\n  \"htdemucs_ft.yaml\": {\n    \"model_name\": \"Demucs v4: htdemucs_ft\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.08748,\n            \"SIR\": 17.144,\n            \"SAR\": 6.11034,\n            \"ISR\": 12.6967\n          },\n          \"drums\": {\n            \"SDR\": 5.1398,\n            \"SIR\": 16.7632,\n            \"SAR\": 4.47122,\n            \"ISR\": 8.87269\n          },\n          \"bass\": {\n            \"SDR\": 19.6061,\n            \"SIR\": 26.9475,\n            \"SAR\": 19.9212,\n            \"ISR\": 25.6764\n          },\n          \"other\": {\n            \"SDR\": 10.379,\n            \"SIR\": 15.77,\n            \"SAR\": 11.5282,\n            \"ISR\": 20.5043\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.87915,\n            \"SIR\": 19.0441,\n            \"SAR\": 7.86232,\n            \"ISR\": 14.6037\n          },\n          \"drums\": {\n            \"SDR\": 9.49839,\n            \"SIR\": 20.977,\n            \"SAR\": 10.2955,\n            \"ISR\": 14.0041\n          },\n          \"bass\": {\n            \"SDR\": 12.555,\n            \"SIR\": 20.7999,\n            \"SAR\": 14.0735,\n            \"ISR\": 15.4626\n          },\n          \"other\": {\n            \"SDR\": 8.49975,\n            \"SIR\": 14.2335,\n            \"SAR\": 9.43418,\n            \"ISR\": 16.7411\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8088,\n            \"SIR\": 18.8631,\n            \"SAR\": 10.8714,\n            \"ISR\": 20.7598\n          },\n          \"drums\": {\n            \"SDR\": 12.3337,\n            \"SIR\": 22.0962,\n            \"SAR\": 12.4767,\n            \"ISR\": 21.6261\n          
},\n          \"bass\": {\n            \"SDR\": 11.3562,\n            \"SIR\": 22.8976,\n            \"SAR\": 9.73282,\n            \"ISR\": 12.9839\n          },\n          \"other\": {\n            \"SDR\": 7.05993,\n            \"SIR\": 12.191,\n            \"SAR\": 6.20555,\n            \"ISR\": 13.8254\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.95752,\n            \"SIR\": 4.50647,\n            \"SAR\": 5.77934,\n            \"ISR\": 15.4828\n          },\n          \"drums\": {\n            \"SDR\": 8.88779,\n            \"SIR\": 16.2661,\n            \"SAR\": 10.0191,\n            \"ISR\": 17.5029\n          },\n          \"bass\": {\n            \"SDR\": 4.93095,\n            \"SIR\": 24.8171,\n            \"SAR\": 5.56008,\n            \"ISR\": 6.97457\n          },\n          \"other\": {\n            \"SDR\": -0.39985,\n            \"SIR\": -2.64823,\n            \"SAR\": 1.72115,\n            \"ISR\": 4.62734\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8694,\n            \"SIR\": 20.2473,\n            \"SAR\": 12.2777,\n            \"ISR\": 20.4483\n          },\n          \"drums\": {\n            \"SDR\": 10.0244,\n            \"SIR\": 21.133,\n            \"SAR\": 10.997,\n            \"ISR\": 16.7617\n          },\n          \"bass\": {\n            \"SDR\": 11.6992,\n            \"SIR\": 21.4431,\n            \"SAR\": 11.629,\n            \"ISR\": 17.3537\n          },\n          \"other\": {\n            \"SDR\": 7.96476,\n            \"SIR\": 13.3619,\n            \"SAR\": 7.95521,\n            \"ISR\": 14.1681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.94422,\n            
\"SIR\": 14.9863,\n            \"SAR\": 9.64679,\n            \"ISR\": 16.4351\n          },\n          \"drums\": {\n            \"SDR\": 9.78024,\n            \"SIR\": 18.8449,\n            \"SAR\": 11.0301,\n            \"ISR\": 14.7087\n          },\n          \"bass\": {\n            \"SDR\": 7.90541,\n            \"SIR\": 9.18312,\n            \"SAR\": 7.63197,\n            \"ISR\": 16.4765\n          },\n          \"other\": {\n            \"SDR\": 6.29039,\n            \"SIR\": 12.3169,\n            \"SAR\": 6.34577,\n            \"ISR\": 9.87795\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.036,\n            \"SIR\": 17.9162,\n            \"SAR\": 11.6161,\n            \"ISR\": 19.0923\n          },\n          \"drums\": {\n            \"SDR\": 10.1848,\n            \"SIR\": 20.0882,\n            \"SAR\": 11.3395,\n            \"ISR\": 16.0502\n          },\n          \"bass\": {\n            \"SDR\": 17.7502,\n            \"SIR\": 27.2123,\n            \"SAR\": 14.336,\n            \"ISR\": 17.6888\n          },\n          \"other\": {\n            \"SDR\": 7.87695,\n            \"SIR\": 12.534,\n            \"SAR\": 7.61291,\n            \"ISR\": 14.4614\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.68739,\n            \"SIR\": 19.6404,\n            \"SAR\": 6.70557,\n            \"ISR\": 12.4074\n          },\n          \"drums\": {\n            \"SDR\": 10.2345,\n            \"SIR\": 18.1301,\n            \"SAR\": 10.7862,\n            \"ISR\": 18.0684\n          },\n          \"bass\": {\n            \"SDR\": 17.9564,\n            \"SIR\": 30.6188,\n            \"SAR\": 16.498,\n            \"ISR\": 19.6131\n          },\n          \"other\": {\n            \"SDR\": 7.5472,\n            \"SIR\": 11.3519,\n 
           \"SAR\": 7.54351,\n            \"ISR\": 14.3772\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8404,\n            \"SIR\": 27.6138,\n            \"SAR\": 11.9268,\n            \"ISR\": 22.1581\n          },\n          \"drums\": {\n            \"SDR\": 8.31121,\n            \"SIR\": 18.0997,\n            \"SAR\": 8.71986,\n            \"ISR\": 15.0845\n          },\n          \"bass\": {\n            \"SDR\": 10.3589,\n            \"SIR\": 22.6435,\n            \"SAR\": 9.97587,\n            \"ISR\": 14.6063\n          },\n          \"other\": {\n            \"SDR\": 11.1281,\n            \"SIR\": 17.8399,\n            \"SAR\": 11.7408,\n            \"ISR\": 19.6241\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0388,\n            \"SIR\": 18.5112,\n            \"SAR\": 10.0006,\n            \"ISR\": 16.282\n          },\n          \"drums\": {\n            \"SDR\": 10.2561,\n            \"SIR\": 19.7651,\n            \"SAR\": 10.3934,\n            \"ISR\": 16.4898\n          },\n          \"bass\": {\n            \"SDR\": 16.1016,\n            \"SIR\": 30.238,\n            \"SAR\": 15.0562,\n            \"ISR\": 21.7911\n          },\n          \"other\": {\n            \"SDR\": 11.3816,\n            \"SIR\": 15.8239,\n            \"SAR\": 10.451,\n            \"ISR\": 17.1779\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1298,\n            \"SIR\": 23.1288,\n            \"SAR\": 13.7556,\n            \"ISR\": 25.9922\n          },\n          \"drums\": {\n            \"SDR\": 11.2861,\n            \"SIR\": 17.9969,\n            \"SAR\": 11.0163,\n            
\"ISR\": 18.9596\n          },\n          \"bass\": {\n            \"SDR\": 12.0199,\n            \"SIR\": 20.3359,\n            \"SAR\": 7.91919,\n            \"ISR\": 9.77948\n          },\n          \"other\": {\n            \"SDR\": 10.1695,\n            \"SIR\": 12.4923,\n            \"SAR\": 8.84543,\n            \"ISR\": 20.3318\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7932,\n            \"SIR\": 19.5952,\n            \"SAR\": 10.9821,\n            \"ISR\": 19.8238\n          },\n          \"drums\": {\n            \"SDR\": 12.7738,\n            \"SIR\": 22.425,\n            \"SAR\": 13.7751,\n            \"ISR\": 20.141\n          },\n          \"bass\": {\n            \"SDR\": 11.4445,\n            \"SIR\": 20.1414,\n            \"SAR\": 11.9458,\n            \"ISR\": 20.2774\n          },\n          \"other\": {\n            \"SDR\": 6.91353,\n            \"SIR\": 13.7917,\n            \"SAR\": 7.03032,\n            \"ISR\": 12.8673\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.0187,\n            \"SIR\": 17.2363,\n            \"SAR\": 7.372,\n            \"ISR\": 15.644\n          },\n          \"drums\": {\n            \"SDR\": 7.09524,\n            \"SIR\": 18.0718,\n            \"SAR\": 7.31846,\n            \"ISR\": 13.0651\n          },\n          \"bass\": {\n            \"SDR\": 13.9639,\n            \"SIR\": 25.1196,\n            \"SAR\": 14.7761,\n            \"ISR\": 22.8725\n          },\n          \"other\": {\n            \"SDR\": 10.7082,\n            \"SIR\": 17.8976,\n            \"SAR\": 10.1184,\n            \"ISR\": 16.4778\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.7932,\n        \"SIR\": 
18.8631,\n        \"SAR\": 10.0006,\n        \"ISR\": 16.4351\n      },\n      \"drums\": {\n        \"SDR\": 10.0244,\n        \"SIR\": 18.8449,\n        \"SAR\": 10.7862,\n        \"ISR\": 16.4898\n      },\n      \"bass\": {\n        \"SDR\": 12.0199,\n        \"SIR\": 22.8976,\n        \"SAR\": 11.9458,\n        \"ISR\": 17.3537\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"drums\",\n      \"bass\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"htdemucs.yaml\": {\n    \"model_name\": \"Demucs v4: htdemucs\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.57989,\n            \"SIR\": 15.919,\n            \"SAR\": 5.6672,\n            \"ISR\": 12.5146\n          },\n          \"drums\": {\n            \"SDR\": 3.4786,\n            \"SIR\": 15.3778,\n            \"SAR\": 2.08293,\n            \"ISR\": 5.22501\n          },\n          \"bass\": {\n            \"SDR\": 18.5115,\n            \"SIR\": 22.9108,\n            \"SAR\": 19.5757,\n            \"ISR\": 25.8449\n          },\n          \"other\": {\n            \"SDR\": 9.94838,\n            \"SIR\": 15.6823,\n            \"SAR\": 11.0621,\n            \"ISR\": 19.3189\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.39182,\n            \"SIR\": 18.1421,\n            \"SAR\": 7.33519,\n            \"ISR\": 13.7516\n          },\n          \"drums\": {\n            \"SDR\": 8.40161,\n            \"SIR\": 19.6676,\n            \"SAR\": 8.92091,\n            \"ISR\": 12.633\n          },\n          \"bass\": {\n            \"SDR\": 11.909,\n            \"SIR\": 19.711,\n            \"SAR\": 13.632,\n            \"ISR\": 14.9125\n          },\n          \"other\": {\n            \"SDR\": 7.77882,\n            \"SIR\": 12.9784,\n            \"SAR\": 
8.84332,\n            \"ISR\": 16.1011\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1811,\n            \"SIR\": 17.5339,\n            \"SAR\": 10.3123,\n            \"ISR\": 19.459\n          },\n          \"drums\": {\n            \"SDR\": 10.4373,\n            \"SIR\": 18.3586,\n            \"SAR\": 10.8625,\n            \"ISR\": 22.2567\n          },\n          \"bass\": {\n            \"SDR\": 9.19871,\n            \"SIR\": 19.7823,\n            \"SAR\": 8.32562,\n            \"ISR\": 12.5227\n          },\n          \"other\": {\n            \"SDR\": 5.53135,\n            \"SIR\": 10.962,\n            \"SAR\": 4.45932,\n            \"ISR\": 11.3536\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.2606,\n            \"SIR\": 4.69423,\n            \"SAR\": 5.8341,\n            \"ISR\": 15.8249\n          },\n          \"drums\": {\n            \"SDR\": 8.19452,\n            \"SIR\": 14.4369,\n            \"SAR\": 9.44168,\n            \"ISR\": 18.1507\n          },\n          \"bass\": {\n            \"SDR\": 6.28668,\n            \"SIR\": 23.4279,\n            \"SAR\": 7.212,\n            \"ISR\": 9.27559\n          },\n          \"other\": {\n            \"SDR\": 0.55542,\n            \"SIR\": 1.21317,\n            \"SAR\": 0.23462,\n            \"ISR\": 4.11639\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4819,\n            \"SIR\": 19.0492,\n            \"SAR\": 11.8735,\n            \"ISR\": 20.3725\n          },\n          \"drums\": {\n            \"SDR\": 9.48768,\n            \"SIR\": 20.1542,\n            \"SAR\": 10.3816,\n            \"ISR\": 15.8755\n          },\n          \"bass\": {\n          
  \"SDR\": 11.6443,\n            \"SIR\": 20.7386,\n            \"SAR\": 11.3031,\n            \"ISR\": 17.2754\n          },\n          \"other\": {\n            \"SDR\": 7.58634,\n            \"SIR\": 13.272,\n            \"SAR\": 7.48369,\n            \"ISR\": 13.0932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.82893,\n            \"SIR\": 14.8728,\n            \"SAR\": 9.54323,\n            \"ISR\": 16.0121\n          },\n          \"drums\": {\n            \"SDR\": 9.36481,\n            \"SIR\": 18.7192,\n            \"SAR\": 10.701,\n            \"ISR\": 14.2709\n          },\n          \"bass\": {\n            \"SDR\": 8.11879,\n            \"SIR\": 14.1439,\n            \"SAR\": 7.86138,\n            \"ISR\": 15.0381\n          },\n          \"other\": {\n            \"SDR\": 6.64456,\n            \"SIR\": 12.3823,\n            \"SAR\": 7.11786,\n            \"ISR\": 11.6687\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2856,\n            \"SIR\": 17.4402,\n            \"SAR\": 10.8917,\n            \"ISR\": 18.1344\n          },\n          \"drums\": {\n            \"SDR\": 9.44627,\n            \"SIR\": 19.3698,\n            \"SAR\": 10.5507,\n            \"ISR\": 15.4531\n          },\n          \"bass\": {\n            \"SDR\": 16.8356,\n            \"SIR\": 25.6126,\n            \"SAR\": 14.164,\n            \"ISR\": 17.6165\n          },\n          \"other\": {\n            \"SDR\": 7.46841,\n            \"SIR\": 12.1285,\n            \"SAR\": 7.00531,\n            \"ISR\": 13.5816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.15287,\n            \"SIR\": 18.4751,\n            
\"SAR\": 6.21281,\n            \"ISR\": 11.6394\n          },\n          \"drums\": {\n            \"SDR\": 9.453,\n            \"SIR\": 17.7372,\n            \"SAR\": 10.1852,\n            \"ISR\": 16.6252\n          },\n          \"bass\": {\n            \"SDR\": 17.4451,\n            \"SIR\": 29.9173,\n            \"SAR\": 15.9392,\n            \"ISR\": 19.2901\n          },\n          \"other\": {\n            \"SDR\": 7.01304,\n            \"SIR\": 10.2876,\n            \"SAR\": 7.1112,\n            \"ISR\": 14.2461\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.925,\n            \"SIR\": 25.9155,\n            \"SAR\": 11.397,\n            \"ISR\": 20.0906\n          },\n          \"drums\": {\n            \"SDR\": 7.77766,\n            \"SIR\": 17.8595,\n            \"SAR\": 8.13634,\n            \"ISR\": 15.0445\n          },\n          \"bass\": {\n            \"SDR\": 10.1941,\n            \"SIR\": 22.6631,\n            \"SAR\": 9.63813,\n            \"ISR\": 14.0471\n          },\n          \"other\": {\n            \"SDR\": 10.9618,\n            \"SIR\": 17.5331,\n            \"SAR\": 11.5776,\n            \"ISR\": 19.6973\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.86742,\n            \"SIR\": 17.6698,\n            \"SAR\": 9.09544,\n            \"ISR\": 14.7304\n          },\n          \"drums\": {\n            \"SDR\": 9.65639,\n            \"SIR\": 18.3177,\n            \"SAR\": 9.66205,\n            \"ISR\": 15.5911\n          },\n          \"bass\": {\n            \"SDR\": 15.3681,\n            \"SIR\": 29.0374,\n            \"SAR\": 14.3688,\n            \"ISR\": 20.5504\n          },\n          \"other\": {\n            \"SDR\": 10.7408,\n            \"SIR\": 14.9256,\n            
\"SAR\": 9.80483,\n            \"ISR\": 16.4024\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2306,\n            \"SIR\": 22.3695,\n            \"SAR\": 13.1633,\n            \"ISR\": 24.1674\n          },\n          \"drums\": {\n            \"SDR\": 10.0162,\n            \"SIR\": 14.3347,\n            \"SAR\": 9.38603,\n            \"ISR\": 17.3695\n          },\n          \"bass\": {\n            \"SDR\": 11.0374,\n            \"SIR\": 19.7294,\n            \"SAR\": 7.4226,\n            \"ISR\": 8.89045\n          },\n          \"other\": {\n            \"SDR\": 9.09514,\n            \"SIR\": 11.4196,\n            \"SAR\": 7.90024,\n            \"ISR\": 17.413\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3326,\n            \"SIR\": 18.4057,\n            \"SAR\": 10.577,\n            \"ISR\": 19.2448\n          },\n          \"drums\": {\n            \"SDR\": 12.174,\n            \"SIR\": 21.7936,\n            \"SAR\": 13.3275,\n            \"ISR\": 19.0528\n          },\n          \"bass\": {\n            \"SDR\": 10.9604,\n            \"SIR\": 19.8011,\n            \"SAR\": 11.62,\n            \"ISR\": 19.195\n          },\n          \"other\": {\n            \"SDR\": 6.4247,\n            \"SIR\": 12.8973,\n            \"SAR\": 6.68105,\n            \"ISR\": 12.1017\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.48775,\n            \"SIR\": 16.9004,\n            \"SAR\": 6.89733,\n            \"ISR\": 14.7807\n          },\n          \"drums\": {\n            \"SDR\": 6.63689,\n            \"SIR\": 16.8225,\n            \"SAR\": 6.69713,\n            \"ISR\": 
12.3227\n          },\n          \"bass\": {\n            \"SDR\": 13.5712,\n            \"SIR\": 23.3432,\n            \"SAR\": 14.5354,\n            \"ISR\": 22.564\n          },\n          \"other\": {\n            \"SDR\": 10.6269,\n            \"SIR\": 17.8247,\n            \"SAR\": 9.36735,\n            \"ISR\": 15.6004\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.86742,\n        \"SIR\": 17.6698,\n        \"SAR\": 9.54323,\n        \"ISR\": 16.0121\n      },\n      \"drums\": {\n        \"SDR\": 9.44627,\n        \"SIR\": 18.3177,\n        \"SAR\": 9.66205,\n        \"ISR\": 15.5911\n      },\n      \"bass\": {\n        \"SDR\": 11.6443,\n        \"SIR\": 22.6631,\n        \"SAR\": 11.62,\n        \"ISR\": 17.2754\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"drums\",\n      \"bass\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"hdemucs_mmi.yaml\": {\n    \"model_name\": \"Demucs v4: hdemucs_mmi\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.56426,\n            \"SIR\": 15.3421,\n            \"SAR\": 5.81316,\n            \"ISR\": 13.3612\n          },\n          \"drums\": {\n            \"SDR\": 4.76153,\n            \"SIR\": 17.0363,\n            \"SAR\": 3.95759,\n            \"ISR\": 7.58476\n          },\n          \"bass\": {\n            \"SDR\": 19.1023,\n            \"SIR\": 24.9967,\n            \"SAR\": 20.0212,\n            \"ISR\": 26.5658\n          },\n          \"other\": {\n            \"SDR\": 9.95157,\n            \"SIR\": 16.1186,\n            \"SAR\": 11.1911,\n            \"ISR\": 18.8137\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.4434,\n            \"SIR\": 18.4871,\n            \"SAR\": 7.33728,\n   
         \"ISR\": 14.0059\n          },\n          \"drums\": {\n            \"SDR\": 8.96928,\n            \"SIR\": 21.1526,\n            \"SAR\": 10.1224,\n            \"ISR\": 12.9924\n          },\n          \"bass\": {\n            \"SDR\": 12.7492,\n            \"SIR\": 19.4508,\n            \"SAR\": 13.3306,\n            \"ISR\": 16.4465\n          },\n          \"other\": {\n            \"SDR\": 7.773,\n            \"SIR\": 13.5621,\n            \"SAR\": 8.92433,\n            \"ISR\": 16.5778\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4397,\n            \"SIR\": 17.8346,\n            \"SAR\": 10.4412,\n            \"ISR\": 19.5344\n          },\n          \"drums\": {\n            \"SDR\": 11.3684,\n            \"SIR\": 20.2939,\n            \"SAR\": 11.5673,\n            \"ISR\": 21.5121\n          },\n          \"bass\": {\n            \"SDR\": 11.2627,\n            \"SIR\": 21.99,\n            \"SAR\": 9.48758,\n            \"ISR\": 13.1898\n          },\n          \"other\": {\n            \"SDR\": 6.23897,\n            \"SIR\": 11.8417,\n            \"SAR\": 5.5416,\n            \"ISR\": 12.7981\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.06505,\n            \"SIR\": 5.26505,\n            \"SAR\": 5.28702,\n            \"ISR\": 15.1863\n          },\n          \"drums\": {\n            \"SDR\": 8.99905,\n            \"SIR\": 16.9155,\n            \"SAR\": 10.0134,\n            \"ISR\": 17.2106\n          },\n          \"bass\": {\n            \"SDR\": 6.30738,\n            \"SIR\": 20.8858,\n            \"SAR\": 7.01856,\n            \"ISR\": 8.89273\n          },\n          \"other\": {\n            \"SDR\": 0.16473,\n            \"SIR\": -0.41634,\n            \"SAR\": 0.47168,\n            \"ISR\": 4.62325\n     
     }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4024,\n            \"SIR\": 19.8252,\n            \"SAR\": 11.7951,\n            \"ISR\": 19.8887\n          },\n          \"drums\": {\n            \"SDR\": 9.82706,\n            \"SIR\": 20.2907,\n            \"SAR\": 10.7675,\n            \"ISR\": 15.9836\n          },\n          \"bass\": {\n            \"SDR\": 12.2277,\n            \"SIR\": 21.215,\n            \"SAR\": 11.9833,\n            \"ISR\": 17.5396\n          },\n          \"other\": {\n            \"SDR\": 7.83546,\n            \"SIR\": 13.0879,\n            \"SAR\": 7.86218,\n            \"ISR\": 13.9801\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.74169,\n            \"SIR\": 14.1201,\n            \"SAR\": 9.72683,\n            \"ISR\": 16.3828\n          },\n          \"drums\": {\n            \"SDR\": 9.63568,\n            \"SIR\": 18.6127,\n            \"SAR\": 11.0021,\n            \"ISR\": 14.2895\n          },\n          \"bass\": {\n            \"SDR\": 8.64882,\n            \"SIR\": 15.1677,\n            \"SAR\": 8.37584,\n            \"ISR\": 14.0595\n          },\n          \"other\": {\n            \"SDR\": 6.76545,\n            \"SIR\": 12.2364,\n            \"SAR\": 7.54473,\n            \"ISR\": 12.0405\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3752,\n            \"SIR\": 16.6912,\n            \"SAR\": 11.2973,\n            \"ISR\": 18.7731\n          },\n          \"drums\": {\n            \"SDR\": 9.92437,\n            \"SIR\": 19.1125,\n            \"SAR\": 10.9707,\n            \"ISR\": 15.7898\n          },\n          \"bass\": {\n            \"SDR\": 18.0335,\n 
           \"SIR\": 28.3402,\n            \"SAR\": 14.6807,\n            \"ISR\": 17.7093\n          },\n          \"other\": {\n            \"SDR\": 7.6504,\n            \"SIR\": 12.427,\n            \"SAR\": 7.33266,\n            \"ISR\": 13.8777\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.47528,\n            \"SIR\": 18.6816,\n            \"SAR\": 6.52616,\n            \"ISR\": 12.4384\n          },\n          \"drums\": {\n            \"SDR\": 9.27946,\n            \"SIR\": 17.0606,\n            \"SAR\": 9.8061,\n            \"ISR\": 16.807\n          },\n          \"bass\": {\n            \"SDR\": 17.2024,\n            \"SIR\": 28.3713,\n            \"SAR\": 15.8922,\n            \"ISR\": 19.6174\n          },\n          \"other\": {\n            \"SDR\": 7.28211,\n            \"SIR\": 10.9266,\n            \"SAR\": 7.11841,\n            \"ISR\": 13.9155\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.343,\n            \"SIR\": 27.1768,\n            \"SAR\": 11.8133,\n            \"ISR\": 20.8925\n          },\n          \"drums\": {\n            \"SDR\": 8.0797,\n            \"SIR\": 18.5335,\n            \"SAR\": 8.60186,\n            \"ISR\": 15.311\n          },\n          \"bass\": {\n            \"SDR\": 10.1394,\n            \"SIR\": 24.832,\n            \"SAR\": 9.70667,\n            \"ISR\": 13.4208\n          },\n          \"other\": {\n            \"SDR\": 11.4582,\n            \"SIR\": 17.4309,\n            \"SAR\": 11.9787,\n            \"ISR\": 21.244\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.166,\n            \"SIR\": 18.4076,\n            \"SAR\": 9.40416,\n      
      \"ISR\": 15.2365\n          },\n          \"drums\": {\n            \"SDR\": 9.99903,\n            \"SIR\": 18.2487,\n            \"SAR\": 10.1007,\n            \"ISR\": 16.4779\n          },\n          \"bass\": {\n            \"SDR\": 15.9573,\n            \"SIR\": 30.6784,\n            \"SAR\": 14.9032,\n            \"ISR\": 21.2961\n          },\n          \"other\": {\n            \"SDR\": 11.0211,\n            \"SIR\": 15.3572,\n            \"SAR\": 10.0456,\n            \"ISR\": 16.9093\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3994,\n            \"SIR\": 22.7286,\n            \"SAR\": 13.1524,\n            \"ISR\": 24.0039\n          },\n          \"drums\": {\n            \"SDR\": 10.4857,\n            \"SIR\": 16.1989,\n            \"SAR\": 10.5346,\n            \"ISR\": 18.7091\n          },\n          \"bass\": {\n            \"SDR\": 11.8582,\n            \"SIR\": 21.4109,\n            \"SAR\": 8.26932,\n            \"ISR\": 10.35\n          },\n          \"other\": {\n            \"SDR\": 9.60429,\n            \"SIR\": 12.8069,\n            \"SAR\": 8.63633,\n            \"ISR\": 18.5483\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.44,\n            \"SIR\": 18.3898,\n            \"SAR\": 10.5196,\n            \"ISR\": 19.6375\n          },\n          \"drums\": {\n            \"SDR\": 11.9998,\n            \"SIR\": 20.8655,\n            \"SAR\": 13.0245,\n            \"ISR\": 19.6137\n          },\n          \"bass\": {\n            \"SDR\": 10.7698,\n            \"SIR\": 19.1833,\n            \"SAR\": 11.2627,\n            \"ISR\": 18.7348\n          },\n          \"other\": {\n            \"SDR\": 6.34637,\n            \"SIR\": 13.0994,\n            \"SAR\": 
6.5491,\n            \"ISR\": 11.8594\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.54255,\n            \"SIR\": 16.6517,\n            \"SAR\": 7.06569,\n            \"ISR\": 14.773\n          },\n          \"drums\": {\n            \"SDR\": 6.66935,\n            \"SIR\": 16.3642,\n            \"SAR\": 6.84015,\n            \"ISR\": 12.5482\n          },\n          \"bass\": {\n            \"SDR\": 14.5047,\n            \"SIR\": 23.776,\n            \"SAR\": 15.0442,\n            \"ISR\": 23.5945\n          },\n          \"other\": {\n            \"SDR\": 10.7118,\n            \"SIR\": 17.9435,\n            \"SAR\": 9.71123,\n            \"ISR\": 15.856\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.166,\n        \"SIR\": 18.3898,\n        \"SAR\": 9.72683,\n        \"ISR\": 16.3828\n      },\n      \"drums\": {\n        \"SDR\": 9.63568,\n        \"SIR\": 18.5335,\n        \"SAR\": 10.1224,\n        \"ISR\": 15.9836\n      },\n      \"bass\": {\n        \"SDR\": 12.2277,\n        \"SIR\": 21.99,\n        \"SAR\": 11.9833,\n        \"ISR\": 17.5396\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"drums\",\n      \"bass\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"htdemucs_6s.yaml\": {\n    \"model_name\": \"Demucs v4: htdemucs_6s\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.0671,\n            \"SIR\": 14.5366,\n            \"SAR\": 5.39175,\n            \"ISR\": 13.2148\n          },\n          \"drums\": {\n            \"SDR\": 2.79582,\n            \"SIR\": 16.3173,\n            \"SAR\": 1.06495,\n            \"ISR\": 4.3619\n          },\n          \"bass\": {\n            \"SDR\": 17.5429,\n            \"SIR\": 
22.165,\n            \"SAR\": 19.0201,\n            \"ISR\": 25.5315\n          },\n          \"other\": {\n            \"SDR\": 0.00485,\n            \"SIR\": 12.9123,\n            \"SAR\": 0.02412,\n            \"ISR\": 0.13107\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.63418,\n            \"SIR\": 10.3112,\n            \"SAR\": 6.45533,\n            \"ISR\": 14.8126\n          },\n          \"drums\": {\n            \"SDR\": 6.25735,\n            \"SIR\": 12.8745,\n            \"SAR\": 6.40789,\n            \"ISR\": 12.4119\n          },\n          \"bass\": {\n            \"SDR\": 9.5516,\n            \"SIR\": 17.7919,\n            \"SAR\": 11.9827,\n            \"ISR\": 11.8647\n          },\n          \"other\": {\n            \"SDR\": 3.64451,\n            \"SIR\": 10.5018,\n            \"SAR\": 4.14296,\n            \"ISR\": 6.22975\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.70294,\n            \"SIR\": 15.0099,\n            \"SAR\": 9.54043,\n            \"ISR\": 19.8526\n          },\n          \"drums\": {\n            \"SDR\": 7.11087,\n            \"SIR\": 11.8272,\n            \"SAR\": 7.93243,\n            \"ISR\": 15.9744\n          },\n          \"bass\": {\n            \"SDR\": 1.74278,\n            \"SIR\": 3.81685,\n            \"SAR\": 8.21649,\n            \"ISR\": 13.9148\n          },\n          \"other\": {\n            \"SDR\": 0.13952,\n            \"SIR\": 7.06959,\n            \"SAR\": -1.65524,\n            \"ISR\": 0.40887\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.08539,\n            \"SIR\": 4.85948,\n            \"SAR\": 5.63099,\n            \"ISR\": 15.7776\n          },\n   
       \"drums\": {\n            \"SDR\": 9.07883,\n            \"SIR\": 17.2557,\n            \"SAR\": 10.2048,\n            \"ISR\": 16.9867\n          },\n          \"bass\": {\n            \"SDR\": 7.49909,\n            \"SIR\": 22.7875,\n            \"SAR\": 7.81107,\n            \"ISR\": 9.45955\n          },\n          \"other\": {\n            \"SDR\": 0.00127,\n            \"SIR\": 8.74872,\n            \"SAR\": -0.0027,\n            \"ISR\": 0.06903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7929,\n            \"SIR\": 19.1612,\n            \"SAR\": 10.36,\n            \"ISR\": 15.8442\n          },\n          \"drums\": {\n            \"SDR\": 8.5171,\n            \"SIR\": 18.4964,\n            \"SAR\": 9.6957,\n            \"ISR\": 14.3901\n          },\n          \"bass\": {\n            \"SDR\": 10.9652,\n            \"SIR\": 19.6768,\n            \"SAR\": 10.7187,\n            \"ISR\": 17.096\n          },\n          \"other\": {\n            \"SDR\": 0.00125,\n            \"SIR\": -5.48266,\n            \"SAR\": -0.00771,\n            \"ISR\": 0.28972\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.20336,\n            \"SIR\": 14.534,\n            \"SAR\": 8.71192,\n            \"ISR\": 14.5608\n          },\n          \"drums\": {\n            \"SDR\": 8.66795,\n            \"SIR\": 16.6126,\n            \"SAR\": 10.1273,\n            \"ISR\": 13.5402\n          },\n          \"bass\": {\n            \"SDR\": 7.20517,\n            \"SIR\": 6.83853,\n            \"SAR\": 6.68404,\n            \"ISR\": 15.6228\n          },\n          \"other\": {\n            \"SDR\": 0.02392,\n            \"SIR\": 7.57753,\n            \"SAR\": -0.00311,\n            \"ISR\": 0.30841\n          }\n        }\n      },\n 
     {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.43828,\n            \"SIR\": 16.808,\n            \"SAR\": 10.1425,\n            \"ISR\": 15.452\n          },\n          \"drums\": {\n            \"SDR\": 8.94208,\n            \"SIR\": 17.6703,\n            \"SAR\": 10.1135,\n            \"ISR\": 14.4929\n          },\n          \"bass\": {\n            \"SDR\": 15.8599,\n            \"SIR\": 18.764,\n            \"SAR\": 14.2832,\n            \"ISR\": 18.077\n          },\n          \"other\": {\n            \"SDR\": 8e-05,\n            \"SIR\": -14.1842,\n            \"SAR\": -7.86712,\n            \"ISR\": 0.00216\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.30089,\n            \"SIR\": 18.2651,\n            \"SAR\": 6.41398,\n            \"ISR\": 12.5593\n          },\n          \"drums\": {\n            \"SDR\": 6.13333,\n            \"SIR\": 20.9043,\n            \"SAR\": 7.60224,\n            \"ISR\": 9.38209\n          },\n          \"bass\": {\n            \"SDR\": 15.978,\n            \"SIR\": 28.2875,\n            \"SAR\": 14.5916,\n            \"ISR\": 17.9402\n          },\n          \"other\": {\n            \"SDR\": 0.111825,\n            \"SIR\": 4.44369,\n            \"SAR\": -0.20347,\n            \"ISR\": 0.3609\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2096,\n            \"SIR\": 24.5018,\n            \"SAR\": 11.5893,\n            \"ISR\": 20.4149\n          },\n          \"drums\": {\n            \"SDR\": 7.6472,\n            \"SIR\": 19.1537,\n            \"SAR\": 7.98651,\n            \"ISR\": 14.0588\n          },\n          \"bass\": {\n            \"SDR\": 10.0312,\n            \"SIR\": 22.72,\n   
         \"SAR\": 9.88463,\n            \"ISR\": 13.9168\n          },\n          \"other\": {\n            \"SDR\": 0.0433,\n            \"SIR\": 16.9533,\n            \"SAR\": -0.00019,\n            \"ISR\": 0.27942\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.7852,\n            \"SIR\": 16.1949,\n            \"SAR\": 8.14363,\n            \"ISR\": 13.5847\n          },\n          \"drums\": {\n            \"SDR\": 8.81943,\n            \"SIR\": 18.9322,\n            \"SAR\": 8.76285,\n            \"ISR\": 11.9054\n          },\n          \"bass\": {\n            \"SDR\": 14.3802,\n            \"SIR\": 25.3279,\n            \"SAR\": 13.4473,\n            \"ISR\": 19.752\n          },\n          \"other\": {\n            \"SDR\": 0.00093,\n            \"SIR\": 9.72642,\n            \"SAR\": 0.00346,\n            \"ISR\": 0.78442\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7892,\n            \"SIR\": 22.1083,\n            \"SAR\": 12.5385,\n            \"ISR\": 23.5863\n          },\n          \"drums\": {\n            \"SDR\": 8.42774,\n            \"SIR\": 13.9938,\n            \"SAR\": 7.77331,\n            \"ISR\": 13.3034\n          },\n          \"bass\": {\n            \"SDR\": 10.1646,\n            \"SIR\": 21.423,\n            \"SAR\": 6.79117,\n            \"ISR\": 7.58549\n          },\n          \"other\": {\n            \"SDR\": 1.10106,\n            \"SIR\": 4.73975,\n            \"SAR\": 0.40972,\n            \"ISR\": 2.0314\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78957,\n            \"SIR\": 18.8723,\n            \"SAR\": 9.74884,\n      
      \"ISR\": 17.129\n          },\n          \"drums\": {\n            \"SDR\": 11.2418,\n            \"SIR\": 20.2537,\n            \"SAR\": 12.5271,\n            \"ISR\": 17.513\n          },\n          \"bass\": {\n            \"SDR\": 9.24887,\n            \"SIR\": 15.5493,\n            \"SAR\": 10.9884,\n            \"ISR\": 15.6575\n          },\n          \"other\": {\n            \"SDR\": 0.01212,\n            \"SIR\": 7.35532,\n            \"SAR\": -0.150605,\n            \"ISR\": 0.14977\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.57061,\n        \"SIR\": 16.5014,\n        \"SAR\": 9.12617,\n        \"ISR\": 15.6148\n      },\n      \"drums\": {\n        \"SDR\": 8.47242,\n        \"SIR\": 17.463,\n        \"SAR\": 8.37468,\n        \"ISR\": 13.7995\n      },\n      \"bass\": {\n        \"SDR\": 10.0979,\n        \"SIR\": 20.5499,\n        \"SAR\": 10.8536,\n        \"ISR\": 15.6402\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"drums\",\n      \"bass\",\n      \"guitar\",\n      \"piano\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"MDX23C-8KFFT-InstVoc_HQ.ckpt\": {\n    \"model_name\": \"MDX23C Model: MDX23C-InstVoc HQ\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.73898,\n            \"SIR\": 23.2009,\n            \"SAR\": 6.78415,\n            \"ISR\": 11.0084\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1121,\n            \"SIR\": 25.9275,\n            \"SAR\": 20.4969,\n            \"ISR\": 19.2533\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.38292,\n            \"SIR\": 14.01,\n            \"SAR\": 7.95108,\n            \"ISR\": 12.7761\n          },\n          \"instrumental\": {\n       
     \"SDR\": 12.5347,\n            \"SIR\": 22.1489,\n            \"SAR\": 14.6948,\n            \"ISR\": 15.648\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3468,\n            \"SIR\": 23.873,\n            \"SAR\": 12.0537,\n            \"ISR\": 15.6858\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3686,\n            \"SIR\": 27.8991,\n            \"SAR\": 19.2361,\n            \"ISR\": 18.5818\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.87827,\n            \"SIR\": 5.4187,\n            \"SAR\": 4.89509,\n            \"ISR\": 13.4183\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1203,\n            \"SIR\": 27.1057,\n            \"SAR\": 13.5972,\n            \"ISR\": 13.1595\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3779,\n            \"SIR\": 25.2843,\n            \"SAR\": 14.5687,\n            \"ISR\": 16.7121\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3063,\n            \"SIR\": 26.1264,\n            \"SAR\": 16.6579,\n            \"ISR\": 17.8816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6646,\n            \"SIR\": 19.9085,\n            \"SAR\": 11.483,\n            \"ISR\": 15.2706\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5504,\n            \"SIR\": 23.0013,\n            \"SAR\": 14.0207,\n            \"ISR\": 16.3965\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n 
           \"SDR\": 12.554,\n            \"SIR\": 22.1606,\n            \"SAR\": 13.7291,\n            \"ISR\": 15.7148\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7628,\n            \"SIR\": 25.629,\n            \"SAR\": 18.1076,\n            \"ISR\": 18.2059\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.62088,\n            \"SIR\": 23.0322,\n            \"SAR\": 7.45352,\n            \"ISR\": 11.5589\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5467,\n            \"SIR\": 30.4189,\n            \"SAR\": 25.3794,\n            \"ISR\": 19.5884\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3471,\n            \"SIR\": 33.401,\n            \"SAR\": 14.501,\n            \"ISR\": 17.2427\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2384,\n            \"SIR\": 29.2133,\n            \"SAR\": 18.9258,\n            \"ISR\": 19.2428\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3743,\n            \"SIR\": 22.9828,\n            \"SAR\": 12.1323,\n            \"ISR\": 13.2697\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9206,\n            \"SIR\": 20.6224,\n            \"SAR\": 17.3149,\n            \"ISR\": 18.0976\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.4204,\n            \"SIR\": 35.9524,\n            \"SAR\": 17.5186,\n            \"ISR\": 18.4415\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2411,\n            \"SIR\": 32.3095,\n     
       \"SAR\": 20.8858,\n            \"ISR\": 19.4741\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2009,\n            \"SIR\": 23.8371,\n            \"SAR\": 12.8101,\n            \"ISR\": 16.0466\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0736,\n            \"SIR\": 27.8804,\n            \"SAR\": 18.726,\n            \"ISR\": 18.6299\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.97006,\n            \"SIR\": 20.0496,\n            \"SAR\": 8.53451,\n            \"ISR\": 13.8451\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7593,\n            \"SIR\": 27.0842,\n            \"SAR\": 18.1094,\n            \"ISR\": 17.9407\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.74997,\n            \"SIR\": 15.0591,\n            \"SAR\": 9.86238,\n            \"ISR\": 13.8809\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1608,\n            \"SIR\": 25.7753,\n            \"SAR\": 20.1306,\n            \"ISR\": 17.644\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.07479,\n            \"SIR\": 19.0069,\n            \"SAR\": 4.37509,\n            \"ISR\": 9.54885\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8448,\n            \"SIR\": 22.8352,\n            \"SAR\": 19.1224,\n            \"ISR\": 18.894\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
10.4556,\n            \"SIR\": 22.0348,\n            \"SAR\": 11.386,\n            \"ISR\": 15.5782\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1515,\n            \"SIR\": 26.6208,\n            \"SAR\": 17.2912,\n            \"ISR\": 18.0243\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.46556,\n            \"SIR\": 25.8202,\n            \"SAR\": 9.94502,\n            \"ISR\": 13.9694\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8138,\n            \"SIR\": 25.5767,\n            \"SAR\": 18.4636,\n            \"ISR\": 18.9653\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -12.6087,\n            \"SIR\": -36.9833,\n            \"SAR\": 1.50542,\n            \"ISR\": 12.5769\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7282,\n            \"SIR\": 58.4948,\n            \"SAR\": 13.4711,\n            \"ISR\": 12.5155\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.88428,\n            \"SIR\": 17.5921,\n            \"SAR\": 8.1219,\n            \"ISR\": 11.6351\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9194,\n            \"SIR\": 23.1384,\n            \"SAR\": 19.3188,\n            \"ISR\": 18.2253\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0926,\n            \"SIR\": 25.8245,\n            \"SAR\": 12.7132,\n            \"ISR\": 16.077\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5513,\n            \"SIR\": 28.9507,\n            
\"SAR\": 19.4746,\n            \"ISR\": 18.9786\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9115,\n            \"SIR\": 25.6961,\n            \"SAR\": 13.2219,\n            \"ISR\": 16.6918\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1611,\n            \"SIR\": 33.2309,\n            \"SAR\": 23.0705,\n            \"ISR\": 19.3022\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.16416,\n            \"SIR\": 4.4042,\n            \"SAR\": 0.05471,\n            \"ISR\": 4.38211\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8397,\n            \"SIR\": 41.5509,\n            \"SAR\": 35.7546,\n            \"ISR\": 19.4909\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.28362,\n            \"SIR\": 15.3849,\n            \"SAR\": 4.09141,\n            \"ISR\": 6.73268\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9246,\n            \"SIR\": 14.539,\n            \"SAR\": 14.6647,\n            \"ISR\": 17.3386\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.54968,\n            \"SIR\": 18.2247,\n            \"SAR\": 8.70481,\n            \"ISR\": 14.1121\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6522,\n            \"SIR\": 31.2048,\n            \"SAR\": 21.3056,\n            \"ISR\": 18.8564\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
8.01052,\n            \"SIR\": 19.3712,\n            \"SAR\": 8.33947,\n            \"ISR\": 11.8011\n          },\n          \"instrumental\": {\n            \"SDR\": 14.824,\n            \"SIR\": 22.7367,\n            \"SAR\": 17.6442,\n            \"ISR\": 18.402\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.41012,\n            \"SIR\": 18.1949,\n            \"SAR\": 9.70601,\n            \"ISR\": 14.8742\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8248,\n            \"SIR\": 26.6713,\n            \"SAR\": 17.9028,\n            \"ISR\": 17.6604\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6009,\n            \"SIR\": 26.7887,\n            \"SAR\": 13.3081,\n            \"ISR\": 15.0671\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0958,\n            \"SIR\": 22.5415,\n            \"SAR\": 15.7374,\n            \"ISR\": 18.089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.15313,\n            \"SIR\": 17.8866,\n            \"SAR\": 6.74711,\n            \"ISR\": 11.1515\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4603,\n            \"SIR\": 18.8979,\n            \"SAR\": 14.6241,\n            \"ISR\": 16.7606\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2043,\n            \"SIR\": 26.191,\n            \"SAR\": 11.8918,\n            \"ISR\": 16.3238\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5086,\n            \"SIR\": 27.4539,\n            \"SAR\": 
18.1744,\n            \"ISR\": 18.2449\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.2326,\n            \"SIR\": 31.5087,\n            \"SAR\": 15.5602,\n            \"ISR\": 16.8085\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6703,\n            \"SIR\": 21.9884,\n            \"SAR\": 16.0163,\n            \"ISR\": 18.9873\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0955,\n            \"SIR\": 25.9309,\n            \"SAR\": 12.9834,\n            \"ISR\": 15.9593\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8437,\n            \"SIR\": 24.8055,\n            \"SAR\": 16.4884,\n            \"ISR\": 18.1214\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7046,\n            \"SIR\": 33.2659,\n            \"SAR\": 12.3268,\n            \"ISR\": 13.4\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9052,\n            \"SIR\": 24.7932,\n            \"SAR\": 22.0302,\n            \"ISR\": 19.7173\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.42777,\n            \"SIR\": 11.5231,\n            \"SAR\": 8.26833,\n            \"ISR\": 9.98201\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6014,\n            \"SIR\": 16.8602,\n            \"SAR\": 15.9387,\n            \"ISR\": 16.5381\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5795,\n            \"SIR\": 27.7323,\n            \"SAR\": 14.8579,\n       
     \"ISR\": 17.9123\n          },\n          \"instrumental\": {\n            \"SDR\": 18.8312,\n            \"SIR\": 36.546,\n            \"SAR\": 25.3191,\n            \"ISR\": 19.4355\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0573,\n            \"SIR\": 20.2084,\n            \"SAR\": 12.477,\n            \"ISR\": 16.8164\n          },\n          \"instrumental\": {\n            \"SDR\": 11.984,\n            \"SIR\": 23.4887,\n            \"SAR\": 12.7893,\n            \"ISR\": 15.024\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.11092,\n            \"SIR\": 24.5388,\n            \"SAR\": 9.48577,\n            \"ISR\": 14.2664\n          },\n          \"instrumental\": {\n            \"SDR\": 14.749,\n            \"SIR\": 24.6584,\n            \"SAR\": 16.3334,\n            \"ISR\": 18.2321\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.59973,\n            \"SIR\": 6.31592,\n            \"SAR\": 2.70559,\n            \"ISR\": 10.9212\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4726,\n            \"SIR\": 32.1754,\n            \"SAR\": 22.7474,\n            \"ISR\": 18.0809\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6567,\n            \"SIR\": 26.1735,\n            \"SAR\": 11.4635,\n            \"ISR\": 15.9297\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3897,\n            \"SIR\": 31.1462,\n            \"SAR\": 21.1957,\n            \"ISR\": 19.0793\n          }\n        }\n      },\n      {\n       
 \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5711,\n            \"SIR\": 30.6399,\n            \"SAR\": 13.223,\n            \"ISR\": 16.8552\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3343,\n            \"SIR\": 34.1548,\n            \"SAR\": 23.5703,\n            \"ISR\": 19.4742\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.0289,\n            \"SIR\": 31.4403,\n            \"SAR\": 16.3209,\n            \"ISR\": 17.6846\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9739,\n            \"SIR\": 30.3621,\n            \"SAR\": 20.016,\n            \"ISR\": 18.994\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.5562,\n        \"SIR\": 23.0075,\n        \"SAR\": 11.4247,\n        \"ISR\": 14.5703\n      },\n      \"instrumental\": {\n        \"SDR\": 15.8348,\n        \"SIR\": 26.3736,\n        \"SAR\": 18.319,\n        \"ISR\": 18.2385\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": null\n  },\n  \"MDX23C_D1581.ckpt\": {\n    \"model_name\": \"MDX23C Model VIP: MDX23C_D1581\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.95222,\n            \"SIR\": 20.5681,\n            \"SAR\": 6.71226,\n            \"ISR\": 10.7332\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8928,\n            \"SIR\": 25.3018,\n            \"SAR\": 20.6132,\n            \"ISR\": 19.1445\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0582,\n            
\"SIR\": 10.5784,\n            \"SAR\": 6.98246,\n            \"ISR\": 12.6513\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9396,\n            \"SIR\": 21.5384,\n            \"SAR\": 12.9976,\n            \"ISR\": 13.8444\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4242,\n            \"SIR\": 22.6483,\n            \"SAR\": 11.1974,\n            \"ISR\": 14.7769\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8795,\n            \"SIR\": 26.0687,\n            \"SAR\": 18.6638,\n            \"ISR\": 18.3732\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.72703,\n            \"SIR\": 5.84843,\n            \"SAR\": 4.50749,\n            \"ISR\": 12.5716\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2627,\n            \"SIR\": 25.1382,\n            \"SAR\": 13.773,\n            \"ISR\": 13.5577\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.749,\n            \"SIR\": 23.4798,\n            \"SAR\": 13.9222,\n            \"ISR\": 16.4496\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6657,\n            \"SIR\": 25.2511,\n            \"SAR\": 16.0408,\n            \"ISR\": 17.4514\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.05,\n            \"SIR\": 18.9015,\n            \"SAR\": 11.1099,\n            \"ISR\": 14.4769\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0495,\n            \"SIR\": 20.9268,\n            \"SAR\": 13.8577,\n            \"ISR\": 15.9558\n          }\n      
  }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6556,\n            \"SIR\": 20.3714,\n            \"SAR\": 13.1095,\n            \"ISR\": 14.8805\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3011,\n            \"SIR\": 23.7043,\n            \"SAR\": 17.9684,\n            \"ISR\": 17.9062\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.68583,\n            \"SIR\": 22.1936,\n            \"SAR\": 6.83291,\n            \"ISR\": 10.5457\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3252,\n            \"SIR\": 29.4822,\n            \"SAR\": 25.2045,\n            \"ISR\": 19.493\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4956,\n            \"SIR\": 31.6306,\n            \"SAR\": 13.3118,\n            \"ISR\": 16.4537\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4926,\n            \"SIR\": 27.3243,\n            \"SAR\": 17.878,\n            \"ISR\": 19.0089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5858,\n            \"SIR\": 22.691,\n            \"SAR\": 11.3572,\n            \"ISR\": 12.6234\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3815,\n            \"SIR\": 19.7589,\n            \"SAR\": 16.6567,\n            \"ISR\": 17.9925\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.8029,\n            \"SIR\": 32.5823,\n            \"SAR\": 16.8474,\n         
   \"ISR\": 17.8622\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7483,\n            \"SIR\": 30.1575,\n            \"SAR\": 19.8694,\n            \"ISR\": 19.2232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6875,\n            \"SIR\": 19.4326,\n            \"SAR\": 12.2091,\n            \"ISR\": 15.9124\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4387,\n            \"SIR\": 27.8999,\n            \"SAR\": 17.7655,\n            \"ISR\": 17.6928\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.95235,\n            \"SIR\": 20.2625,\n            \"SAR\": 8.44388,\n            \"ISR\": 13.6896\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7267,\n            \"SIR\": 26.6715,\n            \"SAR\": 18.1927,\n            \"ISR\": 17.9964\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.99275,\n            \"SIR\": 13.559,\n            \"SAR\": 8.99798,\n            \"ISR\": 13.594\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7164,\n            \"SIR\": 25.4359,\n            \"SAR\": 18.6415,\n            \"ISR\": 17.0244\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.96877,\n            \"SIR\": 16.4629,\n            \"SAR\": 3.95051,\n            \"ISR\": 9.61576\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1124,\n            \"SIR\": 22.7617,\n            \"SAR\": 18.7659,\n            \"ISR\": 18.4551\n          }\n        }\n      },\n      {\n     
   \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.025,\n            \"SIR\": 21.4115,\n            \"SAR\": 10.8718,\n            \"ISR\": 14.7403\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7134,\n            \"SIR\": 24.7548,\n            \"SAR\": 16.9224,\n            \"ISR\": 17.9557\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.76723,\n            \"SIR\": 23.8811,\n            \"SAR\": 9.06669,\n            \"ISR\": 13.0727\n          },\n          \"instrumental\": {\n            \"SDR\": 15.495,\n            \"SIR\": 24.0629,\n            \"SAR\": 18.2234,\n            \"ISR\": 18.7304\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -6.95664,\n            \"SIR\": -34.2185,\n            \"SAR\": 1.15597,\n            \"ISR\": 7.36254\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0112,\n            \"SIR\": 56.6464,\n            \"SAR\": 14.9971,\n            \"ISR\": 14.0802\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.44438,\n            \"SIR\": 15.5031,\n            \"SAR\": 7.35615,\n            \"ISR\": 11.1522\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6152,\n            \"SIR\": 21.9035,\n            \"SAR\": 18.8112,\n            \"ISR\": 17.6714\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.828,\n            \"SIR\": 25.2884,\n            \"SAR\": 12.3884,\n            \"ISR\": 
15.7958\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4307,\n            \"SIR\": 28.6347,\n            \"SAR\": 19.2186,\n            \"ISR\": 18.8714\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4599,\n            \"SIR\": 25.4414,\n            \"SAR\": 12.4564,\n            \"ISR\": 16.0694\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0562,\n            \"SIR\": 31.6105,\n            \"SAR\": 22.5894,\n            \"ISR\": 19.2663\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.32329,\n            \"SIR\": -0.933135,\n            \"SAR\": 0.30699,\n            \"ISR\": 3.58862\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8396,\n            \"SIR\": 44.9643,\n            \"SAR\": 36.1532,\n            \"ISR\": 19.4394\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.1447,\n            \"SIR\": 14.8254,\n            \"SAR\": 4.01271,\n            \"ISR\": 6.4612\n          },\n          \"instrumental\": {\n            \"SDR\": 10.7491,\n            \"SIR\": 14.1851,\n            \"SAR\": 14.741,\n            \"ISR\": 17.2247\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.22039,\n            \"SIR\": 18.572,\n            \"SAR\": 8.36052,\n            \"ISR\": 14.0024\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7028,\n            \"SIR\": 30.4091,\n            \"SAR\": 21.4053,\n            \"ISR\": 18.8582\n          }\n        }\n      },\n      {\n      
  \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.63389,\n            \"SIR\": 18.5149,\n            \"SAR\": 8.08097,\n            \"ISR\": 11.3776\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6081,\n            \"SIR\": 22.1313,\n            \"SAR\": 17.6455,\n            \"ISR\": 18.2681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.9865,\n            \"SIR\": 17.2055,\n            \"SAR\": 9.47226,\n            \"ISR\": 14.4632\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4311,\n            \"SIR\": 25.9314,\n            \"SAR\": 17.5717,\n            \"ISR\": 17.3099\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.179,\n            \"SIR\": 24.6598,\n            \"SAR\": 12.8949,\n            \"ISR\": 15.3986\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7093,\n            \"SIR\": 23.3597,\n            \"SAR\": 15.2359,\n            \"ISR\": 17.5274\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.78009,\n            \"SIR\": 14.4834,\n            \"SAR\": 6.68291,\n            \"ISR\": 11.2418\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8848,\n            \"SIR\": 18.9336,\n            \"SAR\": 14.1515,\n            \"ISR\": 15.5465\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8276,\n            \"SIR\": 25.2771,\n            \"SAR\": 11.3398,\n            \"ISR\": 15.6577\n       
   },\n          \"instrumental\": {\n            \"SDR\": 15.3617,\n            \"SIR\": 26.368,\n            \"SAR\": 17.7301,\n            \"ISR\": 18.2811\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.8442,\n            \"SIR\": 30.7545,\n            \"SAR\": 15.5245,\n            \"ISR\": 16.4158\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4201,\n            \"SIR\": 20.9387,\n            \"SAR\": 15.7531,\n            \"ISR\": 18.8383\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7526,\n            \"SIR\": 24.875,\n            \"SAR\": 12.3944,\n            \"ISR\": 15.3059\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5063,\n            \"SIR\": 23.179,\n            \"SAR\": 16.2281,\n            \"ISR\": 17.8876\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4219,\n            \"SIR\": 27.4415,\n            \"SAR\": 12.8031,\n            \"ISR\": 14.4542\n          },\n          \"instrumental\": {\n            \"SDR\": 17.505,\n            \"SIR\": 26.7677,\n            \"SAR\": 22.1047,\n            \"ISR\": 19.3817\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.5368,\n            \"SIR\": 10.2714,\n            \"SAR\": 7.90839,\n            \"ISR\": 9.12476\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8417,\n            \"SIR\": 16.49,\n            \"SAR\": 15.5923,\n            \"ISR\": 16.1206\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n        
  \"vocals\": {\n            \"SDR\": 12.4838,\n            \"SIR\": 25.397,\n            \"SAR\": 13.3905,\n            \"ISR\": 17.4396\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4647,\n            \"SIR\": 34.5499,\n            \"SAR\": 24.0095,\n            \"ISR\": 19.1439\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9583,\n            \"SIR\": 19.614,\n            \"SAR\": 12.5138,\n            \"ISR\": 16.6528\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9676,\n            \"SIR\": 23.0671,\n            \"SAR\": 12.5528,\n            \"ISR\": 14.7855\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.01237,\n            \"SIR\": 22.0877,\n            \"SAR\": 9.3567,\n            \"ISR\": 13.9478\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3069,\n            \"SIR\": 23.9475,\n            \"SAR\": 15.9733,\n            \"ISR\": 17.7868\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.74404,\n            \"SIR\": 8.63995,\n            \"SAR\": 2.82224,\n            \"ISR\": 9.08972\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4195,\n            \"SIR\": 30.6863,\n            \"SAR\": 21.0732,\n            \"ISR\": 18.0281\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3567,\n            \"SIR\": 25.2379,\n            \"SAR\": 10.996,\n            \"ISR\": 15.2968\n          },\n          \"instrumental\": {\n            \"SDR\": 17.123,\n            \"SIR\": 
29.7162,\n            \"SAR\": 20.7122,\n            \"ISR\": 18.9704\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8574,\n            \"SIR\": 28.0471,\n            \"SAR\": 12.6925,\n            \"ISR\": 16.1998\n          },\n          \"instrumental\": {\n            \"SDR\": 18.159,\n            \"SIR\": 31.9184,\n            \"SAR\": 23.0412,\n            \"ISR\": 19.3049\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5525,\n            \"SIR\": 29.6751,\n            \"SAR\": 15.4178,\n            \"ISR\": 15.9442\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1069,\n            \"SIR\": 25.0944,\n            \"SAR\": 18.9055,\n            \"ISR\": 18.7154\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.0375,\n        \"SIR\": 20.9898,\n        \"SAR\": 10.9339,\n        \"ISR\": 14.4587\n      },\n      \"instrumental\": {\n        \"SDR\": 15.4657,\n        \"SIR\": 25.2765,\n        \"SAR\": 17.9232,\n        \"ISR\": 18.0123\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": null\n  },\n  \"MDX23C-8KFFT-InstVoc_HQ_2.ckpt\": {\n    \"model_name\": \"MDX23C Model VIP: MDX23C-InstVoc HQ 2\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.87617,\n            \"SIR\": 23.4542,\n            \"SAR\": 6.81798,\n            \"ISR\": 11.2126\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2403,\n            \"SIR\": 26.108,\n            \"SAR\": 20.4428,\n            \"ISR\": 19.4813\n          }\n        }\n      },\n      
{\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.42785,\n            \"SIR\": 14.0147,\n            \"SAR\": 8.01004,\n            \"ISR\": 12.9014\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6142,\n            \"SIR\": 22.3347,\n            \"SAR\": 14.5291,\n            \"ISR\": 15.5849\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2569,\n            \"SIR\": 24.3811,\n            \"SAR\": 11.9624,\n            \"ISR\": 15.7206\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4526,\n            \"SIR\": 27.9357,\n            \"SAR\": 19.1456,\n            \"ISR\": 18.7891\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.93309,\n            \"SIR\": 5.72764,\n            \"SAR\": 4.73445,\n            \"ISR\": 13.4099\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3293,\n            \"SIR\": 26.9339,\n            \"SAR\": 13.6316,\n            \"ISR\": 13.4165\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.407,\n            \"SIR\": 25.4063,\n            \"SAR\": 14.4249,\n            \"ISR\": 16.855\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4401,\n            \"SIR\": 26.414,\n            \"SAR\": 16.6025,\n            \"ISR\": 18.0079\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6637,\n            \"SIR\": 19.8194,\n            \"SAR\": 11.4469,\n            \"ISR\": 15.3601\n          },\n          \"instrumental\": 
{\n            \"SDR\": 12.5679,\n            \"SIR\": 23.1608,\n            \"SAR\": 13.9941,\n            \"ISR\": 16.4697\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4933,\n            \"SIR\": 22.1722,\n            \"SAR\": 13.6547,\n            \"ISR\": 15.7551\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8546,\n            \"SIR\": 25.6608,\n            \"SAR\": 18.0355,\n            \"ISR\": 18.3799\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.59963,\n            \"SIR\": 23.254,\n            \"SAR\": 7.12092,\n            \"ISR\": 11.4893\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7756,\n            \"SIR\": 30.5474,\n            \"SAR\": 25.3609,\n            \"ISR\": 19.8576\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3131,\n            \"SIR\": 34.494,\n            \"SAR\": 14.2323,\n            \"ISR\": 17.2401\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2552,\n            \"SIR\": 29.1474,\n            \"SAR\": 18.729,\n            \"ISR\": 19.2638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3367,\n            \"SIR\": 22.7857,\n            \"SAR\": 11.9941,\n            \"ISR\": 13.3302\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0059,\n            \"SIR\": 20.9584,\n            \"SAR\": 17.2229,\n            \"ISR\": 18.1922\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n     
   \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.3472,\n            \"SIR\": 37.2152,\n            \"SAR\": 17.3105,\n            \"ISR\": 18.4819\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3685,\n            \"SIR\": 32.115,\n            \"SAR\": 20.7553,\n            \"ISR\": 19.7369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1742,\n            \"SIR\": 24.1013,\n            \"SAR\": 12.7593,\n            \"ISR\": 16.196\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1898,\n            \"SIR\": 28.2734,\n            \"SAR\": 18.721,\n            \"ISR\": 18.8564\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.97711,\n            \"SIR\": 20.0952,\n            \"SAR\": 8.48382,\n            \"ISR\": 13.8196\n          },\n          \"instrumental\": {\n            \"SDR\": 15.811,\n            \"SIR\": 27.1513,\n            \"SAR\": 18.0403,\n            \"ISR\": 18.121\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.69804,\n            \"SIR\": 14.3328,\n            \"SAR\": 9.44229,\n            \"ISR\": 13.789\n          },\n          \"instrumental\": {\n            \"SDR\": 16.823,\n            \"SIR\": 25.6612,\n            \"SAR\": 19.4836,\n            \"ISR\": 17.5791\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.054,\n            \"SIR\": 19.4671,\n            \"SAR\": 4.28294,\n            \"ISR\": 9.61485\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9157,\n  
          \"SIR\": 22.9569,\n            \"SAR\": 18.9955,\n            \"ISR\": 19.1073\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4036,\n            \"SIR\": 22.2632,\n            \"SAR\": 11.3155,\n            \"ISR\": 15.5475\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2246,\n            \"SIR\": 26.6246,\n            \"SAR\": 17.2777,\n            \"ISR\": 18.2326\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.42633,\n            \"SIR\": 26.8586,\n            \"SAR\": 9.84564,\n            \"ISR\": 13.9788\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9222,\n            \"SIR\": 25.6412,\n            \"SAR\": 18.4247,\n            \"ISR\": 19.2462\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -11.9921,\n            \"SIR\": -36.9813,\n            \"SAR\": 1.24128,\n            \"ISR\": 12.6392\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4817,\n            \"SIR\": 58.1676,\n            \"SAR\": 13.2689,\n            \"ISR\": 12.437\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.86148,\n            \"SIR\": 17.6597,\n            \"SAR\": 8.0598,\n            \"ISR\": 11.6707\n          },\n          \"instrumental\": {\n            \"SDR\": 17.067,\n            \"SIR\": 23.1786,\n            \"SAR\": 19.3498,\n            \"ISR\": 18.3737\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n       
   \"vocals\": {\n            \"SDR\": 12.0482,\n            \"SIR\": 26.1427,\n            \"SAR\": 12.5525,\n            \"ISR\": 16.0693\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4737,\n            \"SIR\": 28.9109,\n            \"SAR\": 19.4375,\n            \"ISR\": 18.9123\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7839,\n            \"SIR\": 26.0298,\n            \"SAR\": 13.0914,\n            \"ISR\": 16.7017\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3767,\n            \"SIR\": 33.2707,\n            \"SAR\": 22.8728,\n            \"ISR\": 19.6077\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.21578,\n            \"SIR\": 5.00571,\n            \"SAR\": -0.03989,\n            \"ISR\": 4.55859\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8563,\n            \"SIR\": 41.7901,\n            \"SAR\": 35.1394,\n            \"ISR\": 19.4303\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.31119,\n            \"SIR\": 15.7484,\n            \"SAR\": 4.03378,\n            \"ISR\": 6.76134\n          },\n          \"instrumental\": {\n            \"SDR\": 10.9604,\n            \"SIR\": 14.6113,\n            \"SAR\": 14.6321,\n            \"ISR\": 17.55\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.42907,\n            \"SIR\": 18.0411,\n            \"SAR\": 8.35173,\n            \"ISR\": 14.2391\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7597,\n        
    \"SIR\": 31.1492,\n            \"SAR\": 21.0164,\n            \"ISR\": 19.0249\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04496,\n            \"SIR\": 19.5588,\n            \"SAR\": 8.33975,\n            \"ISR\": 11.8428\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8643,\n            \"SIR\": 22.8496,\n            \"SAR\": 17.5905,\n            \"ISR\": 18.6339\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.34298,\n            \"SIR\": 18.4773,\n            \"SAR\": 9.67408,\n            \"ISR\": 14.9727\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8701,\n            \"SIR\": 26.8708,\n            \"SAR\": 17.7779,\n            \"ISR\": 17.8068\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.455,\n            \"SIR\": 26.7561,\n            \"SAR\": 13.1383,\n            \"ISR\": 15.0391\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0942,\n            \"SIR\": 22.4359,\n            \"SAR\": 15.5258,\n            \"ISR\": 18.1932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.05343,\n            \"SIR\": 17.4415,\n            \"SAR\": 6.66362,\n            \"ISR\": 11.2003\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4202,\n            \"SIR\": 19.008,\n            \"SAR\": 14.48,\n            \"ISR\": 16.6802\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 11.1993,\n            \"SIR\": 26.0982,\n            \"SAR\": 11.7279,\n            \"ISR\": 16.1879\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1402,\n            \"SIR\": 27.1198,\n            \"SAR\": 17.7607,\n            \"ISR\": 17.8452\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.2233,\n            \"SIR\": 32.2725,\n            \"SAR\": 15.4792,\n            \"ISR\": 16.7394\n          },\n          \"instrumental\": {\n            \"SDR\": 15.747,\n            \"SIR\": 21.9231,\n            \"SAR\": 15.9607,\n            \"ISR\": 19.2218\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1553,\n            \"SIR\": 26.3986,\n            \"SAR\": 12.7755,\n            \"ISR\": 15.9839\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8491,\n            \"SIR\": 24.8442,\n            \"SAR\": 16.4969,\n            \"ISR\": 18.2967\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6719,\n            \"SIR\": 33.0287,\n            \"SAR\": 12.6361,\n            \"ISR\": 14.3029\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9172,\n            \"SIR\": 26.5597,\n            \"SAR\": 22.3544,\n            \"ISR\": 19.6802\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.54675,\n            \"SIR\": 10.8237,\n            \"SAR\": 8.67814,\n            \"ISR\": 10.5822\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4174,\n            \"SIR\": 17.4604,\n            \"SAR\": 16.1392,\n            
\"ISR\": 16.072\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5144,\n            \"SIR\": 27.8448,\n            \"SAR\": 14.7456,\n            \"ISR\": 17.9309\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0967,\n            \"SIR\": 36.667,\n            \"SAR\": 25.1275,\n            \"ISR\": 19.7392\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1499,\n            \"SIR\": 20.2272,\n            \"SAR\": 12.3885,\n            \"ISR\": 17.0839\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0272,\n            \"SIR\": 24.4044,\n            \"SAR\": 12.7057,\n            \"ISR\": 15.0756\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.08337,\n            \"SIR\": 24.7954,\n            \"SAR\": 9.38067,\n            \"ISR\": 14.3144\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7452,\n            \"SIR\": 24.6377,\n            \"SAR\": 16.2441,\n            \"ISR\": 18.3736\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.58177,\n            \"SIR\": 6.47052,\n            \"SAR\": 2.59301,\n            \"ISR\": 10.9061\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6437,\n            \"SIR\": 32.1906,\n            \"SAR\": 22.6347,\n            \"ISR\": 18.314\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.6238,\n            \"SIR\": 26.6215,\n            
\"SAR\": 11.417,\n            \"ISR\": 15.9769\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5271,\n            \"SIR\": 31.3342,\n            \"SAR\": 21.0824,\n            \"ISR\": 19.3129\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5055,\n            \"SIR\": 31.4731,\n            \"SAR\": 13.1795,\n            \"ISR\": 16.6647\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4959,\n            \"SIR\": 33.6911,\n            \"SAR\": 23.4512,\n            \"ISR\": 19.7558\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3247,\n            \"SIR\": 31.9899,\n            \"SAR\": 15.8975,\n            \"ISR\": 17.1669\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8544,\n            \"SIR\": 28.4349,\n            \"SAR\": 19.8302,\n            \"ISR\": 19.2184\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.5137,\n        \"SIR\": 23.0198,\n        \"SAR\": 11.3663,\n        \"ISR\": 14.6435\n      },\n      \"instrumental\": {\n        \"SDR\": 15.9189,\n        \"SIR\": 26.5922,\n        \"SAR\": 18.2325,\n        \"ISR\": 18.3768\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": null\n  },\n  \"model_bs_roformer_ep_317_sdr_12.9755.ckpt\": {\n    \"model_name\": \"Roformer Model: BS-Roformer-Viperx-1297\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.27145,\n            \"SIR\": 21.3266,\n            \"SAR\": 6.89905,\n            \"ISR\": 12.7087\n          },\n          \"instrumental\": {\n    
        \"SDR\": 17.2885,\n            \"SIR\": 28.2786,\n            \"SAR\": 20.7563,\n            \"ISR\": 19.1431\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.36916,\n            \"SIR\": 25.2781,\n            \"SAR\": 9.32521,\n            \"ISR\": 13.8011\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7147,\n            \"SIR\": 24.0983,\n            \"SAR\": 16.5033,\n            \"ISR\": 18.6872\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0209,\n            \"SIR\": 29.6239,\n            \"SAR\": 13.9194,\n            \"ISR\": 17.2502\n          },\n          \"instrumental\": {\n            \"SDR\": 17.206,\n            \"SIR\": 31.5398,\n            \"SAR\": 20.446,\n            \"ISR\": 19.2727\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.39241,\n            \"SIR\": 7.74162,\n            \"SAR\": 4.72567,\n            \"ISR\": 14.7927\n          },\n          \"instrumental\": {\n            \"SDR\": 11.4704,\n            \"SIR\": 28.5414,\n            \"SAR\": 13.288,\n            \"ISR\": 14.2931\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3226,\n            \"SIR\": 27.9537,\n            \"SAR\": 15.3748,\n            \"ISR\": 17.667\n          },\n          \"instrumental\": {\n            \"SDR\": 15.604,\n            \"SIR\": 28.4274,\n            \"SAR\": 17.2536,\n            \"ISR\": 18.4205\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n          
  \"SDR\": 11.1078,\n            \"SIR\": 21.9648,\n            \"SAR\": 11.8021,\n            \"ISR\": 15.8931\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9535,\n            \"SIR\": 24.1863,\n            \"SAR\": 14.1888,\n            \"ISR\": 17.1184\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.867,\n            \"SIR\": 24.2088,\n            \"SAR\": 13.8855,\n            \"ISR\": 16.4038\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2871,\n            \"SIR\": 27.1032,\n            \"SAR\": 18.4096,\n            \"ISR\": 18.6002\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2302,\n            \"SIR\": 28.9779,\n            \"SAR\": 10.4524,\n            \"ISR\": 14.9303\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0013,\n            \"SIR\": 29.6188,\n            \"SAR\": 22.9885,\n            \"ISR\": 19.681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7068,\n            \"SIR\": 39.8113,\n            \"SAR\": 15.8475,\n            \"ISR\": 18.684\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9951,\n            \"SIR\": 34.8677,\n            \"SAR\": 19.8234,\n            \"ISR\": 19.659\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9516,\n            \"SIR\": 23.7946,\n            \"SAR\": 13.8535,\n            \"ISR\": 14.9136\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9353,\n            \"SIR\": 23.4625,\n            \"SAR\": 
18.7001,\n            \"ISR\": 18.0127\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.1394,\n            \"SIR\": 40.5805,\n            \"SAR\": 18.4321,\n            \"ISR\": 18.949\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6222,\n            \"SIR\": 34.477,\n            \"SAR\": 21.2358,\n            \"ISR\": 19.7197\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7367,\n            \"SIR\": 26.4655,\n            \"SAR\": 13.6118,\n            \"ISR\": 17.0604\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4724,\n            \"SIR\": 30.6654,\n            \"SAR\": 19.2084,\n            \"ISR\": 18.9816\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78136,\n            \"SIR\": 22.5081,\n            \"SAR\": 10.105,\n            \"ISR\": 15.4463\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6766,\n            \"SIR\": 27.6504,\n            \"SAR\": 16.3122,\n            \"ISR\": 18.0535\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0149,\n            \"SIR\": 29.4709,\n            \"SAR\": 13.0542,\n            \"ISR\": 15.4531\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2397,\n            \"SIR\": 26.1105,\n            \"SAR\": 20.8677,\n            \"ISR\": 19.3018\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.61991,\n          
  \"SIR\": 23.4274,\n            \"SAR\": 7.22862,\n            \"ISR\": 11.4066\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7726,\n            \"SIR\": 22.3407,\n            \"SAR\": 17.5078,\n            \"ISR\": 19.0333\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2783,\n            \"SIR\": 23.3212,\n            \"SAR\": 11.8311,\n            \"ISR\": 16.2971\n          },\n          \"instrumental\": {\n            \"SDR\": 15.031,\n            \"SIR\": 28.1696,\n            \"SAR\": 17.1084,\n            \"ISR\": 18.3429\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8233,\n            \"SIR\": 31.2998,\n            \"SAR\": 11.2267,\n            \"ISR\": 15.4322\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3026,\n            \"SIR\": 28.1272,\n            \"SAR\": 19.0388,\n            \"ISR\": 19.4063\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.64678,\n            \"SIR\": 25.7471,\n            \"SAR\": 8.19411,\n            \"ISR\": 15.1165\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7877,\n            \"SIR\": 29.8842,\n            \"SAR\": 19.9395,\n            \"ISR\": 19.1579\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.12066,\n            \"SIR\": 20.0322,\n            \"SAR\": 9.04174,\n            \"ISR\": 13.1748\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5703,\n            \"SIR\": 24.8801,\n            \"SAR\": 20.2243,\n           
 \"ISR\": 18.5049\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3468,\n            \"SIR\": 29.6378,\n            \"SAR\": 14.1276,\n            \"ISR\": 17.1629\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7654,\n            \"SIR\": 30.8915,\n            \"SAR\": 19.4882,\n            \"ISR\": 19.1681\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4197,\n            \"SIR\": 30.0747,\n            \"SAR\": 15.6699,\n            \"ISR\": 18.0037\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0662,\n            \"SIR\": 34.185,\n            \"SAR\": 22.4746,\n            \"ISR\": 19.5247\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4383,\n            \"SIR\": 33.4779,\n            \"SAR\": 13.0333,\n            \"ISR\": 18.0859\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1892,\n            \"SIR\": 36.9145,\n            \"SAR\": 22.8796,\n            \"ISR\": 19.6952\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.66077,\n            \"SIR\": 17.1318,\n            \"SAR\": 4.2188,\n            \"ISR\": 7.61178\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1464,\n            \"SIR\": 15.5843,\n            \"SAR\": 14.2608,\n            \"ISR\": 17.6277\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4039,\n            
\"SIR\": 27.3548,\n            \"SAR\": 11.0587,\n            \"ISR\": 15.5801\n          },\n          \"instrumental\": {\n            \"SDR\": 16.734,\n            \"SIR\": 28.6685,\n            \"SAR\": 19.52,\n            \"ISR\": 19.3359\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.66407,\n            \"SIR\": 20.5804,\n            \"SAR\": 8.87455,\n            \"ISR\": 13.1115\n          },\n          \"instrumental\": {\n            \"SDR\": 15.113,\n            \"SIR\": 24.7395,\n            \"SAR\": 17.643,\n            \"ISR\": 18.5929\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.126,\n            \"SIR\": 21.5221,\n            \"SAR\": 10.4151,\n            \"ISR\": 15.8837\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4299,\n            \"SIR\": 29.0696,\n            \"SAR\": 18.4581,\n            \"ISR\": 18.2819\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.6494,\n            \"SIR\": 30.4834,\n            \"SAR\": 14.3899,\n            \"ISR\": 16.2633\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0335,\n            \"SIR\": 24.9252,\n            \"SAR\": 16.2863,\n            \"ISR\": 18.8539\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.70131,\n            \"SIR\": 20.3859,\n            \"SAR\": 7.43047,\n            \"ISR\": 11.9802\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4709,\n            \"SIR\": 19.6736,\n            \"SAR\": 14.4294,\n            
\"ISR\": 17.2905\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5851,\n            \"SIR\": 29.9729,\n            \"SAR\": 13.5023,\n            \"ISR\": 17.7674\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6238,\n            \"SIR\": 32.5914,\n            \"SAR\": 19.2673,\n            \"ISR\": 19.0896\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.6426,\n            \"SIR\": 34.0086,\n            \"SAR\": 17.5022,\n            \"ISR\": 17.9602\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2952,\n            \"SIR\": 21.9532,\n            \"SAR\": 16.8056,\n            \"ISR\": 19.2864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8832,\n            \"SIR\": 28.1031,\n            \"SAR\": 13.7298,\n            \"ISR\": 17.1319\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2718,\n            \"SIR\": 26.1247,\n            \"SAR\": 16.7281,\n            \"ISR\": 18.519\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1345,\n            \"SIR\": 37.4204,\n            \"SAR\": 15.2435,\n            \"ISR\": 17.4216\n          },\n          \"instrumental\": {\n            \"SDR\": 18.531,\n            \"SIR\": 33.2044,\n            \"SAR\": 23.9269,\n            \"ISR\": 19.7974\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.6385,\n            \"SIR\": 9.06066,\n            \"SAR\": 10.0479,\n            
\"ISR\": 15.3108\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4934,\n            \"SIR\": 25.452,\n            \"SAR\": 15.6533,\n            \"ISR\": 14.9693\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.8821,\n            \"SIR\": 31.434,\n            \"SAR\": 17.7598,\n            \"ISR\": 18.9229\n          },\n          \"instrumental\": {\n            \"SDR\": 19.1253,\n            \"SIR\": 40.6957,\n            \"SAR\": 26.508,\n            \"ISR\": 19.6281\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3402,\n            \"SIR\": 24.5707,\n            \"SAR\": 13.4427,\n            \"ISR\": 17.7563\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4393,\n            \"SIR\": 25.6194,\n            \"SAR\": 13.1366,\n            \"ISR\": 16.5581\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.95385,\n            \"SIR\": 27.4435,\n            \"SAR\": 10.3124,\n            \"ISR\": 15.5437\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3988,\n            \"SIR\": 26.792,\n            \"SAR\": 16.9621,\n            \"ISR\": 18.695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.2256,\n            \"SIR\": 21.3097,\n            \"SAR\": 8.55264,\n            \"ISR\": 13.475\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6543,\n            \"SIR\": 23.7424,\n            \"SAR\": 15.7181,\n            \"ISR\": 18.4904\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5335,\n            \"SIR\": 29.5621,\n            \"SAR\": 12.3213,\n            \"ISR\": 16.6072\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9441,\n            \"SIR\": 31.1553,\n            \"SAR\": 20.1699,\n            \"ISR\": 19.259\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4035,\n            \"SIR\": 35.4832,\n            \"SAR\": 15.5806,\n            \"ISR\": 17.8685\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6056,\n            \"SIR\": 36.0998,\n            \"SAR\": 24.11,\n            \"ISR\": 19.6745\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.0906,\n            \"SIR\": 35.8248,\n            \"SAR\": 17.8613,\n            \"ISR\": 19.2221\n          },\n          \"instrumental\": {\n            \"SDR\": 17.953,\n            \"SIR\": 38.7085,\n            \"SAR\": 22.0366,\n            \"ISR\": 19.4598\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 11.7742,\n        \"SIR\": 27.3991,\n        \"SAR\": 12.6773,\n        \"ISR\": 16.0782\n      },\n      \"instrumental\": {\n        \"SDR\": 16.4511,\n        \"SIR\": 28.2241,\n        \"SAR\": 18.8695,\n        \"ISR\": 19.0074\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"model_bs_roformer_ep_368_sdr_12.9628.ckpt\": {\n    \"model_name\": \"Roformer Model: BS-Roformer-Viperx-1296\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 7.39953,\n            \"SIR\": 22.6627,\n            \"SAR\": 6.9396,\n            \"ISR\": 12.5152\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2737,\n            \"SIR\": 28.1382,\n            \"SAR\": 20.7429,\n            \"ISR\": 19.2457\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.36037,\n            \"SIR\": 25.3319,\n            \"SAR\": 9.31179,\n            \"ISR\": 13.7824\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7536,\n            \"SIR\": 24.0896,\n            \"SAR\": 16.4184,\n            \"ISR\": 18.6887\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1494,\n            \"SIR\": 29.732,\n            \"SAR\": 13.9263,\n            \"ISR\": 17.2413\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2264,\n            \"SIR\": 31.482,\n            \"SAR\": 20.5252,\n            \"ISR\": 19.2723\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.34793,\n            \"SIR\": 7.82398,\n            \"SAR\": 4.74901,\n            \"ISR\": 14.6313\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3709,\n            \"SIR\": 28.2637,\n            \"SAR\": 13.2022,\n            \"ISR\": 14.1835\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3395,\n            \"SIR\": 28.1322,\n            \"SAR\": 15.3707,\n            \"ISR\": 17.678\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6465,\n            \"SIR\": 28.4675,\n            \"SAR\": 17.3125,\n            
\"ISR\": 18.4383\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1291,\n            \"SIR\": 22.0798,\n            \"SAR\": 11.826,\n            \"ISR\": 15.8762\n          },\n          \"instrumental\": {\n            \"SDR\": 12.9475,\n            \"SIR\": 24.0749,\n            \"SAR\": 14.1964,\n            \"ISR\": 17.1543\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9164,\n            \"SIR\": 24.3323,\n            \"SAR\": 13.8997,\n            \"ISR\": 16.4235\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2676,\n            \"SIR\": 27.1146,\n            \"SAR\": 18.3553,\n            \"ISR\": 18.6058\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5495,\n            \"SIR\": 29.49,\n            \"SAR\": 10.7915,\n            \"ISR\": 15.2517\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9895,\n            \"SIR\": 29.8774,\n            \"SAR\": 22.6017,\n            \"ISR\": 19.6864\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7787,\n            \"SIR\": 39.8481,\n            \"SAR\": 15.8941,\n            \"ISR\": 18.6011\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0029,\n            \"SIR\": 34.6981,\n            \"SAR\": 19.9098,\n            \"ISR\": 19.6592\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.0142,\n            \"SIR\": 24.4319,\n            
\"SAR\": 13.9425,\n            \"ISR\": 14.8489\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9607,\n            \"SIR\": 23.2911,\n            \"SAR\": 18.8132,\n            \"ISR\": 18.2128\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.1546,\n            \"SIR\": 40.5922,\n            \"SAR\": 18.5296,\n            \"ISR\": 18.8637\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6162,\n            \"SIR\": 32.9914,\n            \"SAR\": 21.1756,\n            \"ISR\": 19.7181\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7157,\n            \"SIR\": 26.6103,\n            \"SAR\": 13.6576,\n            \"ISR\": 17.0729\n          },\n          \"instrumental\": {\n            \"SDR\": 16.436,\n            \"SIR\": 30.7544,\n            \"SAR\": 19.1643,\n            \"ISR\": 18.9975\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2251,\n            \"SIR\": 23.229,\n            \"SAR\": 10.7962,\n            \"ISR\": 15.6026\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6229,\n            \"SIR\": 27.3555,\n            \"SAR\": 16.211,\n            \"ISR\": 18.1907\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4581,\n            \"SIR\": 29.532,\n            \"SAR\": 13.8958,\n            \"ISR\": 16.0717\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9741,\n            \"SIR\": 26.32,\n            \"SAR\": 20.6411,\n            \"ISR\": 19.2486\n          
}\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.85879,\n            \"SIR\": 23.8983,\n            \"SAR\": 7.46923,\n            \"ISR\": 11.4269\n          },\n          \"instrumental\": {\n            \"SDR\": 14.686,\n            \"SIR\": 22.2229,\n            \"SAR\": 17.448,\n            \"ISR\": 19.0598\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.306,\n            \"SIR\": 23.3683,\n            \"SAR\": 11.8386,\n            \"ISR\": 16.311\n          },\n          \"instrumental\": {\n            \"SDR\": 15.045,\n            \"SIR\": 28.1509,\n            \"SAR\": 17.14,\n            \"ISR\": 18.3567\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8225,\n            \"SIR\": 31.427,\n            \"SAR\": 11.2367,\n            \"ISR\": 15.4045\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3166,\n            \"SIR\": 28.0932,\n            \"SAR\": 19.0669,\n            \"ISR\": 19.4124\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7401,\n            \"SIR\": 33.2741,\n            \"SAR\": 11.6433,\n            \"ISR\": 15.8872\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7811,\n            \"SIR\": 25.3068,\n            \"SAR\": 16.5197,\n            \"ISR\": 19.1905\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.11487,\n            \"SIR\": 20.0978,\n            \"SAR\": 9.0895,\n            
\"ISR\": 13.1549\n          },\n          \"instrumental\": {\n            \"SDR\": 17.64,\n            \"SIR\": 24.9244,\n            \"SAR\": 20.362,\n            \"ISR\": 18.5145\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4686,\n            \"SIR\": 29.576,\n            \"SAR\": 14.3035,\n            \"ISR\": 17.2137\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6969,\n            \"SIR\": 30.7686,\n            \"SAR\": 19.2,\n            \"ISR\": 19.0507\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.427,\n            \"SIR\": 30.229,\n            \"SAR\": 15.7906,\n            \"ISR\": 18.0289\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9999,\n            \"SIR\": 34.4109,\n            \"SAR\": 22.5003,\n            \"ISR\": 19.5244\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.808,\n            \"SIR\": 34.9357,\n            \"SAR\": 14.9737,\n            \"ISR\": 18.3659\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1845,\n            \"SIR\": 36.5155,\n            \"SAR\": 22.8768,\n            \"ISR\": 19.6935\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.63448,\n            \"SIR\": 17.3934,\n            \"SAR\": 4.1798,\n            \"ISR\": 7.48037\n          },\n          \"instrumental\": {\n            \"SDR\": 11.165,\n            \"SIR\": 15.4057,\n            \"SAR\": 14.3041,\n            \"ISR\": 17.727\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9725,\n            \"SIR\": 28.1688,\n            \"SAR\": 11.5272,\n            \"ISR\": 15.511\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5247,\n            \"SIR\": 27.933,\n            \"SAR\": 19.1213,\n            \"ISR\": 19.3798\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.66971,\n            \"SIR\": 20.9616,\n            \"SAR\": 8.84133,\n            \"ISR\": 13.0692\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1038,\n            \"SIR\": 24.6768,\n            \"SAR\": 17.6344,\n            \"ISR\": 18.6515\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3869,\n            \"SIR\": 23.4738,\n            \"SAR\": 10.8243,\n            \"ISR\": 15.8763\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2972,\n            \"SIR\": 28.4549,\n            \"SAR\": 18.3876,\n            \"ISR\": 18.3486\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7137,\n            \"SIR\": 30.679,\n            \"SAR\": 14.4911,\n            \"ISR\": 16.3675\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9251,\n            \"SIR\": 25.0751,\n            \"SAR\": 16.2708,\n            \"ISR\": 18.8643\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.37601,\n            \"SIR\": 21.5665,\n            \"SAR\": 7.3152,\n            \"ISR\": 11.4945\n   
       },\n          \"instrumental\": {\n            \"SDR\": 13.2748,\n            \"SIR\": 18.9108,\n            \"SAR\": 14.2617,\n            \"ISR\": 17.5995\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.614,\n            \"SIR\": 30.1672,\n            \"SAR\": 13.5625,\n            \"ISR\": 17.7516\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5323,\n            \"SIR\": 32.3023,\n            \"SAR\": 19.2253,\n            \"ISR\": 19.1349\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.6795,\n            \"SIR\": 33.7576,\n            \"SAR\": 17.5999,\n            \"ISR\": 17.9539\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2509,\n            \"SIR\": 21.9868,\n            \"SAR\": 16.7314,\n            \"ISR\": 19.2491\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.887,\n            \"SIR\": 28.147,\n            \"SAR\": 13.6965,\n            \"ISR\": 17.0896\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3179,\n            \"SIR\": 26.098,\n            \"SAR\": 16.7487,\n            \"ISR\": 18.5351\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1955,\n            \"SIR\": 37.4715,\n            \"SAR\": 15.0205,\n            \"ISR\": 16.6058\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5468,\n            \"SIR\": 31.6753,\n            \"SAR\": 23.8806,\n            \"ISR\": 19.7926\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.65754,\n            \"SIR\": 9.5571,\n            \"SAR\": 10.0174,\n            \"ISR\": 14.9344\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4272,\n            \"SIR\": 24.2699,\n            \"SAR\": 15.9498,\n            \"ISR\": 15.2298\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.5351,\n            \"SIR\": 33.1355,\n            \"SAR\": 18.9963,\n            \"ISR\": 18.9178\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0794,\n            \"SIR\": 40.0272,\n            \"SAR\": 26.3595,\n            \"ISR\": 19.6428\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6328,\n            \"SIR\": 25.2811,\n            \"SAR\": 13.8165,\n            \"ISR\": 17.5757\n          },\n          \"instrumental\": {\n            \"SDR\": 12.3381,\n            \"SIR\": 24.3844,\n            \"SAR\": 12.8706,\n            \"ISR\": 16.8282\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3782,\n            \"SIR\": 28.2182,\n            \"SAR\": 10.6105,\n            \"ISR\": 15.525\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1444,\n            \"SIR\": 26.4238,\n            \"SAR\": 16.8456,\n            \"ISR\": 18.7488\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.33927,\n            \"SIR\": 21.5469,\n            \"SAR\": 8.66699,\n            \"ISR\": 13.4977\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6162,\n     
       \"SIR\": 23.4244,\n            \"SAR\": 15.4223,\n            \"ISR\": 18.5112\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7457,\n            \"SIR\": 29.9814,\n            \"SAR\": 12.3968,\n            \"ISR\": 16.6172\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8959,\n            \"SIR\": 30.9538,\n            \"SAR\": 20.0131,\n            \"ISR\": 19.297\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5522,\n            \"SIR\": 35.9402,\n            \"SAR\": 15.6485,\n            \"ISR\": 17.6323\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5864,\n            \"SIR\": 34.6089,\n            \"SAR\": 23.9372,\n            \"ISR\": 19.6892\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.1661,\n            \"SIR\": 35.7526,\n            \"SAR\": 17.8551,\n            \"ISR\": 19.235\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9249,\n            \"SIR\": 38.6427,\n            \"SAR\": 22.0729,\n            \"ISR\": 19.4537\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.1019,\n        \"SIR\": 28.1579,\n        \"SAR\": 12.9796,\n        \"ISR\": 16.1913\n      },\n      \"instrumental\": {\n        \"SDR\": 16.3069,\n        \"SIR\": 28.0131,\n        \"SAR\": 18.6004,\n        \"ISR\": 19.0241\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"model_bs_roformer_ep_937_sdr_10.5309.ckpt\": {\n    \"model_name\": \"Roformer Model: BS-Roformer-Viperx-1053\",\n  
  \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.3\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.2\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.5\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.2\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.3\n        }\n      },\n      {\n      
  \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.4\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.3\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 8.6\n    },\n    \"stems\": [\n      \"no drum-bass\",\n      \"drum-bass\"\n    ],\n    \"target_stem\": \"no drum-bass\"\n  },\n  \"model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt\": {\n    \"model_name\": \"Roformer Model: Mel-Roformer-Viperx-1143\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.22599,\n            \"SIR\": 20.7364,\n            \"SAR\": 6.81363,\n            \"ISR\": 11.9464\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1126,\n            \"SIR\": 25.7777,\n            \"SAR\": 19.0311,\n            \"ISR\": 18.8924\n          }\n        }\n      },\n      {\n        
\"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.04546,\n            \"SIR\": 21.2043,\n            \"SAR\": 7.67787,\n            \"ISR\": 12.7872\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7151,\n            \"SIR\": 22.0413,\n            \"SAR\": 15.1107,\n            \"ISR\": 17.9742\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.551,\n            \"SIR\": 26.2964,\n            \"SAR\": 12.1747,\n            \"ISR\": 16.2897\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3613,\n            \"SIR\": 29.0243,\n            \"SAR\": 18.9893,\n            \"ISR\": 18.8874\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.07081,\n            \"SIR\": 10.0756,\n            \"SAR\": 4.27666,\n            \"ISR\": 13.2681\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7448,\n            \"SIR\": 24.9302,\n            \"SAR\": 13.4134,\n            \"ISR\": 15.4297\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0026,\n            \"SIR\": 24.4973,\n            \"SAR\": 13.4719,\n            \"ISR\": 15.9803\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8087,\n            \"SIR\": 23.8779,\n            \"SAR\": 15.7901,\n            \"ISR\": 17.7153\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2028,\n            \"SIR\": 19.3448,\n            \"SAR\": 10.8419,\n            \"ISR\": 14.8978\n          },\n          \"instrumental\": {\n        
    \"SDR\": 11.9256,\n            \"SIR\": 22.1487,\n            \"SAR\": 13.1911,\n            \"ISR\": 16.27\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3702,\n            \"SIR\": 22.0296,\n            \"SAR\": 12.011,\n            \"ISR\": 14.9376\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0478,\n            \"SIR\": 24.3747,\n            \"SAR\": 16.8541,\n            \"ISR\": 18.087\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.36541,\n            \"SIR\": 26.5573,\n            \"SAR\": 9.35732,\n            \"ISR\": 13.8703\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5259,\n            \"SIR\": 28.2036,\n            \"SAR\": 21.8168,\n            \"ISR\": 19.5661\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6578,\n            \"SIR\": 34.6168,\n            \"SAR\": 13.511,\n            \"ISR\": 16.8816\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5256,\n            \"SIR\": 28.4225,\n            \"SAR\": 17.6244,\n            \"ISR\": 19.2833\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3493,\n            \"SIR\": 23.4189,\n            \"SAR\": 10.9716,\n            \"ISR\": 12.805\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8693,\n            \"SIR\": 19.1707,\n            \"SAR\": 16.1595,\n            \"ISR\": 18.0875\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5444,\n            \"SIR\": 36.848,\n            \"SAR\": 16.0659,\n            \"ISR\": 18.0546\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5295,\n            \"SIR\": 30.481,\n            \"SAR\": 19.2078,\n            \"ISR\": 19.4684\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9422,\n            \"SIR\": 23.9059,\n            \"SAR\": 12.4638,\n            \"ISR\": 16.2029\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5175,\n            \"SIR\": 28.5661,\n            \"SAR\": 17.7634,\n            \"ISR\": 18.6425\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.97026,\n            \"SIR\": 21.0287,\n            \"SAR\": 9.52597,\n            \"ISR\": 14.4262\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3221,\n            \"SIR\": 25.4717,\n            \"SAR\": 15.9633,\n            \"ISR\": 17.7574\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.82,\n            \"SIR\": 25.0831,\n            \"SAR\": 11.8402,\n            \"ISR\": 14.7674\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5423,\n            \"SIR\": 25.0294,\n            \"SAR\": 19.7378,\n            \"ISR\": 18.8441\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.19366,\n            \"SIR\": 21.4983,\n            \"SAR\": 6.30645,\n            \"ISR\": 10.5103\n          },\n          \"instrumental\": {\n            \"SDR\": 
14.1842,\n            \"SIR\": 21.0838,\n            \"SAR\": 16.7889,\n            \"ISR\": 18.7831\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0847,\n            \"SIR\": 21.8325,\n            \"SAR\": 10.8027,\n            \"ISR\": 15.352\n          },\n          \"instrumental\": {\n            \"SDR\": 14.2207,\n            \"SIR\": 25.5833,\n            \"SAR\": 16.1734,\n            \"ISR\": 17.98\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.62655,\n            \"SIR\": 28.8525,\n            \"SAR\": 10.0028,\n            \"ISR\": 14.3664\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5942,\n            \"SIR\": 25.8222,\n            \"SAR\": 17.917,\n            \"ISR\": 19.2028\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.166635,\n            \"SIR\": -6.51745,\n            \"SAR\": 0.108365,\n            \"ISR\": 11.0128\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5254,\n            \"SIR\": 47.9267,\n            \"SAR\": 16.431,\n            \"ISR\": 17.7318\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.0742,\n            \"SIR\": 18.0718,\n            \"SAR\": 7.88194,\n            \"ISR\": 11.6038\n          },\n          \"instrumental\": {\n            \"SDR\": 16.6074,\n            \"SIR\": 22.5285,\n            \"SAR\": 18.8244,\n            \"ISR\": 18.182\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 11.7138,\n            \"SIR\": 26.2193,\n            \"SAR\": 12.2143,\n            \"ISR\": 15.3867\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2847,\n            \"SIR\": 25.9592,\n            \"SAR\": 17.1909,\n            \"ISR\": 18.9188\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.862,\n            \"SIR\": 24.9785,\n            \"SAR\": 13.6176,\n            \"ISR\": 16.9629\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2697,\n            \"SIR\": 30.9239,\n            \"SAR\": 20.7161,\n            \"ISR\": 19.1325\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0228,\n            \"SIR\": 29.7847,\n            \"SAR\": 11.6063,\n            \"ISR\": 16.2\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3957,\n            \"SIR\": 30.5321,\n            \"SAR\": 20.7349,\n            \"ISR\": 19.4281\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.23988,\n            \"SIR\": 16.4631,\n            \"SAR\": 3.66268,\n            \"ISR\": 6.83089\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8817,\n            \"SIR\": 14.5998,\n            \"SAR\": 14.1,\n            \"ISR\": 17.684\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.16346,\n            \"SIR\": 22.447,\n            \"SAR\": 9.78872,\n            \"ISR\": 14.6676\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2642,\n       
     \"SIR\": 28.4148,\n            \"SAR\": 18.94,\n            \"ISR\": 19.0972\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.95002,\n            \"SIR\": 19.5031,\n            \"SAR\": 8.03194,\n            \"ISR\": 12.0693\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5739,\n            \"SIR\": 23.0761,\n            \"SAR\": 16.9534,\n            \"ISR\": 18.4001\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.73402,\n            \"SIR\": 21.5119,\n            \"SAR\": 10.3201,\n            \"ISR\": 15.0848\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2111,\n            \"SIR\": 25.9637,\n            \"SAR\": 16.5764,\n            \"ISR\": 17.7013\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8612,\n            \"SIR\": 26.1244,\n            \"SAR\": 12.2784,\n            \"ISR\": 14.382\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3654,\n            \"SIR\": 21.0023,\n            \"SAR\": 14.5066,\n            \"ISR\": 18.013\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.69852,\n            \"SIR\": 18.1825,\n            \"SAR\": 6.29807,\n            \"ISR\": 10.6505\n          },\n          \"instrumental\": {\n            \"SDR\": 12.1827,\n            \"SIR\": 17.5617,\n            \"SAR\": 13.2379,\n            \"ISR\": 16.7747\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 10.9979,\n            \"SIR\": 26.6356,\n            \"SAR\": 11.5283,\n            \"ISR\": 16.1449\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6651,\n            \"SIR\": 26.5811,\n            \"SAR\": 16.9057,\n            \"ISR\": 17.8642\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7051,\n            \"SIR\": 30.6497,\n            \"SAR\": 13.6958,\n            \"ISR\": 15.7065\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0371,\n            \"SIR\": 17.4896,\n            \"SAR\": 13.0171,\n            \"ISR\": 18.3478\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8281,\n            \"SIR\": 25.4208,\n            \"SAR\": 12.5666,\n            \"ISR\": 15.8487\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8659,\n            \"SIR\": 23.7314,\n            \"SAR\": 15.1002,\n            \"ISR\": 17.9173\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7301,\n            \"SIR\": 34.1645,\n            \"SAR\": 13.5797,\n            \"ISR\": 16.7386\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9351,\n            \"SIR\": 31.0197,\n            \"SAR\": 22.1091,\n            \"ISR\": 19.7026\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.17442,\n            \"SIR\": 10.972,\n            \"SAR\": 6.79369,\n            \"ISR\": 8.42129\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5621,\n            \"SIR\": 14.6105,\n            \"SAR\": 14.0976,\n            
\"ISR\": 15.8593\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.1324,\n            \"SIR\": 33.7296,\n            \"SAR\": 18.3241,\n            \"ISR\": 18.5191\n          },\n          \"instrumental\": {\n            \"SDR\": 18.3647,\n            \"SIR\": 36.8409,\n            \"SAR\": 23.587,\n            \"ISR\": 19.5641\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9136,\n            \"SIR\": 22.688,\n            \"SAR\": 13.0509,\n            \"ISR\": 17.4081\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2257,\n            \"SIR\": 23.4872,\n            \"SAR\": 11.8632,\n            \"ISR\": 15.754\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.62748,\n            \"SIR\": 22.9827,\n            \"SAR\": 8.79313,\n            \"ISR\": 14.2059\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0571,\n            \"SIR\": 24.118,\n            \"SAR\": 15.4935,\n            \"ISR\": 17.8618\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.43734,\n            \"SIR\": 19.9247,\n            \"SAR\": 7.577,\n            \"ISR\": 12.41\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0319,\n            \"SIR\": 22.1165,\n            \"SAR\": 15.0098,\n            \"ISR\": 18.1953\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2658,\n            \"SIR\": 26.1365,\n            \"SAR\": 
10.9079,\n            \"ISR\": 15.5378\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0503,\n            \"SIR\": 28.9192,\n            \"SAR\": 18.7169,\n            \"ISR\": 18.8009\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7917,\n            \"SIR\": 31.6955,\n            \"SAR\": 13.4608,\n            \"ISR\": 16.1853\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7895,\n            \"SIR\": 30.4762,\n            \"SAR\": 21.8329,\n            \"ISR\": 19.4939\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7457,\n            \"SIR\": 27.5503,\n            \"SAR\": 13.64,\n            \"ISR\": 16.7788\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8242,\n            \"SIR\": 28.0431,\n            \"SAR\": 17.887,\n            \"ISR\": 18.6347\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.5429,\n        \"SIR\": 23.6624,\n        \"SAR\": 10.9398,\n        \"ISR\": 14.9177\n      },\n      \"instrumental\": {\n        \"SDR\": 15.1295,\n        \"SIR\": 25.5275,\n        \"SAR\": 16.8799,\n        \"ISR\": 18.2715\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MDX23C-De-Reverb-aufr33-jarredou.ckpt\": {\n    \"model_name\": \"MDX23C Model: MDX23C De-Reverb by aufr33-jarredou\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"dry\",\n      \"no dry\"\n    ],\n    \"target_stem\": null\n  },\n  \"MDX23C-DrumSep-aufr33-jarredou.ckpt\": {\n    \"model_name\": \"MDX23C Model: MDX23C DrumSep by aufr33-jarredou\",\n    \"track_scores\": [],\n    
\"median_scores\": {},\n    \"stems\": [\n      \"kick\",\n      \"snare\",\n      \"toms\",\n      \"hh\",\n      \"ride\",\n      \"crash\"\n    ],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\": {\n    \"model_name\": \"Roformer Model: Mel-Roformer-Karaoke-Aufr33-Viperx\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.80284,\n            \"SIR\": 28.1352,\n            \"SAR\": 6.2309,\n            \"ISR\": 8.72186\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4103,\n            \"SIR\": 20.3727,\n            \"SAR\": 19.1878,\n            \"ISR\": 19.5712\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.44766,\n            \"SIR\": 28.6953,\n            \"SAR\": 5.61063,\n            \"ISR\": 7.9584\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6286,\n            \"SIR\": 13.8662,\n            \"SAR\": 14.2308,\n            \"ISR\": 19.4116\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3317,\n            \"SIR\": 31.3895,\n            \"SAR\": 8.71018,\n            \"ISR\": 10.3698\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0015,\n            \"SIR\": 18.3039,\n            \"SAR\": 16.7546,\n            \"ISR\": 19.4711\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.36094,\n            \"SIR\": 9.84555,\n            \"SAR\": 0.35986,\n            \"ISR\": 7.04313\n          },\n          \"instrumental\": {\n            \"SDR\": 11.262,\n            \"SIR\": 16.6469,\n          
  \"SAR\": 13.1283,\n            \"ISR\": 16.5805\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.60024,\n            \"SIR\": 29.7966,\n            \"SAR\": 6.36113,\n            \"ISR\": 8.71611\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5203,\n            \"SIR\": 12.0699,\n            \"SAR\": 11.2546,\n            \"ISR\": 19.1232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.50892,\n            \"SIR\": 22.2888,\n            \"SAR\": 3.60457,\n            \"ISR\": 5.76414\n          },\n          \"instrumental\": {\n            \"SDR\": 6.34154,\n            \"SIR\": 8.31747,\n            \"SAR\": 10.2241,\n            \"ISR\": 18.1232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.97312,\n            \"SIR\": 29.9001,\n            \"SAR\": -6.8765,\n            \"ISR\": 0.797585\n          },\n          \"instrumental\": {\n            \"SDR\": 7.36757,\n            \"SIR\": 5.43338,\n            \"SAR\": 13.1064,\n            \"ISR\": 19.853\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.02955,\n            \"SIR\": 28.2355,\n            \"SAR\": 10.2758,\n            \"ISR\": 12.0485\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5352,\n            \"SIR\": 26.4308,\n            \"SAR\": 22.2181,\n            \"ISR\": 19.6635\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9238,\n            
\"SIR\": 37.8685,\n            \"SAR\": 15.3453,\n            \"ISR\": 17.4007\n          },\n          \"instrumental\": {\n            \"SDR\": 16.514,\n            \"SIR\": 30.3403,\n            \"SAR\": 19.4834,\n            \"ISR\": 19.5439\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.8564,\n            \"SIR\": 34.9158,\n            \"SAR\": 6.86828,\n            \"ISR\": 8.52586\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5753,\n            \"SIR\": 12.3389,\n            \"SAR\": 13.2117,\n            \"ISR\": 19.5608\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.517,\n            \"SIR\": 39.8722,\n            \"SAR\": 17.9265,\n            \"ISR\": 18.2814\n          },\n          \"instrumental\": {\n            \"SDR\": 17.289,\n            \"SIR\": 32.4369,\n            \"SAR\": 20.7586,\n            \"ISR\": 19.644\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2788,\n            \"SIR\": 28.7263,\n            \"SAR\": 10.0593,\n            \"ISR\": 11.7162\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1491,\n            \"SIR\": 18.4602,\n            \"SAR\": 16.0043,\n            \"ISR\": 19.2895\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.13337,\n            \"SIR\": 28.354,\n            \"SAR\": 2.7865,\n            \"ISR\": 5.75026\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4082,\n            \"SIR\": 9.98449,\n            \"SAR\": 
11.2769,\n            \"ISR\": 19.2755\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84089,\n            \"SIR\": 31.099,\n            \"SAR\": 8.48509,\n            \"ISR\": 10.7593\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3726,\n            \"SIR\": 17.3026,\n            \"SAR\": 14.3911,\n            \"ISR\": 19.5066\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.24722,\n            \"SIR\": 24.2584,\n            \"SAR\": 5.69048,\n            \"ISR\": 8.05899\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8662,\n            \"SIR\": 17.4352,\n            \"SAR\": 16.8385,\n            \"ISR\": 19.263\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.29622,\n            \"SIR\": 24.4992,\n            \"SAR\": 9.41904,\n            \"ISR\": 11.7931\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1646,\n            \"SIR\": 19.2755,\n            \"SAR\": 15.706,\n            \"ISR\": 18.73\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.2889,\n            \"SIR\": 32.4128,\n            \"SAR\": 10.0165,\n            \"ISR\": 12.3648\n          },\n          \"instrumental\": {\n            \"SDR\": 13.7498,\n            \"SIR\": 19.1051,\n            \"SAR\": 14.7954,\n            \"ISR\": 19.0623\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.80998,\n            \"SIR\": 27.8081,\n    
        \"SAR\": -2.92602,\n            \"ISR\": 1.74463\n          },\n          \"instrumental\": {\n            \"SDR\": 18.1408,\n            \"SIR\": 11.1376,\n            \"SAR\": 20.5673,\n            \"ISR\": 19.8881\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1731,\n            \"SIR\": 29.3368,\n            \"SAR\": 11.6156,\n            \"ISR\": 12.3624\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5899,\n            \"SIR\": 19.4455,\n            \"SAR\": 17.5667,\n            \"ISR\": 19.304\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3855,\n            \"SIR\": 30.0185,\n            \"SAR\": 12.4226,\n            \"ISR\": 13.656\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0729,\n            \"SIR\": 22.4126,\n            \"SAR\": 19.764,\n            \"ISR\": 19.5254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9903,\n            \"SIR\": 35.8745,\n            \"SAR\": 15.2679,\n            \"ISR\": 17.7962\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5269,\n            \"SIR\": 31.9855,\n            \"SAR\": 21.3407,\n            \"ISR\": 19.6462\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.26104,\n            \"SIR\": 17.8598,\n            \"SAR\": 3.2037,\n            \"ISR\": 5.9366\n          },\n          \"instrumental\": {\n            \"SDR\": 11.168,\n            \"SIR\": 13.7282,\n            \"SAR\": 14.6733,\n            
\"ISR\": 18.2361\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.42602,\n            \"SIR\": 37.9798,\n            \"SAR\": 3.38901,\n            \"ISR\": 5.25114\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5619,\n            \"SIR\": 12.0054,\n            \"SAR\": 12.8753,\n            \"ISR\": 19.7806\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.92507,\n            \"SIR\": 24.3567,\n            \"SAR\": 5.72536,\n            \"ISR\": 8.15347\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6545,\n            \"SIR\": 17.1722,\n            \"SAR\": 16.5175,\n            \"ISR\": 19.2461\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.85825,\n            \"SIR\": 29.3925,\n            \"SAR\": 5.97532,\n            \"ISR\": 7.97919\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2766,\n            \"SIR\": 15.5439,\n            \"SAR\": 15.2343,\n            \"ISR\": 19.1974\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.06974,\n            \"SIR\": 31.4844,\n            \"SAR\": 9.33317,\n            \"ISR\": 9.84969\n          },\n          \"instrumental\": {\n            \"SDR\": 9.95159,\n            \"SIR\": 12.7375,\n            \"SAR\": 11.9531,\n            \"ISR\": 19.208\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.98873,\n            \"SIR\": 26.6854,\n            
\"SAR\": 2.081,\n            \"ISR\": 4.97736\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3336,\n            \"SIR\": 9.96016,\n            \"SAR\": 12.7406,\n            \"ISR\": 19.0492\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.50192,\n            \"SIR\": 29.6537,\n            \"SAR\": 4.82214,\n            \"ISR\": 7.43185\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7999,\n            \"SIR\": 14.5408,\n            \"SAR\": 14.3212,\n            \"ISR\": 17.7872\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.3624,\n            \"SIR\": 35.004,\n            \"SAR\": 12.6696,\n            \"ISR\": 13.2289\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4301,\n            \"SIR\": 14.091,\n            \"SAR\": 12.4263,\n            \"ISR\": 19.3\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.54518,\n            \"SIR\": 30.1511,\n            \"SAR\": 9.18607,\n            \"ISR\": 10.2983\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9966,\n            \"SIR\": 13.8382,\n            \"SAR\": 12.9041,\n            \"ISR\": 19.0269\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1641,\n            \"SIR\": 39.1433,\n            \"SAR\": 4.4328,\n            \"ISR\": 6.58832\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7084,\n            \"SIR\": 15.1447,\n            \"SAR\": 15.6636,\n            \"ISR\": 19.8685\n          }\n        }\n      },\n      {\n        
\"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.46461,\n            \"SIR\": 14.9759,\n            \"SAR\": -0.0,\n            \"ISR\": 1.32129\n          },\n          \"instrumental\": {\n            \"SDR\": 8.70372,\n            \"SIR\": 8.50921,\n            \"SAR\": 19.6935,\n            \"ISR\": 19.0185\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.84148,\n            \"SIR\": 37.673,\n            \"SAR\": 7.22407,\n            \"ISR\": 10.1225\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0958,\n            \"SIR\": 16.6524,\n            \"SAR\": 15.3224,\n            \"ISR\": 19.7899\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.43367,\n            \"SIR\": 26.7779,\n            \"SAR\": 9.68603,\n            \"ISR\": 11.9258\n          },\n          \"instrumental\": {\n            \"SDR\": 10.0505,\n            \"SIR\": 12.5026,\n            \"SAR\": 9.51029,\n            \"ISR\": 17.3935\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.26089,\n            \"SIR\": 32.1735,\n            \"SAR\": 5.54724,\n            \"ISR\": 8.81442\n          },\n          \"instrumental\": {\n            \"SDR\": 13.1194,\n            \"SIR\": 14.4778,\n            \"SAR\": 12.8589,\n            \"ISR\": 19.2837\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.42876,\n            \"SIR\": 21.8126,\n            \"SAR\": 8.91982,\n            \"ISR\": 13.0204\n          },\n          
\"instrumental\": {\n            \"SDR\": 13.7408,\n            \"SIR\": 22.7536,\n            \"SAR\": 15.8154,\n            \"ISR\": 18.5818\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8328,\n            \"SIR\": 30.3219,\n            \"SAR\": 12.118,\n            \"ISR\": 14.1414\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5724,\n            \"SIR\": 24.3882,\n            \"SAR\": 19.2364,\n            \"ISR\": 19.4295\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9194,\n            \"SIR\": 36.6129,\n            \"SAR\": 8.11822,\n            \"ISR\": 9.68744\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9919,\n            \"SIR\": 18.507,\n            \"SAR\": 17.6981,\n            \"ISR\": 19.7258\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.8794,\n            \"SIR\": 39.619,\n            \"SAR\": 15.3755,\n            \"ISR\": 14.6759\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7705,\n            \"SIR\": 21.039,\n            \"SAR\": 18.288,\n            \"ISR\": 19.5735\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.44766,\n        \"SIR\": 29.7966,\n        \"SAR\": 7.22407,\n        \"ISR\": 9.68744\n      },\n      \"instrumental\": {\n        \"SDR\": 14.6545,\n        \"SIR\": 16.6469,\n        \"SAR\": 15.3224,\n        \"ISR\": 19.3\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"denoise_mel_band_roformer_aufr33_sdr_27.9959.ckpt\": {\n    
\"model_name\": \"Roformer Model: Mel-Roformer-Denoise-Aufr33\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.3\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.8\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.0\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.0\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          
\"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.2\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.5\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.3\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.2\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 9.0\n    },\n    \"stems\": [\n      \"dry\",\n      \"other\"\n    ],\n    \"target_stem\": \"dry\"\n  },\n  \"denoise_mel_band_roformer_aufr33_aggr_sdr_27.9768.ckpt\": {\n    \"model_name\": \"Roformer Model: Mel-Roformer-Denoise-Aufr33-Aggr\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.2\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          
\"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.8\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.1\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.1\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.9\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.5\n        
}\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.2\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.6\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 9.6\n    },\n    \"stems\": [\n      \"dry\",\n      \"other\"\n    ],\n    \"target_stem\": \"dry\"\n  },\n  \"mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt\": {\n    \"model_name\": \"Roformer Model: Mel-Roformer-Crowd-Aufr33-Viperx\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      
},\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.1\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.1\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.2\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 12.6\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 11.2\n    },\n    \"stems\": [\n      \"crowd\",\n      \"other\"\n    ],\n    \"target_stem\": 
\"crowd\"\n  },\n  \"deverb_bs_roformer_8_384dim_10depth.ckpt\": {\n    \"model_name\": \"Roformer Model: BS-Roformer-De-Reverb\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 98.1\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 37.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 15.0\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 59.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.3\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 85.2\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 85.4\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 31.7\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 13.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 41.6\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 39.4\n    },\n    \"stems\": [\n      \"noreverb\",\n      \"reverb\"\n    ],\n    \"target_stem\": \"noreverb\"\n  },\n  \"vocals_mel_band_roformer.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Vocals by Kimberley Jensen\",\n    
\"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.20453,\n            \"SIR\": 24.62,\n            \"SAR\": 8.44323,\n            \"ISR\": 12.5478\n          },\n          \"other\": {\n            \"SDR\": -4.75106,\n            \"SIR\": 18.6217,\n            \"SAR\": -5.8514,\n            \"ISR\": 17.2321\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.44646,\n            \"SIR\": 24.3041,\n            \"SAR\": 9.59779,\n            \"ISR\": 13.7708\n          },\n          \"other\": {\n            \"SDR\": -0.445615,\n            \"SIR\": 19.7624,\n            \"SAR\": -1.50028,\n            \"ISR\": 16.5288\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5967,\n            \"SIR\": 27.6383,\n            \"SAR\": 13.7347,\n            \"ISR\": 16.5271\n          },\n          \"other\": {\n            \"SDR\": -5.52993,\n            \"SIR\": 22.4023,\n            \"SAR\": -6.43527,\n            \"ISR\": 17.6028\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.13064,\n            \"SIR\": 6.84398,\n            \"SAR\": 6.4108,\n            \"ISR\": 14.0466\n          },\n          \"other\": {\n            \"SDR\": -6.93653,\n            \"SIR\": 15.1097,\n            \"SAR\": -11.0505,\n            \"ISR\": 6.5595\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7019,\n            \"SIR\": 26.86,\n            \"SAR\": 14.8681,\n            \"ISR\": 17.1673\n          },\n          \"other\": 
{\n            \"SDR\": -0.91055,\n            \"SIR\": 22.9796,\n            \"SAR\": -2.09014,\n            \"ISR\": 16.787\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0534,\n            \"SIR\": 21.2412,\n            \"SAR\": 11.8893,\n            \"ISR\": 15.4179\n          },\n          \"other\": {\n            \"SDR\": 0.62249,\n            \"SIR\": 19.7785,\n            \"SAR\": -0.65793,\n            \"ISR\": 15.5725\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7003,\n            \"SIR\": 24.3785,\n            \"SAR\": 13.7136,\n            \"ISR\": 15.8473\n          },\n          \"other\": {\n            \"SDR\": -3.73782,\n            \"SIR\": 19.7305,\n            \"SAR\": -4.89454,\n            \"ISR\": 15.8948\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2278,\n            \"SIR\": 25.7081,\n            \"SAR\": 10.6627,\n            \"ISR\": 14.8755\n          },\n          \"other\": {\n            \"SDR\": -7.50623,\n            \"SIR\": 19.2564,\n            \"SAR\": -8.49641,\n            \"ISR\": 17.8931\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1645,\n            \"SIR\": 32.8974,\n            \"SAR\": 15.6251,\n            \"ISR\": 17.6311\n          },\n          \"other\": {\n            \"SDR\": 4.03484,\n            \"SIR\": 29.0168,\n            \"SAR\": 3.13644,\n            \"ISR\": 19.0481\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          
\"vocals\": {\n            \"SDR\": 14.5588,\n            \"SIR\": 25.579,\n            \"SAR\": 13.4438,\n            \"ISR\": 14.0793\n          },\n          \"other\": {\n            \"SDR\": 0.498385,\n            \"SIR\": 18.8493,\n            \"SAR\": -0.577695,\n            \"ISR\": 17.5357\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.6277,\n            \"SIR\": 42.2097,\n            \"SAR\": 18.0326,\n            \"ISR\": 18.4597\n          },\n          \"other\": {\n            \"SDR\": 0.48112,\n            \"SIR\": 27.5168,\n            \"SAR\": -0.41631,\n            \"ISR\": 19.3874\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.5967,\n        \"SIR\": 25.579,\n        \"SAR\": 13.4438,\n        \"ISR\": 15.4179\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"mel_band_roformer_kim_ft_unwa.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | FT by unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.35618,\n            \"SIR\": 26.0703,\n            \"SAR\": 8.36585,\n            \"ISR\": 11.7519\n          },\n          \"other\": {\n            \"SDR\": -4.98154,\n            \"SIR\": 17.0602,\n            \"SAR\": -6.0678,\n            \"ISR\": 17.3498\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.28118,\n            \"SIR\": 26.5467,\n            \"SAR\": 9.28576,\n            \"ISR\": 13.3506\n          },\n          \"other\": {\n            \"SDR\": -0.453755,\n            \"SIR\": 18.9808,\n            \"SAR\": 
-1.42004,\n            \"ISR\": 17.186\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4663,\n            \"SIR\": 27.7636,\n            \"SAR\": 13.3778,\n            \"ISR\": 16.5801\n          },\n          \"other\": {\n            \"SDR\": -5.50767,\n            \"SIR\": 22.903,\n            \"SAR\": -6.45916,\n            \"ISR\": 17.5493\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.04107,\n            \"SIR\": 6.61981,\n            \"SAR\": 5.88247,\n            \"ISR\": 13.8226\n          },\n          \"other\": {\n            \"SDR\": -7.01421,\n            \"SIR\": 14.3395,\n            \"SAR\": -11.1192,\n            \"ISR\": 6.33089\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.8263,\n            \"SIR\": 27.0324,\n            \"SAR\": 15.0265,\n            \"ISR\": 16.9693\n          },\n          \"other\": {\n            \"SDR\": -0.88469,\n            \"SIR\": 22.5049,\n            \"SAR\": -2.04681,\n            \"ISR\": 16.7957\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9469,\n            \"SIR\": 20.9137,\n            \"SAR\": 11.7642,\n            \"ISR\": 15.4123\n          },\n          \"other\": {\n            \"SDR\": 0.61887,\n            \"SIR\": 19.7633,\n            \"SAR\": -0.68734,\n            \"ISR\": 15.4564\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4992,\n            \"SIR\": 24.1639,\n            \"SAR\": 13.535,\n   
         \"ISR\": 15.5225\n          },\n          \"other\": {\n            \"SDR\": -3.74308,\n            \"SIR\": 19.0804,\n            \"SAR\": -4.86216,\n            \"ISR\": 15.862\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.4672,\n            \"SIR\": 26.8517,\n            \"SAR\": 10.8946,\n            \"ISR\": 15.0238\n          },\n          \"other\": {\n            \"SDR\": -7.82418,\n            \"SIR\": 18.8163,\n            \"SAR\": -8.84081,\n            \"ISR\": 18.0755\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.953,\n            \"SIR\": 35.2963,\n            \"SAR\": 15.2368,\n            \"ISR\": 17.6925\n          },\n          \"other\": {\n            \"SDR\": 4.05024,\n            \"SIR\": 28.5405,\n            \"SAR\": 3.1634,\n            \"ISR\": 19.0926\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4877,\n            \"SIR\": 26.4602,\n            \"SAR\": 13.1553,\n            \"ISR\": 13.696\n          },\n          \"other\": {\n            \"SDR\": 0.48458,\n            \"SIR\": 17.6172,\n            \"SAR\": -0.58263,\n            \"ISR\": 17.6945\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.568,\n            \"SIR\": 42.2331,\n            \"SAR\": 17.9423,\n            \"ISR\": 18.5018\n          },\n          \"other\": {\n            \"SDR\": 0.47828,\n            \"SIR\": 27.4917,\n            \"SAR\": -0.41064,\n            \"ISR\": 19.3727\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4183,\n            \"SIR\": 26.1056,\n            \"SAR\": 13.1788,\n            \"ISR\": 16.1269\n          },\n          \"other\": {\n            \"SDR\": -4.18164,\n            \"SIR\": 20.2352,\n            \"SAR\": -5.38024,\n            \"ISR\": 16.6858\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.4423,\n        \"SIR\": 26.5035,\n        \"SAR\": 13.167,\n        \"ISR\": 15.4674\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"melband_roformer_inst_v1e.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Inst V1 (E) by Unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.1805,\n            \"SIR\": 24.2704,\n            \"SAR\": 6.82554,\n            \"ISR\": 9.89464\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7943,\n            \"SIR\": 23.9173,\n            \"SAR\": 20.2513,\n            \"ISR\": 19.3787\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.51409,\n            \"SIR\": 25.3613,\n            \"SAR\": 8.6834,\n            \"ISR\": 11.6787\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3868,\n            \"SIR\": 20.6518,\n            \"SAR\": 16.2606,\n            \"ISR\": 18.7944\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3751,\n            \"SIR\": 28.543,\n            \"SAR\": 12.5306,\n            \"ISR\": 14.9963\n          },\n          \"instrumental\": {\n       
     \"SDR\": 16.7268,\n            \"SIR\": 26.6064,\n            \"SAR\": 19.7989,\n            \"ISR\": 19.1907\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.3697,\n            \"SIR\": 7.26461,\n            \"SAR\": 4.42895,\n            \"ISR\": 12.114\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8007,\n            \"SIR\": 24.1797,\n            \"SAR\": 13.8526,\n            \"ISR\": 14.5889\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6472,\n            \"SIR\": 27.9184,\n            \"SAR\": 13.7355,\n            \"ISR\": 15.565\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0619,\n            \"SIR\": 23.387,\n            \"SAR\": 16.3172,\n            \"ISR\": 18.4356\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.9534,\n            \"SIR\": 22.0835,\n            \"SAR\": 10.9775,\n            \"ISR\": 13.6732\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4782,\n            \"SIR\": 19.9114,\n            \"SAR\": 13.8838,\n            \"ISR\": 17.2865\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2203,\n            \"SIR\": 24.2794,\n            \"SAR\": 12.2334,\n            \"ISR\": 13.6806\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1993,\n            \"SIR\": 21.9269,\n            \"SAR\": 17.4485,\n            \"ISR\": 18.6176\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 7.2891,\n            \"SIR\": 26.0093,\n            \"SAR\": 7.48639,\n            \"ISR\": 10.754\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6122,\n            \"SIR\": 28.9923,\n            \"SAR\": 24.8467,\n            \"ISR\": 19.6605\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9846,\n            \"SIR\": 34.1704,\n            \"SAR\": 13.116,\n            \"ISR\": 15.3484\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1801,\n            \"SIR\": 24.4637,\n            \"SAR\": 17.5123,\n            \"ISR\": 19.3009\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8784,\n            \"SIR\": 25.13,\n            \"SAR\": 10.8475,\n            \"ISR\": 12.1154\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0379,\n            \"SIR\": 19.0503,\n            \"SAR\": 16.2796,\n            \"ISR\": 18.5254\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4741,\n            \"SIR\": 38.7534,\n            \"SAR\": 16.3561,\n            \"ISR\": 17.5507\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5269,\n            \"SIR\": 29.414,\n            \"SAR\": 19.2987,\n            \"ISR\": 19.604\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4409,\n            \"SIR\": 27.275,\n            \"SAR\": 12.3826,\n            \"ISR\": 14.361\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7392,\n            
\"SIR\": 24.1909,\n            \"SAR\": 18.3815,\n            \"ISR\": 19.128\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.33509,\n            \"SIR\": 22.712,\n            \"SAR\": 8.05494,\n            \"ISR\": 11.6723\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8092,\n            \"SIR\": 23.2884,\n            \"SAR\": 17.9981,\n            \"ISR\": 18.578\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.45288,\n            \"SIR\": 25.1991,\n            \"SAR\": 10.701,\n            \"ISR\": 12.6727\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0784,\n            \"SIR\": 24.9138,\n            \"SAR\": 23.2958,\n            \"ISR\": 19.3166\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.77401,\n            \"SIR\": 21.2967,\n            \"SAR\": 4.19684,\n            \"ISR\": 8.6086\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8272,\n            \"SIR\": 21.7599,\n            \"SAR\": 19.2401,\n            \"ISR\": 19.1652\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.41606,\n            \"SIR\": 24.3024,\n            \"SAR\": 10.33,\n            \"ISR\": 13.0445\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5402,\n            \"SIR\": 21.6912,\n            \"SAR\": 16.5876,\n            \"ISR\": 18.5579\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            
\"SDR\": 9.23415,\n            \"SIR\": 30.5143,\n            \"SAR\": 9.93761,\n            \"ISR\": 12.8462\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9857,\n            \"SIR\": 23.5123,\n            \"SAR\": 18.683,\n            \"ISR\": 19.3937\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.072135,\n            \"SIR\": -22.2234,\n            \"SAR\": 0.031185,\n            \"ISR\": 10.4577\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8367,\n            \"SIR\": 57.1884,\n            \"SAR\": 29.162,\n            \"ISR\": 18.4616\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.60154,\n            \"SIR\": 20.3358,\n            \"SAR\": 8.06741,\n            \"ISR\": 10.3785\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4277,\n            \"SIR\": 21.2743,\n            \"SAR\": 19.4859,\n            \"ISR\": 18.8447\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.932,\n            \"SIR\": 27.1973,\n            \"SAR\": 11.6509,\n            \"ISR\": 14.7433\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9003,\n            \"SIR\": 25.6688,\n            \"SAR\": 18.429,\n            \"ISR\": 19.1145\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7983,\n            \"SIR\": 26.6479,\n            \"SAR\": 13.2131,\n            \"ISR\": 15.659\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8933,\n            \"SIR\": 29.1824,\n          
  \"SAR\": 22.1669,\n            \"ISR\": 19.3666\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00039,\n            \"SIR\": 5.46794,\n            \"SAR\": 0.09246,\n            \"ISR\": 4.05517\n          },\n          \"instrumental\": {\n            \"SDR\": 19.7012,\n            \"SIR\": 40.9593,\n            \"SAR\": 36.0428,\n            \"ISR\": 19.3948\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.13607,\n            \"SIR\": 18.5241,\n            \"SAR\": 3.75436,\n            \"ISR\": 5.8868\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1222,\n            \"SIR\": 13.649,\n            \"SAR\": 15.3634,\n            \"ISR\": 18.345\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.97052,\n            \"SIR\": 22.6016,\n            \"SAR\": 7.85408,\n            \"ISR\": 12.0882\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5073,\n            \"SIR\": 27.7181,\n            \"SAR\": 21.5109,\n            \"ISR\": 19.3784\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.55931,\n            \"SIR\": 21.9828,\n            \"SAR\": 7.88072,\n            \"ISR\": 10.5663\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7308,\n            \"SIR\": 20.901,\n            \"SAR\": 17.6799,\n            \"ISR\": 18.8723\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
8.89531,\n            \"SIR\": 21.2883,\n            \"SAR\": 9.52928,\n            \"ISR\": 13.4898\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9504,\n            \"SIR\": 24.2647,\n            \"SAR\": 17.9409,\n            \"ISR\": 18.3744\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.128,\n            \"SIR\": 28.9954,\n            \"SAR\": 12.0096,\n            \"ISR\": 13.4383\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0465,\n            \"SIR\": 19.1712,\n            \"SAR\": 14.5857,\n            \"ISR\": 18.567\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.19582,\n            \"SIR\": 21.8801,\n            \"SAR\": 5.84229,\n            \"ISR\": 8.91805\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2235,\n            \"SIR\": 16.0361,\n            \"SAR\": 13.9538,\n            \"ISR\": 18.0406\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.85566,\n            \"SIR\": 27.2216,\n            \"SAR\": 10.9904,\n            \"ISR\": 13.675\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6875,\n            \"SIR\": 22.2981,\n            \"SAR\": 17.3166,\n            \"ISR\": 18.6841\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.1754,\n            \"SIR\": 32.4062,\n            \"SAR\": 14.3075,\n            \"ISR\": 15.44\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5675,\n            \"SIR\": 20.3635,\n            \"SAR\": 14.6408,\n  
          \"ISR\": 18.8495\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7241,\n            \"SIR\": 26.8214,\n            \"SAR\": 11.4857,\n            \"ISR\": 13.72\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0,\n            \"SIR\": 20.905,\n            \"SAR\": 15.5168,\n            \"ISR\": 18.3796\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5523,\n            \"SIR\": 36.357,\n            \"SAR\": 12.8237,\n            \"ISR\": 15.084\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7576,\n            \"SIR\": 27.5903,\n            \"SAR\": 22.0431,\n            \"ISR\": 19.7703\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.0657,\n            \"SIR\": 17.181,\n            \"SAR\": 9.31288,\n            \"ISR\": 9.66949\n          },\n          \"instrumental\": {\n            \"SDR\": 13.3134,\n            \"SIR\": 16.5477,\n            \"SAR\": 17.9914,\n            \"ISR\": 18.3898\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.884,\n            \"SIR\": 32.6888,\n            \"SAR\": 15.6606,\n            \"ISR\": 17.2881\n          },\n          \"instrumental\": {\n            \"SDR\": 19.046,\n            \"SIR\": 34.3802,\n            \"SAR\": 26.0809,\n            \"ISR\": 19.7068\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8376,\n            \"SIR\": 22.833,\n            \"SAR\": 12.1502,\n            \"ISR\": 
15.6833\n          },\n          \"instrumental\": {\n            \"SDR\": 11.758,\n            \"SIR\": 20.7597,\n            \"SAR\": 12.4489,\n            \"ISR\": 16.2091\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.07108,\n            \"SIR\": 27.1141,\n            \"SAR\": 8.8797,\n            \"ISR\": 12.4004\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9102,\n            \"SIR\": 20.8479,\n            \"SAR\": 15.6257,\n            \"ISR\": 18.6648\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.52123,\n            \"SIR\": 8.20646,\n            \"SAR\": 4.32156,\n            \"ISR\": 9.69582\n          },\n          \"instrumental\": {\n            \"SDR\": 18.9687,\n            \"SIR\": 30.395,\n            \"SAR\": 24.9345,\n            \"ISR\": 18.8545\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.65971,\n            \"SIR\": 29.4232,\n            \"SAR\": 10.5955,\n            \"ISR\": 14.2022\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9149,\n            \"SIR\": 27.2811,\n            \"SAR\": 20.2152,\n            \"ISR\": 19.3523\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1236,\n            \"SIR\": 31.1417,\n            \"SAR\": 12.0417,\n            \"ISR\": 14.3407\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7892,\n            \"SIR\": 28.4426,\n            \"SAR\": 22.1601,\n            \"ISR\": 19.5277\n          }\n        }\n      },\n      {\n        
\"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7602,\n            \"SIR\": 25.9511,\n            \"SAR\": 14.0482,\n            \"ISR\": 16.3066\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5757,\n            \"SIR\": 27.4733,\n            \"SAR\": 17.3299,\n            \"ISR\": 17.9038\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.5563,\n        \"SIR\": 25.2802,\n        \"SAR\": 10.6483,\n        \"ISR\": 13.2414\n      },\n      \"instrumental\": {\n        \"SDR\": 15.8182,\n        \"SIR\": 23.7148,\n        \"SAR\": 17.9947,\n        \"ISR\": 18.8471\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | De-Reverb by anvuew\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"noreverb\",\n      \"reverb\"\n    ],\n    \"target_stem\": \"noreverb\"\n  },\n  \"dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | De-Reverb Less Aggressive by anvuew\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"noreverb\",\n      \"reverb\"\n    ],\n    \"target_stem\": \"noreverb\"\n  },\n  \"dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | De-Reverb-Echo by Sucial\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"dry\",\n      \"no dry\"\n    ],\n    \"target_stem\": null\n  },\n  \"dereverb-echo_mel_band_roformer_sdr_13.4843_v2.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | De-Reverb-Echo V2 by Sucial\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    
\"stems\": [\n      \"dry\",\n      \"no dry\"\n    ],\n    \"target_stem\": \"dry\"\n  },\n  \"MelBandRoformerSYHFT.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | SYHFT by SYH99999\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.28397,\n            \"SIR\": 25.5937,\n            \"SAR\": -6.65094,\n            \"ISR\": 2.14962\n          },\n          \"other\": {\n            \"SDR\": -4.88875,\n            \"SIR\": 5.38152,\n            \"SAR\": -4.84099,\n            \"ISR\": 18.3422\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.65496,\n            \"SIR\": 24.1503,\n            \"SAR\": -8.05382,\n            \"ISR\": 2.3716\n          },\n          \"other\": {\n            \"SDR\": -1.8057,\n            \"SIR\": 4.25746,\n            \"SAR\": -1.00565,\n            \"ISR\": 18.2375\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.56881,\n            \"SIR\": 24.7638,\n            \"SAR\": -6.56127,\n            \"ISR\": 2.76947\n          },\n          \"other\": {\n            \"SDR\": -6.17091,\n            \"SIR\": 2.64662,\n            \"SAR\": -4.37662,\n            \"ISR\": 18.6147\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.06117,\n            \"SIR\": 2.913,\n            \"SAR\": -10.9283,\n            \"ISR\": 1.56717\n          },\n          \"other\": {\n            \"SDR\": -7.10168,\n            \"SIR\": 2.21937,\n            \"SAR\": -6.60669,\n            \"ISR\": 13.5741\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - 
Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1566,\n            \"SIR\": 24.4344,\n            \"SAR\": -6.43097,\n            \"ISR\": 2.65868\n          },\n          \"other\": {\n            \"SDR\": -2.309,\n            \"SIR\": 0.229365,\n            \"SAR\": -0.69313,\n            \"ISR\": 18.4414\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.10493,\n            \"SIR\": 19.1978,\n            \"SAR\": -7.67819,\n            \"ISR\": 2.29761\n          },\n          \"other\": {\n            \"SDR\": -0.20165,\n            \"SIR\": 1.12613,\n            \"SAR\": 0.28259,\n            \"ISR\": 17.9232\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.68887,\n            \"SIR\": 22.7006,\n            \"SAR\": -7.41624,\n            \"ISR\": 2.17881\n          },\n          \"other\": {\n            \"SDR\": -4.29739,\n            \"SIR\": 1.15407,\n            \"SAR\": -2.73527,\n            \"ISR\": 17.9061\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.36242,\n            \"SIR\": 25.13,\n            \"SAR\": -7.1653,\n            \"ISR\": 2.43185\n          },\n          \"other\": {\n            \"SDR\": -8.18448,\n            \"SIR\": 4.49869,\n            \"SAR\": -7.39641,\n            \"ISR\": 19.0343\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8523,\n            \"SIR\": 30.2167,\n            \"SAR\": -6.74002,\n            \"ISR\": 2.84627\n          },\n          \"other\": {\n            \"SDR\": 3.319,\n      
      \"SIR\": 4.57753,\n            \"SAR\": 2.43338,\n            \"ISR\": 19.2716\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.8587,\n            \"SIR\": 28.9034,\n            \"SAR\": -8.93503,\n            \"ISR\": 2.22495\n          },\n          \"other\": {\n            \"SDR\": -0.07087,\n            \"SIR\": 2.8891,\n            \"SAR\": 0.32758,\n            \"ISR\": 19.1175\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.3293,\n            \"SIR\": 33.4488,\n            \"SAR\": -5.865,\n            \"ISR\": 2.87029\n          },\n          \"other\": {\n            \"SDR\": -0.76948,\n            \"SIR\": 2.5669,\n            \"SAR\": 0.18894,\n            \"ISR\": 19.3799\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.14983,\n            \"SIR\": 27.5606,\n            \"SAR\": -5.10267,\n            \"ISR\": 2.80428\n          },\n          \"other\": {\n            \"SDR\": -4.72006,\n            \"SIR\": 1.34799,\n            \"SAR\": -3.36664,\n            \"ISR\": 18.3667\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 7.98181,\n        \"SIR\": 24.9469,\n        \"SAR\": -6.95266,\n        \"ISR\": 2.40172\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MelBandRoformerSYHFTV2.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | SYHFT V2 by SYH99999\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n   
         \"SDR\": 4.37024,\n            \"SIR\": 26.0963,\n            \"SAR\": -6.42188,\n            \"ISR\": 2.13067\n          },\n          \"other\": {\n            \"SDR\": -5.31182,\n            \"SIR\": 5.24659,\n            \"SAR\": -5.2012,\n            \"ISR\": 18.3257\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.83564,\n            \"SIR\": 24.6619,\n            \"SAR\": -7.74812,\n            \"ISR\": 2.36961\n          },\n          \"other\": {\n            \"SDR\": -1.90102,\n            \"SIR\": 4.19655,\n            \"SAR\": -1.01898,\n            \"ISR\": 18.2521\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.77308,\n            \"SIR\": 23.852,\n            \"SAR\": -6.34669,\n            \"ISR\": 2.72775\n          },\n          \"other\": {\n            \"SDR\": -6.38002,\n            \"SIR\": 3.06188,\n            \"SAR\": -4.51879,\n            \"ISR\": 18.5106\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.487605,\n            \"SIR\": 3.65557,\n            \"SAR\": -10.378,\n            \"ISR\": 1.65109\n          },\n          \"other\": {\n            \"SDR\": -7.09256,\n            \"SIR\": 2.36534,\n            \"SAR\": -6.55424,\n            \"ISR\": 13.565\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.75731,\n            \"SIR\": 25.6381,\n            \"SAR\": -6.39069,\n            \"ISR\": 2.58615\n          },\n          \"other\": {\n            \"SDR\": -2.4114,\n            \"SIR\": 0.17628,\n            \"SAR\": -0.657185,\n            \"ISR\": 18.665\n        
  }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.49478,\n            \"SIR\": 17.5819,\n            \"SAR\": -7.40468,\n            \"ISR\": 2.43491\n          },\n          \"other\": {\n            \"SDR\": -0.08301,\n            \"SIR\": 1.56056,\n            \"SAR\": 0.15452,\n            \"ISR\": 17.6402\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.37845,\n            \"SIR\": 22.6717,\n            \"SAR\": -6.46152,\n            \"ISR\": 2.60955\n          },\n          \"other\": {\n            \"SDR\": -4.04285,\n            \"SIR\": 1.65709,\n            \"SAR\": -3.11757,\n            \"ISR\": 17.6554\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.79072,\n            \"SIR\": 26.0646,\n            \"SAR\": -7.36286,\n            \"ISR\": 2.36963\n          },\n          \"other\": {\n            \"SDR\": -7.87211,\n            \"SIR\": 4.26703,\n            \"SAR\": -7.17655,\n            \"ISR\": 19.0222\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1194,\n            \"SIR\": 32.0815,\n            \"SAR\": -6.63171,\n            \"ISR\": 2.7755\n          },\n          \"other\": {\n            \"SDR\": 3.3247,\n            \"SIR\": 4.52975,\n            \"SAR\": 2.50123,\n            \"ISR\": 19.4021\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.47626,\n            \"SIR\": 28.4171,\n            \"SAR\": -8.79002,\n            \"ISR\": 
2.19966\n          },\n          \"other\": {\n            \"SDR\": 0.06928,\n            \"SIR\": 2.83696,\n            \"SAR\": 0.29133,\n            \"ISR\": 19.1336\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.8396,\n            \"SIR\": 32.1657,\n            \"SAR\": -5.99695,\n            \"ISR\": 2.81832\n          },\n          \"other\": {\n            \"SDR\": -0.97207,\n            \"SIR\": 2.47631,\n            \"SAR\": 0.31421,\n            \"ISR\": 19.2378\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.49911,\n            \"SIR\": 25.9433,\n            \"SAR\": -4.99301,\n            \"ISR\": 2.8464\n          },\n          \"other\": {\n            \"SDR\": -4.6784,\n            \"SIR\": 1.34594,\n            \"SAR\": -3.39803,\n            \"ISR\": 18.3066\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.6361,\n        \"SIR\": 25.7907,\n        \"SAR\": -6.54662,\n        \"ISR\": 2.51053\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MelBandRoformerSYHFTV2.5.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | SYHFT V2.5 by SYH99999\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.2509,\n            \"SIR\": 27.1264,\n            \"SAR\": -5.74739,\n            \"ISR\": 1.89387\n          },\n          \"other\": {\n            \"SDR\": -4.60354,\n            \"SIR\": 5.3787,\n            \"SAR\": -4.48298,\n            \"ISR\": 18.5497\n          }\n        }\n      },\n      {\n        \"track_name\": 
\"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.67353,\n            \"SIR\": 25.4151,\n            \"SAR\": -7.70052,\n            \"ISR\": 2.28475\n          },\n          \"other\": {\n            \"SDR\": -1.89801,\n            \"SIR\": 4.12038,\n            \"SAR\": -0.95478,\n            \"ISR\": 18.3542\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.63973,\n            \"SIR\": 25.1115,\n            \"SAR\": -6.45841,\n            \"ISR\": 2.66018\n          },\n          \"other\": {\n            \"SDR\": -6.40526,\n            \"SIR\": 3.00992,\n            \"SAR\": -4.46331,\n            \"ISR\": 18.6211\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00027,\n            \"SIR\": 3.06818,\n            \"SAR\": -10.1881,\n            \"ISR\": 1.62119\n          },\n          \"other\": {\n            \"SDR\": -7.14073,\n            \"SIR\": 2.05658,\n            \"SAR\": -6.73285,\n            \"ISR\": 13.2318\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.50716,\n            \"SIR\": 26.3876,\n            \"SAR\": -6.32311,\n            \"ISR\": 2.52035\n          },\n          \"other\": {\n            \"SDR\": -2.41861,\n            \"SIR\": 0.117005,\n            \"SAR\": -0.512955,\n            \"ISR\": 18.8304\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.36989,\n            \"SIR\": 18.8742,\n            \"SAR\": -7.49526,\n            \"ISR\": 2.36155\n          },\n          \"other\": {\n            \"SDR\": -0.12136,\n          
  \"SIR\": 1.36045,\n            \"SAR\": 0.2725,\n            \"ISR\": 17.9632\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.73279,\n            \"SIR\": 24.409,\n            \"SAR\": -6.54021,\n            \"ISR\": 2.51136\n          },\n          \"other\": {\n            \"SDR\": -4.04448,\n            \"SIR\": 1.57525,\n            \"SAR\": -2.97279,\n            \"ISR\": 17.9162\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.59133,\n            \"SIR\": 27.0506,\n            \"SAR\": -7.14608,\n            \"ISR\": 2.20751\n          },\n          \"other\": {\n            \"SDR\": -8.01336,\n            \"SIR\": 4.26656,\n            \"SAR\": -7.19951,\n            \"ISR\": 19.1576\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.974,\n            \"SIR\": 34.3501,\n            \"SAR\": -6.65376,\n            \"ISR\": 2.7205\n          },\n          \"other\": {\n            \"SDR\": 3.3108,\n            \"SIR\": 4.47529,\n            \"SAR\": 2.60297,\n            \"ISR\": 19.5124\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.13083,\n            \"SIR\": 31.1122,\n            \"SAR\": -8.92637,\n            \"ISR\": 2.11411\n          },\n          \"other\": {\n            \"SDR\": 0.05511,\n            \"SIR\": 2.77365,\n            \"SAR\": 0.377655,\n            \"ISR\": 19.3304\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
12.8844,\n            \"SIR\": 34.3363,\n            \"SAR\": -5.8707,\n            \"ISR\": 2.76611\n          },\n          \"other\": {\n            \"SDR\": -0.97863,\n            \"SIR\": 2.42401,\n            \"SAR\": 0.36995,\n            \"ISR\": 19.4227\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.459,\n            \"SIR\": 26.9441,\n            \"SAR\": -4.98987,\n            \"ISR\": 2.75873\n          },\n          \"other\": {\n            \"SDR\": -4.71604,\n            \"SIR\": 1.35969,\n            \"SAR\": -3.24537,\n            \"ISR\": 18.4479\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.54936,\n        \"SIR\": 26.6658,\n        \"SAR\": -6.59699,\n        \"ISR\": 2.43645\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MelBandRoformerSYHFTV3Epsilon.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | SYHFT V3 by SYH99999\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.69476,\n            \"SIR\": 23.11,\n            \"SAR\": -4.41369,\n            \"ISR\": 2.2685\n          },\n          \"other\": {\n            \"SDR\": -4.2509,\n            \"SIR\": 6.84262,\n            \"SAR\": -4.03572,\n            \"ISR\": 18.5098\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.42636,\n            \"SIR\": 21.8764,\n            \"SAR\": -7.34939,\n            \"ISR\": 2.49869\n          },\n          \"other\": {\n            \"SDR\": -1.90187,\n            \"SIR\": 4.31387,\n            \"SAR\": -1.07243,\n            
\"ISR\": 17.9758\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.39977,\n            \"SIR\": 24.9348,\n            \"SAR\": -6.27115,\n            \"ISR\": 2.76801\n          },\n          \"other\": {\n            \"SDR\": -6.65212,\n            \"SIR\": 3.27308,\n            \"SAR\": -4.6549,\n            \"ISR\": 18.6046\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.57292,\n            \"SIR\": 4.23584,\n            \"SAR\": -10.6302,\n            \"ISR\": 1.71243\n          },\n          \"other\": {\n            \"SDR\": -7.14876,\n            \"SIR\": 2.29368,\n            \"SAR\": -6.7189,\n            \"ISR\": 13.7876\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0749,\n            \"SIR\": 25.7244,\n            \"SAR\": -5.99084,\n            \"ISR\": 2.85966\n          },\n          \"other\": {\n            \"SDR\": -1.91016,\n            \"SIR\": 0.53837,\n            \"SAR\": -0.809595,\n            \"ISR\": 18.5672\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.40808,\n            \"SIR\": 19.7226,\n            \"SAR\": -7.4225,\n            \"ISR\": 2.4391\n          },\n          \"other\": {\n            \"SDR\": -0.0883,\n            \"SIR\": 1.67919,\n            \"SAR\": 0.191055,\n            \"ISR\": 17.9385\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.75,\n            \"SIR\": 23.3612,\n            \"SAR\": -5.92324,\n            \"ISR\": 
2.7848\n          },\n          \"other\": {\n            \"SDR\": -3.97229,\n            \"SIR\": 1.97194,\n            \"SAR\": -3.26933,\n            \"ISR\": 17.6803\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.6005,\n            \"SIR\": 26.4674,\n            \"SAR\": -6.70829,\n            \"ISR\": 2.37296\n          },\n          \"other\": {\n            \"SDR\": -7.30703,\n            \"SIR\": 4.46527,\n            \"SAR\": -6.91343,\n            \"ISR\": 19.1635\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5544,\n            \"SIR\": 32.2282,\n            \"SAR\": -6.68533,\n            \"ISR\": 2.78445\n          },\n          \"other\": {\n            \"SDR\": 3.24795,\n            \"SIR\": 4.54709,\n            \"SAR\": 2.49366,\n            \"ISR\": 19.4482\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.49879,\n            \"SIR\": 25.7345,\n            \"SAR\": -7.68662,\n            \"ISR\": 2.44653\n          },\n          \"other\": {\n            \"SDR\": 0.17081,\n            \"SIR\": 3.15008,\n            \"SAR\": 0.01005,\n            \"ISR\": 18.9612\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.2173,\n            \"SIR\": 37.433,\n            \"SAR\": -5.82431,\n            \"ISR\": 2.79836\n          },\n          \"other\": {\n            \"SDR\": -0.76289,\n            \"SIR\": 2.47047,\n            \"SAR\": 0.3654,\n            \"ISR\": 19.5669\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis 
Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.66433,\n            \"SIR\": 26.7163,\n            \"SAR\": -4.8794,\n            \"ISR\": 2.93274\n          },\n          \"other\": {\n            \"SDR\": -4.59503,\n            \"SIR\": 2.34488,\n            \"SAR\": -3.51889,\n            \"ISR\": 18.3138\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.45343,\n        \"SIR\": 25.3296,\n        \"SAR\": -6.47824,\n        \"ISR\": 2.63335\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"MelBandRoformerBigSYHFTV1.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Big SYHFT V1 by SYH99999\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.32802,\n            \"SIR\": 23.7231,\n            \"SAR\": 7.63523,\n            \"ISR\": 11.8524\n          },\n          \"other\": {\n            \"SDR\": -4.23591,\n            \"SIR\": 18.7075,\n            \"SAR\": -5.19124,\n            \"ISR\": 17.3589\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.26036,\n            \"SIR\": 26.3146,\n            \"SAR\": 9.25244,\n            \"ISR\": 13.3987\n          },\n          \"other\": {\n            \"SDR\": -0.47455,\n            \"SIR\": 19.1456,\n            \"SAR\": -1.42593,\n            \"ISR\": 17.0664\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2978,\n            \"SIR\": 27.361,\n            \"SAR\": 13.2374,\n            \"ISR\": 16.323\n          },\n          \"other\": {\n            \"SDR\": 
-5.78077,\n            \"SIR\": 22.0778,\n            \"SAR\": -6.88454,\n            \"ISR\": 17.5073\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.86945,\n            \"SIR\": 7.36771,\n            \"SAR\": 5.69638,\n            \"ISR\": 13.7535\n          },\n          \"other\": {\n            \"SDR\": -7.13562,\n            \"SIR\": 14.6932,\n            \"SAR\": -10.8205,\n            \"ISR\": 7.13055\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5912,\n            \"SIR\": 27.0081,\n            \"SAR\": 14.5932,\n            \"ISR\": 16.936\n          },\n          \"other\": {\n            \"SDR\": -0.9158,\n            \"SIR\": 22.4455,\n            \"SAR\": -2.07327,\n            \"ISR\": 16.7634\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9093,\n            \"SIR\": 21.1428,\n            \"SAR\": 11.6773,\n            \"ISR\": 15.275\n          },\n          \"other\": {\n            \"SDR\": 0.58153,\n            \"SIR\": 19.5986,\n            \"SAR\": -0.66488,\n            \"ISR\": 15.5436\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4658,\n            \"SIR\": 23.8785,\n            \"SAR\": 13.3179,\n            \"ISR\": 15.7116\n          },\n          \"other\": {\n            \"SDR\": -3.74416,\n            \"SIR\": 19.3231,\n            \"SAR\": -4.8877,\n            \"ISR\": 15.7517\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.21062,\n 
           \"SIR\": 25.5773,\n            \"SAR\": 9.50849,\n            \"ISR\": 14.0962\n          },\n          \"other\": {\n            \"SDR\": -7.12991,\n            \"SIR\": 19.6647,\n            \"SAR\": -8.16753,\n            \"ISR\": 18.0027\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.6785,\n            \"SIR\": 36.8897,\n            \"SAR\": 14.9808,\n            \"ISR\": 17.0937\n          },\n          \"other\": {\n            \"SDR\": 4.00524,\n            \"SIR\": 27.413,\n            \"SAR\": 3.11794,\n            \"ISR\": 19.2033\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.6794,\n            \"SIR\": 26.1761,\n            \"SAR\": 12.7406,\n            \"ISR\": 13.4589\n          },\n          \"other\": {\n            \"SDR\": 0.45182,\n            \"SIR\": 17.9291,\n            \"SAR\": -0.61197,\n            \"ISR\": 17.8266\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.3621,\n            \"SIR\": 40.9233,\n            \"SAR\": 17.8066,\n            \"ISR\": 18.1758\n          },\n          \"other\": {\n            \"SDR\": 0.586795,\n            \"SIR\": 26.7615,\n            \"SAR\": -0.27375,\n            \"ISR\": 19.3057\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2176,\n            \"SIR\": 25.9719,\n            \"SAR\": 12.925,\n            \"ISR\": 16.0315\n          },\n          \"other\": {\n            \"SDR\": -4.17694,\n            \"SIR\": 20.2713,\n            \"SAR\": -5.35717,\n        
    \"ISR\": 16.7248\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.2577,\n        \"SIR\": 26.074,\n        \"SAR\": 12.8328,\n        \"ISR\": 15.4933\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"melband_roformer_big_beta4.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Big Beta 4 FT by unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.37689,\n            \"SIR\": 22.7518,\n            \"SAR\": 7.64581,\n            \"ISR\": 12.3426\n          },\n          \"other\": {\n            \"SDR\": -4.16058,\n            \"SIR\": 19.2598,\n            \"SAR\": -5.20131,\n            \"ISR\": 17.0692\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.44244,\n            \"SIR\": 23.1959,\n            \"SAR\": 9.36184,\n            \"ISR\": 14.4851\n          },\n          \"other\": {\n            \"SDR\": -0.46607,\n            \"SIR\": 20.9222,\n            \"SAR\": -1.56443,\n            \"ISR\": 16.2223\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5156,\n            \"SIR\": 26.3266,\n            \"SAR\": 13.2393,\n            \"ISR\": 17.0027\n          },\n          \"other\": {\n            \"SDR\": -5.73828,\n            \"SIR\": 23.7935,\n            \"SAR\": -6.92479,\n            \"ISR\": 16.9443\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.86382,\n            \"SIR\": 6.62091,\n            \"SAR\": 6.36907,\n            \"ISR\": 
14.8088\n          },\n          \"other\": {\n            \"SDR\": -6.95283,\n            \"SIR\": 16.3892,\n            \"SAR\": -11.4173,\n            \"ISR\": 6.21786\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.6263,\n            \"SIR\": 24.8811,\n            \"SAR\": 14.4971,\n            \"ISR\": 18.0552\n          },\n          \"other\": {\n            \"SDR\": -0.89634,\n            \"SIR\": 25.5381,\n            \"SAR\": -2.16973,\n            \"ISR\": 16.0397\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.962,\n            \"SIR\": 19.4326,\n            \"SAR\": 11.629,\n            \"ISR\": 16.3227\n          },\n          \"other\": {\n            \"SDR\": 0.58844,\n            \"SIR\": 21.8394,\n            \"SAR\": -0.77813,\n            \"ISR\": 14.62\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6095,\n            \"SIR\": 22.9288,\n            \"SAR\": 13.2689,\n            \"ISR\": 16.4467\n          },\n          \"other\": {\n            \"SDR\": -3.72353,\n            \"SIR\": 20.4047,\n            \"SAR\": -5.00858,\n            \"ISR\": 15.3488\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68592,\n            \"SIR\": 23.8458,\n            \"SAR\": 9.22676,\n            \"ISR\": 13.3661\n          },\n          \"other\": {\n            \"SDR\": -7.22498,\n            \"SIR\": 20.9114,\n            \"SAR\": -8.28144,\n            \"ISR\": 17.7217\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye 
Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.0121,\n            \"SIR\": 31.7821,\n            \"SAR\": 14.7973,\n            \"ISR\": 18.4656\n          },\n          \"other\": {\n            \"SDR\": 4.02268,\n            \"SIR\": 30.3784,\n            \"SAR\": 3.12026,\n            \"ISR\": 18.9133\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.2051,\n            \"SIR\": 24.8128,\n            \"SAR\": 12.7727,\n            \"ISR\": 14.4255\n          },\n          \"other\": {\n            \"SDR\": 0.4686,\n            \"SIR\": 19.5755,\n            \"SAR\": -0.67883,\n            \"ISR\": 17.4066\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.5632,\n            \"SIR\": 38.4082,\n            \"SAR\": 17.4184,\n            \"ISR\": 18.9371\n          },\n          \"other\": {\n            \"SDR\": 0.55558,\n            \"SIR\": 27.9603,\n            \"SAR\": -0.32981,\n            \"ISR\": 19.1225\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.5156,\n        \"SIR\": 23.8458,\n        \"SAR\": 12.7727,\n        \"ISR\": 16.3227\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"melband_roformer_big_beta5e.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Big Beta 5e FT by unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.49169,\n            \"SIR\": 22.6356,\n            \"SAR\": 7.5593,\n            \"ISR\": 12.2271\n          },\n          \"other\": {\n            \"SDR\": 
-4.19563,\n            \"SIR\": 19.085,\n            \"SAR\": -5.23404,\n            \"ISR\": 16.9934\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.23716,\n            \"SIR\": 25.0451,\n            \"SAR\": 9.27043,\n            \"ISR\": 13.7221\n          },\n          \"other\": {\n            \"SDR\": -0.44997,\n            \"SIR\": 19.5867,\n            \"SAR\": -1.46239,\n            \"ISR\": 16.7816\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3365,\n            \"SIR\": 25.8589,\n            \"SAR\": 13.1486,\n            \"ISR\": 16.8873\n          },\n          \"other\": {\n            \"SDR\": -5.7321,\n            \"SIR\": 23.2169,\n            \"SAR\": -6.91466,\n            \"ISR\": 16.8053\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 2.90358,\n            \"SIR\": 6.27191,\n            \"SAR\": 5.74066,\n            \"ISR\": 13.3394\n          },\n          \"other\": {\n            \"SDR\": -7.09233,\n            \"SIR\": 12.8478,\n            \"SAR\": -11.4024,\n            \"ISR\": 6.09584\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7324,\n            \"SIR\": 24.5787,\n            \"SAR\": 14.505,\n            \"ISR\": 17.2205\n          },\n          \"other\": {\n            \"SDR\": -0.92488,\n            \"SIR\": 22.9787,\n            \"SAR\": -2.17855,\n            \"ISR\": 15.9968\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9304,\n            
\"SIR\": 18.8269,\n            \"SAR\": 11.777,\n            \"ISR\": 16.2412\n          },\n          \"other\": {\n            \"SDR\": 0.58929,\n            \"SIR\": 21.4928,\n            \"SAR\": -0.80436,\n            \"ISR\": 14.3095\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4935,\n            \"SIR\": 22.062,\n            \"SAR\": 13.3449,\n            \"ISR\": 16.289\n          },\n          \"other\": {\n            \"SDR\": -3.72188,\n            \"SIR\": 20.0211,\n            \"SAR\": -5.03224,\n            \"ISR\": 15.0456\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.64817,\n            \"SIR\": 22.9176,\n            \"SAR\": 8.86618,\n            \"ISR\": 13.8414\n          },\n          \"other\": {\n            \"SDR\": -7.19083,\n            \"SIR\": 20.325,\n            \"SAR\": -8.31767,\n            \"ISR\": 17.4081\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7216,\n            \"SIR\": 34.0572,\n            \"SAR\": 14.883,\n            \"ISR\": 17.6855\n          },\n          \"other\": {\n            \"SDR\": 4.03792,\n            \"SIR\": 28.5062,\n            \"SAR\": 3.14876,\n            \"ISR\": 18.932\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9148,\n            \"SIR\": 23.3127,\n            \"SAR\": 12.734,\n            \"ISR\": 14.1068\n          },\n          \"other\": {\n            \"SDR\": 0.49315,\n            \"SIR\": 18.4418,\n            \"SAR\": -0.69737,\n            \"ISR\": 16.8923\n          }\n        }\n 
     },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.5306,\n            \"SIR\": 38.2973,\n            \"SAR\": 17.6089,\n            \"ISR\": 18.721\n          },\n          \"other\": {\n            \"SDR\": 0.55652,\n            \"SIR\": 27.5824,\n            \"SAR\": -0.31755,\n            \"ISR\": 19.0682\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3716,\n            \"SIR\": 24.2785,\n            \"SAR\": 12.8717,\n            \"ISR\": 16.433\n          },\n          \"other\": {\n            \"SDR\": -4.17094,\n            \"SIR\": 20.6596,\n            \"SAR\": -5.43515,\n            \"ISR\": 16.0996\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 12.3541,\n        \"SIR\": 23.7956,\n        \"SAR\": 12.8028,\n        \"ISR\": 16.2651\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"other\"\n    ],\n    \"target_stem\": \"vocals\"\n  },\n  \"model_chorus_bs_roformer_ep_267_sdr_24.1275.ckpt\": {\n    \"model_name\": \"Roformer Model: BS Roformer | Chorus Male-Female by Sucial\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"male\",\n      \"female\"\n    ],\n    \"target_stem\": null\n  },\n  \"aspiration_mel_band_roformer_sdr_18.9845.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Aspiration by Sucial\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"aspiration\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Aspiration Less Aggressive by Sucial\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": 
[\n      \"aspiration\",\n      \"other\"\n    ],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_bleed_suppressor_v1.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Bleed Suppressor V1 by unwa-97chris\",\n    \"track_scores\": [],\n    \"median_scores\": {},\n    \"stems\": [\n      \"instrumental\",\n      \"bleed\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"UVR-MDX-NET-Inst_HQ_5.onnx\": {\n    \"model_name\": \"MDX-Net Model: UVR-MDX-NET Inst HQ 5\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.06866,\n            \"SIR\": 17.1293,\n            \"SAR\": 6.12897,\n            \"ISR\": 10.8085\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5979,\n            \"SIR\": 25.449,\n            \"SAR\": 20.1238,\n            \"ISR\": 18.5852\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.86413,\n            \"SIR\": 7.14475,\n            \"SAR\": 6.71163,\n            \"ISR\": 13.7815\n          },\n          \"instrumental\": {\n            \"SDR\": 10.3954,\n            \"SIR\": 23.1102,\n            \"SAR\": 11.5312,\n            \"ISR\": 11.3294\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1678,\n            \"SIR\": 21.4902,\n            \"SAR\": 10.9138,\n            \"ISR\": 15.2388\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5844,\n            \"SIR\": 26.8181,\n            \"SAR\": 18.1816,\n            \"ISR\": 18.0163\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 1.49665,\n            \"SIR\": 5.39072,\n  
          \"SAR\": 3.99017,\n            \"ISR\": 13.6714\n          },\n          \"instrumental\": {\n            \"SDR\": 10.8126,\n            \"SIR\": 26.7615,\n            \"SAR\": 12.889,\n            \"ISR\": 13.0727\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0976,\n            \"SIR\": 22.6074,\n            \"SAR\": 13.1823,\n            \"ISR\": 15.9144\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7452,\n            \"SIR\": 24.1706,\n            \"SAR\": 16.0199,\n            \"ISR\": 17.1616\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1052,\n            \"SIR\": 17.9538,\n            \"SAR\": 11.0594,\n            \"ISR\": 15.0005\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9227,\n            \"SIR\": 22.4091,\n            \"SAR\": 13.443,\n            \"ISR\": 15.6633\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3709,\n            \"SIR\": 20.1869,\n            \"SAR\": 12.5366,\n            \"ISR\": 14.5818\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8094,\n            \"SIR\": 23.0616,\n            \"SAR\": 17.0469,\n            \"ISR\": 17.6975\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.44196,\n            \"SIR\": 18.6383,\n            \"SAR\": 5.19323,\n            \"ISR\": 8.45813\n          },\n          \"instrumental\": {\n            \"SDR\": 17.944,\n            \"SIR\": 28.1254,\n            \"SAR\": 23.5433,\n            \"ISR\": 19.3422\n          }\n        }\n  
    },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2688,\n            \"SIR\": 29.0017,\n            \"SAR\": 13.2409,\n            \"ISR\": 16.4406\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4003,\n            \"SIR\": 27.2896,\n            \"SAR\": 17.6964,\n            \"ISR\": 18.7218\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9371,\n            \"SIR\": 21.7519,\n            \"SAR\": 10.3204,\n            \"ISR\": 11.9767\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7308,\n            \"SIR\": 18.5303,\n            \"SAR\": 15.5904,\n            \"ISR\": 17.6628\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9209,\n            \"SIR\": 32.4354,\n            \"SAR\": 16.8237,\n            \"ISR\": 18.0697\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7015,\n            \"SIR\": 31.9637,\n            \"SAR\": 19.9936,\n            \"ISR\": 19.0201\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2377,\n            \"SIR\": 21.7182,\n            \"SAR\": 11.8136,\n            \"ISR\": 15.4074\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1684,\n            \"SIR\": 26.5507,\n            \"SAR\": 17.5078,\n            \"ISR\": 18.0099\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.66587,\n            \"SIR\": 18.3055,\n            
\"SAR\": 8.12039,\n            \"ISR\": 13.2057\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2961,\n            \"SIR\": 25.8978,\n            \"SAR\": 17.475,\n            \"ISR\": 17.409\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.43558,\n            \"SIR\": 21.3192,\n            \"SAR\": 11.4827,\n            \"ISR\": 11.7681\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9103,\n            \"SIR\": 23.4344,\n            \"SAR\": 19.7739,\n            \"ISR\": 17.9954\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.60842,\n            \"SIR\": 17.7695,\n            \"SAR\": 3.42424,\n            \"ISR\": 8.95832\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3167,\n            \"SIR\": 21.882,\n            \"SAR\": 18.8421,\n            \"ISR\": 18.7015\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.47507,\n            \"SIR\": 13.4293,\n            \"SAR\": 8.31376,\n            \"ISR\": 14.8655\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3564,\n            \"SIR\": 24.4421,\n            \"SAR\": 13.1426,\n            \"ISR\": 14.6737\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.69496,\n            \"SIR\": 20.5642,\n            \"SAR\": 9.03007,\n            \"ISR\": 13.8318\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7452,\n            \"SIR\": 25.1364,\n            \"SAR\": 17.0349,\n            \"ISR\": 18.1356\n          }\n        }\n    
  },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -17.5718,\n            \"SIR\": -38.237,\n            \"SAR\": 0.57891,\n            \"ISR\": 12.2376\n          },\n          \"instrumental\": {\n            \"SDR\": 13.0372,\n            \"SIR\": 57.6564,\n            \"SAR\": 12.35,\n            \"ISR\": 12.0916\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.88442,\n            \"SIR\": 14.8298,\n            \"SAR\": 6.89469,\n            \"ISR\": 11.1319\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7369,\n            \"SIR\": 22.1866,\n            \"SAR\": 18.0478,\n            \"ISR\": 17.4268\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7124,\n            \"SIR\": 23.8202,\n            \"SAR\": 10.8061,\n            \"ISR\": 14.0169\n          },\n          \"instrumental\": {\n            \"SDR\": 15.6396,\n            \"SIR\": 24.2926,\n            \"SAR\": 17.6245,\n            \"ISR\": 18.6813\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3153,\n            \"SIR\": 21.3488,\n            \"SAR\": 10.8498,\n            \"ISR\": 15.769\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9638,\n            \"SIR\": 31.8237,\n            \"SAR\": 20.6841,\n            \"ISR\": 18.6502\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -10.6981,\n            \"SIR\": -7.97205,\n            \"SAR\": 
0.03792,\n            \"ISR\": 8.22472\n          },\n          \"instrumental\": {\n            \"SDR\": 18.9689,\n            \"SIR\": 41.8995,\n            \"SAR\": 29.3966,\n            \"ISR\": 18.9556\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.12687,\n            \"SIR\": 13.7553,\n            \"SAR\": 3.92363,\n            \"ISR\": 6.64446\n          },\n          \"instrumental\": {\n            \"SDR\": 10.697,\n            \"SIR\": 14.3269,\n            \"SAR\": 14.3933,\n            \"ISR\": 16.762\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.20284,\n            \"SIR\": 13.8034,\n            \"SAR\": 4.5976,\n            \"ISR\": 11.1915\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0168,\n            \"SIR\": 25.8867,\n            \"SAR\": 19.3072,\n            \"ISR\": 18.2612\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.75697,\n            \"SIR\": 18.2142,\n            \"SAR\": 8.13771,\n            \"ISR\": 11.8747\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5873,\n            \"SIR\": 22.8122,\n            \"SAR\": 17.38,\n            \"ISR\": 18.1119\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.5251,\n            \"SIR\": 16.0533,\n            \"SAR\": 8.99974,\n            \"ISR\": 14.4896\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0573,\n            \"SIR\": 25.9107,\n            \"SAR\": 16.9117,\n            \"ISR\": 17.1145\n          }\n        }\n      },\n     
 {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.077,\n            \"SIR\": 25.0475,\n            \"SAR\": 12.2441,\n            \"ISR\": 14.0933\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4303,\n            \"SIR\": 20.3962,\n            \"SAR\": 14.8715,\n            \"ISR\": 17.6408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.28971,\n            \"SIR\": 11.4661,\n            \"SAR\": 6.1435,\n            \"ISR\": 12.0171\n          },\n          \"instrumental\": {\n            \"SDR\": 11.5544,\n            \"SIR\": 19.8293,\n            \"SAR\": 13.1759,\n            \"ISR\": 14.0247\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.975,\n            \"SIR\": 5.73503,\n            \"SAR\": 7.465,\n            \"ISR\": 15.1574\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7186,\n            \"SIR\": 23.57,\n            \"SAR\": 12.8843,\n            \"ISR\": 13.0063\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3434,\n            \"SIR\": 28.5238,\n            \"SAR\": 14.5066,\n            \"ISR\": 15.8186\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7338,\n            \"SIR\": 20.7777,\n            \"SAR\": 14.7528,\n            \"ISR\": 18.1301\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.2444,\n            \"SIR\": 24.1163,\n            \"SAR\": 12.0753,\n            \"ISR\": 15.2515\n          },\n     
     \"instrumental\": {\n            \"SDR\": 14.2308,\n            \"SIR\": 23.7101,\n            \"SAR\": 15.8116,\n            \"ISR\": 17.6675\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.868,\n            \"SIR\": 28.1803,\n            \"SAR\": 12.5461,\n            \"ISR\": 15.479\n          },\n          \"instrumental\": {\n            \"SDR\": 17.546,\n            \"SIR\": 28.2661,\n            \"SAR\": 21.7024,\n            \"ISR\": 19.408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.2702,\n            \"SIR\": 9.19773,\n            \"SAR\": 4.75764,\n            \"ISR\": 10.4024\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3742,\n            \"SIR\": 18.0427,\n            \"SAR\": 11.9068,\n            \"ISR\": 15.2756\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.949,\n            \"SIR\": 21.946,\n            \"SAR\": 13.6592,\n            \"ISR\": 18.0142\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6185,\n            \"SIR\": 36.8288,\n            \"SAR\": 23.8534,\n            \"ISR\": 18.9695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9692,\n            \"SIR\": 20.2516,\n            \"SAR\": 12.3834,\n            \"ISR\": 16.6795\n          },\n          \"instrumental\": {\n            \"SDR\": 11.8514,\n            \"SIR\": 23.0838,\n            \"SAR\": 12.4722,\n            \"ISR\": 15.1694\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n     
     \"vocals\": {\n            \"SDR\": 8.59135,\n            \"SIR\": 21.6023,\n            \"SAR\": 8.9602,\n            \"ISR\": 13.8188\n          },\n          \"instrumental\": {\n            \"SDR\": 14.058,\n            \"SIR\": 22.8337,\n            \"SAR\": 15.5571,\n            \"ISR\": 17.6894\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.79925,\n            \"SIR\": 5.80021,\n            \"SAR\": 2.36476,\n            \"ISR\": 10.5422\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5183,\n            \"SIR\": 30.6284,\n            \"SAR\": 20.9877,\n            \"ISR\": 17.7329\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.92083,\n            \"SIR\": 22.3311,\n            \"SAR\": 10.7783,\n            \"ISR\": 15.5166\n          },\n          \"instrumental\": {\n            \"SDR\": 16.681,\n            \"SIR\": 30.32,\n            \"SAR\": 20.3632,\n            \"ISR\": 18.3138\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.9632,\n            \"SIR\": 29.3989,\n            \"SAR\": 12.5099,\n            \"ISR\": 15.8255\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0429,\n            \"SIR\": 31.8279,\n            \"SAR\": 22.4314,\n            \"ISR\": 19.3364\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.8147,\n            \"SIR\": 27.7858,\n            \"SAR\": 14.5578,\n            \"ISR\": 17.1887\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0434,\n            
\"SIR\": 28.4183,\n            \"SAR\": 18.3994,\n            \"ISR\": 18.4986\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - Dont Let Go\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.513,\n            \"SIR\": 22.9547,\n            \"SAR\": 12.2014,\n            \"ISR\": 16.5532\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3309,\n            \"SIR\": 29.577,\n            \"SAR\": 19.5424,\n            \"ISR\": 18.6134\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 8.69496,\n        \"SIR\": 20.2516,\n        \"SAR\": 10.3204,\n        \"ISR\": 14.0933\n      },\n      \"instrumental\": {\n        \"SDR\": 15.2961,\n        \"SIR\": 25.1364,\n        \"SAR\": 17.475,\n        \"ISR\": 17.9954\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"melband_roformer_inst_v1.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Inst V1 by Unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.59313,\n            \"SIR\": 22.7019,\n            \"SAR\": 7.12521,\n            \"ISR\": 10.8986\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7678,\n            \"SIR\": 25.4369,\n            \"SAR\": 20.2166,\n            \"ISR\": 19.245\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.80987,\n            \"SIR\": 23.0893,\n            \"SAR\": 8.89572,\n            \"ISR\": 12.5469\n          },\n          \"instrumental\": {\n            \"SDR\": 14.462,\n            \"SIR\": 22.0064,\n            \"SAR\": 16.1973,\n            \"ISR\": 18.3806\n          }\n        }\n      
},\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5977,\n            \"SIR\": 27.5923,\n            \"SAR\": 12.5824,\n            \"ISR\": 15.5384\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7028,\n            \"SIR\": 27.6585,\n            \"SAR\": 19.6419,\n            \"ISR\": 19.0694\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.48132,\n            \"SIR\": 7.6045,\n            \"SAR\": 4.49987,\n            \"ISR\": 12.7623\n          },\n          \"instrumental\": {\n            \"SDR\": 11.9082,\n            \"SIR\": 25.4403,\n            \"SAR\": 13.7811,\n            \"ISR\": 14.6838\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9825,\n            \"SIR\": 26.4734,\n            \"SAR\": 13.936,\n            \"ISR\": 16.2151\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0824,\n            \"SIR\": 25.0051,\n            \"SAR\": 16.4753,\n            \"ISR\": 18.1312\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3836,\n            \"SIR\": 21.0035,\n            \"SAR\": 11.2427,\n            \"ISR\": 14.4589\n          },\n          \"instrumental\": {\n            \"SDR\": 12.5237,\n            \"SIR\": 21.3739,\n            \"SAR\": 13.9282,\n            \"ISR\": 16.8434\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.5278,\n            \"SIR\": 23.2206,\n            \"SAR\": 12.6217,\n            \"ISR\": 14.436\n          },\n         
 \"instrumental\": {\n            \"SDR\": 15.2894,\n            \"SIR\": 23.2075,\n            \"SAR\": 17.4857,\n            \"ISR\": 18.3853\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.3173,\n            \"SIR\": 24.1923,\n            \"SAR\": 7.67168,\n            \"ISR\": 11.3225\n          },\n          \"instrumental\": {\n            \"SDR\": 18.7461,\n            \"SIR\": 30.9583,\n            \"SAR\": 25.1604,\n            \"ISR\": 19.6076\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1105,\n            \"SIR\": 31.1097,\n            \"SAR\": 13.3075,\n            \"ISR\": 15.6329\n          },\n          \"instrumental\": {\n            \"SDR\": 15.1554,\n            \"SIR\": 25.2653,\n            \"SAR\": 17.5404,\n            \"ISR\": 19.1185\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0197,\n            \"SIR\": 24.2418,\n            \"SAR\": 11.1345,\n            \"ISR\": 12.628\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0726,\n            \"SIR\": 20.0758,\n            \"SAR\": 16.585,\n            \"ISR\": 18.3193\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.7085,\n            \"SIR\": 37.4375,\n            \"SAR\": 16.5238,\n            \"ISR\": 17.8271\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5209,\n            \"SIR\": 30.4956,\n            \"SAR\": 19.2649,\n            \"ISR\": 19.528\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It 
Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.656,\n            \"SIR\": 26.0406,\n            \"SAR\": 12.589,\n            \"ISR\": 15.1021\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8132,\n            \"SIR\": 25.8702,\n            \"SAR\": 18.3155,\n            \"ISR\": 18.9679\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.63658,\n            \"SIR\": 21.6235,\n            \"SAR\": 8.34646,\n            \"ISR\": 12.4894\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7658,\n            \"SIR\": 24.6296,\n            \"SAR\": 17.9832,\n            \"ISR\": 18.3308\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78726,\n            \"SIR\": 22.7946,\n            \"SAR\": 10.8375,\n            \"ISR\": 13.0027\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9873,\n            \"SIR\": 25.2386,\n            \"SAR\": 23.195,\n            \"ISR\": 19.1357\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.96104,\n            \"SIR\": 20.0224,\n            \"SAR\": 4.21664,\n            \"ISR\": 9.21622\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9132,\n            \"SIR\": 22.552,\n            \"SAR\": 19.2311,\n            \"ISR\": 19.0224\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.78846,\n            \"SIR\": 23.0023,\n            \"SAR\": 10.5429,\n            \"ISR\": 14.048\n          },\n          \"instrumental\": {\n     
       \"SDR\": 14.6786,\n            \"SIR\": 23.3239,\n            \"SAR\": 16.7645,\n            \"ISR\": 18.2915\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.51279,\n            \"SIR\": 29.2246,\n            \"SAR\": 10.126,\n            \"ISR\": 13.6077\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0283,\n            \"SIR\": 24.8546,\n            \"SAR\": 18.5468,\n            \"ISR\": 19.2948\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.37628,\n            \"SIR\": -28.6276,\n            \"SAR\": 0.013345,\n            \"ISR\": 11.123\n          },\n          \"instrumental\": {\n            \"SDR\": 18.5125,\n            \"SIR\": 57.5659,\n            \"SAR\": 19.6185,\n            \"ISR\": 17.1315\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.89028,\n            \"SIR\": 19.5303,\n            \"SAR\": 8.20676,\n            \"ISR\": 10.9664\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4796,\n            \"SIR\": 22.1414,\n            \"SAR\": 19.7809,\n            \"ISR\": 18.7023\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.1078,\n            \"SIR\": 25.8818,\n            \"SAR\": 11.7517,\n            \"ISR\": 15.2895\n          },\n          \"instrumental\": {\n            \"SDR\": 15.934,\n            \"SIR\": 26.7618,\n            \"SAR\": 18.4859,\n            \"ISR\": 18.9409\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n   
     \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8934,\n            \"SIR\": 25.3438,\n            \"SAR\": 13.0781,\n            \"ISR\": 16.1428\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8815,\n            \"SIR\": 31.2395,\n            \"SAR\": 22.5189,\n            \"ISR\": 19.2768\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.03079,\n            \"SIR\": -3.69748,\n            \"SAR\": 0.06847,\n            \"ISR\": 4.19538\n          },\n          \"instrumental\": {\n            \"SDR\": 19.8932,\n            \"SIR\": 46.6319,\n            \"SAR\": 36.9793,\n            \"ISR\": 19.446\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.34614,\n            \"SIR\": 17.3005,\n            \"SAR\": 3.96471,\n            \"ISR\": 6.51947\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1493,\n            \"SIR\": 14.3171,\n            \"SAR\": 14.9482,\n            \"ISR\": 17.9872\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.9198,\n            \"SIR\": 20.3273,\n            \"SAR\": 7.99178,\n            \"ISR\": 13.092\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6608,\n            \"SIR\": 29.5082,\n            \"SAR\": 21.4378,\n            \"ISR\": 19.2191\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.78141,\n            \"SIR\": 20.5788,\n            \"SAR\": 8.06921,\n            \"ISR\": 11.4673\n          },\n          \"instrumental\": {\n            
\"SDR\": 14.7541,\n            \"SIR\": 22.2265,\n            \"SAR\": 17.5065,\n            \"ISR\": 18.6101\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.09234,\n            \"SIR\": 20.807,\n            \"SAR\": 9.77877,\n            \"ISR\": 13.9277\n          },\n          \"instrumental\": {\n            \"SDR\": 15.9013,\n            \"SIR\": 25.1967,\n            \"SAR\": 17.9272,\n            \"ISR\": 18.1495\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.4326,\n            \"SIR\": 27.328,\n            \"SAR\": 12.3845,\n            \"ISR\": 14.0647\n          },\n          \"instrumental\": {\n            \"SDR\": 13.4148,\n            \"SIR\": 20.6769,\n            \"SAR\": 14.7155,\n            \"ISR\": 18.2046\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.3978,\n            \"SIR\": 19.5302,\n            \"SAR\": 6.04242,\n            \"ISR\": 9.90361\n          },\n          \"instrumental\": {\n            \"SDR\": 13.2972,\n            \"SIR\": 16.9654,\n            \"SAR\": 13.8593,\n            \"ISR\": 17.4784\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0652,\n            \"SIR\": 25.8667,\n            \"SAR\": 11.1473,\n            \"ISR\": 14.2432\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6695,\n            \"SIR\": 23.0059,\n            \"SAR\": 17.2168,\n            \"ISR\": 18.4369\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": 
{\n          \"vocals\": {\n            \"SDR\": 14.3048,\n            \"SIR\": 31.6282,\n            \"SAR\": 14.4973,\n            \"ISR\": 15.8107\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6387,\n            \"SIR\": 21.0255,\n            \"SAR\": 14.8086,\n            \"ISR\": 18.6725\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0456,\n            \"SIR\": 25.607,\n            \"SAR\": 11.5202,\n            \"ISR\": 14.2325\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0077,\n            \"SIR\": 22.0011,\n            \"SAR\": 15.6984,\n            \"ISR\": 18.1016\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8392,\n            \"SIR\": 35.0962,\n            \"SAR\": 13.0003,\n            \"ISR\": 15.6768\n          },\n          \"instrumental\": {\n            \"SDR\": 17.737,\n            \"SIR\": 28.5758,\n            \"SAR\": 22.0999,\n            \"ISR\": 19.7177\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.45893,\n            \"SIR\": 15.8722,\n            \"SAR\": 9.81507,\n            \"ISR\": 10.2869\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5702,\n            \"SIR\": 17.1708,\n            \"SAR\": 18.1745,\n            \"ISR\": 17.9683\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.151,\n            \"SIR\": 31.9359,\n            \"SAR\": 15.6072,\n            \"ISR\": 17.7287\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0501,\n            \"SIR\": 35.9791,\n            \"SAR\": 
26.2297,\n            \"ISR\": 19.6703\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9019,\n            \"SIR\": 21.4803,\n            \"SAR\": 12.2739,\n            \"ISR\": 16.27\n          },\n          \"instrumental\": {\n            \"SDR\": 12.2484,\n            \"SIR\": 22.3732,\n            \"SAR\": 12.603,\n            \"ISR\": 15.8561\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.37636,\n            \"SIR\": 25.5439,\n            \"SAR\": 8.8963,\n            \"ISR\": 13.0278\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8614,\n            \"SIR\": 21.5895,\n            \"SAR\": 15.507,\n            \"ISR\": 18.3778\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.73788,\n            \"SIR\": 7.26737,\n            \"SAR\": 4.12875,\n            \"ISR\": 10.7461\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0454,\n            \"SIR\": 31.8882,\n            \"SAR\": 24.2473,\n            \"ISR\": 18.6447\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.82229,\n            \"SIR\": 27.5433,\n            \"SAR\": 10.5432,\n            \"ISR\": 14.6576\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9939,\n            \"SIR\": 28.3647,\n            \"SAR\": 20.3799,\n            \"ISR\": 19.1742\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.496,\n        
    \"SIR\": 29.6762,\n            \"SAR\": 12.2974,\n            \"ISR\": 14.8449\n          },\n          \"instrumental\": {\n            \"SDR\": 17.7971,\n            \"SIR\": 29.862,\n            \"SAR\": 22.1579,\n            \"ISR\": 19.4217\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0959,\n            \"SIR\": 25.1333,\n            \"SAR\": 14.1252,\n            \"ISR\": 16.8017\n          },\n          \"instrumental\": {\n            \"SDR\": 15.5816,\n            \"SIR\": 28.5947,\n            \"SAR\": 17.2643,\n            \"ISR\": 17.6931\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.80538,\n        \"SIR\": 23.7065,\n        \"SAR\": 10.6904,\n        \"ISR\": 13.9878\n      },\n      \"instrumental\": {\n        \"SDR\": 15.8573,\n        \"SIR\": 25.1009,\n        \"SAR\": 18.0788,\n        \"ISR\": 18.6274\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"melband_roformer_inst_v2.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | Inst V2 by Unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.02113,\n            \"SIR\": 22.8732,\n            \"SAR\": 7.34674,\n            \"ISR\": 11.4878\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8635,\n            \"SIR\": 26.2265,\n            \"SAR\": 20.4975,\n            \"ISR\": 19.2417\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.12526,\n            \"SIR\": 22.6007,\n            \"SAR\": 9.21153,\n            \"ISR\": 13.2992\n          },\n         
 \"instrumental\": {\n            \"SDR\": 14.4697,\n            \"SIR\": 23.1296,\n            \"SAR\": 16.3163,\n            \"ISR\": 18.279\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.0387,\n            \"SIR\": 27.3062,\n            \"SAR\": 12.8191,\n            \"ISR\": 16.1088\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8135,\n            \"SIR\": 28.8152,\n            \"SAR\": 19.9229,\n            \"ISR\": 19.0194\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.80585,\n            \"SIR\": 7.58774,\n            \"SAR\": 4.64893,\n            \"ISR\": 13.3517\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7368,\n            \"SIR\": 26.3669,\n            \"SAR\": 13.6008,\n            \"ISR\": 14.4638\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.1766,\n            \"SIR\": 26.0215,\n            \"SAR\": 14.2426,\n            \"ISR\": 16.6319\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2261,\n            \"SIR\": 25.7423,\n            \"SAR\": 16.6955,\n            \"ISR\": 17.9991\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.5388,\n            \"SIR\": 20.7882,\n            \"SAR\": 11.448,\n            \"ISR\": 14.8628\n          },\n          \"instrumental\": {\n            \"SDR\": 12.6654,\n            \"SIR\": 22.1246,\n            \"SAR\": 14.0122,\n            \"ISR\": 16.7403\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        
\"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.7922,\n            \"SIR\": 22.968,\n            \"SAR\": 12.8275,\n            \"ISR\": 14.9688\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2577,\n            \"SIR\": 24.0257,\n            \"SAR\": 17.5925,\n            \"ISR\": 18.2725\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.90708,\n            \"SIR\": 24.2879,\n            \"SAR\": 8.06339,\n            \"ISR\": 12.0824\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6465,\n            \"SIR\": 32.0538,\n            \"SAR\": 24.935,\n            \"ISR\": 19.5385\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0676,\n            \"SIR\": 30.9494,\n            \"SAR\": 14.1914,\n            \"ISR\": 16.8357\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8017,\n            \"SIR\": 28.0276,\n            \"SAR\": 18.3979,\n            \"ISR\": 19.0722\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.2034,\n            \"SIR\": 23.7646,\n            \"SAR\": 12.2129,\n            \"ISR\": 13.3\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8116,\n            \"SIR\": 21.1692,\n            \"SAR\": 17.4623,\n            \"ISR\": 18.2198\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.1494,\n            \"SIR\": 37.3432,\n            \"SAR\": 17.0733,\n            \"ISR\": 18.3459\n          },\n          \"instrumental\": {\n            \"SDR\": 
16.8335,\n            \"SIR\": 31.958,\n            \"SAR\": 19.9383,\n            \"ISR\": 19.5205\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8861,\n            \"SIR\": 25.6621,\n            \"SAR\": 12.7967,\n            \"ISR\": 15.5019\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8931,\n            \"SIR\": 26.8758,\n            \"SAR\": 18.3758,\n            \"ISR\": 18.9207\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.87606,\n            \"SIR\": 21.4379,\n            \"SAR\": 8.40433,\n            \"ISR\": 13.343\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0064,\n            \"SIR\": 26.0934,\n            \"SAR\": 18.2581,\n            \"ISR\": 18.2002\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1088,\n            \"SIR\": 22.4545,\n            \"SAR\": 11.0841,\n            \"ISR\": 13.3075\n          },\n          \"instrumental\": {\n            \"SDR\": 18.0389,\n            \"SIR\": 25.4843,\n            \"SAR\": 23.3216,\n            \"ISR\": 19.0655\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.10086,\n            \"SIR\": 19.8791,\n            \"SAR\": 4.61918,\n            \"ISR\": 9.77326\n          },\n          \"instrumental\": {\n            \"SDR\": 15.8275,\n            \"SIR\": 23.2805,\n            \"SAR\": 18.9869,\n            \"ISR\": 18.9551\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n    
      \"vocals\": {\n            \"SDR\": 10.2267,\n            \"SIR\": 22.3629,\n            \"SAR\": 10.926,\n            \"ISR\": 14.7369\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8179,\n            \"SIR\": 24.6144,\n            \"SAR\": 16.9599,\n            \"ISR\": 18.1218\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.80505,\n            \"SIR\": 29.0038,\n            \"SAR\": 10.395,\n            \"ISR\": 14.0552\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1516,\n            \"SIR\": 25.692,\n            \"SAR\": 18.8756,\n            \"ISR\": 19.2613\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.07154,\n            \"SIR\": -28.7508,\n            \"SAR\": 0.033545,\n            \"ISR\": 11.5\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6129,\n            \"SIR\": 58.4135,\n            \"SAR\": 19.6397,\n            \"ISR\": 17.0796\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.0983,\n            \"SIR\": 19.0257,\n            \"SAR\": 8.43441,\n            \"ISR\": 11.2693\n          },\n          \"instrumental\": {\n            \"SDR\": 17.5558,\n            \"SIR\": 22.6064,\n            \"SAR\": 20.0213,\n            \"ISR\": 18.5654\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.829,\n            \"SIR\": 26.6002,\n            \"SAR\": 12.386,\n            \"ISR\": 16.0067\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2763,\n            
\"SIR\": 28.398,\n            \"SAR\": 19.021,\n            \"ISR\": 18.9973\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4479,\n            \"SIR\": 26.0774,\n            \"SAR\": 13.7517,\n            \"ISR\": 16.6435\n          },\n          \"instrumental\": {\n            \"SDR\": 17.8987,\n            \"SIR\": 32.1639,\n            \"SAR\": 22.4922,\n            \"ISR\": 19.2739\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00067,\n            \"SIR\": -3.67736,\n            \"SAR\": 0.19529,\n            \"ISR\": 4.24983\n          },\n          \"instrumental\": {\n            \"SDR\": 19.9326,\n            \"SIR\": 47.1931,\n            \"SAR\": 37.5267,\n            \"ISR\": 19.5308\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.44956,\n            \"SIR\": 16.8059,\n            \"SAR\": 4.11688,\n            \"ISR\": 6.90925\n          },\n          \"instrumental\": {\n            \"SDR\": 11.1846,\n            \"SIR\": 14.7724,\n            \"SAR\": 14.6363,\n            \"ISR\": 17.7607\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.47103,\n            \"SIR\": 20.1499,\n            \"SAR\": 8.15602,\n            \"ISR\": 13.9796\n          },\n          \"instrumental\": {\n            \"SDR\": 17.762,\n            \"SIR\": 30.146,\n            \"SAR\": 21.6146,\n            \"ISR\": 19.174\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": 
{\n            \"SDR\": 8.00871,\n            \"SIR\": 20.3855,\n            \"SAR\": 8.35842,\n            \"ISR\": 11.968\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8171,\n            \"SIR\": 22.9609,\n            \"SAR\": 17.5545,\n            \"ISR\": 18.5724\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.42273,\n            \"SIR\": 20.6284,\n            \"SAR\": 10.1366,\n            \"ISR\": 14.4831\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1081,\n            \"SIR\": 26.3549,\n            \"SAR\": 18.2536,\n            \"ISR\": 18.0555\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8382,\n            \"SIR\": 27.9563,\n            \"SAR\": 12.9275,\n            \"ISR\": 14.2597\n          },\n          \"instrumental\": {\n            \"SDR\": 13.8321,\n            \"SIR\": 21.0212,\n            \"SAR\": 15.3133,\n            \"ISR\": 18.3662\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.81159,\n            \"SIR\": 18.8671,\n            \"SAR\": 6.73053,\n            \"ISR\": 10.4935\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6415,\n            \"SIR\": 17.94,\n            \"SAR\": 14.2477,\n            \"ISR\": 17.1803\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.8706,\n            \"SIR\": 26.3121,\n            \"SAR\": 11.8612,\n            \"ISR\": 15.5736\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7792,\n            \"SIR\": 
25.4772,\n            \"SAR\": 17.8785,\n            \"ISR\": 17.4578\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.0225,\n            \"SIR\": 31.2628,\n            \"SAR\": 15.1781,\n            \"ISR\": 16.548\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0741,\n            \"SIR\": 21.2328,\n            \"SAR\": 15.7661,\n            \"ISR\": 18.7671\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6538,\n            \"SIR\": 24.9279,\n            \"SAR\": 12.5349,\n            \"ISR\": 15.3253\n          },\n          \"instrumental\": {\n            \"SDR\": 14.4443,\n            \"SIR\": 23.4619,\n            \"SAR\": 15.975,\n            \"ISR\": 17.903\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.7004,\n            \"SIR\": 35.4648,\n            \"SAR\": 13.8985,\n            \"ISR\": 16.33\n          },\n          \"instrumental\": {\n            \"SDR\": 18.059,\n            \"SIR\": 30.9682,\n            \"SAR\": 22.7971,\n            \"ISR\": 19.7761\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.4941,\n            \"SIR\": 16.2176,\n            \"SAR\": 10.0121,\n            \"ISR\": 10.5968\n          },\n          \"instrumental\": {\n            \"SDR\": 13.5837,\n            \"SIR\": 17.4015,\n            \"SAR\": 17.9409,\n            \"ISR\": 18.1126\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.5832,\n            \"SIR\": 31.2557,\n           
 \"SAR\": 16.3811,\n            \"ISR\": 18.1319\n          },\n          \"instrumental\": {\n            \"SDR\": 19.1675,\n            \"SIR\": 37.5946,\n            \"SAR\": 26.8855,\n            \"ISR\": 19.6217\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.349,\n            \"SIR\": 21.7457,\n            \"SAR\": 12.7289,\n            \"ISR\": 16.7757\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0854,\n            \"SIR\": 23.1845,\n            \"SAR\": 13.0484,\n            \"ISR\": 15.657\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.72349,\n            \"SIR\": 25.2805,\n            \"SAR\": 9.13203,\n            \"ISR\": 13.6972\n          },\n          \"instrumental\": {\n            \"SDR\": 14.3485,\n            \"SIR\": 22.6056,\n            \"SAR\": 16.2959,\n            \"ISR\": 18.2944\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.91693,\n            \"SIR\": 7.30367,\n            \"SAR\": 4.20726,\n            \"ISR\": 11.1963\n          },\n          \"instrumental\": {\n            \"SDR\": 19.0816,\n            \"SIR\": 32.2645,\n            \"SAR\": 24.1358,\n            \"ISR\": 18.6023\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.2827,\n            \"SIR\": 27.4632,\n            \"SAR\": 10.8963,\n            \"ISR\": 15.4544\n          },\n          \"instrumental\": {\n            \"SDR\": 17.1289,\n            \"SIR\": 29.9609,\n            \"SAR\": 20.6392,\n            \"ISR\": 19.1566\n          }\n       
 }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6367,\n            \"SIR\": 31.1013,\n            \"SAR\": 13.0895,\n            \"ISR\": 16.8252\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2822,\n            \"SIR\": 33.3222,\n            \"SAR\": 23.2674,\n            \"ISR\": 19.4432\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.9713,\n            \"SIR\": 25.7951,\n            \"SAR\": 15.2871,\n            \"ISR\": 17.5479\n          },\n          \"instrumental\": {\n            \"SDR\": 16.141,\n            \"SIR\": 30.4691,\n            \"SAR\": 18.4409,\n            \"ISR\": 17.8294\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.2547,\n        \"SIR\": 23.3663,\n        \"SAR\": 11.0051,\n        \"ISR\": 14.3714\n      },\n      \"instrumental\": {\n        \"SDR\": 16.0572,\n        \"SIR\": 25.9179,\n        \"SAR\": 18.3869,\n        \"ISR\": 18.5689\n      }\n    },\n    \"stems\": [\n      \"instrumental\",\n      \"vocals\"\n    ],\n    \"target_stem\": \"instrumental\"\n  },\n  \"melband_roformer_instvoc_duality_v1.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | InstVoc Duality V1 by Unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.53732,\n            \"SIR\": 23.5485,\n            \"SAR\": 7.88449,\n            \"ISR\": 12.236\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0929,\n            \"SIR\": 27.1492,\n            \"SAR\": 20.6733,\n            \"ISR\": 19.2657\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - 
Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.34051,\n            \"SIR\": 23.251,\n            \"SAR\": 9.499,\n            \"ISR\": 13.6445\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7629,\n            \"SIR\": 23.6761,\n            \"SAR\": 16.69,\n            \"ISR\": 18.3834\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6666,\n            \"SIR\": 28.7856,\n            \"SAR\": 13.5288,\n            \"ISR\": 16.6932\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9549,\n            \"SIR\": 29.8781,\n            \"SAR\": 20.2144,\n            \"ISR\": 19.1733\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.15616,\n            \"SIR\": 7.9543,\n            \"SAR\": 4.78232,\n            \"ISR\": 13.9195\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7598,\n            \"SIR\": 27.1678,\n            \"SAR\": 13.5728,\n            \"ISR\": 14.4721\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.6768,\n            \"SIR\": 26.9935,\n            \"SAR\": 14.7913,\n            \"ISR\": 17.023\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4851,\n            \"SIR\": 26.8696,\n            \"SAR\": 17.1382,\n            \"ISR\": 18.2263\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9739,\n            \"SIR\": 21.1101,\n            \"SAR\": 11.8031,\n            \"ISR\": 15.4413\n          },\n          \"instrumental\": {\n            \"SDR\": 12.8497,\n          
  \"SIR\": 23.3393,\n            \"SAR\": 14.3829,\n            \"ISR\": 16.849\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.5633,\n            \"SIR\": 22.9539,\n            \"SAR\": 13.6573,\n            \"ISR\": 15.7894\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7737,\n            \"SIR\": 25.5918,\n            \"SAR\": 18.0411,\n            \"ISR\": 18.3408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.24035,\n            \"SIR\": 26.3555,\n            \"SAR\": 9.73442,\n            \"ISR\": 14.4685\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2664,\n            \"SIR\": 31.4603,\n            \"SAR\": 23.8276,\n            \"ISR\": 19.5771\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.8878,\n            \"SIR\": 32.681,\n            \"SAR\": 15.2947,\n            \"ISR\": 17.3878\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4206,\n            \"SIR\": 29.7129,\n            \"SAR\": 19.39,\n            \"ISR\": 19.3272\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.3684,\n            \"SIR\": 24.2992,\n            \"SAR\": 13.2207,\n            \"ISR\": 13.9473\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4952,\n            \"SIR\": 21.9423,\n            \"SAR\": 18.1069,\n            \"ISR\": 18.3174\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n  
          \"SDR\": 15.4328,\n            \"SIR\": 39.2139,\n            \"SAR\": 17.8256,\n            \"ISR\": 18.3919\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2783,\n            \"SIR\": 32.6546,\n            \"SAR\": 20.7553,\n            \"ISR\": 19.6359\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4095,\n            \"SIR\": 26.0061,\n            \"SAR\": 13.2019,\n            \"ISR\": 16.184\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0388,\n            \"SIR\": 28.1231,\n            \"SAR\": 18.6728,\n            \"ISR\": 18.9389\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.43016,\n            \"SIR\": 22.6494,\n            \"SAR\": 9.9772,\n            \"ISR\": 14.3791\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7698,\n            \"SIR\": 26.9032,\n            \"SAR\": 17.8109,\n            \"ISR\": 18.2583\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.1896,\n            \"SIR\": 27.7667,\n            \"SAR\": 13.755,\n            \"ISR\": 15.4648\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7317,\n            \"SIR\": 25.3364,\n            \"SAR\": 20.1383,\n            \"ISR\": 19.1183\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.07174,\n            \"SIR\": 22.3263,\n            \"SAR\": 6.40451,\n            \"ISR\": 10.381\n          },\n          \"instrumental\": {\n            \"SDR\": 15.083,\n            \"SIR\": 22.3937,\n            
\"SAR\": 18.4798,\n            \"ISR\": 18.9954\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.7754,\n            \"SIR\": 23.0938,\n            \"SAR\": 11.4937,\n            \"ISR\": 15.5828\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0219,\n            \"SIR\": 26.4808,\n            \"SAR\": 17.1003,\n            \"ISR\": 18.285\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.316,\n            \"SIR\": 29.8442,\n            \"SAR\": 10.8124,\n            \"ISR\": 14.7429\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3174,\n            \"SIR\": 26.8933,\n            \"SAR\": 19.2417,\n            \"ISR\": 19.3104\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00021,\n            \"SIR\": -27.8217,\n            \"SAR\": 0.042015,\n            \"ISR\": 11.4847\n          },\n          \"instrumental\": {\n            \"SDR\": 17.9651,\n            \"SIR\": 58.4226,\n            \"SAR\": 19.6784,\n            \"ISR\": 17.3881\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.74227,\n            \"SIR\": 19.5223,\n            \"SAR\": 8.93713,\n            \"ISR\": 11.9711\n          },\n          \"instrumental\": {\n            \"SDR\": 17.621,\n            \"SIR\": 23.3362,\n            \"SAR\": 20.2091,\n            \"ISR\": 18.564\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 
12.6484,\n            \"SIR\": 27.7845,\n            \"SAR\": 13.6929,\n            \"ISR\": 16.594\n          },\n          \"instrumental\": {\n            \"SDR\": 16.444,\n            \"SIR\": 29.4823,\n            \"SAR\": 19.019,\n            \"ISR\": 19.081\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7193,\n            \"SIR\": 28.2691,\n            \"SAR\": 15.2732,\n            \"ISR\": 17.337\n          },\n          \"instrumental\": {\n            \"SDR\": 17.881,\n            \"SIR\": 32.0659,\n            \"SAR\": 22.0586,\n            \"ISR\": 19.4043\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 5.09568,\n            \"SIR\": 24.5833,\n            \"SAR\": 5.12182,\n            \"ISR\": 12.2483\n          },\n          \"instrumental\": {\n            \"SDR\": 19.2241,\n            \"SIR\": 37.6732,\n            \"SAR\": 28.282,\n            \"ISR\": 19.6981\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.54699,\n            \"SIR\": 17.2812,\n            \"SAR\": 4.28651,\n            \"ISR\": 7.05889\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2575,\n            \"SIR\": 14.9722,\n            \"SAR\": 14.6705,\n            \"ISR\": 17.85\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.38185,\n            \"SIR\": 21.7325,\n            \"SAR\": 10.0402,\n            \"ISR\": 14.8602\n          },\n          \"instrumental\": {\n            \"SDR\": 17.3412,\n            \"SIR\": 29.6069,\n            \"SAR\": 
20.5866,\n            \"ISR\": 19.108\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.45359,\n            \"SIR\": 20.7774,\n            \"SAR\": 8.7449,\n            \"ISR\": 12.3763\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0671,\n            \"SIR\": 23.6141,\n            \"SAR\": 17.6703,\n            \"ISR\": 18.6145\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.0815,\n            \"SIR\": 22.4071,\n            \"SAR\": 10.7082,\n            \"ISR\": 15.209\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1855,\n            \"SIR\": 27.1852,\n            \"SAR\": 18.2517,\n            \"ISR\": 18.1533\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2051,\n            \"SIR\": 28.7439,\n            \"SAR\": 13.2644,\n            \"ISR\": 14.7713\n          },\n          \"instrumental\": {\n            \"SDR\": 14.0138,\n            \"SIR\": 21.8613,\n            \"SAR\": 15.5807,\n            \"ISR\": 18.5267\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.30297,\n            \"SIR\": 19.6108,\n            \"SAR\": 7.22984,\n            \"ISR\": 11.103\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6599,\n            \"SIR\": 18.5861,\n            \"SAR\": 14.4515,\n            \"ISR\": 17.3265\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8493,\n            
\"SIR\": 28.4299,\n            \"SAR\": 12.8431,\n            \"ISR\": 16.569\n          },\n          \"instrumental\": {\n            \"SDR\": 15.3577,\n            \"SIR\": 27.6344,\n            \"SAR\": 18.1915,\n            \"ISR\": 17.9545\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.0123,\n            \"SIR\": 33.7107,\n            \"SAR\": 16.3625,\n            \"ISR\": 17.0475\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7082,\n            \"SIR\": 21.4094,\n            \"SAR\": 15.9916,\n            \"ISR\": 19.1976\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4792,\n            \"SIR\": 26.4868,\n            \"SAR\": 13.4953,\n            \"ISR\": 16.0018\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6539,\n            \"SIR\": 24.3769,\n            \"SAR\": 16.1475,\n            \"ISR\": 18.2052\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4389,\n            \"SIR\": 36.2467,\n            \"SAR\": 14.6843,\n            \"ISR\": 16.9709\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2103,\n            \"SIR\": 31.5702,\n            \"SAR\": 22.9304,\n            \"ISR\": 19.7679\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.00356,\n            \"SIR\": 16.1957,\n            \"SAR\": 10.7257,\n            \"ISR\": 10.8562\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6959,\n            \"SIR\": 17.682,\n            \"SAR\": 18.6418,\n            \"ISR\": 17.9722\n          }\n        }\n      
},\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 17.0876,\n            \"SIR\": 36.3891,\n            \"SAR\": 20.2477,\n            \"ISR\": 18.662\n          },\n          \"instrumental\": {\n            \"SDR\": 18.9359,\n            \"SIR\": 37.944,\n            \"SAR\": 25.8288,\n            \"ISR\": 19.7083\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.212,\n            \"SIR\": 22.6858,\n            \"SAR\": 13.8374,\n            \"ISR\": 17.5489\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6508,\n            \"SIR\": 24.2611,\n            \"SAR\": 12.6734,\n            \"ISR\": 15.7187\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.28262,\n            \"SIR\": 26.7611,\n            \"SAR\": 9.69987,\n            \"ISR\": 14.4338\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8417,\n            \"SIR\": 23.9982,\n            \"SAR\": 16.5929,\n            \"ISR\": 18.5479\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.68977,\n            \"SIR\": 21.2588,\n            \"SAR\": 8.33894,\n            \"ISR\": 12.9313\n          },\n          \"instrumental\": {\n            \"SDR\": 14.259,\n            \"SIR\": 23.7148,\n            \"SAR\": 16.6304,\n            \"ISR\": 18.6012\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9469,\n            \"SIR\": 28.4195,\n            \"SAR\": 11.6161,\n            \"ISR\": 15.8446\n      
    },\n          \"instrumental\": {\n            \"SDR\": 17.2245,\n            \"SIR\": 30.3024,\n            \"SAR\": 20.8416,\n            \"ISR\": 19.2408\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.4731,\n            \"SIR\": 32.7145,\n            \"SAR\": 14.209,\n            \"ISR\": 17.1316\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4981,\n            \"SIR\": 34.2829,\n            \"SAR\": 23.7271,\n            \"ISR\": 19.534\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.2016,\n            \"SIR\": 26.1342,\n            \"SAR\": 16.3848,\n            \"ISR\": 18.1669\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8135,\n            \"SIR\": 32.159,\n            \"SAR\": 19.2943,\n            \"ISR\": 17.9213\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.9604,\n        \"SIR\": 25.2947,\n        \"SAR\": 11.7096,\n        \"ISR\": 15.3252\n      },\n      \"instrumental\": {\n        \"SDR\": 16.1121,\n        \"SIR\": 26.8982,\n        \"SAR\": 18.5608,\n        \"ISR\": 18.5826\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": null\n  },\n  \"melband_roformer_instvox_duality_v2.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | InstVoc Duality V2 by Unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.56213,\n            \"SIR\": 23.5803,\n            \"SAR\": 8.01788,\n            \"ISR\": 12.2982\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0833,\n     
       \"SIR\": 27.1384,\n            \"SAR\": 20.5861,\n            \"ISR\": 19.2745\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.3538,\n            \"SIR\": 23.2952,\n            \"SAR\": 9.50223,\n            \"ISR\": 13.7509\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7621,\n            \"SIR\": 23.8415,\n            \"SAR\": 16.6724,\n            \"ISR\": 18.3805\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6859,\n            \"SIR\": 28.3699,\n            \"SAR\": 13.4957,\n            \"ISR\": 16.7817\n          },\n          \"instrumental\": {\n            \"SDR\": 16.95,\n            \"SIR\": 30.064,\n            \"SAR\": 20.2448,\n            \"ISR\": 19.1293\n          }\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.14893,\n            \"SIR\": 7.93664,\n            \"SAR\": 4.75506,\n            \"ISR\": 13.979\n          },\n          \"instrumental\": {\n            \"SDR\": 11.7227,\n            \"SIR\": 27.3124,\n            \"SAR\": 13.5381,\n            \"ISR\": 14.4529\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7073,\n            \"SIR\": 26.8221,\n            \"SAR\": 14.8115,\n            \"ISR\": 17.1036\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4715,\n            \"SIR\": 27.1383,\n            \"SAR\": 17.1429,\n            \"ISR\": 18.1914\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.9888,\n           
 \"SIR\": 21.1002,\n            \"SAR\": 11.7907,\n            \"ISR\": 15.5365\n          },\n          \"instrumental\": {\n            \"SDR\": 12.865,\n            \"SIR\": 23.6144,\n            \"SAR\": 14.3884,\n            \"ISR\": 16.8312\n          }\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6114,\n            \"SIR\": 23.0081,\n            \"SAR\": 13.5893,\n            \"ISR\": 15.9311\n          },\n          \"instrumental\": {\n            \"SDR\": 15.7647,\n            \"SIR\": 25.7301,\n            \"SAR\": 18.0456,\n            \"ISR\": 18.3371\n          }\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.39377,\n            \"SIR\": 26.3752,\n            \"SAR\": 9.78446,\n            \"ISR\": 14.506\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2595,\n            \"SIR\": 31.4199,\n            \"SAR\": 23.836,\n            \"ISR\": 19.5807\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.968,\n            \"SIR\": 32.5421,\n            \"SAR\": 15.3772,\n            \"ISR\": 17.5148\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3979,\n            \"SIR\": 30.1502,\n            \"SAR\": 19.4376,\n            \"ISR\": 19.2938\n          }\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.4153,\n            \"SIR\": 24.3086,\n            \"SAR\": 13.2163,\n            \"ISR\": 13.9688\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4273,\n            \"SIR\": 21.9713,\n            \"SAR\": 18.0786,\n            \"ISR\": 
18.3127\n          }\n        }\n      },\n      {\n        \"track_name\": \"Angela Thomas Wade - Milk Cow Blues\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.5581,\n            \"SIR\": 39.0615,\n            \"SAR\": 17.8452,\n            \"ISR\": 18.4987\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2634,\n            \"SIR\": 33.0066,\n            \"SAR\": 20.7647,\n            \"ISR\": 19.6081\n          }\n        }\n      },\n      {\n        \"track_name\": \"Atlantis Bound - It Was My Fault For Waiting\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.3986,\n            \"SIR\": 25.9959,\n            \"SAR\": 13.2257,\n            \"ISR\": 16.2344\n          },\n          \"instrumental\": {\n            \"SDR\": 16.0211,\n            \"SIR\": 28.0995,\n            \"SAR\": 18.6324,\n            \"ISR\": 18.9417\n          }\n        }\n      },\n      {\n        \"track_name\": \"Auctioneer - Our Future Faces\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.44872,\n            \"SIR\": 22.6731,\n            \"SAR\": 10.0688,\n            \"ISR\": 14.4276\n          },\n          \"instrumental\": {\n            \"SDR\": 15.751,\n            \"SIR\": 26.7836,\n            \"SAR\": 17.7727,\n            \"ISR\": 18.2575\n          }\n        }\n      },\n      {\n        \"track_name\": \"AvaLuna - Waterduct\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4018,\n            \"SIR\": 27.9021,\n            \"SAR\": 13.9866,\n            \"ISR\": 15.6672\n          },\n          \"instrumental\": {\n            \"SDR\": 16.5541,\n            \"SIR\": 25.372,\n            \"SAR\": 19.8524,\n            \"ISR\": 19.1069\n          }\n        }\n      },\n      {\n        \"track_name\": \"BigTroubles - Phantom\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.07413,\n            \"SIR\": 22.316,\n          
  \"SAR\": 6.43003,\n            \"ISR\": 10.4376\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0768,\n            \"SIR\": 22.4817,\n            \"SAR\": 18.4517,\n            \"ISR\": 18.991\n          }\n        }\n      },\n      {\n        \"track_name\": \"Bill Chudziak - Children Of No-one\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.811,\n            \"SIR\": 23.0166,\n            \"SAR\": 11.532,\n            \"ISR\": 15.6425\n          },\n          \"instrumental\": {\n            \"SDR\": 14.9405,\n            \"SIR\": 26.5712,\n            \"SAR\": 17.0262,\n            \"ISR\": 18.2699\n          }\n        }\n      },\n      {\n        \"track_name\": \"Black Bloc - If You Want Success\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.3263,\n            \"SIR\": 29.7592,\n            \"SAR\": 10.8044,\n            \"ISR\": 14.8284\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3251,\n            \"SIR\": 26.9479,\n            \"SAR\": 19.2524,\n            \"ISR\": 19.3048\n          }\n        }\n      },\n      {\n        \"track_name\": \"Celestial Shore - Die For Us\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.00032,\n            \"SIR\": -26.9405,\n            \"SAR\": 0.02117,\n            \"ISR\": 11.772\n          },\n          \"instrumental\": {\n            \"SDR\": 17.0721,\n            \"SIR\": 58.3051,\n            \"SAR\": 19.7925,\n            \"ISR\": 17.4556\n          }\n        }\n      },\n      {\n        \"track_name\": \"Chris Durban - Celebrate\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.75441,\n            \"SIR\": 19.6373,\n            \"SAR\": 8.89659,\n            \"ISR\": 11.9712\n          },\n          \"instrumental\": {\n            \"SDR\": 17.6158,\n            \"SIR\": 23.3934,\n            \"SAR\": 20.1776,\n            \"ISR\": 18.584\n          }\n  
      }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Air Traffic\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.6805,\n            \"SIR\": 27.6377,\n            \"SAR\": 13.7269,\n            \"ISR\": 16.6359\n          },\n          \"instrumental\": {\n            \"SDR\": 16.373,\n            \"SIR\": 29.4872,\n            \"SAR\": 18.9453,\n            \"ISR\": 19.0654\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Stella\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.7774,\n            \"SIR\": 28.3098,\n            \"SAR\": 15.2925,\n            \"ISR\": 17.4487\n          },\n          \"instrumental\": {\n            \"SDR\": 17.845,\n            \"SIR\": 32.1952,\n            \"SAR\": 22.0013,\n            \"ISR\": 19.3851\n          }\n        }\n      },\n      {\n        \"track_name\": \"Clara Berry And Wooldog - Waltz For My Victims\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.35433,\n            \"SIR\": 18.5918,\n            \"SAR\": 3.67859,\n            \"ISR\": 10.5347\n          },\n          \"instrumental\": {\n            \"SDR\": 19.4528,\n            \"SIR\": 40.166,\n            \"SAR\": 29.4948,\n            \"ISR\": 19.6741\n          }\n        }\n      },\n      {\n        \"track_name\": \"Cnoc An Tursa - Bannockburn\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 4.53327,\n            \"SIR\": 17.229,\n            \"SAR\": 4.24291,\n            \"ISR\": 7.07261\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2504,\n            \"SIR\": 15.0017,\n            \"SAR\": 14.6025,\n            \"ISR\": 17.8569\n          }\n        }\n      },\n      {\n        \"track_name\": \"Creepoid - OldTree\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.85855,\n            \"SIR\": 23.0928,\n            \"SAR\": 
10.6531,\n            \"ISR\": 14.9853\n          },\n          \"instrumental\": {\n            \"SDR\": 16.9119,\n            \"SIR\": 28.2879,\n            \"SAR\": 19.8147,\n            \"ISR\": 19.1488\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dark Ride - Burning Bridges\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.48367,\n            \"SIR\": 20.6599,\n            \"SAR\": 8.7805,\n            \"ISR\": 12.4913\n          },\n          \"instrumental\": {\n            \"SDR\": 15.0735,\n            \"SIR\": 23.8228,\n            \"SAR\": 17.6036,\n            \"ISR\": 18.6037\n          }\n        }\n      },\n      {\n        \"track_name\": \"Dreamers Of The Ghetto - Heavy Love\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.1522,\n            \"SIR\": 22.534,\n            \"SAR\": 10.7907,\n            \"ISR\": 15.2616\n          },\n          \"instrumental\": {\n            \"SDR\": 16.1381,\n            \"SIR\": 27.2561,\n            \"SAR\": 18.1049,\n            \"ISR\": 18.1098\n          }\n        }\n      },\n      {\n        \"track_name\": \"Drumtracks - Ghost Bitch\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2619,\n            \"SIR\": 28.6554,\n            \"SAR\": 13.3458,\n            \"ISR\": 14.8631\n          },\n          \"instrumental\": {\n            \"SDR\": 13.9779,\n            \"SIR\": 21.8934,\n            \"SAR\": 15.5658,\n            \"ISR\": 18.4962\n          }\n        }\n      },\n      {\n        \"track_name\": \"Faces On Film - Waiting For Ga\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.30912,\n            \"SIR\": 19.7839,\n            \"SAR\": 7.21224,\n            \"ISR\": 11.0408\n          },\n          \"instrumental\": {\n            \"SDR\": 13.6766,\n            \"SIR\": 18.5157,\n            \"SAR\": 14.5054,\n            \"ISR\": 17.3744\n          }\n        }\n  
    },\n      {\n        \"track_name\": \"Fergessen - Back From The Start\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.8857,\n            \"SIR\": 28.2239,\n            \"SAR\": 12.8874,\n            \"ISR\": 16.6981\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4411,\n            \"SIR\": 27.9302,\n            \"SAR\": 18.1965,\n            \"ISR\": 18.0683\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - Nos Palpitants\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 16.1799,\n            \"SIR\": 33.5181,\n            \"SAR\": 16.5025,\n            \"ISR\": 17.2713\n          },\n          \"instrumental\": {\n            \"SDR\": 15.688,\n            \"SIR\": 21.5381,\n            \"SAR\": 16.0575,\n            \"ISR\": 19.1489\n          }\n        }\n      },\n      {\n        \"track_name\": \"Fergessen - The Wind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4938,\n            \"SIR\": 26.428,\n            \"SAR\": 13.5541,\n            \"ISR\": 16.0992\n          },\n          \"instrumental\": {\n            \"SDR\": 14.6561,\n            \"SIR\": 24.4644,\n            \"SAR\": 16.089,\n            \"ISR\": 18.1811\n          }\n        }\n      },\n      {\n        \"track_name\": \"Flags - 54\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5298,\n            \"SIR\": 36.2277,\n            \"SAR\": 14.7146,\n            \"ISR\": 17.0672\n          },\n          \"instrumental\": {\n            \"SDR\": 18.2133,\n            \"SIR\": 31.9328,\n            \"SAR\": 22.9548,\n            \"ISR\": 19.7633\n          }\n        }\n      },\n      {\n        \"track_name\": \"Giselle - Moss\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.02334,\n            \"SIR\": 16.1852,\n            \"SAR\": 10.7386,\n            \"ISR\": 10.903\n          },\n          
\"instrumental\": {\n            \"SDR\": 13.7227,\n            \"SIR\": 17.7357,\n            \"SAR\": 18.69,\n            \"ISR\": 17.951\n          }\n        }\n      },\n      {\n        \"track_name\": \"Grants - PunchDrunk\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 17.1541,\n            \"SIR\": 36.1393,\n            \"SAR\": 20.2997,\n            \"ISR\": 18.7722\n          },\n          \"instrumental\": {\n            \"SDR\": 18.9338,\n            \"SIR\": 38.4876,\n            \"SAR\": 25.9108,\n            \"ISR\": 19.695\n          }\n        }\n      },\n      {\n        \"track_name\": \"Helado Negro - Mitad Del Mundo\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2775,\n            \"SIR\": 22.7516,\n            \"SAR\": 13.8514,\n            \"ISR\": 17.5384\n          },\n          \"instrumental\": {\n            \"SDR\": 11.6894,\n            \"SIR\": 24.096,\n            \"SAR\": 12.6792,\n            \"ISR\": 15.7647\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hezekiah Jones - Borrowed Heart\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 9.37351,\n            \"SIR\": 26.6779,\n            \"SAR\": 9.80521,\n            \"ISR\": 14.4952\n          },\n          \"instrumental\": {\n            \"SDR\": 14.8059,\n            \"SIR\": 24.1667,\n            \"SAR\": 16.5469,\n            \"ISR\": 18.5168\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hollow Ground - Left Blind\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.94777,\n            \"SIR\": 21.2505,\n            \"SAR\": 8.51137,\n            \"ISR\": 13.1184\n          },\n          \"instrumental\": {\n            \"SDR\": 14.1797,\n            \"SIR\": 24.0022,\n            \"SAR\": 16.4555,\n            \"ISR\": 18.5701\n          }\n        }\n      },\n      {\n        \"track_name\": \"Hop Along - Sister Cities\",\n    
    \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.0289,\n            \"SIR\": 28.4668,\n            \"SAR\": 11.7626,\n            \"ISR\": 16.0497\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2159,\n            \"SIR\": 30.666,\n            \"SAR\": 20.7845,\n            \"ISR\": 19.2309\n          }\n        }\n      },\n      {\n        \"track_name\": \"Invisible Familiars - Disturbing Wildlife\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.5372,\n            \"SIR\": 32.6778,\n            \"SAR\": 14.2421,\n            \"ISR\": 17.3472\n          },\n          \"instrumental\": {\n            \"SDR\": 18.4862,\n            \"SIR\": 34.8379,\n            \"SAR\": 23.6458,\n            \"ISR\": 19.5244\n          }\n        }\n      },\n      {\n        \"track_name\": \"James May - All Souls Moon\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 15.2624,\n            \"SIR\": 26.2994,\n            \"SAR\": 16.4019,\n            \"ISR\": 18.2776\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7773,\n            \"SIR\": 32.3473,\n            \"SAR\": 19.3981,\n            \"ISR\": 17.9431\n          }\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 11.0088,\n        \"SIR\": 25.1522,\n        \"SAR\": 11.7767,\n        \"ISR\": 15.399\n      },\n      \"instrumental\": {\n        \"SDR\": 16.0796,\n        \"SIR\": 27.0431,\n        \"SAR\": 18.542,\n        \"ISR\": 18.577\n      }\n    },\n    \"stems\": [\n      \"vocals\",\n      \"instrumental\"\n    ],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_karaoke_gabox.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Karaoke by Gabox\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.88608,\n            
\"SIR\": 27.7786,\n            \"SAR\": 6.37612,\n            \"ISR\": 9.25014\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4048,\n            \"SIR\": 21.2481,\n            \"SAR\": 19.3112,\n            \"ISR\": 19.5408\n          },\n          \"seconds_per_minute_m3\": 17.5\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.88608,\n            \"SIR\": 27.7786,\n            \"SAR\": 6.37612,\n            \"ISR\": 9.25014\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4048,\n            \"SIR\": 21.2481,\n            \"SAR\": 19.3112,\n            \"ISR\": 19.5408\n          },\n          \"seconds_per_minute_m3\": 9.5\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68991,\n            \"SIR\": 28.3824,\n            \"SAR\": 5.72926,\n            \"ISR\": 8.15281\n          },\n          \"instrumental\": {\n            \"SDR\": 13.618,\n            \"SIR\": 14.1145,\n            \"SAR\": 14.2226,\n            \"ISR\": 19.3663\n          },\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3862,\n            \"SIR\": 30.3756,\n            \"SAR\": 10.6916,\n            \"ISR\": 12.2328\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3727,\n            \"SIR\": 20.9004,\n            \"SAR\": 17.9822,\n            \"ISR\": 19.3596\n          },\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.77195,\n            \"SIR\": 8.6655,\n            \"SAR\": 0.31449,\n            \"ISR\": 7.357\n          },\n  
        \"instrumental\": {\n            \"SDR\": 11.3035,\n            \"SIR\": 17.2281,\n            \"SAR\": 13.1654,\n            \"ISR\": 16.1609\n          },\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92063,\n            \"SIR\": 29.0326,\n            \"SAR\": 6.73376,\n            \"ISR\": 9.13132\n          },\n          \"instrumental\": {\n            \"SDR\": 13.588,\n            \"SIR\": 12.6607,\n            \"SAR\": 11.4185,\n            \"ISR\": 19.0098\n          },\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.63006,\n            \"SIR\": 22.2393,\n            \"SAR\": 3.69297,\n            \"ISR\": 5.86008\n          },\n          \"instrumental\": {\n            \"SDR\": 6.39273,\n            \"SIR\": 8.46748,\n            \"SAR\": 10.1885,\n            \"ISR\": 18.0867\n          },\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.470655,\n            \"SIR\": 29.0074,\n            \"SAR\": -6.29013,\n            \"ISR\": 0.447975\n          },\n          \"instrumental\": {\n            \"SDR\": 6.65515,\n            \"SIR\": 5.20545,\n            \"SAR\": 19.3375,\n            \"ISR\": 19.8512\n          },\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92486,\n            \"SIR\": 28.1179,\n            \"SAR\": 10.2117,\n            \"ISR\": 11.9891\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4948,\n            
\"SIR\": 26.61,\n            \"SAR\": 22.2038,\n            \"ISR\": 19.6608\n          },\n          \"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.0375,\n            \"SIR\": 37.1893,\n            \"SAR\": 15.4431,\n            \"ISR\": 17.6437\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4592,\n            \"SIR\": 31.2339,\n            \"SAR\": 19.4534,\n            \"ISR\": 19.4882\n          },\n          \"seconds_per_minute_m3\": 8.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9097,\n            \"SIR\": 33.7731,\n            \"SAR\": 7.02799,\n            \"ISR\": 8.7271\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4964,\n            \"SIR\": 12.616,\n            \"SAR\": 13.1806,\n            \"ISR\": 19.475\n          },\n          \"seconds_per_minute_m3\": 8.5\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.88608,\n            \"SIR\": 27.7786,\n            \"SAR\": 6.37612,\n            \"ISR\": 9.25014\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4048,\n            \"SIR\": 21.2481,\n            \"SAR\": 19.3112,\n            \"ISR\": 19.5408\n          },\n          \"seconds_per_minute_m3\": 9.9\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.68991,\n            \"SIR\": 28.3824,\n            \"SAR\": 5.72926,\n            \"ISR\": 8.15281\n          },\n          \"instrumental\": {\n            \"SDR\": 13.618,\n            \"SIR\": 14.1145,\n            \"SAR\": 14.2226,\n            \"ISR\": 
19.3663\n          },\n          \"seconds_per_minute_m3\": 9.2\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3862,\n            \"SIR\": 30.3756,\n            \"SAR\": 10.6916,\n            \"ISR\": 12.2328\n          },\n          \"instrumental\": {\n            \"SDR\": 16.3727,\n            \"SIR\": 20.9004,\n            \"SAR\": 17.9822,\n            \"ISR\": 19.3596\n          },\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.77195,\n            \"SIR\": 8.6655,\n            \"SAR\": 0.31449,\n            \"ISR\": 7.357\n          },\n          \"instrumental\": {\n            \"SDR\": 11.3035,\n            \"SIR\": 17.2281,\n            \"SAR\": 13.1654,\n            \"ISR\": 16.1609\n          },\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92063,\n            \"SIR\": 29.0326,\n            \"SAR\": 6.73376,\n            \"ISR\": 9.13132\n          },\n          \"instrumental\": {\n            \"SDR\": 13.588,\n            \"SIR\": 12.6607,\n            \"SAR\": 11.4185,\n            \"ISR\": 19.0098\n          },\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.63006,\n            \"SIR\": 22.2393,\n            \"SAR\": 3.69297,\n            \"ISR\": 5.86008\n          },\n          \"instrumental\": {\n            \"SDR\": 6.39273,\n            \"SIR\": 8.46748,\n            \"SAR\": 10.1885,\n            \"ISR\": 18.0867\n          },\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      
{\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.470655,\n            \"SIR\": 29.0074,\n            \"SAR\": -6.29013,\n            \"ISR\": 0.447975\n          },\n          \"instrumental\": {\n            \"SDR\": 6.65515,\n            \"SIR\": 5.20545,\n            \"SAR\": 19.3375,\n            \"ISR\": 19.8512\n          },\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.92486,\n            \"SIR\": 28.1179,\n            \"SAR\": 10.2117,\n            \"ISR\": 11.9891\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4948,\n            \"SIR\": 26.61,\n            \"SAR\": 22.2038,\n            \"ISR\": 19.6608\n          },\n          \"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.0375,\n            \"SIR\": 37.1893,\n            \"SAR\": 15.4431,\n            \"ISR\": 17.6437\n          },\n          \"instrumental\": {\n            \"SDR\": 16.4592,\n            \"SIR\": 31.2339,\n            \"SAR\": 19.4534,\n            \"ISR\": 19.4882\n          },\n          \"seconds_per_minute_m3\": 8.3\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 14.9097,\n            \"SIR\": 33.7731,\n            \"SAR\": 7.02799,\n            \"ISR\": 8.7271\n          },\n          \"instrumental\": {\n            \"SDR\": 17.4964,\n            \"SIR\": 12.616,\n            \"SAR\": 13.1806,\n            \"ISR\": 19.475\n          },\n          \"seconds_per_minute_m3\": 8.2\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 
8.68991,\n        \"SIR\": 28.3824,\n        \"SAR\": 6.37612,\n        \"ISR\": 9.13132\n      },\n      \"instrumental\": {\n        \"SDR\": 16.3727,\n        \"SIR\": 17.2281,\n        \"SAR\": 17.9822,\n        \"ISR\": 19.475\n      },\n      \"seconds_per_minute_m3\": 8.8\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_denoise_debleed_gabox.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Denoise-Debleed by Gabox\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.0014,\n            \"SIR\": -7.03493,\n            \"SAR\": -0.78799,\n            \"ISR\": 0.00577\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2134,\n            \"SIR\": 10.8891,\n            \"SAR\": 54.0149,\n            \"ISR\": 20.0249\n          },\n          \"seconds_per_minute_m3\": 187.2\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00383,\n            \"SIR\": -18.3608,\n            \"SAR\": -2.43427,\n            \"ISR\": 0.00322\n          },\n          \"instrumental\": {\n            \"SDR\": 6.52325,\n            \"SIR\": 5.77486,\n            \"SAR\": 39.9222,\n            \"ISR\": 19.7504\n          },\n          \"seconds_per_minute_m3\": 189.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00109,\n            \"SIR\": -5.29795,\n            \"SAR\": -4.38224,\n            \"ISR\": 0.00076\n          },\n          \"instrumental\": {\n            \"SDR\": 7.01523,\n            \"SIR\": 6.31504,\n            \"SAR\": 50.0547,\n            \"ISR\": 19.8552\n          },\n          \"seconds_per_minute_m3\": 184.7\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - 
Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00112,\n            \"SIR\": -3.4899,\n            \"SAR\": -3.95266,\n            \"ISR\": 0.00793\n          },\n          \"instrumental\": {\n            \"SDR\": 9.87913,\n            \"SIR\": 9.54545,\n            \"SAR\": 45.0831,\n            \"ISR\": 19.8955\n          },\n          \"seconds_per_minute_m3\": 190.3\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00467,\n            \"SIR\": -4.53636,\n            \"SAR\": -4.6382,\n            \"ISR\": 0.003105\n          },\n          \"instrumental\": {\n            \"SDR\": 2.92688,\n            \"SIR\": 2.09233,\n            \"SAR\": 33.2344,\n            \"ISR\": 19.6816\n          },\n          \"seconds_per_minute_m3\": 189.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -8e-05,\n            \"SIR\": -10.0035,\n            \"SAR\": -0.11587,\n            \"ISR\": 0.00328\n          },\n          \"instrumental\": {\n            \"SDR\": 2.77824,\n            \"SIR\": 1.9335,\n            \"SAR\": 47.0445,\n            \"ISR\": 19.7313\n          },\n          \"seconds_per_minute_m3\": 187.6\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00069,\n            \"SIR\": -5.94093,\n            \"SAR\": 2.48549,\n            \"ISR\": -0.0003\n          },\n          \"instrumental\": {\n            \"SDR\": 4.99121,\n            \"SIR\": 4.21889,\n            \"SAR\": 54.2827,\n            \"ISR\": 19.9887\n          },\n          \"seconds_per_minute_m3\": 191.9\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n  
          \"SDR\": -0.00822,\n            \"SIR\": -14.4576,\n            \"SAR\": 0.99151,\n            \"ISR\": 0.003885\n          },\n          \"instrumental\": {\n            \"SDR\": 13.342,\n            \"SIR\": 13.6352,\n            \"SAR\": 55.1339,\n            \"ISR\": 19.9534\n          },\n          \"seconds_per_minute_m3\": 189.0\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00366,\n            \"SIR\": -3.65317,\n            \"SAR\": -2.79238,\n            \"ISR\": 0.00026\n          },\n          \"instrumental\": {\n            \"SDR\": 4.04863,\n            \"SIR\": 3.25497,\n            \"SAR\": 43.0345,\n            \"ISR\": 19.7609\n          },\n          \"seconds_per_minute_m3\": 189.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.0027,\n            \"SIR\": -3.70894,\n            \"SAR\": -1.77426,\n            \"ISR\": -0.00154\n          },\n          \"instrumental\": {\n            \"SDR\": 4.7393,\n            \"SIR\": 3.93978,\n            \"SAR\": 49.3048,\n            \"ISR\": 19.8585\n          },\n          \"seconds_per_minute_m3\": 188.6\n        }\n      },\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 0.0014,\n            \"SIR\": -7.03493,\n            \"SAR\": -0.78799,\n            \"ISR\": 0.00577\n          },\n          \"instrumental\": {\n            \"SDR\": 11.2134,\n            \"SIR\": 10.8891,\n            \"SAR\": 54.0149,\n            \"ISR\": 20.0249\n          },\n          \"seconds_per_minute_m3\": 184.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00383,\n            \"SIR\": 
-18.3608,\n            \"SAR\": -2.43427,\n            \"ISR\": 0.00322\n          },\n          \"instrumental\": {\n            \"SDR\": 6.52325,\n            \"SIR\": 5.77486,\n            \"SAR\": 39.9222,\n            \"ISR\": 19.7504\n          },\n          \"seconds_per_minute_m3\": 186.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00109,\n            \"SIR\": -5.29795,\n            \"SAR\": -4.38224,\n            \"ISR\": 0.00076\n          },\n          \"instrumental\": {\n            \"SDR\": 7.01523,\n            \"SIR\": 6.31504,\n            \"SAR\": 50.0547,\n            \"ISR\": 19.8552\n          },\n          \"seconds_per_minute_m3\": 187.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00112,\n            \"SIR\": -3.4899,\n            \"SAR\": -3.95266,\n            \"ISR\": 0.00793\n          },\n          \"instrumental\": {\n            \"SDR\": 9.87913,\n            \"SIR\": 9.54545,\n            \"SAR\": 45.0831,\n            \"ISR\": 19.8955\n          },\n          \"seconds_per_minute_m3\": 189.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00467,\n            \"SIR\": -4.53636,\n            \"SAR\": -4.6382,\n            \"ISR\": 0.003105\n          },\n          \"instrumental\": {\n            \"SDR\": 2.92688,\n            \"SIR\": 2.09233,\n            \"SAR\": 33.2344,\n            \"ISR\": 19.6816\n          },\n          \"seconds_per_minute_m3\": 189.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -8e-05,\n            \"SIR\": -10.0035,\n            \"SAR\": -0.11587,\n            \"ISR\": 0.00328\n 
         },\n          \"instrumental\": {\n            \"SDR\": 2.77824,\n            \"SIR\": 1.9335,\n            \"SAR\": 47.0445,\n            \"ISR\": 19.7313\n          },\n          \"seconds_per_minute_m3\": 186.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00069,\n            \"SIR\": -5.94093,\n            \"SAR\": 2.48549,\n            \"ISR\": -0.0003\n          },\n          \"instrumental\": {\n            \"SDR\": 4.99121,\n            \"SIR\": 4.21889,\n            \"SAR\": 54.2827,\n            \"ISR\": 19.9887\n          },\n          \"seconds_per_minute_m3\": 194.3\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00822,\n            \"SIR\": -14.4576,\n            \"SAR\": 0.99151,\n            \"ISR\": 0.003885\n          },\n          \"instrumental\": {\n            \"SDR\": 13.342,\n            \"SIR\": 13.6352,\n            \"SAR\": 55.1339,\n            \"ISR\": 19.9534\n          },\n          \"seconds_per_minute_m3\": 188.7\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.00366,\n            \"SIR\": -3.65317,\n            \"SAR\": -2.79238,\n            \"ISR\": 0.00026\n          },\n          \"instrumental\": {\n            \"SDR\": 4.04863,\n            \"SIR\": 3.25497,\n            \"SAR\": 43.0345,\n            \"ISR\": 19.7609\n          },\n          \"seconds_per_minute_m3\": 187.7\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": -0.0027,\n            \"SIR\": -3.70894,\n            \"SAR\": -1.77426,\n            \"ISR\": -0.00154\n          },\n          \"instrumental\": {\n    
        \"SDR\": 4.7393,\n            \"SIR\": 3.93978,\n            \"SAR\": 49.3048,\n            \"ISR\": 19.8585\n          },\n          \"seconds_per_minute_m3\": 196.9\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": -0.00191,\n        \"SIR\": -5.61944,\n        \"SAR\": -2.10426,\n        \"ISR\": 0.0031625\n      },\n      \"instrumental\": {\n        \"SDR\": 5.75723,\n        \"SIR\": 4.99688,\n        \"SAR\": 48.1746,\n        \"ISR\": 19.8569\n      },\n      \"seconds_per_minute_m3\": 188.8\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_kim_ft2_unwa.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | FT 2 by unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.6\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.0\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": 
\"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.9\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 8.8\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_kim_ft2_bleedless_unwa.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer Kim | FT 2 Bleedless by unwa\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 11.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 10.3\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.9\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.6\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 9.2\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet 
Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 8.8\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 9.6\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_vocals_becruily.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Vocals by becruily\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 193.7\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 192.2\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 185.1\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 186.2\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 190.2\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 197.5\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 193.7\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 189.6\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 191.4\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 187.5\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 190.8\n    },\n    
\"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_instrumental_becruily.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Instrumental by becruily\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 6.54838,\n            \"SIR\": 23.3712,\n            \"SAR\": 7.08102,\n            \"ISR\": 10.7044\n          },\n          \"instrumental\": {\n            \"SDR\": 16.8472,\n            \"SIR\": 24.8601,\n            \"SAR\": 20.4971,\n            \"ISR\": 19.3149\n          },\n          \"seconds_per_minute_m3\": 196.2\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 8.73301,\n            \"SIR\": 25.8327,\n            \"SAR\": 8.89428,\n            \"ISR\": 12.1507\n          },\n          \"instrumental\": {\n            \"SDR\": 14.5559,\n            \"SIR\": 21.2613,\n            \"SAR\": 16.3529,\n            \"ISR\": 18.8549\n          },\n          \"seconds_per_minute_m3\": 188.9\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.6503,\n            \"SIR\": 28.9944,\n            \"SAR\": 12.5913,\n            \"ISR\": 15.3496\n          },\n          \"instrumental\": {\n            \"SDR\": 16.7889,\n            \"SIR\": 27.3235,\n            \"SAR\": 19.7342,\n            \"ISR\": 19.237\n          },\n          \"seconds_per_minute_m3\": 189.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 3.39664,\n            \"SIR\": 7.7368,\n            \"SAR\": 3.69738,\n            \"ISR\": 11.5511\n          },\n          \"instrumental\": {\n            \"SDR\": 12.0128,\n            \"SIR\": 23.3865,\n            
\"SAR\": 13.95,\n            \"ISR\": 15.072\n          },\n          \"seconds_per_minute_m3\": 193.7\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.9068,\n            \"SIR\": 27.0069,\n            \"SAR\": 13.8643,\n            \"ISR\": 15.6858\n          },\n          \"instrumental\": {\n            \"SDR\": 15.171,\n            \"SIR\": 23.7782,\n            \"SAR\": 16.3809,\n            \"ISR\": 18.2652\n          },\n          \"seconds_per_minute_m3\": 191.0\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 10.124,\n            \"SIR\": 21.3743,\n            \"SAR\": 11.1673,\n            \"ISR\": 13.8313\n          },\n          \"instrumental\": {\n            \"SDR\": 12.4319,\n            \"SIR\": 20.1249,\n            \"SAR\": 13.9947,\n            \"ISR\": 17.0897\n          },\n          \"seconds_per_minute_m3\": 197.3\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 11.3785,\n            \"SIR\": 23.5187,\n            \"SAR\": 12.508,\n            \"ISR\": 13.8912\n          },\n          \"instrumental\": {\n            \"SDR\": 15.2864,\n            \"SIR\": 22.2421,\n            \"SAR\": 17.5517,\n            \"ISR\": 18.4807\n          },\n          \"seconds_per_minute_m3\": 201.1\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.41175,\n            \"SIR\": 24.9306,\n            \"SAR\": 7.57044,\n            \"ISR\": 11.0494\n          },\n          \"instrumental\": {\n            \"SDR\": 18.6341,\n            \"SIR\": 29.5544,\n            \"SAR\": 25.1912,\n            \"ISR\": 19.6354\n          },\n          
\"seconds_per_minute_m3\": 188.6\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.4047,\n            \"SIR\": 35.1477,\n            \"SAR\": 13.5941,\n            \"ISR\": 15.8628\n          },\n          \"instrumental\": {\n            \"SDR\": 15.4372,\n            \"SIR\": 25.9366,\n            \"SAR\": 17.7951,\n            \"ISR\": 19.3184\n          },\n          \"seconds_per_minute_m3\": 191.0\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 12.2305,\n            \"SIR\": 23.9563,\n            \"SAR\": 11.279,\n            \"ISR\": 12.4526\n          },\n          \"instrumental\": {\n            \"SDR\": 16.2548,\n            \"SIR\": 19.7432,\n            \"SAR\": 16.6159,\n            \"ISR\": 18.3415\n          },\n          \"seconds_per_minute_m3\": 194.4\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 10.7513,\n        \"SIR\": 24.4434,\n        \"SAR\": 11.2232,\n        \"ISR\": 13.142\n      },\n      \"instrumental\": {\n        \"SDR\": 15.3618,\n        \"SIR\": 23.5823,\n        \"SAR\": 17.0838,\n        \"ISR\": 18.6678\n      },\n      \"seconds_per_minute_m3\": 192.4\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"mel_band_roformer_vocal_fullness_aname.ckpt\": {\n    \"model_name\": \"Roformer Model: MelBand Roformer | Vocals Fullness by Aname\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 188.4\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 193.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n   
       \"seconds_per_minute_m3\": 188.1\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Rockshow\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 189.3\n        }\n      },\n      {\n        \"track_name\": \"Actions - Devil's Words\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 187.8\n        }\n      },\n      {\n        \"track_name\": \"Actions - One Minute Smile\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 193.0\n        }\n      },\n      {\n        \"track_name\": \"Actions - South Of The Water\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 199.2\n        }\n      },\n      {\n        \"track_name\": \"Aimee Norwich - Child\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 187.0\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Goodbye Bolero\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 192.9\n        }\n      },\n      {\n        \"track_name\": \"Alexander Ross - Velvet Curtain\",\n        \"scores\": {\n          \"seconds_per_minute_m3\": 191.3\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"seconds_per_minute_m3\": 190.3\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  },\n  \"bs_roformer_vocals_gabox.ckpt\": {\n    \"model_name\": \"Roformer Model: BS Roformer | Vocals by Gabox\",\n    \"track_scores\": [\n      {\n        \"track_name\": \"A Classic Education - NightOwl\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 7.30156,\n            \"SIR\": 21.5334,\n            \"SAR\": 6.88792,\n            \"ISR\": 12.6678\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2849,\n            \"SIR\": 28.2602,\n            \"SAR\": 20.7549,\n            \"ISR\": 19.1586\n          },\n          \"seconds_per_minute_m3\": 141.0\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Clinic A\",\n        \"scores\": {\n          \"vocals\": {\n 
           \"SDR\": 9.38357,\n            \"SIR\": 25.3187,\n            \"SAR\": 9.32861,\n            \"ISR\": 13.8029\n          },\n          \"instrumental\": {\n            \"SDR\": 14.7397,\n            \"SIR\": 24.1059,\n            \"SAR\": 16.5124,\n            \"ISR\": 18.6934\n          },\n          \"seconds_per_minute_m3\": 154.8\n        }\n      },\n      {\n        \"track_name\": \"ANiMAL - Easy Tiger\",\n        \"scores\": {\n          \"vocals\": {\n            \"SDR\": 13.0706,\n            \"SIR\": 29.6609,\n            \"SAR\": 13.9141,\n            \"ISR\": 17.2466\n          },\n          \"instrumental\": {\n            \"SDR\": 17.2147,\n            \"SIR\": 31.544,\n            \"SAR\": 20.5026,\n            \"ISR\": 19.2753\n          },\n          \"seconds_per_minute_m3\": 170.7\n        }\n      }\n    ],\n    \"median_scores\": {\n      \"vocals\": {\n        \"SDR\": 9.38357,\n        \"SIR\": 25.3187,\n        \"SAR\": 9.32861,\n        \"ISR\": 13.8029\n      },\n      \"instrumental\": {\n        \"SDR\": 17.2147,\n        \"SIR\": 28.2602,\n        \"SAR\": 20.5026,\n        \"ISR\": 19.1586\n      },\n      \"seconds_per_minute_m3\": 154.8\n    },\n    \"stems\": [],\n    \"target_stem\": null\n  }\n}"
  },
  {
    "path": "audio_separator/models.json",
    "content": "{\n    \"vr_download_list\": {\n        \"VR Arch Single Model v4: UVR-De-Reverb by aufr33-jarredou\": \"UVR-De-Reverb-aufr33-jarredou.pth\",\n        \"VR Arch Single Model v5: UVR-BVE-4B_SN-44100-2\": \"UVR-BVE-4B_SN-44100-2.pth\"\n    },\n    \"mdx_download_list\": {\n        \"MDX-Net Model: UVR-MDX-NET Inst HQ 5\": \"UVR-MDX-NET-Inst_HQ_5.onnx\"\n    },\n    \"mdx23c_download_list\": {\n        \"MDX23C Model: MDX23C De-Reverb by aufr33-jarredou\": {\n            \"MDX23C-De-Reverb-aufr33-jarredou.ckpt\": \"config_dereverb_mdx23c.yaml\"\n        },\n        \"MDX23C Model: MDX23C DrumSep by aufr33-jarredou\": {\n            \"MDX23C-DrumSep-aufr33-jarredou.ckpt\": \"config_drumsep_mdx23c.yaml\"\n        }\n    },\n    \"roformer_download_list\": {\n        \"Roformer Model: Mel-Roformer-Karaoke-Aufr33-Viperx\": {\n            \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\": \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956_config.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Karaoke by Gabox\": {\n            \"mel_band_roformer_karaoke_gabox.ckpt\": \"config_mel_band_roformer_karaoke_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Karaoke V2 by Gabox\": {\n            \"mel_band_roformer_karaoke_gabox_v2.ckpt\": \"config_mel_band_roformer_karaoke_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Karaoke by becruily\": {\n            \"mel_band_roformer_karaoke_becruily.ckpt\": \"config_mel_band_roformer_karaoke_becruily.yaml\"\n        },\n        \"Roformer Model: Mel-Roformer-Denoise-Aufr33\": {\n            \"denoise_mel_band_roformer_aufr33_sdr_27.9959.ckpt\": \"denoise_mel_band_roformer_aufr33_sdr_27.9959_config.yaml\"\n        },\n        \"Roformer Model: Mel-Roformer-Denoise-Aufr33-Aggr\": {\n            \"denoise_mel_band_roformer_aufr33_aggr_sdr_27.9768.ckpt\": \"denoise_mel_band_roformer_aufr33_aggr_sdr_27.9768_config.yaml\"\n        },\n        
\"Roformer Model: MelBand Roformer | Denoise-Debleed by Gabox\": {\n            \"mel_band_roformer_denoise_debleed_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: Mel-Roformer-Crowd-Aufr33-Viperx\": {\n            \"mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt\": \"mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144_config.yaml\"\n        },\n        \"Roformer Model: BS-Roformer-De-Reverb\": {\n            \"deverb_bs_roformer_8_384dim_10depth.ckpt\": \"deverb_bs_roformer_8_384dim_10depth_config.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals by Kimberley Jensen\": {\n            \"vocals_mel_band_roformer.ckpt\": \"vocals_mel_band_roformer.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | FT by unwa\": {\n            \"mel_band_roformer_kim_ft_unwa.ckpt\": \"config_mel_band_roformer_kim_ft_unwa.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | FT 2 by unwa\": {\n            \"mel_band_roformer_kim_ft2_unwa.ckpt\": \"config_mel_band_roformer_kim_ft_unwa.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | FT 2 Bleedless by unwa\": {\n            \"mel_band_roformer_kim_ft2_bleedless_unwa.ckpt\": \"config_mel_band_roformer_kim_ft_unwa.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | FT 3 by unwa\": {\n            \"mel_band_roformer_kim_ft3_unwa.ckpt\": \"config_mel_band_roformer_kim_ft_unwa.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | Inst V1 Plus by Unwa\": {\n            \"melband_roformer_inst_v1_plus.ckpt\": \"config_melbandroformer_inst.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | Inst V1 (E) by Unwa\": {\n            \"melband_roformer_inst_v1e.ckpt\": \"config_melbandroformer_inst.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | Inst V1 (E) Plus by Unwa\": {\n            \"melband_roformer_inst_v1e_plus.ckpt\": 
\"config_melbandroformer_inst.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Vocals Revive by Unwa\": {\n            \"bs_roformer_vocals_revive_unwa.ckpt\": \"config_bs_roformer_vocals_revive_unwa.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Vocals Revive V2 by Unwa\": {\n            \"bs_roformer_vocals_revive_v2_unwa.ckpt\": \"config_bs_roformer_vocals_revive_unwa.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Vocals Revive V3e by Unwa\": {\n            \"bs_roformer_vocals_revive_v3e_unwa.ckpt\": \"config_bs_roformer_vocals_revive_unwa.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals by becruily\": {\n            \"mel_band_roformer_vocals_becruily.ckpt\": \"config_mel_band_roformer_vocals_becruily.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental by becruily\": {\n            \"mel_band_roformer_instrumental_becruily.ckpt\": \"config_mel_band_roformer_instrumental_becruily.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals Fullness by Aname\": {\n            \"mel_band_roformer_vocal_fullness_aname.ckpt\": \"config_mel_band_roformer_vocal_fullness_aname.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Vocals by Gabox\": {\n            \"bs_roformer_vocals_gabox.ckpt\": \"config_bs_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals by Gabox\": {\n            \"mel_band_roformer_vocals_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals V2 by Gabox\": {\n            \"mel_band_roformer_vocals_v2_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV1 by Gabox\": {\n            \"mel_band_roformer_vocals_fv1_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV2 by Gabox\": {\n    
        \"mel_band_roformer_vocals_fv2_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV3 by Gabox\": {\n            \"mel_band_roformer_vocals_fv3_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV4 by Gabox\": {\n            \"mel_band_roformer_vocals_fv4_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV5 by Gabox\": {\n            \"mel_band_roformer_vocals_fv5_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Vocals FV6 by Gabox\": {\n            \"mel_band_roformer_vocals_fv6_gabox.ckpt\": \"config_mel_band_roformer_vocals_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental by Gabox\": {\n            \"mel_band_roformer_instrumental_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental 2 by Gabox\": {\n            \"mel_band_roformer_instrumental_2_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental 3 by Gabox\": {\n            \"mel_band_roformer_instrumental_3_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Bleedless V1 by Gabox\": {\n            \"mel_band_roformer_instrumental_bleedless_v1_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Bleedless V2 by Gabox\": {\n            \"mel_band_roformer_instrumental_bleedless_v2_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Bleedless V3 by Gabox\": 
{\n            \"mel_band_roformer_instrumental_bleedless_v3_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Fullness V1 by Gabox\": {\n            \"mel_band_roformer_instrumental_fullness_v1_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Fullness V2 by Gabox\": {\n            \"mel_band_roformer_instrumental_fullness_v2_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Fullness V3 by Gabox\": {\n            \"mel_band_roformer_instrumental_fullness_v3_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental Fullness Noisy V4 by Gabox\": {\n            \"mel_band_roformer_instrumental_fullness_noise_v4_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV5 by Gabox\": {\n            \"mel_band_roformer_instrumental_instv5_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV5N by Gabox\": {\n            \"mel_band_roformer_instrumental_instv5n_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV6 by Gabox\": {\n            \"mel_band_roformer_instrumental_instv6_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV6N by Gabox\": {\n            \"mel_band_roformer_instrumental_instv6n_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV7 by Gabox\": {\n            \"mel_band_roformer_instrumental_instv7_gabox.ckpt\": 
\"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV7N by Gabox\": {\n            \"mel_band_roformer_instrumental_instv7n_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV8 by Gabox\": {\n            \"mel_band_roformer_instrumental_instv8_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | INSTV8N by Gabox\": {\n            \"mel_band_roformer_instrumental_instv8n_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental FV7z by Gabox\": {\n            \"mel_band_roformer_instrumental_fv7z_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental FV8 by Gabox\": {\n            \"mel_band_roformer_instrumental_fv8_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Instrumental FVX by Gabox\": {\n            \"mel_band_roformer_instrumental_fvx_gabox.ckpt\": \"config_mel_band_roformer_instrumental_gabox.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb by anvuew\": {\n            \"dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt\": \"dereverb_mel_band_roformer_anvuew.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb Less Aggressive by anvuew\": {\n            \"dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt\": \"dereverb_mel_band_roformer_anvuew.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb Mono by anvuew\": {\n            \"dereverb_mel_band_roformer_mono_anvuew.ckpt\": \"dereverb_mel_band_roformer_anvuew.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb Big by Sucial\": {\n            
\"dereverb_big_mbr_ep_362.ckpt\": \"config_dereverb_echo_mel_band_roformer_v2.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb Super Big by Sucial\": {\n            \"dereverb_super_big_mbr_ep_346.ckpt\": \"config_dereverb_echo_mel_band_roformer_v2.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb-Echo by Sucial\": {\n            \"dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt\": \"config_dereverb-echo_mel_band_roformer.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb-Echo V2 by Sucial\": {\n            \"dereverb-echo_mel_band_roformer_sdr_13.4843_v2.ckpt\": \"config_dereverb-echo_mel_band_roformer_sdr_13.4843_v2.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | De-Reverb-Echo Fused by Sucial\": {\n            \"dereverb_echo_mbr_fused.ckpt\": \"config_dereverb_echo_mel_band_roformer_v2.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | SYHFT by SYH99999\": {\n            \"MelBandRoformerSYHFT.ckpt\": \"config_vocals_mel_band_roformer_ft.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | SYHFT V2 by SYH99999\": {\n            \"MelBandRoformerSYHFTV2.ckpt\": \"config_vocals_mel_band_roformer_ft.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | SYHFT V2.5 by SYH99999\": {\n            \"MelBandRoformerSYHFTV2.5.ckpt\": \"config_vocals_mel_band_roformer_ft.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | SYHFT V3 by SYH99999\": {\n            \"MelBandRoformerSYHFTV3Epsilon.ckpt\": \"config_vocals_mel_band_roformer_ft.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | Big SYHFT V1 by SYH99999\": {\n            \"MelBandRoformerBigSYHFTV1.ckpt\": \"config_vocals_mel_band_roformer_big_v1_ft.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer Kim | Big Beta 4 FT by unwa\": {\n            \"melband_roformer_big_beta4.ckpt\": \"config_melbandroformer_big_beta4.yaml\"\n       
 },\n        \"Roformer Model: MelBand Roformer Kim | Big Beta 5e FT by unwa\": {\n            \"melband_roformer_big_beta5e.ckpt\": \"config_melband_roformer_big_beta5e.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Big Beta 6 by unwa\": {\n            \"melband_roformer_big_beta6.ckpt\": \"config_melbandroformer_big_beta6.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Big Beta 6X by unwa\": {\n            \"melband_roformer_big_beta6x.ckpt\": \"config_melbandroformer_big_beta6x.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Chorus Male-Female by Sucial\": {\n            \"model_chorus_bs_roformer_ep_267_sdr_24.1275.ckpt\": \"config_chorus_male_female_bs_roformer.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Male-Female by aufr33\": {\n            \"bs_roformer_male_female_by_aufr33_sdr_7.2889.ckpt\": \"config_chorus_male_female_bs_roformer.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Aspiration by Sucial\": {\n            \"aspiration_mel_band_roformer_sdr_18.9845.ckpt\": \"config_aspiration_mel_band_roformer.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Aspiration Less Aggressive by Sucial\": {\n            \"aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt\": \"config_aspiration_mel_band_roformer.yaml\"\n        },\n        \"Roformer Model: MelBand Roformer | Bleed Suppressor V1 by unwa-97chris\": {\n            \"mel_band_roformer_bleed_suppressor_v1.ckpt\": \"config_mel_band_roformer_bleed_suppressor_v1.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Vocals Resurrection by unwa\": {\n            \"bs_roformer_vocals_resurrection_unwa.ckpt\": \"config_bs_roformer_vocals_resurrection_unwa.yaml\"\n        },\n        \"Roformer Model: BS Roformer | Instrumental Resurrection by unwa\": {\n            \"bs_roformer_instrumental_resurrection_unwa.ckpt\": \"config_bs_roformer_instrumental_resurrection_unwa.yaml\"\n        },\n        \"Roformer 
Model: BS Roformer SW by jarredou\": {\n            \"BS-Roformer-SW.ckpt\": \"BS-Roformer-SW.yaml\"\n        }\n    }\n}"
  },
  {
    "path": "audio_separator/remote/README.md",
    "content": "## Remote API Usage 🌐\n\nAudio Separator includes a remote API client that allows you to connect to a deployed Audio Separator API service, enabling you to perform audio separation without running the models locally. The API uses asynchronous processing with job polling for efficient handling of separation tasks.\n\nTo make this easy to set up and use cheaply, we've created a deployment script for [modal.com](https://modal.com), which currently (July 2025) offers $30/month in free GPU credits.\nGiven [their pricing](https://modal.com/pricing) of $0.000164/sec for an Nvidia T4, and since most audio separation jobs take less than 2 minutes, each job costs under $0.02 (120 sec × $0.000164/sec ≈ $0.0197).\nWith $30/month in free credits, that's over **1,500 GPU-accelerated audio separation jobs per month, for free!**\n\n**✨ Key Features:**\n\n- **Multiple Model Support**: Upload once, separate with multiple models in a single job\n- **Full Parameter Compatibility**: All local CLI parameters and architecture settings supported\n- **Efficient Processing**: Avoid repeated uploads when comparing different models\n- **Real-time Progress**: Track progress across multiple models with detailed status updates\n\n#### Multiple Model Workflow\n\n```mermaid\ngraph TD\n    A[\"🎵 Upload Audio File<br/>audio.wav\"] --> B[\"☁️ Remote API Server\"]\n    B --> C[\"📋 Queue Multiple Models<br/>• model_bs_roformer.ckpt<br/>• UVR-MDX-NET.onnx<br/>• htdemucs_6s.yaml\"]\n    C --> D[\"🚀 Process Model 1<br/>BS-Roformer\"]\n    C --> E[\"🚀 Process Model 2<br/>MDX-NET\"]\n    C --> F[\"🚀 Process Model 3<br/>Demucs 6-stem\"]\n    D --> G[\"📁 Generate Output<br/>audio_(Vocals)_bs_roformer.flac<br/>audio_(Instrumental)_bs_roformer.flac\"]\n    E --> H[\"📁 Generate Output<br/>audio_(Vocals)_mdx.flac<br/>audio_(Instrumental)_mdx.flac\"]\n    F --> I[\"📁 Generate Output<br/>audio_(Vocals)_htdemucs.flac<br/>audio_(Drums)_htdemucs.flac<br/>audio_(Bass)_htdemucs.flac<br/>etc...\"]\n    G --> J[\"📥 
Download All Results<br/>Compare quality across models\"]\n    H --> J\n    I --> J\n\n    style A fill:#e1f5fe,color:#000\n    style B fill:#f3e5f5,color:#000\n    style C fill:#fff3e0,color:#000\n    style D fill:#e8f5e8,color:#000\n    style E fill:#e8f5e8,color:#000\n    style F fill:#e8f5e8,color:#000\n    style J fill:#e1f5fe,color:#000\n```\n\n### Deploying the API Server\n\nTo use the remote API functionality, you'll need to deploy the Audio Separator API server. The easiest way is to use Modal.com:\n\n1. **Sign up for Modal.com** at [modal.com](https://modal.com)\n2. **Install the Modal CLI** and authenticate:\n   (Note: Modal needs access to the installed project dependencies, so run these commands in the same virtual environment you use for audio-separator.)\n   ```bash\n   poetry install --extras cpu # either the cpu or gpu extra is required so that onnxruntime is installed\n   pip install modal\n   modal setup\n   ```\n3. **Deploy the Audio Separator API**:\n   ```bash\n   modal deploy audio_separator/remote/deploy_modal.py\n   ```\n4. **Get your API URL** from the deployment output. It will look like:\n   ```\n   https://USERNAME--audio-separator-api.modal.run\n   ```\n\nSet this API URL as an environment variable:\n\n```bash\nexport AUDIO_SEPARATOR_API_URL=\"https://USERNAME--audio-separator-api.modal.run\"\n```\n\nOr pass it directly with the `--api_url` parameter.\n\n### Remote API Client (Python)\n\nYou can use the `AudioSeparatorAPIClient` class to interact with a remote Audio Separator API:\n\n```python\nimport logging\nfrom audio_separator.remote import AudioSeparatorAPIClient\n\n# Set up logging so the client's progress messages are visible\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\n# Initialize the API client\napi_client = AudioSeparatorAPIClient(\"https://USERNAME--audio-separator-api.modal.run\", logger)\n\n# Simple example: separate audio and get results\nresult = api_client.separate_audio_and_wait(\"audio.mp3\")\nif result[\"status\"] == \"completed\":\n    print(f\"✅ Separation completed! 
Downloaded files:\")\n    for file_path in result[\"downloaded_files\"]:\n        print(f\"  - {file_path}\")\nelse:\n    print(f\"❌ Separation failed: {result.get('error', 'Unknown error')}\")\n\n# Multiple models example: separate with multiple models in one upload\nresult = api_client.separate_audio_and_wait(\n    \"path/to/audio.wav\",\n    models=[\n        \"model_bs_roformer_ep_317_sdr_12.9755.ckpt\",\n        \"UVR-MDX-NET-Inst_HQ_4.onnx\",\n        \"htdemucs_6s.yaml\"\n    ],\n    timeout=600,           # Wait up to 10 minutes for multiple models\n    poll_interval=10,      # Check status every 10 seconds\n    download=True,         # Automatically download files\n    output_dir=\"./output\"  # Save files to specific directory\n)\n\n# Complex example with custom options and separator parameters\nresult = api_client.separate_audio_and_wait(\n    \"path/to/audio.wav\",\n    model=\"model_bs_roformer_ep_317_sdr_12.9755.ckpt\",\n    timeout=300,           # Wait up to 5 minutes\n    poll_interval=10,      # Check status every 10 seconds\n    download=True,         # Automatically download files\n    output_dir=\"./output\", # Save files to specific directory\n    # Separator configuration options (same as local CLI)\n    output_format=\"wav\",\n    normalization_threshold=0.8,\n    custom_output_names={\"Vocals\": \"lead_vocals\", \"Instrumental\": \"backing_track\"},\n    # MDX parameters\n    mdx_segment_size=512,\n    mdx_batch_size=2,\n    # VR parameters\n    vr_aggression=10,\n    vr_window_size=320,\n    # And any other separator parameters...\n)\n\n# Advanced approach: manual job management (for custom polling logic)\nresult = api_client.separate_audio(\n    \"path/to/audio.wav\",\n    models=[\"model1.ckpt\", \"model2.onnx\"],\n    custom_output_names={\"Vocals\": \"vocals_output\", \"Instrumental\": \"instrumental_output\"}\n)\ntask_id = result[\"task_id\"]\nprint(f\"Job submitted! 
Task ID: {task_id}\")\n\n# Custom polling logic\nimport time\nwhile True:\n    status = api_client.get_job_status(task_id)\n    print(f\"Job status: {status['status']}\")\n\n    # Show progress with model information\n    if \"progress\" in status:\n        progress_info = f\"Progress: {status['progress']}%\"\n        if \"current_model_index\" in status and \"total_models\" in status:\n            model_info = f\" (Model {status['current_model_index'] + 1}/{status['total_models']})\"\n            progress_info += model_info\n        print(progress_info)\n\n    if status[\"status\"] == \"completed\":\n        # Download files manually\n        for filehash, filename in status[\"files\"].items():\n            output_path = api_client.download_file_by_hash(task_id, filehash, filename)\n            print(f\"Downloaded: {output_path}\")\n        break\n    elif status[\"status\"] == \"error\":\n        print(f\"Job failed: {status.get('error', 'Unknown error')}\")\n        break\n    else:\n        time.sleep(10)  # Wait 10 seconds\n\n# List available models\nmodels = api_client.list_models()\nprint(models[\"text\"])\n\n# Get server version\nversion = api_client.get_server_version()\nprint(f\"Server version: {version}\")\n```\n\n### Remote API CLI\n\nAudio Separator also provides a command-line interface for interacting with remote APIs:\n\n#### Commands\n\n**Separate audio files:**\n\n```bash\n# Separate audio file (asynchronous processing)\naudio-separator-remote separate audio.wav --model model_bs_roformer_ep_317_sdr_12.9755.ckpt\n\n# Multiple models - upload once, separate with multiple models\naudio-separator-remote separate audio.wav --models model_bs_roformer_ep_317_sdr_12.9755.ckpt UVR-MDX-NET-Inst_HQ_4.onnx htdemucs_6s.yaml\n\n# Multiple files\naudio-separator-remote separate audio1.wav audio2.wav audio3.wav\n\n# Use default model (if not specified)\naudio-separator-remote separate audio.wav\n\n# Advanced separation with custom parameters (all local CLI 
parameters supported)\naudio-separator-remote separate audio.wav \\\n  --model model_bs_roformer_ep_317_sdr_12.9755.ckpt \\\n  --output_format wav \\\n  --normalization 0.8 \\\n  --custom_output_names '{\"Vocals\": \"lead_vocals\", \"Instrumental\": \"backing_track\"}' \\\n  --mdx_segment_size 512 \\\n  --vr_aggression 10\n```\n\n**Check job status:**\n\n```bash\naudio-separator-remote status <task_id>\n```\n\n**List available models:**\n\n```bash\n# Pretty formatted list\naudio-separator-remote models\n\n# JSON output\naudio-separator-remote models --format json\n\n# Filter by stem type\naudio-separator-remote models --filter vocals\n```\n\n**Download specific files:**\n\n```bash\naudio-separator-remote download <task_id> filename1.wav filename2.wav\n```\n\n**Get version information:**\n\n```bash\naudio-separator-remote --version\n```\n\n#### CLI Options\n\n**Global Options:**\n\n- `--api_url`: Override the API URL\n- `--timeout`: Set timeout for polling (default: 600 seconds)\n- `--poll_interval`: Set polling interval (default: 10 seconds)\n- `--debug`: Enable debug logging\n- `--log_level`: Set log level (info, debug, warning, etc.)\n\n**Model Selection:**\n\n- `--model`: Single model to use for separation\n- `--models`: Multiple models to use for separation (space-separated)\n\n**Output Parameters:**\n\n- `--output_format`: Output format (default: flac)\n- `--output_bitrate`: Output bitrate\n- `--normalization`: Max peak amplitude to normalize to (default: 0.9)\n- `--amplification`: Min peak amplitude to amplify to (default: 0.0)\n- `--single_stem`: Output only single stem (e.g. 
Vocals, Instrumental)\n- `--invert_spect`: Invert secondary stem using spectrogram\n- `--sample_rate`: Sample rate of output audio (default: 44100)\n- `--use_soundfile`: Use soundfile for output writing\n- `--use_autocast`: Use PyTorch autocast for faster inference\n- `--custom_output_names`: Custom output names in JSON format\n\n**Architecture-Specific Parameters:**\nAll MDX, VR, Demucs, and MDXC parameters from the local CLI are supported:\n\n- `--mdx_segment_size`, `--mdx_overlap`, `--mdx_batch_size`, etc.\n- `--vr_batch_size`, `--vr_window_size`, `--vr_aggression`, etc.\n- `--demucs_segment_size`, `--demucs_shifts`, `--demucs_overlap`, etc.\n- `--mdxc_segment_size`, `--mdxc_overlap`, `--mdxc_batch_size`, etc.\n\n#### Examples\n\n```bash\n# Separate with multiple models and custom settings\naudio-separator-remote separate song.mp3 \\\n  --models model_bs_roformer_ep_317_sdr_12.9755.ckpt UVR-MDX-NET-Inst_HQ_4.onnx \\\n  --output_format wav \\\n  --normalization 0.8 \\\n  --api_url https://my-api.com \\\n  --timeout 600\n\n# Separate with custom output names\naudio-separator-remote separate song.mp3 \\\n  --model htdemucs_6s.yaml \\\n  --custom_output_names '{\"Vocals\": \"vocals\", \"Drums\": \"drums\", \"Bass\": \"bass\", \"Other\": \"other\"}'\n\n# Check status with debug logging\naudio-separator-remote status abc123 --debug\n\n# List vocal separation models in JSON format\naudio-separator-remote models --filter vocals --format json\n\n# Use VR parameters for better vocal isolation\naudio-separator-remote separate vocals.wav \\\n  --model 2_HP-UVR.pth \\\n  --vr_aggression 15 \\\n  --vr_window_size 320 \\\n  --vr_enable_tta\n```\n\n#### Key Features\n\nThe remote API client automatically handles:\n\n- **File uploading and downloading**: Seamless transfer of audio files and results\n- **Multiple model processing**: Upload once, separate with multiple models efficiently\n- **Full separator compatibility**: All local CLI parameters and architectures supported\n- 
**Job polling and status updates**: Real-time progress tracking with model-specific information\n- **Error handling and retries**: Robust error handling for reliable processing\n- **Progress reporting**: Detailed progress updates including current model being processed\n\n#### Benefits of Multiple Model Support\n\nWhen using multiple models, the remote API provides significant advantages:\n\n- **Efficiency**: Upload your audio file once, process with multiple models without re-uploading\n- **Comparison**: Easily compare results from different models (e.g., vocals vs. instrumental quality)\n- **Workflow optimization**: Process with complementary models in a single job\n- **Time savings**: Avoid repeated upload times for large audio files\n\nExample use cases:\n\n- Compare quality between `model_bs_roformer_ep_317_sdr_12.9755.ckpt` (high-quality vocals) and `UVR-MDX-NET-Inst_HQ_4.onnx` (high-quality instrumental)\n- Process with both 2-stem models (vocals/instrumental) and multi-stem models (vocals/drums/bass/other) in one job\n- Use different models optimized for different parts of the frequency spectrum\n"
  },
  {
    "path": "audio_separator/remote/__init__.py",
    "content": "from .api_client import AudioSeparatorAPIClient\n\n__all__ = [\"AudioSeparatorAPIClient\"]\n"
  },
  {
    "path": "audio_separator/remote/api_client.py",
    "content": "#!/usr/bin/env python\nimport os\nimport logging\nimport json\nfrom typing import Optional, List, Dict\nfrom urllib.parse import quote\n\nimport requests\n\n# Get package version for debugging\ntry:\n    from importlib.metadata import version\n    AUDIO_SEPARATOR_VERSION = version(\"audio-separator\")\nexcept Exception:  # ImportError on Python < 3.8, or PackageNotFoundError if the package isn't installed\n    try:\n        import pkg_resources\n        AUDIO_SEPARATOR_VERSION = pkg_resources.get_distribution(\"audio-separator\").version\n    except Exception:\n        AUDIO_SEPARATOR_VERSION = \"unknown\"\n\n\nclass AudioSeparatorAPIClient:\n    \"\"\"Client for interacting with a remotely deployed Audio Separator API.\"\"\"\n\n    def __init__(self, api_url: str, logger: logging.Logger):\n        self.api_url = api_url\n        self.logger = logger\n        self.session = requests.Session()\n\n    def separate_audio(\n        self,\n        file_path: Optional[str] = None,\n        model: Optional[str] = None,\n        models: Optional[List[str]] = None,\n        preset: Optional[str] = None,\n        gcs_uri: Optional[str] = None,\n        # Output parameters\n        output_format: str = \"flac\",\n        output_bitrate: Optional[str] = None,\n        normalization_threshold: float = 0.9,\n        amplification_threshold: float = 0.0,\n        output_single_stem: Optional[str] = None,\n        invert_using_spec: bool = False,\n        sample_rate: int = 44100,\n        use_soundfile: bool = False,\n        use_autocast: bool = False,\n        custom_output_names: Optional[Dict[str, str]] = None,\n        # MDX parameters\n        mdx_segment_size: int = 256,\n        mdx_overlap: float = 0.25,\n        mdx_batch_size: int = 1,\n        mdx_hop_length: int = 1024,\n        mdx_enable_denoise: bool = False,\n        # VR parameters\n        vr_batch_size: int = 1,\n        vr_window_size: int = 512,\n        vr_aggression: int = 5,\n        vr_enable_tta: bool = False,\n        vr_high_end_process: bool = False,\n        
vr_enable_post_process: bool = False,\n        vr_post_process_threshold: float = 0.2,\n        # Demucs parameters\n        demucs_segment_size: str = \"Default\",\n        demucs_shifts: int = 2,\n        demucs_overlap: float = 0.25,\n        demucs_segments_enabled: bool = True,\n        # MDXC parameters\n        mdxc_segment_size: int = 256,\n        mdxc_override_model_segment_size: bool = False,\n        mdxc_overlap: int = 8,\n        mdxc_batch_size: int = 1,\n        mdxc_pitch_shift: int = 0,\n    ) -> dict:\n        \"\"\"Submit audio separation job (asynchronous processing).\n\n        Provide either file_path (uploads file) or gcs_uri (server fetches from GCS).\n        \"\"\"\n        if not file_path and not gcs_uri:\n            raise ValueError(\"Must provide either file_path or gcs_uri\")\n        if file_path and gcs_uri:\n            raise ValueError(\"Provide either file_path or gcs_uri, not both\")\n\n        files = {}\n        file_handle = None\n        if file_path:\n            if not os.path.exists(file_path):\n                raise FileNotFoundError(f\"Audio file not found: {file_path}\")\n            file_handle = open(file_path, \"rb\")\n            files = {\"file\": (os.path.basename(file_path), file_handle)}\n\n        data = {}\n\n        if gcs_uri:\n            data[\"gcs_uri\"] = gcs_uri\n\n        # Handle model/preset parameters\n        if preset:\n            data[\"preset\"] = preset\n        elif models:\n            data[\"models\"] = json.dumps(models)\n        elif model:\n            data[\"model\"] = model\n\n        # Add all separator parameters\n        data.update(\n            {\n                \"output_format\": output_format,\n                \"normalization_threshold\": normalization_threshold,\n                \"amplification_threshold\": amplification_threshold,\n                \"invert_using_spec\": invert_using_spec,\n                \"sample_rate\": sample_rate,\n                \"use_soundfile\": 
use_soundfile,\n                \"use_autocast\": use_autocast,\n                # MDX parameters\n                \"mdx_segment_size\": mdx_segment_size,\n                \"mdx_overlap\": mdx_overlap,\n                \"mdx_batch_size\": mdx_batch_size,\n                \"mdx_hop_length\": mdx_hop_length,\n                \"mdx_enable_denoise\": mdx_enable_denoise,\n                # VR parameters\n                \"vr_batch_size\": vr_batch_size,\n                \"vr_window_size\": vr_window_size,\n                \"vr_aggression\": vr_aggression,\n                \"vr_enable_tta\": vr_enable_tta,\n                \"vr_high_end_process\": vr_high_end_process,\n                \"vr_enable_post_process\": vr_enable_post_process,\n                \"vr_post_process_threshold\": vr_post_process_threshold,\n                # Demucs parameters\n                \"demucs_segment_size\": demucs_segment_size,\n                \"demucs_shifts\": demucs_shifts,\n                \"demucs_overlap\": demucs_overlap,\n                \"demucs_segments_enabled\": demucs_segments_enabled,\n                # MDXC parameters\n                \"mdxc_segment_size\": mdxc_segment_size,\n                \"mdxc_override_model_segment_size\": mdxc_override_model_segment_size,\n                \"mdxc_overlap\": mdxc_overlap,\n                \"mdxc_batch_size\": mdxc_batch_size,\n                \"mdxc_pitch_shift\": mdxc_pitch_shift,\n            }\n        )\n\n        # Add optional parameters only if they have non-default values\n        if output_bitrate:\n            data[\"output_bitrate\"] = output_bitrate\n        if output_single_stem:\n            data[\"output_single_stem\"] = output_single_stem\n        if custom_output_names:\n            data[\"custom_output_names\"] = json.dumps(custom_output_names)\n\n        try:\n            # Server processes synchronously; 1800s matches Cloud Run request timeout.\n            # When using gcs_uri (no file upload), we still need 
multipart/form-data\n            # encoding because FastAPI requires it for endpoints with File()/Form() params.\n            # Passing a dummy empty file field forces requests to use multipart encoding.\n            if not files:\n                files = {\"file\": (\"\", b\"\", \"application/octet-stream\")}\n            response = self.session.post(\n                f\"{self.api_url}/separate\",\n                files=files,\n                data=data,\n                timeout=1800,\n            )\n            response.raise_for_status()\n            return response.json()\n        except requests.RequestException as e:\n            self.logger.error(f\"Separation request failed: {e}\")\n            raise\n        finally:\n            if file_handle:\n                file_handle.close()\n\n    def separate_audio_and_wait(\n        self,\n        file_path: Optional[str] = None,\n        model: Optional[str] = None,\n        models: Optional[List[str]] = None,\n        preset: Optional[str] = None,\n        gcs_uri: Optional[str] = None,\n        timeout: int = 600,\n        poll_interval: int = 10,\n        download: bool = True,\n        output_dir: Optional[str] = None,\n        # All separator parameters (same as separate_audio)\n        output_format: str = \"flac\",\n        output_bitrate: Optional[str] = None,\n        normalization_threshold: float = 0.9,\n        amplification_threshold: float = 0.0,\n        output_single_stem: Optional[str] = None,\n        invert_using_spec: bool = False,\n        sample_rate: int = 44100,\n        use_soundfile: bool = False,\n        use_autocast: bool = False,\n        custom_output_names: Optional[Dict[str, str]] = None,\n        mdx_segment_size: int = 256,\n        mdx_overlap: float = 0.25,\n        mdx_batch_size: int = 1,\n        mdx_hop_length: int = 1024,\n        mdx_enable_denoise: bool = False,\n        vr_batch_size: int = 1,\n        vr_window_size: int = 512,\n        vr_aggression: int = 5,\n      
  vr_enable_tta: bool = False,\n        vr_high_end_process: bool = False,\n        vr_enable_post_process: bool = False,\n        vr_post_process_threshold: float = 0.2,\n        demucs_segment_size: str = \"Default\",\n        demucs_shifts: int = 2,\n        demucs_overlap: float = 0.25,\n        demucs_segments_enabled: bool = True,\n        mdxc_segment_size: int = 256,\n        mdxc_override_model_segment_size: bool = False,\n        mdxc_overlap: int = 8,\n        mdxc_batch_size: int = 1,\n        mdxc_pitch_shift: int = 0,\n    ) -> dict:\n        \"\"\"\n        Submit audio separation job and wait for completion (convenience method).\n\n        This method handles the full workflow: submit job, poll for completion,\n        and optionally download the result files.\n\n        Args:\n            file_path: Path to the audio file to separate (or None if using gcs_uri)\n            model: Single model to use for separation (for backwards compatibility)\n            models: List of models to use for separation (takes precedence over model)\n            preset: Name of a server-side preset (takes precedence over models and model)\n            gcs_uri: GCS URI (gs://bucket/path) - server fetches directly from GCS\n            timeout: Maximum time to wait for completion in seconds (default: 600)\n            poll_interval: How often to check status in seconds (default: 10)\n            download: Whether to automatically download result files (default: True)\n            output_dir: Directory to save downloaded files (default: current directory)\n            Remaining parameters: All other separator parameters (same as separate_audio)\n\n        Returns:\n            dict with keys:\n                - task_id: The job task ID\n                - status: \"completed\" or \"error\"\n                - files: Output filenames (a list from older servers, or a dict of file hash -> filename)\n                - downloaded_files: List of local file paths (if download=True)\n                - error: Error message (if status=\"error\")\n        \"\"\"\n        import time\n\n        # Submit the separation job with all parameters\n        if preset:\n 
           models_desc = f\"preset:{preset}\"\n        else:\n            models_desc = models or ([model] if model else [\"default\"])\n        source_desc = gcs_uri if gcs_uri else file_path\n        self.logger.info(f\"Submitting separation job for '{source_desc}' with {models_desc} (audio-separator v{AUDIO_SEPARATOR_VERSION})\")\n\n        result = self.separate_audio(\n            file_path,\n            model,\n            models,\n            preset,\n            gcs_uri,\n            output_format,\n            output_bitrate,\n            normalization_threshold,\n            amplification_threshold,\n            output_single_stem,\n            invert_using_spec,\n            sample_rate,\n            use_soundfile,\n            use_autocast,\n            custom_output_names,\n            mdx_segment_size,\n            mdx_overlap,\n            mdx_batch_size,\n            mdx_hop_length,\n            mdx_enable_denoise,\n            vr_batch_size,\n            vr_window_size,\n            vr_aggression,\n            vr_enable_tta,\n            vr_high_end_process,\n            vr_enable_post_process,\n            vr_post_process_threshold,\n            demucs_segment_size,\n            demucs_shifts,\n            demucs_overlap,\n            demucs_segments_enabled,\n            mdxc_segment_size,\n            mdxc_override_model_segment_size,\n            mdxc_overlap,\n            mdxc_batch_size,\n            mdxc_pitch_shift,\n        )\n\n        task_id = result[\"task_id\"]\n        self.logger.info(f\"Job submitted! 
Task ID: {task_id}\")\n\n        # Poll for completion\n        self.logger.info(\"Waiting for separation to complete...\")\n        start_time = time.time()\n        last_progress = -1\n\n        while time.time() - start_time < timeout:\n            try:\n                status = self.get_job_status(task_id)\n                current_status = status.get(\"status\", \"unknown\")\n\n                # Show progress if it changed\n                if \"progress\" in status and status[\"progress\"] != last_progress:\n                    progress_info = f\"Progress: {status['progress']}%\"\n                    if \"current_model_index\" in status and \"total_models\" in status:\n                        model_info = f\" (Model {status['current_model_index'] + 1}/{status['total_models']})\"\n                        progress_info += model_info\n                    self.logger.info(progress_info)\n                    last_progress = status[\"progress\"]\n\n                # Check if completed\n                if current_status == \"completed\":\n                    self.logger.info(\"✅ Separation completed!\")\n                    \n                    files_data = status.get(\"files\", {})\n                    \n                    # Handle both old (list) and new (dict) format for backward compatibility\n                    if isinstance(files_data, list):\n                        # Legacy format: list of filenames\n                        self.logger.info(f\"🔍 Job status returned {len(files_data)} files (legacy format)\")\n                        for i, filename in enumerate(files_data):\n                            self.logger.info(f\"  [{i}] '{filename}' (len={len(filename)})\")\n                        result = {\"task_id\": task_id, \"status\": \"completed\", \"files\": files_data}\n                    else:\n                        # New format: dictionary of hash -> filename\n                        self.logger.info(f\"🔍 Job status returned {len(files_data)} files 
(hash format)\")\n                        for i, (file_hash, filename) in enumerate(files_data.items()):\n                            self.logger.info(f\"  [{i}] hash={file_hash} -> '{filename}' (len={len(filename)})\")\n                        result = {\"task_id\": task_id, \"status\": \"completed\", \"files\": files_data}\n\n                    # Download files if requested\n                    if download:\n                        downloaded_files = []\n                        files_data = status.get(\"files\", {})\n                        \n                        # Handle both old (list) and new (dict) format\n                        if isinstance(files_data, list):\n                            # Legacy format: list of filenames\n                            self.logger.info(f\"📥 Downloading {len(files_data)} output files (legacy format)...\")\n                            self.logger.info(f\"🔍 Files to download: {files_data}\")\n\n                            for i, filename in enumerate(files_data):\n                                try:\n                                    self.logger.info(f\"🔍 [{i+1}/{len(files_data)}] Attempting to download: '{filename}' (len={len(filename)})\")\n                                    \n                                    if output_dir:\n                                        output_path = f\"{output_dir.rstrip('/')}/{filename}\"\n                                    else:\n                                        output_path = filename\n\n                                    downloaded_path = self.download_file(task_id, filename, output_path)\n                                    downloaded_files.append(downloaded_path)\n                                    self.logger.info(f\"  ✅ Downloaded: {downloaded_path}\")\n                                except Exception as e:\n                                    self.logger.error(f\"  ❌ Failed to download {filename}: {e}\")\n                                    
self._log_server_version_on_error()\n                        else:\n                            # New format: dictionary of hash -> filename\n                            self.logger.info(f\"📥 Downloading {len(files_data)} output files (hash format)...\")\n                            filenames_list = list(files_data.values())\n                            self.logger.info(f\"🔍 Files to download: {filenames_list}\")\n\n                            for i, (file_hash, filename) in enumerate(files_data.items()):\n                                try:\n                                    self.logger.info(f\"🔍 [{i+1}/{len(files_data)}] Attempting to download: '{filename}' (hash={file_hash}, len={len(filename)})\")\n                                    \n                                    if output_dir:\n                                        output_path = f\"{output_dir.rstrip('/')}/{filename}\"\n                                    else:\n                                        output_path = filename\n\n                                    downloaded_path = self.download_file_by_hash(task_id, file_hash, filename, output_path)\n                                    downloaded_files.append(downloaded_path)\n                                    self.logger.info(f\"  ✅ Downloaded: {downloaded_path}\")\n                                except Exception as e:\n                                    self.logger.error(f\"  ❌ Failed to download {filename}: {e}\")\n                                    self._log_server_version_on_error()\n\n                        result[\"downloaded_files\"] = downloaded_files\n                        self.logger.info(f\"🎉 Successfully downloaded {len(downloaded_files)} files!\")\n\n                    return result\n\n                elif current_status == \"error\":\n                    error_msg = status.get(\"error\", \"Unknown error\")\n                    self.logger.error(f\"❌ Job failed: {error_msg}\")\n                    return {\"task_id\": 
task_id, \"status\": \"error\", \"error\": error_msg, \"files\": []}\n\n                # Wait before next poll\n                time.sleep(poll_interval)\n\n            except Exception as e:\n                self.logger.warning(f\"Error polling status: {e}\")\n                time.sleep(poll_interval)\n\n        # Timeout reached\n        self.logger.error(f\"❌ Job polling timed out after {timeout} seconds\")\n        return {\"task_id\": task_id, \"status\": \"timeout\", \"error\": f\"Job polling timed out after {timeout} seconds\", \"files\": []}\n\n    def get_job_status(self, task_id: str) -> dict:\n        \"\"\"Get job status.\"\"\"\n        try:\n            response = self.session.get(f\"{self.api_url}/status/{task_id}\", timeout=10)\n            response.raise_for_status()\n            return response.json()\n        except requests.RequestException as e:\n            self.logger.error(f\"Status request failed: {e}\")\n            raise\n\n    def download_file(self, task_id: str, filename: str, output_path: Optional[str] = None) -> str:\n        \"\"\"Download a file from a completed job (legacy method for backward compatibility).\"\"\"\n        if output_path is None:\n            output_path = filename\n\n        try:\n            # URL encode the filename to handle spaces and special characters\n            encoded_filename = quote(filename, safe='')\n            download_url = f\"{self.api_url}/download/{task_id}/{encoded_filename}\"\n            \n            # Debug logging to understand what's happening\n            self.logger.info(f\"🔍 Download details (legacy filename method):\")\n            self.logger.info(f\"  Original filename: '{filename}'\")\n            self.logger.info(f\"  Encoded filename: '{encoded_filename}'\")\n            self.logger.info(f\"  Download URL: {download_url}\")\n            self.logger.info(f\"  Task ID: {task_id}\")\n            \n            response = self.session.get(download_url, timeout=60)\n            \n    
        # Log response details for debugging\n            self.logger.info(f\"🔍 Response status: {response.status_code}\")\n            if response.status_code != 200:\n                try:\n                    self.logger.error(f\"🔍 Response headers: {dict(response.headers)}\")\n                except Exception:\n                    self.logger.error(f\"🔍 Response headers: {response.headers}\")\n                try:\n                    self.logger.error(f\"🔍 Response text (first 500 chars): {response.text[:500]}\")\n                except Exception:\n                    self.logger.error(f\"🔍 Response text: <unavailable>\")\n            \n            response.raise_for_status()\n\n            with open(output_path, \"wb\") as f:\n                f.write(response.content)\n\n            return output_path\n        except requests.RequestException as e:\n            self.logger.error(f\"Download failed: {e}\")\n            raise\n\n    def download_file_by_hash(self, task_id: str, file_hash: str, filename: str, output_path: Optional[str] = None) -> str:\n        \"\"\"Download a file from a completed job using its hash identifier.\"\"\"\n        if output_path is None:\n            output_path = filename\n\n        try:\n            # Use the file hash in the URL instead of the filename\n            download_url = f\"{self.api_url}/download/{task_id}/{file_hash}\"\n            \n            # Debug logging to understand what's happening\n            self.logger.info(f\"🔍 Download details (hash method):\")\n            self.logger.info(f\"  Original filename: '{filename}'\")\n            self.logger.info(f\"  File hash: '{file_hash}'\")\n            self.logger.info(f\"  Download URL: {download_url}\")\n            self.logger.info(f\"  Task ID: {task_id}\")\n            \n            response = self.session.get(download_url, timeout=60)\n            \n            # Log response details for debugging\n            self.logger.info(f\"🔍 Response status: 
{response.status_code}\")\n            if response.status_code != 200:\n                try:\n                    self.logger.error(f\"🔍 Response headers: {dict(response.headers)}\")\n                except Exception:\n                    self.logger.error(f\"🔍 Response headers: {response.headers}\")\n                try:\n                    self.logger.error(f\"🔍 Response text (first 500 chars): {response.text[:500]}\")\n                except Exception:\n                    self.logger.error(f\"🔍 Response text: <unavailable>\")\n            \n            response.raise_for_status()\n\n            with open(output_path, \"wb\") as f:\n                f.write(response.content)\n\n            return output_path\n        except requests.RequestException as e:\n            self.logger.error(f\"Download failed: {e}\")\n            raise\n\n    def _log_server_version_on_error(self):\n        \"\"\"Helper method to log server version when download fails.\"\"\"\n        try:\n            server_version = self.get_server_version()\n            self.logger.error(f\"🔍 Server version when download failed: {server_version}\")\n        except Exception as version_error:\n            self.logger.error(f\"🔍 Could not get server version: {version_error}\")\n\n    def list_models(self, format_type: str = \"pretty\", filter_by: Optional[str] = None) -> dict:\n        \"\"\"List available models.\"\"\"\n        try:\n            if format_type == \"json\":\n                response = self.session.get(f\"{self.api_url}/models-json\", timeout=10)\n            else:\n                url = f\"{self.api_url}/models\"\n                if filter_by:\n                    url += f\"?filter_sort_by={filter_by}\"\n                response = self.session.get(url, timeout=10)\n\n            response.raise_for_status()\n\n            if format_type == \"json\":\n                return response.json()\n            else:\n                return {\"text\": response.text}\n        except 
requests.RequestException as e:\n            self.logger.error(f\"Models request failed: {e}\")\n            raise\n\n    def get_server_version(self) -> str:\n        \"\"\"Get the server version.\"\"\"\n        try:\n            response = self.session.get(f\"{self.api_url}/health\", timeout=10)\n            response.raise_for_status()\n            health_data = response.json()\n            return health_data.get(\"version\", \"unknown\")\n        except requests.RequestException as e:\n            self.logger.error(f\"Health check request failed: {e}\")\n            raise\n"
  },
  {
    "path": "audio_separator/remote/cli.py",
    "content": "#!/usr/bin/env python\nimport argparse\nimport json\nimport logging\nimport os\nimport sys\nimport time\nfrom importlib import metadata\n\nfrom audio_separator.remote import AudioSeparatorAPIClient\n\n\ndef main():\n    \"\"\"Main entry point for the remote CLI.\"\"\"\n    logger = logging.getLogger(__name__)\n    log_handler = logging.StreamHandler()\n    log_formatter = logging.Formatter(fmt=\"%(asctime)s.%(msecs)03d - %(levelname)s - %(module)s - %(message)s\", datefmt=\"%Y-%m-%d %H:%M:%S\")\n    log_handler.setFormatter(log_formatter)\n    logger.addHandler(log_handler)\n\n    parser = argparse.ArgumentParser(description=\"Separate audio files using a remote audio-separator API.\", formatter_class=lambda prog: argparse.RawTextHelpFormatter(prog, max_help_position=60))\n\n    # Get package version\n    package_version = metadata.distribution(\"audio-separator\").version\n\n    # Main command subparsers\n    subparsers = parser.add_subparsers(dest=\"command\", help=\"Available commands\")\n\n    # Separate command\n    separate_parser = subparsers.add_parser(\"separate\", help=\"Separate audio files\")\n    separate_parser.add_argument(\"audio_files\", nargs=\"+\", help=\"Audio file paths to separate\")\n\n    # Model selection (mutually exclusive: preset, single model, or multiple models)\n    model_group = separate_parser.add_mutually_exclusive_group()\n    model_group.add_argument(\"-p\", \"--preset\", help=\"Ensemble preset name (e.g. 
instrumental_clean, karaoke, vocal_balanced)\")\n    model_group.add_argument(\"-m\", \"--model\", help=\"Single model to use for separation\")\n    model_group.add_argument(\"--models\", nargs=\"+\", help=\"Multiple models to use for separation\")\n\n    separate_parser.add_argument(\"--timeout\", type=int, default=600, help=\"Timeout in seconds for polling (default: 600)\")\n    separate_parser.add_argument(\"--poll_interval\", type=int, default=10, help=\"Polling interval in seconds (default: 10)\")\n\n    # Output parameters\n    output_group = separate_parser.add_argument_group(\"Output Parameters\")\n    output_group.add_argument(\"--output_format\", default=\"flac\", help=\"Output format for separated files (default: %(default)s)\")\n    output_group.add_argument(\"--output_bitrate\", help=\"Output bitrate for separated files\")\n    output_group.add_argument(\"--normalization\", type=float, default=0.9, help=\"Max peak amplitude to normalize audio to (default: %(default)s)\")\n    output_group.add_argument(\"--amplification\", type=float, default=0.0, help=\"Min peak amplitude to amplify audio to (default: %(default)s)\")\n    output_group.add_argument(\"--single_stem\", help=\"Output only single stem (e.g. 
Vocals, Instrumental)\")\n    output_group.add_argument(\"--invert_spect\", action=\"store_true\", help=\"Invert secondary stem using spectrogram\")\n    output_group.add_argument(\"--sample_rate\", type=int, default=44100, help=\"Sample rate of output audio (default: %(default)s)\")\n    output_group.add_argument(\"--use_soundfile\", action=\"store_true\", help=\"Use soundfile for output writing\")\n    output_group.add_argument(\"--use_autocast\", action=\"store_true\", help=\"Use PyTorch autocast for faster inference\")\n    output_group.add_argument(\"--custom_output_names\", type=json.loads, help=\"Custom output names in JSON format\")\n\n    # MDX parameters\n    mdx_group = separate_parser.add_argument_group(\"MDX Architecture Parameters\")\n    mdx_group.add_argument(\"--mdx_segment_size\", type=int, default=256, help=\"MDX segment size (default: %(default)s)\")\n    mdx_group.add_argument(\"--mdx_overlap\", type=float, default=0.25, help=\"MDX overlap (default: %(default)s)\")\n    mdx_group.add_argument(\"--mdx_batch_size\", type=int, default=1, help=\"MDX batch size (default: %(default)s)\")\n    mdx_group.add_argument(\"--mdx_hop_length\", type=int, default=1024, help=\"MDX hop length (default: %(default)s)\")\n    mdx_group.add_argument(\"--mdx_enable_denoise\", action=\"store_true\", help=\"Enable MDX denoising\")\n\n    # VR parameters\n    vr_group = separate_parser.add_argument_group(\"VR Architecture Parameters\")\n    vr_group.add_argument(\"--vr_batch_size\", type=int, default=1, help=\"VR batch size (default: %(default)s)\")\n    vr_group.add_argument(\"--vr_window_size\", type=int, default=512, help=\"VR window size (default: %(default)s)\")\n    vr_group.add_argument(\"--vr_aggression\", type=int, default=5, help=\"VR aggression (default: %(default)s)\")\n    vr_group.add_argument(\"--vr_enable_tta\", action=\"store_true\", help=\"Enable VR Test-Time-Augmentation\")\n    vr_group.add_argument(\"--vr_high_end_process\", action=\"store_true\", 
help=\"Enable VR high end processing\")\n    vr_group.add_argument(\"--vr_enable_post_process\", action=\"store_true\", help=\"Enable VR post processing\")\n    vr_group.add_argument(\"--vr_post_process_threshold\", type=float, default=0.2, help=\"VR post process threshold (default: %(default)s)\")\n\n    # Demucs parameters\n    demucs_group = separate_parser.add_argument_group(\"Demucs Architecture Parameters\")\n    demucs_group.add_argument(\"--demucs_segment_size\", default=\"Default\", help=\"Demucs segment size (default: %(default)s)\")\n    demucs_group.add_argument(\"--demucs_shifts\", type=int, default=2, help=\"Demucs shifts (default: %(default)s)\")\n    demucs_group.add_argument(\"--demucs_overlap\", type=float, default=0.25, help=\"Demucs overlap (default: %(default)s)\")\n    # type=bool would treat any non-empty string (including \"False\") as True, so parse the value explicitly\n    demucs_group.add_argument(\"--demucs_segments_enabled\", type=lambda value: str(value).strip().lower() in (\"true\", \"1\", \"yes\"), default=True, help=\"Enable Demucs segments: true/false (default: %(default)s)\")\n\n    # MDXC parameters\n    mdxc_group = separate_parser.add_argument_group(\"MDXC Architecture Parameters\")\n    mdxc_group.add_argument(\"--mdxc_segment_size\", type=int, default=256, help=\"MDXC segment size (default: %(default)s)\")\n    mdxc_group.add_argument(\"--mdxc_override_model_segment_size\", action=\"store_true\", help=\"Override MDXC model segment size\")\n    mdxc_group.add_argument(\"--mdxc_overlap\", type=int, default=8, help=\"MDXC overlap (default: %(default)s)\")\n    mdxc_group.add_argument(\"--mdxc_batch_size\", type=int, default=1, help=\"MDXC batch size (default: %(default)s)\")\n    mdxc_group.add_argument(\"--mdxc_pitch_shift\", type=int, default=0, help=\"MDXC pitch shift (default: %(default)s)\")\n\n    # Status command\n    status_parser = subparsers.add_parser(\"status\", help=\"Check job status\")\n    status_parser.add_argument(\"task_id\", help=\"Task ID to check status for\")\n\n    # Models command\n    models_parser = subparsers.add_parser(\"models\", help=\"List available models\")\n    
models_parser.add_argument(\"--format\", choices=[\"pretty\", \"json\"], default=\"pretty\", help=\"Output format\")\n    models_parser.add_argument(\"--filter\", help=\"Filter models by name, type, or stem\")\n\n    # Download command\n    download_parser = subparsers.add_parser(\"download\", help=\"Download specific files from a job\")\n    download_parser.add_argument(\"task_id\", help=\"Task ID\")\n    download_parser.add_argument(\"filenames\", nargs=\"+\", help=\"Filenames to download\")\n\n    # Global options\n    parser.add_argument(\"-v\", \"--version\", action=\"store_true\", help=\"Show version information\")\n    parser.add_argument(\"-d\", \"--debug\", action=\"store_true\", help=\"Enable debug logging\")\n    parser.add_argument(\"--log_level\", default=\"info\", help=\"Log level (default: info)\")\n    parser.add_argument(\"--api_url\", help=\"API URL (overrides AUDIO_SEPARATOR_API_URL env var)\")\n\n    args = parser.parse_args()\n\n    # Set up logging\n    if args.debug:\n        log_level = logging.DEBUG\n    else:\n        log_level = getattr(logging, args.log_level.upper())\n    logger.setLevel(log_level)\n\n    # Handle version command\n    if args.version:\n        print(f\"Client version: {package_version}\")\n\n        # Try to get server version if API URL is available\n        api_url = args.api_url or os.environ.get(\"AUDIO_SEPARATOR_API_URL\")\n        if api_url:\n            api_url = api_url.rstrip(\"/\")\n            api_client = AudioSeparatorAPIClient(api_url, logger)\n            try:\n                server_version = api_client.get_server_version()\n                print(f\"Server version: {server_version}\")\n            except Exception as e:\n                logger.warning(f\"Could not retrieve server version: {e}\")\n        else:\n            logger.warning(\"API URL not provided. 
Set AUDIO_SEPARATOR_API_URL environment variable or use --api_url to get server version\")\n\n        sys.exit(0)\n\n    # Get API URL\n    api_url = args.api_url or os.environ.get(\"AUDIO_SEPARATOR_API_URL\")\n    if not api_url:\n        logger.error(\"API URL not provided. Set AUDIO_SEPARATOR_API_URL environment variable or use --api_url\")\n        sys.exit(1)\n\n    # Remove trailing slash\n    api_url = api_url.rstrip(\"/\")\n\n    # Create API client\n    api_client = AudioSeparatorAPIClient(api_url, logger)\n\n    # Handle commands\n    if args.command == \"separate\":\n        handle_separate_command(args, api_client, logger)\n    elif args.command == \"status\":\n        handle_status_command(args, api_client, logger)\n    elif args.command == \"models\":\n        handle_models_command(args, api_client, logger)\n    elif args.command == \"download\":\n        handle_download_command(args, api_client, logger)\n    else:\n        parser.print_help()\n        sys.exit(1)\n\n\ndef handle_separate_command(args, api_client: AudioSeparatorAPIClient, logger: logging.Logger):\n    \"\"\"Handle the separate command.\"\"\"\n    for audio_file in args.audio_files:\n        logger.info(f\"Uploading '{audio_file}' to audio separator...\")\n\n        try:\n            # Prepare parameters for separation\n            kwargs = {\n                \"model\": args.model,\n                \"models\": args.models,\n                \"preset\": args.preset,\n                \"timeout\": args.timeout,\n                \"poll_interval\": args.poll_interval,\n                \"download\": True,  # Always download in CLI\n                \"output_dir\": None,  # Use current directory\n                # Output parameters\n                \"output_format\": args.output_format,\n                \"output_bitrate\": args.output_bitrate,\n                \"normalization_threshold\": args.normalization,\n                \"amplification_threshold\": args.amplification,\n                
\"output_single_stem\": args.single_stem,\n                \"invert_using_spec\": args.invert_spect,\n                \"sample_rate\": args.sample_rate,\n                \"use_soundfile\": args.use_soundfile,\n                \"use_autocast\": args.use_autocast,\n                \"custom_output_names\": args.custom_output_names,\n                # MDX parameters\n                \"mdx_segment_size\": args.mdx_segment_size,\n                \"mdx_overlap\": args.mdx_overlap,\n                \"mdx_batch_size\": args.mdx_batch_size,\n                \"mdx_hop_length\": args.mdx_hop_length,\n                \"mdx_enable_denoise\": args.mdx_enable_denoise,\n                # VR parameters\n                \"vr_batch_size\": args.vr_batch_size,\n                \"vr_window_size\": args.vr_window_size,\n                \"vr_aggression\": args.vr_aggression,\n                \"vr_enable_tta\": args.vr_enable_tta,\n                \"vr_high_end_process\": args.vr_high_end_process,\n                \"vr_enable_post_process\": args.vr_enable_post_process,\n                \"vr_post_process_threshold\": args.vr_post_process_threshold,\n                # Demucs parameters\n                \"demucs_segment_size\": args.demucs_segment_size,\n                \"demucs_shifts\": args.demucs_shifts,\n                \"demucs_overlap\": args.demucs_overlap,\n                \"demucs_segments_enabled\": args.demucs_segments_enabled,\n                # MDXC parameters\n                \"mdxc_segment_size\": args.mdxc_segment_size,\n                \"mdxc_override_model_segment_size\": args.mdxc_override_model_segment_size,\n                \"mdxc_overlap\": args.mdxc_overlap,\n                \"mdxc_batch_size\": args.mdxc_batch_size,\n                \"mdxc_pitch_shift\": args.mdxc_pitch_shift,\n            }\n\n            # Use the convenience method that handles everything\n            result = api_client.separate_audio_and_wait(audio_file, **kwargs)\n\n            if 
result[\"status\"] == \"completed\":\n                if \"downloaded_files\" in result:\n                    logger.info(f\"✅ Separation completed! Downloaded {len(result['downloaded_files'])} files:\")\n                    for file_path in result[\"downloaded_files\"]:\n                        logger.info(f\"  - {file_path}\")\n                else:\n                    logger.info(f\"✅ Separation completed! Files available for download: {result['files']}\")\n            else:\n                logger.error(f\"❌ Separation failed: {result.get('error', 'Unknown error')}\")\n\n        except Exception as e:\n            logger.error(f\"❌ Error processing '{audio_file}': {e}\")\n\n\ndef handle_status_command(args, api_client: AudioSeparatorAPIClient, logger: logging.Logger):\n    \"\"\"Handle the status command.\"\"\"\n    try:\n        status = api_client.get_job_status(args.task_id)\n\n        logger.info(f\"Job Status: {status['status']}\")\n        if \"progress\" in status:\n            progress_info = f\"Progress: {status['progress']}%\"\n            if \"current_model_index\" in status and \"total_models\" in status:\n                model_info = f\" (Model {status['current_model_index'] + 1}/{status['total_models']})\"\n                progress_info += model_info\n            logger.info(progress_info)\n        if \"original_filename\" in status:\n            logger.info(f\"Original File: {status['original_filename']}\")\n        if \"models_used\" in status:\n            logger.info(f\"Models Used: {', '.join(status['models_used'])}\")\n        if status[\"status\"] == \"error\" and \"error\" in status:\n            logger.error(f\"Error: {status['error']}\")\n        elif status[\"status\"] == \"completed\" and \"files\" in status:\n            logger.info(\"Output Files:\")\n            for filename in status[\"files\"]:\n                logger.info(f\"  - {filename}\")\n\n    except Exception as e:\n        logger.error(f\"❌ Error getting status: 
{e}\")\n\n\ndef handle_models_command(args, api_client: AudioSeparatorAPIClient, logger: logging.Logger):\n    \"\"\"Handle the models command.\"\"\"\n    try:\n        models = api_client.list_models(args.format, args.filter)\n\n        if args.format == \"json\":\n            print(json.dumps(models, indent=2))\n        else:\n            print(models[\"text\"])\n\n    except Exception as e:\n        logger.error(f\"❌ Error listing models: {e}\")\n\n\ndef handle_download_command(args, api_client: AudioSeparatorAPIClient, logger: logging.Logger):\n    \"\"\"Handle the download command.\"\"\"\n    try:\n        for filename in args.filenames:\n            logger.info(f\"📂 Downloading: {filename}\")\n            output_path = api_client.download_file(args.task_id, filename)\n            logger.info(f\"✅ Downloaded: {output_path}\")\n\n    except Exception as e:\n        logger.error(f\"❌ Error downloading files: {e}\")\n\n\ndef poll_for_completion(task_id: str, api_client: AudioSeparatorAPIClient, logger: logging.Logger, timeout: int = 600, poll_interval: int = 10) -> bool:\n    \"\"\"Poll for job completion.\"\"\"\n    start_time = time.time()\n    last_progress = -1\n\n    while time.time() - start_time < timeout:\n        try:\n            status = api_client.get_job_status(task_id)\n            current_status = status.get(\"status\", \"unknown\")\n\n            # Show progress if it changed\n            if \"progress\" in status and status[\"progress\"] != last_progress:\n                progress_info = f\"📊 Progress: {status['progress']}%\"\n                if \"current_model_index\" in status and \"total_models\" in status:\n                    model_info = f\" (Model {status['current_model_index'] + 1}/{status['total_models']})\"\n                    progress_info += model_info\n                logger.info(progress_info)\n                last_progress = status[\"progress\"]\n\n            # Check if completed\n            if current_status == \"completed\":\n 
               logger.info(\"✅ Job completed!\")\n                return True\n            elif current_status == \"error\":\n                logger.error(f\"❌ Job failed: {status.get('error', 'Unknown error')}\")\n                return False\n\n            # Wait before next poll\n            time.sleep(poll_interval)\n\n        except Exception as e:\n            logger.warning(f\"Error polling status: {e}\")\n            time.sleep(poll_interval)\n\n    logger.error(f\"❌ Job polling timed out after {timeout} seconds\")\n    return False\n\n\ndef download_files(task_id: str, files: list, api_client: AudioSeparatorAPIClient, logger: logging.Logger):\n    \"\"\"Download all files from a completed job.\"\"\"\n    if not files:\n        logger.warning(\"No files to download\")\n        return\n\n    logger.info(f\"📥 Downloading {len(files)} output files...\")\n\n    downloaded_count = 0\n    for filename in files:\n        try:\n            logger.info(f\"  📂 Downloading: {filename}\")\n            output_path = api_client.download_file(task_id, filename)\n            logger.info(f\"  ✅ Downloaded: {output_path}\")\n            downloaded_count += 1\n        except Exception as e:\n            logger.error(f\"  ❌ Failed to download {filename}: {e}\")\n\n    if downloaded_count > 0:\n        logger.info(f\"🎉 Successfully downloaded {downloaded_count} files to current directory!\")\n    else:\n        logger.error(\"❌ No files were successfully downloaded\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "audio_separator/remote/deploy_cloudrun.py",
    "content": "\"\"\"\nAudio Separator API - Cloud Run GPU Deployment\n\nA FastAPI service for separating vocals from instrumental tracks using audio-separator,\ndeployed on Google Cloud Run with L4 GPU acceleration.\n\nThis is the GCP equivalent of deploy_modal.py — same API contract, different infrastructure.\nModels are downloaded from GCS on startup and cached in the container's local filesystem.\n\nUsage with Remote CLI:\n1. Install audio-separator package: pip install audio-separator\n2. Set environment variable: export AUDIO_SEPARATOR_API_URL=\"https://your-cloudrun-url.run.app\"\n3. Use the remote CLI:\n   - audio-separator-remote separate song.mp3\n   - audio-separator-remote separate song.mp3 --model UVR-MDX-NET-Inst_HQ_4\n   - audio-separator-remote status <task_id>\n   - audio-separator-remote models\n   - audio-separator-remote download <task_id> <filename>\n\"\"\"\n\nimport asyncio\nimport hashlib\nimport json\nimport logging\nimport os\nimport re\nimport shutil\nimport threading\nimport traceback\nimport typing\nimport uuid\nfrom importlib.metadata import version\nfrom typing import Optional\nfrom urllib.parse import quote\n\nimport filetype\nimport uvicorn\nfrom fastapi import FastAPI, File, Form, HTTPException, Response, UploadFile\nfrom fastapi.middleware.cors import CORSMiddleware\nfrom starlette.responses import PlainTextResponse\nfrom starlette.responses import Response as StarletteResponse\n\nlogger = logging.getLogger(\"audio-separator-api\")\nlogging.basicConfig(level=logging.INFO, format=\"%(asctime)s %(levelname)s %(name)s: %(message)s\")\n\n# Constants\nMODEL_DIR = os.environ.get(\"MODEL_DIR\", \"/models\")\nSTORAGE_DIR = os.environ.get(\"STORAGE_DIR\", \"/tmp/storage\")\nMODEL_BUCKET = os.environ.get(\"MODEL_BUCKET\", \"\")\nPORT = int(os.environ.get(\"PORT\", \"8080\"))\n\n\n\n# Track model readiness\nmodels_ready = False\n\n# --- Async job infrastructure ---\n\nOUTPUT_BUCKET = os.environ.get(\"OUTPUT_BUCKET\", 
\"nomadkaraoke-audio-separator-outputs\")\nGCP_PROJECT = os.environ.get(\"GCP_PROJECT\", \"nomadkaraoke\")\n\n_job_store = None\n_output_store = None\n\n\ndef get_job_store():\n    \"\"\"Get or create the Firestore job store (lazy init).\"\"\"\n    global _job_store\n    if _job_store is None:\n        from audio_separator.remote.job_store import FirestoreJobStore\n\n        _job_store = FirestoreJobStore(project=GCP_PROJECT)\n    return _job_store\n\n\ndef get_output_store():\n    \"\"\"Get or create the GCS output store (lazy init).\"\"\"\n    global _output_store\n    if _output_store is None:\n        from audio_separator.remote.output_store import GCSOutputStore\n\n        _output_store = GCSOutputStore(bucket_name=OUTPUT_BUCKET, project=GCP_PROJECT)\n    return _output_store\n\n\ndef generate_file_hash(filename: str) -> str:\n    \"\"\"Generate a short, stable hash for a filename to use in download URLs.\"\"\"\n    return hashlib.sha256(filename.encode(\"utf-8\")).hexdigest()[:16]\n\n\ndef download_from_gcs(gcs_uri: str) -> tuple[bytes, str]:\n    \"\"\"Download an audio file from GCS.\n\n    Args:\n        gcs_uri: GCS URI in the format gs://bucket/path/to/file\n\n    Returns:\n        Tuple of (file_bytes, filename)\n    \"\"\"\n    from google.cloud import storage\n\n    if not gcs_uri.startswith(\"gs://\"):\n        raise ValueError(f\"Invalid GCS URI (must start with gs://): {gcs_uri}\")\n\n    # Parse gs://bucket/path\n    without_prefix = gcs_uri[len(\"gs://\"):]\n    slash_idx = without_prefix.index(\"/\")\n    bucket_name = without_prefix[:slash_idx]\n    blob_path = without_prefix[slash_idx + 1:]\n    filename = os.path.basename(blob_path)\n\n    logger.info(f\"Downloading from GCS: bucket={bucket_name}, path={blob_path}\")\n    client = storage.Client()\n    bucket = client.bucket(bucket_name)\n    blob = bucket.blob(blob_path)\n    audio_bytes = blob.download_as_bytes()\n    logger.info(f\"Downloaded {len(audio_bytes)} bytes from GCS\")\n\n    
return audio_bytes, filename\n\n\ntry:\n    AUDIO_SEPARATOR_VERSION = version(\"audio-separator\")\nexcept Exception:\n    AUDIO_SEPARATOR_VERSION = \"unknown\"\n\n\ndef download_models_from_gcs():\n    \"\"\"Download models from GCS bucket on startup.\"\"\"\n    global models_ready\n\n    if not MODEL_BUCKET:\n        logger.info(\"MODEL_BUCKET not set, skipping GCS model download (models will be downloaded on demand)\")\n        models_ready = True\n        return\n\n    try:\n        from google.cloud import storage\n\n        client = storage.Client()\n        bucket = client.bucket(MODEL_BUCKET)\n        blobs = list(bucket.list_blobs())\n\n        os.makedirs(MODEL_DIR, exist_ok=True)\n\n        for blob in blobs:\n            local_path = os.path.join(MODEL_DIR, blob.name)\n            if os.path.exists(local_path):\n                # Check size to skip already-downloaded models\n                if os.path.getsize(local_path) == blob.size:\n                    logger.info(f\"Model already cached: {blob.name} ({blob.size / 1024 / 1024:.1f} MB)\")\n                    continue\n\n            logger.info(f\"Downloading model: {blob.name} ({blob.size / 1024 / 1024:.1f} MB)\")\n            os.makedirs(os.path.dirname(local_path), exist_ok=True)\n            blob.download_to_filename(local_path)\n            logger.info(f\"Downloaded: {blob.name}\")\n\n        models_ready = True\n        logger.info(f\"All models ready in {MODEL_DIR}\")\n\n    except Exception as e:\n        logger.error(f\"Failed to download models from GCS: {e}\")\n        # Still mark as ready — models can be downloaded on demand by Separator\n        models_ready = True\n\n\ndef separate_audio_sync(\n    audio_data: bytes,\n    filename: str,\n    task_id: str,\n    models: Optional[list] = None,\n    preset: Optional[str] = None,\n    output_format: str = \"flac\",\n    output_bitrate: Optional[str] = None,\n    normalization_threshold: float = 0.9,\n    amplification_threshold: float = 
0.0,\n    output_single_stem: Optional[str] = None,\n    invert_using_spec: bool = False,\n    sample_rate: int = 44100,\n    use_soundfile: bool = False,\n    use_autocast: bool = False,\n    custom_output_names: Optional[dict] = None,\n    # MDX parameters\n    mdx_segment_size: int = 256,\n    mdx_overlap: float = 0.25,\n    mdx_batch_size: int = 1,\n    mdx_hop_length: int = 1024,\n    mdx_enable_denoise: bool = False,\n    # VR parameters\n    vr_batch_size: int = 1,\n    vr_window_size: int = 512,\n    vr_aggression: int = 5,\n    vr_enable_tta: bool = False,\n    vr_high_end_process: bool = False,\n    vr_enable_post_process: bool = False,\n    vr_post_process_threshold: float = 0.2,\n    # Demucs parameters\n    demucs_segment_size: str = \"Default\",\n    demucs_shifts: int = 2,\n    demucs_overlap: float = 0.25,\n    demucs_segments_enabled: bool = True,\n    # MDXC parameters\n    mdxc_segment_size: int = 256,\n    mdxc_override_model_segment_size: bool = False,\n    mdxc_overlap: int = 8,\n    mdxc_batch_size: int = 1,\n    mdxc_pitch_shift: int = 0,\n) -> dict:\n    \"\"\"Separate audio into stems. 
Runs synchronously (Cloud Run GPU handles one job at a time).\"\"\"\n    from audio_separator.separator import Separator\n\n    all_output_files = {}\n    models_used = []\n\n    def update_status(status: str, progress: int = 0, error: str = None, files: dict = None):\n        status_data = {\n            \"status\": status,\n            \"progress\": progress,\n            \"models_used\": models_used,\n            \"total_models\": len(models) if models else 1,\n            \"current_model_index\": 0,\n        }\n        if files is not None:\n            status_data[\"files\"] = files\n        if error:\n            status_data[\"error\"] = error\n        try:\n            get_job_store().update(task_id, status_data)\n        except Exception as e:\n            logger.warning(f\"[{task_id}] Failed to update Firestore status: {e}\")\n\n    update_status(\"processing\", 0)\n    logger.info(f\"[{task_id}] Starting separation\")\n    try:\n        os.makedirs(f\"{STORAGE_DIR}/outputs/{task_id}\", exist_ok=True)\n        output_dir = f\"{STORAGE_DIR}/outputs/{task_id}\"\n\n        update_status(\"processing\", 5)\n\n        # Strip existing stem markers from filename (e.g. 
\"_(Vocals)_\", \"_(Instrumental)_\")\n        # to prevent the Separator from confusing them with output stem names during\n        # chained separations (Stage 1 output → Stage 2 input).\n        clean_filename = re.sub(r\"_\\([^)]+\\)_\", \"_\", filename)\n        input_file_path = os.path.join(output_dir, clean_filename)\n        with open(input_file_path, \"wb\") as f:\n            f.write(audio_data)\n\n        update_status(\"processing\", 10)\n\n        # Build separator kwargs\n        separator_kwargs = {\n            \"log_level\": logging.INFO,\n            \"model_file_dir\": MODEL_DIR,\n            \"output_dir\": output_dir,\n            \"output_format\": output_format,\n            \"output_bitrate\": output_bitrate,\n            \"normalization_threshold\": normalization_threshold,\n            \"amplification_threshold\": amplification_threshold,\n            \"output_single_stem\": output_single_stem,\n            \"invert_using_spec\": invert_using_spec,\n            \"sample_rate\": sample_rate,\n            \"use_soundfile\": use_soundfile,\n            \"use_autocast\": use_autocast,\n            \"mdx_params\": {\n                \"hop_length\": mdx_hop_length,\n                \"segment_size\": mdx_segment_size,\n                \"overlap\": mdx_overlap,\n                \"batch_size\": mdx_batch_size,\n                \"enable_denoise\": mdx_enable_denoise,\n            },\n            \"vr_params\": {\n                \"batch_size\": vr_batch_size,\n                \"window_size\": vr_window_size,\n                \"aggression\": vr_aggression,\n                \"enable_tta\": vr_enable_tta,\n                \"enable_post_process\": vr_enable_post_process,\n                \"post_process_threshold\": vr_post_process_threshold,\n                \"high_end_process\": vr_high_end_process,\n            },\n            \"demucs_params\": {\n                \"segment_size\": demucs_segment_size,\n                \"shifts\": demucs_shifts,\n    
            \"overlap\": demucs_overlap,\n                \"segments_enabled\": demucs_segments_enabled,\n            },\n            \"mdxc_params\": {\n                \"segment_size\": mdxc_segment_size,\n                \"batch_size\": mdxc_batch_size,\n                \"overlap\": mdxc_overlap,\n                \"override_model_segment_size\": mdxc_override_model_segment_size,\n                \"pitch_shift\": mdxc_pitch_shift,\n            },\n        }\n\n        if preset:\n            # Use ensemble preset — Separator handles model resolution\n            separator_kwargs[\"ensemble_preset\"] = preset\n            logger.info(f\"Using ensemble preset: {preset}\")\n\n            separator = Separator(**separator_kwargs)\n            separator.load_model()  # Preset models loaded automatically\n            models_used.append(f\"preset:{preset}\")\n\n            update_status(\"processing\", 50)\n            output_files = separator.separate(input_file_path, custom_output_names=custom_output_names)\n\n            if not output_files:\n                error_msg = f\"Separation with preset {preset} produced no output files\"\n                update_status(\"error\", 0, error=error_msg)\n                return {\"task_id\": task_id, \"status\": \"error\", \"error\": error_msg, \"models_used\": models_used}\n\n            for f in output_files:\n                fname = os.path.basename(f)\n                all_output_files[generate_file_hash(fname)] = fname\n\n        else:\n            # Traditional multi-model processing (no ensembling)\n            if models is None or len(models) == 0:\n                models_to_run = [None]\n            else:\n                models_to_run = models\n\n            total_models = len(models_to_run)\n\n            for model_index, model_name in enumerate(models_to_run):\n                base_progress = 10 + (model_index * 80 // total_models)\n                model_progress_range = 80 // total_models\n\n                
logger.info(f\"Processing model {model_index + 1}/{total_models}: {model_name or 'default'}\")\n                update_status(\"processing\", base_progress + (model_progress_range // 4))\n\n                separator = Separator(**separator_kwargs)\n\n                update_status(\"processing\", base_progress + (model_progress_range // 2))\n                if model_name:\n                    separator.load_model(model_name)\n                    models_used.append(model_name)\n                else:\n                    separator.load_model()\n                    models_used.append(\"default\")\n\n                update_status(\"processing\", base_progress + (3 * model_progress_range // 4))\n\n                model_custom_output_names = None\n                if total_models > 1 and custom_output_names:\n                    model_suffix = f\"_{models_used[-1].replace('.', '_').replace('/', '_')}\"\n                    model_custom_output_names = {stem: f\"{name}{model_suffix}\" for stem, name in custom_output_names.items()}\n                elif custom_output_names:\n                    model_custom_output_names = custom_output_names\n\n                output_files = separator.separate(input_file_path, custom_output_names=model_custom_output_names)\n\n                if not output_files:\n                    error_msg = f\"Separation with model {models_used[-1]} produced no output files\"\n                    update_status(\"error\", 0, error=error_msg)\n                    return {\"task_id\": task_id, \"status\": \"error\", \"error\": error_msg, \"models_used\": models_used}\n\n                for f in output_files:\n                    fname = os.path.basename(f)\n                    all_output_files[generate_file_hash(fname)] = fname\n\n        # Upload outputs to GCS for cross-instance access\n        get_output_store().upload_task_outputs(task_id, output_dir)\n\n        update_status(\"completed\", 100, files=all_output_files)\n        logger.info(f\"Separation 
completed. {len(all_output_files)} output files.\")\n        return {\"task_id\": task_id, \"status\": \"completed\", \"files\": all_output_files, \"models_used\": models_used}\n\n    except Exception as e:\n        logger.error(f\"Separation error: {e}\")\n        traceback.print_exc()\n        update_status(\"error\", 0, error=str(e))\n\n        return {\"task_id\": task_id, \"status\": \"error\", \"error\": str(e), \"models_used\": models_used}\n\n    finally:\n        logger.info(f\"[{task_id}] Separation finished, cleaning up local files\")\n        # Clean up local files (outputs are in GCS now)\n        output_dir = f\"{STORAGE_DIR}/outputs/{task_id}\"\n        if os.path.exists(output_dir):\n            shutil.rmtree(output_dir, ignore_errors=True)\n\n\n# --- FastAPI Application ---\n\nclass PrettyJSONResponse(StarletteResponse):\n    media_type = \"application/json\"\n\n    def render(self, content: typing.Any) -> bytes:\n        return json.dumps(content, ensure_ascii=False, allow_nan=False, indent=4, separators=(\", \", \": \")).encode(\"utf-8\")\n\n\nweb_app = FastAPI(\n    title=\"Audio Separator API\",\n    description=\"Separate vocals from instrumental tracks using AI (Cloud Run GPU)\",\n    version=AUDIO_SEPARATOR_VERSION,\n)\n\nweb_app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], allow_credentials=True, allow_methods=[\"*\"], allow_headers=[\"*\"])\n\n\n@web_app.post(\"/separate\")\nasync def separate_audio(\n    file: Optional[UploadFile] = File(None, description=\"Audio file to separate\"),\n    gcs_uri: Optional[str] = Form(None, description=\"GCS URI (gs://bucket/path) to fetch audio from instead of uploading\"),\n    model: Optional[str] = Form(None, description=\"Single model to use for separation\"),\n    models: Optional[str] = Form(None, description='JSON list of models, e.g. [\"model1.ckpt\", \"model2.onnx\"]'),\n    preset: Optional[str] = Form(None, description=\"Ensemble preset name (e.g. 
instrumental_clean, karaoke)\"),\n    # Output parameters\n    output_format: str = Form(\"flac\", description=\"Output format\"),\n    output_bitrate: Optional[str] = Form(None, description=\"Output bitrate\"),\n    normalization_threshold: float = Form(0.9),\n    amplification_threshold: float = Form(0.0),\n    output_single_stem: Optional[str] = Form(None),\n    invert_using_spec: bool = Form(False),\n    sample_rate: int = Form(44100),\n    use_soundfile: bool = Form(False),\n    use_autocast: bool = Form(False),\n    custom_output_names: Optional[str] = Form(None),\n    # MDX parameters\n    mdx_segment_size: int = Form(256),\n    mdx_overlap: float = Form(0.25),\n    mdx_batch_size: int = Form(1),\n    mdx_hop_length: int = Form(1024),\n    mdx_enable_denoise: bool = Form(False),\n    # VR parameters\n    vr_batch_size: int = Form(1),\n    vr_window_size: int = Form(512),\n    vr_aggression: int = Form(5),\n    vr_enable_tta: bool = Form(False),\n    vr_high_end_process: bool = Form(False),\n    vr_enable_post_process: bool = Form(False),\n    vr_post_process_threshold: float = Form(0.2),\n    # Demucs parameters\n    demucs_segment_size: str = Form(\"Default\"),\n    demucs_shifts: int = Form(2),\n    demucs_overlap: float = Form(0.25),\n    demucs_segments_enabled: bool = Form(True),\n    # MDXC parameters\n    mdxc_segment_size: int = Form(256),\n    mdxc_override_model_segment_size: bool = Form(False),\n    mdxc_overlap: int = Form(8),\n    mdxc_batch_size: int = Form(1),\n    mdxc_pitch_shift: int = Form(0),\n) -> dict:\n    \"\"\"Upload an audio file (or provide a GCS URI) and separate it into stems.\"\"\"\n    # Validate: must provide exactly one of file or gcs_uri\n    has_file = file is not None and file.filename\n    has_gcs = gcs_uri is not None and gcs_uri.strip()\n    if not has_file and not has_gcs:\n        raise HTTPException(status_code=400, detail=\"Must provide either a file upload or gcs_uri parameter\")\n    if has_file and has_gcs:\n     
   raise HTTPException(status_code=400, detail=\"Provide either file upload or gcs_uri, not both\")\n\n    try:\n        # Parse models parameter\n        models_list = None\n        if models:\n            try:\n                models_list = json.loads(models)\n                if not isinstance(models_list, list):\n                    raise ValueError(\"Models must be a JSON list\")\n            except ValueError as e:\n                # json.JSONDecodeError is a ValueError subclass, so this also catches the type check above\n                raise HTTPException(status_code=400, detail=f\"Invalid models parameter: {e}\")\n        elif model:\n            models_list = [model]\n\n        # Parse custom_output_names\n        custom_output_names_dict = None\n        if custom_output_names:\n            try:\n                custom_output_names_dict = json.loads(custom_output_names)\n                if not isinstance(custom_output_names_dict, dict):\n                    raise ValueError(\"Custom output names must be a JSON object\")\n            except ValueError as e:\n                raise HTTPException(status_code=400, detail=f\"Invalid custom_output_names parameter: {e}\")\n\n        # Get audio data from file upload or GCS\n        if has_gcs:\n            try:\n                audio_data, filename = download_from_gcs(gcs_uri.strip())\n            except Exception as e:\n                raise HTTPException(status_code=400, detail=f\"Failed to download from GCS: {e}\")\n        else:\n            audio_data = await file.read()\n            filename = file.filename\n\n        task_id = str(uuid.uuid4())\n        instance_id = os.environ.get(\"K_REVISION\", \"local\")\n\n        # Write initial status to Firestore\n        get_job_store().set(task_id, {\n            \"task_id\": task_id,\n            \"status\": \"submitted\",\n            \"progress\": 0,\n            \"original_filename\": filename,\n            \"models_used\": [f\"preset:{preset}\"] if preset else 
(models_list or [\"default\"]),\n            \"total_models\": 1 if 
preset else (len(models_list) if models_list else 1),\n            \"current_model_index\": 0,\n            \"files\": {},\n            \"instance_id\": instance_id,\n        })\n\n        # Run separation synchronously — Cloud Run keeps this request active,\n        # which lets the autoscaler know this instance is busy and route new\n        # requests to new instances (with concurrency=1).\n        # This matches Modal's .spawn() pattern: each job gets its own GPU instance.\n        loop = asyncio.get_event_loop()\n        result = await loop.run_in_executor(\n            None,\n            lambda: separate_audio_sync(\n                audio_data,\n                filename,\n                task_id,\n                models_list,\n                preset,\n                output_format,\n                output_bitrate,\n                normalization_threshold,\n                amplification_threshold,\n                output_single_stem,\n                invert_using_spec,\n                sample_rate,\n                use_soundfile,\n                use_autocast,\n                custom_output_names_dict,\n                mdx_segment_size,\n                mdx_overlap,\n                mdx_batch_size,\n                mdx_hop_length,\n                mdx_enable_denoise,\n                vr_batch_size,\n                vr_window_size,\n                vr_aggression,\n                vr_enable_tta,\n                vr_high_end_process,\n                vr_enable_post_process,\n                vr_post_process_threshold,\n                demucs_segment_size,\n                demucs_shifts,\n                demucs_overlap,\n                demucs_segments_enabled,\n                mdxc_segment_size,\n                mdxc_override_model_segment_size,\n                mdxc_overlap,\n                mdxc_batch_size,\n                mdxc_pitch_shift,\n            ),\n        )\n\n        # Return the completed/error result (Firestore + GCS already updated by 
separate_audio_sync)\n        return result\n\n    except HTTPException:\n        raise\n    except Exception as e:\n        raise HTTPException(status_code=500, detail=f\"Separation failed: {str(e)}\") from e\n\n\n@web_app.get(\"/status/{task_id}\")\nasync def get_job_status(task_id: str) -> dict:\n    \"\"\"Get the status of a separation job.\"\"\"\n    result = get_job_store().get(task_id)\n    if result:\n        return result\n    return {\n        \"task_id\": task_id,\n        \"status\": \"not_found\",\n        \"progress\": 0,\n        \"error\": \"Job not found - may have been cleaned up or never existed\",\n    }\n\n\n@web_app.get(\"/download/{task_id}/{file_hash}\")\nasync def download_file(task_id: str, file_hash: str) -> Response:\n    \"\"\"Download a separated audio file using its hash identifier.\"\"\"\n    try:\n        status_data = get_job_store().get(task_id)\n        if not status_data:\n            raise HTTPException(status_code=404, detail=\"Task not found\")\n\n        files_dict = status_data.get(\"files\", {})\n\n        actual_filename = None\n        if isinstance(files_dict, dict):\n            actual_filename = files_dict.get(file_hash)\n\n        if not actual_filename:\n            raise HTTPException(status_code=404, detail=f\"File with hash {file_hash} not found\")\n\n        file_data = get_output_store().get_file_bytes(task_id, actual_filename)\n\n        detected_type = filetype.guess(file_data)\n        content_type = detected_type.mime if detected_type and detected_type.mime else \"application/octet-stream\"\n\n        ascii_filename = \"\".join(c if ord(c) < 128 else \"_\" for c in actual_filename)\n        encoded_filename = quote(actual_filename, safe=\"\")\n        content_disposition = f'attachment; filename=\"{ascii_filename}\"; filename*=UTF-8\\'\\'{encoded_filename}'\n\n        return Response(content=file_data, media_type=content_type, headers={\"Content-Disposition\": content_disposition})\n\n    except 
HTTPException:\n        raise\n    except Exception as e:\n        raise HTTPException(status_code=500, detail=f\"Download failed: {str(e)}\") from e\n\n\n@web_app.get(\"/models-json\")\nasync def get_available_models() -> PrettyJSONResponse:\n    \"\"\"Get list of available separation models.\"\"\"\n    from audio_separator.separator import Separator\n\n    separator = Separator(info_only=True, model_file_dir=MODEL_DIR)\n    model_list = separator.list_supported_model_files()\n    return PrettyJSONResponse(content=model_list)\n\n\n@web_app.get(\"/models\")\nasync def get_simplified_models_list(filter_sort_by: str = None) -> PlainTextResponse:\n    \"\"\"Get simplified model list in plain text format.\"\"\"\n    from audio_separator.separator import Separator\n\n    separator = Separator(info_only=True, model_file_dir=MODEL_DIR)\n    models_data = separator.get_simplified_model_list(filter_sort_by=filter_sort_by)\n\n    if not models_data:\n        return PlainTextResponse(\"No models found\")\n\n    filename_width = max(len(\"Model Filename\"), max(len(f) for f in models_data.keys()))\n    arch_width = max(len(\"Arch\"), max(len(info[\"Type\"]) for info in models_data.values()))\n    stems_width = max(len(\"Output Stems (SDR)\"), max(len(\", \".join(info[\"Stems\"])) for info in models_data.values()))\n    name_width = max(len(\"Friendly Name\"), max(len(info[\"Name\"]) for info in models_data.values()))\n    total_width = filename_width + arch_width + stems_width + name_width + 15\n\n    output_lines = [\n        \"-\" * total_width,\n        f\"{'Model Filename':<{filename_width}}  {'Arch':<{arch_width}}  {'Output Stems (SDR)':<{stems_width}}  {'Friendly Name'}\",\n        \"-\" * total_width,\n    ]\n    for fname, info in models_data.items():\n        stems = \", \".join(info[\"Stems\"])\n        output_lines.append(f\"{fname:<{filename_width}}  {info['Type']:<{arch_width}}  {stems:<{stems_width}}  {info['Name']}\")\n\n    return 
PlainTextResponse(\"\\n\".join(output_lines))\n\n\n@web_app.get(\"/presets\")\nasync def list_presets() -> PrettyJSONResponse:\n    \"\"\"List available ensemble presets.\"\"\"\n    from audio_separator.separator import Separator\n\n    separator = Separator(info_only=True, model_file_dir=MODEL_DIR)\n    presets = separator.list_ensemble_presets()\n    return PrettyJSONResponse(content=presets)\n\n\n@web_app.get(\"/health\")\nasync def health_check() -> dict:\n    \"\"\"Health check endpoint.\"\"\"\n    return {\n        \"status\": \"healthy\",\n        \"service\": \"audio-separator-api\",\n        \"version\": AUDIO_SEPARATOR_VERSION,\n        \"models_ready\": models_ready,\n        \"platform\": \"cloud-run\",\n    }\n\n\n@web_app.get(\"/\")\nasync def root() -> dict:\n    \"\"\"Root endpoint with API information.\"\"\"\n    return {\n        \"message\": \"Audio Separator API\",\n        \"version\": AUDIO_SEPARATOR_VERSION,\n        \"platform\": \"cloud-run-gpu\",\n        \"description\": \"Separate vocals from instrumental tracks using AI\",\n        \"features\": [\n            \"Ensemble preset support (instrumental_clean, karaoke, etc.)\",\n            \"Multiple model processing in single job\",\n            \"Full separator parameter compatibility\",\n            \"GPU-accelerated processing (NVIDIA L4)\",\n            \"All MDX, VR, Demucs, and MDXC architectures supported\",\n        ],\n        \"endpoints\": {\n            \"POST /separate\": \"Separate audio file via upload or GCS URI (supports presets, multiple models, all parameters)\",\n            \"GET /status/{task_id}\": \"Get job status and progress\",\n            \"GET /download/{task_id}/{file_hash}\": \"Download separated file using hash identifier\",\n            \"GET /presets\": \"List available ensemble presets\",\n            \"GET /models-json\": \"List available models (JSON)\",\n            \"GET /models\": \"List available models (plain text)\",\n            \"GET /health\": 
\"Health check\",\n        },\n    }\n\n\n@web_app.on_event(\"startup\")\nasync def startup_event():\n    \"\"\"Clean up local storage and download models from GCS on startup.\"\"\"\n    os.makedirs(MODEL_DIR, exist_ok=True)\n\n    # Wipe local outputs from previous instance\n    outputs_dir = f\"{STORAGE_DIR}/outputs\"\n    if os.path.exists(outputs_dir):\n        shutil.rmtree(outputs_dir, ignore_errors=True)\n    os.makedirs(outputs_dir, exist_ok=True)\n\n    # Clean up old Firestore jobs (>1 hour)\n    try:\n        get_job_store().cleanup_old_jobs(max_age_seconds=3600)\n    except Exception as e:\n        logger.warning(f\"Failed to clean up old jobs: {e}\")\n\n    # Download models in background thread to not block startup probe\n    thread = threading.Thread(target=download_models_from_gcs, daemon=True)\n    thread.start()\n\n\nif __name__ == \"__main__\":\n    uvicorn.run(web_app, host=\"0.0.0.0\", port=PORT)\n"
  },
  {
    "path": "audio_separator/remote/deploy_modal.py",
    "content": "\"\"\"\nAudio Separator API - Simple Modal Deployment\nA FastAPI service for separating vocals from instrumental tracks using audio-separator\n\nFeatures:\n- Asynchronous job processing\n- Progress tracking and status polling\n- Persistent storage for models and outputs\n- GPU acceleration support\n- Multiple audio format support\n\nUsage with Remote CLI:\n1. Install audio-separator package: pip install audio-separator\n2. Set environment variable: export AUDIO_SEPARATOR_API_URL=\"https://your-deployment-url.modal.run\"\n3. Use the remote CLI:\n   - audio-separator-remote separate song.mp3\n   - audio-separator-remote separate song.mp3 --model UVR-MDX-NET-Inst_HQ_4\n   - audio-separator-remote status <task_id>\n   - audio-separator-remote models\n   - audio-separator-remote download <task_id> <filename>\n\"\"\"\n\n# Standard library imports\nimport logging\nimport os\nimport shutil\nimport traceback\nimport uuid\nimport json\nimport hashlib\nfrom importlib.metadata import version\nimport typing\nfrom typing import Optional\nfrom urllib.parse import quote\n\n# Third-party imports\nfrom fastapi import FastAPI, File, Form, HTTPException, Response, UploadFile\nfrom fastapi.middleware.cors import CORSMiddleware\nfrom starlette.responses import Response as StarletteResponse, PlainTextResponse\nimport filetype\nimport modal\n\n# Note: Separator is imported inside functions to allow Modal to parse this file\n# without requiring audio_separator to be installed on the deployment machine\n\n# Constants\nDEFAULT_MODEL_NAME = \"default\"  # Used when no model is specified\n\ndef generate_file_hash(filename: str) -> str:\n    \"\"\"Generate a short, stable hash for a filename to use in download URLs.\"\"\"\n    # Use SHA-256 hash of the filename, take first 16 characters for brevity\n    # This gives us a stable, URL-safe identifier that's much shorter than the filename\n    return hashlib.sha256(filename.encode('utf-8')).hexdigest()[:16]\n\n# Get the version of 
the installed audio-separator package\ntry:\n    AUDIO_SEPARATOR_VERSION = version(\"audio-separator\")\nexcept Exception:\n    # Fallback version if package version cannot be determined\n    AUDIO_SEPARATOR_VERSION = \"unknown\"\n\n# Create Modal app\napp = modal.App(\"audio-separator\")\n\n# Define the container image; we're using CUDA for hardware acceleration and Python 3.13 for optimal performance\nimage = (\n    modal.Image.from_registry(\"nvidia/cuda:12.9.1-devel-ubuntu22.04\", add_python=\"3.13\")\n    .apt_install(\n        [\n            # Core system packages\n            \"curl\",\n            \"wget\",\n            # Audio libraries and dependencies\n            \"libsndfile1\",\n            \"libsndfile1-dev\",\n            \"libsox-dev\",\n            \"sox\",\n            \"libportaudio2\",\n            \"portaudio19-dev\",\n            \"libasound2-dev\",\n            \"libpulse-dev\",\n            \"libjack-dev\",\n            # Sample rate conversion library\n            \"libsamplerate0\",\n            \"libsamplerate0-dev\",\n            # Build tools for compiling Python packages with C extensions\n            \"build-essential\",\n            \"clang\",\n            \"gcc\",\n            \"g++\",\n            \"make\",\n            \"cmake\",\n            \"pkg-config\",\n        ]\n    )\n    .run_commands(\n        [\n            # Set up CUDA library paths for NVENC support\n            \"echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf\",\n            \"ldconfig\",\n            # Install latest FFmpeg\n            \"wget https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-linux64-gpl.tar.xz\",\n            \"tar -xf ffmpeg-master-latest-linux64-gpl.tar.xz\",\n            \"cp ffmpeg-master-latest-linux64-gpl/bin/* /usr/local/bin/\",\n            \"chmod +x /usr/local/bin/ffmpeg /usr/local/bin/ffprobe\",\n            # Verify installations and NVENC support\n            \"ffmpeg -version\",\n      
  ]\n    )\n    .pip_install(\n        [\n            # Core audio-separator with GPU support\n            # Always pulls the latest version from PyPI when the image is rebuilt\n            \"audio-separator[gpu]\",\n            # FastAPI and web server dependencies for Modal API deployment\n            \"fastapi>=0.104.0\",\n            \"uvicorn[standard]>=0.24.0\",\n            \"python-multipart>=0.0.6\",\n            # File type detection for response content type\n            \"filetype>=1.2.0\",\n        ]\n    )\n    .env(\n        {\n            \"AUDIO_SEPARATOR_MODEL_DIR\": \"/models\",\n            # CUDA environment for NVENC support\n            \"LD_LIBRARY_PATH\": \"/usr/local/cuda/lib64:$LD_LIBRARY_PATH\",\n            \"PATH\": \"/usr/local/cuda/bin:$PATH\",\n        }\n    )\n)\n\n# Create persistent volume for storing separated files\nvolume = modal.Volume.from_name(\"audio-separator-storage\", create_if_missing=True)\n\n# Create persistent volume for caching downloaded models\nmodels_volume = modal.Volume.from_name(\"audio-separator-models\", create_if_missing=True)\n\n# Modal Dict for job status tracking is accessed by name in functions\n# job_status_dict = modal.Dict.from_name(\"audio-separator-job-status\", create_if_missing=True)\n\n\nclass PrettyJSONResponse(StarletteResponse):\n    \"\"\"Custom JSON response class for pretty-printing JSON\"\"\"\n\n    media_type = \"application/json\"\n\n    def render(self, content: typing.Any) -> bytes:\n        return json.dumps(content, ensure_ascii=False, allow_nan=False, indent=4, separators=(\", \", \": \")).encode(\"utf-8\")\n\n\n@app.function(image=image, gpu=\"ANY\", timeout=1200, volumes={\"/storage\": volume, \"/models\": models_volume}, scaledown_window=300)\ndef separate_audio_function(\n    audio_data: bytes,\n    filename: str,\n    models: Optional[list] = None,\n    task_id: Optional[str] = None,\n    # Separator parameters\n    output_format: str = \"flac\",\n    output_bitrate: 
Optional[str] = None,\n    normalization_threshold: float = 0.9,\n    amplification_threshold: float = 0.0,\n    output_single_stem: Optional[str] = None,\n    invert_using_spec: bool = False,\n    sample_rate: int = 44100,\n    use_soundfile: bool = False,\n    use_autocast: bool = False,\n    custom_output_names: Optional[dict] = None,\n    # MDX parameters\n    mdx_segment_size: int = 256,\n    mdx_overlap: float = 0.25,\n    mdx_batch_size: int = 1,\n    mdx_hop_length: int = 1024,\n    mdx_enable_denoise: bool = False,\n    # VR parameters\n    vr_batch_size: int = 1,\n    vr_window_size: int = 512,\n    vr_aggression: int = 5,\n    vr_enable_tta: bool = False,\n    vr_high_end_process: bool = False,\n    vr_enable_post_process: bool = False,\n    vr_post_process_threshold: float = 0.2,\n    # Demucs parameters\n    demucs_segment_size: str = \"Default\",\n    demucs_shifts: int = 2,\n    demucs_overlap: float = 0.25,\n    demucs_segments_enabled: bool = True,\n    # MDXC parameters\n    mdxc_segment_size: int = 256,\n    mdxc_override_model_segment_size: bool = False,\n    mdxc_overlap: int = 8,\n    mdxc_batch_size: int = 1,\n    mdxc_pitch_shift: int = 0,\n) -> dict:\n    \"\"\"\n    Separate audio into stems using one or more models\n    \"\"\"\n    from audio_separator.separator import Separator\n\n    if task_id is None:\n        task_id = str(uuid.uuid4())\n\n    # Default to single default model if no models specified\n    if models is None or len(models) == 0:\n        models = [None]  # None will use separator's default\n\n    all_output_files = {}  # Dictionary mapping file hashes to filenames\n    models_used = []\n    current_model_index = 0\n    total_models = len(models)\n\n    def update_job_status(status: str, progress: int = 0, error: str = None, files: list = None):\n        \"\"\"Update job status in Modal Dict\"\"\"\n        status_data = {\n            \"task_id\": task_id,\n            \"status\": status,\n            \"progress\": 
progress,\n            \"original_filename\": filename,\n            \"models_used\": models_used,\n            \"total_models\": total_models,\n            \"current_model_index\": current_model_index,\n            \"files\": files or [],\n        }\n        if error:\n            status_data[\"error\"] = error\n\n        # Access Modal Dict by name within function scope\n        job_status = modal.Dict.from_name(\"audio-separator-job-status\", create_if_missing=True)\n        job_status[task_id] = status_data\n\n    try:\n        # Ensure storage directories exist\n        os.makedirs(\"/storage/uploads\", exist_ok=True)\n        os.makedirs(\"/storage/outputs\", exist_ok=True)\n        os.makedirs(\"/models\", exist_ok=True)\n\n        # Update status: starting\n        update_job_status(\"processing\", 5)\n\n        # Create output directory\n        output_dir = f\"/storage/outputs/{task_id}\"\n        os.makedirs(output_dir, exist_ok=True)\n\n        # Write uploaded file directly to output directory with original filename\n        input_file_path = os.path.join(output_dir, filename)\n        with open(input_file_path, \"wb\") as f:\n            f.write(audio_data)\n\n        update_job_status(\"processing\", 10)\n\n        # Process each model\n        for model_index, model_name in enumerate(models):\n            current_model_index = model_index\n            base_progress = 10 + (model_index * 80 // total_models)\n            model_progress_range = 80 // total_models\n\n            print(f\"Processing model {model_index + 1}/{total_models}: {model_name or 'default'}\")\n            update_job_status(\"processing\", base_progress + (model_progress_range // 4))\n\n            # Initialize separator with all the provided parameters\n            separator = Separator(\n                log_level=logging.INFO,\n                model_file_dir=\"/models\",\n                output_dir=output_dir,\n                output_format=output_format,\n                
output_bitrate=output_bitrate,\n                normalization_threshold=normalization_threshold,\n                amplification_threshold=amplification_threshold,\n                output_single_stem=output_single_stem,\n                invert_using_spec=invert_using_spec,\n                sample_rate=sample_rate,\n                use_soundfile=use_soundfile,\n                use_autocast=use_autocast,\n                mdx_params={\"hop_length\": mdx_hop_length, \"segment_size\": mdx_segment_size, \"overlap\": mdx_overlap, \"batch_size\": mdx_batch_size, \"enable_denoise\": mdx_enable_denoise},\n                vr_params={\n                    \"batch_size\": vr_batch_size,\n                    \"window_size\": vr_window_size,\n                    \"aggression\": vr_aggression,\n                    \"enable_tta\": vr_enable_tta,\n                    \"enable_post_process\": vr_enable_post_process,\n                    \"post_process_threshold\": vr_post_process_threshold,\n                    \"high_end_process\": vr_high_end_process,\n                },\n                demucs_params={\"segment_size\": demucs_segment_size, \"shifts\": demucs_shifts, \"overlap\": demucs_overlap, \"segments_enabled\": demucs_segments_enabled},\n                mdxc_params={\n                    \"segment_size\": mdxc_segment_size,\n                    \"batch_size\": mdxc_batch_size,\n                    \"overlap\": mdxc_overlap,\n                    \"override_model_segment_size\": mdxc_override_model_segment_size,\n                    \"pitch_shift\": mdxc_pitch_shift,\n                },\n            )\n\n            # Load the model\n            update_job_status(\"processing\", base_progress + (model_progress_range // 2))\n            if model_name:\n                print(f\"Loading specified model: {model_name}\")\n                separator.load_model(model_name)\n                models_used.append(model_name)\n            else:\n                print(f\"No model specified, 
using default model\")\n                separator.load_model()\n                models_used.append(\"default\")\n\n            # Perform separation\n            update_job_status(\"processing\", base_progress + (3 * model_progress_range // 4))\n            print(f\"Separating audio file: {filename} with model: {models_used[-1]}\")\n\n            # Create custom output names for this model if multiple models\n            model_custom_output_names = None\n            if total_models > 1 and custom_output_names:\n                # Add model name suffix to custom output names\n                model_suffix = f\"_{models_used[-1].replace('.', '_').replace('/', '_')}\"\n                model_custom_output_names = {stem: f\"{name}{model_suffix}\" for stem, name in custom_output_names.items()}\n            elif custom_output_names:\n                model_custom_output_names = custom_output_names\n\n            output_files = separator.separate(input_file_path, custom_output_names=model_custom_output_names)\n\n            if not output_files:\n                error_msg = f\"Separation with model {models_used[-1]} completed but produced no output files\"\n                print(f\"❌ {error_msg}\")\n                update_job_status(\"error\", 0, error=error_msg)\n                return {\"task_id\": task_id, \"status\": \"error\", \"error\": error_msg, \"models_used\": models_used, \"original_filename\": filename}\n\n            # Convert full paths to filenames and add to collection with hashes\n            model_result_files = [os.path.basename(f) for f in output_files]\n            # Use a distinct loop variable so the `filename` argument (the original upload name) is not shadowed\n            for output_filename in model_result_files:\n                file_hash = generate_file_hash(output_filename)\n                all_output_files[file_hash] = output_filename\n            print(f\"Model {models_used[-1]} produced {len(model_result_files)} files: {model_result_files}\")\n\n        # Update final status\n        update_job_status(\"processing\", 95)\n        print(f\"All separations completed. 
Total output files: {len(all_output_files)}\")\n\n        # Commit volume changes\n        volume.commit()\n        models_volume.commit()\n\n        # Update status: completed\n        update_job_status(\"completed\", 100, files=all_output_files)\n\n        return {\"task_id\": task_id, \"status\": \"completed\", \"files\": all_output_files, \"models_used\": models_used, \"original_filename\": filename}\n\n    except FileNotFoundError as e:\n        print(f\"Input file not found: {str(e)}\")\n        traceback.print_exc()\n        try:\n            update_job_status(\"error\", 0, error=f\"Input file not found: {str(e)}\")\n        except Exception as status_error:\n            print(f\"WARNING: Failed to update job status: {status_error}\")\n        return {\"task_id\": task_id, \"status\": \"error\", \"error\": f\"Input file not found: {str(e)}\", \"models_used\": models_used, \"original_filename\": filename}\n\n    except ValueError as e:\n        print(f\"Invalid input or configuration: {str(e)}\")\n        traceback.print_exc()\n        try:\n            update_job_status(\"error\", 0, error=f\"Invalid input: {str(e)}\")\n        except Exception as status_error:\n            print(f\"WARNING: Failed to update job status: {status_error}\")\n        return {\"task_id\": task_id, \"status\": \"error\", \"error\": f\"Invalid input: {str(e)}\", \"models_used\": models_used, \"original_filename\": filename}\n\n    except Exception as e:\n        print(f\"Unexpected error during separation: {str(e)}\")\n        traceback.print_exc()\n        try:\n            update_job_status(\"error\", 0, error=str(e))\n        except Exception as status_error:\n            print(f\"WARNING: Failed to update job status: {status_error}\")\n\n        # Clean up on error. 
Rebuild the path from task_id because the failure may have happened\n        # before output_dir was assigned; the uploaded input file is written inside this directory too.\n        task_output_dir = f\"/storage/outputs/{task_id}\"\n        if os.path.exists(task_output_dir):\n            shutil.rmtree(task_output_dir, ignore_errors=True)\n\n        return {\"task_id\": 
task_id, \"status\": \"error\", \"error\": str(e), \"models_used\": models_used, \"original_filename\": filename}\n\n\n@app.function(image=image, timeout=300, volumes={\"/storage\": volume})\ndef get_job_status_function(task_id: str) -> dict:\n    \"\"\"\n    Get the status of a separation job\n    \"\"\"\n    try:\n        # Access Modal Dict by name within function scope\n        job_status = modal.Dict.from_name(\"audio-separator-job-status\", create_if_missing=True)\n\n        if task_id in job_status:\n            return job_status[task_id]\n        else:\n            # Job not found - might be initializing or doesn't exist\n            return {\"task_id\": task_id, \"status\": \"not_found\", \"progress\": 0, \"error\": \"Job not found - may have been cleaned up or never existed\"}\n    except Exception as e:\n        print(f\"ERROR: Failed to access job status for {task_id}: {str(e)}\")\n        return {\"task_id\": task_id, \"status\": \"error\", \"error\": f\"Failed to read status: {str(e)}\"}\n\n\n@app.function(image=image, timeout=300, volumes={\"/storage\": volume})\ndef get_file_function(task_id: str, filename: str) -> bytes:\n    \"\"\"\n    Retrieve a separated audio file\n    \"\"\"\n    file_path = f\"/storage/outputs/{task_id}/{filename}\"\n\n    if not os.path.exists(file_path):\n        raise FileNotFoundError(f\"File not found: {filename}\")\n\n    with open(file_path, \"rb\") as f:\n        return f.read()\n\n\n@app.function(image=image, timeout=300, volumes={\"/storage\": volume})\ndef get_file_by_hash_function(task_id: str, file_hash: str) -> tuple[bytes, str]:\n    \"\"\"\n    Retrieve a separated audio file by its hash identifier.\n    Returns tuple of (file_data, actual_filename)\n    \"\"\"\n    print(f\"🔍 get_file_by_hash_function called - Task ID: {task_id}, File hash: {file_hash}\")\n    \n    # Reload the volume to ensure we see the latest files written by other function executions\n    print(f\"🔍 Reloading volume to see latest 
files...\")\n    volume.reload()\n    \n    # Access Modal Dict to get the job status with file hash mappings\n    job_status = modal.Dict.from_name(\"audio-separator-job-status\", create_if_missing=True)\n    \n    if task_id not in job_status:\n        print(f\"❌ Task not found in job_status: {task_id}\")\n        raise FileNotFoundError(f\"Task not found: {task_id}\")\n    \n    status_data = job_status[task_id]\n    files_dict = status_data.get(\"files\", {})\n    print(f\"🔍 Retrieved files_dict: {files_dict}\")\n    print(f\"🔍 files_dict type: {type(files_dict)}\")\n    \n    # Check if files is still a list (backward compatibility)\n    if isinstance(files_dict, list):\n        print(f\"🔍 Using legacy list format with {len(files_dict)} files\")\n        # For backward compatibility, try to find file by regenerating hash\n        for filename in files_dict:\n            generated_hash = generate_file_hash(filename)\n            print(f\"🔍 Checking filename '{filename}' -> hash '{generated_hash}' vs requested '{file_hash}'\")\n            if generated_hash == file_hash:\n                file_path = f\"/storage/outputs/{task_id}/{filename}\"\n                print(f\"🔍 Hash match! 
Checking file path: {file_path}\")\n                if os.path.exists(file_path):\n                    print(f\"✅ File exists, returning content\")\n                    with open(file_path, \"rb\") as f:\n                        return f.read(), filename\n                else:\n                    print(f\"❌ File does not exist at path: {file_path}\")\n        raise FileNotFoundError(f\"File with hash {file_hash} not found in legacy format\")\n    \n    # Normal case: files is a dictionary mapping hashes to filenames\n    print(f\"🔍 Using new hash format with {len(files_dict)} files\")\n    print(f\"🔍 Available hashes: {list(files_dict.keys())}\")\n    \n    if file_hash not in files_dict:\n        print(f\"❌ Hash {file_hash} not found in files_dict\")\n        raise FileNotFoundError(f\"File with hash {file_hash} not found\")\n    \n    actual_filename = files_dict[file_hash]\n    file_path = f\"/storage/outputs/{task_id}/{actual_filename}\"\n    print(f\"🔍 Hash found! Filename: '{actual_filename}'\")\n    print(f\"🔍 Checking file path: {file_path}\")\n\n    if not os.path.exists(file_path):\n        print(f\"❌ File does not exist at path: {file_path}\")\n        # List what files actually exist in the directory\n        task_dir = f\"/storage/outputs/{task_id}\"\n        if os.path.exists(task_dir):\n            actual_files = os.listdir(task_dir)\n            print(f\"🔍 Files actually in directory ({len(actual_files)}):\")\n            for i, actual_file in enumerate(actual_files):\n                print(f\"  [{i}] '{actual_file}'\")\n                if actual_file == actual_filename:\n                    print(f\"    ✅ EXACT MATCH found!\")\n        else:\n            print(f\"❌ Task directory does not exist: {task_dir}\")\n        raise FileNotFoundError(f\"File not found: {actual_filename}\")\n\n    print(f\"✅ File exists, returning content\")\n    with open(file_path, \"rb\") as f:\n        return f.read(), actual_filename\n\n\n@app.function(image=image, 
timeout=60, volumes={\"/models\": models_volume})\ndef list_available_models() -> dict:\n    \"\"\"\n    List available separation models using the same approach as CLI\n    \"\"\"\n    from audio_separator.separator import Separator\n\n    # Use the persistent model directory\n    model_dir = \"/models\"\n\n    # Ensure the model directory exists\n    os.makedirs(model_dir, exist_ok=True)\n\n    # Use the same approach as the CLI: create separator with info_only=True\n    separator = Separator(info_only=True, model_file_dir=model_dir)\n\n    # Get the list of supported models\n    model_list = separator.list_supported_model_files()\n\n    # Return the full model dictionary\n    return model_list\n\n\n@app.function(image=image, timeout=60, volumes={\"/models\": models_volume})\ndef get_simplified_models(filter_sort_by: str = None) -> dict:\n    \"\"\"\n    Get simplified model list using the same approach as CLI --list_models\n    \"\"\"\n    from audio_separator.separator import Separator\n\n    # Use the persistent model directory\n    model_dir = \"/models\"\n\n    # Ensure the model directory exists\n    os.makedirs(model_dir, exist_ok=True)\n\n    # Use the same approach as the CLI: create separator with info_only=True\n    separator = Separator(info_only=True, model_file_dir=model_dir)\n\n    # Get the simplified model list\n    simplified_models = separator.get_simplified_model_list(filter_sort_by=filter_sort_by)\n\n    return simplified_models\n\n\nweb_app = FastAPI(title=\"Audio Separator API\", description=\"Separate vocals from instrumental tracks using AI\", version=AUDIO_SEPARATOR_VERSION)\n\nweb_app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], allow_credentials=True, allow_methods=[\"*\"], allow_headers=[\"*\"])\n\n\n@web_app.post(\"/separate\")\nasync def separate_audio(\n    file: UploadFile = File(..., description=\"Audio file to separate\"),\n    # Model selection - support both single model and multiple models\n    model: Optional[str] = 
Form(None, description=\"Single model to use for separation (for backwards compatibility)\"),\n    models: Optional[str] = Form(None, description='JSON list of models to use for separation, e.g. [\"model1.ckpt\", \"model2.onnx\"]'),\n    # Output parameters\n    output_format: str = Form(\"flac\", description=\"Output format for separated files\"),\n    output_bitrate: Optional[str] = Form(None, description=\"Output bitrate for separated files\"),\n    normalization_threshold: float = Form(0.9, description=\"Max peak amplitude to normalize audio to\"),\n    amplification_threshold: float = Form(0.0, description=\"Min peak amplitude to amplify audio to\"),\n    output_single_stem: Optional[str] = Form(None, description=\"Output only single stem (e.g. Vocals, Instrumental)\"),\n    invert_using_spec: bool = Form(False, description=\"Invert secondary stem using spectrogram\"),\n    sample_rate: int = Form(44100, description=\"Sample rate of output audio\"),\n    use_soundfile: bool = Form(False, description=\"Use soundfile for output writing\"),\n    use_autocast: bool = Form(False, description=\"Use PyTorch autocast for faster inference\"),\n    custom_output_names: Optional[str] = Form(None, description=\"JSON dict of custom output names\"),\n    # MDX parameters\n    mdx_segment_size: int = Form(256, description=\"MDX segment size\"),\n    mdx_overlap: float = Form(0.25, description=\"MDX overlap\"),\n    mdx_batch_size: int = Form(1, description=\"MDX batch size\"),\n    mdx_hop_length: int = Form(1024, description=\"MDX hop length\"),\n    mdx_enable_denoise: bool = Form(False, description=\"Enable MDX denoising\"),\n    # VR parameters\n    vr_batch_size: int = Form(1, description=\"VR batch size\"),\n    vr_window_size: int = Form(512, description=\"VR window size\"),\n    vr_aggression: int = Form(5, description=\"VR aggression\"),\n    vr_enable_tta: bool = Form(False, description=\"Enable VR Test-Time-Augmentation\"),\n    vr_high_end_process: bool = 
Form(False, description=\"Enable VR high end processing\"),\n    vr_enable_post_process: bool = Form(False, description=\"Enable VR post processing\"),\n    vr_post_process_threshold: float = Form(0.2, description=\"VR post process threshold\"),\n    # Demucs parameters\n    demucs_segment_size: str = Form(\"Default\", description=\"Demucs segment size\"),\n    demucs_shifts: int = Form(2, description=\"Demucs shifts\"),\n    demucs_overlap: float = Form(0.25, description=\"Demucs overlap\"),\n    demucs_segments_enabled: bool = Form(True, description=\"Enable Demucs segments\"),\n    # MDXC parameters\n    mdxc_segment_size: int = Form(256, description=\"MDXC segment size\"),\n    mdxc_override_model_segment_size: bool = Form(False, description=\"Override MDXC model segment size\"),\n    mdxc_overlap: int = Form(8, description=\"MDXC overlap\"),\n    mdxc_batch_size: int = Form(1, description=\"MDXC batch size\"),\n    mdxc_pitch_shift: int = Form(0, description=\"MDXC pitch shift\"),\n) -> dict:\n    \"\"\"\n    Upload an audio file and separate it into stems using one or more models (asynchronous processing)\n    \"\"\"\n    if not file.filename:\n        raise HTTPException(status_code=400, detail=\"No file provided\")\n\n    try:\n        # Parse models parameter\n        models_list = None\n        if models:\n            try:\n                models_list = json.loads(models)\n            except json.JSONDecodeError as e:\n                raise HTTPException(status_code=400, detail=f\"Invalid JSON in models parameter: {e}\") from e\n            # Validate outside the try so a wrong type returns 400 instead of reaching the generic 500 handler\n            if not isinstance(models_list, list):\n                raise HTTPException(status_code=400, detail=\"Models must be a JSON list\")\n        elif model:\n            # Backwards compatibility: single model parameter\n            models_list = [model]\n        # If neither provided, models_list stays None (will use default)\n\n        # Parse 
custom_output_names if provided\n        custom_output_names_dict = None\n        if 
custom_output_names:\n            try:\n                custom_output_names_dict = json.loads(custom_output_names)\n            except json.JSONDecodeError as e:\n                raise HTTPException(status_code=400, detail=f\"Invalid JSON in custom_output_names parameter: {e}\") from e\n            # Validate outside the try so a wrong type returns 400 instead of reaching the generic 500 handler\n            if not isinstance(custom_output_names_dict, dict):\n                raise HTTPException(status_code=400, detail=\"Custom output names must be a JSON object\")\n\n        # Read file data\n        audio_data = await file.read()\n\n        # Generate task ID\n        task_id = str(uuid.uuid4())\n\n        # Create initial status in Modal Dict\n        initial_status = {\n            \"task_id\": task_id,\n            \"status\": \"submitted\",\n            \"progress\": 0,\n            \"original_filename\": file.filename,\n            \"models_used\": models_list or [\"default\"],\n            \"total_models\": len(models_list) if models_list else 1,\n            \"current_model_index\": 0,\n            \"files\": [],\n        }\n\n        # Access Modal Dict by name to ensure proper scope\n        job_status = modal.Dict.from_name(\"audio-separator-job-status\", create_if_missing=True)\n        job_status[task_id] = initial_status\n\n        # Submit job asynchronously with all parameters\n        separate_audio_function.spawn(\n            audio_data,\n            file.filename,\n            models_list,\n            task_id,\n            # Separator parameters\n            output_format,\n            output_bitrate,\n            normalization_threshold,\n            amplification_threshold,\n            output_single_stem,\n            invert_using_spec,\n            sample_rate,\n            use_soundfile,\n            use_autocast,\n            custom_output_names_dict,\n            # MDX parameters\n            mdx_segment_size,\n            mdx_overlap,\n            
mdx_batch_size,\n            mdx_hop_length,\n            mdx_enable_denoise,\n            # VR parameters\n            
vr_batch_size,\n            vr_window_size,\n            vr_aggression,\n            vr_enable_tta,\n            vr_high_end_process,\n            vr_enable_post_process,\n            vr_post_process_threshold,\n            # Demucs parameters\n            demucs_segment_size,\n            demucs_shifts,\n            demucs_overlap,\n            demucs_segments_enabled,\n            # MDXC parameters\n            mdxc_segment_size,\n            mdxc_override_model_segment_size,\n            mdxc_overlap,\n            mdxc_batch_size,\n            mdxc_pitch_shift,\n        )\n\n        return {\n            \"task_id\": task_id,\n            \"status\": \"submitted\",\n            \"message\": \"Job submitted for processing. Use /status/{task_id} to check progress.\",\n            \"models_used\": models_list or [\"default\"],\n            \"total_models\": len(models_list) if models_list else 1,\n            \"original_filename\": file.filename,\n        }\n    except HTTPException:\n        raise\n    except Exception as e:\n        raise HTTPException(status_code=500, detail=f\"Separation failed: {str(e)}\") from e\n\n\n@web_app.get(\"/status/{task_id}\")\nasync def get_job_status(task_id: str) -> dict:\n    \"\"\"\n    Get the status of a separation job\n    \"\"\"\n    try:\n        status_data = get_job_status_function.remote(task_id)\n        return status_data\n    except Exception as e:\n        raise HTTPException(status_code=500, detail=f\"Failed to get job status: {str(e)}\") from e\n\n\n@web_app.get(\"/download/{task_id}/{file_hash}\")\nasync def download_file(task_id: str, file_hash: str) -> Response:\n    \"\"\"\n    Download a separated audio file using its hash identifier\n    \"\"\"\n    try:\n        file_data, actual_filename = get_file_by_hash_function.remote(task_id, file_hash)\n\n        # Detect file type from content\n        detected_type = filetype.guess(file_data)\n\n        if detected_type and detected_type.mime:\n            
content_type = detected_type.mime\n        else:\n            # Log when we can't detect the file type\n            print(f\"WARNING: Could not detect MIME type for {actual_filename}, using generic type\")\n            content_type = \"application/octet-stream\"\n\n        # RFC 5987 encoding for Unicode filenames in Content-Disposition header\n        # Provide ASCII fallback for older clients, and UTF-8 encoded filename for modern clients\n        ascii_filename = \"\".join(c if ord(c) < 128 else \"_\" for c in actual_filename)\n        encoded_filename = quote(actual_filename, safe=\"\")\n        content_disposition = f\"attachment; filename=\\\"{ascii_filename}\\\"; filename*=UTF-8''{encoded_filename}\"\n\n        return Response(content=file_data, media_type=content_type, headers={\"Content-Disposition\": content_disposition})\n\n    except FileNotFoundError as exc:\n        raise HTTPException(status_code=404, detail=\"File not found\") from exc\n    except Exception as e:\n        raise HTTPException(status_code=500, detail=f\"Download failed: {str(e)}\") from e\n\n\n@web_app.get(\"/models-json\")\nasync def get_available_models() -> PrettyJSONResponse:\n    \"\"\"\n    Get list of available separation models\n    \"\"\"\n    models = list_available_models.remote()\n\n    # Return pretty-printed JSON for better readability\n    return PrettyJSONResponse(content=models)\n\n\n@web_app.get(\"/models\")\nasync def get_simplified_models_list(filter_sort_by: str = None) -> PlainTextResponse:\n    \"\"\"\n    Get simplified model list in plain text format (like CLI --list_models)\n    \"\"\"\n    models = get_simplified_models.remote(filter_sort_by=filter_sort_by)\n\n    if not models:\n        return PlainTextResponse(\"No models found\")\n\n    # Calculate maximum widths for each column\n    filename_width = max(len(\"Model Filename\"), max(len(filename) for filename in models.keys()))\n    arch_width = max(len(\"Arch\"), max(len(info[\"Type\"]) for info in 
models.values()))\n    stems_width = max(len(\"Output Stems (SDR)\"), max(len(\", \".join(info[\"Stems\"])) for info in models.values()))\n    name_width = max(len(\"Friendly Name\"), max(len(info[\"Name\"]) for info in models.values()))\n\n    # Calculate total width for separator line\n    total_width = filename_width + arch_width + stems_width + name_width + 15  # 15 accounts for spacing between columns\n\n    # Format the output with dynamic widths and extra spacing\n    output_lines = []\n    output_lines.append(\"-\" * total_width)\n    output_lines.append(f\"{'Model Filename':<{filename_width}}  {'Arch':<{arch_width}}  {'Output Stems (SDR)':<{stems_width}}  {'Friendly Name'}\")\n    output_lines.append(\"-\" * total_width)\n\n    for filename, info in models.items():\n        stems = \", \".join(info[\"Stems\"])\n        output_lines.append(f\"{filename:<{filename_width}}  {info['Type']:<{arch_width}}  {stems:<{stems_width}}  {info['Name']}\")\n\n    return PlainTextResponse(\"\\n\".join(output_lines))\n\n\n@web_app.get(\"/health\")\nasync def health_check() -> dict:\n    \"\"\"\n    Health check endpoint\n    \"\"\"\n    return {\"status\": \"healthy\", \"service\": \"audio-separator-api\", \"version\": AUDIO_SEPARATOR_VERSION}\n\n\n@web_app.get(\"/\")\nasync def root() -> dict:\n    \"\"\"\n    Root endpoint with API information\n    \"\"\"\n    return {\n        \"message\": \"Audio Separator API\",\n        \"version\": AUDIO_SEPARATOR_VERSION,\n        \"description\": (\"Separate vocals from instrumental tracks using AI - \" \"supports all formats and parameters that audio-separator CLI supports\"),\n        \"features\": [\n            \"Multiple model processing in single job\",\n            \"Full separator parameter compatibility\",\n            \"Asynchronous processing with progress tracking\",\n            \"All MDX, VR, Demucs, and MDXC architectures supported\",\n            \"Custom output naming and format options\",\n        ],\n        
\"endpoints\": {\n            \"POST /separate\": \"Upload and separate audio file (supports multiple models and all separator parameters)\",\n            \"GET /status/{task_id}\": \"Get job status and progress (includes model-specific progress and file hashes)\",\n            \"GET /download/{task_id}/{file_hash}\": \"Download separated file using hash identifier (avoids URL length limits)\",\n            \"GET /models-json\": \"List available models (JSON format)\",\n            \"GET /models\": \"List available models (plain text format like CLI --list_models)\",\n            \"GET /health\": \"Health check\",\n        },\n        \"note\": (\"Full-featured wrapper around audio-separator with complete parameter compatibility\"),\n        \"remote_cli\": {\n            \"install\": \"pip install audio-separator\",\n            \"setup\": 'export AUDIO_SEPARATOR_API_URL=\"https://your-deployment-url.modal.run\"',\n            \"usage\": [\n                \"audio-separator-remote separate song.mp3\",\n                \"audio-separator-remote separate song.mp3 --model UVR-MDX-NET-Inst_HQ_4\",\n                \"audio-separator-remote separate song.mp3 --models model1.ckpt model2.onnx\",\n                'audio-separator-remote separate song.mp3 --custom_output_names \\'{\"Vocals\": \"lead_vocals\"}\\'',\n                \"audio-separator-remote status <task_id>\",\n                \"audio-separator-remote models --filter vocals\",\n            ],\n        },\n    }\n\n\n@app.function(image=image, timeout=600, scaledown_window=300, volumes={\"/storage\": volume})\n@modal.asgi_app()\ndef api() -> FastAPI:\n    \"\"\"\n    Deploy the FastAPI app as a Modal ASGI application\n    \"\"\"\n    return web_app\n"
  },
  {
    "path": "audio_separator/remote/job_store.py",
    "content": "\"\"\"Firestore-backed job status store for audio separation jobs.\n\nReplaces the in-memory dict so any Cloud Run instance can read/write job status.\n\"\"\"\nimport logging\nimport time\nfrom typing import Optional\n\nlogger = logging.getLogger(\"audio-separator-api\")\n\nCOLLECTION = \"audio_separation_jobs\"\n\n\nclass FirestoreJobStore:\n    \"\"\"Job status store backed by Firestore.\n\n    Provides dict-like get/set interface for job status documents.\n    \"\"\"\n\n    def __init__(self, project: str = \"nomadkaraoke\"):\n        from google.cloud import firestore\n\n        self._firestore = firestore\n        self._db = firestore.Client(project=project)\n        self._collection = self._db.collection(COLLECTION)\n\n    def set(self, task_id: str, data: dict) -> None:\n        \"\"\"Create or overwrite a job status document.\"\"\"\n        data = {**data, \"updated_at\": self._firestore.SERVER_TIMESTAMP}\n        if \"created_at\" not in data:\n            data[\"created_at\"] = self._firestore.SERVER_TIMESTAMP\n        self._collection.document(task_id).set(data)\n\n    def get(self, task_id: str) -> Optional[dict]:\n        \"\"\"Get job status. 
Returns None if not found.\"\"\"\n        doc = self._collection.document(task_id).get()\n        if doc.exists:\n            return doc.to_dict()\n        return None\n\n    def update(self, task_id: str, fields: dict) -> None:\n        \"\"\"Merge fields into an existing document.\"\"\"\n        fields = {**fields, \"updated_at\": self._firestore.SERVER_TIMESTAMP}\n        self._collection.document(task_id).update(fields)\n\n    def delete(self, task_id: str) -> None:\n        \"\"\"Delete a job status document.\"\"\"\n        self._collection.document(task_id).delete()\n\n    def __contains__(self, task_id: str) -> bool:\n        \"\"\"Check if a task exists.\"\"\"\n        doc = self._collection.document(task_id).get()\n        return doc.exists\n\n    def cleanup_old_jobs(self, max_age_seconds: int = 3600) -> int:\n        \"\"\"Delete completed/errored jobs older than max_age_seconds. Returns count deleted.\"\"\"\n        cutoff = time.time() - max_age_seconds\n        from datetime import datetime, timezone\n        cutoff_dt = datetime.fromtimestamp(cutoff, tz=timezone.utc)\n\n        deleted = 0\n        query = (\n            self._collection\n            .where(\"status\", \"in\", [\"completed\", \"error\"])\n            .where(\"updated_at\", \"<\", cutoff_dt)\n        )\n        for doc in query.stream():\n            doc.reference.delete()\n            deleted += 1\n\n        if deleted:\n            logger.info(f\"Cleaned up {deleted} old job(s) from Firestore\")\n        return deleted\n"
  },
  {
    "path": "audio_separator/remote/output_store.py",
    "content": "\"\"\"GCS-backed output file store for audio separation results.\n\nUploads separation output files to GCS so any Cloud Run instance can serve downloads.\n\"\"\"\nimport logging\nimport os\n\nlogger = logging.getLogger(\"audio-separator-api\")\n\n\nclass GCSOutputStore:\n    \"\"\"Manages separation output files in GCS.\"\"\"\n\n    def __init__(self, bucket_name: str = \"nomadkaraoke-audio-separator-outputs\", project: str = \"nomadkaraoke\"):\n        from google.cloud import storage\n\n        self._client = storage.Client(project=project)\n        self._bucket = self._client.bucket(bucket_name)\n\n    def upload_task_outputs(self, task_id: str, local_dir: str) -> list[str]:\n        \"\"\"Upload all files in local_dir to GCS under {task_id}/ prefix.\n\n        Returns list of uploaded filenames.\n        \"\"\"\n        uploaded = []\n        for filename in os.listdir(local_dir):\n            local_path = os.path.join(local_dir, filename)\n            if not os.path.isfile(local_path):\n                continue\n            gcs_path = f\"{task_id}/{filename}\"\n            blob = self._bucket.blob(gcs_path)\n            blob.upload_from_filename(local_path)\n            uploaded.append(filename)\n            logger.info(f\"Uploaded {filename} to gs://{self._bucket.name}/{gcs_path}\")\n        return uploaded\n\n    def get_file_bytes(self, task_id: str, filename: str) -> bytes:\n        \"\"\"Download file content as bytes (for HTTP responses).\"\"\"\n        gcs_path = f\"{task_id}/{filename}\"\n        blob = self._bucket.blob(gcs_path)\n        return blob.download_as_bytes()\n\n    def download_file(self, task_id: str, filename: str, local_path: str) -> str:\n        \"\"\"Download a file from GCS to a local path.\"\"\"\n        gcs_path = f\"{task_id}/{filename}\"\n        blob = self._bucket.blob(gcs_path)\n        blob.download_to_filename(local_path)\n        return local_path\n\n    def delete_task_outputs(self, task_id: str) -> int:\n 
       \"\"\"Delete all output files for a task. Returns count deleted.\"\"\"\n        deleted = 0\n        for blob in self._bucket.list_blobs(prefix=f\"{task_id}/\"):\n            blob.delete()\n            deleted += 1\n        if deleted:\n            logger.info(f\"Deleted {deleted} output file(s) for task {task_id}\")\n        return deleted\n"
  },
  {
    "path": "audio_separator/remote/requirements.txt",
    "content": "modal\n"
  },
  {
    "path": "audio_separator/separator/__init__.py",
    "content": "from .separator import Separator\n"
  },
  {
    "path": "audio_separator/separator/architectures/__init__.py",
    "content": ""
  },
  {
    "path": "audio_separator/separator/architectures/demucs_separator.py",
    "content": "import os\nimport sys\nfrom pathlib import Path\nimport torch\nimport numpy as np\nfrom audio_separator.separator.common_separator import CommonSeparator\nfrom audio_separator.separator.uvr_lib_v5.demucs.apply import apply_model, demucs_segments\nfrom audio_separator.separator.uvr_lib_v5.demucs.hdemucs import HDemucs\nfrom audio_separator.separator.uvr_lib_v5.demucs.pretrained import get_model as get_demucs_model\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\n\nDEMUCS_4_SOURCE = [\"drums\", \"bass\", \"other\", \"vocals\"]\n\nDEMUCS_2_SOURCE_MAPPER = {CommonSeparator.INST_STEM: 0, CommonSeparator.VOCAL_STEM: 1}\nDEMUCS_4_SOURCE_MAPPER = {CommonSeparator.BASS_STEM: 0, CommonSeparator.DRUM_STEM: 1, CommonSeparator.OTHER_STEM: 2, CommonSeparator.VOCAL_STEM: 3}\nDEMUCS_6_SOURCE_MAPPER = {\n    CommonSeparator.BASS_STEM: 0,\n    CommonSeparator.DRUM_STEM: 1,\n    CommonSeparator.OTHER_STEM: 2,\n    CommonSeparator.VOCAL_STEM: 3,\n    CommonSeparator.GUITAR_STEM: 4,\n    CommonSeparator.PIANO_STEM: 5,\n}\n\n\nclass DemucsSeparator(CommonSeparator):\n    \"\"\"\n    DemucsSeparator is responsible for separating audio sources using Demucs models.\n    It initializes with configuration parameters and prepares the model for separation tasks.\n    \"\"\"\n\n    def __init__(self, common_config, arch_config):\n        # Any configuration values which can be shared between architectures should be set already in CommonSeparator,\n        # e.g. 
user-specified functionality choices (self.output_single_stem) or common model parameters (self.primary_stem_name)\n        super().__init__(config=common_config)\n\n        # Initializing user-configurable parameters, passed through from the CLI or Separator instance\n\n        # Adjust segments to manage RAM or VRAM usage:\n        # - Smaller sizes consume fewer resources.\n        # - Bigger sizes consume more resources, but may provide better results.\n        # - \"Default\" picks the optimal size.\n        # DEMUCS_SEGMENTS = (DEF_OPT, '1', '5', '10', '15', '20',\n        #           '25', '30', '35', '40', '45', '50',\n        #           '55', '60', '65', '70', '75', '80',\n        #           '85', '90', '95', '100')\n        self.segment_size = arch_config.get(\"segment_size\", \"Default\")\n\n        # Performs multiple predictions with random shifts of the input and averages them.\n        # The higher the number of shifts, the longer the prediction will take.\n        # Not recommended unless you have a GPU.\n        # DEMUCS_SHIFTS = (0, 1, 2, 3, 4, 5,\n        #                 6, 7, 8, 9, 10, 11,\n        #                 12, 13, 14, 15, 16, 17,\n        #                 18, 19, 20)\n        self.shifts = arch_config.get(\"shifts\", 2)\n\n        # This option controls the amount of overlap between prediction windows.\n        #  - Higher values can provide better results, but will lead to longer processing times.\n        #  - You can choose between 0.001-0.999\n        # DEMUCS_OVERLAP = (0.25, 0.50, 0.75, 0.99)\n        self.overlap = arch_config.get(\"overlap\", 0.25)\n\n        # Enables \"Segments\". 
Deselecting this option is only recommended for those with powerful PCs.\n        self.segments_enabled = arch_config.get(\"segments_enabled\", True)\n\n        self.logger.debug(f\"Demucs arch params: segment_size={self.segment_size}, segments_enabled={self.segments_enabled}\")\n        self.logger.debug(f\"Demucs arch params: shifts={self.shifts}, overlap={self.overlap}\")\n\n        self.demucs_source_map = DEMUCS_4_SOURCE_MAPPER\n\n        self.audio_file_path = None\n        self.audio_file_base = None\n        self.demucs_model_instance = None\n\n        # Add uvr_lib_v5 folder to system path so pytorch serialization can find the demucs module\n        current_dir = os.path.dirname(__file__)\n        uvr_lib_v5_path = os.path.join(current_dir, \"..\", \"uvr_lib_v5\")\n        sys.path.insert(0, uvr_lib_v5_path)\n\n        self.logger.info(\"Demucs Separator initialisation complete\")\n\n    def separate(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Separates the audio file into its component stems using the Demucs model.\n\n        Args:\n            audio_file_path (str): The path to the audio file to be processed.\n            custom_output_names (dict, optional): Custom names for the output files. Defaults to None.\n\n        Returns:\n            list: A list of paths to the output files generated by the separation process.\n        \"\"\"\n        self.logger.debug(\"Starting separation process...\")\n        source = None\n        stem_source = None\n        inst_source = {}\n\n        self.audio_file_path = audio_file_path\n        self.audio_file_base = os.path.splitext(os.path.basename(audio_file_path))[0]\n\n        # Prepare the mix for processing\n        self.logger.debug(\"Preparing mix...\")\n        mix = self.prepare_mix(self.audio_file_path)\n\n        self.logger.debug(f\"Mix prepared for demixing. 
Shape: {mix.shape}\")\n\n        self.logger.debug(\"Loading model for demixing...\")\n\n        self.demucs_model_instance = HDemucs(sources=DEMUCS_4_SOURCE)\n        self.demucs_model_instance = get_demucs_model(name=os.path.splitext(os.path.basename(self.model_path))[0], repo=Path(os.path.dirname(self.model_path)))\n        self.demucs_model_instance = demucs_segments(self.segment_size, self.demucs_model_instance)\n        self.demucs_model_instance.to(self.torch_device)\n        self.demucs_model_instance.eval()\n\n        self.logger.debug(\"Model loaded and set to evaluation mode.\")\n\n        source = self.demix_demucs(mix)\n\n        del self.demucs_model_instance\n        self.clear_gpu_cache()\n        self.logger.debug(\"Model and GPU cache cleared after demixing.\")\n\n        output_files = []\n        self.logger.debug(\"Processing output files...\")\n\n        if isinstance(inst_source, np.ndarray):\n            self.logger.debug(\"Processing instance source...\")\n            source_reshape = spec_utils.reshape_sources(inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]], source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]])\n            inst_source[self.demucs_source_map[CommonSeparator.VOCAL_STEM]] = source_reshape\n            source = inst_source\n\n        if isinstance(source, np.ndarray):\n            source_length = len(source)\n            self.logger.debug(f\"Processing source array, source length is {source_length}\")\n            match source_length:\n                case 2:\n                    self.logger.debug(\"Setting source map to 2-stem...\")\n                    self.demucs_source_map = DEMUCS_2_SOURCE_MAPPER\n                case 6:\n                    self.logger.debug(\"Setting source map to 6-stem...\")\n                    self.demucs_source_map = DEMUCS_6_SOURCE_MAPPER\n                case _:\n                    self.logger.debug(\"Setting source map to 4-stem...\")\n                    
self.demucs_source_map = DEMUCS_4_SOURCE_MAPPER\n\n        self.logger.debug(\"Processing for all stems...\")\n        for stem_name, stem_value in self.demucs_source_map.items():\n            if self.output_single_stem is not None:\n                if stem_name.lower() != self.output_single_stem.lower():\n                    self.logger.debug(f\"Skipping writing stem {stem_name} as output_single_stem is set to {self.output_single_stem}...\")\n                    continue\n\n            stem_path = self.get_stem_output_path(stem_name, custom_output_names)\n            stem_source = source[stem_value].T\n\n            self.final_process(stem_path, stem_source, stem_name)\n            output_files.append(stem_path)\n\n        return output_files\n\n    def demix_demucs(self, mix):\n        \"\"\"\n        Demixes the input mix using the demucs model.\n        \"\"\"\n        self.logger.debug(\"Starting demixing process in demix_demucs...\")\n\n        processed = {}\n        mix = torch.tensor(mix, dtype=torch.float32)\n        ref = mix.mean(0)\n        mix = (mix - ref.mean()) / ref.std()\n        mix_infer = mix\n\n        with torch.no_grad():\n            self.logger.debug(\"Running model inference...\")\n            sources = apply_model(\n                model=self.demucs_model_instance,\n                mix=mix_infer[None],\n                shifts=self.shifts,\n                split=self.segments_enabled,\n                overlap=self.overlap,\n                static_shifts=1 if self.shifts == 0 else self.shifts,\n                set_progress_bar=None,\n                device=self.torch_device,\n                progress=True,\n            )[0]\n\n        sources = (sources * ref.std() + ref.mean()).cpu().numpy()\n        sources[[0, 1]] = sources[[1, 0]]\n        processed[mix] = sources[:, :, 0:None].copy()\n        sources = list(processed.values())\n        sources = [s[:, :, 0:None] for s in sources]\n        sources = np.concatenate(sources, 
axis=-1)\n\n        return sources\n"
  },
  {
    "path": "audio_separator/separator/architectures/mdx_separator.py",
    "content": "\"\"\"Module for separating audio sources using MDX architecture models.\"\"\"\n\nimport os\nimport platform\nimport torch\nimport onnx\nimport onnxruntime as ort\nimport numpy as np\nimport onnx2torch\nfrom tqdm import tqdm\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\nfrom audio_separator.separator.uvr_lib_v5.stft import STFT\nfrom audio_separator.separator.common_separator import CommonSeparator\n\n\nclass MDXSeparator(CommonSeparator):\n    \"\"\"\n    MDXSeparator is responsible for separating audio sources using MDX models.\n    It initializes with configuration parameters and prepares the model for separation tasks.\n    \"\"\"\n\n    def __init__(self, common_config, arch_config):\n        # Any configuration values which can be shared between architectures should be set already in CommonSeparator,\n        # e.g. user-specified functionality choices (self.output_single_stem) or common model parameters (self.primary_stem_name)\n        super().__init__(config=common_config)\n\n        # Initializing user-configurable parameters, passed through from the CLI or Separator instance\n\n        # Pick a segment size to balance speed, resource use, and quality:\n        # - Smaller sizes consume fewer resources.\n        # - Bigger sizes consume more resources, but may provide better results.\n        # - Default size is 256. 
Quality can change based on your pick.\n        self.segment_size = arch_config.get(\"segment_size\")\n\n        # This option controls the amount of overlap between prediction windows.\n        #  - Higher values can provide better results, but will lead to longer processing times.\n        #  - For Non-MDX23C models: You can choose between 0.001-0.999\n        self.overlap = arch_config.get(\"overlap\")\n\n        # Number of batches to be processed at a time.\n        # - Higher values mean more RAM usage but slightly faster processing times.\n        # - Lower values mean less RAM usage but slightly longer processing times.\n        # - Batch size value has no effect on output quality.\n        # BATCH_SIZE = ('1', '2', '3', '4', '5', '6', '7', '8', '9', '10')\n        self.batch_size = arch_config.get(\"batch_size\", 1)\n\n        # hop_length is the number of audio samples between successive STFT frames, i.e. the \"stride\" of the\n        # sliding analysis window (analogous to the stride of a filter in a convolutional neural network).\n        # The choice of hop_length affects processing in several ways:\n        # - Time resolution: a smaller hop produces more, more heavily overlapping frames and finer time resolution.\n        # - Computational efficiency: a larger hop produces fewer frames, which reduces the computational load.\n        # - Detail: a larger hop trades fine temporal detail for speed, which can be acceptable when the 
model needs to capture more global features rather than focusing on finer details.\n        self.hop_length = arch_config.get(\"hop_length\")\n\n        # If enabled, the model will be run twice to reduce noise in the output audio.\n        self.enable_denoise = arch_config.get(\"enable_denoise\")\n\n        self.logger.debug(f\"MDX arch params: batch_size={self.batch_size}, segment_size={self.segment_size}\")\n        self.logger.debug(f\"MDX arch params: overlap={self.overlap}, hop_length={self.hop_length}, enable_denoise={self.enable_denoise}\")\n\n        # Initializing model-specific parameters from model_data JSON\n        self.compensate = self.model_data[\"compensate\"]\n        self.dim_f = self.model_data[\"mdx_dim_f_set\"]\n        self.dim_t = 2 ** self.model_data[\"mdx_dim_t_set\"]\n        self.n_fft = self.model_data[\"mdx_n_fft_scale_set\"]\n        self.config_yaml = self.model_data.get(\"config_yaml\", None)\n\n        self.logger.debug(f\"MDX arch params: compensate={self.compensate}, dim_f={self.dim_f}, dim_t={self.dim_t}, n_fft={self.n_fft}\")\n        self.logger.debug(f\"MDX arch params: config_yaml={self.config_yaml}\")\n\n        # In UVR, these variables are set but either aren't useful or are better handled in audio-separator.\n        # These comments are kept to help future developers understand why these aren't in audio-separator.\n\n        # \"chunks\" is not actually used for anything in UVR...\n        # self.chunks = 0\n\n        # \"adjust\" is hard-coded to 1 in UVR, and only used as a multiplier in run_model, so it does nothing.\n        # self.adjust = 1\n\n        # \"hop\" is hard-coded to 1024 in UVR. We have a \"hop_length\" parameter instead.\n        # self.hop = 1024\n\n        # \"margin\" maps to sample rate and is set from the GUI in UVR (default: 44100). 
We have a \"sample_rate\" parameter instead.\n        # self.margin = 44100\n\n        # \"dim_c\" is hard-coded to 4 in UVR, seems to be a parameter for the number of channels, and is only used for checkpoint models.\n        # We haven't implemented support for the checkpoint models here, so we're not using it.\n        # self.dim_c = 4\n\n        self.load_model()\n\n        self.n_bins = 0\n        self.trim = 0\n        self.chunk_size = 0\n        self.gen_size = 0\n        self.stft = None\n\n        self.primary_source = None\n        self.secondary_source = None\n        self.audio_file_path = None\n        self.audio_file_base = None\n\n    def load_model(self):\n        \"\"\"\n        Load the model into memory from file on disk, initialize it with config from the model data,\n        and prepare for inferencing using hardware accelerated Torch device.\n        \"\"\"\n        self.logger.debug(\"Loading ONNX model for inference...\")\n\n        if self.segment_size == self.dim_t:\n            ort_session_options = ort.SessionOptions()\n            if self.log_level > 10:\n                ort_session_options.log_severity_level = 3\n            else:\n                ort_session_options.log_severity_level = 0\n\n            ort_inference_session = ort.InferenceSession(self.model_path, providers=self.onnx_execution_provider, sess_options=ort_session_options)\n            self.model_run = lambda spek: ort_inference_session.run(None, {\"input\": spek.cpu().numpy()})[0]\n            self.logger.debug(\"Model loaded successfully using ONNXruntime inferencing session.\")\n        else:\n            if platform.system() == 'Windows':\n                onnx_model = onnx.load(self.model_path)\n                self.model_run = onnx2torch.convert(onnx_model)\n            else:\n                self.model_run = onnx2torch.convert(self.model_path)\n   \n            self.model_run.to(self.torch_device).eval()\n            self.logger.warning(\"Model converted from onnx 
to pytorch due to segment size not matching dim_t, processing may be slower.\")\n\n    def separate(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Separates the audio file into primary and secondary sources based on the model's configuration.\n        It processes the mix, demixes it into sources, normalizes the sources, and saves the output files.\n\n        Args:\n            audio_file_path (str): The path to the audio file to be processed.\n            custom_output_names (dict, optional): Custom names for the output files. Defaults to None.\n\n        Returns:\n            list: A list of paths to the output files generated by the separation process.\n        \"\"\"\n        self.audio_file_path = audio_file_path\n        self.audio_file_base = os.path.splitext(os.path.basename(audio_file_path))[0]\n\n        # Prepare the mix for processing\n        self.logger.debug(f\"Preparing mix for input audio file {self.audio_file_path}...\")\n        mix = self.prepare_mix(self.audio_file_path)\n\n        self.logger.debug(\"Normalizing mix before demixing...\")\n        peak = np.abs(mix).max()\n        mix = spec_utils.normalize(wave=mix, max_peak=self.normalization_threshold, min_peak=self.amplification_threshold)\n\n        # Start the demixing process\n        source = self.demix(mix) * peak\n        self.logger.debug(\"Demixing completed.\")\n\n\n        if not isinstance(self.primary_source, np.ndarray):\n            self.primary_source = source.T\n\n        # In UVR, the source is cached here if it's a vocal split model, but we're not supporting that yet\n\n        # Initialize the list for output files\n        output_files = []\n        self.logger.debug(\"Processing output files...\")\n\n        # Process the secondary source if not already an array\n        if not isinstance(self.secondary_source, np.ndarray):\n            self.logger.debug(\"Producing secondary source: demixing in match_mix mode\")\n            raw_mix = 
self.demix(mix, is_match_mix=True)\n\n            if self.invert_using_spec:\n                self.logger.debug(\"Inverting secondary stem using spectrogram as invert_using_spec is set to True\")\n                self.secondary_source = spec_utils.invert_stem(raw_mix, self.primary_source * self.compensate)\n            else:\n                self.logger.debug(\"Inverting secondary stem by subtracting the transposed demixed stem from the transposed original mix\")\n                self.secondary_source = (-self.primary_source * self.compensate) + mix.T\n\n        # Save and process the secondary stem if needed\n        if not self.output_single_stem or self.output_single_stem.lower() == self.secondary_stem_name.lower():\n            self.secondary_stem_output_path = self.get_stem_output_path(self.secondary_stem_name, custom_output_names)\n\n            self.logger.info(f\"Saving {self.secondary_stem_name} stem to {self.secondary_stem_output_path}...\")\n            self.final_process(self.secondary_stem_output_path, self.secondary_source, self.secondary_stem_name)\n            output_files.append(self.secondary_stem_output_path)\n\n        # Save and process the primary stem if needed\n        if not self.output_single_stem or self.output_single_stem.lower() == self.primary_stem_name.lower():\n            self.primary_stem_output_path = self.get_stem_output_path(self.primary_stem_name, custom_output_names)\n            self.logger.info(f\"Saving {self.primary_stem_name} stem to {self.primary_stem_output_path}...\")\n            self.final_process(self.primary_stem_output_path, self.primary_source, self.primary_stem_name)\n            output_files.append(self.primary_stem_output_path)\n\n        # Not yet implemented from UVR features:\n        # self.process_vocal_split_chain(secondary_sources)\n        # self.logger.debug(\"Vocal split chain processed.\")\n\n        return output_files\n\n    def initialize_model_settings(self):\n        \"\"\"\n        This function 
sets up the necessary parameters for the model, like the number of frequency bins (n_bins), the trimming size (trim),\n        the size of each audio chunk (chunk_size), and the window function for spectral transformations (window).\n        It ensures that the model is configured with the correct settings for processing the audio data.\n        \"\"\"\n        self.logger.debug(\"Initializing model settings...\")\n\n        # n_bins is half the FFT size plus one (self.n_fft // 2 + 1).\n        self.n_bins = self.n_fft // 2 + 1\n\n        # trim is half the FFT size (self.n_fft // 2).\n        self.trim = self.n_fft // 2\n\n        # chunk_size is the hop_length size times the segment size minus one\n        self.chunk_size = self.hop_length * (self.segment_size - 1)\n\n        # gen_size is the chunk size minus twice the trim size\n        self.gen_size = self.chunk_size - 2 * self.trim\n\n        self.stft = STFT(self.logger, self.n_fft, self.hop_length, self.dim_f, self.torch_device)\n\n        self.logger.debug(f\"Model input params: n_fft={self.n_fft} hop_length={self.hop_length} dim_f={self.dim_f}\")\n        self.logger.debug(f\"Model settings: n_bins={self.n_bins}, trim={self.trim}, chunk_size={self.chunk_size}, gen_size={self.gen_size}\")\n\n    def initialize_mix(self, mix, is_ckpt=False):\n        \"\"\"\n        After prepare_mix segments the audio, initialize_mix further processes each segment.\n        It ensures each audio segment is in the correct format for the model, applies necessary padding,\n        and converts the segments into tensors for processing with the model.\n        This step is essential for preparing the audio data in a format that the neural network can process.\n        \"\"\"\n        # Log the initialization of the mix and whether checkpoint mode is used\n        self.logger.debug(f\"Initializing mix with is_ckpt={is_ckpt}. 
Initial mix shape: {mix.shape}\")\n\n        # Ensure the mix is a 2-channel (stereo) audio signal\n        if mix.shape[0] != 2:\n            error_message = f\"Expected a 2-channel audio signal, but got {mix.shape[0]} channels\"\n            self.logger.error(error_message)\n            raise ValueError(error_message)\n\n        # If in checkpoint mode, process the mix differently\n        if is_ckpt:\n            self.logger.debug(\"Processing in checkpoint mode...\")\n            # Calculate padding based on the generation size and trim\n            pad = self.gen_size + self.trim - (mix.shape[-1] % self.gen_size)\n            self.logger.debug(f\"Padding calculated: {pad}\")\n            # Add padding at the beginning and the end of the mix\n            mixture = np.concatenate(\n                (\n                    np.zeros((2, self.trim), dtype=\"float32\"),  # Pad at the start\n                    mix,\n                    np.zeros((2, pad), dtype=\"float32\"),        # Pad in the middle (to match chunk size)\n                    np.zeros((2, self.trim), dtype=\"float32\"),  # Pad at the end\n                ),\n                1\n            )\n            # Determine the number of chunks based on the mixture's length\n            num_chunks = mixture.shape[-1] // self.gen_size\n            self.logger.debug(f\"Mixture shape after padding: {mixture.shape}, Number of chunks: {num_chunks}\")\n            # Split the mixture into chunks\n            mix_waves = [mixture[:, i * self.gen_size : i * self.gen_size + self.chunk_size] for i in range(num_chunks)]\n        else:\n            # If not in checkpoint mode, process normally\n            self.logger.debug(\"Processing in non-checkpoint mode...\")\n            mix_waves = []\n            n_sample = mix.shape[1]\n            # Calculate necessary padding to make the total length divisible by the generation size\n            pad = self.gen_size - n_sample % self.gen_size\n            
self.logger.debug(f\"Number of samples: {n_sample}, Padding calculated: {pad}\")\n            # Apply padding to the mix\n            mix_p = np.concatenate((np.zeros((2, self.trim)), mix, np.zeros((2, pad)), np.zeros((2, self.trim))), 1)\n            self.logger.debug(f\"Shape of mix after padding: {mix_p.shape}\")\n\n            # Process the mix in chunks\n            i = 0\n            while i < n_sample + pad:\n                waves = np.array(mix_p[:, i : i + self.chunk_size])\n                mix_waves.append(waves)\n                self.logger.debug(f\"Processed chunk {len(mix_waves)}: Start {i}, End {i + self.chunk_size}\")\n                i += self.gen_size\n\n        # Convert the list of wave chunks into a tensor for processing on the specified device\n        mix_waves_tensor = torch.tensor(mix_waves, dtype=torch.float32).to(self.torch_device)\n        self.logger.debug(f\"Converted mix_waves to tensor. Tensor shape: {mix_waves_tensor.shape}\")\n\n        return mix_waves_tensor, pad\n\n    def demix(self, mix, is_match_mix=False):\n        \"\"\"\n        Demixes the input mix into its constituent sources. If is_match_mix is True, the function adjusts the processing\n        to better match the mix, affecting chunk sizes and overlaps. The demixing process involves padding the mix,\n        processing it in chunks, applying windowing for overlaps, and accumulating the results to separate the sources.\n        \"\"\"\n        self.logger.debug(f\"Starting demixing process with is_match_mix: {is_match_mix}...\")\n        self.initialize_model_settings()\n\n        # Preserves the original mix for later use.\n        # In UVR, this is used for the pitch fix and VR denoise processes, which aren't yet implemented here.\n        org_mix = mix\n        self.logger.debug(f\"Original mix stored. 
Shape: {org_mix.shape}\")\n\n        # Initializes a list to store the separated waveforms.\n        tar_waves_ = []\n\n        # Handling different chunk sizes and overlaps based on the matching requirement.\n        if is_match_mix:\n            # Sets a smaller chunk size specifically for matching the mix.\n            chunk_size = self.hop_length * (self.segment_size - 1)\n            # Sets a small overlap for the chunks.\n            overlap = 0.02\n            self.logger.debug(f\"Chunk size for matching mix: {chunk_size}, Overlap: {overlap}\")\n        else:\n            # Uses the regular chunk size defined in model settings.\n            chunk_size = self.chunk_size\n            # Uses the overlap specified in the model settings.\n            overlap = self.overlap\n            self.logger.debug(f\"Standard chunk size: {chunk_size}, Overlap: {overlap}\")\n\n        # Calculates the generated size after subtracting the trim from both ends of the chunk.\n        gen_size = chunk_size - 2 * self.trim\n        self.logger.debug(f\"Generated size calculated: {gen_size}\")\n\n        # Calculates padding to make the mix length a multiple of the generated size.\n        pad = gen_size + self.trim - ((mix.shape[-1]) % gen_size)\n        # Prepares the mixture with padding at the beginning and the end.\n        mixture = np.concatenate((np.zeros((2, self.trim), dtype=\"float32\"), mix, np.zeros((2, pad), dtype=\"float32\")), 1)\n        self.logger.debug(f\"Mixture prepared with padding. 
Mixture shape: {mixture.shape}\")\n\n        # Calculates the step size for processing chunks based on the overlap.\n        step = int((1 - overlap) * chunk_size)\n        self.logger.debug(f\"Step size for processing chunks: {step} as overlap is set to {overlap}.\")\n\n        # Initializes arrays to store the results and to account for overlap.\n        result = np.zeros((1, 2, mixture.shape[-1]), dtype=np.float32)\n        divider = np.zeros((1, 2, mixture.shape[-1]), dtype=np.float32)\n\n        # Initializes counters for processing chunks.\n        total = 0\n        total_chunks = (mixture.shape[-1] + step - 1) // step\n        self.logger.debug(f\"Total chunks to process: {total_chunks}\")\n\n        # Processes each chunk of the mixture.\n        for i in tqdm(range(0, mixture.shape[-1], step)):\n            total += 1\n            start = i\n            end = min(i + chunk_size, mixture.shape[-1])\n            self.logger.debug(f\"Processing chunk {total}/{total_chunks}: Start {start}, End {end}\")\n\n            # Handles windowing for overlapping chunks.\n            chunk_size_actual = end - start\n            window = None\n            if overlap != 0:\n                window = np.hanning(chunk_size_actual)\n                window = np.tile(window[None, None, :], (1, 2, 1))\n                self.logger.debug(\"Window applied to the chunk.\")\n\n            # Zero-pad the chunk to prepare it for processing.\n            mix_part_ = mixture[:, start:end]\n            if end != i + chunk_size:\n                pad_size = (i + chunk_size) - end\n                mix_part_ = np.concatenate((mix_part_, np.zeros((2, pad_size), dtype=\"float32\")), axis=-1)\n\n            # Converts the chunk to a tensor for processing.\n            mix_part = torch.tensor([mix_part_], dtype=torch.float32).to(self.torch_device)\n            # Splits the chunk into smaller batches if necessary.\n            mix_waves = mix_part.split(self.batch_size)\n            total_batches 
= len(mix_waves)\n            self.logger.debug(f\"Mix part split into batches. Number of batches: {total_batches}\")\n\n            with torch.no_grad():\n                # Processes each batch in the chunk.\n                batches_processed = 0\n                for mix_wave in mix_waves:\n                    batches_processed += 1\n                    self.logger.debug(f\"Processing mix_wave batch {batches_processed}/{total_batches}\")\n\n                    # Runs the model to separate the sources.\n                    tar_waves = self.run_model(mix_wave, is_match_mix=is_match_mix)\n\n                    # Applies windowing if needed and accumulates the results.\n                    if window is not None:\n                        tar_waves[..., :chunk_size_actual] *= window\n                        divider[..., start:end] += window\n                    else:\n                        divider[..., start:end] += 1\n\n                    result[..., start:end] += tar_waves[..., : end - start]\n\n        # Normalizes the results by the divider to account for overlap.\n        self.logger.debug(\"Normalizing result by dividing result by divider.\")\n        tar_waves = result / divider\n        tar_waves_.append(tar_waves)\n\n        # Reshapes the results to match the original dimensions.\n        tar_waves_ = np.vstack(tar_waves_)[:, :, self.trim : -self.trim]\n        tar_waves = np.concatenate(tar_waves_, axis=-1)[:, : mix.shape[-1]]\n\n        # Extracts the source from the results.\n        source = tar_waves[:, 0:None]\n        self.logger.debug(f\"Concatenated tar_waves. Shape: {tar_waves.shape}\")\n\n        # TODO: In UVR, pitch changing happens here. Consider implementing this as a feature.\n\n        # TODO: In UVR, VR denoise model gets applied here. 
Consider implementing this as a feature.\n\n        self.logger.debug(\"Demixing process completed.\")\n        return source\n\n    def run_model(self, mix, is_match_mix=False):\n        \"\"\"\n        Processes the input mix through the model to separate the sources.\n        Applies STFT, handles spectrum modifications, and runs the model for source separation.\n        \"\"\"\n        # Applying the STFT to the mix. The mix is moved to the specified device (e.g., GPU) before processing.\n        # self.logger.debug(f\"Running STFT on the mix. Mix shape before STFT: {mix.shape}\")\n        spek = self.stft(mix.to(self.torch_device))\n        self.logger.debug(f\"STFT applied on mix. Spectrum shape: {spek.shape}\")\n\n        # Zeroing out the first 3 bins of the spectrum. This is often done to reduce low-frequency noise.\n        spek[:, :, :3, :] *= 0\n        # self.logger.debug(\"First 3 bins of the spectrum zeroed out.\")\n\n        # Handling the case where the mix needs to be matched (is_match_mix = True)\n        if is_match_mix:\n            # self.logger.debug(\"Match mix mode is enabled. 
Converting spectrum to NumPy array.\")\n            spec_pred = spek.cpu().numpy()\n            self.logger.debug(\"is_match_mix: spectrum prediction obtained directly from STFT output.\")\n        else:\n            # If denoising is enabled, the model is run on both the negative and positive spectrums.\n            if self.enable_denoise:\n                # Assuming spek is a tensor and self.model_run can process it directly\n                spec_pred_neg = self.model_run(-spek)  # Ensure this line correctly negates spek and runs the model\n                spec_pred_pos = self.model_run(spek)\n                # Ensure both spec_pred_neg and spec_pred_pos are tensors before applying operations\n                spec_pred = (spec_pred_neg * -0.5) + (spec_pred_pos * 0.5)  # [invalid-unary-operand-type]\n                self.logger.debug(\"Model run on both negative and positive spectrums for denoising.\")\n            else:\n                spec_pred = self.model_run(spek)\n                self.logger.debug(\"Model run on the spectrum without denoising.\")\n\n        # Applying the inverse STFT to convert the spectrum back to the time domain.\n        result = self.stft.inverse(torch.tensor(spec_pred).to(self.torch_device)).cpu().detach().numpy()\n        self.logger.debug(f\"Inverse STFT applied. Returning result with shape: {result.shape}\")\n\n        return result\n"
  },
  {
    "path": "audio_separator/separator/architectures/mdxc_separator.py",
    "content": "import os\nimport sys\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom ml_collections import ConfigDict\nfrom scipy import signal\n\nfrom audio_separator.separator.common_separator import CommonSeparator\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\nfrom audio_separator.separator.uvr_lib_v5.tfc_tdf_v3 import TFC_TDF_net\n# Roformer direct constructors removed; loading handled via RoformerLoader in CommonSeparator.\n\n\nclass MDXCSeparator(CommonSeparator):\n    \"\"\"\n    MDXCSeparator is responsible for separating audio sources using MDXC models.\n    It initializes with configuration parameters and prepares the model for separation tasks.\n    \"\"\"\n\n    def __init__(self, common_config, arch_config):\n        # Any configuration values which can be shared between architectures should be set already in CommonSeparator,\n        # e.g. user-specified functionality choices (self.output_single_stem) or common model parameters (self.primary_stem_name)\n        super().__init__(config=common_config)\n\n        # Model data is basic overview metadata about the model, e.g. which stem is primary and whether it's a karaoke model\n        # It's loaded in from model_data_new.json in Separator.load_model and there are JSON examples in that method\n        # The instance variable self.model_data is passed through from Separator and set in CommonSeparator\n        self.logger.debug(f\"Model data: {self.model_data}\")\n\n        # Arch Config is the MDXC architecture specific user configuration options, which should all be configurable by the user\n        # either by their Separator class instantiation or by passing in a CLI parameter.\n        # While there are similarities between architectures for some of these (e.g. 
batch_size), they are deliberately configured\n        # this way as they have architecture-specific default values.\n        self.segment_size = arch_config.get(\"segment_size\", 256)\n\n        # Whether or not to use the segment size from model config, or the default\n        # The segment size is set based on the value provided in a chosen model's associated config file (yaml).\n        self.override_model_segment_size = arch_config.get(\"override_model_segment_size\", False)\n\n        self.overlap = arch_config.get(\"overlap\", 8)\n        self.batch_size = arch_config.get(\"batch_size\", 1)\n\n        # Amount of pitch shift to apply during processing (this does NOT affect the pitch of the output audio):\n        # • Whole numbers indicate semitones.\n        # • Using higher pitches may cut the upper bandwidth, even in high-quality models.\n        # • Upping the pitch can be better for tracks with deeper vocals.\n        # • Dropping the pitch may take more processing time but works well for tracks with high-pitched vocals.\n        self.pitch_shift = arch_config.get(\"pitch_shift\", 0)\n\n        self.process_all_stems = arch_config.get(\"process_all_stems\", True)\n\n        self.logger.debug(f\"MDXC arch params: batch_size={self.batch_size}, segment_size={self.segment_size}, overlap={self.overlap}\")\n        self.logger.debug(f\"MDXC arch params: override_model_segment_size={self.override_model_segment_size}, pitch_shift={self.pitch_shift}\")\n        self.logger.debug(f\"MDXC multi-stem params: process_all_stems={self.process_all_stems}\")\n\n        # Align Roformer detection flag with CommonSeparator to ensure consistent stats/logging\n        self.is_roformer = getattr(self, \"is_roformer_model\", False)\n\n        self.load_model()\n\n        self.primary_source = None\n        self.secondary_source = None\n        self.audio_file_path = None\n        self.audio_file_base = None\n\n        # Only mark primary stem as main target for single-target 
models.\n        # Multi-stem models should not trigger residual subtraction logic.\n        self.is_primary_stem_main_target = bool(self.model_data_cfgdict.training.target_instrument)\n\n        self.logger.debug(f\"is_primary_stem_main_target: {self.is_primary_stem_main_target}\")\n\n        self.logger.info(\"MDXC Separator initialisation complete\")\n\n    def load_model(self):\n        \"\"\"\n        Load the model into memory from file on disk, initialize it with config from the model data,\n        and prepare for inferencing using hardware accelerated Torch device.\n        \"\"\"\n        self.logger.debug(\"Loading checkpoint model for inference...\")\n\n        self.model_data_cfgdict = ConfigDict(self.model_data)\n\n        try:\n            if self.is_roformer:\n                # Use the RoformerLoader exclusively; no legacy fallback\n                self.logger.debug(\"Loading Roformer model via RoformerLoader...\")\n                result = self.roformer_loader.load_model(\n                    model_path=self.model_path,\n                    config=self.model_data,\n                    device=str(self.torch_device),\n                )\n\n                if getattr(result, \"success\", False) and getattr(result, \"model\", None) is not None:\n                    self.model_run = result.model\n                    self.model_run.to(self.torch_device).eval()\n                else:\n                    error_msg = getattr(result, \"error_message\", \"RoformerLoader unsuccessful\")\n                    self.logger.error(f\"Failed to load Roformer model: {error_msg}\")\n                    raise RuntimeError(error_msg)\n\n            else:\n                self.logger.debug(\"Loading TFC_TDF_net model...\")\n                self.model_run = TFC_TDF_net(self.model_data_cfgdict, device=self.torch_device)\n                self.logger.debug(\"Loading model onto cpu\")\n                # For some reason loading the state onto a hardware accelerated devices 
causes issues, \n                # so we load it onto CPU first then move it to the device\n                self.model_run.load_state_dict(torch.load(self.model_path, map_location=\"cpu\"))\n                self.model_run.to(self.torch_device).eval()\n\n        except RuntimeError as e:\n            self.logger.error(f\"Error: {e}\")\n            self.logger.error(\"An error occurred while loading the model file. This often occurs when the model file is corrupt or incomplete.\")\n            self.logger.error(f\"Please try deleting the model file from {self.model_path} and run audio-separator again to re-download it.\")\n            sys.exit(1)\n\n    def separate(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Separates the audio file into primary and secondary sources based on the model's configuration.\n        It processes the mix, demixes it into sources, normalizes the sources, and saves the output files.\n\n        Args:\n            audio_file_path (str): The path to the audio file to be processed.\n            custom_output_names (dict, optional): Custom names for the output files. 
Defaults to None.\n\n        Returns:\n            list: A list of paths to the output files generated by the separation process.\n        \"\"\"\n        self.primary_source = None\n        self.secondary_source = None\n\n        self.audio_file_path = audio_file_path\n        self.audio_file_base = os.path.splitext(os.path.basename(audio_file_path))[0]\n\n        self.logger.debug(f\"Preparing mix for input audio file {self.audio_file_path}...\")\n        mix = self.prepare_mix(self.audio_file_path)\n\n        # Check if audio is shorter than threshold\n        audio_duration_seconds = mix.shape[1] / self.sample_rate\n        if audio_duration_seconds < 10.0:\n            # Only change and warn if it wasn't already set by the user\n            if not self.override_model_segment_size:\n                self.override_model_segment_size = True\n                self.logger.warning(f\"Audio duration ({audio_duration_seconds:.2f}s) is less than 10 seconds.\")\n                self.logger.warning(\"Automatically enabling override_model_segment_size for better processing of short audio.\")\n\n        self.logger.debug(\"Normalizing mix before demixing...\")\n        mix = spec_utils.normalize(wave=mix, max_peak=self.normalization_threshold, min_peak=self.amplification_threshold)\n\n        source = self.demix(mix=mix)\n        self.logger.debug(\"Demixing completed.\")\n\n        output_files = []\n        self.logger.debug(\"Processing output files...\")\n\n        if isinstance(source, dict):\n            self.logger.debug(\"Source is a dict, processing each stem...\")\n            \n            stem_list = []\n            if self.model_data_cfgdict.training.target_instrument:\n                stem_list = [self.model_data_cfgdict.training.target_instrument]\n            else:\n                stem_list = self.model_data_cfgdict.training.instruments\n            \n            self.logger.debug(f\"Available stems: {stem_list}\")\n\n            is_multi_stem_model = 
len(stem_list) > 2\n            should_process_all_stems = self.process_all_stems and is_multi_stem_model\n            \n            if should_process_all_stems:\n                self.logger.debug(\"Processing all stems from multi-stem model...\")\n                for stem_name in stem_list:\n                    stem_output_path = self.get_stem_output_path(stem_name, custom_output_names)\n                    stem_source = spec_utils.normalize(\n                        wave=source[stem_name], \n                        max_peak=self.normalization_threshold, \n                        min_peak=self.amplification_threshold\n                    ).T\n                    \n                    self.logger.info(f\"Saving {stem_name} stem to {stem_output_path}...\")\n                    self.final_process(stem_output_path, stem_source, stem_name)\n                    output_files.append(stem_output_path)\n            else:\n                # Standard processing for primary and secondary stems\n                if not isinstance(self.primary_source, np.ndarray):\n                    self.logger.debug(f\"Normalizing primary source for primary stem {self.primary_stem_name}...\")\n                    self.primary_source = spec_utils.normalize(\n                        wave=source[self.primary_stem_name], \n                        max_peak=self.normalization_threshold, \n                        min_peak=self.amplification_threshold\n                    ).T\n\n                if not isinstance(self.secondary_source, np.ndarray):\n                    self.logger.debug(f\"Normalizing secondary source for secondary stem {self.secondary_stem_name}...\")\n                    self.secondary_source = spec_utils.normalize(\n                        wave=source[self.secondary_stem_name], \n                        max_peak=self.normalization_threshold, \n                        min_peak=self.amplification_threshold\n                    ).T\n\n                if not self.output_single_stem or 
self.output_single_stem.lower() == self.secondary_stem_name.lower():\n                    self.secondary_stem_output_path = self.get_stem_output_path(self.secondary_stem_name, custom_output_names)\n\n                    self.logger.info(f\"Saving {self.secondary_stem_name} stem to {self.secondary_stem_output_path}...\")\n                    self.final_process(self.secondary_stem_output_path, self.secondary_source, self.secondary_stem_name)\n                    output_files.append(self.secondary_stem_output_path)\n                \n                if not self.output_single_stem or self.output_single_stem.lower() == self.primary_stem_name.lower():\n                    self.primary_stem_output_path = self.get_stem_output_path(self.primary_stem_name, custom_output_names)\n                    \n                    self.logger.info(f\"Saving {self.primary_stem_name} stem to {self.primary_stem_output_path}...\")\n                    self.final_process(self.primary_stem_output_path, self.primary_source, self.primary_stem_name)\n                    output_files.append(self.primary_stem_output_path)\n\n        else:\n            # Handle case when source is not a dictionary (single source model)\n            if not self.output_single_stem or self.output_single_stem.lower() == self.primary_stem_name.lower():\n                self.primary_stem_output_path = self.get_stem_output_path(self.primary_stem_name, custom_output_names)\n\n                if not isinstance(self.primary_source, np.ndarray):\n                    self.primary_source = source.T\n\n                self.logger.info(f\"Saving {self.primary_stem_name} stem to {self.primary_stem_output_path}...\")\n                self.final_process(self.primary_stem_output_path, self.primary_source, self.primary_stem_name)\n                output_files.append(self.primary_stem_output_path)\n\n        return output_files\n\n    def pitch_fix(self, source, sr_pitched, orig_mix):\n        \"\"\"\n        Change the pitch of the 
source audio by a number of semitones.\n\n        Args:\n            source (np.ndarray): The source audio to be pitch-shifted.\n            sr_pitched (int): The sample rate of the pitch-shifted audio.\n            orig_mix (np.ndarray): The original mix, used to match the shape of the pitch-shifted audio.\n\n        Returns:\n            np.ndarray: The pitch-shifted source audio.\n        \"\"\"\n        source = spec_utils.change_pitch_semitones(source, sr_pitched, semitone_shift=self.pitch_shift)[0]\n        source = spec_utils.match_array_shapes(source, orig_mix)\n        return source\n\n    def overlap_add(self, result, x, weights, start, length):\n        \"\"\"\n        Adds the overlapping part of the result to the result tensor.\n        \"\"\"\n        # Guard against minor shape mismatches from model output length\n        # Use the minimum of provided lengths to avoid broadcasting errors\n        safe_len = min(length, x.shape[-1], weights.shape[0])\n        if safe_len > 0:\n            result[..., start : start + safe_len] += x[..., :safe_len] * weights[:safe_len]\n        return result\n\n    def demix(self, mix: np.ndarray) -> dict:\n        \"\"\"\n        Demixes the input mix into primary and secondary sources using the model and model data.\n\n        Args:\n            mix (np.ndarray): The mix to be demixed.\n        Returns:\n            dict: A dictionary containing the demixed sources.\n        \"\"\"\n        orig_mix = mix\n\n        if self.pitch_shift != 0:\n            self.logger.debug(f\"Shifting pitch by -{self.pitch_shift} semitones...\")\n            mix, sample_rate = spec_utils.change_pitch_semitones(mix, self.sample_rate, semitone_shift=-self.pitch_shift)\n\n        if self.is_roformer:\n            # Note: Currently, for Roformer models, `batch_size` is not utilized due to negligible performance improvements.\n\n            mix = torch.tensor(mix, dtype=torch.float32)\n\n            if self.override_model_segment_size:\n    
            mdx_segment_size = self.segment_size\n                self.logger.debug(f\"Using configured segment size: {mdx_segment_size}\")\n            else:\n                mdx_segment_size = self.model_data_cfgdict.inference.dim_t\n                self.logger.debug(f\"Using model default segment size: {mdx_segment_size}\")\n\n            # num_stems aka \"S\" in UVR\n            num_stems = 1 if self.model_data_cfgdict.training.target_instrument else len(self.model_data_cfgdict.training.instruments)\n            self.logger.debug(f\"Number of stems: {num_stems}\")\n\n            # chunk_size aka \"C\" in UVR\n            # IMPORTANT: For Roformer models, use the model's STFT hop length to derive the temporal chunk size\n            stft_hop_len = getattr(self.model_data_cfgdict.model, \"stft_hop_length\", None)\n            if stft_hop_len is None:\n                # Fallback to audio.hop_length if not present, but log for visibility\n                stft_hop_len = self.model_data_cfgdict.audio.hop_length\n                self.logger.debug(\n                    f\"Model.stft_hop_length missing; falling back to audio.hop_length={stft_hop_len}\"\n                )\n\n            chunk_size = int(stft_hop_len) * (int(mdx_segment_size) - 1)\n            self.logger.debug(\n                f\"Chunk size: {chunk_size} (using stft_hop_length={stft_hop_len} and dim_t={mdx_segment_size})\"\n            )\n\n            # Align step to chunk_size by default for Roformer to avoid stride mismatches\n            # If a user-specified overlap (in seconds) results in a step larger than chunk_size, clamp it\n            desired_step = int(self.overlap * self.model_data_cfgdict.audio.sample_rate)\n            step = chunk_size if desired_step <= 0 else min(desired_step, chunk_size)\n            self.logger.debug(f\"Step: {step} (desired={desired_step})\")\n\n            # Create a weighting table and convert it to a PyTorch tensor\n            window = 
torch.tensor(signal.windows.hamming(chunk_size), dtype=torch.float32)\n\n            device = next(self.model_run.parameters()).device\n\n\n            with torch.no_grad():\n                req_shape = (len(self.model_data_cfgdict.training.instruments),) + tuple(mix.shape)\n                result = torch.zeros(req_shape, dtype=torch.float32)\n                counter = torch.zeros(req_shape, dtype=torch.float32)\n\n                for i in tqdm(range(0, mix.shape[1], step)):\n                    part = mix[:, i : i + chunk_size]\n                    length = part.shape[-1]\n                    if i + chunk_size > mix.shape[1]:\n                        part = mix[:, -chunk_size:]\n                        length = chunk_size\n                    part = part.to(device)\n                    x = self.model_run(part.unsqueeze(0))[0]\n                    x = x.cpu()\n                    # Perform overlap_add on CPU\n                    if i + chunk_size > mix.shape[1]:\n                        # Fixed to correctly add to the end of the tensor\n                        start_idx = result.shape[-1] - chunk_size\n                        result = self.overlap_add(result, x, window, start_idx, length)\n                        safe_len = min(length, x.shape[-1], window.shape[0])\n                        if safe_len > 0:\n                            counter[..., start_idx : start_idx + safe_len] += window[:safe_len]\n                    else:\n                        result = self.overlap_add(result, x, window, i, length)\n                        safe_len = min(length, x.shape[-1], window.shape[0])\n                        if safe_len > 0:\n                            counter[..., i : i + safe_len] += window[:safe_len]\n\n            inferenced_outputs = result / counter.clamp(min=1e-10)\n\n        else:\n            mix = torch.tensor(mix, dtype=torch.float32)\n\n            try:\n                num_stems = self.model_run.num_target_instruments\n            except 
AttributeError:\n                num_stems = self.model_run.module.num_target_instruments\n            self.logger.debug(f\"Number of stems: {num_stems}\")\n\n            if self.override_model_segment_size:\n                mdx_segment_size = self.segment_size\n                self.logger.debug(f\"Using configured segment size: {mdx_segment_size}\")\n            else:\n                mdx_segment_size = self.model_data_cfgdict.inference.dim_t\n                self.logger.debug(f\"Using model default segment size: {mdx_segment_size}\")\n\n            chunk_size = self.model_data_cfgdict.audio.hop_length * (mdx_segment_size - 1)\n            self.logger.debug(f\"Chunk size: {chunk_size}\")\n\n            hop_size = chunk_size // self.overlap\n            self.logger.debug(f\"Hop size: {hop_size}\")\n\n            mix_shape = mix.shape[1]\n            pad_size = hop_size - (mix_shape - chunk_size) % hop_size\n            self.logger.debug(f\"Pad size: {pad_size}\")\n\n            mix = torch.cat([torch.zeros(2, chunk_size - hop_size), mix, torch.zeros(2, pad_size + chunk_size - hop_size)], 1)\n            self.logger.debug(f\"Mix shape: {mix.shape}\")\n\n            chunks = mix.unfold(1, chunk_size, hop_size).transpose(0, 1)\n            self.logger.debug(f\"Chunks length: {len(chunks)} and shape: {chunks.shape}\")\n\n            batches = [chunks[i : i + self.batch_size] for i in range(0, len(chunks), self.batch_size)]\n            self.logger.debug(f\"Batch size: {self.batch_size}, number of batches: {len(batches)}\")\n\n            # accumulated_outputs is used to accumulate the output from processing each batch of chunks through the model.\n            # It starts as a tensor of zeros and is updated in-place as the model processes each batch.\n            # The variable holds the combined result of all processed batches, which, after post-processing, represents the separated audio sources.\n            accumulated_outputs = torch.zeros(num_stems, *mix.shape) if 
num_stems > 1 else torch.zeros_like(mix)\n\n            with torch.no_grad():\n                count = 0\n                for batch in tqdm(batches):\n                    # Since the model processes the audio data in batches, single_batch_result temporarily holds the model's output\n                    # for each batch before it is accumulated into accumulated_outputs.\n                    single_batch_result = self.model_run(batch.to(self.torch_device))\n\n                    # Each individual output tensor from the current batch's processing result.\n                    # Since single_batch_result can contain multiple output tensors (one for each piece of audio in the batch),\n                    # individual_output is used to iterate through these tensors and accumulate them into accumulated_outputs.\n                    for individual_output in single_batch_result:\n                        individual_output_cpu = individual_output.cpu()\n                        # Accumulate outputs on CPU\n                        accumulated_outputs[..., count * hop_size : count * hop_size + chunk_size] += individual_output_cpu\n                        count += 1\n\n            self.logger.debug(\"Calculating inferenced outputs based on accumulated outputs and overlap\")\n            inferenced_outputs = accumulated_outputs[..., chunk_size - hop_size : -(pad_size + chunk_size - hop_size)] / self.overlap\n            self.logger.debug(\"Deleting accumulated outputs to free up memory\")\n            del accumulated_outputs\n\n        if num_stems > 1:\n            self.logger.debug(\"Number of stems is greater than 1, detaching individual sources and correcting pitch if necessary...\")\n\n            sources = {}\n\n            # Iterates over each instrument specified in the model's configuration and its corresponding separated audio source.\n            # self.model_data_cfgdict.training.instruments provides the list of stems.\n            # 
inferenced_outputs.cpu().detach().numpy() converts the separated sources tensor to a NumPy array for processing.\n            # Each iteration provides an instrument name ('key') and its separated audio ('value') for further processing.\n            for key, value in zip(self.model_data_cfgdict.training.instruments, inferenced_outputs.cpu().detach().numpy()):\n                self.logger.debug(f\"Processing instrument: {key}\")\n                if self.pitch_shift != 0:\n                    self.logger.debug(f\"Applying pitch correction for {key}\")\n                    sources[key] = self.pitch_fix(value, sample_rate, orig_mix)\n                else:\n                    sources[key] = value\n\n            # Residual subtraction is only applicable for single-target models (not multi-stem)\n            if self.is_primary_stem_main_target and num_stems == 1:\n                self.logger.debug(f\"Primary stem: {self.primary_stem_name} is main target, detaching and matching array shapes if necessary...\")\n                if sources[self.primary_stem_name].shape[1] != orig_mix.shape[1]:\n                    sources[self.primary_stem_name] = spec_utils.match_array_shapes(sources[self.primary_stem_name], orig_mix)\n                sources[self.secondary_stem_name] = orig_mix - sources[self.primary_stem_name]\n\n            self.logger.debug(\"Deleting inferenced outputs to free up memory\")\n            del inferenced_outputs\n\n            self.logger.debug(\"Returning separated sources\")\n            return sources\n        else:\n            self.logger.debug(\"Processing single source...\")\n\n            if self.is_roformer:\n                sources = {k: v.cpu().detach().numpy() for k, v in zip([self.model_data_cfgdict.training.target_instrument], inferenced_outputs)}\n                inferenced_output = sources[self.model_data_cfgdict.training.target_instrument]\n            else:\n                inferenced_output = inferenced_outputs.cpu().detach().numpy()\n\n            self.logger.debug(\"Demix process completed for single source.\")\n\n            self.logger.debug(\"Deleting inferenced outputs to free up memory\")\n            del inferenced_outputs\n\n            # For single-target models (e.g., karaoke), also return the residual as secondary\n            if self.pitch_shift != 0:\n                self.logger.debug(\"Applying pitch correction for single instrument\")\n                primary = self.pitch_fix(inferenced_output, sample_rate, orig_mix)\n            else:\n                primary = inferenced_output\n\n            if self.is_primary_stem_main_target:\n                self.logger.debug(\"Single-target model detected; computing residual secondary stem from original mix\")\n                # Ensure shapes match before residual subtraction\n                if primary.shape[1] != orig_mix.shape[1]:\n                    primary = spec_utils.match_array_shapes(primary, orig_mix)\n                secondary = orig_mix - primary\n                return {\n                    self.primary_stem_name: primary,\n                    self.secondary_stem_name: secondary,\n                }\n\n            self.logger.debug(\"Returning inferenced output for single instrument\")\n            return primary\n"
  },
  {
    "path": "audio_separator/separator/architectures/vr_separator.py",
    "content": "\"\"\"Module for separating audio sources using VR architecture models.\"\"\"\n\nimport os\nimport math\n\nimport torch\nimport librosa\nimport numpy as np\nfrom tqdm import tqdm\n\n# Check if we really need the rerun_mp3 function, remove if not\nimport audioread\n\nfrom audio_separator.separator.common_separator import CommonSeparator\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\nfrom audio_separator.separator.uvr_lib_v5.vr_network import nets\nfrom audio_separator.separator.uvr_lib_v5.vr_network import nets_new\nfrom audio_separator.separator.uvr_lib_v5.vr_network.model_param_init import ModelParameters\n\n\nclass VRSeparator(CommonSeparator):\n    \"\"\"\n    VRSeparator is responsible for separating audio sources using VR models.\n    It initializes with configuration parameters and prepares the model for separation tasks.\n    \"\"\"\n\n    def __init__(self, common_config, arch_config: dict):\n        # Any configuration values which can be shared between architectures should be set already in CommonSeparator,\n        # e.g. user-specified functionality choices (self.output_single_stem) or common model parameters (self.primary_stem_name)\n        super().__init__(config=common_config)\n\n        # Model data is basic overview metadata about the model, e.g. 
which stem is primary and whether it's a karaoke model\n        # It's loaded in from model_data_new.json in Separator.load_model and there are JSON examples in that method\n        # The instance variable self.model_data is passed through from Separator and set in CommonSeparator\n        self.logger.debug(f\"Model data: {self.model_data}\")\n\n        # Most of the VR models use the same number of output channels, but the VR 51 models have specific values set in model_data JSON\n        self.model_capacity = 32, 128\n        self.is_vr_51_model = False\n\n        if \"nout\" in self.model_data.keys() and \"nout_lstm\" in self.model_data.keys():\n            self.model_capacity = self.model_data[\"nout\"], self.model_data[\"nout_lstm\"]\n            self.is_vr_51_model = True\n\n        # Model params are additional technical parameter values from JSON files in separator/uvr_lib_v5/vr_network/modelparams/*.json,\n        # with filenames referenced by the model_data[\"vr_model_param\"] value\n        package_root_filepath = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n        vr_params_json_dir = os.path.join(package_root_filepath, \"uvr_lib_v5\", \"vr_network\", \"modelparams\")\n        vr_params_json_filename = f\"{self.model_data['vr_model_param']}.json\"\n        vr_params_json_filepath = os.path.join(vr_params_json_dir, vr_params_json_filename)\n        self.model_params = ModelParameters(vr_params_json_filepath)\n\n        self.logger.debug(f\"Model params: {self.model_params.param}\")\n\n        # Arch Config is the VR architecture specific user configuration options, which should all be configurable by the user\n        # either by their Separator class instantiation or by passing in a CLI parameter.\n        # While there are similarities between architectures for some of these (e.g. 
batch_size), they are deliberately configured\n        # this way as they have architecture-specific default values.\n\n        # This option performs Test-Time-Augmentation to improve the separation quality.\n        # Note: Having this selected will increase the time it takes to complete a conversion\n        self.enable_tta = arch_config.get(\"enable_tta\", False)\n\n        # This option can potentially identify leftover instrumental artifacts within the vocal outputs; may improve the separation of some songs.\n        # Note: Selecting this option can adversely affect the conversion process, depending on the track. Because of this, it is only recommended as a last resort.\n        self.enable_post_process = arch_config.get(\"enable_post_process\", False)\n\n        # post_process_threshold values = ('0.1', '0.2', '0.3')\n        self.post_process_threshold = arch_config.get(\"post_process_threshold\", 0.2)\n\n        # Number of batches to be processed at a time.\n        # - Higher values mean more RAM usage but slightly faster processing times.\n        # - Lower values mean less RAM usage but slightly longer processing times.\n        # - Batch size value has no effect on output quality.\n\n        # Andrew note: for some reason, lower batch sizes seem to cause broken output for VR arch; need to investigate why\n        self.batch_size = arch_config.get(\"batch_size\", 1)\n\n        # Select window size to balance quality and speed:\n        # - 1024 - Quick but lesser quality.\n        # - 512 - Medium speed and quality.\n        # - 320 - Takes longer but may offer better quality.\n        self.window_size = arch_config.get(\"window_size\", 512)\n\n        # The application will mirror the missing frequency range of the output.\n        self.high_end_process = arch_config.get(\"high_end_process\", False)\n        self.input_high_end_h = None\n        self.input_high_end = None\n\n        # Adjust the intensity of primary stem extraction:\n        # - 
Ranges from -100 to 100.\n        # - Bigger values mean deeper extractions.\n        # - Typically, it's set to 5 for vocals & instrumentals.\n        # - Values beyond 5 might muddy the sound for non-vocal models.\n        self.aggression = float(int(arch_config.get(\"aggression\", 5)) / 100)\n\n        self.aggressiveness = {\"value\": self.aggression, \"split_bin\": self.model_params.param[\"band\"][1][\"crop_stop\"], \"aggr_correction\": self.model_params.param.get(\"aggr_correction\")}\n\n        self.model_samplerate = self.model_params.param[\"sr\"]\n\n        self.logger.debug(f\"VR arch params: enable_tta={self.enable_tta}, enable_post_process={self.enable_post_process}, post_process_threshold={self.post_process_threshold}\")\n        self.logger.debug(f\"VR arch params: batch_size={self.batch_size}, window_size={self.window_size}\")\n        self.logger.debug(f\"VR arch params: high_end_process={self.high_end_process}, aggression={self.aggression}\")\n        self.logger.debug(f\"VR arch params: is_vr_51_model={self.is_vr_51_model}, model_samplerate={self.model_samplerate}, model_capacity={self.model_capacity}\")\n\n        self.model_run = lambda *args, **kwargs: self.logger.error(\"Model run method is not initialised yet.\")\n\n        # wav_subtype is set in separate() based on the detected input audio bit depth\n        # Removed hardcoded \"PCM_16\" to allow bit depth preservation\n\n        self.logger.info(\"VR Separator initialisation complete\")\n\n    def separate(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Separates the audio file into primary and secondary sources based on the model's configuration.\n        It processes the mix, demixes it into sources, normalizes the sources, and saves the output files.\n\n        Args:\n            audio_file_path (str): The path to the audio file to be processed.\n            custom_output_names (dict, optional): Custom names for the output files. 
Defaults to None.\n\n        Returns:\n            list: A list of paths to the output files generated by the separation process.\n        \"\"\"\n        self.primary_source = None\n        self.secondary_source = None\n\n        self.audio_file_path = audio_file_path\n        self.audio_file_base = os.path.splitext(os.path.basename(audio_file_path))[0]\n\n        # Detect input audio bit depth for output preservation\n        try:\n            import soundfile as sf\n            info = sf.info(audio_file_path)\n            self.input_audio_subtype = info.subtype\n            self.logger.info(f\"Input audio subtype: {self.input_audio_subtype}\")\n\n            # Map subtype to wav_subtype for soundfile and set input_bit_depth for pydub\n            if \"24\" in self.input_audio_subtype:\n                self.wav_subtype = \"PCM_24\"\n                self.input_bit_depth = 24\n                self.logger.info(\"Detected 24-bit input audio\")\n            elif \"32\" in self.input_audio_subtype:\n                self.wav_subtype = \"PCM_32\"\n                self.input_bit_depth = 32\n                self.logger.info(\"Detected 32-bit input audio\")\n            else:\n                self.wav_subtype = \"PCM_16\"\n                self.input_bit_depth = 16\n                self.logger.info(\"Detected 16-bit input audio\")\n        except Exception as e:\n            self.logger.warning(f\"Could not detect input audio bit depth: {e}. 
Defaulting to PCM_16\")\n            self.wav_subtype = \"PCM_16\"\n            self.input_audio_subtype = None\n            self.input_bit_depth = 16\n\n        self.logger.debug(f\"Starting separation for input audio file {self.audio_file_path}...\")\n\n        nn_arch_sizes = [31191, 33966, 56817, 123821, 123812, 129605, 218409, 537238, 537227]  # default\n        vr_5_1_models = [56817, 218409]\n        model_size = math.ceil(os.stat(self.model_path).st_size / 1024)\n        nn_arch_size = min(nn_arch_sizes, key=lambda x: abs(x - model_size))\n        self.logger.debug(f\"Model size determined: {model_size}, NN architecture size: {nn_arch_size}\")\n\n        if nn_arch_size in vr_5_1_models or self.is_vr_51_model:\n            self.logger.debug(\"Using CascadedNet for VR 5.1 model...\")\n            self.model_run = nets_new.CascadedNet(self.model_params.param[\"bins\"] * 2, nn_arch_size, nout=self.model_capacity[0], nout_lstm=self.model_capacity[1])\n            self.is_vr_51_model = True\n        else:\n            self.logger.debug(\"Determining model capacity...\")\n            self.model_run = nets.determine_model_capacity(self.model_params.param[\"bins\"] * 2, nn_arch_size)\n\n        self.model_run.load_state_dict(torch.load(self.model_path, map_location=\"cpu\"))\n        self.model_run.to(self.torch_device)\n        self.logger.debug(\"Model loaded and moved to device.\")\n\n        y_spec, v_spec = self.inference_vr(self.loading_mix(), self.torch_device, self.aggressiveness)\n        self.logger.debug(\"Inference completed.\")\n\n        # Sanitize y_spec and v_spec to replace NaN and infinite values\n        y_spec = np.nan_to_num(y_spec, nan=0.0, posinf=0.0, neginf=0.0)\n        v_spec = np.nan_to_num(v_spec, nan=0.0, posinf=0.0, neginf=0.0)\n\n        self.logger.debug(\"Sanitization completed. Replaced NaN and infinite values in y_spec and v_spec.\")\n\n        # After inference_vr call\n        self.logger.debug(f\"Inference VR completed. 
y_spec shape: {y_spec.shape}, v_spec shape: {v_spec.shape}\")\n        self.logger.debug(f\"y_spec stats - min: {np.min(y_spec)}, max: {np.max(y_spec)}, isnan: {np.isnan(y_spec).any()}, isinf: {np.isinf(y_spec).any()}\")\n        self.logger.debug(f\"v_spec stats - min: {np.min(v_spec)}, max: {np.max(v_spec)}, isnan: {np.isnan(v_spec).any()}, isinf: {np.isinf(v_spec).any()}\")\n\n        # Not yet implemented from UVR features:\n        #\n        # if not self.is_vocal_split_model:\n        #     self.cache_source((y_spec, v_spec))\n\n        # if self.is_secondary_model_activated and self.secondary_model:\n        #     self.logger.debug(\"Processing secondary model...\")\n        #     self.secondary_source_primary, self.secondary_source_secondary = process_secondary_model(\n        #         self.secondary_model, self.process_data, main_process_method=self.process_method, main_model_primary=self.primary_stem\n        #     )\n\n        # Initialize the list for output files\n        output_files = []\n        self.logger.debug(\"Processing output files...\")\n\n        # Note: logic similar to the following should probably be added to the other architectures\n        # Check if output_single_stem is set to a value that would result in no output files\n        if self.output_single_stem and (self.output_single_stem.lower() != self.primary_stem_name.lower() and self.output_single_stem.lower() != self.secondary_stem_name.lower()):\n            # If so, reset output_single_stem to None to save both stems (keeping the requested value so the warning can report it)\n            requested_single_stem = self.output_single_stem\n            self.output_single_stem = None\n            self.logger.warning(f\"The output_single_stem setting '{requested_single_stem}' does not match any of the output files: '{self.primary_stem_name}' and '{self.secondary_stem_name}'. 
For this model '{self.model_name}', the output_single_stem setting will be ignored and all output files will be saved.\")\n\n        # Save and process the primary stem if needed\n        if not self.output_single_stem or self.output_single_stem.lower() == self.primary_stem_name.lower():\n            self.logger.debug(f\"Processing primary stem: {self.primary_stem_name}\")\n            if not isinstance(self.primary_source, np.ndarray):\n                self.logger.debug(f\"Preparing to convert spectrogram to waveform. Spec shape: {y_spec.shape}\")\n\n                self.primary_source = self.spec_to_wav(y_spec).T\n                self.logger.debug(\"Converting primary source spectrogram to waveform.\")\n                if not self.model_samplerate == 44100:\n                    self.primary_source = librosa.resample(self.primary_source.T, orig_sr=self.model_samplerate, target_sr=44100).T\n                    self.logger.debug(\"Resampling primary source to 44100Hz.\")\n\n            self.primary_stem_output_path = self.get_stem_output_path(self.primary_stem_name, custom_output_names)\n\n            self.logger.info(f\"Saving {self.primary_stem_name} stem to {self.primary_stem_output_path}...\")\n            self.final_process(self.primary_stem_output_path, self.primary_source, self.primary_stem_name)\n            output_files.append(self.primary_stem_output_path)\n\n        # Save and process the secondary stem if needed\n        if not self.output_single_stem or self.output_single_stem.lower() == self.secondary_stem_name.lower():\n            self.logger.debug(f\"Processing secondary stem: {self.secondary_stem_name}\")\n            if not isinstance(self.secondary_source, np.ndarray):\n                self.logger.debug(f\"Preparing to convert spectrogram to waveform. 
Spec shape: {v_spec.shape}\")\n\n                self.secondary_source = self.spec_to_wav(v_spec).T\n                self.logger.debug(\"Converting secondary source spectrogram to waveform.\")\n                if not self.model_samplerate == 44100:\n                    self.secondary_source = librosa.resample(self.secondary_source.T, orig_sr=self.model_samplerate, target_sr=44100).T\n                    self.logger.debug(\"Resampling secondary source to 44100Hz.\")\n\n            self.secondary_stem_output_path = self.get_stem_output_path(self.secondary_stem_name, custom_output_names)\n\n            self.logger.info(f\"Saving {self.secondary_stem_name} stem to {self.secondary_stem_output_path}...\")\n            self.final_process(self.secondary_stem_output_path, self.secondary_source, self.secondary_stem_name)\n            output_files.append(self.secondary_stem_output_path)\n\n        # Not yet implemented from UVR features:\n        # self.process_vocal_split_chain(secondary_sources)\n        # self.logger.debug(\"Vocal split chain processed.\")\n\n        return output_files\n\n    def loading_mix(self):\n        X_wave, X_spec_s = {}, {}\n\n        bands_n = len(self.model_params.param[\"band\"])\n\n        audio_file = spec_utils.write_array_to_mem(self.audio_file_path, subtype=self.wav_subtype)\n        is_mp3 = audio_file.endswith(\".mp3\") if isinstance(audio_file, str) else False\n\n        self.logger.debug(f\"loading_mix iterating through {bands_n} bands\")\n        for d in tqdm(range(bands_n, 0, -1)):\n            bp = self.model_params.param[\"band\"][d]\n\n            wav_resolution = bp[\"res_type\"]\n\n            if self.torch_device_mps is not None:\n                wav_resolution = \"polyphase\"\n\n            if d == bands_n:  # high-end band\n                X_wave[d], _ = librosa.load(audio_file, sr=bp[\"sr\"], mono=False, dtype=np.float32, res_type=wav_resolution)\n                X_spec_s[d] = spec_utils.wave_to_spectrogram(X_wave[d], 
bp[\"hl\"], bp[\"n_fft\"], self.model_params, band=d, is_v51_model=self.is_vr_51_model)\n\n                if not np.any(X_wave[d]) and is_mp3:\n                    X_wave[d] = rerun_mp3(audio_file, bp[\"sr\"])\n\n                if X_wave[d].ndim == 1:\n                    X_wave[d] = np.asarray([X_wave[d], X_wave[d]])\n            else:  # lower bands\n                X_wave[d] = librosa.resample(X_wave[d + 1], orig_sr=self.model_params.param[\"band\"][d + 1][\"sr\"], target_sr=bp[\"sr\"], res_type=wav_resolution)\n                X_spec_s[d] = spec_utils.wave_to_spectrogram(X_wave[d], bp[\"hl\"], bp[\"n_fft\"], self.model_params, band=d, is_v51_model=self.is_vr_51_model)\n\n            if d == bands_n and self.high_end_process:\n                self.input_high_end_h = (bp[\"n_fft\"] // 2 - bp[\"crop_stop\"]) + (self.model_params.param[\"pre_filter_stop\"] - self.model_params.param[\"pre_filter_start\"])\n                self.input_high_end = X_spec_s[d][:, bp[\"n_fft\"] // 2 - self.input_high_end_h : bp[\"n_fft\"] // 2, :]\n\n        X_spec = spec_utils.combine_spectrograms(X_spec_s, self.model_params, is_v51_model=self.is_vr_51_model)\n\n        del X_wave, X_spec_s, audio_file\n\n        return X_spec\n\n    def inference_vr(self, X_spec, device, aggressiveness):\n        def _execute(X_mag_pad, roi_size):\n            X_dataset = []\n            patches = (X_mag_pad.shape[2] - 2 * self.model_run.offset) // roi_size\n\n            self.logger.debug(f\"inference_vr appending to X_dataset for each of {patches} patches\")\n            for i in tqdm(range(patches)):\n                start = i * roi_size\n                X_mag_window = X_mag_pad[:, :, start : start + self.window_size]\n                X_dataset.append(X_mag_window)\n\n            total_iterations = patches // self.batch_size if not self.enable_tta else (patches // self.batch_size) * 2\n            self.logger.debug(f\"inference_vr iterating through {total_iterations} batches, batch_size = 
{self.batch_size}\")\n\n            X_dataset = np.asarray(X_dataset)\n            self.model_run.eval()\n            with torch.no_grad():\n                mask = []\n\n                for i in tqdm(range(0, patches, self.batch_size)):\n\n                    X_batch = X_dataset[i : i + self.batch_size]\n                    X_batch = torch.from_numpy(X_batch).to(device)\n                    pred = self.model_run.predict_mask(X_batch)\n                    if not pred.size()[3] > 0:\n                        raise ValueError(f\"Window size error: h1_shape[3] must be greater than h2_shape[3]\")\n                    pred = pred.detach().cpu().numpy()\n                    pred = np.concatenate(pred, axis=2)\n                    mask.append(pred)\n                if len(mask) == 0:\n                    raise ValueError(f\"Window size error: h1_shape[3] must be greater than h2_shape[3]\")\n\n                mask = np.concatenate(mask, axis=2)\n            return mask\n\n        def postprocess(mask, X_mag, X_phase):\n            is_non_accom_stem = False\n            for stem in CommonSeparator.NON_ACCOM_STEMS:\n                if stem == self.primary_stem_name:\n                    is_non_accom_stem = True\n\n            mask = spec_utils.adjust_aggr(mask, is_non_accom_stem, aggressiveness)\n\n            if self.enable_post_process:\n                mask = spec_utils.merge_artifacts(mask, thres=self.post_process_threshold)\n\n            y_spec = mask * X_mag * np.exp(1.0j * X_phase)\n            v_spec = (1 - mask) * X_mag * np.exp(1.0j * X_phase)\n\n            return y_spec, v_spec\n\n        X_mag, X_phase = spec_utils.preprocess(X_spec)\n        n_frame = X_mag.shape[2]\n        pad_l, pad_r, roi_size = spec_utils.make_padding(n_frame, self.window_size, self.model_run.offset)\n        X_mag_pad = np.pad(X_mag, ((0, 0), (0, 0), (pad_l, pad_r)), mode=\"constant\")\n        X_mag_pad /= X_mag_pad.max()\n        mask = _execute(X_mag_pad, roi_size)\n\n        if 
self.enable_tta:\n            pad_l += roi_size // 2\n            pad_r += roi_size // 2\n            X_mag_pad = np.pad(X_mag, ((0, 0), (0, 0), (pad_l, pad_r)), mode=\"constant\")\n            X_mag_pad /= X_mag_pad.max()\n            mask_tta = _execute(X_mag_pad, roi_size)\n            mask_tta = mask_tta[:, :, roi_size // 2 :]\n            mask = (mask[:, :, :n_frame] + mask_tta[:, :, :n_frame]) * 0.5\n        else:\n            mask = mask[:, :, :n_frame]\n\n        y_spec, v_spec = postprocess(mask, X_mag, X_phase)\n\n        return y_spec, v_spec\n\n    def spec_to_wav(self, spec):\n        if self.high_end_process and isinstance(self.input_high_end, np.ndarray) and self.input_high_end_h:\n            input_high_end_ = spec_utils.mirroring(\"mirroring\", spec, self.input_high_end, self.model_params)\n            wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, self.input_high_end_h, input_high_end_, is_v51_model=self.is_vr_51_model)\n        else:\n            wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, is_v51_model=self.is_vr_51_model)\n\n        return wav\n\n\n# Check if we really need the rerun_mp3 function, refactor or remove if not\ndef rerun_mp3(audio_file, sample_rate=44100):\n    with audioread.audio_open(audio_file) as f:\n        track_length = int(f.duration)\n\n    return librosa.load(audio_file, duration=track_length, mono=False, sr=sample_rate)[0]\n"
  },
  {
    "path": "audio_separator/separator/audio_chunking.py",
    "content": "\"\"\"Audio chunking utilities for processing large audio files to prevent OOM errors.\"\"\"\n\nimport os\nimport logging\nfrom typing import List\nfrom pydub import AudioSegment\n\n\nclass AudioChunker:\n    \"\"\"\n    Handles splitting and merging of large audio files.\n\n    This class provides utilities to:\n    - Split large audio files into fixed-duration chunks\n    - Merge processed chunks back together with simple concatenation\n    - Determine if a file should be chunked based on its duration\n\n    Example:\n        >>> chunker = AudioChunker(chunk_duration_seconds=600)  # 10-minute chunks\n        >>> chunk_paths = chunker.split_audio(\"long_audio.wav\", \"/tmp/chunks\")\n        >>> # Process each chunk...\n        >>> output_path = chunker.merge_chunks(processed_chunks, \"output.wav\")\n    \"\"\"\n\n    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger = None):\n        \"\"\"\n        Initialize the AudioChunker.\n\n        Args:\n            chunk_duration_seconds: Duration of each chunk in seconds\n            logger: Optional logger instance for logging operations\n        \"\"\"\n        self.chunk_duration_ms = int(chunk_duration_seconds * 1000)\n        self.logger = logger or logging.getLogger(__name__)\n\n    def split_audio(self, input_path: str, output_dir: str) -> List[str]:\n        \"\"\"\n        Split audio file into fixed-size chunks.\n\n        Args:\n            input_path: Path to the input audio file\n            output_dir: Directory where chunk files will be saved\n\n        Returns:\n            List of paths to the created chunk files\n\n        Raises:\n            FileNotFoundError: If input file doesn't exist\n            IOError: If there's an error reading or writing audio files\n        \"\"\"\n        if not os.path.exists(input_path):\n            raise FileNotFoundError(f\"Input file not found: {input_path}\")\n\n        if not os.path.exists(output_dir):\n            
os.makedirs(output_dir)\n\n        self.logger.debug(f\"Loading audio file: {input_path}\")\n        audio = AudioSegment.from_file(input_path)\n\n        total_duration_ms = len(audio)\n        chunk_paths = []\n\n        # Calculate number of chunks\n        num_chunks = (total_duration_ms + self.chunk_duration_ms - 1) // self.chunk_duration_ms\n        self.logger.info(f\"Splitting {total_duration_ms / 1000:.1f}s audio into {num_chunks} chunks of {self.chunk_duration_ms / 1000:.1f}s each\")\n\n        # Get file extension from input\n        _, ext = os.path.splitext(input_path)\n        if not ext:\n            ext = \".wav\"  # Default to WAV if no extension\n\n        # Split into chunks\n        for i in range(num_chunks):\n            start_ms = i * self.chunk_duration_ms\n            end_ms = min(start_ms + self.chunk_duration_ms, total_duration_ms)\n\n            chunk = audio[start_ms:end_ms]\n            chunk_filename = f\"chunk_{i:04d}{ext}\"\n            chunk_path = os.path.join(output_dir, chunk_filename)\n\n            self.logger.debug(f\"Exporting chunk {i + 1}/{num_chunks}: {start_ms / 1000:.1f}s - {end_ms / 1000:.1f}s to {chunk_path}\")\n            chunk.export(chunk_path, format=ext.lstrip('.'))\n            chunk_paths.append(chunk_path)\n\n        return chunk_paths\n\n    def merge_chunks(self, chunk_paths: List[str], output_path: str) -> str:\n        \"\"\"\n        Merge processed chunks with simple concatenation.\n\n        Args:\n            chunk_paths: List of paths to chunk files to merge\n            output_path: Path where the merged output will be saved\n\n        Returns:\n            Path to the merged output file\n\n        Raises:\n            ValueError: If chunk_paths is empty\n            FileNotFoundError: If any chunk file doesn't exist\n            IOError: If there's an error reading or writing audio files\n        \"\"\"\n        if not chunk_paths:\n            raise ValueError(\"Cannot merge empty list of 
chunks\")\n\n        # Verify all chunks exist\n        for chunk_path in chunk_paths:\n            if not os.path.exists(chunk_path):\n                raise FileNotFoundError(f\"Chunk file not found: {chunk_path}\")\n\n        self.logger.info(f\"Merging {len(chunk_paths)} chunks into {output_path}\")\n\n        # Start with empty audio segment\n        combined = AudioSegment.empty()\n\n        # Concatenate all chunks\n        for i, chunk_path in enumerate(chunk_paths):\n            self.logger.debug(f\"Loading chunk {i + 1}/{len(chunk_paths)}: {chunk_path}\")\n            chunk = AudioSegment.from_file(chunk_path)\n            combined += chunk  # Simple concatenation\n\n        # Get output format from file extension\n        _, ext = os.path.splitext(output_path)\n        output_format = ext.lstrip('.') if ext else 'wav'\n\n        self.logger.info(f\"Exporting merged audio ({len(combined) / 1000:.1f}s) to {output_path}\")\n        combined.export(output_path, format=output_format)\n\n        return output_path\n\n    def should_chunk(self, audio_duration_seconds: float) -> bool:\n        \"\"\"\n        Determine if file is large enough to benefit from chunking.\n\n        Args:\n            audio_duration_seconds: Duration of the audio file in seconds\n\n        Returns:\n            True if the file should be chunked, False otherwise\n        \"\"\"\n        return audio_duration_seconds > (self.chunk_duration_ms / 1000)\n"
  },
  {
    "path": "audio_separator/separator/common_separator.py",
    "content": "\"\"\" This file contains the CommonSeparator class, common to all architecture-specific Separator classes. \"\"\"\n\nfrom logging import Logger\nimport os\nimport re\nimport gc\nimport numpy as np\nimport librosa\nimport torch\nfrom pydub import AudioSegment\nimport soundfile as sf\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\n\n\nclass CommonSeparator:\n    \"\"\"\n    This class contains the common methods and attributes common to all architecture-specific Separator classes.\n    \"\"\"\n\n    ALL_STEMS = \"All Stems\"\n    VOCAL_STEM = \"Vocals\"\n    INST_STEM = \"Instrumental\"\n    OTHER_STEM = \"Other\"\n    BASS_STEM = \"Bass\"\n    DRUM_STEM = \"Drums\"\n    GUITAR_STEM = \"Guitar\"\n    PIANO_STEM = \"Piano\"\n    SYNTH_STEM = \"Synthesizer\"\n    STRINGS_STEM = \"Strings\"\n    WOODWINDS_STEM = \"Woodwinds\"\n    BRASS_STEM = \"Brass\"\n    WIND_INST_STEM = \"Wind Inst\"\n    NO_OTHER_STEM = \"No Other\"\n    NO_BASS_STEM = \"No Bass\"\n    NO_DRUM_STEM = \"No Drums\"\n    NO_GUITAR_STEM = \"No Guitar\"\n    NO_PIANO_STEM = \"No Piano\"\n    NO_SYNTH_STEM = \"No Synthesizer\"\n    NO_STRINGS_STEM = \"No Strings\"\n    NO_WOODWINDS_STEM = \"No Woodwinds\"\n    NO_WIND_INST_STEM = \"No Wind Inst\"\n    NO_BRASS_STEM = \"No Brass\"\n    PRIMARY_STEM = \"Primary Stem\"\n    SECONDARY_STEM = \"Secondary Stem\"\n    LEAD_VOCAL_STEM = \"lead_only\"\n    BV_VOCAL_STEM = \"backing_only\"\n    LEAD_VOCAL_STEM_I = \"with_lead_vocals\"\n    BV_VOCAL_STEM_I = \"with_backing_vocals\"\n    LEAD_VOCAL_STEM_LABEL = \"Lead Vocals\"\n    BV_VOCAL_STEM_LABEL = \"Backing Vocals\"\n    NO_STEM = \"No \"\n\n    STEM_PAIR_MAPPER = {VOCAL_STEM: INST_STEM, INST_STEM: VOCAL_STEM, LEAD_VOCAL_STEM: BV_VOCAL_STEM, BV_VOCAL_STEM: LEAD_VOCAL_STEM, PRIMARY_STEM: SECONDARY_STEM}\n\n    NON_ACCOM_STEMS = (VOCAL_STEM, OTHER_STEM, BASS_STEM, DRUM_STEM, GUITAR_STEM, PIANO_STEM, SYNTH_STEM, STRINGS_STEM, WOODWINDS_STEM, BRASS_STEM, WIND_INST_STEM)\n\n    def 
__init__(self, config):\n\n        self.logger: Logger = config.get(\"logger\")\n        self.log_level: int = config.get(\"log_level\")\n\n        # Inferencing device / acceleration config\n        self.torch_device = config.get(\"torch_device\")\n        self.torch_device_cpu = config.get(\"torch_device_cpu\")\n        self.torch_device_mps = config.get(\"torch_device_mps\")\n        self.onnx_execution_provider = config.get(\"onnx_execution_provider\")\n\n        # Model data\n        self.model_name = config.get(\"model_name\")\n        self.model_path = config.get(\"model_path\")\n        self.model_data = config.get(\"model_data\")\n\n        # Output directory and format\n        self.output_dir = config.get(\"output_dir\")\n        self.output_format = config.get(\"output_format\")\n        self.output_bitrate = config.get(\"output_bitrate\")\n\n        # Functional options which are applicable to all architectures and the user may tweak to affect the output\n        self.normalization_threshold = config.get(\"normalization_threshold\")\n        self.amplification_threshold = config.get(\"amplification_threshold\")\n        self.enable_denoise = config.get(\"enable_denoise\")\n        self.output_single_stem = config.get(\"output_single_stem\")\n        self.invert_using_spec = config.get(\"invert_using_spec\")\n        self.sample_rate = config.get(\"sample_rate\")\n        self.use_soundfile = config.get(\"use_soundfile\")\n        \n        # Roformer-specific loading support\n        self.roformer_loader = None\n        self.is_roformer_model = self._detect_roformer_model()\n        if self.is_roformer_model:\n            self._initialize_roformer_loader()\n\n        # Model specific properties\n\n        # Check if model_data has a \"training\" key with \"instruments\" list\n        self.primary_stem_name = None\n        self.secondary_stem_name = None\n        \n        # Audio bit depth tracking for preserving input quality\n        
self.input_bit_depth = None\n        self.input_subtype = None\n\n        if \"training\" in self.model_data and \"instruments\" in self.model_data[\"training\"]:\n            instruments = self.model_data[\"training\"][\"instruments\"]\n            if instruments:\n                target_instrument = self.model_data[\"training\"].get(\"target_instrument\")\n\n                # When target_instrument is set and doesn't match instruments[0],\n                # the model's prediction would be labeled with the wrong stem name.\n                # Swap so primary_stem_name always matches the model's actual target output.\n                if target_instrument and len(instruments) >= 2 and instruments[0] != target_instrument and instruments[1] == target_instrument:\n                    self.logger.debug(f\"Swapping stem names: target_instrument '{target_instrument}' doesn't match instruments[0] '{instruments[0]}'\")\n                    self.primary_stem_name = instruments[1]\n                    self.secondary_stem_name = instruments[0]\n                else:\n                    self.primary_stem_name = instruments[0]\n                    self.secondary_stem_name = instruments[1] if len(instruments) > 1 else self.secondary_stem(self.primary_stem_name)\n\n        if self.primary_stem_name is None:\n            self.primary_stem_name = self.model_data.get(\"primary_stem\", \"Vocals\")\n            self.secondary_stem_name = self.secondary_stem(self.primary_stem_name)\n\n        self.is_karaoke = self.model_data.get(\"is_karaoke\", False)\n        self.is_bv_model = self.model_data.get(\"is_bv_model\", False)\n        self.bv_model_rebalance = self.model_data.get(\"is_bv_model_rebalanced\", 0)\n\n        self.logger.debug(f\"Common params: model_name={self.model_name}, model_path={self.model_path}\")\n        self.logger.debug(f\"Common params: output_dir={self.output_dir}, output_format={self.output_format}\")\n        self.logger.debug(f\"Common params: 
normalization_threshold={self.normalization_threshold}, amplification_threshold={self.amplification_threshold}\")\n        self.logger.debug(f\"Common params: enable_denoise={self.enable_denoise}, output_single_stem={self.output_single_stem}\")\n        self.logger.debug(f\"Common params: invert_using_spec={self.invert_using_spec}, sample_rate={self.sample_rate}\")\n\n        self.logger.debug(f\"Common params: primary_stem_name={self.primary_stem_name}, secondary_stem_name={self.secondary_stem_name}\")\n        self.logger.debug(f\"Common params: is_karaoke={self.is_karaoke}, is_bv_model={self.is_bv_model}, bv_model_rebalance={self.bv_model_rebalance}\")\n\n        # File-specific variables which need to be cleared between processing different audio inputs\n        self.audio_file_path = None\n        self.audio_file_base = None\n\n        self.primary_source = None\n        self.secondary_source = None\n\n        self.primary_stem_output_path = None\n        self.secondary_stem_output_path = None\n\n        self.cached_sources_map = {}\n\n    def secondary_stem(self, primary_stem: str):\n        \"\"\"Determines secondary stem name based on the primary stem name.\"\"\"\n        primary_stem = primary_stem if primary_stem else self.NO_STEM\n\n        if primary_stem in self.STEM_PAIR_MAPPER:\n            secondary_stem = self.STEM_PAIR_MAPPER[primary_stem]\n        else:\n            secondary_stem = primary_stem.replace(self.NO_STEM, \"\") if self.NO_STEM in primary_stem else f\"{self.NO_STEM}{primary_stem}\"\n\n        return secondary_stem\n\n    def separate(self, audio_file_path):\n        \"\"\"\n        Placeholder method for separating audio sources. 
Should be overridden by subclasses.\n        \"\"\"\n        raise NotImplementedError(\"This method should be overridden by subclasses.\")\n\n    def final_process(self, stem_path, source, stem_name):\n        \"\"\"\n        Finalizes the processing of a stem by writing the audio to a file and returning the processed source.\n        \"\"\"\n        self.logger.debug(f\"Finalizing {stem_name} stem processing and writing audio...\")\n        self.write_audio(stem_path, source)\n\n        return {stem_name: source}\n\n    def cached_sources_clear(self):\n        \"\"\"\n        Clears the cache dictionaries for VR, MDX, and Demucs models.\n\n        This function is essential for ensuring that the cache does not hold outdated or irrelevant data\n        between different processing sessions or when a new batch of audio files is processed.\n        It helps in managing memory efficiently and prevents potential errors due to stale data.\n        \"\"\"\n        self.cached_sources_map = {}\n\n    def cached_source_callback(self, model_architecture, model_name=None):\n        \"\"\"\n        Retrieves the model and sources from the cache based on the processing method and model name.\n\n        Args:\n            model_architecture: The architecture type (VR, MDX, or Demucs) being used for processing.\n            model_name: The specific model name within the architecture type, if applicable.\n\n        Returns:\n            A tuple containing the model and its sources if found in the cache; otherwise, None.\n\n        This function is crucial for optimizing performance by avoiding redundant processing.\n        If the requested model and its sources are already in the cache, they can be reused directly,\n        saving time and computational resources.\n        \"\"\"\n        model, sources = None, None\n\n        mapper = self.cached_sources_map[model_architecture]\n\n        for key, value in mapper.items():\n            if model_name in key:\n                
model = key\n                sources = value\n\n        return model, sources\n\n    def cached_model_source_holder(self, model_architecture, sources, model_name=None):\n        \"\"\"\n        Update the dictionary for the given model_architecture with the new model name and its sources.\n        Use the model_architecture as a key to access the corresponding cache source mapper dictionary.\n        \"\"\"\n        self.cached_sources_map[model_architecture] = {**self.cached_sources_map.get(model_architecture, {}), **{model_name: sources}}\n\n    def prepare_mix(self, mix):\n        \"\"\"\n        Prepares the mix for processing. This includes loading the audio from a file if necessary,\n        ensuring the mix is in the correct format, and converting mono to stereo if needed.\n        \"\"\"\n        # Store the original path or the mix itself for later checks\n        audio_path = mix\n\n        # Check if the input is a file path (string) and needs to be loaded\n        if not isinstance(mix, np.ndarray):\n            self.logger.debug(f\"Loading audio from file: {mix}\")\n            \n            # Get audio file info to capture bit depth before loading\n            try:\n                audio_info = sf.info(mix)\n                self.input_subtype = audio_info.subtype\n                self.logger.info(f\"Input audio subtype: {self.input_subtype}\")\n                \n                # Map subtype to bit depth\n                if 'PCM_16' in self.input_subtype or self.input_subtype == 'PCM_S8':\n                    self.input_bit_depth = 16\n                elif 'PCM_24' in self.input_subtype:\n                    self.input_bit_depth = 24\n                elif 'PCM_32' in self.input_subtype or 'FLOAT' in self.input_subtype or 'DOUBLE' in self.input_subtype:\n                    self.input_bit_depth = 32\n                else:\n                    # Default to 16-bit for unknown formats\n                    self.input_bit_depth = 16\n                    
self.logger.warning(f\"Unknown audio subtype {self.input_subtype}, defaulting to 16-bit output\")\n                \n                self.logger.info(f\"Detected input bit depth: {self.input_bit_depth}-bit\")\n            except Exception as e:\n                self.logger.warning(f\"Could not read audio file info, defaulting to 16-bit output: {e}\")\n                self.input_bit_depth = 16\n                self.input_subtype = 'PCM_16'\n            \n            mix, sr = librosa.load(mix, mono=False, sr=self.sample_rate)\n            self.logger.debug(f\"Audio loaded. Sample rate: {sr}, Audio shape: {mix.shape}\")\n        else:\n            # Transpose the mix if it's already an ndarray (expected shape: [channels, samples])\n            self.logger.debug(\"Transposing the provided mix array.\")\n            # Default to 16-bit if numpy array provided directly\n            if self.input_bit_depth is None:\n                self.input_bit_depth = 16\n                self.input_subtype = 'PCM_16'\n            mix = mix.T\n            self.logger.debug(f\"Transposed mix shape: {mix.shape}\")\n\n        # If the original input was a filepath, check if the loaded mix is empty\n        if isinstance(audio_path, str):\n            if not np.any(mix):\n                error_msg = f\"Audio file {audio_path} is empty or not valid\"\n                self.logger.error(error_msg)\n                raise ValueError(error_msg)\n            else:\n                self.logger.debug(\"Audio file is valid and contains data.\")\n\n        # Ensure the mix is in stereo format\n        if mix.ndim == 1:\n            self.logger.debug(\"Mix is mono. 
Converting to stereo.\")\n            mix = np.asfortranarray([mix, mix])\n            self.logger.debug(\"Converted to stereo mix.\")\n\n        # Final log indicating successful preparation of the mix\n        self.logger.debug(\"Mix preparation completed.\")\n        return mix\n\n    def write_audio(self, stem_path: str, stem_source):\n        \"\"\"\n        Writes the separated audio source to a file using pydub or soundfile.\n        Pydub supports a much wider range of audio formats and produces better-encoded lossy files for some formats.\n        Soundfile is used instead when use_soundfile is enabled; this avoids pydub's memory issues with very large files (e.g. longer than 1 hour):\n        https://github.com/jiaaro/pydub/issues/135\n        \"\"\"\n        # Log the duration of the input audio file (informational only; the writer is chosen by use_soundfile)\n        duration_seconds = librosa.get_duration(filename=self.audio_file_path)\n        duration_hours = duration_seconds / 3600\n        self.logger.info(f\"Audio duration is {duration_hours:.2f} hours ({duration_seconds:.2f} seconds).\")\n\n        if self.use_soundfile:\n            self.logger.warning(\"Using soundfile for writing.\")\n            self.write_audio_soundfile(stem_path, stem_source)\n        else:\n            self.logger.info(\"Using pydub for writing.\")\n            self.write_audio_pydub(stem_path, stem_source)\n\n    def write_audio_pydub(self, stem_path: str, stem_source):\n        \"\"\"\n        Writes the separated audio source to a file using pydub (ffmpeg).\n        \"\"\"\n        self.logger.debug(f\"Entering write_audio_pydub with stem_path: {stem_path}\")\n\n        stem_source = spec_utils.normalize(wave=stem_source, max_peak=self.normalization_threshold, min_peak=self.amplification_threshold)\n\n        # Check if the numpy array is empty or contains very low values\n        if np.max(np.abs(stem_source)) < 1e-6:\n            self.logger.warning(\"Warning: stem_source array is near-silent or empty.\")\n            return\n\n        # If output_dir is specified, create it and join it with stem_path\n        if self.output_dir:\n            os.makedirs(self.output_dir, exist_ok=True)\n            stem_path = os.path.join(self.output_dir, stem_path)\n\n        self.logger.debug(f\"Audio data shape before processing: {stem_source.shape}\")\n        self.logger.debug(f\"Data type before conversion: {stem_source.dtype}\")\n\n        # Determine bit depth for output (use input bit depth if available, otherwise default to 16)\n        output_bit_depth = self.input_bit_depth if self.input_bit_depth is not None else 16\n        self.logger.info(f\"Writing output with {output_bit_depth}-bit depth\")\n\n        # pydub always builds the AudioSegment from int16 samples and ffmpeg converts to the target bit depth during export,\n        # so 24-bit and 32-bit files written via pydub carry only 16-bit precision\n        if stem_source.dtype != np.int16:\n            stem_source = (stem_source * 32767).astype(np.int16)\n            self.logger.debug(\"Converted stem_source to int16 for pydub processing.\")\n\n        # Interleave the stereo channels (L, R, L, R, ...) into pydub's raw sample buffer\n        stem_source_interleaved = np.empty((2 * stem_source.shape[0],), dtype=np.int16)\n        stem_source_interleaved[0::2] = stem_source[:, 0]  # Left channel\n        stem_source_interleaved[1::2] = stem_source[:, 1]  # Right channel\n\n        self.logger.debug(f\"Interleaved audio data shape: {stem_source_interleaved.shape}\")\n\n        # Create a pydub AudioSegment (always from 16-bit data)\n        try:\n            audio_segment = AudioSegment(stem_source_interleaved.tobytes(), frame_rate=self.sample_rate, sample_width=2, channels=2)\n            self.logger.debug(\"Created AudioSegment successfully.\")\n        except (IOError, ValueError) as e:\n            self.logger.error(f\"Specific error creating AudioSegment: {e}\")\n            return\n\n        # Determine file format based on the file extension\n        file_format = stem_path.lower().split(\".\")[-1]\n\n        # For m4a files, specify mp4 as 
the container format as the extension doesn't match the format name\n        if file_format == \"m4a\":\n            file_format = \"mp4\"\n        elif file_format == \"mka\":\n            file_format = \"matroska\"\n\n        # Set the bitrate to 320k for mp3 files if output_bitrate is not specified\n        bitrate = \"320k\" if file_format == \"mp3\" and self.output_bitrate is None else self.output_bitrate\n\n        # Export using the determined format\n        try:\n            # Pass codec parameters to ffmpeg to enforce bit depth for lossless formats\n            export_params = {\"format\": file_format}\n            \n            if bitrate:\n                export_params[\"bitrate\"] = bitrate\n            \n            # For lossless formats (WAV/FLAC), specify the codec parameters to enforce bit depth\n            if file_format in [\"wav\", \"flac\"]:\n                if output_bit_depth == 16:\n                    export_params[\"parameters\"] = [\"-sample_fmt\", \"s16\"]\n                elif output_bit_depth == 24:\n                    export_params[\"parameters\"] = [\"-sample_fmt\", \"s32\"]\n                    # For 24-bit, we also need to specify the bit depth explicitly\n                    if file_format == \"wav\":\n                        export_params[\"codec\"] = \"pcm_s24le\"\n                    elif file_format == \"flac\":\n                        # FLAC supports 24-bit natively, no special handling needed\n                        pass\n                elif output_bit_depth == 32:\n                    export_params[\"parameters\"] = [\"-sample_fmt\", \"s32\"]\n                    if file_format == \"wav\":\n                        export_params[\"codec\"] = \"pcm_s32le\"\n            \n            audio_segment.export(stem_path, **export_params)\n            self.logger.debug(f\"Exported audio file successfully to {stem_path} with {output_bit_depth}-bit depth\")\n        except (IOError, ValueError) as e:\n            
self.logger.error(f\"Error exporting audio file: {e}\")\n\n    def write_audio_soundfile(self, stem_path: str, stem_source):\n        \"\"\"\n        Writes the separated audio source to a file using the soundfile library.\n        \"\"\"\n        self.logger.debug(f\"Entering write_audio_soundfile with stem_path: {stem_path}\")\n\n        stem_source = spec_utils.normalize(wave=stem_source, max_peak=self.normalization_threshold, min_peak=self.amplification_threshold)\n\n        # Check if the numpy array is empty or contains very low values\n        if np.max(np.abs(stem_source)) < 1e-6:\n            self.logger.warning(\"Warning: stem_source array is near-silent or empty.\")\n            return\n\n        # If output_dir is specified, create it and join it with stem_path\n        if self.output_dir:\n            os.makedirs(self.output_dir, exist_ok=True)\n            stem_path = os.path.join(self.output_dir, stem_path)\n\n        # Determine the output subtype from the input audio's subtype or, failing that, its bit depth\n        output_subtype = None\n        if self.input_subtype:\n            output_subtype = self.input_subtype\n            self.logger.info(f\"Using input subtype for output: {output_subtype}\")\n        elif self.input_bit_depth:\n            # Map bit depth to subtype\n            if self.input_bit_depth == 16:\n                output_subtype = 'PCM_16'\n            elif self.input_bit_depth == 24:\n                output_subtype = 'PCM_24'\n            elif self.input_bit_depth == 32:\n                output_subtype = 'PCM_32'\n            else:\n                output_subtype = 'PCM_16'  # Default fallback\n            self.logger.info(f\"Using output subtype based on bit depth: {output_subtype}\")\n        else:\n            # Default to PCM_16 if no bit depth info available\n            output_subtype = 'PCM_16'\n            self.logger.warning(\"No bit depth info available, defaulting to PCM_16\")\n\n        # soundfile takes a (frames, channels) array and interleaves the channels itself,\n        # so stereo data is only converted to a C-contiguous (row-major) layout here\n        if stem_source.shape[1] == 2:\n            if stem_source.flags[\"F_CONTIGUOUS\"]:\n                # Convert Fortran-contiguous (column-major) data to C-contiguous (row-major)\n                stem_source = np.ascontiguousarray(stem_source)\n\n        self.logger.debug(f\"Audio data shape for soundfile: {stem_source.shape}\")\n\n        # Save audio using soundfile with the subtype selected above\n        try:\n            sf.write(stem_path, stem_source, self.sample_rate, subtype=output_subtype)\n            self.logger.debug(f\"Exported audio file successfully to {stem_path} with subtype {output_subtype}\")\n        except Exception as e:\n            self.logger.error(f\"Error exporting audio file: {e}\")\n\n    def clear_gpu_cache(self):\n        \"\"\"\n        Runs garbage collection and clears the MPS or CUDA cache (depending on the torch device) to free up memory.\n        \"\"\"\n        self.logger.debug(\"Running garbage collection...\")\n        gc.collect()\n        if self.torch_device == torch.device(\"mps\"):\n            self.logger.debug(\"Clearing MPS cache...\")\n            torch.mps.empty_cache()\n        if self.torch_device == torch.device(\"cuda\"):\n            self.logger.debug(\"Clearing CUDA cache...\")\n            torch.cuda.empty_cache()\n\n    def clear_file_specific_paths(self):\n        \"\"\"\n        Clears the file-specific variables which need to be cleared between processing different audio inputs.\n        \"\"\"\n        self.logger.info(\"Clearing input audio file paths, sources and stems...\")\n\n        self.audio_file_path = None\n        self.audio_file_base = 
None\n\n        self.primary_source = None\n        self.secondary_source = None\n\n        self.primary_stem_output_path = None\n        self.secondary_stem_output_path = None\n\n    def sanitize_filename(self, filename):\n        \"\"\"\n        Cleans the filename by replacing invalid characters with underscores, collapsing repeated underscores\n        and stripping leading or trailing underscores, dots and spaces.\n        \"\"\"\n        sanitized = re.sub(r'[<>:\"/\\\\|?*]', '_', filename)\n        sanitized = re.sub(r'_+', '_', sanitized)\n        sanitized = sanitized.strip('_. ')\n        return sanitized\n\n    def get_stem_output_path(self, stem_name, custom_output_names):\n        \"\"\"\n        Gets the output filename for a stem (joined with output_dir when the audio is written) based on the stem name and custom output names.\n        \"\"\"\n        # Convert custom_output_names keys to lowercase for case-insensitive comparison\n        if custom_output_names:\n            custom_output_names_lower = {k.lower(): v for k, v in custom_output_names.items()}\n            stem_name_lower = stem_name.lower()\n            if stem_name_lower in custom_output_names_lower:\n                sanitized_custom_name = self.sanitize_filename(custom_output_names_lower[stem_name_lower])\n                return f\"{sanitized_custom_name}.{self.output_format.lower()}\"\n\n        sanitized_audio_base = self.sanitize_filename(self.audio_file_base)\n        sanitized_stem_name = self.sanitize_filename(stem_name)\n        sanitized_model_name = self.sanitize_filename(self.model_name)\n\n        filename = f\"{sanitized_audio_base}_({sanitized_stem_name})_{sanitized_model_name}.{self.output_format.lower()}\"\n        return filename\n    \n    def _detect_roformer_model(self):\n        \"\"\"\n        Detect if the current model is a Roformer model.\n        \n        Returns:\n            bool: True if this is a Roformer model, False otherwise\n        \"\"\"\n        if not self.model_data:\n            return False\n            \n        # Check for explicit Roformer flag\n        if self.model_data.get(\"is_roformer\", False):\n            return True\n            \n        # Check model path for Roformer indicators\n        if self.model_path and \"roformer\" in self.model_path.lower():\n            return True\n            \n        # Check model name for Roformer indicators\n        if self.model_name and \"roformer\" in self.model_name.lower():\n            return True\n            \n        return False\n    \n    def _initialize_roformer_loader(self):\n        \"\"\"\n        Initialize the Roformer loader for this model.\n        \"\"\"\n        try:\n            from .roformer.roformer_loader import RoformerLoader\n            self.roformer_loader = RoformerLoader()\n            self.logger.debug(\"Initialized Roformer loader for CommonSeparator\")\n        except ImportError as e:\n            self.logger.warning(f\"Could not import RoformerLoader: {e}\")\n            self.roformer_loader = None\n    \n    def get_roformer_loading_stats(self):\n        \"\"\"\n        Get Roformer loading statistics if available.\n        \n        Returns:\n            dict: Loading statistics or empty dict if not available\n        \"\"\"\n        if self.roformer_loader:\n            return self.roformer_loader.get_loading_stats()\n        return {}\n    \n    def validate_roformer_config(self, config, model_type):\n        \"\"\"\n        Validate Roformer configuration if loader is available.\n        \n        Args:\n            config: Configuration dictionary to validate\n            model_type: Type of model to validate for\n            \n        Returns:\n            bool: True if valid or validation not available, False if invalid\n        \"\"\"\n        if self.roformer_loader:\n            return self.roformer_loader.validate_configuration(config, model_type)\n        return True  # Assume valid if no loader available\n"
  },
  {
    "path": "audio_separator/separator/ensembler.py",
    "content": "import numpy as np\nimport librosa\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\n\n\nclass Ensembler:\n    def __init__(self, logger, algorithm=\"avg_wave\", weights=None):\n        self.logger = logger\n        self.algorithm = algorithm\n        self.weights = weights\n\n    def ensemble(self, waveforms):\n        \"\"\"\n        Ensemble multiple waveforms using the selected algorithm.\n        :param waveforms: List of waveforms, each of shape (channels, length)\n        :return: Ensembled waveform of shape (channels, length)\n        \"\"\"\n        if not waveforms:\n            return None\n        if len(waveforms) == 1:\n            return waveforms[0]\n\n        # Ensure all waveforms have the same number of channels\n        num_channels = waveforms[0].shape[0]\n        if any(w.shape[0] != num_channels for w in waveforms):\n            raise ValueError(\"All waveforms must have the same number of channels for ensembling.\")\n\n        # Ensure all waveforms have the same length by padding with zeros\n        max_length = max(w.shape[1] for w in waveforms)\n        waveforms = [np.pad(w, ((0, 0), (0, max_length - w.shape[1]))) if w.shape[1] < max_length else w for w in waveforms]\n\n        if self.weights is None:\n            weights = np.ones(len(waveforms))\n        else:\n            weights = np.array(self.weights)\n            if len(weights) != len(waveforms):\n                self.logger.warning(f\"Number of weights ({len(weights)}) does not match number of waveforms ({len(waveforms)}). Using equal weights.\")\n                weights = np.ones(len(waveforms))\n            \n            # Validate weights are finite and sum is non-zero\n            weights_sum = np.sum(weights)\n            if not np.all(np.isfinite(weights)) or not np.isfinite(weights_sum) or weights_sum == 0:\n                self.logger.warning(f\"Weights {self.weights} contain non-finite values or sum to zero. 
Falling back to equal weights.\")\n                weights = np.ones(len(waveforms))\n\n        self.logger.debug(f\"Ensembling {len(waveforms)} waveforms using algorithm {self.algorithm}\")\n\n        if self.algorithm == \"avg_wave\":\n            ensembled = np.zeros_like(waveforms[0])\n            for w, weight in zip(waveforms, weights, strict=True):\n                ensembled += w * weight\n            return ensembled / np.sum(weights)\n        elif self.algorithm == \"median_wave\":\n            if self.weights is not None and not np.all(weights == weights[0]):\n                self.logger.warning(f\"Weights are ignored for algorithm {self.algorithm}\")\n            return np.median(waveforms, axis=0)\n        elif self.algorithm == \"min_wave\":\n            if self.weights is not None and not np.all(weights == weights[0]):\n                self.logger.warning(f\"Weights are ignored for algorithm {self.algorithm}\")\n            return self._lambda_min(np.array(waveforms), axis=0, key=np.abs)\n        elif self.algorithm == \"max_wave\":\n            if self.weights is not None and not np.all(weights == weights[0]):\n                self.logger.warning(f\"Weights are ignored for algorithm {self.algorithm}\")\n            return self._lambda_max(np.array(waveforms), axis=0, key=np.abs)\n        elif self.algorithm in [\"avg_fft\", \"median_fft\", \"min_fft\", \"max_fft\"]:\n            return self._ensemble_fft(waveforms, weights)\n        elif self.algorithm == \"uvr_max_spec\":\n            return self._ensemble_uvr(waveforms, spec_utils.MAX_SPEC)\n        elif self.algorithm == \"uvr_min_spec\":\n            return self._ensemble_uvr(waveforms, spec_utils.MIN_SPEC)\n        elif self.algorithm == \"ensemble_wav\":\n            return spec_utils.ensemble_wav(waveforms)\n        else:\n            raise ValueError(f\"Unknown ensemble algorithm: {self.algorithm}\")\n\n    def _lambda_max(self, arr, axis=None, key=None, keepdims=False):\n        idxs = 
np.argmax(key(arr), axis)\n        if axis is not None:\n            idxs = np.expand_dims(idxs, axis)\n            result = np.take_along_axis(arr, idxs, axis)\n            if not keepdims:\n                result = np.squeeze(result, axis=axis)\n            return result\n        else:\n            return arr.flatten()[idxs]\n\n    def _lambda_min(self, arr, axis=None, key=None, keepdims=False):\n        idxs = np.argmin(key(arr), axis)\n        if axis is not None:\n            idxs = np.expand_dims(idxs, axis)\n            result = np.take_along_axis(arr, idxs, axis)\n            if not keepdims:\n                result = np.squeeze(result, axis=axis)\n            return result\n        else:\n            return arr.flatten()[idxs]\n\n    def _stft(self, wave, nfft=2048, hl=1024):\n        if wave.ndim == 1:\n            wave = np.stack([wave, wave])\n        elif wave.shape[0] == 1:\n            wave = np.vstack([wave, wave])\n\n        wave_left = np.asfortranarray(wave[0])\n        wave_right = np.asfortranarray(wave[1])\n        spec_left = librosa.stft(wave_left, n_fft=nfft, hop_length=hl)\n        spec_right = librosa.stft(wave_right, n_fft=nfft, hop_length=hl)\n        spec = np.asfortranarray([spec_left, spec_right])\n        return spec\n\n    def _istft(self, spec, hl=1024, length=None, original_channels=None):\n        if spec.shape[0] == 1:\n            spec = np.vstack([spec, spec])\n\n        spec_left = np.asfortranarray(spec[0])\n        spec_right = np.asfortranarray(spec[1])\n        wave_left = librosa.istft(spec_left, hop_length=hl, length=length)\n        wave_right = librosa.istft(spec_right, hop_length=hl, length=length)\n        wave = np.asfortranarray([wave_left, wave_right])\n\n        if original_channels == 1:\n            wave = wave[:1, :]\n\n        return wave\n\n    def _ensemble_fft(self, waveforms, weights):\n        num_channels = waveforms[0].shape[0]\n        final_length = waveforms[0].shape[-1]\n        specs = 
[self._stft(w) for w in waveforms]\n        specs = np.array(specs)\n\n        if self.algorithm == \"avg_fft\":\n            ense_spec = np.zeros_like(specs[0])\n            for s, weight in zip(specs, weights, strict=True):\n                ense_spec += s * weight\n            ense_spec /= np.sum(weights)\n        elif self.algorithm in [\"median_fft\", \"min_fft\", \"max_fft\"]:\n            if self.weights is not None and not np.all(weights == weights[0]):\n                self.logger.warning(f\"Weights are ignored for algorithm {self.algorithm}\")\n\n            if self.algorithm == \"median_fft\":\n                # For complex numbers, we take median of real and imag parts separately to be safe\n                real_median = np.median(np.real(specs), axis=0)\n                imag_median = np.median(np.imag(specs), axis=0)\n                ense_spec = real_median + 1j * imag_median\n            elif self.algorithm == \"min_fft\":\n                ense_spec = self._lambda_min(specs, axis=0, key=np.abs)\n            elif self.algorithm == \"max_fft\":\n                ense_spec = self._lambda_max(specs, axis=0, key=np.abs)\n\n        return self._istft(ense_spec, length=final_length, original_channels=num_channels)\n\n    def _ensemble_uvr(self, waveforms, uvr_algorithm):\n        specs = [spec_utils.wave_to_spectrogram_no_mp(w) for w in waveforms]\n        ense_spec = spec_utils.ensembling(uvr_algorithm, specs)\n        return spec_utils.spectrogram_to_wave_no_mp(ense_spec)\n"
  },
  {
    "path": "audio_separator/separator/roformer/README.md",
    "content": "# Roformer Implementation - Updated Parameter Support\n\nThis directory contains the updated Roformer implementation with support for new parameters and backward compatibility for legacy models (handled by the single, modern loader).\n\n## Overview\n\nThe updated Roformer implementation provides:\n\n- **New Parameter Support**: `mlp_expansion_factor`, `sage_attention`, `zero_dc`, `use_torch_checkpoint`, `skip_connection`\n- **Single Loader Path**: Unified loader supports both newer and older checkpoints\n- **Robust Validation**: Comprehensive parameter validation with detailed error messages\n- **Configuration Normalization**: Handles different config formats and applies sensible defaults\n\n## Architecture\n\n```\nroformer/\n├── __init__.py                 # Package exports\n├── model_configuration.py      # Data model for standardized config\n├── bs_roformer_config.py       # BSRoformer-specific configuration\n├── mel_band_roformer_config.py # MelBandRoformer-specific configuration\n├── parameter_validator.py      # Base parameter validation\n├── bs_roformer_validator.py    # BSRoformer-specific validation\n├── mel_band_roformer_validator.py  # MelBandRoformer-specific validation\n├── configuration_normalizer.py # Config normalization and defaults\n├── parameter_validation_error.py  # Custom exception handling\n├── roformer_loader.py          # Main loader (new implementation only)\n├── model_loading_result.py     # Result dataclass\n└── README.md                   # This documentation\n```\n\n## New Parameters\n\n### Core New Parameters\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `mlp_expansion_factor` | int | 4 | MLP expansion factor in transformer layers |\n| `sage_attention` | bool | False | Use Sage attention mechanism instead of standard attention |\n| `zero_dc` | bool | True | Apply zero DC component filtering |\n| `use_torch_checkpoint` | bool | False | Enable gradient checkpointing for memory efficiency |\n| `skip_connection` | bool | False | Add skip connections in the architecture |\n\n### Compatibility Notes\n\n- **`sage_attention`**: Should not be combined with `flash_attn=True`; a warning is issued if both are enabled\n- **`mlp_expansion_factor`**: Higher values increase model capacity but also memory usage\n- **`use_torch_checkpoint`**: Trades extra computation for lower memory use - useful for large models\n- **`zero_dc`**: Recommended for most audio separation tasks\n\n## Usage Examples\n\n### Basic Usage with New Parameters\n\n```python\nfrom audio_separator.separator.roformer.roformer_loader import RoformerLoader\n\nloader = RoformerLoader()\n\nconfig = {\n    'dim': 512,\n    'depth': 12,\n    'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n    'mlp_expansion_factor': 8,\n    'sage_attention': True,\n    'zero_dc': True,\n    'use_torch_checkpoint': True,\n    'skip_connection': False\n}\n\nresult = loader.load_model('/path/to/model.ckpt', config, device='cuda')\nif result.success:\n    print(\"Model loaded\")\nelse:\n    print(f\"Failed: {result.error_message}\")\n```\n\n### Configuration Validation\n\n```python\nfrom audio_separator.separator.roformer.parameter_validator import ParameterValidator\n\nvalidator = ParameterValidator()\nissues = validator.validate_all(config, \"bs_roformer\")\nif issues:\n    for issue in issues:\n        print(f\"{issue.severity.value}: {issue.message}\")\nelse:\n    print(\"Configuration is valid!\")\n```\n\n### Configuration Normalization\n\n```python\nfrom audio_separator.separator.roformer.configuration_normalizer import ConfigurationNormalizer\n\nnormalizer = ConfigurationNormalizer()\nraw_config = {\n    'model': {'dim': '512', 'depth': 12.0, 'n_heads': 8},\n    'training': {'sample_rate': 44100},\n    'stereo': 'true',\n    'freq_bands': '[2, 4, 8, 16]'\n}\nnormalized = normalizer.normalize_config(raw_config, model_type=\"bs_roformer\", apply_defaults=True, validate=True)\nprint(f\"Normalized config has {len(normalized)} parameters\")\n```\n\n## Model Type Detection\n\nDetected via config contents (e.g., `freqs_per_bands` vs `num_bands`) and filename hints. Defaults to BSRoformer when ambiguous.\n\n## Integration with Audio Separator\n\n- Routing remains through the MDXC architecture path; Roformer models are detected and handled by the MDXC separator using the unified `RoformerLoader`.\n- Loader statistics are surfaced via `CommonSeparator.get_roformer_loading_stats()` and logged by the top-level `Separator`.\n\n## Testing\n\n```bash\n# Unit tests (examples)\npython -m pytest tests/unit/test_parameter_validator.py -v\npython -m pytest tests/unit/test_configuration_normalizer.py -v\n\n# Integration\npython -m pytest tests/integration/test_roformer_*.py -v\n```\n\n## Contributing\n\n- Follow test-driven development (TDD)\n- Maintain compatibility for existing checkpoints through the single loader path\n- Update documentation when adding parameters or behavior\n\n## License\n\nFollows the main Audio Separator project license.\n"
  },
  {
    "path": "audio_separator/separator/roformer/__init__.py",
    "content": "\"\"\"\nRoformer implementation module.\nUpdated implementation supporting both old and new Roformer model parameters.\n\"\"\"\n\nfrom .model_configuration import ModelConfiguration\nfrom .bs_roformer_config import BSRoformerConfig\nfrom .mel_band_roformer_config import MelBandRoformerConfig\nfrom .model_loading_result import ModelLoadingResult\nfrom .parameter_validation_error import ParameterValidationError\n\n__all__ = [\n    'ModelConfiguration',\n    'BSRoformerConfig', \n    'MelBandRoformerConfig',\n    'ModelLoadingResult',\n    'ParameterValidationError'\n]\n"
  },
  {
    "path": "audio_separator/separator/roformer/bs_roformer_config.py",
    "content": "\"\"\"\nBSRoformer-specific configuration class.\nExtends ModelConfiguration with BSRoformer-specific parameters.\n\"\"\"\n\nfrom dataclasses import dataclass\nfrom typing import Tuple, Optional, Dict, Any\nfrom .model_configuration import ModelConfiguration, RoformerType\n\n\n@dataclass(frozen=True, unsafe_hash=True)\nclass BSRoformerConfig(ModelConfiguration):\n    \"\"\"\n    Configuration class specifically for BSRoformer (Band-Split Roformer) models.\n    \n    BSRoformer processes audio by splitting it into frequency bands and applying\n    Roformer architecture to each band separately.\n    \"\"\"\n    \n    # BSRoformer-specific required parameters\n    freqs_per_bands: Tuple[int, ...] = (2, 4, 8, 16, 32, 64)  # Default frequency band configuration\n    \n    # BSRoformer-specific optional parameters\n    mask_estimator_depth: int = 2  # Depth of mask estimation network\n    \n    def __post_init__(self):\n        \"\"\"Validate BSRoformer-specific configuration after initialization.\"\"\"\n        super().__post_init__()\n        self._validate_bs_roformer_parameters()\n    \n    def _validate_bs_roformer_parameters(self):\n        \"\"\"Validate BSRoformer-specific parameters.\"\"\"\n        if not self.freqs_per_bands:\n            raise ValueError(\"freqs_per_bands must be provided for BSRoformer\")\n        \n        if not isinstance(self.freqs_per_bands, (tuple, list)):\n            raise ValueError(f\"freqs_per_bands must be a tuple or list, got {type(self.freqs_per_bands)}\")\n        \n        if not all(isinstance(freq, int) and freq > 0 for freq in self.freqs_per_bands):\n            raise ValueError(\"All frequencies in freqs_per_bands must be positive integers\")\n        \n        if len(self.freqs_per_bands) < 2:\n            raise ValueError(\"freqs_per_bands must contain at least 2 frequency bands\")\n        \n        if self.mask_estimator_depth <= 0:\n            raise ValueError(f\"mask_estimator_depth must be positive, 
got {self.mask_estimator_depth}\")\n    \n    def get_total_frequency_bins(self) -> int:\n        \"\"\"Calculate total number of frequency bins.\"\"\"\n        return sum(self.freqs_per_bands)\n    \n    def validate_against_stft_config(self, n_fft: int) -> bool:\n        \"\"\"\n        Validate that frequency bands match STFT configuration.\n        \n        Args:\n            n_fft: STFT n_fft parameter\n            \n        Returns:\n            True if configuration is valid, False otherwise\n        \"\"\"\n        expected_freq_bins = n_fft // 2 + 1\n        total_freq_bins = self.get_total_frequency_bins()\n        \n        return total_freq_bins == expected_freq_bins\n    \n    def get_stft_compatibility_info(self, n_fft: int) -> Dict[str, Any]:\n        \"\"\"\n        Get information about STFT compatibility.\n        \n        Args:\n            n_fft: STFT n_fft parameter\n            \n        Returns:\n            Dictionary with compatibility information\n        \"\"\"\n        expected_freq_bins = n_fft // 2 + 1\n        total_freq_bins = self.get_total_frequency_bins()\n        \n        return {\n            'expected_freq_bins': expected_freq_bins,\n            'actual_freq_bins': total_freq_bins,\n            'is_compatible': total_freq_bins == expected_freq_bins,\n            'difference': total_freq_bins - expected_freq_bins,\n            'freqs_per_bands': self.freqs_per_bands\n        }\n    \n    def get_model_type(self) -> RoformerType:\n        \"\"\"Get the model type.\"\"\"\n        return RoformerType.BS_ROFORMER\n    \n    def get_bs_roformer_kwargs(self) -> Dict[str, Any]:\n        \"\"\"Get BSRoformer-specific parameters for model initialization.\"\"\"\n        kwargs = self.get_transformer_kwargs()\n        kwargs.update({\n            'freqs_per_bands': self.freqs_per_bands,\n            'mask_estimator_depth': self.mask_estimator_depth,\n            'stereo': self.stereo,\n            'num_stems': self.num_stems,\n          
  'mlp_expansion_factor': self.mlp_expansion_factor,\n            'use_torch_checkpoint': self.use_torch_checkpoint,\n            'skip_connection': self.skip_connection,\n        })\n        return kwargs\n    \n    @classmethod\n    def from_model_config(cls, config_dict: Dict[str, Any]) -> 'BSRoformerConfig':\n        \"\"\"\n        Create BSRoformerConfig from a model configuration dictionary.\n        \n        Args:\n            config_dict: Dictionary containing model configuration\n            \n        Returns:\n            BSRoformerConfig instance\n        \"\"\"\n        # Ensure freqs_per_bands is present\n        if 'freqs_per_bands' not in config_dict:\n            # Use default if not specified\n            config_dict = config_dict.copy()\n            config_dict['freqs_per_bands'] = (2, 4, 8, 16, 32, 64)\n        \n        return cls.from_dict(config_dict)\n    \n    def suggest_stft_n_fft(self) -> int:\n        \"\"\"\n        Suggest appropriate n_fft value for STFT based on frequency bands.\n        \n        Returns:\n            Suggested n_fft value\n        \"\"\"\n        total_freq_bins = self.get_total_frequency_bins()\n        # n_fft = 2 * (freq_bins - 1)\n        return 2 * (total_freq_bins - 1)\n    \n    def get_parameter_summary(self) -> str:\n        \"\"\"Get a summary string of key BSRoformer parameters.\"\"\"\n        base_summary = super().get_parameter_summary()\n        bs_info = f\", freqs_per_bands={self.freqs_per_bands}, total_bins={self.get_total_frequency_bins()}\"\n        return base_summary.replace(\")\", bs_info + \")\")\n    \n    def __repr__(self) -> str:\n        \"\"\"String representation of the BSRoformer configuration.\"\"\"\n        return f\"BSRoformerConfig({self.get_parameter_summary()})\"\n"
  },
  {
    "path": "audio_separator/separator/roformer/bs_roformer_validator.py",
    "content": "\"\"\"\nBSRoformer-specific parameter validator.\nExtends the base ParameterValidator with BSRoformer-specific validation logic.\n\"\"\"\n\nfrom typing import Dict, Any, List, Tuple\nfrom .parameter_validator import ParameterValidator, ValidationIssue, ValidationSeverity\n\n\nclass BSRoformerValidator(ParameterValidator):\n    \"\"\"\n    Specialized validator for BSRoformer model parameters.\n    \n    Provides BSRoformer-specific validation beyond the base parameter validation,\n    including frequency band configuration and STFT parameter validation.\n    \"\"\"\n    \n    # BSRoformer-specific parameter constraints\n    DEFAULT_FREQS_PER_BANDS = (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)\n    MIN_BANDS = 2\n    MAX_BANDS = 32\n    \n    def validate_freqs_per_bands(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate freqs_per_bands parameter for BSRoformer.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for freqs_per_bands\n        \"\"\"\n        issues = []\n        \n        if 'freqs_per_bands' not in config:\n            return issues  # Required parameter check handled by base class\n        \n        freqs_per_bands = config['freqs_per_bands']\n        \n        # Type check\n        if not isinstance(freqs_per_bands, (list, tuple)):\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"freqs_per_bands\",\n                message=\"freqs_per_bands must be a list or tuple\",\n                suggested_fix=\"Convert freqs_per_bands to a list or tuple of integers\",\n                current_value=freqs_per_bands,\n                expected_value=\"list or tuple of integers\"\n            )\n            issues.append(issue)\n            return issues\n        \n        # Length check\n        if len(freqs_per_bands) < self.MIN_BANDS:\n    
        issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"freqs_per_bands\",\n                message=f\"freqs_per_bands must have at least {self.MIN_BANDS} bands\",\n                suggested_fix=f\"Add more frequency bands to reach minimum of {self.MIN_BANDS}\",\n                current_value=len(freqs_per_bands),\n                expected_value=f\">= {self.MIN_BANDS}\"\n            )\n            issues.append(issue)\n        \n        if len(freqs_per_bands) > self.MAX_BANDS:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.WARNING,\n                parameter_name=\"freqs_per_bands\",\n                message=f\"freqs_per_bands has {len(freqs_per_bands)} bands, which may impact performance\",\n                suggested_fix=f\"Consider reducing to {self.MAX_BANDS} or fewer bands for better performance\",\n                current_value=len(freqs_per_bands),\n                expected_value=f\"<= {self.MAX_BANDS} (recommended)\"\n            )\n            issues.append(issue)\n        \n        # Value validation\n        for i, freq_count in enumerate(freqs_per_bands):\n            if not isinstance(freq_count, int) or freq_count <= 0:\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.ERROR,\n                    parameter_name=f\"freqs_per_bands[{i}]\",\n                    message=f\"Each frequency count must be a positive integer, got {freq_count}\",\n                    suggested_fix=\"Use positive integers for all frequency counts\",\n                    current_value=freq_count,\n                    expected_value=\"positive integer\"\n                )\n                issues.append(issue)\n        \n        # Check for reasonable progression (powers of 2 are common)\n        if all(isinstance(f, int) and f > 0 for f in freqs_per_bands):\n            # Check if values follow a reasonable pattern (not strictly enforced)\n    
        total_freqs = sum(freqs_per_bands)\n            if total_freqs > 8192:  # Unusually high for typical STFT\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.WARNING,\n                    parameter_name=\"freqs_per_bands\",\n                    message=f\"Total frequency count ({total_freqs}) is very high\",\n                    suggested_fix=\"Consider using fewer total frequencies for typical audio processing\",\n                    current_value=total_freqs,\n                    expected_value=\"<= 8192 (typical)\"\n                )\n                issues.append(issue)\n        \n        return issues\n    \n    def validate_stft_compatibility(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate STFT-related parameters for compatibility with freqs_per_bands.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for STFT compatibility\n        \"\"\"\n        issues = []\n        \n        if 'freqs_per_bands' not in config:\n            return issues\n        \n        freqs_per_bands = config.get('freqs_per_bands')\n        stft_n_fft = config.get('stft_n_fft', 2048)  # Common default\n        \n        if isinstance(freqs_per_bands, (list, tuple)) and all(isinstance(f, int) for f in freqs_per_bands):\n            total_freqs = sum(freqs_per_bands)\n            expected_freqs = stft_n_fft // 2 + 1\n            \n            if total_freqs != expected_freqs:\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.WARNING,\n                    parameter_name=\"freqs_per_bands, stft_n_fft\",\n                    message=f\"freqs_per_bands sum ({total_freqs}) doesn't match STFT frequency bins ({expected_freqs})\",\n                    suggested_fix=f\"Adjust freqs_per_bands to sum to {expected_freqs} or modify stft_n_fft\",\n                    
current_value=f\"sum={total_freqs}, expected={expected_freqs}\",\n                    expected_value=f\"freqs_per_bands sum == stft_n_fft//2 + 1\"\n                )\n                issues.append(issue)\n        \n        return issues\n    \n    def validate_mask_estimator_config(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate mask estimator configuration for BSRoformer.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for mask estimator\n        \"\"\"\n        issues = []\n        \n        mask_depth = config.get('mask_estimator_depth', 2)\n        \n        if not isinstance(mask_depth, int) or mask_depth < 1:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"mask_estimator_depth\",\n                message=\"mask_estimator_depth must be a positive integer\",\n                suggested_fix=\"Set mask_estimator_depth to a positive integer (typically 2-8)\",\n                current_value=mask_depth,\n                expected_value=\"positive integer\"\n            )\n            issues.append(issue)\n        elif mask_depth > 8:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.WARNING,\n                parameter_name=\"mask_estimator_depth\",\n                message=f\"mask_estimator_depth of {mask_depth} may be unnecessarily deep\",\n                suggested_fix=\"Consider using a smaller depth (2-8) for better efficiency\",\n                current_value=mask_depth,\n                expected_value=\"2-8 (recommended)\"\n            )\n            issues.append(issue)\n        \n        return issues\n    \n    def validate_all(self, config: Dict[str, Any], model_type: str = \"bs_roformer\") -> List[ValidationIssue]:\n        \"\"\"\n        Run all BSRoformer-specific validation checks.\n        \n        
Args:\n            config: Model configuration dictionary\n            model_type: Type of model (should be \"bs_roformer\")\n            \n        Returns:\n            List of all validation issues found\n        \"\"\"\n        # Start with base validation\n        all_issues = super().validate_all(config, model_type)\n        \n        # Add BSRoformer-specific validation\n        all_issues.extend(self.validate_freqs_per_bands(config))\n        all_issues.extend(self.validate_stft_compatibility(config))\n        all_issues.extend(self.validate_mask_estimator_config(config))\n        \n        return all_issues\n    \n    def get_parameter_defaults(self, model_type: str = \"bs_roformer\") -> Dict[str, Any]:\n        \"\"\"\n        Get BSRoformer-specific parameter defaults.\n        \n        Args:\n            model_type: Type of model\n            \n        Returns:\n            Dictionary of parameter names to default values\n        \"\"\"\n        defaults = super().get_parameter_defaults(model_type)\n        \n        # Override with BSRoformer-specific defaults\n        defaults.update({\n            'freqs_per_bands': self.DEFAULT_FREQS_PER_BANDS[:6],  # Use first 6 bands as default\n            'mask_estimator_depth': 2,\n            'stft_n_fft': 2048,\n            'stft_hop_length': 512,\n            'stft_win_length': 2048,\n            'multi_stft_resolution_loss_weight': 1.0,\n        })\n        \n        return defaults\n"
  },
  {
    "path": "audio_separator/separator/roformer/configuration_normalizer.py",
    "content": "\"\"\"\nConfiguration normalizer for Roformer models.\nNormalizes and standardizes configuration dictionaries from various sources.\n\"\"\"\n\nfrom typing import Dict, Any, Optional, Union, List\nimport copy\nimport logging\n\nfrom .parameter_validator import ParameterValidator\nfrom .bs_roformer_validator import BSRoformerValidator\nfrom .mel_band_roformer_validator import MelBandRoformerValidator\nfrom .parameter_validation_error import ParameterValidationError\n\nlogger = logging.getLogger(__name__)\n\n\nclass ConfigurationNormalizer:\n    \"\"\"\n    Normalizes configuration dictionaries for Roformer models.\n    \n    Handles different configuration formats, applies defaults, and ensures\n    consistency across different model types and versions.\n    \"\"\"\n    \n    def __init__(self):\n        \"\"\"Initialize the configuration normalizer with validators.\"\"\"\n        self.base_validator = ParameterValidator()\n        self.bs_validator = BSRoformerValidator()\n        self.mel_validator = MelBandRoformerValidator()\n    \n    def normalize_config(self, \n                        config: Dict[str, Any], \n                        model_type: str, \n                        apply_defaults: bool = True,\n                        validate: bool = True) -> Dict[str, Any]:\n        \"\"\"\n        Normalize a configuration dictionary.\n        \n        Args:\n            config: Raw configuration dictionary\n            model_type: Type of model (\"bs_roformer\" or \"mel_band_roformer\")\n            apply_defaults: Whether to apply default values for missing parameters\n            validate: Whether to validate the configuration\n            \n        Returns:\n            Normalized configuration dictionary\n            \n        Raises:\n            ParameterValidationError: If validation fails and validate=True\n        \"\"\"\n        # Deep copy to avoid modifying original\n        normalized = copy.deepcopy(config)\n        \n        # 
Step 1: Normalize structure (flatten nested configs if needed)\n        normalized = self._normalize_structure(normalized, model_type)\n        \n        # Step 2: Normalize parameter names and values\n        normalized = self._normalize_parameter_names(normalized)\n        normalized = self._normalize_parameter_values(normalized, model_type)\n        \n        # Step 3: Apply defaults if requested\n        if apply_defaults:\n            normalized = self._apply_defaults(normalized, model_type)\n        \n        # Step 4: Validate if requested\n        if validate:\n            self._validate_config(normalized, model_type)\n        \n        logger.debug(f\"Normalized {model_type} configuration: {len(normalized)} parameters\")\n        return normalized\n    \n    def _normalize_structure(self, config: Dict[str, Any], model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Normalize the structure of the configuration dictionary.\n        \n        Some configurations may be nested or have different structures.\n        This flattens and standardizes the structure.\n        \"\"\"\n        normalized = {}\n        \n        # Handle nested configurations (e.g., from YAML files)\n        for key, value in config.items():\n            if isinstance(value, dict) and key in ['model', 'architecture', 'params']:\n                # Flatten nested model parameters\n                normalized.update(value)\n            elif key in ['training', 'inference'] and isinstance(value, dict):\n                # Some configs have training/inference sections\n                # Extract relevant parameters with prefixes\n                for nested_key, nested_value in value.items():\n                    if nested_key in ['dim_t', 'hop_length', 'n_fft', 'sample_rate']:\n                        normalized[nested_key] = nested_value\n            else:\n                normalized[key] = value\n        \n        return normalized\n    \n    def _normalize_parameter_names(self, 
config: Dict[str, Any]) -> Dict[str, Any]:\n        \"\"\"\n        Normalize parameter names to standard format.\n        \n        Handles different naming conventions and aliases.\n        \"\"\"\n        normalized = {}\n        \n        # Parameter name mappings (old_name -> new_name)\n        name_mappings = {\n            # Common aliases\n            'n_fft': 'stft_n_fft',\n            'hop_length': 'stft_hop_length', \n            'win_length': 'stft_win_length',\n            'window_fn': 'stft_window_fn',\n            'normalized': 'stft_normalized',\n            \n            # Transformer aliases\n            'n_heads': 'heads',\n            'num_heads': 'heads',\n            'head_dim': 'dim_head',\n            'dropout': 'attn_dropout',\n            'attention_dropout': 'attn_dropout',\n            'feedforward_dropout': 'ff_dropout',\n            \n            # Model-specific aliases\n            'expansion_factor': 'mlp_expansion_factor',\n            'mlp_ratio': 'mlp_expansion_factor',\n            'use_checkpoint': 'use_torch_checkpoint',\n            'checkpoint': 'use_torch_checkpoint',\n            \n            # Frequency band aliases\n            'freq_bands': 'freqs_per_bands',\n            'frequency_bands': 'freqs_per_bands',\n            'mel_bands': 'num_bands',\n            'n_mels': 'num_bands',\n        }\n        \n        for key, value in config.items():\n            # Apply name mapping if exists\n            normalized_key = name_mappings.get(key, key)\n            normalized[normalized_key] = value\n        \n        return normalized\n    \n    def _normalize_parameter_values(self, config: Dict[str, Any], model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Normalize parameter values to expected types and formats.\n        \"\"\"\n        normalized = {}\n        \n        for key, value in config.items():\n            normalized_value = self._normalize_single_value(key, value, model_type)\n            
normalized[key] = normalized_value\n        \n        return normalized\n    \n    def _normalize_single_value(self, key: str, value: Any, model_type: str) -> Any:\n        \"\"\"Normalize a single parameter value.\"\"\"\n        \n        # Boolean normalization\n        if key in ['stereo', 'flash_attn', 'sage_attention', 'zero_dc', \n                  'use_torch_checkpoint', 'skip_connection', 'stft_normalized']:\n            if isinstance(value, str):\n                return value.lower() in ['true', '1', 'yes', 'on']\n            return bool(value)\n        \n        # Integer normalization\n        elif key in ['dim', 'depth', 'num_stems', 'time_transformer_depth', \n                    'freq_transformer_depth', 'dim_head', 'heads', \n                    'mlp_expansion_factor', 'num_bands', 'sample_rate',\n                    'stft_n_fft', 'stft_hop_length', 'stft_win_length',\n                    'mask_estimator_depth']:\n            if isinstance(value, str):\n                try:\n                    return int(float(value))  # Handle \"2.0\" -> 2\n                except (ValueError, TypeError):\n                    return value  # Let validator catch the error\n            return int(value) if isinstance(value, (int, float)) else value\n        \n        # Float normalization  \n        elif key in ['attn_dropout', 'ff_dropout', 'multi_stft_resolution_loss_weight',\n                    'fmin', 'fmax']:\n            if isinstance(value, str):\n                try:\n                    return float(value)\n                except (ValueError, TypeError):\n                    return value  # Let validator catch the error\n            return float(value) if isinstance(value, (int, float)) else value\n        \n        # Tuple/list normalization\n        elif key.startswith('freqs_per_bands') or key in ['freqs_per_bands']:\n            if isinstance(value, str):\n                # Handle string representations like \"(2, 4, 8, 16)\"\n                try:\n      
              # Remove parentheses and split\n                    clean_str = value.strip('()[]').replace(' ', '')\n                    if clean_str:\n                        return tuple(int(x) for x in clean_str.split(','))\n                except (ValueError, TypeError):\n                    return value  # Let validator catch the error\n            elif isinstance(value, list):\n                return tuple(value)  # Convert lists to tuples for consistency\n            return value\n        \n        # String normalization\n        elif key in ['norm', 'act', 'mel_scale']:\n            if value is not None:\n                return str(value).lower()\n            return value\n        \n        # No normalization needed\n        else:\n            return value\n    \n    def _apply_defaults(self, config: Dict[str, Any], model_type: str) -> Dict[str, Any]:\n        \"\"\"Apply default values for missing parameters.\"\"\"\n        \n        if model_type == \"bs_roformer\":\n            validator = self.bs_validator\n        elif model_type == \"mel_band_roformer\":\n            validator = self.mel_validator\n        else:\n            validator = self.base_validator\n        \n        return validator.apply_parameter_defaults(config, model_type)\n    \n    def _validate_config(self, config: Dict[str, Any], model_type: str) -> None:\n        \"\"\"Validate the configuration and raise errors if invalid.\"\"\"\n        \n        if model_type == \"bs_roformer\":\n            validator = self.bs_validator\n        elif model_type == \"mel_band_roformer\":\n            validator = self.mel_validator\n        else:\n            validator = self.base_validator\n        \n        validator.validate_and_raise(config, model_type)\n    \n    def detect_model_type(self, config: Dict[str, Any]) -> Optional[str]:\n        \"\"\"\n        Attempt to detect the model type from configuration parameters.\n        \n        Args:\n            config: Configuration dictionary\n     
       \n        Returns:\n            Detected model type or None if cannot be determined\n        \"\"\"\n        # Check for BSRoformer-specific parameters\n        if 'freqs_per_bands' in config:\n            return \"bs_roformer\"\n        \n        # Check for MelBandRoformer-specific parameters\n        if 'num_bands' in config or 'n_mels' in config or 'mel_bands' in config:\n            return \"mel_band_roformer\"\n        \n        # Check for model type hints in the config\n        model_type = config.get('model_type', config.get('type', config.get('architecture')))\n        if isinstance(model_type, str):\n            model_type_lower = model_type.lower()\n            if 'bs' in model_type_lower and 'roformer' in model_type_lower:\n                return \"bs_roformer\"\n            elif 'mel' in model_type_lower and 'roformer' in model_type_lower:\n                return \"mel_band_roformer\"\n            elif 'roformer' in model_type_lower:\n                # Default to BSRoformer if just \"roformer\"\n                return \"bs_roformer\"\n        \n        return None\n    \n    def normalize_from_file_path(self, \n                                config: Dict[str, Any], \n                                file_path: str,\n                                apply_defaults: bool = True,\n                                validate: bool = True) -> Dict[str, Any]:\n        \"\"\"\n        Normalize configuration with model type detection from file path.\n        \n        Args:\n            config: Configuration dictionary\n            file_path: Path to the model file (used for type detection)\n            apply_defaults: Whether to apply defaults\n            validate: Whether to validate\n            \n        Returns:\n            Normalized configuration\n        \"\"\"\n        # Try to detect model type from file path\n        file_path_lower = file_path.lower()\n        if 'bs' in file_path_lower and 'roformer' in file_path_lower:\n            
model_type = \"bs_roformer\"\n        elif 'mel' in file_path_lower and 'roformer' in file_path_lower:\n            model_type = \"mel_band_roformer\"\n        else:\n            # Try to detect from config\n            model_type = self.detect_model_type(config)\n            if model_type is None:\n                # Default to BSRoformer\n                model_type = \"bs_roformer\"\n                logger.warning(f\"Could not detect model type from config or path {file_path}, defaulting to bs_roformer\")\n        \n        return self.normalize_config(config, model_type, apply_defaults, validate)\n"
  },
  {
    "path": "audio_separator/separator/roformer/mel_band_roformer_config.py",
    "content": "\"\"\"\nMelBandRoformer-specific configuration class.\nExtends ModelConfiguration with MelBandRoformer-specific parameters.\n\"\"\"\n\nfrom dataclasses import dataclass\nfrom typing import Optional, Dict, Any\nfrom .model_configuration import ModelConfiguration, RoformerType\n\n\n@dataclass(frozen=True, unsafe_hash=True)\nclass MelBandRoformerConfig(ModelConfiguration):\n    \"\"\"\n    Configuration class specifically for MelBandRoformer models.\n    \n    MelBandRoformer processes audio using mel-scale frequency bands,\n    which are more aligned with human auditory perception.\n    \"\"\"\n    \n    # MelBandRoformer-specific required parameters\n    num_bands: int = 64  # Number of mel-scale bands\n    \n    # MelBandRoformer-specific optional parameters\n    mel_scale: str = \"htk\"  # Mel scale type: \"htk\" or \"slaney\"\n    fmin: float = 0.0  # Minimum frequency for mel scale\n    fmax: Optional[float] = None  # Maximum frequency for mel scale (None = sample_rate/2)\n    \n    def __post_init__(self):\n        \"\"\"Validate MelBandRoformer-specific configuration after initialization.\"\"\"\n        super().__post_init__()\n        self._validate_mel_band_roformer_parameters()\n    \n    def _validate_mel_band_roformer_parameters(self):\n        \"\"\"Validate MelBandRoformer-specific parameters.\"\"\"\n        if not isinstance(self.num_bands, int) or self.num_bands <= 0:\n            raise ValueError(f\"num_bands must be a positive integer, got {self.num_bands}\")\n        \n        if self.num_bands < 8:\n            raise ValueError(f\"num_bands should be at least 8 for meaningful separation, got {self.num_bands}\")\n        \n        if self.num_bands > 512:\n            raise ValueError(f\"num_bands should not exceed 512 for practical purposes, got {self.num_bands}\")\n        \n        if self.mel_scale not in [\"htk\", \"slaney\"]:\n            raise ValueError(f\"mel_scale must be 'htk' or 'slaney', got '{self.mel_scale}'\")\n      
  \n        if self.fmin < 0:\n            raise ValueError(f\"fmin must be non-negative, got {self.fmin}\")\n        \n        if self.fmax is not None:\n            if self.fmax <= self.fmin:\n                raise ValueError(f\"fmax ({self.fmax}) must be greater than fmin ({self.fmin})\")\n            \n            if self.fmax > self.sample_rate / 2:\n                raise ValueError(f\"fmax ({self.fmax}) cannot exceed sample_rate/2 ({self.sample_rate/2})\")\n    \n    def get_effective_fmax(self) -> float:\n        \"\"\"Get the effective maximum frequency (fmax or sample_rate/2).\"\"\"\n        return self.fmax if self.fmax is not None else self.sample_rate / 2.0\n    \n    def get_frequency_range(self) -> tuple[float, float]:\n        \"\"\"Get the frequency range (fmin, fmax) for mel scale.\"\"\"\n        return (self.fmin, self.get_effective_fmax())\n    \n    def validate_sample_rate(self, sample_rate: int) -> bool:\n        \"\"\"\n        Validate that the configuration is compatible with a given sample rate.\n        \n        Args:\n            sample_rate: Audio sample rate to validate against\n            \n        Returns:\n            True if compatible, False otherwise\n        \"\"\"\n        if sample_rate != self.sample_rate:\n            # Check if fmax is compatible with new sample rate\n            if self.fmax is not None and self.fmax > sample_rate / 2:\n                return False\n        \n        return True\n    \n    def get_mel_scale_info(self) -> Dict[str, Any]:\n        \"\"\"\n        Get information about the mel scale configuration.\n        \n        Returns:\n            Dictionary with mel scale information\n        \"\"\"\n        return {\n            'num_bands': self.num_bands,\n            'mel_scale': self.mel_scale,\n            'fmin': self.fmin,\n            'fmax': self.get_effective_fmax(),\n            'sample_rate': self.sample_rate,\n            'frequency_range': self.get_frequency_range(),\n            
'bands_per_octave': self._estimate_bands_per_octave()\n        }\n    \n    def _estimate_bands_per_octave(self) -> float:\n        \"\"\"Estimate the number of mel bands per octave.\"\"\"\n        fmin, fmax = self.get_frequency_range()\n        if fmin <= 0:\n            fmin = 1.0  # Avoid log(0)\n        \n        # Rough estimation using logarithmic scale\n        import math\n        octave_range = math.log2(fmax / fmin)\n        return self.num_bands / octave_range if octave_range > 0 else self.num_bands\n    \n    def get_model_type(self) -> RoformerType:\n        \"\"\"Get the model type.\"\"\"\n        return RoformerType.MEL_BAND_ROFORMER\n    \n    def get_mel_band_roformer_kwargs(self) -> Dict[str, Any]:\n        \"\"\"Get MelBandRoformer-specific parameters for model initialization.\"\"\"\n        kwargs = self.get_transformer_kwargs()\n        kwargs.update({\n            'num_bands': self.num_bands,\n            'sample_rate': self.sample_rate,\n            'mel_scale': self.mel_scale,\n            'fmin': self.fmin,\n            'fmax': self.fmax,\n            'stereo': self.stereo,\n            'num_stems': self.num_stems,\n            'mlp_expansion_factor': self.mlp_expansion_factor,\n            'use_torch_checkpoint': self.use_torch_checkpoint,\n            'skip_connection': self.skip_connection,\n        })\n        return kwargs\n    \n    @classmethod\n    def from_model_config(cls, config_dict: Dict[str, Any]) -> 'MelBandRoformerConfig':\n        \"\"\"\n        Create MelBandRoformerConfig from a model configuration dictionary.\n        \n        Args:\n            config_dict: Dictionary containing model configuration\n            \n        Returns:\n            MelBandRoformerConfig instance\n        \"\"\"\n        # Ensure num_bands is present\n        if 'num_bands' not in config_dict:\n            # Use default if not specified\n            config_dict = config_dict.copy()\n            config_dict['num_bands'] = 64\n        \n      
  return cls.from_dict(config_dict)\n    \n    def suggest_optimal_bands(self, target_frequency_resolution: float = 50.0) -> int:\n        \"\"\"\n        Suggest optimal number of bands based on desired frequency resolution.\n        \n        Args:\n            target_frequency_resolution: Desired frequency resolution in Hz\n            \n        Returns:\n            Suggested number of bands\n        \"\"\"\n        fmin, fmax = self.get_frequency_range()\n        frequency_span = fmax - fmin\n        suggested_bands = int(frequency_span / target_frequency_resolution)\n        \n        # Clamp to reasonable range\n        return max(8, min(512, suggested_bands))\n    \n    def get_parameter_summary(self) -> str:\n        \"\"\"Get a summary string of key MelBandRoformer parameters.\"\"\"\n        base_summary = super().get_parameter_summary()\n        mel_info = f\", num_bands={self.num_bands}, sr={self.sample_rate}, fmax={self.get_effective_fmax():.0f}\"\n        return base_summary.replace(\")\", mel_info + \")\")\n    \n    def __repr__(self) -> str:\n        \"\"\"String representation of the MelBandRoformer configuration.\"\"\"\n        return f\"MelBandRoformerConfig({self.get_parameter_summary()})\"\n"
  },
  {
    "path": "audio_separator/separator/roformer/mel_band_roformer_validator.py",
    "content": "\"\"\"\nMelBandRoformer-specific parameter validator.\nExtends the base ParameterValidator with MelBandRoformer-specific validation logic.\n\"\"\"\n\nfrom typing import Dict, Any, List\nfrom .parameter_validator import ParameterValidator, ValidationIssue, ValidationSeverity\n\n\nclass MelBandRoformerValidator(ParameterValidator):\n    \"\"\"\n    Specialized validator for MelBandRoformer model parameters.\n    \n    Provides MelBandRoformer-specific validation beyond the base parameter validation,\n    including mel-scale frequency band configuration and sample rate validation.\n    \"\"\"\n    \n    # MelBandRoformer-specific parameter constraints\n    MIN_NUM_BANDS = 8\n    MAX_NUM_BANDS = 512\n    RECOMMENDED_NUM_BANDS = 64\n    \n    # Common sample rates and their typical mel band counts\n    SAMPLE_RATE_BAND_RECOMMENDATIONS = {\n        8000: (8, 32),\n        16000: (16, 64),\n        22050: (24, 80),\n        44100: (32, 128),\n        48000: (32, 128),\n        96000: (64, 256),\n    }\n    \n    def validate_num_bands(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate num_bands parameter for MelBandRoformer.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for num_bands\n        \"\"\"\n        issues = []\n        \n        if 'num_bands' not in config:\n            return issues  # Required parameter check handled by base class\n        \n        num_bands = config['num_bands']\n        \n        # Type check\n        if not isinstance(num_bands, int):\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"num_bands\",\n                message=\"num_bands must be an integer\",\n                suggested_fix=\"Set num_bands to an integer value\",\n                current_value=num_bands,\n                expected_value=\"integer\"\n    
        )\n            issues.append(issue)\n            return issues\n        \n        # Range check\n        if num_bands < self.MIN_NUM_BANDS:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"num_bands\",\n                message=f\"num_bands must be at least {self.MIN_NUM_BANDS}\",\n                suggested_fix=f\"Set num_bands to {self.MIN_NUM_BANDS} or higher\",\n                current_value=num_bands,\n                expected_value=f\">= {self.MIN_NUM_BANDS}\"\n            )\n            issues.append(issue)\n        \n        if num_bands > self.MAX_NUM_BANDS:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.WARNING,\n                parameter_name=\"num_bands\",\n                message=f\"num_bands of {num_bands} is very high and may impact performance\",\n                suggested_fix=f\"Consider using {self.MAX_NUM_BANDS} or fewer bands for better performance\",\n                current_value=num_bands,\n                expected_value=f\"<= {self.MAX_NUM_BANDS} (recommended)\"\n            )\n            issues.append(issue)\n        \n        # Check if it's a power of 2 (often preferred for efficiency)\n        if num_bands > 0 and (num_bands & (num_bands - 1)) != 0:\n            # Find nearest powers of 2\n            lower_power = 1 << (num_bands.bit_length() - 1)\n            upper_power = 1 << num_bands.bit_length()\n            \n            issue = ValidationIssue(\n                severity=ValidationSeverity.INFO,\n                parameter_name=\"num_bands\",\n                message=f\"num_bands ({num_bands}) is not a power of 2, which may be less efficient\",\n                suggested_fix=f\"Consider using {lower_power} or {upper_power} for potentially better performance\",\n                current_value=num_bands,\n                expected_value=\"power of 2 (optional optimization)\"\n            )\n            
issues.append(issue)\n        \n        return issues\n    \n    def validate_sample_rate_compatibility(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate sample rate compatibility with num_bands.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for sample rate compatibility\n        \"\"\"\n        issues = []\n        \n        num_bands = config.get('num_bands')\n        sample_rate = config.get('sample_rate')\n        \n        if not isinstance(num_bands, int) or not isinstance(sample_rate, int):\n            return issues  # Type validation handled elsewhere\n        \n        # Check against known good combinations\n        if sample_rate in self.SAMPLE_RATE_BAND_RECOMMENDATIONS:\n            min_bands, max_bands = self.SAMPLE_RATE_BAND_RECOMMENDATIONS[sample_rate]\n            \n            if num_bands < min_bands:\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.WARNING,\n                    parameter_name=\"num_bands, sample_rate\",\n                    message=f\"num_bands ({num_bands}) may be too low for sample_rate ({sample_rate})\",\n                    suggested_fix=f\"Consider using {min_bands}-{max_bands} bands for {sample_rate}Hz\",\n                    current_value=f\"bands={num_bands}, sr={sample_rate}\",\n                    expected_value=f\"{min_bands}-{max_bands} bands\"\n                )\n                issues.append(issue)\n            elif num_bands > max_bands:\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.WARNING,\n                    parameter_name=\"num_bands, sample_rate\",\n                    message=f\"num_bands ({num_bands}) may be too high for sample_rate ({sample_rate})\",\n                    suggested_fix=f\"Consider using {min_bands}-{max_bands} bands for {sample_rate}Hz\",\n                    
current_value=f\"bands={num_bands}, sr={sample_rate}\",\n                    expected_value=f\"{min_bands}-{max_bands} bands\"\n                )\n                issues.append(issue)\n        \n        # General heuristic: num_bands should be much smaller than nyquist frequency\n        nyquist = sample_rate // 2\n        if num_bands > nyquist // 100:  # Very rough heuristic\n            issue = ValidationIssue(\n                severity=ValidationSeverity.WARNING,\n                parameter_name=\"num_bands, sample_rate\",\n                message=f\"num_bands ({num_bands}) seems high relative to sample_rate ({sample_rate})\",\n                suggested_fix=\"Verify that this band count is appropriate for your mel-scale analysis\",\n                current_value=f\"bands={num_bands}, nyquist={nyquist}\",\n                expected_value=\"num_bands << nyquist_frequency\"\n            )\n            issues.append(issue)\n        \n        return issues\n    \n    def validate_mel_scale_config(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate mel-scale specific configuration parameters.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for mel-scale configuration\n        \"\"\"\n        issues = []\n        \n        # Check for mel-scale related parameters if present\n        fmin = config.get('fmin', 0)\n        fmax = config.get('fmax')\n        sample_rate = config.get('sample_rate', 44100)\n        \n        if fmin is not None and not isinstance(fmin, (int, float)):\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"fmin\",\n                message=\"fmin must be a number\",\n                suggested_fix=\"Set fmin to a numeric value (typically 0-100 Hz)\",\n                
current_value=fmin,\n                expected_value=\"number\"\n            )\n            issues.append(issue)\n        elif isinstance(fmin, (int, float)) and fmin < 0:\n            issue = ValidationIssue(\n                severity=ValidationSeverity.ERROR,\n                parameter_name=\"fmin\",\n                message=\"fmin must be non-negative\",\n                suggested_fix=\"Set fmin to 0 or a positive frequency value\",\n                current_value=fmin,\n                expected_value=\">= 0\"\n            )\n            issues.append(issue)\n        \n        if fmax is not None:\n            if not isinstance(fmax, (int, float)):\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.ERROR,\n                    parameter_name=\"fmax\",\n                    message=\"fmax must be a number\",\n                    suggested_fix=\"Set fmax to a numeric value or None for automatic setting\",\n                    current_value=fmax,\n                    expected_value=\"number or None\"\n                )\n                issues.append(issue)\n            elif isinstance(fmax, (int, float)):\n                if fmax <= 0:\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.ERROR,\n                        parameter_name=\"fmax\",\n                        message=\"fmax must be positive\",\n                        suggested_fix=\"Set fmax to a positive frequency value\",\n                        current_value=fmax,\n                        expected_value=\"> 0\"\n                    )\n                    issues.append(issue)\n                elif fmax > sample_rate // 2:\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.WARNING,\n                        parameter_name=\"fmax, sample_rate\",\n                        message=f\"fmax ({fmax}) exceeds Nyquist frequency ({sample_rate//2})\",\n                        
suggested_fix=f\"Set fmax to {sample_rate//2} or lower\",\n                        current_value=fmax,\n                        expected_value=f\"<= {sample_rate//2}\"\n                    )\n                    issues.append(issue)\n                elif isinstance(fmin, (int, float)) and fmax <= fmin:\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.ERROR,\n                        parameter_name=\"fmin, fmax\",\n                        message=f\"fmax ({fmax}) must be greater than fmin ({fmin})\",\n                        suggested_fix=\"Set fmax to a value higher than fmin\",\n                        current_value=f\"fmin={fmin}, fmax={fmax}\",\n                        expected_value=\"fmax > fmin\"\n                    )\n                    issues.append(issue)\n        \n        return issues\n    \n    def validate_all(self, config: Dict[str, Any], model_type: str = \"mel_band_roformer\") -> List[ValidationIssue]:\n        \"\"\"\n        Run all MelBandRoformer-specific validation checks.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model (should be \"mel_band_roformer\")\n            \n        Returns:\n            List of all validation issues found\n        \"\"\"\n        # Start with base validation\n        all_issues = super().validate_all(config, model_type)\n        \n        # Add MelBandRoformer-specific validation\n        all_issues.extend(self.validate_num_bands(config))\n        all_issues.extend(self.validate_sample_rate_compatibility(config))\n        all_issues.extend(self.validate_mel_scale_config(config))\n        \n        return all_issues\n    \n    def get_parameter_defaults(self, model_type: str = \"mel_band_roformer\") -> Dict[str, Any]:\n        \"\"\"\n        Get MelBandRoformer-specific parameter defaults.\n        \n        Args:\n            model_type: Type of model\n            \n        Returns:\n            
Dictionary of parameter names to default values\n        \"\"\"\n        defaults = super().get_parameter_defaults(model_type)\n        \n        # Override with MelBandRoformer-specific defaults\n        defaults.update({\n            'num_bands': self.RECOMMENDED_NUM_BANDS,\n            'fmin': 0,\n            'fmax': None,  # Will be set to sample_rate // 2 automatically\n            'mel_scale': 'htk',  # or 'slaney'\n        })\n        \n        return defaults\n"
  },
  {
    "path": "audio_separator/separator/roformer/model_configuration.py",
    "content": "\"\"\"\nModel configuration dataclass for Roformer models.\nSupports both old and new parameter sets with backward compatibility.\n\"\"\"\n\nfrom dataclasses import dataclass, field\nfrom typing import Optional, Tuple, Any, Dict\nfrom enum import Enum\n\n\nclass RoformerType(Enum):\n    \"\"\"Supported Roformer model types.\"\"\"\n    BS_ROFORMER = \"bs_roformer\"\n    MEL_BAND_ROFORMER = \"mel_band_roformer\"\n\n\n@dataclass(frozen=True, unsafe_hash=True)\nclass ModelConfiguration:\n    \"\"\"\n    Model configuration parameters for Roformer models.\n    \n    This class supports both old and new parameter sets to maintain\n    backward compatibility while enabling new features.\n    \"\"\"\n    \n    # Required parameters (must be provided)\n    dim: int\n    depth: int\n    \n    # Common optional parameters (with sensible defaults)\n    stereo: bool = False\n    num_stems: int = 1\n    time_transformer_depth: int = 2\n    freq_transformer_depth: int = 2\n    dim_head: int = 64\n    heads: int = 8\n    attn_dropout: float = 0.0\n    ff_dropout: float = 0.0\n    flash_attn: bool = True\n    \n    # New parameters (with defaults for backward compatibility)\n    mlp_expansion_factor: int = 4\n    sage_attention: bool = False\n    zero_dc: bool = True\n    use_torch_checkpoint: bool = False\n    skip_connection: bool = False\n    \n    # Normalization (may be None in some configs)\n    norm: Optional[str] = None\n    \n    # Model-specific parameters (set by subclasses)\n    freqs_per_bands: Optional[Tuple[int, ...]] = None  # BSRoformer\n    num_bands: Optional[int] = None  # MelBandRoformer\n    sample_rate: int = 44100  # Default sample rate\n    \n    # Additional configuration data (stored as tuple for hashability)\n    extra_config: Tuple[Tuple[str, Any], ...] 
= field(default_factory=tuple)\n    \n    def __post_init__(self):\n        \"\"\"Validate configuration after initialization.\"\"\"\n        self._validate_basic_parameters()\n        self._validate_parameter_ranges()\n    \n    def _validate_basic_parameters(self):\n        \"\"\"Validate basic parameter types and requirements.\"\"\"\n        if not isinstance(self.dim, int) or self.dim <= 0:\n            raise ValueError(f\"dim must be a positive integer, got {self.dim}\")\n        \n        if not isinstance(self.depth, int) or self.depth <= 0:\n            raise ValueError(f\"depth must be a positive integer, got {self.depth}\")\n        \n        if not isinstance(self.num_stems, int) or self.num_stems <= 0:\n            raise ValueError(f\"num_stems must be a positive integer, got {self.num_stems}\")\n        \n        if not isinstance(self.mlp_expansion_factor, int) or self.mlp_expansion_factor <= 0:\n            raise ValueError(f\"mlp_expansion_factor must be a positive integer, got {self.mlp_expansion_factor}\")\n    \n    def _validate_parameter_ranges(self):\n        \"\"\"Validate parameter value ranges.\"\"\"\n        if not (0.0 <= self.attn_dropout <= 1.0):\n            raise ValueError(f\"attn_dropout must be between 0.0 and 1.0, got {self.attn_dropout}\")\n        \n        if not (0.0 <= self.ff_dropout <= 1.0):\n            raise ValueError(f\"ff_dropout must be between 0.0 and 1.0, got {self.ff_dropout}\")\n        \n        if self.dim_head <= 0:\n            raise ValueError(f\"dim_head must be positive, got {self.dim_head}\")\n        \n        if self.heads <= 0:\n            raise ValueError(f\"heads must be positive, got {self.heads}\")\n        \n        if self.sample_rate <= 0:\n            raise ValueError(f\"sample_rate must be positive, got {self.sample_rate}\")\n    \n    def to_dict(self) -> Dict[str, Any]:\n        \"\"\"Convert configuration to dictionary.\"\"\"\n        result = {}\n        for field_info in 
self.__dataclass_fields__.values():\n            value = getattr(self, field_info.name)\n            if value is not None:\n                result[field_info.name] = value\n        return result\n    \n    @classmethod\n    def from_dict(cls, config_dict: Dict[str, Any]) -> 'ModelConfiguration':\n        \"\"\"Create configuration from dictionary.\"\"\"\n        # Extract known parameters\n        known_params = {}\n        extra_params = {}\n        \n        for key, value in config_dict.items():\n            if key in cls.__dataclass_fields__:\n                known_params[key] = value\n            else:\n                extra_params[key] = value\n        \n        # Set extra_config as tuple if there are unknown parameters\n        if extra_params:\n            known_params['extra_config'] = tuple(extra_params.items())\n        \n        return cls(**known_params)\n    \n    def get_transformer_kwargs(self) -> Dict[str, Any]:\n        \"\"\"Get parameters to pass to transformer initialization.\"\"\"\n        return {\n            'dim': self.dim,\n            'depth': self.depth,\n            'heads': self.heads,\n            'dim_head': self.dim_head,\n            'attn_dropout': self.attn_dropout,\n            'ff_dropout': self.ff_dropout,\n            'flash_attn': self.flash_attn,\n            'sage_attention': self.sage_attention,  # New parameter\n            'zero_dc': self.zero_dc,  # New parameter\n        }\n    \n    def has_new_parameters(self) -> bool:\n        \"\"\"Check if configuration uses any new parameters.\"\"\"\n        return (\n            self.mlp_expansion_factor != 4 or\n            self.sage_attention is True or\n            self.zero_dc is not True or\n            self.use_torch_checkpoint is True or\n            self.skip_connection is True\n        )\n    \n    def get_parameter_summary(self) -> str:\n        \"\"\"Get a summary string of key parameters.\"\"\"\n        return (\n            f\"ModelConfiguration(dim={self.dim}, 
depth={self.depth}, \"\n            f\"stems={self.num_stems}, mlp_factor={self.mlp_expansion_factor}, \"\n            f\"sage_attn={self.sage_attention}, new_params={self.has_new_parameters()})\"\n        )\n    \n    def __repr__(self) -> str:\n        \"\"\"String representation of the configuration.\"\"\"\n        return self.get_parameter_summary()\n"
  },
  {
    "path": "audio_separator/separator/roformer/model_loading_result.py",
    "content": "\"\"\"\nModel loading result dataclass.\nContains the result of model loading operations with success/failure information.\n\"\"\"\n\nfrom dataclasses import dataclass, field\nfrom typing import Optional, List, Any, Dict\nfrom enum import Enum\n\n\nclass ImplementationVersion(Enum):\n    \"\"\"Available implementation versions.\"\"\"\n    OLD = \"old\"\n    NEW = \"new\"\n    FALLBACK = \"fallback\"\n\n\n@dataclass\nclass ModelLoadingResult:\n    \"\"\"\n    Result of a model loading operation.\n    \n    Contains information about whether the loading was successful,\n    the loaded model (if successful), error details (if failed),\n    and metadata about which implementation was used.\n    \"\"\"\n    \n    success: bool\n    model: Optional[Any] = None  # Actual model instance (torch.nn.Module)\n    error_message: Optional[str] = None\n    implementation_used: ImplementationVersion = ImplementationVersion.NEW\n    warnings: List[str] = field(default_factory=list)\n    \n    # Additional metadata\n    loading_time_seconds: Optional[float] = None\n    model_info: Dict[str, Any] = field(default_factory=dict)\n    config_used: Optional[Any] = None  # The configuration that was actually used\n    \n    def __post_init__(self):\n        \"\"\"Validate result after initialization.\"\"\"\n        if self.success and self.model is None:\n            raise ValueError(\"success=True but model is None\")\n        \n        if not self.success and self.error_message is None:\n            raise ValueError(\"success=False but error_message is None\")\n        \n        if self.warnings is None:\n            self.warnings = []\n    \n    def add_warning(self, warning: str):\n        \"\"\"Add a warning message.\"\"\"\n        if warning not in self.warnings:\n            self.warnings.append(warning)\n    \n    def add_model_info(self, key: str, value: Any):\n        \"\"\"Add model metadata information.\"\"\"\n        self.model_info[key] = value\n    \n    def 
get_summary(self) -> str:\n        \"\"\"Get a summary string of the loading result.\"\"\"\n        if self.success:\n            summary = f\"SUCCESS: Model loaded using {self.implementation_used.value} implementation\"\n            if self.loading_time_seconds:\n                summary += f\" in {self.loading_time_seconds:.2f}s\"\n            if self.warnings:\n                summary += f\" with {len(self.warnings)} warnings\"\n        else:\n            summary = f\"FAILED: {self.error_message}\"\n        \n        return summary\n    \n    def is_fallback_used(self) -> bool:\n        \"\"\"Check if fallback implementation was used.\"\"\"\n        return self.implementation_used == ImplementationVersion.FALLBACK\n    \n    def is_new_implementation_used(self) -> bool:\n        \"\"\"Check if new implementation was used.\"\"\"\n        return self.implementation_used == ImplementationVersion.NEW\n    \n    def has_warnings(self) -> bool:\n        \"\"\"Check if there are any warnings.\"\"\"\n        return len(self.warnings) > 0\n    \n    def get_model_parameters_count(self) -> Optional[int]:\n        \"\"\"Get the number of model parameters if available.\"\"\"\n        if self.model is None:\n            return None\n        \n        try:\n            # Try to count parameters for PyTorch models\n            if hasattr(self.model, 'parameters'):\n                return sum(p.numel() for p in self.model.parameters())\n        except Exception:\n            pass\n        \n        return self.model_info.get('parameter_count')\n    \n    def get_model_size_mb(self) -> Optional[float]:\n        \"\"\"Get the model size in MB if available.\"\"\"\n        param_count = self.get_model_parameters_count()\n        if param_count is not None:\n            # Assume float32 parameters (4 bytes each)\n            return (param_count * 4) / (1024 * 1024)\n        \n        return self.model_info.get('size_mb')\n    \n    def to_dict(self) -> Dict[str, Any]:\n        
\"\"\"Convert result to dictionary for serialization.\"\"\"\n        return {\n            'success': self.success,\n            'error_message': self.error_message,\n            'implementation_used': self.implementation_used.value,\n            'warnings': self.warnings,\n            'loading_time_seconds': self.loading_time_seconds,\n            'model_info': self.model_info,\n            'has_model': self.model is not None,\n            'parameter_count': self.get_model_parameters_count(),\n            'model_size_mb': self.get_model_size_mb(),\n        }\n    \n    @classmethod\n    def success_result(\n        cls, \n        model: Any, \n        implementation: ImplementationVersion = ImplementationVersion.NEW,\n        config: Optional[Any] = None,\n        loading_time: Optional[float] = None\n    ) -> 'ModelLoadingResult':\n        \"\"\"\n        Create a successful loading result.\n        \n        Args:\n            model: The loaded model instance\n            implementation: Which implementation was used\n            config: The configuration that was used\n            loading_time: Time taken to load the model\n            \n        Returns:\n            ModelLoadingResult indicating success\n        \"\"\"\n        return cls(\n            success=True,\n            model=model,\n            implementation_used=implementation,\n            config_used=config,\n            loading_time_seconds=loading_time\n        )\n    \n    @classmethod\n    def failure_result(\n        cls, \n        error_message: str,\n        implementation: ImplementationVersion = ImplementationVersion.NEW,\n        warnings: Optional[List[str]] = None\n    ) -> 'ModelLoadingResult':\n        \"\"\"\n        Create a failed loading result.\n        \n        Args:\n            error_message: Description of what went wrong\n            implementation: Which implementation was attempted\n            warnings: Any warnings that occurred before failure\n            \n        
Returns:\n            ModelLoadingResult indicating failure\n        \"\"\"\n        return cls(\n            success=False,\n            error_message=error_message,\n            implementation_used=implementation,\n            warnings=warnings or []\n        )\n    \n    @classmethod\n    def fallback_success_result(\n        cls,\n        model: Any,\n        original_error: str,\n        config: Optional[Any] = None,\n        loading_time: Optional[float] = None\n    ) -> 'ModelLoadingResult':\n        \"\"\"\n        Create a successful fallback loading result.\n        \n        Args:\n            model: The loaded model instance\n            original_error: The error that caused fallback\n            config: The configuration that was used\n            loading_time: Time taken to load the model\n            \n        Returns:\n            ModelLoadingResult indicating fallback success\n        \"\"\"\n        result = cls(\n            success=True,\n            model=model,\n            implementation_used=ImplementationVersion.FALLBACK,\n            config_used=config,\n            loading_time_seconds=loading_time\n        )\n        result.add_warning(f\"Fell back to old implementation due to: {original_error}\")\n        return result\n    \n    def __str__(self) -> str:\n        \"\"\"String representation of the loading result.\"\"\"\n        return self.get_summary()\n    \n    def __repr__(self) -> str:\n        \"\"\"Detailed string representation of the loading result.\"\"\"\n        return f\"ModelLoadingResult(success={self.success}, impl={self.implementation_used.value}, warnings={len(self.warnings)})\"\n"
  },
  {
    "path": "audio_separator/separator/roformer/parameter_validation_error.py",
    "content": "\"\"\"\nParameter validation error exception.\nRaised when model parameters are invalid or incompatible.\n\"\"\"\n\nfrom typing import Any, Optional\n\n\nclass ParameterValidationError(Exception):\n    \"\"\"\n    Exception raised when model parameters are invalid.\n    \n    This exception provides detailed information about what parameter\n    was invalid, what was expected, and suggestions for fixing the issue.\n    \"\"\"\n    \n    def __init__(\n        self, \n        parameter_name: str, \n        expected_type: str, \n        actual_value: Any, \n        suggested_fix: str,\n        context: Optional[str] = None\n    ):\n        \"\"\"\n        Initialize parameter validation error.\n        \n        Args:\n            parameter_name: Name of the invalid parameter\n            expected_type: Expected type or description of valid values\n            actual_value: The actual value that was provided\n            suggested_fix: Suggestion for how to fix the issue\n            context: Additional context about where the error occurred\n        \"\"\"\n        self.parameter_name = parameter_name\n        self.expected_type = expected_type\n        self.actual_value = actual_value\n        self.suggested_fix = suggested_fix\n        self.context = context\n        \n        # Create detailed error message\n        message = self._create_error_message()\n        super().__init__(message)\n    \n    def _create_error_message(self) -> str:\n        \"\"\"Create a detailed error message.\"\"\"\n        actual_type = type(self.actual_value).__name__\n        \n        message_parts = [\n            f\"Invalid parameter '{self.parameter_name}': \",\n            f\"expected {self.expected_type}, got {actual_type} ({self.actual_value})\"\n        ]\n        \n        if self.context:\n            message_parts.append(f\" in {self.context}\")\n        \n        message_parts.append(f\". 
Suggestion: {self.suggested_fix}\")\n        \n        return \"\".join(message_parts)\n    \n    def get_error_details(self) -> dict:\n        \"\"\"Get error details as a dictionary.\"\"\"\n        return {\n            'parameter_name': self.parameter_name,\n            'expected_type': self.expected_type,\n            'actual_value': self.actual_value,\n            'actual_type': type(self.actual_value).__name__,\n            'suggested_fix': self.suggested_fix,\n            'context': self.context,\n            'error_message': str(self)\n        }\n    \n    @classmethod\n    def missing_parameter(cls, parameter_name: str, parameter_type: str, context: Optional[str] = None) -> 'ParameterValidationError':\n        \"\"\"\n        Create error for missing required parameter.\n        \n        Args:\n            parameter_name: Name of missing parameter\n            parameter_type: Expected type of the parameter\n            context: Context where parameter is missing\n            \n        Returns:\n            ParameterValidationError for missing parameter\n        \"\"\"\n        return cls(\n            parameter_name=parameter_name,\n            expected_type=parameter_type,\n            actual_value=None,\n            suggested_fix=f\"Add '{parameter_name}' parameter with {parameter_type} value\",\n            context=context\n        )\n    \n    @classmethod\n    def wrong_type(cls, parameter_name: str, expected_type: str, actual_value: Any, context: Optional[str] = None) -> 'ParameterValidationError':\n        \"\"\"\n        Create error for wrong parameter type.\n        \n        Args:\n            parameter_name: Name of parameter with wrong type\n            expected_type: Expected type\n            actual_value: Actual value provided\n            context: Context where error occurred\n            \n        Returns:\n            ParameterValidationError for type mismatch\n        \"\"\"\n        return cls(\n            
parameter_name=parameter_name,\n            expected_type=expected_type,\n            actual_value=actual_value,\n            suggested_fix=f\"Change '{parameter_name}' to {expected_type}\",\n            context=context\n        )\n    \n    @classmethod\n    def out_of_range(cls, parameter_name: str, valid_range: str, actual_value: Any, context: Optional[str] = None) -> 'ParameterValidationError':\n        \"\"\"\n        Create error for parameter value out of valid range.\n        \n        Args:\n            parameter_name: Name of parameter out of range\n            valid_range: Description of valid range\n            actual_value: Actual value provided\n            context: Context where error occurred\n            \n        Returns:\n            ParameterValidationError for out of range value\n        \"\"\"\n        return cls(\n            parameter_name=parameter_name,\n            expected_type=f\"value in range {valid_range}\",\n            actual_value=actual_value,\n            suggested_fix=f\"Set '{parameter_name}' to a value within {valid_range}\",\n            context=context\n        )\n    \n    @classmethod\n    def incompatible_parameters(cls, parameter_names: list, issue_description: str, suggested_fix: str, context: Optional[str] = None) -> 'ParameterValidationError':\n        \"\"\"\n        Create error for incompatible parameter combination.\n        \n        Args:\n            parameter_names: List of parameter names that are incompatible\n            issue_description: Description of the incompatibility\n            suggested_fix: How to fix the incompatibility\n            context: Context where error occurred\n            \n        Returns:\n            ParameterValidationError for incompatible parameters\n        \"\"\"\n        parameter_list = \", \".join(parameter_names)\n        return cls(\n            parameter_name=parameter_list,\n            expected_type=\"compatible parameter combination\",\n            
actual_value=issue_description,\n            suggested_fix=suggested_fix,\n            context=context\n        )\n    \n    @classmethod\n    def invalid_normalization(cls, norm_value: Any, supported_norms: list, context: Optional[str] = None) -> 'ParameterValidationError':\n        \"\"\"\n        Create error for invalid normalization configuration.\n        \n        Args:\n            norm_value: The invalid normalization value\n            supported_norms: List of supported normalization types\n            context: Context where error occurred\n            \n        Returns:\n            ParameterValidationError for invalid normalization\n        \"\"\"\n        supported_list = \", \".join(f\"'{norm}'\" for norm in supported_norms)\n        return cls(\n            parameter_name=\"norm\",\n            expected_type=f\"one of: {supported_list}\",\n            actual_value=norm_value,\n            suggested_fix=f\"Use one of the supported normalization types: {supported_list}\",\n            context=context\n        )\n    \n    def __repr__(self) -> str:\n        \"\"\"Detailed string representation.\"\"\"\n        return f\"ParameterValidationError(parameter='{self.parameter_name}', expected='{self.expected_type}', actual={self.actual_value})\"\n"
  },
  {
    "path": "audio_separator/separator/roformer/parameter_validator.py",
    "content": "\"\"\"\nParameter validator implementation.\nValidates Roformer model parameters according to interface contracts.\n\"\"\"\n\nfrom typing import Dict, Any, List, Optional, Union, Tuple\nimport sys\nimport os\n\n# Add contracts to path for interface imports (optional)\ntry:\n    # Find project root dynamically\n    current_dir = os.path.dirname(os.path.abspath(__file__))\n    project_root = current_dir\n    # Go up until we find the project root (contains specs/ directory)\n    while project_root and not os.path.exists(os.path.join(project_root, 'specs')):\n        parent = os.path.dirname(project_root)\n        if parent == project_root:  # Reached filesystem root\n            break\n        project_root = parent\n    \n    contracts_path = os.path.join(project_root, 'specs', '001-update-roformer-implementation', 'contracts')\n    if os.path.exists(contracts_path):\n        sys.path.append(contracts_path)\n    from parameter_validator_interface import (\n        ParameterValidatorInterface,\n        ValidationIssue,\n        ValidationSeverity\n    )\n    _has_interface = True\nexcept ImportError:\n    # Create dummy interfaces for when contracts are not available\n    from enum import Enum\n    from dataclasses import dataclass\n    \n    class ValidationSeverity(Enum):\n        ERROR = \"error\"\n        WARNING = \"warning\"\n        INFO = \"info\"\n    \n    @dataclass\n    class ValidationIssue:\n        severity: ValidationSeverity\n        parameter_name: str\n        message: str\n        suggested_fix: str\n        current_value: Any = None\n        expected_value: Any = None\n    \n    class ParameterValidatorInterface:\n        pass\n    \n    _has_interface = False\nfrom .parameter_validation_error import ParameterValidationError\n\n\nclass ParameterValidator(ParameterValidatorInterface):\n    \"\"\"\n    Implementation of parameter validation for Roformer models.\n    \n    Validates model parameters according to the interface 
contract,\n    providing detailed error messages and suggestions for fixes.\n    \"\"\"\n    \n    # Define valid parameter types and ranges\n    PARAMETER_TYPES = {\n        'dim': int,\n        'depth': int,\n        'stereo': bool,\n        'num_stems': int,\n        'time_transformer_depth': int,\n        'freq_transformer_depth': int,\n        'dim_head': int,\n        'heads': int,\n        'attn_dropout': float,\n        'ff_dropout': float,\n        'flash_attn': bool,\n        'mlp_expansion_factor': int,\n        'sage_attention': bool,\n        'zero_dc': bool,\n        'use_torch_checkpoint': bool,\n        'skip_connection': bool,\n        'sample_rate': int,\n        'freqs_per_bands': (tuple, list),\n        'num_bands': int,\n        'mask_estimator_depth': int,\n    }\n    \n    PARAMETER_RANGES = {\n        'dim': (1, 8192),\n        'depth': (1, 64),\n        'num_stems': (1, 16),\n        'time_transformer_depth': (1, 32),\n        'freq_transformer_depth': (1, 32),\n        'dim_head': (1, 1024),\n        'heads': (1, 64),\n        'attn_dropout': (0.0, 1.0),\n        'ff_dropout': (0.0, 1.0),\n        'mlp_expansion_factor': (1, 16),\n        'sample_rate': (8000, 192000),\n        'num_bands': (8, 512),\n        'mask_estimator_depth': (1, 8),\n    }\n    \n    REQUIRED_PARAMETERS = {\n        'bs_roformer': ['dim', 'depth', 'freqs_per_bands'],\n        'mel_band_roformer': ['dim', 'depth', 'num_bands'],\n    }\n    \n    SUPPORTED_NORMALIZATION_TYPES = [\n        'layer_norm', 'batch_norm', 'rms_norm', 'group_norm', \n        'instance_norm', None, 'none'\n    ]\n    \n    def validate_required_parameters(self, config: Dict[str, Any], model_type: str) -> List[ValidationIssue]:\n        \"\"\"\n        Validate that all required parameters are present.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model (\"bs_roformer\" or \"mel_band_roformer\")\n            \n        Returns:\n  
          List of validation issues for missing required parameters\n        \"\"\"\n        issues = []\n        \n        required_params = self.REQUIRED_PARAMETERS.get(model_type, [])\n        \n        for param_name in required_params:\n            if param_name not in config:\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.ERROR,\n                    parameter_name=param_name,\n                    message=f\"Required parameter '{param_name}' is missing for {model_type}\",\n                    suggested_fix=f\"Add '{param_name}' parameter with appropriate {self._get_expected_type_description(param_name)} value\",\n                    current_value=None,\n                    expected_value=self._get_expected_type_description(param_name)\n                )\n                issues.append(issue)\n        \n        return issues\n    \n    def validate_parameter_types(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate parameter types match expected types.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for type mismatches\n        \"\"\"\n        issues = []\n        \n        for param_name, value in config.items():\n            if param_name in self.PARAMETER_TYPES:\n                expected_type = self.PARAMETER_TYPES[param_name]\n                \n                if not self._is_correct_type(value, expected_type):\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.ERROR,\n                        parameter_name=param_name,\n                        message=f\"Parameter '{param_name}' has incorrect type\",\n                        suggested_fix=f\"Change '{param_name}' to {self._get_type_name(expected_type)}\",\n                        current_value=value,\n                        expected_value=self._get_type_name(expected_type)\n        
            )\n                    issues.append(issue)\n        \n        return issues\n    \n    def validate_parameter_ranges(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate parameter values are within acceptable ranges.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for out-of-range values\n        \"\"\"\n        issues = []\n        \n        for param_name, value in config.items():\n            if param_name in self.PARAMETER_RANGES:\n                min_val, max_val = self.PARAMETER_RANGES[param_name]\n                \n                if isinstance(value, (int, float)) and not (min_val <= value <= max_val):\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.ERROR,\n                        parameter_name=param_name,\n                        message=f\"Parameter '{param_name}' value {value} is outside valid range [{min_val}, {max_val}]\",\n                        suggested_fix=f\"Set '{param_name}' to a value between {min_val} and {max_val}\",\n                        current_value=value,\n                        expected_value=f\"{min_val} <= value <= {max_val}\"\n                    )\n                    issues.append(issue)\n        \n        return issues\n    \n    def validate_parameter_compatibility(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate that parameter combinations are compatible.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for incompatible parameter combinations\n        \"\"\"\n        issues = []\n        \n        # Check sage_attention and flash_attn compatibility\n        if config.get('sage_attention', False) and config.get('flash_attn', False):\n            issue = ValidationIssue(\n                
severity=ValidationSeverity.WARNING,\n                parameter_name=\"sage_attention, flash_attn\",\n                message=\"Using both sage_attention=True and flash_attn=True may cause conflicts\",\n                suggested_fix=\"Consider using only one attention mechanism\",\n                current_value=\"both True\",\n                expected_value=\"only one True\"\n            )\n            issues.append(issue)\n        \n        # Check freqs_per_bands consistency for BSRoformer\n        if 'freqs_per_bands' in config:\n            freqs = config['freqs_per_bands']\n            if isinstance(freqs, (list, tuple)) and len(freqs) > 0:\n                total_freqs = sum(freqs)\n                # Check if it looks like a reasonable STFT frequency count\n                if total_freqs < 64 or total_freqs > 4096:\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.WARNING,\n                        parameter_name=\"freqs_per_bands\",\n                        message=f\"Sum of freqs_per_bands ({total_freqs}) may be incompatible with typical STFT configurations\",\n                        suggested_fix=\"Verify that freqs_per_bands sum matches your STFT n_fft//2 + 1\",\n                        current_value=total_freqs,\n                        expected_value=\"64 to 4096 (typical range)\"\n                    )\n                    issues.append(issue)\n        \n        # Check num_bands vs sample_rate for MelBandRoformer\n        if 'num_bands' in config and 'sample_rate' in config:\n            num_bands = config['num_bands']\n            sample_rate = config['sample_rate']\n            if num_bands > sample_rate // 100:  # Very rough heuristic\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.WARNING,\n                    parameter_name=\"num_bands, sample_rate\",\n                    message=f\"num_bands ({num_bands}) may be too high for sample_rate 
({sample_rate})\",\n                    suggested_fix=\"Consider reducing num_bands or verifying that it is appropriate for your use case\",\n                    current_value=f\"bands={num_bands}, sr={sample_rate}\",\n                    expected_value=\"num_bands << sample_rate\"\n                )\n                issues.append(issue)\n        \n        return issues\n    \n    def validate_normalization_config(self, norm_config: Any) -> List[ValidationIssue]:\n        \"\"\"\n        Validate normalization configuration.\n        \n        Args:\n            norm_config: Normalization configuration (may be string, dict, or None)\n            \n        Returns:\n            List of validation issues for normalization configuration\n        \"\"\"\n        issues = []\n        \n        if norm_config is not None:\n            if isinstance(norm_config, str):\n                if norm_config not in self.SUPPORTED_NORMALIZATION_TYPES:\n                    issue = ValidationIssue(\n                        severity=ValidationSeverity.ERROR,\n                        parameter_name=\"norm\",\n                        message=f\"Unsupported normalization type '{norm_config}'\",\n                        suggested_fix=f\"Use one of: {', '.join(str(t) for t in self.SUPPORTED_NORMALIZATION_TYPES if t is not None)}\",\n                        current_value=norm_config,\n                        expected_value=\"supported normalization type\"\n                    )\n                    issues.append(issue)\n            elif not isinstance(norm_config, dict):\n                issue = ValidationIssue(\n                    severity=ValidationSeverity.ERROR,\n                    parameter_name=\"norm\",\n                    message=f\"Normalization config must be string, dict, or None, got {type(norm_config).__name__}\",\n                    suggested_fix=\"Use a supported normalization type string, a dict, or None\",\n                    current_value=norm_config,\n                    
expected_value=\"string, dict, or None\"\n                )\n                issues.append(issue)\n        \n        return issues\n    \n    def get_parameter_defaults(self, model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Get default values for optional parameters.\n        \n        Args:\n            model_type: Type of model (\"bs_roformer\" or \"mel_band_roformer\")\n            \n        Returns:\n            Dictionary of parameter names to default values\n        \"\"\"\n        defaults = {\n            'stereo': False,\n            'num_stems': 2,\n            'time_transformer_depth': 2,\n            'freq_transformer_depth': 2,\n            'dim_head': 64,\n            'heads': 8,\n            'attn_dropout': 0.0,\n            'ff_dropout': 0.0,\n            'flash_attn': True,\n            'mlp_expansion_factor': 4,\n            'sage_attention': False,\n            'zero_dc': True,\n            'use_torch_checkpoint': False,\n            'skip_connection': False,\n            'sample_rate': 44100,\n            'norm': None,\n        }\n        \n        # Add model-specific defaults\n        if model_type == 'bs_roformer':\n            defaults.update({\n                'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n                'mask_estimator_depth': 2,\n            })\n        elif model_type == 'mel_band_roformer':\n            defaults.update({\n                'num_bands': 64,\n            })\n        \n        return defaults\n    \n    def apply_parameter_defaults(self, config: Dict[str, Any], model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Apply default values to missing optional parameters.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model\n            \n        Returns:\n            Configuration with defaults applied\n        \"\"\"\n        defaults = self.get_parameter_defaults(model_type)\n        result_config = defaults.copy()\n        
result_config.update(config)  # Override defaults with provided values\n        \n        return result_config\n    \n    def _is_correct_type(self, value: Any, expected_type: Union[type, Tuple[type, ...]]) -> bool:\n        \"\"\"Check if value matches expected type(s); isinstance() accepts a single type or a tuple of types.\"\"\"\n        return isinstance(value, expected_type)\n    \n    def _get_type_name(self, expected_type: Union[type, Tuple[type, ...]]) -> str:\n        \"\"\"Get human-readable type name.\"\"\"\n        if isinstance(expected_type, tuple):\n            return \" or \".join(t.__name__ for t in expected_type)\n        return expected_type.__name__\n    \n    def _get_expected_type_description(self, param_name: str) -> str:\n        \"\"\"Get description of expected type for a parameter.\"\"\"\n        if param_name in self.PARAMETER_TYPES:\n            return self._get_type_name(self.PARAMETER_TYPES[param_name])\n        return \"appropriate type\"\n    \n    def validate_all(self, config: Dict[str, Any], model_type: str) -> List[ValidationIssue]:\n        \"\"\"\n        Run all validation checks on a configuration.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model\n            \n        Returns:\n            List of all validation issues found\n        \"\"\"\n        all_issues = []\n        \n        all_issues.extend(self.validate_required_parameters(config, model_type))\n        all_issues.extend(self.validate_parameter_types(config))\n        all_issues.extend(self.validate_parameter_ranges(config))\n        all_issues.extend(self.validate_parameter_compatibility(config))\n        \n        # Validate normalization if present\n        if 'norm' in config:\n            all_issues.extend(self.validate_normalization_config(config['norm']))\n        \n        return all_issues\n    \n    def validate_and_raise(self, config: Dict[str, 
Any], model_type: str) -> None:\n        \"\"\"\n        Validate configuration and raise ParameterValidationError if issues found.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model\n            \n        Raises:\n            ParameterValidationError: If validation issues are found\n        \"\"\"\n        issues = self.validate_all(config, model_type)\n        \n        # Find first error (not warning)\n        error_issues = [issue for issue in issues if issue.severity == ValidationSeverity.ERROR]\n        \n        if error_issues:\n            first_error = error_issues[0]\n            raise ParameterValidationError(\n                parameter_name=first_error.parameter_name,\n                expected_type=first_error.expected_value or \"valid value\",\n                actual_value=first_error.current_value,\n                suggested_fix=first_error.suggested_fix,\n                context=f\"{model_type} model validation\"\n            )\n"
  },
  {
    "path": "audio_separator/separator/roformer/roformer_loader.py",
    "content": "\"\"\"Roformer model loader: tries the new implementation first and falls back to the legacy constructor path.\"\"\"\nfrom typing import Dict, Any\nimport logging\nimport os\n\nfrom .model_loading_result import ModelLoadingResult, ImplementationVersion\nfrom .configuration_normalizer import ConfigurationNormalizer\nfrom .parameter_validation_error import ParameterValidationError\n\nlogger = logging.getLogger(__name__)\n\n\nclass RoformerLoader:\n    \"\"\"Main Roformer model loader (new implementation, with legacy fallback on load failure).\"\"\"\n\n    def __init__(self):\n        self.config_normalizer = ConfigurationNormalizer()\n        self._loading_stats = {\n            'new_implementation_success': 0,\n            'total_failures': 0\n        }\n\n    def load_model(self,\n                   model_path: str,\n                   config: Dict[str, Any],\n                   device: str = 'cpu') -> ModelLoadingResult:\n        logger.info(f\"Loading Roformer model from {model_path}\")\n        try:\n            normalized_config = self.config_normalizer.normalize_from_file_path(\n                config, model_path, apply_defaults=True, validate=True\n            )\n            model_type = self.config_normalizer.detect_model_type(normalized_config)\n            logger.debug(f\"Detected model type: {model_type}\")\n        except ParameterValidationError as e:\n            logger.error(f\"Configuration validation failed: {e}\")\n            return ModelLoadingResult.failure_result(\n                error_message=f\"Config validation: {e}\",\n                implementation=ImplementationVersion.NEW,\n            )\n\n        try:\n            result = self._load_with_new_implementation(\n                model_path, normalized_config, model_type, device\n            )\n            self._loading_stats['new_implementation_success'] += 1\n            logger.info(f\"Successfully loaded {model_type} model with new implementation\")\n            return result\n        except (RuntimeError, ValueError, TypeError) as e:\n    
        logger.error(f\"New implementation failed: {e}\")\n            # Attempt legacy fallback using the original (pre-normalized) configuration\n            try:\n                fallback_result = self._load_with_legacy_implementation(\n                    model_path=model_path,\n                    original_config=config,\n                    device=device,\n                    original_error=str(e)\n                )\n                logger.warning(\"Fell back to legacy Roformer implementation successfully\")\n                return fallback_result\n            except (RuntimeError, ValueError, TypeError) as fallback_error:\n                logger.error(f\"Legacy implementation also failed: {fallback_error}\")\n                self._loading_stats['total_failures'] += 1\n                return ModelLoadingResult.failure_result(\n                    error_message=f\"New implementation failed: {e}; Legacy fallback failed: {fallback_error}\",\n                    implementation=ImplementationVersion.NEW,\n                )\n\n    def validate_configuration(self, config: Dict[str, Any], model_type: str) -> bool:\n        try:\n            _ = self.config_normalizer.normalize_config(\n                config, model_type, apply_defaults=False, validate=True\n            )\n            logger.debug(f\"Configuration validation passed for {model_type}\")\n            return True\n        except ParameterValidationError as e:\n            logger.warning(f\"Configuration validation failed for {model_type}: {e}\")\n            return False\n        except (RuntimeError, ValueError) as e:\n            logger.error(f\"Unexpected error during validation: {e}\")\n            return False\n\n    def _load_with_new_implementation(self,\n                                      model_path: str,\n                                      config: Dict[str, Any],\n                                      model_type: str,\n                                      device: str) -> 
ModelLoadingResult:\n        import torch\n\n        try:\n            if model_type == \"bs_roformer\":\n                model = self._create_bs_roformer(config)\n            elif model_type == \"mel_band_roformer\":\n                model = self._create_mel_band_roformer(config)\n            else:\n                raise ValueError(f\"Unknown model type: {model_type}\")\n\n            if os.path.exists(model_path):\n                state_dict = torch.load(model_path, map_location=device)\n                if isinstance(state_dict, dict) and 'state_dict' in state_dict:\n                    model.load_state_dict(state_dict['state_dict'])\n                elif isinstance(state_dict, dict) and 'model' in state_dict:\n                    model.load_state_dict(state_dict['model'])\n                else:\n                    model.load_state_dict(state_dict)\n                logger.debug(f\"Loaded state dict from {model_path}\")\n\n            model.to(device)\n            model.eval()\n\n            result = ModelLoadingResult.success_result(\n                model=model,\n                implementation=ImplementationVersion.NEW,\n                config=config,\n            )\n            result.add_model_info('model_type', model_type)\n            result.add_model_info('loading_method', 'direct')\n            result.add_model_info('device', device)\n            return result\n        except (RuntimeError, ValueError) as e:\n            logger.error(f\"Failed to create {model_type} model: {e}\")\n            raise\n\n    def _create_bs_roformer(self, config: Dict[str, Any]):\n        from ..uvr_lib_v5.roformer.bs_roformer import BSRoformer\n        model_args = {\n            'dim': config['dim'],\n            'depth': config['depth'],\n            'stereo': config.get('stereo', False),\n            'num_stems': config.get('num_stems', 2),\n            'time_transformer_depth': config.get('time_transformer_depth', 2),\n            'freq_transformer_depth': 
config.get('freq_transformer_depth', 2),\n            'freqs_per_bands': config['freqs_per_bands'],\n            'dim_head': config.get('dim_head', 64),\n            'heads': config.get('heads', 8),\n            'attn_dropout': config.get('attn_dropout', 0.0),\n            'ff_dropout': config.get('ff_dropout', 0.0),\n            'flash_attn': config.get('flash_attn', True),\n            'mlp_expansion_factor': config.get('mlp_expansion_factor', 4),\n            'sage_attention': config.get('sage_attention', False),\n            'zero_dc': config.get('zero_dc', True),\n            'use_torch_checkpoint': config.get('use_torch_checkpoint', False),\n            'skip_connection': config.get('skip_connection', False),\n        }\n        if 'stft_n_fft' in config:\n            model_args['stft_n_fft'] = config['stft_n_fft']\n        if 'stft_hop_length' in config:\n            model_args['stft_hop_length'] = config['stft_hop_length']\n        if 'stft_win_length' in config:\n            model_args['stft_win_length'] = config['stft_win_length']\n        logger.debug(f\"Creating BSRoformer with args: {list(model_args.keys())}\")\n        return BSRoformer(**model_args)\n\n    def _create_mel_band_roformer(self, config: Dict[str, Any]):\n        from ..uvr_lib_v5.roformer.mel_band_roformer import MelBandRoformer\n        model_args = {\n            'dim': config['dim'],\n            'depth': config['depth'],\n            'stereo': config.get('stereo', False),\n            'num_stems': config.get('num_stems', 2),\n            'time_transformer_depth': config.get('time_transformer_depth', 2),\n            'freq_transformer_depth': config.get('freq_transformer_depth', 2),\n            'num_bands': config['num_bands'],\n            'dim_head': config.get('dim_head', 64),\n            'heads': config.get('heads', 8),\n            'attn_dropout': config.get('attn_dropout', 0.0),\n            'ff_dropout': config.get('ff_dropout', 0.0),\n            'flash_attn': 
config.get('flash_attn', True),\n            'mlp_expansion_factor': config.get('mlp_expansion_factor', 4),\n            'sage_attention': config.get('sage_attention', False),\n            'zero_dc': config.get('zero_dc', True),\n            'use_torch_checkpoint': config.get('use_torch_checkpoint', False),\n            'skip_connection': config.get('skip_connection', False),\n        }\n        if 'sample_rate' in config:\n            model_args['sample_rate'] = config['sample_rate']\n        # Optional parameters commonly present in legacy configs\n        for optional_key in [\n            'mask_estimator_depth',\n            'stft_n_fft',\n            'stft_hop_length',\n            'stft_win_length',\n            'stft_normalized',\n            'stft_window_fn',\n            'multi_stft_resolution_loss_weight',\n            'multi_stft_resolutions_window_sizes',\n            'multi_stft_hop_size',\n            'multi_stft_normalized',\n            'multi_stft_window_fn',\n            'match_input_audio_length',\n        ]:\n            if optional_key in config:\n                model_args[optional_key] = config[optional_key]\n        # Note: fmin and fmax are defined in config classes but not accepted by current constructor\n        logger.debug(f\"Creating MelBandRoformer with args: {list(model_args.keys())}\")\n        return MelBandRoformer(**model_args)\n\n    def _load_with_legacy_implementation(self,\n                                          model_path: str,\n                                          original_config: Dict[str, Any],\n                                          device: str,\n                                          original_error: str) -> ModelLoadingResult:\n        \"\"\"\n        Attempt to load the model using the legacy direct-constructor path\n        for maximum backward compatibility with existing checkpoints.\n        \"\"\"\n        import torch\n\n        # Use nested 'model' section if present; otherwise assume flat\n        
model_cfg = original_config.get('model', original_config)\n\n        # Determine model type from config\n        if 'num_bands' in model_cfg:\n            from ..uvr_lib_v5.roformer.mel_band_roformer import MelBandRoformer\n            model = MelBandRoformer(**model_cfg)\n        elif 'freqs_per_bands' in model_cfg:\n            from ..uvr_lib_v5.roformer.bs_roformer import BSRoformer\n            model = BSRoformer(**model_cfg)\n        else:\n            raise ValueError(\"Unknown Roformer model type in legacy configuration\")\n\n        # Load checkpoint as raw state dict (legacy behavior)\n        try:\n            checkpoint = torch.load(model_path, map_location='cpu', weights_only=True)\n        except TypeError:\n            # For older torch versions without weights_only\n            checkpoint = torch.load(model_path, map_location='cpu')\n\n        model.load_state_dict(checkpoint)\n        model.to(device).eval()\n\n        return ModelLoadingResult.fallback_success_result(\n            model=model,\n            original_error=original_error,\n            config=original_config,\n        )\n\n    def get_loading_stats(self) -> Dict[str, int]:\n        return self._loading_stats.copy()\n\n    def reset_loading_stats(self) -> None:\n        self._loading_stats = {\n            'new_implementation_success': 0,\n            'total_failures': 0\n        }\n\n    def detect_model_type(self, model_path: str) -> str:\n        model_path_lower = model_path.lower()\n        if any(indicator in model_path_lower for indicator in ['bs_roformer', 'bs-roformer', 'bsroformer']):\n            return \"bs_roformer\"\n        if any(indicator in model_path_lower for indicator in ['mel_band_roformer', 'mel-band-roformer', 'melband']):\n            return \"mel_band_roformer\"\n        if 'roformer' in model_path_lower:\n            logger.warning(f\"Generic 'roformer' detected in {model_path}, defaulting to bs_roformer\")\n            return \"bs_roformer\"\n        raise 
ValueError(f\"Cannot determine Roformer model type from path: {model_path}\")\n\n    def get_default_configuration(self, model_type: str) -> Dict[str, Any]:\n        if model_type == \"bs_roformer\":\n            return {\n                'dim': 512,\n                'depth': 12,\n                'stereo': False,\n                'num_stems': 2,\n                'time_transformer_depth': 2,\n                'freq_transformer_depth': 2,\n                'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n                'dim_head': 64,\n                'heads': 8,\n                'attn_dropout': 0.0,\n                'ff_dropout': 0.0,\n                'flash_attn': True,\n                'mlp_expansion_factor': 4,\n                'sage_attention': False,\n                'zero_dc': True,\n                'use_torch_checkpoint': False,\n                'skip_connection': False,\n                'mask_estimator_depth': 2,\n                'stft_n_fft': 2048,\n                'stft_hop_length': 512,\n                'stft_win_length': 2048,\n            }\n        elif model_type == \"mel_band_roformer\":\n            return {\n                'dim': 512,\n                'depth': 12,\n                'stereo': False,\n                'num_stems': 2,\n                'time_transformer_depth': 2,\n                'freq_transformer_depth': 2,\n                'num_bands': 64,\n                'dim_head': 64,\n                'heads': 8,\n                'attn_dropout': 0.0,\n                'ff_dropout': 0.0,\n                'flash_attn': True,\n                'mlp_expansion_factor': 4,\n                'sage_attention': False,\n                'zero_dc': True,\n                'use_torch_checkpoint': False,\n                'skip_connection': False,\n                'sample_rate': 44100,\n                # Note: fmin and fmax are not implemented in MelBandRoformer constructor\n            }\n        else:\n            raise ValueError(f\"Unknown model type: 
{model_type}\")\n"
  },
  {
    "path": "audio_separator/separator/separator.py",
    "content": "\"\"\" This file contains the Separator class, to facilitate the separation of stems from audio. \"\"\"\n\nfrom importlib import metadata, resources\nimport os\nimport sys\nimport platform\nimport subprocess\nimport time\nimport logging\nimport warnings\nimport importlib\nimport io\nimport re\nimport librosa\nimport numpy as np\nfrom typing import Optional\n\nimport hashlib\nimport json\nimport yaml\nimport requests\nimport torch\nimport torch.amp.autocast_mode as autocast_mode\nimport onnxruntime as ort\nfrom tqdm import tqdm\nfrom audio_separator.separator.ensembler import Ensembler\n\n# Mapping of common stem name variations to canonical names for ensemble grouping.\nSTEM_NAME_MAP = {\n    \"vocals\": \"Vocals\",\n    \"instrumental\": \"Instrumental\",\n    \"inst\": \"Instrumental\",\n    \"karaoke\": \"Instrumental\",\n    \"other\": \"Other\",\n    \"no_vocals\": \"Instrumental\",\n    \"drums\": \"Drums\",\n    \"bass\": \"Bass\",\n    \"guitar\": \"Guitar\",\n    \"piano\": \"Piano\",\n    \"synthesizer\": \"Synthesizer\",\n    \"strings\": \"Strings\",\n    \"woodwinds\": \"Woodwinds\",\n    \"brass\": \"Brass\",\n    \"wind inst\": \"Wind Inst\",\n    \"lead vocals\": \"Lead Vocals\",\n    \"backing vocals\": \"Backing Vocals\",\n    \"primary stem\": \"Primary Stem\",\n    \"secondary stem\": \"Secondary Stem\",\n}\n\n\nclass Separator:\n    \"\"\"\n    The Separator class is designed to facilitate the separation of audio sources from a given audio file.\n    It supports various separation architectures and models, including MDX, VR, and Demucs. 
The class provides\n    functionalities to configure separation parameters, load models, and perform audio source separation.\n    It also handles logging, normalization, and output formatting of the separated audio stems.\n\n    The actual separation task is handled by one of the architecture-specific classes in the `architectures` module;\n    this class is responsible for initialising logging, configuring hardware acceleration, loading the model,\n    initiating the separation process and passing outputs back to the caller.\n\n    Common Attributes:\n        log_level (int): The logging level.\n        log_formatter (logging.Formatter): The logging formatter.\n        model_file_dir (str): The directory where model files are stored.\n        output_dir (str): The directory where output files will be saved.\n        output_format (str): The format of the output audio file.\n        output_bitrate (str): The bitrate of the output audio file.\n        amplification_threshold (float): The threshold for audio amplification.\n        normalization_threshold (float): The threshold for audio normalization.\n        output_single_stem (str): Option to output a single stem.\n        invert_using_spec (bool): Flag to invert using spectrogram.\n        sample_rate (int): The sample rate of the audio.\n        use_soundfile (bool): Use soundfile for audio writing, which can avoid out-of-memory (OOM) issues.\n        use_autocast (bool): Flag to use PyTorch autocast for faster inference.\n\n    MDX Architecture Specific Attributes:\n        hop_length (int): The hop length for STFT.\n        segment_size (int): The segment size for processing.\n        overlap (float): The overlap between segments.\n        batch_size (int): The batch size for processing.\n        enable_denoise (bool): Flag to enable or disable denoising.\n\n    VR Architecture Specific Attributes & Defaults:\n        batch_size: 1\n        window_size: 512\n        aggression: 5\n        enable_tta: False\n        
enable_post_process: False\n        post_process_threshold: 0.2\n        high_end_process: False\n\n    Demucs Architecture Specific Attributes & Defaults:\n        segment_size: \"Default\"\n        shifts: 2\n        overlap: 0.25\n        segments_enabled: True\n\n    MDXC Architecture Specific Attributes & Defaults:\n        segment_size: 256\n        override_model_segment_size: False\n        batch_size: 1\n        overlap: 8\n        pitch_shift: 0\n    \"\"\"\n\n    def __init__(\n        self,\n        log_level=logging.INFO,\n        log_formatter=None,\n        model_file_dir=\"/tmp/audio-separator-models/\",\n        output_dir=None,\n        output_format=\"WAV\",\n        output_bitrate=None,\n        normalization_threshold=0.9,\n        amplification_threshold=0.0,\n        output_single_stem=None,\n        invert_using_spec=False,\n        sample_rate=44100,\n        use_soundfile=False,\n        use_autocast=False,\n        use_directml=False,\n        chunk_duration=None,\n        mdx_params={\"hop_length\": 1024, \"segment_size\": 256, \"overlap\": 0.25, \"batch_size\": 1, \"enable_denoise\": False},\n        vr_params={\"batch_size\": 1, \"window_size\": 512, \"aggression\": 5, \"enable_tta\": False, \"enable_post_process\": False, \"post_process_threshold\": 0.2, \"high_end_process\": False},\n        demucs_params={\"segment_size\": \"Default\", \"shifts\": 2, \"overlap\": 0.25, \"segments_enabled\": True},\n        mdxc_params={\"segment_size\": 256, \"override_model_segment_size\": False, \"batch_size\": 1, \"overlap\": 8, \"pitch_shift\": 0},\n        ensemble_algorithm=None,\n        ensemble_weights=None,\n        ensemble_preset=None,\n        info_only=False,\n    ):\n        \"\"\"Initialize the separator.\"\"\"\n        self.logger = logging.getLogger(__name__)\n        self.logger.setLevel(log_level)\n        self.log_level = log_level\n        self.log_formatter = log_formatter\n\n        self.log_handler = 
logging.StreamHandler()\n\n        if self.log_formatter is None:\n            self.log_formatter = logging.Formatter(\"%(asctime)s - %(levelname)s - %(module)s - %(message)s\")\n\n        self.log_handler.setFormatter(self.log_formatter)\n\n        if not self.logger.hasHandlers():\n            self.logger.addHandler(self.log_handler)\n\n        # Filter out noisy warnings from PyTorch for users who don't care about them\n        if log_level > logging.DEBUG:\n            warnings.filterwarnings(\"ignore\")\n\n        # Skip initialization logs if info_only is True\n        if not info_only:\n            package_version = self.get_package_distribution(\"audio-separator\").version\n            self.logger.info(f\"Separator version {package_version} instantiating with output_dir: {output_dir}, output_format: {output_format}\")\n\n        if output_dir is None:\n            output_dir = os.getcwd()\n            if not info_only:\n                self.logger.info(\"Output directory not specified. 
Using current working directory.\")\n\n        self.output_dir = output_dir\n\n        # Check for environment variable to override model_file_dir\n        env_model_dir = os.environ.get(\"AUDIO_SEPARATOR_MODEL_DIR\")\n        if env_model_dir:\n            self.model_file_dir = env_model_dir\n            self.logger.info(f\"Using model directory from AUDIO_SEPARATOR_MODEL_DIR env var: {self.model_file_dir}\")\n            if not os.path.exists(self.model_file_dir):\n                raise FileNotFoundError(f\"The specified model directory does not exist: {self.model_file_dir}\")\n        else:\n            self.logger.info(f\"Using model directory from model_file_dir parameter: {model_file_dir}\")\n            self.model_file_dir = model_file_dir\n\n        # Create the model directory if it does not exist\n        os.makedirs(self.model_file_dir, exist_ok=True)\n        os.makedirs(self.output_dir, exist_ok=True)\n\n        self.output_format = output_format\n        self.output_bitrate = output_bitrate\n\n        if self.output_format is None:\n            self.output_format = \"WAV\"\n\n        self.normalization_threshold = normalization_threshold\n        if normalization_threshold <= 0 or normalization_threshold > 1:\n            raise ValueError(\"The normalization_threshold must be greater than 0 and less than or equal to 1.\")\n\n        self.amplification_threshold = amplification_threshold\n        if amplification_threshold < 0 or amplification_threshold > 1:\n            raise ValueError(\"The amplification_threshold must be greater than or equal to 0 and less than or equal to 1.\")\n\n        self.output_single_stem = output_single_stem\n        if output_single_stem is not None:\n            self.logger.debug(f\"Single stem output requested, so only one output file ({output_single_stem}) will be written\")\n\n        self.invert_using_spec = invert_using_spec\n        if self.invert_using_spec:\n            self.logger.debug(f\"Secondary step will be 
inverted using spectrogram rather than waveform. This may improve quality but is slightly slower.\")\n\n        # Only the int() conversion is guarded, so the range checks below keep their specific error messages\n        try:\n            self.sample_rate = int(sample_rate)\n        except (TypeError, ValueError):\n            raise ValueError(\"The sample rate must be a non-zero whole number. Please provide a valid integer.\")\n        if self.sample_rate <= 0:\n            raise ValueError(f\"The sample rate setting is {self.sample_rate} but it must be a non-zero whole number.\")\n        if self.sample_rate > 12800000:\n            raise ValueError(f\"The sample rate setting is {self.sample_rate}. Enter something less ambitious.\")\n\n        self.use_soundfile = use_soundfile\n        self.use_autocast = use_autocast\n        self.use_directml = use_directml\n\n        self.chunk_duration = chunk_duration\n        if chunk_duration is not None:\n            if chunk_duration <= 0:\n                raise ValueError(\"chunk_duration must be greater than 0\")\n\n        self.ensemble_algorithm = ensemble_algorithm\n        self.ensemble_weights = ensemble_weights\n        self.ensemble_preset = ensemble_preset\n        self._ensemble_preset_models = None\n\n        # If an ensemble preset is specified, load it and apply defaults\n        if ensemble_preset is not None:\n            preset_data = self._load_ensemble_preset(ensemble_preset)\n            self._ensemble_preset_models = preset_data[\"models\"]\n            # Preset values are defaults; explicit user args (non-None) take priority\n            if ensemble_algorithm is None:\n                self.ensemble_algorithm = preset_data[\"algorithm\"]\n            if ensemble_weights is None and preset_data.get(\"weights\") is not None:\n                self.ensemble_weights = preset_data[\"weights\"]\n\n        # Apply default algorithm if still not set (no preset, no explicit arg)\n        if self.ensemble_algorithm is None:\n            self.ensemble_algorithm = \"avg_wave\"\n\n        # These are parameters which 
users may want to configure so we expose them to the top-level Separator class,\n        # even though they are specific to a single model architecture\n        self.arch_specific_params = {\"MDX\": mdx_params, \"VR\": vr_params, \"Demucs\": demucs_params, \"MDXC\": mdxc_params}\n\n        self.torch_device = None\n        self.torch_device_cpu = None\n        self.torch_device_mps = None\n\n        self.onnx_execution_provider = None\n        self.model_instance = None\n        self.model_filename = None\n        self.model_filenames = []\n\n        self.model_is_uvr_vip = False\n        self.model_friendly_name = None\n\n        if not info_only:\n            self.setup_accelerated_inferencing_device()\n\n    VALID_ENSEMBLE_ALGORITHMS = [\n        \"avg_wave\", \"median_wave\", \"min_wave\", \"max_wave\",\n        \"avg_fft\", \"median_fft\", \"min_fft\", \"max_fft\",\n        \"uvr_max_spec\", \"uvr_min_spec\", \"ensemble_wav\",\n    ]\n\n    def _load_ensemble_preset(self, preset_name):\n        \"\"\"\n        Load and validate an ensemble preset from ensemble_presets.json.\n\n        Returns a dict with keys: name, description, models, algorithm, weights, contributor.\n        Raises ValueError if the preset is not found or fails validation.\n        \"\"\"\n        try:\n            with resources.open_text(\"audio_separator\", \"ensemble_presets.json\") as f:\n                presets_data = json.load(f)\n        except FileNotFoundError:\n            raise ValueError(\"Ensemble presets file not found. The package may be corrupted or improperly installed.\")\n\n        presets = presets_data.get(\"presets\", {})\n        if preset_name not in presets:\n            available = \", \".join(sorted(presets.keys()))\n            raise ValueError(f\"Unknown ensemble preset: '{preset_name}'. 
Available presets: {available}\")\n\n        preset = presets[preset_name]\n\n        # Validate models\n        models = preset.get(\"models\", [])\n        if not isinstance(models, list) or len(models) < 2:\n            raise ValueError(f\"Ensemble preset '{preset_name}' must specify at least 2 models, got {len(models) if isinstance(models, list) else 0}\")\n\n        # Validate algorithm\n        algorithm = preset.get(\"algorithm\", \"avg_wave\")\n        if algorithm not in self.VALID_ENSEMBLE_ALGORITHMS:\n            raise ValueError(f\"Ensemble preset '{preset_name}' has unknown algorithm: '{algorithm}'\")\n\n        # Validate weights\n        weights = preset.get(\"weights\")\n        if weights is not None:\n            if not isinstance(weights, list) or len(weights) != len(models):\n                raise ValueError(f\"Ensemble preset '{preset_name}' weights length ({len(weights) if isinstance(weights, list) else 'N/A'}) must match models count ({len(models)})\")\n\n        self.logger.info(f\"Loaded ensemble preset '{preset_name}': {preset.get('name', preset_name)} — {preset.get('description', '')}\")\n        return preset\n\n    def list_ensemble_presets(self):\n        \"\"\"\n        List all available ensemble presets.\n\n        Returns a dict mapping preset IDs to their full preset data.\n        \"\"\"\n        try:\n            with resources.open_text(\"audio_separator\", \"ensemble_presets.json\") as f:\n                presets_data = json.load(f)\n        except FileNotFoundError:\n            return {}\n        return presets_data.get(\"presets\", {})\n\n    def setup_accelerated_inferencing_device(self):\n        \"\"\"\n        This method sets up the PyTorch and/or ONNX Runtime inferencing device, using GPU hardware acceleration if available.\n        \"\"\"\n        system_info = self.get_system_info()\n        self.check_ffmpeg_installed()\n        self.log_onnxruntime_packages()\n        self.setup_torch_device(system_info)\n\n    
def get_system_info(self):\n        \"\"\"\n        This method logs system information (operating system, CPU architecture, Python and PyTorch versions)\n        and returns the result of platform.uname().\n        \"\"\"\n        os_name = platform.system()\n        os_version = platform.version()\n        self.logger.info(f\"Operating System: {os_name} {os_version}\")\n\n        system_info = platform.uname()\n        self.logger.info(f\"System: {system_info.system} Node: {system_info.node} Release: {system_info.release} Machine: {system_info.machine} Proc: {system_info.processor}\")\n\n        python_version = platform.python_version()\n        self.logger.info(f\"Python Version: {python_version}\")\n\n        pytorch_version = torch.__version__\n        self.logger.info(f\"PyTorch Version: {pytorch_version}\")\n        return system_info\n\n    def check_ffmpeg_installed(self):\n        \"\"\"\n        This method checks if ffmpeg is installed and logs its version.\n        \"\"\"\n        try:\n            ffmpeg_version_output = subprocess.check_output([\"ffmpeg\", \"-version\"], text=True)\n            first_line = ffmpeg_version_output.splitlines()[0]\n            self.logger.info(f\"FFmpeg installed: {first_line}\")\n        except FileNotFoundError:\n            self.logger.error(\"FFmpeg is not installed. 
Please install FFmpeg to use this package.\")\n            # Raise an exception if this is being run by a user, as ffmpeg is required for pydub to write audio\n            # but if we're just running unit tests in CI, no reason to throw\n            if \"PYTEST_CURRENT_TEST\" not in os.environ:\n                raise\n\n    def log_onnxruntime_packages(self):\n        \"\"\"\n        This method logs the ONNX Runtime package versions, including the GPU and Silicon packages if available.\n        \"\"\"\n        onnxruntime_gpu_package = self.get_package_distribution(\"onnxruntime-gpu\")\n        onnxruntime_silicon_package = self.get_package_distribution(\"onnxruntime-silicon\")\n        onnxruntime_cpu_package = self.get_package_distribution(\"onnxruntime\")\n        onnxruntime_dml_package = self.get_package_distribution(\"onnxruntime-directml\")\n\n        if onnxruntime_gpu_package is not None:\n            self.logger.info(f\"ONNX Runtime GPU package installed with version: {onnxruntime_gpu_package.version}\")\n        if onnxruntime_silicon_package is not None:\n            self.logger.info(f\"ONNX Runtime Silicon package installed with version: {onnxruntime_silicon_package.version}\")\n        if onnxruntime_cpu_package is not None:\n            self.logger.info(f\"ONNX Runtime CPU package installed with version: {onnxruntime_cpu_package.version}\")\n        if onnxruntime_dml_package is not None:\n            self.logger.info(f\"ONNX Runtime DirectML package installed with version: {onnxruntime_dml_package.version}\")\n\n    def setup_torch_device(self, system_info):\n        \"\"\"\n        This method sets up the PyTorch and/or ONNX Runtime inferencing device, using GPU hardware acceleration if available.\n        \"\"\"\n        hardware_acceleration_enabled = False\n        ort_providers = ort.get_available_providers()\n        has_torch_dml_installed = self.get_package_distribution(\"torch_directml\")\n\n        self.torch_device_cpu = 
torch.device(\"cpu\")\n\n        if torch.cuda.is_available():\n            self.configure_cuda(ort_providers)\n            hardware_acceleration_enabled = True\n        elif hasattr(torch.backends, \"mps\") and torch.backends.mps.is_available() and system_info.processor == \"arm\":\n            self.configure_mps(ort_providers)\n            hardware_acceleration_enabled = True\n        elif self.use_directml and has_torch_dml_installed:\n            import torch_directml\n            if torch_directml.is_available():\n                self.configure_dml(ort_providers)\n                hardware_acceleration_enabled = True\n\n        if not hardware_acceleration_enabled:\n            self.logger.info(\"No hardware acceleration could be configured, running in CPU mode\")\n            self.torch_device = self.torch_device_cpu\n            self.onnx_execution_provider = [\"CPUExecutionProvider\"]\n\n    def configure_cuda(self, ort_providers):\n        \"\"\"\n        This method configures the CUDA device for PyTorch and ONNX Runtime, if available.\n        \"\"\"\n        self.logger.info(\"CUDA is available in Torch, setting Torch device to CUDA\")\n        self.torch_device = torch.device(\"cuda\")\n        if \"CUDAExecutionProvider\" in ort_providers:\n            self.logger.info(\"ONNXruntime has CUDAExecutionProvider available, enabling acceleration\")\n            self.onnx_execution_provider = [\"CUDAExecutionProvider\"]\n        else:\n            self.logger.warning(\"CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled\")\n\n    def configure_mps(self, ort_providers):\n        \"\"\"\n        This method configures the Apple Silicon MPS/CoreML device for PyTorch and ONNX Runtime, if available.\n        \"\"\"\n        self.logger.info(\"Apple Silicon MPS/CoreML is available in Torch and processor is ARM, setting Torch device to MPS\")\n        self.torch_device_mps = torch.device(\"mps\")\n\n        self.torch_device = 
self.torch_device_mps\n\n        if \"CoreMLExecutionProvider\" in ort_providers:\n            self.logger.info(\"ONNXruntime has CoreMLExecutionProvider available, enabling acceleration\")\n            self.onnx_execution_provider = [\"CoreMLExecutionProvider\"]\n        else:\n            self.logger.warning(\"CoreMLExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled\")\n\n    def configure_dml(self, ort_providers):\n        \"\"\"\n        This method configures the DirectML device for PyTorch and ONNX Runtime, if available.\n        \"\"\"\n        import torch_directml\n        self.logger.info(\"DirectML is available in Torch, setting Torch device to DirectML\")\n        self.torch_device_dml = torch_directml.device() \n        self.torch_device = self.torch_device_dml\n\n        if \"DmlExecutionProvider\" in ort_providers:\n            self.logger.info(\"ONNXruntime has DmlExecutionProvider available, enabling acceleration\")\n            self.onnx_execution_provider = [\"DmlExecutionProvider\"]\n        else:\n            self.logger.warning(\"DmlExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled\")\n\n    def get_package_distribution(self, package_name):\n        \"\"\"\n        This method returns the package distribution for a given package name if installed, or None otherwise.\n        \"\"\"\n        try:\n            return metadata.distribution(package_name)\n        except metadata.PackageNotFoundError:\n            self.logger.debug(f\"Python package: {package_name} not installed\")\n            return None\n\n    def get_model_hash(self, model_path):\n        \"\"\"\n        This method returns the MD5 hash of a given model file.\n        \"\"\"\n        self.logger.debug(f\"Calculating hash of model file {model_path}\")\n        # Use the specific byte count from the original logic\n        BYTES_TO_HASH = 10000 * 1024  # 10,240,000 bytes\n\n        try:\n            file_size = 
os.path.getsize(model_path)\n\n            with open(model_path, \"rb\") as f:\n                if file_size < BYTES_TO_HASH:\n                    # Hash the entire file if smaller than the target byte count\n                    self.logger.debug(f\"File size {file_size} < {BYTES_TO_HASH}, hashing entire file.\")\n                    hash_value = hashlib.md5(f.read()).hexdigest()\n                else:\n                    # Seek to the specific position before the end (from the beginning) and hash\n                    seek_pos = file_size - BYTES_TO_HASH\n                    self.logger.debug(f\"File size {file_size} >= {BYTES_TO_HASH}, seeking to {seek_pos} and hashing remaining bytes.\")\n                    f.seek(seek_pos, io.SEEK_SET)\n                    hash_value = hashlib.md5(f.read()).hexdigest()\n\n            # Log the calculated hash\n            self.logger.info(f\"Hash of model file {model_path} is {hash_value}\")\n            return hash_value\n\n        except FileNotFoundError:\n            self.logger.error(f\"Model file not found at {model_path}\")\n            raise # Re-raise the specific error\n        except Exception as e:\n            # Catch other potential errors (e.g., permissions, other IOErrors)\n            self.logger.error(f\"Error calculating hash for {model_path}: {e}\")\n            raise # Re-raise other errors\n\n    def download_file_if_not_exists(self, url, output_path):\n        \"\"\"\n        This method downloads a file from a given URL to a given output path, if the file does not already exist.\n        \"\"\"\n\n        if os.path.isfile(output_path):\n            self.logger.debug(f\"File already exists at {output_path}, skipping download\")\n            return\n\n        self.logger.debug(f\"Downloading file from {url} to {output_path} with timeout 300s\")\n        response = requests.get(url, stream=True, timeout=300)\n\n        if response.status_code == 200:\n            total_size_in_bytes = 
int(response.headers.get(\"content-length\", 0))\n            progress_bar = tqdm(total=total_size_in_bytes, unit=\"iB\", unit_scale=True)\n\n            with open(output_path, \"wb\") as f:\n                for chunk in response.iter_content(chunk_size=8192):\n                    progress_bar.update(len(chunk))\n                    f.write(chunk)\n            progress_bar.close()\n        else:\n            raise RuntimeError(f\"Failed to download file from {url}, response code: {response.status_code}\")\n\n    def list_supported_model_files(self):\n        \"\"\"\n        This method lists the supported model files for audio-separator, by fetching the same file UVR uses to list these.\n        Also includes model performance scores where available.\n\n        Example response object:\n\n        {\n            \"MDX\": {\n                \"MDX-Net Model VIP: UVR-MDX-NET-Inst_full_292\": {\n                \"filename\": \"UVR-MDX-NET-Inst_full_292.onnx\",\n                \"scores\": {\n                    \"vocals\": {\n                    \"SDR\": 10.6497,\n                    \"SIR\": 20.3786,\n                    \"SAR\": 10.692,\n                    \"ISR\": 14.848\n                    },\n                    \"instrumental\": {\n                    \"SDR\": 15.2149,\n                    \"SIR\": 25.6075,\n                    \"SAR\": 17.1363,\n                    \"ISR\": 17.7893\n                    }\n                },\n                \"download_files\": [\n                    \"UVR-MDX-NET-Inst_full_292.onnx\"\n                ]\n                }\n            },\n            \"Demucs\": {\n                \"Demucs v4: htdemucs_ft\": {\n                \"filename\": \"htdemucs_ft.yaml\",\n                \"scores\": {\n                    \"vocals\": {\n                    \"SDR\": 11.2685,\n                    \"SIR\": 21.257,\n                    \"SAR\": 11.0359,\n                    \"ISR\": 19.3753\n                    },\n                    
\"drums\": {\n                    \"SDR\": 13.235,\n                    \"SIR\": 23.3053,\n                    \"SAR\": 13.0313,\n                    \"ISR\": 17.2889\n                    },\n                    \"bass\": {\n                    \"SDR\": 9.72743,\n                    \"SIR\": 19.5435,\n                    \"SAR\": 9.20801,\n                    \"ISR\": 13.5037\n                    }\n                },\n                \"download_files\": [\n                    \"https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/f7e0c4bc-ba3fe64a.th\",\n                    \"https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/d12395a8-e57c48e6.th\",\n                    \"https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/92cfc3b6-ef3bcb9c.th\",\n                    \"https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/04573f0d-f3cf25b2.th\",\n                    \"https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/htdemucs_ft.yaml\"\n                ]\n                }\n            },\n            \"MDXC\": {\n                \"MDX23C Model: MDX23C-InstVoc HQ\": {\n                \"filename\": \"MDX23C-8KFFT-InstVoc_HQ.ckpt\",\n                \"scores\": {\n                    \"vocals\": {\n                    \"SDR\": 11.9504,\n                    \"SIR\": 23.1166,\n                    \"SAR\": 12.093,\n                    \"ISR\": 15.4782\n                    },\n                    \"instrumental\": {\n                    \"SDR\": 16.3035,\n                    \"SIR\": 26.6161,\n                    \"SAR\": 18.5167,\n                    \"ISR\": 18.3939\n                    }\n                },\n                \"download_files\": [\n                    \"MDX23C-8KFFT-InstVoc_HQ.ckpt\",\n                    \"model_2_stem_full_band_8k.yaml\"\n                ]\n                }\n            }\n        }\n        \"\"\"\n        download_checks_path = os.path.join(self.model_file_dir, 
\"download_checks.json\")\n\n        self.download_file_if_not_exists(\"https://raw.githubusercontent.com/TRvlvr/application_data/main/filelists/download_checks.json\", download_checks_path)\n\n        model_downloads_list = json.load(open(download_checks_path, encoding=\"utf-8\"))\n        self.logger.debug(f\"UVR model download list loaded\")\n\n        # Load the model scores with error handling\n        model_scores = {}\n        try:\n            with resources.open_text(\"audio_separator\", \"models-scores.json\") as f:\n                model_scores = json.load(f)\n            self.logger.debug(f\"Model scores loaded\")\n        except json.JSONDecodeError as e:\n            self.logger.warning(f\"Failed to load model scores: {str(e)}\")\n            self.logger.warning(\"Continuing without model scores\")\n\n        # Only show Demucs v4 models as we've only implemented support for v4\n        filtered_demucs_v4 = {key: value for key, value in model_downloads_list[\"demucs_download_list\"].items() if key.startswith(\"Demucs v4\")}\n\n        # Modified Demucs handling to use YAML files as identifiers and include download files\n        demucs_models = {}\n        for name, files in filtered_demucs_v4.items():\n            # Find the YAML file in the model files\n            yaml_file = next((filename for filename in files.keys() if filename.endswith(\".yaml\")), None)\n            if yaml_file:\n                model_score_data = model_scores.get(yaml_file, {})\n                demucs_models[name] = {\n                    \"filename\": yaml_file,\n                    \"scores\": model_score_data.get(\"median_scores\", {}),\n                    \"stems\": model_score_data.get(\"stems\", []),\n                    \"target_stem\": model_score_data.get(\"target_stem\"),\n                    \"download_files\": list(files.values()),  # List of all download URLs/filenames\n                }\n\n        # Load the JSON file using importlib.resources\n        with 
resources.open_text(\"audio_separator\", \"models.json\") as f:\n            audio_separator_models_list = json.load(f)\n        self.logger.debug(f\"Audio-Separator model list loaded\")\n\n        # Return object with list of model names\n        model_files_grouped_by_type = {\n            \"VR\": {\n                name: {\n                    \"filename\": filename,\n                    \"scores\": model_scores.get(filename, {}).get(\"median_scores\", {}),\n                    \"stems\": model_scores.get(filename, {}).get(\"stems\", []),\n                    \"target_stem\": model_scores.get(filename, {}).get(\"target_stem\"),\n                    \"download_files\": [filename],\n                }  # Just the filename for VR models\n                for name, filename in {**model_downloads_list[\"vr_download_list\"], **audio_separator_models_list[\"vr_download_list\"]}.items()\n            },\n            \"MDX\": {\n                name: {\n                    \"filename\": filename,\n                    \"scores\": model_scores.get(filename, {}).get(\"median_scores\", {}),\n                    \"stems\": model_scores.get(filename, {}).get(\"stems\", []),\n                    \"target_stem\": model_scores.get(filename, {}).get(\"target_stem\"),\n                    \"download_files\": [filename],\n                }  # Just the filename for MDX models\n                for name, filename in {**model_downloads_list[\"mdx_download_list\"], **model_downloads_list[\"mdx_download_vip_list\"], **audio_separator_models_list[\"mdx_download_list\"]}.items()\n            },\n            \"Demucs\": demucs_models,\n            \"MDXC\": {\n                name: {\n                    \"filename\": next(iter(files.keys())),\n                    \"scores\": model_scores.get(next(iter(files.keys())), {}).get(\"median_scores\", {}),\n                    \"stems\": model_scores.get(next(iter(files.keys())), {}).get(\"stems\", []),\n                    \"target_stem\": 
model_scores.get(next(iter(files.keys())), {}).get(\"target_stem\"),\n                    \"download_files\": list(files.keys()) + list(files.values()),  # List of both model filenames and config filenames\n                }\n                for name, files in {\n                    **model_downloads_list[\"mdx23c_download_list\"],\n                    **model_downloads_list[\"mdx23c_download_vip_list\"],\n                    **model_downloads_list[\"roformer_download_list\"],\n                    **audio_separator_models_list[\"mdx23c_download_list\"],\n                    **audio_separator_models_list[\"roformer_download_list\"],\n                }.items()\n            },\n        }\n\n        return model_files_grouped_by_type\n\n    def print_uvr_vip_message(self):\n        \"\"\"\n        This method prints a message to the user if they have downloaded a VIP model, reminding them to support Anjok07 on Patreon.\n        \"\"\"\n        if self.model_is_uvr_vip:\n            self.logger.warning(f\"The model: '{self.model_friendly_name}' is a VIP model, intended by Anjok07 for access by paying subscribers only.\")\n            self.logger.warning(\"If you are not already subscribed, please consider supporting the developer of UVR, Anjok07 by subscribing here: https://patreon.com/uvr\")\n\n    def download_model_files(self, model_filename):\n        \"\"\"\n        This method downloads the model files for a given model filename, if they are not already present.\n        Returns tuple of (model_filename, model_type, model_friendly_name, model_path, yaml_config_filename)\n        \"\"\"\n        model_path = os.path.join(self.model_file_dir, f\"{model_filename}\")\n\n        supported_model_files_grouped = self.list_supported_model_files()\n        public_model_repo_url_prefix = \"https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models\"\n        vip_model_repo_url_prefix = \"https://github.com/Anjok0109/ai_magic/releases/download/v5\"\n        
audio_separator_models_repo_url_prefix = \"https://github.com/nomadkaraoke/python-audio-separator/releases/download/model-configs\"\n\n        yaml_config_filename = None\n\n        self.logger.debug(f\"Searching for model_filename {model_filename} in supported_model_files_grouped\")\n\n        # Iterate through model types (VR, MDX, Demucs, MDXC)\n        for model_type, models in supported_model_files_grouped.items():\n            # Iterate through each model in this type\n            for model_friendly_name, model_info in models.items():\n                self.model_is_uvr_vip = \"VIP\" in model_friendly_name\n                model_repo_url_prefix = vip_model_repo_url_prefix if self.model_is_uvr_vip else public_model_repo_url_prefix\n\n                # Check if this model matches our target filename\n                if model_info[\"filename\"] == model_filename or model_filename in model_info[\"download_files\"]:\n                    self.logger.debug(f\"Found matching model: {model_friendly_name}\")\n                    self.model_friendly_name = model_friendly_name\n                    self.print_uvr_vip_message()\n\n                    # Download each required file for this model\n                    for file_to_download in model_info[\"download_files\"]:\n                        # For URLs, extract just the filename portion\n                        if file_to_download.startswith(\"http\"):\n                            filename = file_to_download.split(\"/\")[-1]\n                            download_path = os.path.join(self.model_file_dir, filename)\n                            self.download_file_if_not_exists(file_to_download, download_path)\n                            continue\n\n                        download_path = os.path.join(self.model_file_dir, file_to_download)\n\n                        # For MDXC models, handle YAML config files specially\n                        if model_type == \"MDXC\" and file_to_download.endswith(\".yaml\"):\n                  
          yaml_config_filename = file_to_download\n                            try:\n                                yaml_url = f\"{model_repo_url_prefix}/mdx_model_data/mdx_c_configs/{file_to_download}\"\n                                self.download_file_if_not_exists(yaml_url, download_path)\n                            except RuntimeError:\n                                self.logger.debug(\"YAML config not found in UVR repo, trying audio-separator models repo...\")\n                                yaml_url = f\"{audio_separator_models_repo_url_prefix}/{file_to_download}\"\n                                self.download_file_if_not_exists(yaml_url, download_path)\n                            continue\n\n                        # For regular model files, try UVR repo first, then audio-separator repo\n                        try:\n                            download_url = f\"{model_repo_url_prefix}/{file_to_download}\"\n                            self.download_file_if_not_exists(download_url, download_path)\n                        except RuntimeError:\n                            self.logger.debug(\"Model not found in UVR repo, trying audio-separator models repo...\")\n                            download_url = f\"{audio_separator_models_repo_url_prefix}/{file_to_download}\"\n                            self.download_file_if_not_exists(download_url, download_path)\n\n                    return model_filename, model_type, model_friendly_name, model_path, yaml_config_filename\n\n        raise ValueError(f\"Model file {model_filename} not found in supported model files\")\n\n    def load_model_data_from_yaml(self, yaml_config_filename):\n        \"\"\"\n        This method loads model-specific parameters from the YAML file for that model.\n        The parameters in the YAML are critical to inferencing, as they need to match whatever was used during training.\n        \"\"\"\n        # Verify if the YAML filename includes a full path or just the filename\n        
if not os.path.exists(yaml_config_filename):\n            model_data_yaml_filepath = os.path.join(self.model_file_dir, yaml_config_filename)\n        else:\n            model_data_yaml_filepath = yaml_config_filename\n\n        self.logger.debug(f\"Loading model data from YAML at path {model_data_yaml_filepath}\")\n\n        with open(model_data_yaml_filepath, encoding=\"utf-8\") as yaml_file:\n            model_data = yaml.load(yaml_file, Loader=yaml.FullLoader)\n        self.logger.debug(f\"Model data loaded from YAML file: {model_data}\")\n\n        if \"roformer\" in model_data_yaml_filepath.lower():\n            model_data[\"is_roformer\"] = True\n\n        return model_data\n\n    def load_model_data_using_hash(self, model_path):\n        \"\"\"\n        This method loads model-specific parameters from UVR model data files.\n        These parameters are critical to inferencing using a given model, as they need to match whatever was used during training.\n        The correct parameters are identified by calculating the hash of the model file and looking up the hash in the UVR data files.\n        \"\"\"\n        # Model data and configuration sources from UVR\n        model_data_url_prefix = \"https://raw.githubusercontent.com/TRvlvr/application_data/main\"\n\n        vr_model_data_url = f\"{model_data_url_prefix}/vr_model_data/model_data_new.json\"\n        mdx_model_data_url = f\"{model_data_url_prefix}/mdx_model_data/model_data_new.json\"\n\n        # Calculate hash for the downloaded model\n        self.logger.debug(\"Calculating MD5 hash for model file to identify model parameters from UVR data...\")\n        model_hash = self.get_model_hash(model_path)\n        self.logger.debug(f\"Model {model_path} has hash {model_hash}\")\n\n        # Setting up the path for model data and checking its existence\n        vr_model_data_path = os.path.join(self.model_file_dir, \"vr_model_data.json\")\n        self.logger.debug(f\"VR model data path set to {vr_model_data_path}\")\n
        self.download_file_if_not_exists(vr_model_data_url, vr_model_data_path)\n\n        mdx_model_data_path = os.path.join(self.model_file_dir, \"mdx_model_data.json\")\n        self.logger.debug(f\"MDX model data path set to {mdx_model_data_path}\")\n        self.download_file_if_not_exists(mdx_model_data_url, mdx_model_data_path)\n\n        # Loading model data from UVR\n        self.logger.debug(\"Loading MDX and VR model parameters from UVR model data files...\")\n        with open(vr_model_data_path, encoding=\"utf-8\") as vr_file:\n            vr_model_data_object = json.load(vr_file)\n        with open(mdx_model_data_path, encoding=\"utf-8\") as mdx_file:\n            mdx_model_data_object = json.load(mdx_file)\n\n        # Load additional model data from audio-separator\n        self.logger.debug(\"Loading additional model parameters from audio-separator model data file...\")\n        with resources.open_text(\"audio_separator\", \"model-data.json\") as f:\n            audio_separator_model_data = json.load(f)\n\n        # Merge the model data objects, with audio-separator data taking precedence\n        vr_model_data_object = {**vr_model_data_object, **audio_separator_model_data.get(\"vr_model_data\", {})}\n        mdx_model_data_object = {**mdx_model_data_object, **audio_separator_model_data.get(\"mdx_model_data\", {})}\n\n        if model_hash in mdx_model_data_object:\n            model_data = mdx_model_data_object[model_hash]\n        elif model_hash in vr_model_data_object:\n            model_data = vr_model_data_object[model_hash]\n        else:\n            raise ValueError(f\"Unsupported Model File: parameters for MD5 hash {model_hash} could not be found in UVR model data file for MDX or VR arch.\")\n\n        self.logger.debug(f\"Model data loaded using hash {model_hash}: {model_data}\")\n\n        return model_data\n\n    def load_model(self, model_filename=\"model_bs_roformer_ep_317_sdr_12.9755.ckpt\"):\n        \"\"\"\n        This method instantiates the architecture-specific separation class,\n        loading the separation 
model into memory, downloading it first if necessary.\n        \"\"\"\n        # If an ensemble preset was loaded and no explicit model list was provided, use preset models\n        if self._ensemble_preset_models is not None and model_filename == \"model_bs_roformer_ep_317_sdr_12.9755.ckpt\":\n            model_filename = self._ensemble_preset_models\n\n        if isinstance(model_filename, list):\n            if len(model_filename) > 1:\n                self.model_filename = list(model_filename)\n                self.model_filenames = list(model_filename)\n                self.logger.info(f\"Multiple models specified for ensembling: {self.model_filenames}\")\n                return\n            model_filename = model_filename[0]\n\n        self.model_filename = model_filename\n        self.model_filenames = [model_filename]\n\n        self.logger.info(f\"Loading model {model_filename}...\")\n\n        load_model_start_time = time.perf_counter()\n\n        # Setting up the model path\n        model_filename, model_type, model_friendly_name, model_path, yaml_config_filename = self.download_model_files(model_filename)\n        model_name = model_filename.split(\".\")[0]\n        self.logger.debug(f\"Model downloaded, friendly name: {model_friendly_name}, model_path: {model_path}\")\n\n        if model_path.lower().endswith(\".yaml\"):\n            yaml_config_filename = model_path\n\n        if yaml_config_filename is not None:\n            model_data = self.load_model_data_from_yaml(yaml_config_filename)\n        else:\n            model_data = self.load_model_data_using_hash(model_path)\n\n        common_params = {\n            \"logger\": self.logger,\n            \"log_level\": self.log_level,\n            \"torch_device\": self.torch_device,\n            \"torch_device_cpu\": self.torch_device_cpu,\n            \"torch_device_mps\": self.torch_device_mps,\n            \"onnx_execution_provider\": self.onnx_execution_provider,\n            \"model_name\": 
model_name,\n            \"model_path\": model_path,\n            \"model_data\": model_data,\n            \"output_format\": self.output_format,\n            \"output_bitrate\": self.output_bitrate,\n            \"output_dir\": self.output_dir,\n            \"normalization_threshold\": self.normalization_threshold,\n            \"amplification_threshold\": self.amplification_threshold,\n            \"output_single_stem\": self.output_single_stem,\n            \"invert_using_spec\": self.invert_using_spec,\n            \"sample_rate\": self.sample_rate,\n            \"use_soundfile\": self.use_soundfile,\n        }\n\n        # Instantiate the appropriate separator class depending on the model type\n        separator_classes = {\"MDX\": \"mdx_separator.MDXSeparator\", \"VR\": \"vr_separator.VRSeparator\", \"Demucs\": \"demucs_separator.DemucsSeparator\", \"MDXC\": \"mdxc_separator.MDXCSeparator\"}\n\n        if model_type not in self.arch_specific_params or model_type not in separator_classes:\n            # Enhanced error message for Roformer models\n            if \"roformer\" in model_filename.lower() or (model_data and model_data.get(\"is_roformer\", False)):\n                error_msg = (f\"Roformer model type not properly configured: {model_type}. \"\n                           f\"This may indicate a configuration validation failure. 
\"\n                           f\"Please check the model file and YAML configuration.\")\n                self.logger.error(error_msg)\n                raise ValueError(error_msg)\n            else:\n                raise ValueError(f\"Model type not supported (yet): {model_type}\")\n\n        if model_type == \"Demucs\" and sys.version_info < (3, 10):\n            raise Exception(\"Demucs models require Python version 3.10 or newer.\")\n\n        self.logger.debug(f\"Importing module for model type {model_type}: {separator_classes[model_type]}\")\n\n        module_name, class_name = separator_classes[model_type].split(\".\")\n        module = importlib.import_module(f\"audio_separator.separator.architectures.{module_name}\")\n        separator_class = getattr(module, class_name)\n\n        self.logger.debug(f\"Instantiating separator class for model type {model_type}: {separator_class}\")\n\n        try:\n            self.model_instance = separator_class(common_config=common_params, arch_config=self.arch_specific_params[model_type])\n        except Exception as e:\n            # Enhanced error handling for Roformer models\n            if \"roformer\" in model_filename.lower() or (model_data and model_data.get(\"is_roformer\", False)):\n                error_msg = (f\"Failed to instantiate Roformer model: {e}. 
\"\n                           f\"This may be due to missing parameters or configuration validation failures.\")\n                self.logger.error(error_msg)\n                raise RuntimeError(error_msg) from e\n            else:\n                raise\n\n        # Log Roformer implementation version if applicable\n        if hasattr(self.model_instance, 'is_roformer_model') and self.model_instance.is_roformer_model:\n            roformer_stats = self.model_instance.get_roformer_loading_stats()\n            if roformer_stats:\n                self.logger.info(f\"Roformer loading stats: {roformer_stats}\")\n\n        # Log the completion of the model load process\n        self.logger.debug(\"Loading model completed.\")\n        self.logger.info(f'Load model duration: {time.strftime(\"%H:%M:%S\", time.gmtime(int(time.perf_counter() - load_model_start_time)))}')\n\n    def separate(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Separates the audio file(s) into different stems (e.g., vocals, instruments) using the loaded model.\n\n        This method takes the path to an audio file or a directory containing audio files, processes them through\n        the loaded separation model, and returns the paths to the output files containing the separated audio stems.\n        It handles the entire flow from loading the audio, running the separation, clearing up resources, and logging the process.\n\n        Parameters:\n        - audio_file_path (str or list): The path to the audio file or directory, or a list of paths.\n        - custom_output_names (dict, optional): Custom names for the output files. 
Defaults to None.\n\n        Returns:\n        - output_files (list of str): A list containing the paths to the separated audio stem files.\n        \"\"\"\n        # Check if the model and device are properly initialized\n        if not (self.torch_device and (self.model_instance or (isinstance(self.model_filename, list) and len(self.model_filename) > 0))):\n            raise ValueError(\"Initialization failed or model not loaded. Please load a model before attempting to separate.\")\n\n        if isinstance(self.model_filename, list) and len(self.model_filename) > 1:\n            return self._separate_ensemble(audio_file_path, custom_output_names)\n\n        # If audio_file_path is a string, convert it to a list for uniform processing\n        if isinstance(audio_file_path, str):\n            audio_file_path = [audio_file_path]\n\n        # Initialize a list to store paths of all output files\n        output_files = []\n\n        # Process each path in the list\n        for path in audio_file_path:\n            if os.path.isdir(path):\n                # If the path is a directory, recursively search for all audio files\n                for root, dirs, files in os.walk(path):\n                    for file in files:\n                        # Check the file extension to ensure it's an audio file\n                        if file.endswith((\".wav\", \".flac\", \".mp3\", \".ogg\", \".opus\", \".m4a\", \".aiff\", \".ac3\")):  # Add other formats if needed\n                            full_path = os.path.join(root, file)\n                            self.logger.info(f\"Processing file: {full_path}\")\n                            try:\n                                # Perform separation for each file\n                                files_output = self._separate_file(full_path, custom_output_names)\n                                output_files.extend(files_output)\n                            except Exception as e:\n                                
self.logger.error(f\"Failed to process file {full_path}: {e}\")\n            else:\n                # If the path is a file, process it directly\n                self.logger.info(f\"Processing file: {path}\")\n                try:\n                    files_output = self._separate_file(path, custom_output_names)\n                    output_files.extend(files_output)\n                except Exception as e:\n                    self.logger.error(f\"Failed to process file {path}: {e}\")\n\n        return output_files\n\n    def _separate_file(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Internal method to handle separation for a single audio file.\n        This method performs the actual separation process for a single audio file. It logs the start and end of the process,\n        handles autocast if enabled, and ensures GPU cache is cleared after processing.\n        Parameters:\n        - audio_file_path (str): The path to the audio file.\n        - custom_output_names (dict, optional): Custom names for the output files. 
Defaults to None.\n        Returns:\n        - output_files (list of str): A list containing the paths to the separated audio stem files.\n        \"\"\"\n        # Check if chunking is enabled and file is large enough\n        if self.chunk_duration is not None:\n            import librosa\n            duration = librosa.get_duration(path=audio_file_path)\n\n            from audio_separator.separator.audio_chunking import AudioChunker\n            chunker = AudioChunker(self.chunk_duration, self.logger)\n\n            if chunker.should_chunk(duration):\n                self.logger.info(f\"File duration {duration:.1f}s exceeds chunk size {self.chunk_duration}s, using chunked processing\")\n                return self._process_with_chunking(audio_file_path, custom_output_names)\n\n        # Log the start of the separation process\n        self.logger.info(f\"Starting separation process for audio_file_path: {audio_file_path}\")\n        separate_start_time = time.perf_counter()\n\n        # Log normalization and amplification thresholds\n        self.logger.debug(f\"Normalization threshold set to {self.normalization_threshold}, waveform will be lowered to this max amplitude to avoid clipping.\")\n        self.logger.debug(f\"Amplification threshold set to {self.amplification_threshold}, waveform will be scaled up to this max amplitude if below it.\")\n\n        # Run separation method for the loaded model with autocast enabled if supported by the device\n        output_files = None\n        if self.use_autocast and autocast_mode.is_autocast_available(self.torch_device.type):\n            self.logger.debug(\"Autocast available.\")\n            with autocast_mode.autocast(self.torch_device.type):\n                output_files = self.model_instance.separate(audio_file_path, custom_output_names)\n        else:\n            self.logger.debug(\"Autocast unavailable.\")\n            output_files = self.model_instance.separate(audio_file_path, custom_output_names)\n\n        
# Clear GPU cache to free up memory\n        self.model_instance.clear_gpu_cache()\n\n        # Unset separation parameters to prevent accidentally re-using the wrong source files or output paths\n        self.model_instance.clear_file_specific_paths()\n\n        # Remind the user one more time if they used a VIP model, so the message doesn't get lost in the logs\n        self.print_uvr_vip_message()\n\n        # Log the completion of the separation process\n        self.logger.debug(\"Separation process completed.\")\n        self.logger.info(f'Separation duration: {time.strftime(\"%H:%M:%S\", time.gmtime(int(time.perf_counter() - separate_start_time)))}')\n\n        return output_files\n\n    def _process_with_chunking(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Process large file by splitting into chunks.\n\n        This method splits a large audio file into smaller chunks, processes each chunk\n        separately, and merges the results back together. This helps prevent out-of-memory\n        errors when processing very long audio files.\n\n        Parameters:\n        - audio_file_path (str): The path to the audio file.\n        - custom_output_names (dict, optional): Custom names for the output files. 
Defaults to None.\n\n        Returns:\n        - output_files (list of str): A list containing the paths to the separated audio stem files.\n        \"\"\"\n        import tempfile\n        import shutil\n        from audio_separator.separator.audio_chunking import AudioChunker\n\n        # Create temporary directory for chunks\n        temp_dir = tempfile.mkdtemp(prefix=\"audio-separator-chunks-\")\n        self.logger.debug(f\"Created temporary directory for chunks: {temp_dir}\")\n\n        try:\n            # Split audio into chunks\n            chunker = AudioChunker(self.chunk_duration, self.logger)\n            chunk_paths = chunker.split_audio(audio_file_path, temp_dir)\n\n            # Process each chunk\n            processed_chunks_by_stem = {}\n\n            for i, chunk_path in enumerate(chunk_paths):\n                self.logger.info(f\"Processing chunk {i+1}/{len(chunk_paths)}: {chunk_path}\")\n\n                original_chunk_duration = self.chunk_duration\n                original_output_dir = self.output_dir\n                self.chunk_duration = None\n                self.output_dir = temp_dir\n\n                if self.model_instance:\n                    original_model_output_dir = self.model_instance.output_dir\n                    self.model_instance.output_dir = temp_dir\n\n                try:\n                    output_files = self._separate_file(chunk_path)\n\n                    # Dynamically group chunks by stem name\n                    for stem_path in output_files:\n                        # Extract stem name from filename: \"chunk_0000_(Vocals).wav\" → \"Vocals\"\n                        filename = os.path.basename(stem_path)\n                        match = re.search(r'_\\(([^)]+)\\)', filename)\n                        if match:\n                            stem_name = match.group(1)\n                        else:\n                            # Fallback: use index-based name if pattern not found\n                            
stem_index = output_files.index(stem_path)  # position within this chunk, so the same stem gets the same fallback name in every chunk\n                            stem_name = f\"stem_{stem_index}\"\n                            self.logger.warning(f\"Could not extract stem name from {filename}, using {stem_name}\")\n\n                        if stem_name not in processed_chunks_by_stem:\n                            processed_chunks_by_stem[stem_name] = []\n\n                        # Ensure absolute path\n                        abs_path = stem_path if os.path.isabs(stem_path) else os.path.join(temp_dir, stem_path)\n                        processed_chunks_by_stem[stem_name].append(abs_path)\n\n                    if not output_files:\n                        self.logger.warning(f\"Chunk {i+1} produced no output files\")\n\n                finally:\n                    self.chunk_duration = original_chunk_duration\n                    self.output_dir = original_output_dir\n                    if self.model_instance:\n                        self.model_instance.output_dir = original_model_output_dir\n\n                # Clear GPU cache between chunks\n                if self.model_instance:\n                    self.model_instance.clear_gpu_cache()\n\n            # Merge chunks for each stem dynamically\n            base_name = os.path.splitext(os.path.basename(audio_file_path))[0]\n            output_files = []\n\n            for stem_name in sorted(processed_chunks_by_stem.keys()):\n                chunk_paths_for_stem = processed_chunks_by_stem[stem_name]\n\n                if not chunk_paths_for_stem:\n                    self.logger.warning(f\"No chunks found for stem: {stem_name}\")\n                    continue\n\n                # Determine output filename\n                if custom_output_names and stem_name in custom_output_names:\n                    output_filename = custom_output_names[stem_name]\n                else:\n                    output_filename = f\"{base_name}_({stem_name})\"\n\n
                output_path = os.path.join(self.output_dir, f\"{output_filename}.{self.output_format.lower()}\")\n\n                self.logger.info(f\"Merging {len(chunk_paths_for_stem)} chunks for stem: {stem_name}\")\n                chunker.merge_chunks(chunk_paths_for_stem, output_path)\n                output_files.append(output_path)\n\n            self.logger.info(f\"Chunked processing completed. Output files: {output_files}\")\n            return output_files\n\n        finally:\n            # Clean up temporary directory\n            if os.path.exists(temp_dir):\n                self.logger.debug(f\"Cleaning up temporary directory: {temp_dir}\")\n                shutil.rmtree(temp_dir, ignore_errors=True)\n\n    def download_model_and_data(self, model_filename):\n        \"\"\"\n        Downloads the model file without loading it into memory.\n        \"\"\"\n        self.logger.info(f\"Downloading model {model_filename}...\")\n\n        model_filename, model_type, model_friendly_name, model_path, yaml_config_filename = self.download_model_files(model_filename)\n\n        if model_path.lower().endswith(\".yaml\"):\n            yaml_config_filename = model_path\n\n        if yaml_config_filename is not None:\n            model_data = self.load_model_data_from_yaml(yaml_config_filename)\n        else:\n            model_data = self.load_model_data_using_hash(model_path)\n\n        model_data_dict_size = len(model_data)\n\n        self.logger.info(f\"Model downloaded, type: {model_type}, friendly name: {model_friendly_name}, model_path: {model_path}, model_data: {model_data_dict_size} items\")\n\n    def get_simplified_model_list(self, filter_sort_by: Optional[str] = None):\n        \"\"\"\n        Returns a simplified, user-friendly dict of models (keyed by filename) with their key metrics.\n        Optionally sorts it based on the specified criteria; when sorting by a stem name, models without that stem are omitted.\n\n        :param filter_sort_by: Criteria to sort by. 
Can be \"name\", \"filename\", or any stem name\n        \"\"\"\n        model_files = self.list_supported_model_files()\n        simplified_list = {}\n\n        for model_type, models in model_files.items():\n            for name, data in models.items():\n                filename = data[\"filename\"]\n                scores = data.get(\"scores\") or {}\n                stems = data.get(\"stems\") or []\n                target_stem = data.get(\"target_stem\")\n\n                # Format stems with their SDR scores where available\n                stems_with_scores = []\n                stem_sdr_dict = {}\n\n                # Process each stem from the model's stem list\n                for stem in stems:\n                    stem_scores = scores.get(stem, {})\n                    # Add asterisk if this is the target stem\n                    stem_display = f\"{stem}*\" if stem == target_stem else stem\n\n                    if isinstance(stem_scores, dict) and \"SDR\" in stem_scores:\n                        sdr = round(stem_scores[\"SDR\"], 1)\n                        stems_with_scores.append(f\"{stem_display} ({sdr})\")\n                        stem_sdr_dict[stem.lower()] = sdr\n                    else:\n                        # Include stem without SDR score\n                        stems_with_scores.append(stem_display)\n                        stem_sdr_dict[stem.lower()] = None\n\n                # If no stems listed, mark as Unknown\n                if not stems_with_scores:\n                    stems_with_scores = [\"Unknown\"]\n                    stem_sdr_dict[\"unknown\"] = None\n\n                simplified_list[filename] = {\"Name\": name, \"Type\": model_type, \"Stems\": stems_with_scores, \"SDR\": stem_sdr_dict}\n\n        # Sort and filter the list if a sort_by parameter is provided\n        if filter_sort_by:\n            if filter_sort_by == \"name\":\n                return dict(sorted(simplified_list.items(), key=lambda x: x[1][\"Name\"]))\n    
        elif filter_sort_by == \"filename\":\n                return dict(sorted(simplified_list.items()))\n            else:\n                # Convert sort_by to lowercase for case-insensitive comparison\n                sort_by_lower = filter_sort_by.lower()\n                # Filter out models that don't have the specified stem\n                filtered_list = {k: v for k, v in simplified_list.items() if sort_by_lower in v[\"SDR\"]}\n\n                # Sort by SDR score if available, putting None values last\n                def sort_key(item):\n                    sdr = item[1][\"SDR\"][sort_by_lower]\n                    return (0 if sdr is None else 1, sdr if sdr is not None else float(\"-inf\"))\n\n                return dict(sorted(filtered_list.items(), key=sort_key, reverse=True))\n\n        return simplified_list\n\n    def _separate_ensemble(self, audio_file_path, custom_output_names=None):\n        \"\"\"\n        Internal method to handle ensembling of multiple models.\n        \"\"\"\n        import tempfile\n        import shutil\n\n        if isinstance(audio_file_path, str):\n            audio_file_path = [audio_file_path]\n\n        output_files = []\n\n        original_model_filename = self.model_filename\n        original_model_filenames = self.model_filenames\n\n        for path in audio_file_path:\n            self.logger.info(f\"Ensemble processing for file: {path}\")\n\n            # Create temporary directory for intermediate stems\n            temp_dir = tempfile.mkdtemp(prefix=\"audio-separator-ensemble-\")\n            self.logger.debug(f\"Created temporary directory for ensemble: {temp_dir}\")\n\n            try:\n                # Store paths of intermediate stems grouped by stem name\n                # { \"Vocals\": [\"temp_dir/model1_Vocals.wav\", \"temp_dir/model2_Vocals.wav\"], ... 
}\n                stems_by_type = {}\n                original_output_dir = self.output_dir\n\n                for model_filename in original_model_filenames:\n                    self.logger.info(f\"Processing with model: {model_filename}\")\n\n                    # Load the model\n                    self.load_model(model_filename)\n\n                    # Set temporary output directory for this model\n                    self.output_dir = temp_dir\n                    if self.model_instance:\n                        self.model_instance.output_dir = temp_dir\n\n                    try:\n                        # Perform separation WITHOUT custom_output_names for intermediate files.\n                        # Intermediate stems must use the default \"base_(StemType)_model.ext\" naming\n                        # so the regex below can extract stem types for classification.\n                        # custom_output_names is applied later to the final ensembled output.\n                        model_stems = self._separate_file(path, None)\n\n                        # Extract and normalize stem names from this model's outputs\n                        model_stem_names = []\n                        for stem_path in model_stems:\n                            filename = os.path.basename(stem_path)\n                            match = re.search(r'_\\(([^)]+)\\)', filename)\n                            stem_name = match.group(1) if match else \"Unknown\"\n                            model_stem_names.append(stem_name)\n\n                        # Normalize stem names with context about how many stems this model produced\n                        num_model_stems = len(model_stem_names)\n                        has_vocal_stem = any(\n                            \"vocal\" in s.lower() or s.lower() in (\"vocals\",)\n                            for s in model_stem_names\n                        )\n\n                        for stem_path, raw_stem_name in zip(model_stems, 
model_stem_names):\n                            lower_name = raw_stem_name.lower()\n\n                            if \"vocal\" in lower_name and \"lead\" not in lower_name and \"backing\" not in lower_name:\n                                stem_name = \"Vocals\"\n                            elif lower_name == \"other\" and num_model_stems == 2 and has_vocal_stem:\n                                # For 2-stem models where one stem is vocals, \"other\" is the instrumental\n                                stem_name = \"Instrumental\"\n                                self.logger.debug(f\"Mapped 'other' → 'Instrumental' for 2-stem model (model produced: {model_stem_names})\")\n                            elif lower_name in STEM_NAME_MAP:\n                                stem_name = STEM_NAME_MAP[lower_name]\n                            else:\n                                stem_name = raw_stem_name.title()\n\n                            if stem_name not in stems_by_type:\n                                stems_by_type[stem_name] = []\n\n                            abs_path = stem_path if os.path.isabs(stem_path) else os.path.join(temp_dir, stem_path)\n                            stems_by_type[stem_name].append(abs_path)\n                    finally:\n                        self.output_dir = original_output_dir\n\n                # Perform ensembling for each stem type\n                ensembler = Ensembler(self.logger, self.ensemble_algorithm, self.ensemble_weights)\n                base_name = os.path.splitext(os.path.basename(path))[0]\n\n                for stem_name, stem_paths in stems_by_type.items():\n                    self.logger.info(f\"Ensembling {len(stem_paths)} stems for type: {stem_name}\")\n\n                    waveforms = []\n                    original_channels = None\n                    for sp in stem_paths:\n                        wav, _ = librosa.load(sp, mono=False, sr=self.sample_rate)\n                        if wav.ndim == 1:\n             
               if original_channels is None:\n                                original_channels = 1\n                            wav = np.asfortranarray([wav, wav])\n                        elif original_channels is None:\n                            original_channels = wav.shape[0]\n                        waveforms.append(wav)\n\n                    ensembled_wav = ensembler.ensemble(waveforms)\n\n                    # Restore original channel count (avoid fake stereo from mono input)\n                    if original_channels == 1 and ensembled_wav.shape[0] > 1:\n                        ensembled_wav = ensembled_wav[:1, :]\n\n                    # Determine output filename\n                    if custom_output_names and stem_name in custom_output_names:\n                        output_filename = custom_output_names[stem_name]\n                    elif self.ensemble_preset:\n                        output_filename = f\"{base_name}_({stem_name})_preset_{self.ensemble_preset}\"\n                    else:\n                        # Build descriptive name from model filenames\n                        model_slugs = []\n                        for mf in original_model_filenames:\n                            # Remove extension, then truncate to keep filenames reasonable\n                            name = os.path.splitext(mf)[0]\n                            # Remove common verbose prefixes\n                            for prefix in [\"mel_band_roformer_\", \"melband_roformer_\", \"bs_roformer_\", \"model_bs_roformer_\", \"UVR-MDX-NET-\", \"UVR_MDXNET_\"]:\n                                if name.startswith(prefix):\n                                    name = name[len(prefix):]\n                                    break\n                            model_slugs.append(name[:12])\n                        slugs_str = \"_\".join(model_slugs)\n                        output_filename = f\"{base_name}_({stem_name})_custom_ensemble_{slugs_str}\"\n\n                    output_path 
= f\"{output_filename}.{self.output_format.lower()}\"\n\n                    # Use a dummy model instance to write the audio if necessary,\n                    # or just use the last model instance we had.\n                    # Actually, we can use the write_audio method from the last model_instance\n                    if self.model_instance:\n                        # Ensure the model instance has the correct audio_file_path and output_dir\n                        self.model_instance.audio_file_path = path\n                        self.model_instance.output_dir = self.output_dir\n                        self.model_instance.write_audio(output_path, ensembled_wav.T)\n                        final_output_path = os.path.join(self.output_dir, output_path)\n                        output_files.append(final_output_path)\n                    else:\n                        # Fallback writer if no model instance is available\n                        self.logger.warning(f\"No model instance available to write ensembled audio. Using fallback writer for {output_path}\")\n                        final_output_path = os.path.join(self.output_dir, output_path)\n\n                        import soundfile as sf\n\n                        try:\n                            self.logger.debug(f\"Attempting to write ensembled audio to {final_output_path}...\")\n                            sf.write(final_output_path, ensembled_wav.T, self.sample_rate)\n                        except Exception as e:\n                            self.logger.error(f\"Error writing {self.output_format} format: {e}. 
Falling back to WAV.\")\n                            final_output_path = final_output_path.rsplit(\".\", 1)[0] + \".wav\"\n                            sf.write(final_output_path, ensembled_wav.T, self.sample_rate)\n\n                        output_files.append(final_output_path)\n\n            finally:\n                # Restore original model filenames state\n                self.model_filename = original_model_filename\n                self.model_filenames = original_model_filenames\n\n                # Clear model instance reference\n                self.model_instance = None\n\n                # Clean up temporary directory\n                if os.path.exists(temp_dir):\n                    self.logger.debug(f\"Cleaning up temporary directory: {temp_dir}\")\n                    shutil.rmtree(temp_dir, ignore_errors=True)\n\n        return output_files\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/__init__.py",
    "content": ""
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/__init__.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/__main__.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nimport json\nimport os\nimport sys\nimport time\nfrom dataclasses import dataclass, field\nfrom fractions import Fraction\n\nimport torch as th\nfrom torch import distributed, nn\nfrom torch.nn.parallel.distributed import DistributedDataParallel\n\nfrom .augment import FlipChannels, FlipSign, Remix, Shift\nfrom .compressed import StemsSet, build_musdb_metadata, get_musdb_tracks\nfrom .model import Demucs\nfrom .parser import get_name, get_parser\nfrom .raw import Rawset\nfrom .tasnet import ConvTasNet\nfrom .test import evaluate\nfrom .train import train_model, validate_model\nfrom .utils import human_seconds, load_model, save_model, sizeof_fmt\n\n\n@dataclass\nclass SavedState:\n    metrics: list = field(default_factory=list)\n    last_state: dict = None\n    best_state: dict = None\n    optimizer: dict = None\n\n\ndef main():\n    parser = get_parser()\n    args = parser.parse_args()\n    name = get_name(parser, args)\n    print(f\"Experiment {name}\")\n\n    if args.musdb is None and args.rank == 0:\n        print(\"You must provide the path to the MusDB dataset with the --musdb flag. 
\" \"To download the MusDB dataset, see https://sigsep.github.io/datasets/musdb.html.\", file=sys.stderr)\n        sys.exit(1)\n\n    eval_folder = args.evals / name\n    eval_folder.mkdir(exist_ok=True, parents=True)\n    args.logs.mkdir(exist_ok=True)\n    metrics_path = args.logs / f\"{name}.json\"\n    eval_folder.mkdir(exist_ok=True, parents=True)\n    args.checkpoints.mkdir(exist_ok=True, parents=True)\n    args.models.mkdir(exist_ok=True, parents=True)\n\n    if args.device is None:\n        device = \"cpu\"\n        if th.cuda.is_available():\n            device = \"cuda\"\n    else:\n        device = args.device\n\n    th.manual_seed(args.seed)\n    # Prevents too many threads to be started when running `museval` as it can be quite\n    # inefficient on NUMA architectures.\n    os.environ[\"OMP_NUM_THREADS\"] = \"1\"\n\n    if args.world_size > 1:\n        if device != \"cuda\" and args.rank == 0:\n            print(\"Error: distributed training is only available with cuda device\", file=sys.stderr)\n            sys.exit(1)\n        th.cuda.set_device(args.rank % th.cuda.device_count())\n        distributed.init_process_group(backend=\"nccl\", init_method=\"tcp://\" + args.master, rank=args.rank, world_size=args.world_size)\n\n    checkpoint = args.checkpoints / f\"{name}.th\"\n    checkpoint_tmp = args.checkpoints / f\"{name}.th.tmp\"\n    if args.restart and checkpoint.exists():\n        checkpoint.unlink()\n\n    if args.test:\n        args.epochs = 1\n        args.repeat = 0\n        model = load_model(args.models / args.test)\n    elif args.tasnet:\n        model = ConvTasNet(audio_channels=args.audio_channels, samplerate=args.samplerate, X=args.X)\n    else:\n        model = Demucs(\n            audio_channels=args.audio_channels,\n            channels=args.channels,\n            context=args.context,\n            depth=args.depth,\n            glu=args.glu,\n            growth=args.growth,\n            kernel_size=args.kernel_size,\n            
lstm_layers=args.lstm_layers,\n            rescale=args.rescale,\n            rewrite=args.rewrite,\n            sources=4,\n            stride=args.conv_stride,\n            upsample=args.upsample,\n            samplerate=args.samplerate,\n        )\n    model.to(device)\n    if args.show:\n        print(model)\n        size = sizeof_fmt(4 * sum(p.numel() for p in model.parameters()))\n        print(f\"Model size {size}\")\n        return\n\n    optimizer = th.optim.Adam(model.parameters(), lr=args.lr)\n\n    try:\n        saved = th.load(checkpoint, map_location=\"cpu\")\n    except IOError:\n        saved = SavedState()\n    else:\n        model.load_state_dict(saved.last_state)\n        optimizer.load_state_dict(saved.optimizer)\n\n    if args.save_model:\n        if args.rank == 0:\n            model.to(\"cpu\")\n            model.load_state_dict(saved.best_state)\n            save_model(model, args.models / f\"{name}.th\")\n        return\n\n    if args.rank == 0:\n        done = args.logs / f\"{name}.done\"\n        if done.exists():\n            done.unlink()\n\n    if args.augment:\n        augment = nn.Sequential(FlipSign(), FlipChannels(), Shift(args.data_stride), Remix(group_size=args.remix_group_size)).to(device)\n    else:\n        augment = Shift(args.data_stride)\n\n    if args.mse:\n        criterion = nn.MSELoss()\n    else:\n        criterion = nn.L1Loss()\n\n    # Setting number of samples so that all convolution windows are full.\n    # Prevents hard to debug mistake with the prediction being shifted compared\n    # to the input mixture.\n    samples = model.valid_length(args.samples)\n    print(f\"Number of training samples adjusted to {samples}\")\n\n    if args.raw:\n        train_set = Rawset(args.raw / \"train\", samples=samples + args.data_stride, channels=args.audio_channels, streams=[0, 1, 2, 3, 4], stride=args.data_stride)\n\n        valid_set = Rawset(args.raw / \"valid\", channels=args.audio_channels)\n    else:\n        if not 
args.metadata.is_file() and args.rank == 0:\n            build_musdb_metadata(args.metadata, args.musdb, args.workers)\n        if args.world_size > 1:\n            distributed.barrier()\n        metadata = json.load(open(args.metadata))\n        duration = Fraction(samples + args.data_stride, args.samplerate)\n        stride = Fraction(args.data_stride, args.samplerate)\n        train_set = StemsSet(get_musdb_tracks(args.musdb, subsets=[\"train\"], split=\"train\"), metadata, duration=duration, stride=stride, samplerate=args.samplerate, channels=args.audio_channels)\n        valid_set = StemsSet(get_musdb_tracks(args.musdb, subsets=[\"train\"], split=\"valid\"), metadata, samplerate=args.samplerate, channels=args.audio_channels)\n\n    best_loss = float(\"inf\")\n    for epoch, metrics in enumerate(saved.metrics):\n        print(f\"Epoch {epoch:03d}: \" f\"train={metrics['train']:.8f} \" f\"valid={metrics['valid']:.8f} \" f\"best={metrics['best']:.4f} \" f\"duration={human_seconds(metrics['duration'])}\")\n        best_loss = metrics[\"best\"]\n\n    if args.world_size > 1:\n        dmodel = DistributedDataParallel(model, device_ids=[th.cuda.current_device()], output_device=th.cuda.current_device())\n    else:\n        dmodel = model\n\n    for epoch in range(len(saved.metrics), args.epochs):\n        begin = time.time()\n        model.train()\n        train_loss = train_model(\n            epoch, train_set, dmodel, criterion, optimizer, augment, batch_size=args.batch_size, device=device, repeat=args.repeat, seed=args.seed, workers=args.workers, world_size=args.world_size\n        )\n        model.eval()\n        valid_loss = validate_model(epoch, valid_set, model, criterion, device=device, rank=args.rank, split=args.split_valid, world_size=args.world_size)\n\n        duration = time.time() - begin\n        if valid_loss < best_loss:\n            best_loss = valid_loss\n            saved.best_state = {key: value.to(\"cpu\").clone() for key, value in 
model.state_dict().items()}\n        saved.metrics.append({\"train\": train_loss, \"valid\": valid_loss, \"best\": best_loss, \"duration\": duration})\n        if args.rank == 0:\n            json.dump(saved.metrics, open(metrics_path, \"w\"))\n\n        saved.last_state = model.state_dict()\n        saved.optimizer = optimizer.state_dict()\n        if args.rank == 0 and not args.test:\n            th.save(saved, checkpoint_tmp)\n            checkpoint_tmp.rename(checkpoint)\n\n        print(f\"Epoch {epoch:03d}: \" f\"train={train_loss:.8f} valid={valid_loss:.8f} best={best_loss:.4f} \" f\"duration={human_seconds(duration)}\")\n\n    del dmodel\n    model.load_state_dict(saved.best_state)\n    if args.eval_cpu:\n        device = \"cpu\"\n        model.to(device)\n    model.eval()\n    evaluate(model, args.musdb, eval_folder, rank=args.rank, world_size=args.world_size, device=device, save=args.save, split=args.split_valid, shifts=args.shifts, workers=args.eval_workers)\n    model.to(\"cpu\")\n    save_model(model, args.models / f\"{name}.th\")\n    if args.rank == 0:\n        print(\"done\")\n        done.write_text(\"done\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/apply.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"\nCode to apply a model to a mix. It will handle chunking with overlaps and\ninteprolation between chunks, as well as the \"shift trick\".\n\"\"\"\nfrom concurrent.futures import ThreadPoolExecutor\nimport random\nimport typing as tp\n\nimport torch as th\nfrom torch import nn\nfrom torch.nn import functional as F\nimport tqdm\n\nfrom .demucs import Demucs\nfrom .hdemucs import HDemucs\nfrom .utils import center_trim, DummyPoolExecutor\n\nModel = tp.Union[Demucs, HDemucs]\n\nprogress_bar_num = 0\n\n\nclass BagOfModels(nn.Module):\n    def __init__(self, models: tp.List[Model], weights: tp.Optional[tp.List[tp.List[float]]] = None, segment: tp.Optional[float] = None):\n        \"\"\"\n        Represents a bag of models with specific weights.\n        You should call `apply_model` rather than calling directly the forward here for\n        optimal performance.\n\n        Args:\n            models (list[nn.Module]): list of Demucs/HDemucs models.\n            weights (list[list[float]]): list of weights. 
If None, assumed to\n                be all ones, otherwise it should be a list of N list (N number of models),\n                each containing S floats (S number of sources).\n            segment (None or float): overrides the `segment` attribute of each model\n                (this is performed inplace, be careful if you reuse the models passed).\n        \"\"\"\n\n        super().__init__()\n        assert len(models) > 0\n        first = models[0]\n        for other in models:\n            assert other.sources == first.sources\n            assert other.samplerate == first.samplerate\n            assert other.audio_channels == first.audio_channels\n            if segment is not None:\n                other.segment = segment\n\n        self.audio_channels = first.audio_channels\n        self.samplerate = first.samplerate\n        self.sources = first.sources\n        self.models = nn.ModuleList(models)\n\n        if weights is None:\n            weights = [[1.0 for _ in first.sources] for _ in models]\n        else:\n            assert len(weights) == len(models)\n            for weight in weights:\n                assert len(weight) == len(first.sources)\n        self.weights = weights\n\n    def forward(self, x):\n        raise NotImplementedError(\"Call `apply_model` on this.\")\n\n\nclass TensorChunk:\n    def __init__(self, tensor, offset=0, length=None):\n        total_length = tensor.shape[-1]\n        assert offset >= 0\n        assert offset < total_length\n\n        if length is None:\n            length = total_length - offset\n        else:\n            length = min(total_length - offset, length)\n\n        if isinstance(tensor, TensorChunk):\n            self.tensor = tensor.tensor\n            self.offset = offset + tensor.offset\n        else:\n            self.tensor = tensor\n            self.offset = offset\n        self.length = length\n        self.device = tensor.device\n\n    @property\n    def shape(self):\n        shape = 
list(self.tensor.shape)\n        shape[-1] = self.length\n        return shape\n\n    def padded(self, target_length):\n        delta = target_length - self.length\n        total_length = self.tensor.shape[-1]\n        assert delta >= 0\n\n        start = self.offset - delta // 2\n        end = start + target_length\n\n        correct_start = max(0, start)\n        correct_end = min(total_length, end)\n\n        pad_left = correct_start - start\n        pad_right = end - correct_end\n\n        out = F.pad(self.tensor[..., correct_start:correct_end], (pad_left, pad_right))\n        assert out.shape[-1] == target_length\n        return out\n\n\ndef tensor_chunk(tensor_or_chunk):\n    if isinstance(tensor_or_chunk, TensorChunk):\n        return tensor_or_chunk\n    else:\n        assert isinstance(tensor_or_chunk, th.Tensor)\n        return TensorChunk(tensor_or_chunk)\n\n\ndef apply_model(model, mix, shifts=1, split=True, overlap=0.25, transition_power=1.0, static_shifts=1, set_progress_bar=None, device=None, progress=False, num_workers=0, pool=None):\n    \"\"\"\n    Apply model to a given mixture.\n\n    Args:\n        shifts (int): if > 0, will shift in time `mix` by a random amount between 0 and 0.5 sec\n            and apply the oppositve shift to the output. This is repeated `shifts` time and\n            all predictions are averaged. 
This effectively makes the model time equivariant\n            and improves SDR by up to 0.2 points.\n        split (bool): if True, the input will be broken down in 8 seconds extracts\n            and predictions will be performed individually on each and concatenated.\n            Useful for model with large memory footprint like Tasnet.\n        progress (bool): if True, show a progress bar (requires split=True)\n        device (torch.device, str, or None): if provided, device on which to\n            execute the computation, otherwise `mix.device` is assumed.\n            When `device` is different from `mix.device`, only local computations will\n            be on `device`, while the entire tracks will be stored on `mix.device`.\n    \"\"\"\n\n    global fut_length\n    global bag_num\n    global prog_bar\n\n    if device is None:\n        device = mix.device\n    else:\n        device = th.device(device)\n    if pool is None:\n        if num_workers > 0 and device.type == \"cpu\":\n            pool = ThreadPoolExecutor(num_workers)\n        else:\n            pool = DummyPoolExecutor()\n\n    kwargs = {\n        \"shifts\": shifts,\n        \"split\": split,\n        \"overlap\": overlap,\n        \"transition_power\": transition_power,\n        \"progress\": progress,\n        \"device\": device,\n        \"pool\": pool,\n        \"set_progress_bar\": set_progress_bar,\n        \"static_shifts\": static_shifts,\n    }\n\n    if isinstance(model, BagOfModels):\n        # Special treatment for bag of model.\n        # We explicitely apply multiple times `apply_model` so that the random shifts\n        # are different for each model.\n\n        estimates = 0\n        totals = [0] * len(model.sources)\n        bag_num = len(model.models)\n        fut_length = 0\n        prog_bar = 0\n        current_model = 0  # (bag_num + 1)\n        for sub_model, weight in zip(model.models, model.weights):\n            original_model_device = 
next(iter(sub_model.parameters())).device\n            sub_model.to(device)\n            fut_length += fut_length\n            current_model += 1\n            out = apply_model(sub_model, mix, **kwargs)\n            sub_model.to(original_model_device)\n            for k, inst_weight in enumerate(weight):\n                out[:, k, :, :] *= inst_weight\n                totals[k] += inst_weight\n            estimates += out\n            del out\n\n        for k in range(estimates.shape[1]):\n            estimates[:, k, :, :] /= totals[k]\n        return estimates\n\n    model.to(device)\n    model.eval()\n    assert transition_power >= 1, \"transition_power < 1 leads to weird behavior.\"\n    batch, channels, length = mix.shape\n\n    if shifts:\n        kwargs[\"shifts\"] = 0\n        max_shift = int(0.5 * model.samplerate)\n        mix = tensor_chunk(mix)\n        padded_mix = mix.padded(length + 2 * max_shift)\n        out = 0\n        for _ in range(shifts):\n            offset = random.randint(0, max_shift)\n            shifted = TensorChunk(padded_mix, offset, length + max_shift - offset)\n            shifted_out = apply_model(model, shifted, **kwargs)\n            out += shifted_out[..., max_shift - offset :]\n        out /= shifts\n        return out\n    elif split:\n        kwargs[\"split\"] = False\n        out = th.zeros(batch, len(model.sources), channels, length, device=mix.device)\n        sum_weight = th.zeros(length, device=mix.device)\n        segment = int(model.samplerate * model.segment)\n        stride = int((1 - overlap) * segment)\n        offsets = range(0, length, stride)\n        scale = float(format(stride / model.samplerate, \".2f\"))\n        # We start from a triangle shaped weight, with maximal weight in the middle\n        # of the segment. 
Then we normalize and take to the power `transition_power`.\n        # Large values of transition power will lead to sharper transitions.\n        weight = th.cat([th.arange(1, segment // 2 + 1, device=device), th.arange(segment - segment // 2, 0, -1, device=device)])\n        assert len(weight) == segment\n        # If the overlap < 50%, this will translate to linear transition when\n        # transition_power is 1.\n        weight = (weight / weight.max()) ** transition_power\n        futures = []\n        for offset in offsets:\n            chunk = TensorChunk(mix, offset, segment)\n            future = pool.submit(apply_model, model, chunk, **kwargs)\n            futures.append((future, offset))\n            offset += segment\n        if progress:\n            futures = tqdm.tqdm(futures)\n        for future, offset in futures:\n            if set_progress_bar:\n                fut_length = len(futures) * bag_num * static_shifts\n                prog_bar += 1\n                set_progress_bar(0.1, (0.8 / fut_length * prog_bar))\n            chunk_out = future.result()\n            chunk_length = chunk_out.shape[-1]\n            out[..., offset : offset + segment] += (weight[:chunk_length] * chunk_out).to(mix.device)\n            sum_weight[offset : offset + segment] += weight[:chunk_length].to(mix.device)\n        assert sum_weight.min() > 0\n        out /= sum_weight\n        return out\n    else:\n        if hasattr(model, \"valid_length\"):\n            valid_length = model.valid_length(length)\n        else:\n            valid_length = length\n        mix = tensor_chunk(mix)\n        padded_mix = mix.padded(valid_length).to(device)\n        with th.no_grad():\n            out = model(padded_mix)\n        return center_trim(out, length)\n\n\ndef demucs_segments(demucs_segment, demucs_model):\n\n    if demucs_segment == \"Default\":\n        segment = None\n        if isinstance(demucs_model, BagOfModels):\n            if segment is not None:\n                
for sub in demucs_model.models:\n                    sub.segment = segment\n        else:\n            if segment is not None:\n                demucs_model.segment = segment\n    else:\n        try:\n            segment = int(demucs_segment)\n        except (TypeError, ValueError):\n            # A non-numeric segment value leaves the model's own segment unchanged\n            segment = None\n        if segment is not None:\n            if isinstance(demucs_model, BagOfModels):\n                for sub in demucs_model.models:\n                    sub.segment = segment\n            else:\n                demucs_model.segment = segment\n\n    return demucs_model\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/demucs.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nimport math\nimport typing as tp\n\nimport julius\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom .states import capture_init\nfrom .utils import center_trim, unfold\n\n\nclass BLSTM(nn.Module):\n    \"\"\"\n    BiLSTM with same hidden units as input dim.\n    If `max_steps` is not None, input will be splitting in overlapping\n    chunks and the LSTM applied separately on each chunk.\n    \"\"\"\n\n    def __init__(self, dim, layers=1, max_steps=None, skip=False):\n        super().__init__()\n        assert max_steps is None or max_steps % 4 == 0\n        self.max_steps = max_steps\n        self.lstm = nn.LSTM(bidirectional=True, num_layers=layers, hidden_size=dim, input_size=dim)\n        self.linear = nn.Linear(2 * dim, dim)\n        self.skip = skip\n\n    def forward(self, x):\n        B, C, T = x.shape\n        y = x\n        framed = False\n        if self.max_steps is not None and T > self.max_steps:\n            width = self.max_steps\n            stride = width // 2\n            frames = unfold(x, width, stride)\n            nframes = frames.shape[2]\n            framed = True\n            x = frames.permute(0, 2, 1, 3).reshape(-1, C, width)\n\n        x = x.permute(2, 0, 1)\n\n        x = self.lstm(x)[0]\n        x = self.linear(x)\n        x = x.permute(1, 2, 0)\n        if framed:\n            out = []\n            frames = x.reshape(B, -1, C, width)\n            limit = stride // 2\n            for k in range(nframes):\n                if k == 0:\n                    out.append(frames[:, k, :, :-limit])\n                elif k == nframes - 1:\n                    out.append(frames[:, k, :, limit:])\n                else:\n                    out.append(frames[:, k, :, limit:-limit])\n            out = 
torch.cat(out, -1)\n            out = out[..., :T]\n            x = out\n        if self.skip:\n            x = x + y\n        return x\n\n\ndef rescale_conv(conv, reference):\n    \"\"\"Rescale initial weight scale. It is unclear why it helps but it certainly does.\"\"\"\n    std = conv.weight.std().detach()\n    scale = (std / reference) ** 0.5\n    conv.weight.data /= scale\n    if conv.bias is not None:\n        conv.bias.data /= scale\n\n\ndef rescale_module(module, reference):\n    for sub in module.modules():\n        if isinstance(sub, (nn.Conv1d, nn.ConvTranspose1d, nn.Conv2d, nn.ConvTranspose2d)):\n            rescale_conv(sub, reference)\n\n\nclass LayerScale(nn.Module):\n    \"\"\"Layer scale from [Touvron et al 2021] (https://arxiv.org/pdf/2103.17239.pdf).\n    This rescales diagonaly residual outputs close to 0 initially, then learnt.\n    \"\"\"\n\n    def __init__(self, channels: int, init: float = 0):\n        super().__init__()\n        self.scale = nn.Parameter(torch.zeros(channels, requires_grad=True))\n        self.scale.data[:] = init\n\n    def forward(self, x):\n        return self.scale[:, None] * x\n\n\nclass DConv(nn.Module):\n    \"\"\"\n    New residual branches in each encoder layer.\n    This alternates dilated convolutions, potentially with LSTMs and attention.\n    Also before entering each residual branch, dimension is projected on a smaller subspace,\n    e.g. of dim `channels // compress`.\n    \"\"\"\n\n    def __init__(self, channels: int, compress: float = 4, depth: int = 2, init: float = 1e-4, norm=True, attn=False, heads=4, ndecay=4, lstm=False, gelu=True, kernel=3, dilate=True):\n        \"\"\"\n        Args:\n            channels: input/output channels for residual branch.\n            compress: amount of channel compression inside the branch.\n            depth: number of layers in the residual branch. 
Each layer has its own\n                projection, and potentially LSTM and attention.\n            init: initial scale for LayerNorm.\n            norm: use GroupNorm.\n            attn: use LocalAttention.\n            heads: number of heads for the LocalAttention.\n            ndecay: number of decay controls in the LocalAttention.\n            lstm: use LSTM.\n            gelu: Use GELU activation.\n            kernel: kernel size for the (dilated) convolutions.\n            dilate: if true, use dilation, increasing with the depth.\n        \"\"\"\n\n        super().__init__()\n        assert kernel % 2 == 1\n        self.channels = channels\n        self.compress = compress\n        self.depth = abs(depth)\n        dilate = depth > 0\n\n        norm_fn: tp.Callable[[int], nn.Module]\n        norm_fn = lambda d: nn.Identity()  # noqa\n        if norm:\n            norm_fn = lambda d: nn.GroupNorm(1, d)  # noqa\n\n        hidden = int(channels / compress)\n\n        act: tp.Type[nn.Module]\n        if gelu:\n            act = nn.GELU\n        else:\n            act = nn.ReLU\n\n        self.layers = nn.ModuleList([])\n        for d in range(self.depth):\n            dilation = 2**d if dilate else 1\n            padding = dilation * (kernel // 2)\n            mods = [\n                nn.Conv1d(channels, hidden, kernel, dilation=dilation, padding=padding),\n                norm_fn(hidden),\n                act(),\n                nn.Conv1d(hidden, 2 * channels, 1),\n                norm_fn(2 * channels),\n                nn.GLU(1),\n                LayerScale(channels, init),\n            ]\n            if attn:\n                mods.insert(3, LocalState(hidden, heads=heads, ndecay=ndecay))\n            if lstm:\n                mods.insert(3, BLSTM(hidden, layers=2, max_steps=200, skip=True))\n            layer = nn.Sequential(*mods)\n            self.layers.append(layer)\n\n    def forward(self, x):\n        for layer in self.layers:\n            x = x + 
layer(x)\n        return x\n\n\nclass LocalState(nn.Module):\n    \"\"\"Local state allows to have attention based only on data (no positional embedding),\n    but while setting a constraint on the time window (e.g. decaying penalty term).\n\n    Also a failed experiments with trying to provide some frequency based attention.\n    \"\"\"\n\n    def __init__(self, channels: int, heads: int = 4, nfreqs: int = 0, ndecay: int = 4):\n        super().__init__()\n        assert channels % heads == 0, (channels, heads)\n        self.heads = heads\n        self.nfreqs = nfreqs\n        self.ndecay = ndecay\n        self.content = nn.Conv1d(channels, channels, 1)\n        self.query = nn.Conv1d(channels, channels, 1)\n        self.key = nn.Conv1d(channels, channels, 1)\n        if nfreqs:\n            self.query_freqs = nn.Conv1d(channels, heads * nfreqs, 1)\n        if ndecay:\n            self.query_decay = nn.Conv1d(channels, heads * ndecay, 1)\n            # Initialize decay close to zero (there is a sigmoid), for maximum initial window.\n            self.query_decay.weight.data *= 0.01\n            assert self.query_decay.bias is not None  # stupid type checker\n            self.query_decay.bias.data[:] = -2\n        self.proj = nn.Conv1d(channels + heads * nfreqs, channels, 1)\n\n    def forward(self, x):\n        B, C, T = x.shape\n        heads = self.heads\n        indexes = torch.arange(T, device=x.device, dtype=x.dtype)\n        # left index are keys, right index are queries\n        delta = indexes[:, None] - indexes[None, :]\n\n        queries = self.query(x).view(B, heads, -1, T)\n        keys = self.key(x).view(B, heads, -1, T)\n        # t are keys, s are queries\n        dots = torch.einsum(\"bhct,bhcs->bhts\", keys, queries)\n        dots /= keys.shape[2] ** 0.5\n        if self.nfreqs:\n            periods = torch.arange(1, self.nfreqs + 1, device=x.device, dtype=x.dtype)\n            freq_kernel = torch.cos(2 * math.pi * delta / periods.view(-1, 1, 1))\n  
          freq_q = self.query_freqs(x).view(B, heads, -1, T) / self.nfreqs**0.5\n            dots += torch.einsum(\"fts,bhfs->bhts\", freq_kernel, freq_q)\n        if self.ndecay:\n            decays = torch.arange(1, self.ndecay + 1, device=x.device, dtype=x.dtype)\n            decay_q = self.query_decay(x).view(B, heads, -1, T)\n            decay_q = torch.sigmoid(decay_q) / 2\n            decay_kernel = -decays.view(-1, 1, 1) * delta.abs() / self.ndecay**0.5\n            dots += torch.einsum(\"fts,bhfs->bhts\", decay_kernel, decay_q)\n\n        # Kill self reference.\n        dots.masked_fill_(torch.eye(T, device=dots.device, dtype=torch.bool), -100)\n        weights = torch.softmax(dots, dim=2)\n\n        content = self.content(x).view(B, heads, -1, T)\n        result = torch.einsum(\"bhts,bhct->bhcs\", weights, content)\n        if self.nfreqs:\n            time_sig = torch.einsum(\"bhts,fts->bhfs\", weights, freq_kernel)\n            result = torch.cat([result, time_sig], 2)\n        result = result.reshape(B, -1, T)\n        return x + self.proj(result)\n\n\nclass Demucs(nn.Module):\n    @capture_init\n    def __init__(\n        self,\n        sources,\n        # Channels\n        audio_channels=2,\n        channels=64,\n        growth=2.0,\n        # Main structure\n        depth=6,\n        rewrite=True,\n        lstm_layers=0,\n        # Convolutions\n        kernel_size=8,\n        stride=4,\n        context=1,\n        # Activations\n        gelu=True,\n        glu=True,\n        # Normalization\n        norm_starts=4,\n        norm_groups=4,\n        # DConv residual branch\n        dconv_mode=1,\n        dconv_depth=2,\n        dconv_comp=4,\n        dconv_attn=4,\n        dconv_lstm=4,\n        dconv_init=1e-4,\n        # Pre/post processing\n        normalize=True,\n        resample=True,\n        # Weight init\n        rescale=0.1,\n        # Metadata\n        samplerate=44100,\n        segment=4 * 10,\n    ):\n        \"\"\"\n        Args:\n       
     sources (list[str]): list of source names\n            audio_channels (int): stereo or mono\n            channels (int): first convolution channels\n            depth (int): number of encoder/decoder layers\n            growth (float): multiply (resp divide) number of channels by that\n                for each layer of the encoder (resp decoder)\n            depth (int): number of layers in the encoder and in the decoder.\n            rewrite (bool): add 1x1 convolution to each layer.\n            lstm_layers (int): number of lstm layers, 0 = no lstm. Deactivated\n                by default, as this is now replaced by the smaller and faster small LSTMs\n                in the DConv branches.\n            kernel_size (int): kernel size for convolutions\n            stride (int): stride for convolutions\n            context (int): kernel size of the convolution in the\n                decoder before the transposed convolution. If > 1,\n                will provide some context from neighboring time steps.\n            gelu: use GELU activation function.\n            glu (bool): use glu instead of ReLU for the 1x1 rewrite conv.\n            norm_starts: layer at which group norm starts being used.\n                decoder layers are numbered in reverse order.\n            norm_groups: number of groups for group norm.\n            dconv_mode: if 1: dconv in encoder only, 2: decoder only, 3: both.\n            dconv_depth: depth of residual DConv branch.\n            dconv_comp: compression of DConv branch.\n            dconv_attn: adds attention layers in DConv branch starting at this layer.\n            dconv_lstm: adds a LSTM layer in DConv branch starting at this layer.\n            dconv_init: initial scale for the DConv branch LayerScale.\n            normalize (bool): normalizes the input audio on the fly, and scales back\n                the output by the same amount.\n            resample (bool): upsample x2 the input and downsample /2 the output.\n        
    rescale (int): rescale initial weights of convolutions\n                to get their standard deviation closer to `rescale`.\n            samplerate (int): stored as meta information for easing\n                future evaluations of the model.\n            segment (float): duration of the chunks of audio to ideally evaluate the model on.\n                This is used by `demucs.apply.apply_model`.\n        \"\"\"\n\n        super().__init__()\n        self.audio_channels = audio_channels\n        self.sources = sources\n        self.kernel_size = kernel_size\n        self.context = context\n        self.stride = stride\n        self.depth = depth\n        self.resample = resample\n        self.channels = channels\n        self.normalize = normalize\n        self.samplerate = samplerate\n        self.segment = segment\n        self.encoder = nn.ModuleList()\n        self.decoder = nn.ModuleList()\n        self.skip_scales = nn.ModuleList()\n\n        if glu:\n            activation = nn.GLU(dim=1)\n            ch_scale = 2\n        else:\n            activation = nn.ReLU()\n            ch_scale = 1\n        if gelu:\n            act2 = nn.GELU\n        else:\n            act2 = nn.ReLU\n\n        in_channels = audio_channels\n        padding = 0\n        for index in range(depth):\n            norm_fn = lambda d: nn.Identity()  # noqa\n            if index >= norm_starts:\n                norm_fn = lambda d: nn.GroupNorm(norm_groups, d)  # noqa\n\n            encode = []\n            encode += [nn.Conv1d(in_channels, channels, kernel_size, stride), norm_fn(channels), act2()]\n            attn = index >= dconv_attn\n            lstm = index >= dconv_lstm\n            if dconv_mode & 1:\n                encode += [DConv(channels, depth=dconv_depth, init=dconv_init, compress=dconv_comp, attn=attn, lstm=lstm)]\n            if rewrite:\n                encode += [nn.Conv1d(channels, ch_scale * channels, 1), norm_fn(ch_scale * channels), activation]\n            
self.encoder.append(nn.Sequential(*encode))\n\n            decode = []\n            if index > 0:\n                out_channels = in_channels\n            else:\n                out_channels = len(self.sources) * audio_channels\n            if rewrite:\n                decode += [nn.Conv1d(channels, ch_scale * channels, 2 * context + 1, padding=context), norm_fn(ch_scale * channels), activation]\n            if dconv_mode & 2:\n                decode += [DConv(channels, depth=dconv_depth, init=dconv_init, compress=dconv_comp, attn=attn, lstm=lstm)]\n            decode += [nn.ConvTranspose1d(channels, out_channels, kernel_size, stride, padding=padding)]\n            if index > 0:\n                decode += [norm_fn(out_channels), act2()]\n            self.decoder.insert(0, nn.Sequential(*decode))\n            in_channels = channels\n            channels = int(growth * channels)\n\n        channels = in_channels\n        if lstm_layers:\n            self.lstm = BLSTM(channels, lstm_layers)\n        else:\n            self.lstm = None\n\n        if rescale:\n            rescale_module(self, reference=rescale)\n\n    def valid_length(self, length):\n        \"\"\"\n        Return the nearest valid length to use with the model so that\n        there are no time steps left over in a convolution, e.g. 
for all\n        layers, (size of the input - kernel_size) % stride = 0.\n\n        Note that inputs are automatically padded if necessary to ensure that the output\n        has the same length as the input.\n        \"\"\"\n        if self.resample:\n            length *= 2\n\n        for _ in range(self.depth):\n            length = math.ceil((length - self.kernel_size) / self.stride) + 1\n            length = max(1, length)\n\n        for idx in range(self.depth):\n            length = (length - 1) * self.stride + self.kernel_size\n\n        if self.resample:\n            length = math.ceil(length / 2)\n        return int(length)\n\n    def forward(self, mix):\n        x = mix\n        length = x.shape[-1]\n\n        if self.normalize:\n            mono = mix.mean(dim=1, keepdim=True)\n            mean = mono.mean(dim=-1, keepdim=True)\n            std = mono.std(dim=-1, keepdim=True)\n            x = (x - mean) / (1e-5 + std)\n        else:\n            mean = 0\n            std = 1\n\n        delta = self.valid_length(length) - length\n        x = F.pad(x, (delta // 2, delta - delta // 2))\n\n        if self.resample:\n            x = julius.resample_frac(x, 1, 2)\n\n        saved = []\n        for encode in self.encoder:\n            x = encode(x)\n            saved.append(x)\n\n        if self.lstm:\n            x = self.lstm(x)\n\n        for decode in self.decoder:\n            skip = saved.pop(-1)\n            skip = center_trim(skip, x)\n            x = decode(x + skip)\n\n        if self.resample:\n            x = julius.resample_frac(x, 2, 1)\n        x = x * std + mean\n        x = center_trim(x, length)\n        x = x.view(x.size(0), len(self.sources), self.audio_channels, x.size(-1))\n        return x\n\n    def load_state_dict(self, state, strict=True):\n        # fix a mismatch with previous generation Demucs models.\n        for idx in range(self.depth):\n            for a in [\"encoder\", \"decoder\"]:\n                for b in [\"bias\", 
\"weight\"]:\n                    new = f\"{a}.{idx}.3.{b}\"\n                    old = f\"{a}.{idx}.2.{b}\"\n                    if old in state and new not in state:\n                        state[new] = state.pop(old)\n        super().load_state_dict(state, strict=strict)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/filtering.py",
    "content": "from typing import Optional\nimport torch\nimport torch.nn as nn\nfrom torch import Tensor\nfrom torch.utils.data import DataLoader\n\n\ndef atan2(y, x):\n    r\"\"\"Element-wise arctangent function of y/x.\n    Returns a new tensor with signed angles in radians.\n    It is an alternative implementation of torch.atan2\n\n    Args:\n        y (Tensor): First input tensor\n        x (Tensor): Second input tensor [shape=y.shape]\n\n    Returns:\n        Tensor: [shape=y.shape].\n    \"\"\"\n    pi = 2 * torch.asin(torch.tensor(1.0))\n    x += ((x == 0) & (y == 0)) * 1.0\n    out = torch.atan(y / x)\n    out += ((y >= 0) & (x < 0)) * pi\n    out -= ((y < 0) & (x < 0)) * pi\n    out *= 1 - ((y > 0) & (x == 0)) * 1.0\n    out += ((y > 0) & (x == 0)) * (pi / 2)\n    out *= 1 - ((y < 0) & (x == 0)) * 1.0\n    out += ((y < 0) & (x == 0)) * (-pi / 2)\n    return out\n\n\n# Define basic complex operations on torch.Tensor objects whose last dimension\n# consists in the concatenation of the real and imaginary parts.\n\n\ndef _norm(x: torch.Tensor) -> torch.Tensor:\n    r\"\"\"Computes the norm value of a torch Tensor, assuming that it\n    comes as real and imaginary part in its last dimension.\n\n    Args:\n        x (Tensor): Input Tensor of shape [shape=(..., 2)]\n\n    Returns:\n        Tensor: shape as x excluding the last dimension.\n    \"\"\"\n    return torch.abs(x[..., 0]) ** 2 + torch.abs(x[..., 1]) ** 2\n\n\ndef _mul_add(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:\n    \"\"\"Element-wise multiplication of two complex Tensors described\n    through their real and imaginary parts.\n    The result is added to the `out` tensor\"\"\"\n\n    # check `out` and allocate it if needed\n    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])\n    if out is None or out.shape != target_shape:\n        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)\n    if out is a:\n        
real_a = a[..., 0]\n        out[..., 0] = out[..., 0] + (real_a * b[..., 0] - a[..., 1] * b[..., 1])\n        out[..., 1] = out[..., 1] + (real_a * b[..., 1] + a[..., 1] * b[..., 0])\n    else:\n        out[..., 0] = out[..., 0] + (a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1])\n        out[..., 1] = out[..., 1] + (a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0])\n    return out\n\n\ndef _mul(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:\n    \"\"\"Element-wise multiplication of two complex Tensors described\n    through their real and imaginary parts\n    can work in place in case out is a only\"\"\"\n    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])\n    if out is None or out.shape != target_shape:\n        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)\n    if out is a:\n        real_a = a[..., 0]\n        out[..., 0] = real_a * b[..., 0] - a[..., 1] * b[..., 1]\n        out[..., 1] = real_a * b[..., 1] + a[..., 1] * b[..., 0]\n    else:\n        out[..., 0] = a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1]\n        out[..., 1] = a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0]\n    return out\n\n\ndef _inv(z: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:\n    \"\"\"Element-wise multiplicative inverse of a Tensor with complex\n    entries described through their real and imaginary parts.\n    can work in place in case out is z\"\"\"\n    ez = _norm(z)\n    if out is None or out.shape != z.shape:\n        out = torch.zeros_like(z)\n    out[..., 0] = z[..., 0] / ez\n    out[..., 1] = -z[..., 1] / ez\n    return out\n\n\ndef _conj(z, out: Optional[torch.Tensor] = None) -> torch.Tensor:\n    \"\"\"Element-wise complex conjugate of a Tensor with complex entries\n    described through their real and imaginary parts.\n    can work in place in case out is z\"\"\"\n    if out is None or out.shape != z.shape:\n        out = torch.zeros_like(z)\n    out[..., 0] = z[..., 
0]\n    out[..., 1] = -z[..., 1]\n    return out\n\n\ndef _invert(M: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:\n    \"\"\"\n    Invert 1x1 or 2x2 matrices.\n\n    Will generate errors if the matrices are singular: the caller must handle this\n    through their own regularization schemes.\n\n    Args:\n        M (Tensor): [shape=(..., nb_channels, nb_channels, 2)]\n            matrices to invert: must be square along dimensions -3 and -2\n\n    Returns:\n        invM (Tensor): [shape=M.shape]\n            inverses of M\n    \"\"\"\n    nb_channels = M.shape[-2]\n\n    if out is None or out.shape != M.shape:\n        out = torch.empty_like(M)\n\n    if nb_channels == 1:\n        # scalar case\n        out = _inv(M, out)\n    elif nb_channels == 2:\n        # two channels case: analytical expression\n\n        # first compute the determinant\n        det = _mul(M[..., 0, 0, :], M[..., 1, 1, :])\n        det = det - _mul(M[..., 0, 1, :], M[..., 1, 0, :])\n        # invert it\n        invDet = _inv(det)\n\n        # then fill out the matrix with the inverse\n        out[..., 0, 0, :] = _mul(invDet, M[..., 1, 1, :], out[..., 0, 0, :])\n        out[..., 1, 0, :] = _mul(-invDet, M[..., 1, 0, :], out[..., 1, 0, :])\n        out[..., 0, 1, :] = _mul(-invDet, M[..., 0, 1, :], out[..., 0, 1, :])\n        out[..., 1, 1, :] = _mul(invDet, M[..., 0, 0, :], out[..., 1, 1, :])\n    else:\n        raise Exception(\"Only 1 or 2 channels are supported for the torch version.\")\n    return out\n\n\n# Now define the signal-processing low-level functions used by the Separator\n\n\ndef expectation_maximization(y: torch.Tensor, x: torch.Tensor, iterations: int = 2, eps: float = 1e-10, batch_size: int = 200):\n    r\"\"\"Expectation maximization algorithm, for refining source separation\n    estimates.\n\n    This algorithm improves source separation results by\n    enforcing multichannel consistency for the estimates. 
This usually means\n    better perceptual quality in terms of spatial artifacts.\n\n    The implementation follows the details presented in [1]_, taking\n    inspiration from the original EM algorithm proposed in [2]_ and its\n    weighted refinement proposed in [3]_, [4]_.\n    It works by iteratively:\n\n     * Re-estimating source parameters (power spectral densities and spatial\n       covariance matrices) through :func:`get_local_gaussian_model`.\n\n     * Separating the mixture again with the new parameters by first computing\n       the new modelled mixture covariance matrices with :func:`get_mix_model`,\n       preparing the Wiener filters through :func:`wiener_gain` and applying them\n       with :func:`apply_filter`.\n\n    References\n    ----------\n    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and\n        N. Takahashi and Y. Mitsufuji, \"Improving music source separation based\n        on deep neural networks through data augmentation and network\n        blending.\" 2017 IEEE International Conference on Acoustics, Speech\n        and Signal Processing (ICASSP). IEEE, 2017.\n\n    .. [2] N.Q. Duong and E. Vincent and R. Gribonval. \"Under-determined\n        reverberant audio source separation using a full-rank spatial\n        covariance model.\" IEEE Transactions on Audio, Speech, and Language\n        Processing 18.7 (2010): 1830-1840.\n\n    .. [3] A. Nugraha and A. Liutkus and E. Vincent. \"Multichannel audio source\n        separation with deep neural networks.\" IEEE/ACM Transactions on Audio,\n        Speech, and Language Processing 24.9 (2016): 1652-1664.\n\n    .. [4] A. Nugraha and A. Liutkus and E. Vincent. \"Multichannel music\n        separation with deep neural networks.\" 2016 24th European Signal\n        Processing Conference (EUSIPCO). IEEE, 2016.\n\n    .. [5] A. Liutkus and R. Badeau and G. 
Richard \"Kernel additive models for\n        source separation.\" IEEE Transactions on Signal Processing\n        62.16 (2014): 4298-4310.\n\n    Args:\n        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]\n            initial estimates for the sources\n        x (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2)]\n            complex STFT of the mixture signal\n        iterations (int): [scalar]\n            number of iterations for the EM algorithm.\n        eps (float): [scalar]\n            The epsilon value to use for regularization and filters.\n        batch_size (int): number of frames processed per chunk when accumulating\n            covariances and applying the filters; a falsy value processes all\n            frames at once.\n\n    Returns:\n        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]\n            estimated sources after iterations\n        v (Tensor): [shape=(nb_frames, nb_bins, nb_sources)]\n            estimated power spectral densities\n        R (list[Tensor]): nb_sources tensors of [shape=(nb_bins, nb_channels, nb_channels, 2)]\n            estimated spatial covariance matrices\n\n    Notes:\n        * You need an initial estimate for the sources to apply this\n          algorithm. This is precisely what the :func:`wiener` function does.\n        * This algorithm *is not* an implementation of the \"exact\" EM\n          proposed in [1]_. In particular, it does not compute the posterior\n          covariance matrices the same (exact) way. Instead, it uses the\n          simplified approximate scheme initially proposed in [5]_ and further\n          refined in [3]_, [4]_, which boils down to taking the empirical\n          covariance of the recent source estimates, followed by a weighted\n          average for the update of the spatial covariance matrix. 
It has been\n          empirically demonstrated that this simplified algorithm is more\n          robust for music separation.\n\n    Warning:\n        It is *very* important to make sure `x.dtype` is `torch.float64`\n        if you want double precision, because this function will **not**\n        do such conversion for you from `torch.complex32`, in case you want the\n        smaller RAM usage on purpose.\n\n        It is usually always better in terms of quality to have double\n        precision, by e.g. calling :func:`expectation_maximization`\n        with ``x.to(torch.float64)``.\n    \"\"\"\n    # dimensions\n    (nb_frames, nb_bins, nb_channels) = x.shape[:-1]\n    nb_sources = y.shape[-1]\n\n    regularization = torch.cat((torch.eye(nb_channels, dtype=x.dtype, device=x.device)[..., None], torch.zeros((nb_channels, nb_channels, 1), dtype=x.dtype, device=x.device)), dim=2)\n    regularization = torch.sqrt(torch.as_tensor(eps)) * (regularization[None, None, ...].expand((-1, nb_bins, -1, -1, -1)))\n\n    # allocate the spatial covariance matrices\n    R = [torch.zeros((nb_bins, nb_channels, nb_channels, 2), dtype=x.dtype, device=x.device) for j in range(nb_sources)]\n    weight: torch.Tensor = torch.zeros((nb_bins,), dtype=x.dtype, device=x.device)\n\n    v: torch.Tensor = torch.zeros((nb_frames, nb_bins, nb_sources), dtype=x.dtype, device=x.device)\n    for it in range(iterations):\n        # constructing the mixture covariance matrix. 
Doing it with a loop\n        # to avoid storing anytime in RAM the whole 6D tensor\n\n        # update the PSD as the average spectrogram over channels\n        v = torch.mean(torch.abs(y[..., 0, :]) ** 2 + torch.abs(y[..., 1, :]) ** 2, dim=-2)\n\n        # update spatial covariance matrices (weighted update)\n        for j in range(nb_sources):\n            R[j] = torch.tensor(0.0, device=x.device)\n            weight = torch.tensor(eps, device=x.device)\n            pos: int = 0\n            batch_size = batch_size if batch_size else nb_frames\n            while pos < nb_frames:\n                t = torch.arange(pos, min(nb_frames, pos + batch_size))\n                pos = int(t[-1]) + 1\n\n                R[j] = R[j] + torch.sum(_covariance(y[t, ..., j]), dim=0)\n                weight = weight + torch.sum(v[t, ..., j], dim=0)\n            R[j] = R[j] / weight[..., None, None, None]\n            weight = torch.zeros_like(weight)\n\n        # cloning y if we track gradient, because we're going to update it\n        if y.requires_grad:\n            y = y.clone()\n\n        pos = 0\n        while pos < nb_frames:\n            t = torch.arange(pos, min(nb_frames, pos + batch_size))\n            pos = int(t[-1]) + 1\n\n            y[t, ...] 
= torch.tensor(0.0, device=x.device, dtype=x.dtype)\n\n            # compute mix covariance matrix\n            Cxx = regularization\n            for j in range(nb_sources):\n                Cxx = Cxx + (v[t, ..., j, None, None, None] * R[j][None, ...].clone())\n\n            # invert it\n            inv_Cxx = _invert(Cxx)\n\n            # separate the sources\n            for j in range(nb_sources):\n\n                # create a wiener gain for this source\n                gain = torch.zeros_like(inv_Cxx)\n\n                # computes multichannel Wiener gain as v_j R_j inv_Cxx\n                indices = torch.cartesian_prod(torch.arange(nb_channels), torch.arange(nb_channels), torch.arange(nb_channels))\n                for index in indices:\n                    gain[:, :, index[0], index[1], :] = _mul_add(R[j][None, :, index[0], index[2], :].clone(), inv_Cxx[:, :, index[2], index[1], :], gain[:, :, index[0], index[1], :])\n                gain = gain * v[t, ..., None, None, None, j]\n\n                # apply it to the mixture\n                for i in range(nb_channels):\n                    y[t, ..., j] = _mul_add(gain[..., i, :], x[t, ..., i, None, :], y[t, ..., j])\n\n    return y, v, R\n\n\ndef wiener(targets_spectrograms: torch.Tensor, mix_stft: torch.Tensor, iterations: int = 1, softmask: bool = False, residual: bool = False, scale_factor: float = 10.0, eps: float = 1e-10):\n    \"\"\"Wiener-based separation for multichannel audio.\n\n    The method uses the (possibly multichannel) spectrograms  of the\n    sources to separate the (complex) Short Term Fourier Transform  of the\n    mix. Separation is done in a sequential way by:\n\n    * Getting an initial estimate. This can be done in two ways: either by\n      directly using the spectrograms with the mixture phase, or\n      by using a softmasking strategy. 
This initial phase is controlled\n      by the `softmask` flag.\n\n    * If required, adding an additional residual target as the mix minus\n      all targets.\n\n    * Refining these initial estimates through a call to\n      :func:`expectation_maximization` if the number of iterations is nonzero.\n\n    This implementation also lets you specify the epsilon value used for\n    regularization. It is based on [1]_, [2]_, [3]_, [4]_.\n\n    References\n    ----------\n    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and\n        N. Takahashi and Y. Mitsufuji, \"Improving music source separation based\n        on deep neural networks through data augmentation and network\n        blending.\" 2017 IEEE International Conference on Acoustics, Speech\n        and Signal Processing (ICASSP). IEEE, 2017.\n\n    .. [2] A. Nugraha and A. Liutkus and E. Vincent. \"Multichannel audio source\n        separation with deep neural networks.\" IEEE/ACM Transactions on Audio,\n        Speech, and Language Processing 24.9 (2016): 1652-1664.\n\n    .. [3] A. Nugraha and A. Liutkus and E. Vincent. \"Multichannel music\n        separation with deep neural networks.\" 2016 24th European Signal\n        Processing Conference (EUSIPCO). IEEE, 2016.\n\n    .. [4] A. Liutkus and R. Badeau and G. Richard \"Kernel additive models for\n        source separation.\" IEEE Transactions on Signal Processing\n        62.16 (2014): 4298-4310.\n\n    Args:\n        targets_spectrograms (Tensor): spectrograms of the sources\n            [shape=(nb_frames, nb_bins, nb_channels, nb_sources)].\n            This is a nonnegative tensor that is\n            usually the output of the actual separation method of the user. 
The\n            spectrograms may be mono, but they need to be 4-dimensional in all\n            cases.\n        mix_stft (Tensor): [shape=(nb_frames, nb_bins, nb_channels, complex=2)]\n            STFT of the mixture signal.\n        iterations (int): [scalar]\n            number of iterations for the EM algorithm\n        softmask (bool): Describes how the initial estimates are obtained.\n            * if `False`, then the mixture phase will directly be used with the\n            spectrogram as initial estimates.\n            * if `True`, initial estimates are obtained by multiplying the\n            complex mix element-wise with the ratio of each target spectrogram\n            with the sum of them all. This strategy works better when the models are\n            not very good, and worse otherwise.\n        residual (bool): if `True`, an additional target is created, which is\n            equal to the mixture minus the other targets, before application of\n            expectation maximization\n        eps (float): Epsilon value to use for computing the separations.\n            This is used whenever division with a model energy is\n            performed, i.e. when softmasking and when iterating the EM.\n            It can be understood as the energy of the additional white noise\n            that is taken out when separating.\n        scale_factor (float): before the EM refinement, the mixture and the estimates\n            are divided by max(1, max|mix_stft| / scale_factor) for numerical\n            stability, then scaled back afterwards.\n\n    Returns:\n        Tensor: shape=(nb_frames, nb_bins, nb_channels, complex=2, nb_sources)\n            STFT of estimated sources\n\n    Notes:\n        * Be careful that you need *magnitude spectrogram estimates* for the\n        case `softmask==False`.\n        * `softmask=False` is recommended.\n        * The epsilon value will have a huge impact on performance. If it's\n        large, only the parts of the signal with a significant energy will\n        be kept in the sources. 
This epsilon then directly controls the\n        energy of the reconstruction error.\n\n    Warning:\n        As in :func:`expectation_maximization`, we recommend converting the\n        mixture `x` to double precision `torch.float64` *before* calling\n        :func:`wiener`.\n    \"\"\"\n    if softmask:\n        # if we use softmask, we compute the ratio mask for all targets and\n        # multiply by the mix stft\n        y = mix_stft[..., None] * (targets_spectrograms / (eps + torch.sum(targets_spectrograms, dim=-1, keepdim=True).to(mix_stft.dtype)))[..., None, :]\n    else:\n        # otherwise, we just multiply the targets spectrograms with mix phase\n        # we tacitly assume that we have magnitude estimates.\n        angle = atan2(mix_stft[..., 1], mix_stft[..., 0])[..., None]\n        nb_sources = targets_spectrograms.shape[-1]\n        y = torch.zeros(mix_stft.shape + (nb_sources,), dtype=mix_stft.dtype, device=mix_stft.device)\n        y[..., 0, :] = targets_spectrograms * torch.cos(angle)\n        y[..., 1, :] = targets_spectrograms * torch.sin(angle)\n\n    if residual:\n        # if required, adding an additional target as the mix minus\n        # available targets\n        y = torch.cat([y, mix_stft[..., None] - y.sum(dim=-1, keepdim=True)], dim=-1)\n\n    if iterations == 0:\n        return y\n\n    # we need to refine the estimates. 
Scales down the estimates for\n    # numerical stability\n    max_abs = torch.max(torch.as_tensor(1.0, dtype=mix_stft.dtype, device=mix_stft.device), torch.sqrt(_norm(mix_stft)).max() / scale_factor)\n\n    mix_stft = mix_stft / max_abs\n    y = y / max_abs\n\n    # call expectation maximization\n    y = expectation_maximization(y, mix_stft, iterations, eps=eps)[0]\n\n    # scale estimates up again\n    y = y * max_abs\n    return y\n\n\ndef _covariance(y_j):\n    \"\"\"\n    Compute the empirical covariance for a source.\n\n    Args:\n        y_j (Tensor): complex stft of the source.\n            [shape=(nb_frames, nb_bins, nb_channels, 2)].\n\n    Returns:\n        Cj (Tensor): [shape=(nb_frames, nb_bins, nb_channels, nb_channels, 2)]\n            just y_j * conj(y_j.T): empirical covariance for each TF bin.\n    \"\"\"\n    (nb_frames, nb_bins, nb_channels) = y_j.shape[:-1]\n    Cj = torch.zeros((nb_frames, nb_bins, nb_channels, nb_channels, 2), dtype=y_j.dtype, device=y_j.device)\n    indices = torch.cartesian_prod(torch.arange(nb_channels), torch.arange(nb_channels))\n    for index in indices:\n        Cj[:, :, index[0], index[1], :] = _mul_add(y_j[:, :, index[0], :], _conj(y_j[:, :, index[1], :]), Cj[:, :, index[0], index[1], :])\n    return Cj\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/hdemucs.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"\nThis code contains the spectrogram and Hybrid version of Demucs.\n\"\"\"\nfrom copy import deepcopy\nimport math\nimport typing as tp\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom .filtering import wiener\nfrom .demucs import DConv, rescale_module\nfrom .states import capture_init\nfrom .spec import spectro, ispectro\n\n\ndef pad1d(x: torch.Tensor, paddings: tp.Tuple[int, int], mode: str = \"constant\", value: float = 0.0):\n    \"\"\"Tiny wrapper around F.pad, just to allow for reflect padding on small input.\n    If this is the case, we insert extra 0 padding to the right before the reflection happen.\"\"\"\n    x0 = x\n    length = x.shape[-1]\n    padding_left, padding_right = paddings\n    if mode == \"reflect\":\n        max_pad = max(padding_left, padding_right)\n        if length <= max_pad:\n            extra_pad = max_pad - length + 1\n            extra_pad_right = min(padding_right, extra_pad)\n            extra_pad_left = extra_pad - extra_pad_right\n            paddings = (padding_left - extra_pad_left, padding_right - extra_pad_right)\n            x = F.pad(x, (extra_pad_left, extra_pad_right))\n    out = F.pad(x, paddings, mode, value)\n    assert out.shape[-1] == length + padding_left + padding_right\n    assert (out[..., padding_left : padding_left + length] == x0).all()\n    return out\n\n\nclass ScaledEmbedding(nn.Module):\n    \"\"\"\n    Boost learning rate for embeddings (with `scale`).\n    Also, can make embeddings continuous with `smooth`.\n    \"\"\"\n\n    def __init__(self, num_embeddings: int, embedding_dim: int, scale: float = 10.0, smooth=False):\n        super().__init__()\n        self.embedding = nn.Embedding(num_embeddings, embedding_dim)\n        if smooth:\n            
weight = torch.cumsum(self.embedding.weight.data, dim=0)\n            # when summing gaussian, overscale rises as sqrt(n), so we normalize by that.\n            weight = weight / torch.arange(1, num_embeddings + 1).to(weight).sqrt()[:, None]\n            self.embedding.weight.data[:] = weight\n        self.embedding.weight.data /= scale\n        self.scale = scale\n\n    @property\n    def weight(self):\n        return self.embedding.weight * self.scale\n\n    def forward(self, x):\n        out = self.embedding(x) * self.scale\n        return out\n\n\nclass HEncLayer(nn.Module):\n    def __init__(self, chin, chout, kernel_size=8, stride=4, norm_groups=1, empty=False, freq=True, dconv=True, norm=True, context=0, dconv_kw={}, pad=True, rewrite=True):\n        \"\"\"Encoder layer. This is used by both the time and the frequency branch.\n\n        Args:\n            chin: number of input channels.\n            chout: number of output channels.\n            norm_groups: number of groups for group norm.\n            empty: used to make a layer with just the first conv. this is used\n                before merging the time and freq. branches.\n            freq: this is acting on frequencies.\n            dconv: insert DConv residual branches.\n            norm: use GroupNorm.\n            context: context size for the 1x1 conv.\n            dconv_kw: list of kwargs for the DConv class.\n            pad: pad the input. 
Padding is done so that the output size is\n                always the input size / stride.\n            rewrite: add 1x1 conv at the end of the layer.\n        \"\"\"\n        super().__init__()\n        norm_fn = lambda d: nn.Identity()  # noqa\n        if norm:\n            norm_fn = lambda d: nn.GroupNorm(norm_groups, d)  # noqa\n        if pad:\n            pad = kernel_size // 4\n        else:\n            pad = 0\n        klass = nn.Conv1d\n        self.freq = freq\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.empty = empty\n        self.norm = norm\n        self.pad = pad\n        if freq:\n            kernel_size = [kernel_size, 1]\n            stride = [stride, 1]\n            pad = [pad, 0]\n            klass = nn.Conv2d\n        self.conv = klass(chin, chout, kernel_size, stride, pad)\n        if self.empty:\n            return\n        self.norm1 = norm_fn(chout)\n        self.rewrite = None\n        if rewrite:\n            self.rewrite = klass(chout, 2 * chout, 1 + 2 * context, 1, context)\n            self.norm2 = norm_fn(2 * chout)\n\n        self.dconv = None\n        if dconv:\n            self.dconv = DConv(chout, **dconv_kw)\n\n    def forward(self, x, inject=None):\n        \"\"\"\n        `inject` is used to inject the result from the time branch into the frequency branch,\n        when both have the same stride.\n        \"\"\"\n        if not self.freq and x.dim() == 4:\n            B, C, Fr, T = x.shape\n            x = x.view(B, -1, T)\n\n        if not self.freq:\n            le = x.shape[-1]\n            if not le % self.stride == 0:\n                x = F.pad(x, (0, self.stride - (le % self.stride)))\n        y = self.conv(x)\n        if self.empty:\n            return y\n        if inject is not None:\n            assert inject.shape[-1] == y.shape[-1], (inject.shape, y.shape)\n            if inject.dim() == 3 and y.dim() == 4:\n                inject = inject[:, :, None]\n            y = y + 
inject\n        y = F.gelu(self.norm1(y))\n        if self.dconv:\n            if self.freq:\n                B, C, Fr, T = y.shape\n                y = y.permute(0, 2, 1, 3).reshape(-1, C, T)\n            y = self.dconv(y)\n            if self.freq:\n                y = y.view(B, Fr, C, T).permute(0, 2, 1, 3)\n        if self.rewrite:\n            z = self.norm2(self.rewrite(y))\n            z = F.glu(z, dim=1)\n        else:\n            z = y\n        return z\n\n\nclass MultiWrap(nn.Module):\n    \"\"\"\n    Takes one layer and replicate it N times. each replica will act\n    on a frequency band. All is done so that if the N replica have the same weights,\n    then this is exactly equivalent to applying the original module on all frequencies.\n\n    This is a bit over-engineered to avoid edge artifacts when splitting\n    the frequency bands, but it is possible the naive implementation would work as well...\n    \"\"\"\n\n    def __init__(self, layer, split_ratios):\n        \"\"\"\n        Args:\n            layer: module to clone, must be either HEncLayer or HDecLayer.\n            split_ratios: list of float indicating which ratio to keep for each band.\n        \"\"\"\n        super().__init__()\n        self.split_ratios = split_ratios\n        self.layers = nn.ModuleList()\n        self.conv = isinstance(layer, HEncLayer)\n        assert not layer.norm\n        assert layer.freq\n        assert layer.pad\n        if not self.conv:\n            assert not layer.context_freq\n        for k in range(len(split_ratios) + 1):\n            lay = deepcopy(layer)\n            if self.conv:\n                lay.conv.padding = (0, 0)\n            else:\n                lay.pad = False\n            for m in lay.modules():\n                if hasattr(m, \"reset_parameters\"):\n                    m.reset_parameters()\n            self.layers.append(lay)\n\n    def forward(self, x, skip=None, length=None):\n        B, C, Fr, T = x.shape\n\n        ratios = 
list(self.split_ratios) + [1]\n        start = 0\n        outs = []\n        for ratio, layer in zip(ratios, self.layers):\n            if self.conv:\n                pad = layer.kernel_size // 4\n                if ratio == 1:\n                    limit = Fr\n                    frames = -1\n                else:\n                    limit = int(round(Fr * ratio))\n                    le = limit - start\n                    if start == 0:\n                        le += pad\n                    frames = round((le - layer.kernel_size) / layer.stride + 1)\n                    limit = start + (frames - 1) * layer.stride + layer.kernel_size\n                    if start == 0:\n                        limit -= pad\n                assert limit - start > 0, (limit, start)\n                assert limit <= Fr, (limit, Fr)\n                y = x[:, :, start:limit, :]\n                if start == 0:\n                    y = F.pad(y, (0, 0, pad, 0))\n                if ratio == 1:\n                    y = F.pad(y, (0, 0, 0, pad))\n                outs.append(layer(y))\n                start = limit - layer.kernel_size + layer.stride\n            else:\n                if ratio == 1:\n                    limit = Fr\n                else:\n                    limit = int(round(Fr * ratio))\n                last = layer.last\n                layer.last = True\n\n                y = x[:, :, start:limit]\n                s = skip[:, :, start:limit]\n                out, _ = layer(y, s, None)\n                if outs:\n                    outs[-1][:, :, -layer.stride :] += out[:, :, : layer.stride] - layer.conv_tr.bias.view(1, -1, 1, 1)\n                    out = out[:, :, layer.stride :]\n                if ratio == 1:\n                    out = out[:, :, : -layer.stride // 2, :]\n                if start == 0:\n                    out = out[:, :, layer.stride // 2 :, :]\n                outs.append(out)\n                layer.last = last\n                start = limit\n        
out = torch.cat(outs, dim=2)\n        if not self.conv and not last:\n            out = F.gelu(out)\n        if self.conv:\n            return out\n        else:\n            return out, None\n\n\nclass HDecLayer(nn.Module):\n    def __init__(\n        self, chin, chout, last=False, kernel_size=8, stride=4, norm_groups=1, empty=False, freq=True, dconv=True, norm=True, context=1, dconv_kw={}, pad=True, context_freq=True, rewrite=True\n    ):\n        \"\"\"\n        Same as HEncLayer but for decoder. See `HEncLayer` for documentation.\n        \"\"\"\n        super().__init__()\n        norm_fn = lambda d: nn.Identity()  # noqa\n        if norm:\n            norm_fn = lambda d: nn.GroupNorm(norm_groups, d)  # noqa\n        if pad:\n            pad = kernel_size // 4\n        else:\n            pad = 0\n        self.pad = pad\n        self.last = last\n        self.freq = freq\n        self.chin = chin\n        self.empty = empty\n        self.stride = stride\n        self.kernel_size = kernel_size\n        self.norm = norm\n        self.context_freq = context_freq\n        klass = nn.Conv1d\n        klass_tr = nn.ConvTranspose1d\n        if freq:\n            kernel_size = [kernel_size, 1]\n            stride = [stride, 1]\n            klass = nn.Conv2d\n            klass_tr = nn.ConvTranspose2d\n        self.conv_tr = klass_tr(chin, chout, kernel_size, stride)\n        self.norm2 = norm_fn(chout)\n        if self.empty:\n            return\n        self.rewrite = None\n        if rewrite:\n            if context_freq:\n                self.rewrite = klass(chin, 2 * chin, 1 + 2 * context, 1, context)\n            else:\n                self.rewrite = klass(chin, 2 * chin, [1, 1 + 2 * context], 1, [0, context])\n            self.norm1 = norm_fn(2 * chin)\n\n        self.dconv = None\n        if dconv:\n            self.dconv = DConv(chin, **dconv_kw)\n\n    def forward(self, x, skip, length):\n        if self.freq and x.dim() == 3:\n            B, C, T = x.shape\n    
        x = x.view(B, self.chin, -1, T)\n\n        if not self.empty:\n            x = x + skip\n\n            if self.rewrite:\n                y = F.glu(self.norm1(self.rewrite(x)), dim=1)\n            else:\n                y = x\n            if self.dconv:\n                if self.freq:\n                    B, C, Fr, T = y.shape\n                    y = y.permute(0, 2, 1, 3).reshape(-1, C, T)\n                y = self.dconv(y)\n                if self.freq:\n                    y = y.view(B, Fr, C, T).permute(0, 2, 1, 3)\n        else:\n            y = x\n            assert skip is None\n        z = self.norm2(self.conv_tr(y))\n        if self.freq:\n            if self.pad:\n                z = z[..., self.pad : -self.pad, :]\n        else:\n            z = z[..., self.pad : self.pad + length]\n            assert z.shape[-1] == length, (z.shape[-1], length)\n        if not self.last:\n            z = F.gelu(z)\n        return z, y\n\n\nclass HDemucs(nn.Module):\n    \"\"\"\n    Spectrogram and hybrid Demucs model.\n    The spectrogram model has the same structure as Demucs, except the first few layers are over the\n    frequency axis, until there is only 1 frequency, and then it moves to time convolutions.\n    Frequency layers can still access information across time steps thanks to the DConv residual.\n\n    Hybrid model have a parallel time branch. At some layer, the time branch has the same stride\n    as the frequency branch and then the two are combined. The opposite happens in the decoder.\n\n    Models can either use naive iSTFT from masking, Wiener filtering ([Ulhih et al. 2017]),\n    or complex as channels (CaC) [Choi et al. 2020]. Wiener filtering is based on\n    Open Unmix implementation [Stoter et al. 2019].\n\n    The loss is always on the temporal domain, by backpropagating through the above\n    output methods and iSTFT. This allows to define hybrid models nicely. 
However, this breaks\n    a bit Wiener filtering, as doing more iterations at test time will change the spectrogram\n    contribution, without changing the one from the waveform, which will lead to worse performance.\n    I tried using the residual option in OpenUnmix Wiener implementation, but it didn't improve.\n    CaC on the other hand provides similar performance for hybrid, and works naturally with\n    hybrid models.\n\n    This model also uses frequency embeddings to improve efficiency on convolutions\n    over the freq. axis, following [Isik et al. 2020] (https://arxiv.org/pdf/2008.04470.pdf).\n\n    Unlike classic Demucs, there is no resampling here, and normalization is always applied.\n    \"\"\"\n\n    @capture_init\n    def __init__(\n        self,\n        sources,\n        # Channels\n        audio_channels=2,\n        channels=48,\n        channels_time=None,\n        growth=2,\n        # STFT\n        nfft=4096,\n        wiener_iters=0,\n        end_iters=0,\n        wiener_residual=False,\n        cac=True,\n        # Main structure\n        depth=6,\n        rewrite=True,\n        hybrid=True,\n        hybrid_old=False,\n        # Frequency branch\n        multi_freqs=None,\n        multi_freqs_depth=2,\n        freq_emb=0.2,\n        emb_scale=10,\n        emb_smooth=True,\n        # Convolutions\n        kernel_size=8,\n        time_stride=2,\n        stride=4,\n        context=1,\n        context_enc=0,\n        # Normalization\n        norm_starts=4,\n        norm_groups=4,\n        # DConv residual branch\n        dconv_mode=1,\n        dconv_depth=2,\n        dconv_comp=4,\n        dconv_attn=4,\n        dconv_lstm=4,\n        dconv_init=1e-4,\n        # Weight init\n        rescale=0.1,\n        # Metadata\n        samplerate=44100,\n        segment=4 * 10,\n    ):\n        \"\"\"\n        Args:\n            sources (list[str]): list of source names.\n            audio_channels (int): input/output audio channels.\n            
channels (int): initial number of hidden channels.\n            channels_time: if not None, use a different `channels` value for the time branch.\n            growth: increase the number of hidden channels by this factor at each layer.\n            nfft: number of fft bins. Note that changing this requires careful computation of\n                various shape parameters and will not work out of the box for hybrid models.\n            wiener_iters: when using Wiener filtering, number of iterations at test time.\n            end_iters: same but at train time. For a hybrid model, must be equal to `wiener_iters`.\n            wiener_residual: add residual source before wiener filtering.\n            cac: uses complex as channels, i.e. complex numbers are 2 channels each\n                in input and output. no further processing is done before ISTFT.\n            depth (int): number of layers in the encoder and in the decoder.\n            rewrite (bool): add 1x1 convolution to each layer.\n            hybrid (bool): make a hybrid time/frequency domain, otherwise frequency only.\n            hybrid_old: some models trained for MDX had a padding bug. This replicates\n                this bug to avoid retraining them.\n            multi_freqs: list of frequency ratios for splitting frequency bands with `MultiWrap`.\n            multi_freqs_depth: how many layers to wrap with `MultiWrap`. 
Only the outermost\n                layers will be wrapped.\n            freq_emb: add frequency embedding after the first frequency layer if > 0,\n                the actual value controls the weight of the embedding.\n            emb_scale: equivalent to scaling the embedding learning rate\n            emb_smooth: initialize the embedding with a smooth one (with respect to frequencies).\n            kernel_size: kernel_size for encoder and decoder layers.\n            stride: stride for encoder and decoder layers.\n            time_stride: stride for the final time layer, after the merge.\n            context: context for 1x1 conv in the decoder.\n            context_enc: context for 1x1 conv in the encoder.\n            norm_starts: layer at which group norm starts being used.\n                decoder layers are numbered in reverse order.\n            norm_groups: number of groups for group norm.\n            dconv_mode: if 1: dconv in encoder only, 2: decoder only, 3: both.\n            dconv_depth: depth of residual DConv branch.\n            dconv_comp: compression of DConv branch.\n            dconv_attn: adds attention layers in DConv branch starting at this layer.\n            dconv_lstm: adds a LSTM layer in DConv branch starting at this layer.\n            dconv_init: initial scale for the DConv branch LayerScale.\n            rescale: weight recaling trick\n\n        \"\"\"\n        super().__init__()\n\n        self.cac = cac\n        self.wiener_residual = wiener_residual\n        self.audio_channels = audio_channels\n        self.sources = sources\n        self.kernel_size = kernel_size\n        self.context = context\n        self.stride = stride\n        self.depth = depth\n        self.channels = channels\n        self.samplerate = samplerate\n        self.segment = segment\n\n        self.nfft = nfft\n        self.hop_length = nfft // 4\n        self.wiener_iters = wiener_iters\n        self.end_iters = end_iters\n        self.freq_emb = None\n   
     self.hybrid = hybrid\n        self.hybrid_old = hybrid_old\n        if hybrid_old:\n            assert hybrid, \"hybrid_old must come with hybrid=True\"\n        if hybrid:\n            assert wiener_iters == end_iters\n\n        self.encoder = nn.ModuleList()\n        self.decoder = nn.ModuleList()\n\n        if hybrid:\n            self.tencoder = nn.ModuleList()\n            self.tdecoder = nn.ModuleList()\n\n        chin = audio_channels\n        chin_z = chin  # number of channels for the freq branch\n        if self.cac:\n            chin_z *= 2\n        chout = channels_time or channels\n        chout_z = channels\n        freqs = nfft // 2\n\n        for index in range(depth):\n            lstm = index >= dconv_lstm\n            attn = index >= dconv_attn\n            norm = index >= norm_starts\n            freq = freqs > 1\n            stri = stride\n            ker = kernel_size\n            if not freq:\n                assert freqs == 1\n                ker = time_stride * 2\n                stri = time_stride\n\n            pad = True\n            last_freq = False\n            if freq and freqs <= kernel_size:\n                ker = freqs\n                pad = False\n                last_freq = True\n\n            kw = {\n                \"kernel_size\": ker,\n                \"stride\": stri,\n                \"freq\": freq,\n                \"pad\": pad,\n                \"norm\": norm,\n                \"rewrite\": rewrite,\n                \"norm_groups\": norm_groups,\n                \"dconv_kw\": {\"lstm\": lstm, \"attn\": attn, \"depth\": dconv_depth, \"compress\": dconv_comp, \"init\": dconv_init, \"gelu\": True},\n            }\n            kwt = dict(kw)\n            kwt[\"freq\"] = 0\n            kwt[\"kernel_size\"] = kernel_size\n            kwt[\"stride\"] = stride\n            kwt[\"pad\"] = True\n            kw_dec = dict(kw)\n            multi = False\n            if multi_freqs and index < multi_freqs_depth:\n                
multi = True\n                kw_dec[\"context_freq\"] = False\n\n            if last_freq:\n                chout_z = max(chout, chout_z)\n                chout = chout_z\n\n            enc = HEncLayer(chin_z, chout_z, dconv=dconv_mode & 1, context=context_enc, **kw)\n            if hybrid and freq:\n                tenc = HEncLayer(chin, chout, dconv=dconv_mode & 1, context=context_enc, empty=last_freq, **kwt)\n                self.tencoder.append(tenc)\n\n            if multi:\n                enc = MultiWrap(enc, multi_freqs)\n            self.encoder.append(enc)\n            if index == 0:\n                chin = self.audio_channels * len(self.sources)\n                chin_z = chin\n                if self.cac:\n                    chin_z *= 2\n            dec = HDecLayer(chout_z, chin_z, dconv=dconv_mode & 2, last=index == 0, context=context, **kw_dec)\n            if multi:\n                dec = MultiWrap(dec, multi_freqs)\n            if hybrid and freq:\n                tdec = HDecLayer(chout, chin, dconv=dconv_mode & 2, empty=last_freq, last=index == 0, context=context, **kwt)\n                self.tdecoder.insert(0, tdec)\n            self.decoder.insert(0, dec)\n\n            chin = chout\n            chin_z = chout_z\n            chout = int(growth * chout)\n            chout_z = int(growth * chout_z)\n            if freq:\n                if freqs <= kernel_size:\n                    freqs = 1\n                else:\n                    freqs //= stride\n            if index == 0 and freq_emb:\n                self.freq_emb = ScaledEmbedding(freqs, chin_z, smooth=emb_smooth, scale=emb_scale)\n                self.freq_emb_scale = freq_emb\n\n        if rescale:\n            rescale_module(self, reference=rescale)\n\n    def _spec(self, x):\n        hl = self.hop_length\n        nfft = self.nfft\n        x0 = x  # noqa\n\n        if self.hybrid:\n            # We re-pad the signal in order to keep the property\n            # that the size of the 
output is exactly the size of the input\n            # divided by the stride (here hop_length), when divisible.\n            # This is achieved by padding by 1/4th of the kernel size (here nfft).\n            # which is not supported by torch.stft.\n            # Having all convolution operations follow this convention allow to easily\n            # align the time and frequency branches later on.\n            assert hl == nfft // 4\n            le = int(math.ceil(x.shape[-1] / hl))\n            pad = hl // 2 * 3\n            if not self.hybrid_old:\n                x = pad1d(x, (pad, pad + le * hl - x.shape[-1]), mode=\"reflect\")\n            else:\n                x = pad1d(x, (pad, pad + le * hl - x.shape[-1]))\n\n        z = spectro(x, nfft, hl)[..., :-1, :]\n        if self.hybrid:\n            assert z.shape[-1] == le + 4, (z.shape, x.shape, le)\n            z = z[..., 2 : 2 + le]\n        return z\n\n    def _ispec(self, z, length=None, scale=0):\n        hl = self.hop_length // (4**scale)\n        z = F.pad(z, (0, 0, 0, 1))\n        if self.hybrid:\n            z = F.pad(z, (2, 2))\n            pad = hl // 2 * 3\n            if not self.hybrid_old:\n                le = hl * int(math.ceil(length / hl)) + 2 * pad\n            else:\n                le = hl * int(math.ceil(length / hl))\n            x = ispectro(z, hl, length=le)\n            if not self.hybrid_old:\n                x = x[..., pad : pad + length]\n            else:\n                x = x[..., :length]\n        else:\n            x = ispectro(z, hl, length)\n        return x\n\n    def _magnitude(self, z):\n        # return the magnitude of the spectrogram, except when cac is True,\n        # in which case we just move the complex dimension to the channel one.\n        if self.cac:\n            B, C, Fr, T = z.shape\n            m = torch.view_as_real(z).permute(0, 1, 4, 2, 3)\n            m = m.reshape(B, C * 2, Fr, T)\n        else:\n            m = z.abs()\n        return m\n\n    def 
_mask(self, z, m):\n        # Apply masking given the mixture spectrogram `z` and the estimated mask `m`.\n        # If `cac` is True, `m` is actually a full spectrogram and `z` is ignored.\n        niters = self.wiener_iters\n        if self.cac:\n            B, S, C, Fr, T = m.shape\n            out = m.view(B, S, -1, 2, Fr, T).permute(0, 1, 2, 4, 5, 3)\n            out = torch.view_as_complex(out.contiguous())\n            return out\n        if self.training:\n            niters = self.end_iters\n        if niters < 0:\n            z = z[:, None]\n            return z / (1e-8 + z.abs()) * m\n        else:\n            return self._wiener(m, z, niters)\n\n    def _wiener(self, mag_out, mix_stft, niters):\n        # apply wiener filtering from OpenUnmix.\n        init = mix_stft.dtype\n        wiener_win_len = 300\n        residual = self.wiener_residual\n\n        B, S, C, Fq, T = mag_out.shape\n        mag_out = mag_out.permute(0, 4, 3, 2, 1)\n        mix_stft = torch.view_as_real(mix_stft.permute(0, 3, 2, 1))\n\n        outs = []\n        for sample in range(B):\n            pos = 0\n            out = []\n            for pos in range(0, T, wiener_win_len):\n                frame = slice(pos, pos + wiener_win_len)\n                z_out = wiener(mag_out[sample, frame], mix_stft[sample, frame], niters, residual=residual)\n                out.append(z_out.transpose(-1, -2))\n            outs.append(torch.cat(out, dim=0))\n        out = torch.view_as_complex(torch.stack(outs, 0))\n        out = out.permute(0, 4, 3, 2, 1).contiguous()\n        if residual:\n            out = out[:, :-1]\n        assert list(out.shape) == [B, S, C, Fq, T]\n        return out.to(init)\n\n    def forward(self, mix):\n        x = mix\n        length = x.shape[-1]\n\n        z = self._spec(mix)\n        mag = self._magnitude(z).to(mix.device)\n        x = mag\n\n        B, C, Fq, T = x.shape\n\n        # unlike previous Demucs, we always normalize because it is easier.\n        mean = 
x.mean(dim=(1, 2, 3), keepdim=True)\n        std = x.std(dim=(1, 2, 3), keepdim=True)\n        x = (x - mean) / (1e-5 + std)\n        # x will be the freq. branch input.\n\n        if self.hybrid:\n            # Prepare the time branch input.\n            xt = mix\n            meant = xt.mean(dim=(1, 2), keepdim=True)\n            stdt = xt.std(dim=(1, 2), keepdim=True)\n            xt = (xt - meant) / (1e-5 + stdt)\n\n        # okay, this is a giant mess I know...\n        saved = []  # skip connections, freq.\n        saved_t = []  # skip connections, time.\n        lengths = []  # saved lengths to properly remove padding, freq branch.\n        lengths_t = []  # saved lengths for time branch.\n        for idx, encode in enumerate(self.encoder):\n            lengths.append(x.shape[-1])\n            inject = None\n            if self.hybrid and idx < len(self.tencoder):\n                # we have not yet merged branches.\n                lengths_t.append(xt.shape[-1])\n                tenc = self.tencoder[idx]\n                xt = tenc(xt)\n                if not tenc.empty:\n                    # save for skip connection\n                    saved_t.append(xt)\n                else:\n                    # tenc contains just the first conv., so that now time and freq.\n                    # branches have the same shape and can be merged.\n                    inject = xt\n            x = encode(x, inject)\n            if idx == 0 and self.freq_emb is not None:\n                # add frequency embedding to allow for non equivariant convolutions\n                # over the frequency axis.\n                frs = torch.arange(x.shape[-2], device=x.device)\n                emb = self.freq_emb(frs).t()[None, :, :, None].expand_as(x)\n                x = x + self.freq_emb_scale * emb\n\n            saved.append(x)\n\n        x = torch.zeros_like(x)\n        if self.hybrid:\n            xt = torch.zeros_like(x)\n        # initialize everything to zero (signal will go 
through u-net skips).\n\n        for idx, decode in enumerate(self.decoder):\n            skip = saved.pop(-1)\n            x, pre = decode(x, skip, lengths.pop(-1))\n            # `pre` contains the output just before final transposed convolution,\n            # which is used when the freq. and time branch separate.\n\n            if self.hybrid:\n                offset = self.depth - len(self.tdecoder)\n            if self.hybrid and idx >= offset:\n                tdec = self.tdecoder[idx - offset]\n                length_t = lengths_t.pop(-1)\n                if tdec.empty:\n                    assert pre.shape[2] == 1, pre.shape\n                    pre = pre[:, :, 0]\n                    xt, _ = tdec(pre, None, length_t)\n                else:\n                    skip = saved_t.pop(-1)\n                    xt, _ = tdec(xt, skip, length_t)\n\n        # Let's make sure we used all stored skip connections.\n        assert len(saved) == 0\n        assert len(lengths_t) == 0\n        assert len(saved_t) == 0\n\n        S = len(self.sources)\n        x = x.view(B, S, -1, Fq, T)\n        x = x * std[:, None] + mean[:, None]\n\n        # to cpu as non-cuda GPUs don't support complex numbers\n        # demucs issue #435 ##432\n        # NOTE: in this case z already is on cpu\n        # TODO: remove this when mps supports complex numbers\n\n        device_type = x.device.type\n        device_load = f\"{device_type}:{x.device.index}\" if not device_type == \"mps\" else device_type\n        x_is_other_gpu = not device_type in [\"cuda\", \"cpu\"]\n\n        if x_is_other_gpu:\n            x = x.cpu()\n\n        zout = self._mask(z, x)\n        x = self._ispec(zout, length)\n\n        # back to other device\n        if x_is_other_gpu:\n            x = x.to(device_load)\n\n        if self.hybrid:\n            xt = xt.view(B, S, -1, length)\n            xt = xt * stdt[:, None] + meant[:, None]\n            x = xt + x\n        return x\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/htdemucs.py",
    "content": "# Copyright (c) Meta, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# First author is Simon Rouard.\n\"\"\"\nThis code contains the spectrogram and Hybrid version of Demucs.\n\"\"\"\nimport math\n\nfrom .filtering import wiener\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom fractions import Fraction\nfrom einops import rearrange\n\nfrom .transformer import CrossTransformerEncoder\n\nfrom .demucs import rescale_module\nfrom .states import capture_init\nfrom .spec import spectro, ispectro\nfrom .hdemucs import pad1d, ScaledEmbedding, HEncLayer, MultiWrap, HDecLayer\n\n\nclass HTDemucs(nn.Module):\n    \"\"\"\n    Spectrogram and hybrid Demucs model.\n    The spectrogram model has the same structure as Demucs, except the first few layers are over the\n    frequency axis, until there is only 1 frequency, and then it moves to time convolutions.\n    Frequency layers can still access information across time steps thanks to the DConv residual.\n\n    Hybrid model have a parallel time branch. At some layer, the time branch has the same stride\n    as the frequency branch and then the two are combined. The opposite happens in the decoder.\n\n    Models can either use naive iSTFT from masking, Wiener filtering ([Ulhih et al. 2017]),\n    or complex as channels (CaC) [Choi et al. 2020]. Wiener filtering is based on\n    Open Unmix implementation [Stoter et al. 2019].\n\n    The loss is always on the temporal domain, by backpropagating through the above\n    output methods and iSTFT. This allows to define hybrid models nicely. 
However, this breaks\n    a bit Wiener filtering, as doing more iteration at test time will change the spectrogram\n    contribution, without changing the one from the waveform, which will lead to worse performance.\n    I tried using the residual option in OpenUnmix Wiener implementation, but it didn't improve.\n    CaC on the other hand provides similar performance for hybrid, and works naturally with\n    hybrid models.\n\n    This model also uses frequency embeddings are used to improve efficiency on convolutions\n    over the freq. axis, following [Isik et al. 2020] (https://arxiv.org/pdf/2008.04470.pdf).\n\n    Unlike classic Demucs, there is no resampling here, and normalization is always applied.\n    \"\"\"\n\n    @capture_init\n    def __init__(\n        self,\n        sources,\n        # Channels\n        audio_channels=2,\n        channels=48,\n        channels_time=None,\n        growth=2,\n        # STFT\n        nfft=4096,\n        wiener_iters=0,\n        end_iters=0,\n        wiener_residual=False,\n        cac=True,\n        # Main structure\n        depth=4,\n        rewrite=True,\n        # Frequency branch\n        multi_freqs=None,\n        multi_freqs_depth=3,\n        freq_emb=0.2,\n        emb_scale=10,\n        emb_smooth=True,\n        # Convolutions\n        kernel_size=8,\n        time_stride=2,\n        stride=4,\n        context=1,\n        context_enc=0,\n        # Normalization\n        norm_starts=4,\n        norm_groups=4,\n        # DConv residual branch\n        dconv_mode=1,\n        dconv_depth=2,\n        dconv_comp=8,\n        dconv_init=1e-3,\n        # Before the Transformer\n        bottom_channels=0,\n        # Transformer\n        t_layers=5,\n        t_emb=\"sin\",\n        t_hidden_scale=4.0,\n        t_heads=8,\n        t_dropout=0.0,\n        t_max_positions=10000,\n        t_norm_in=True,\n        t_norm_in_group=False,\n        t_group_norm=False,\n        t_norm_first=True,\n        t_norm_out=True,\n        
t_max_period=10000.0,\n        t_weight_decay=0.0,\n        t_lr=None,\n        t_layer_scale=True,\n        t_gelu=True,\n        t_weight_pos_embed=1.0,\n        t_sin_random_shift=0,\n        t_cape_mean_normalize=True,\n        t_cape_augment=True,\n        t_cape_glob_loc_scale=[5000.0, 1.0, 1.4],\n        t_sparse_self_attn=False,\n        t_sparse_cross_attn=False,\n        t_mask_type=\"diag\",\n        t_mask_random_seed=42,\n        t_sparse_attn_window=500,\n        t_global_window=100,\n        t_sparsity=0.95,\n        t_auto_sparsity=False,\n        # ------ Particular parameters\n        t_cross_first=False,\n        # Weight init\n        rescale=0.1,\n        # Metadata\n        samplerate=44100,\n        segment=10,\n        use_train_segment=True,\n    ):\n        \"\"\"\n        Args:\n            sources (list[str]): list of source names.\n            audio_channels (int): input/output audio channels.\n            channels (int): initial number of hidden channels.\n            channels_time: if not None, use a different `channels` value for the time branch.\n            growth: increase the number of hidden channels by this factor at each layer.\n            nfft: number of fft bins. Note that changing this requires careful computation of\n                various shape parameters and will not work out of the box for hybrid models.\n            wiener_iters: when using Wiener filtering, number of iterations at test time.\n            end_iters: same but at train time. For a hybrid model, must be equal to `wiener_iters`.\n            wiener_residual: add residual source before wiener filtering.\n            cac: uses complex as channels, i.e. complex numbers are 2 channels each\n                in input and output. 
no further processing is done before ISTFT.\n            depth (int): number of layers in the encoder and in the decoder.\n            rewrite (bool): add 1x1 convolution to each layer.\n            multi_freqs: list of frequency ratios for splitting frequency bands with `MultiWrap`.\n            multi_freqs_depth: how many layers to wrap with `MultiWrap`. Only the outermost\n                layers will be wrapped.\n            freq_emb: add frequency embedding after the first frequency layer if > 0,\n                the actual value controls the weight of the embedding.\n            emb_scale: equivalent to scaling the embedding learning rate\n            emb_smooth: initialize the embedding with a smooth one (with respect to frequencies).\n            kernel_size: kernel_size for encoder and decoder layers.\n            stride: stride for encoder and decoder layers.\n            time_stride: stride for the final time layer, after the merge.\n            context: context for 1x1 conv in the decoder.\n            context_enc: context for 1x1 conv in the encoder.\n            norm_starts: layer at which group norm starts being used.\n                decoder layers are numbered in reverse order.\n            norm_groups: number of groups for group norm.\n            dconv_mode: if 1: dconv in encoder only, 2: decoder only, 3: both.\n            dconv_depth: depth of residual DConv branch.\n            dconv_comp: compression of DConv branch.\n            dconv_attn: adds attention layers in DConv branch starting at this layer.\n            dconv_lstm: adds a LSTM layer in DConv branch starting at this layer.\n            dconv_init: initial scale for the DConv branch LayerScale.\n            bottom_channels: if >0 it adds a linear layer (1x1 Conv) before and after the\n                transformer in order to change the number of channels\n            t_layers: number of layers in each branch (waveform and spec) of the transformer\n            t_emb: \"sin\", \"cape\" 
or \"scaled\"\n            t_hidden_scale: the hidden scale of the Feedforward parts of the transformer\n                for instance if C = 384 (the number of channels in the transformer) and\n                t_hidden_scale = 4.0 then the intermediate layer of the FFN has dimension\n                384 * 4 = 1536\n            t_heads: number of heads for the transformer\n            t_dropout: dropout in the transformer\n            t_max_positions: max_positions for the \"scaled\" positional embedding, only\n                useful if t_emb=\"scaled\"\n            t_norm_in: (bool) norm before adding positional embedding and getting into the\n                transformer layers\n            t_norm_in_group: (bool) if True while t_norm_in=True, the norm is on all the\n                timesteps (GroupNorm with group=1)\n            t_group_norm: (bool) if True, the norms of the Encoder Layers are on all the\n                timesteps (GroupNorm with group=1)\n            t_norm_first: (bool) if True the norm is before the attention and before the FFN\n            t_norm_out: (bool) if True, there is a GroupNorm (group=1) at the end of each layer\n            t_max_period: (float) denominator in the sinusoidal embedding expression\n            t_weight_decay: (float) weight decay for the transformer\n            t_lr: (float) specific learning rate for the transformer\n            t_layer_scale: (bool) Layer Scale for the transformer\n            t_gelu: (bool) activations of the transformer are GeLU if True, ReLU otherwise\n            t_weight_pos_embed: (float) weighting of the positional embedding\n            t_cape_mean_normalize: (bool) if t_emb=\"cape\", normalisation of positional embeddings\n                see: https://arxiv.org/abs/2106.03143\n            t_cape_augment: (bool) if t_emb=\"cape\", must be True during training and False\n                during the inference, see: https://arxiv.org/abs/2106.03143\n            t_cape_glob_loc_scale: (list of 3 
floats) if t_emb=\"cape\", CAPE parameters\n                see: https://arxiv.org/abs/2106.03143\n            t_sparse_self_attn: (bool) if True, the self attentions are sparse\n            t_sparse_cross_attn: (bool) if True, the cross-attentions are sparse (don't use it\n                unless you designed really specific masks)\n            t_mask_type: (str) can be \"diag\", \"jmask\", \"random\", \"global\" or any combination\n                with '_' between, e.g. \"diag_jmask_random\" (note that this is permutation\n                invariant, i.e. \"diag_jmask_random\" is equivalent to \"jmask_random_diag\")\n            t_mask_random_seed: (int) if \"random\" is in t_mask_type, controls the seed\n                that generates the random part of the mask\n            t_sparse_attn_window: (int) if \"diag\" is in t_mask_type, for a query (i), and\n                a key (j), the mask is True if |i-j|<=t_sparse_attn_window\n            t_global_window: (int) if \"global\" is in t_mask_type, mask[:t_global_window, :]\n                and mask[:, :t_global_window] will be True\n            t_sparsity: (float) if \"random\" is in t_mask_type, t_sparsity is the sparsity\n                level of the random part of the mask.\n            t_cross_first: (bool) if True cross attention is the first layer of the\n                transformer (False seems to be better)\n            rescale: weight rescaling trick\n            use_train_segment: (bool) if True, the actual size that is used during the\n                training is used during inference.\n        \"\"\"\n        super().__init__()\n        self.cac = cac\n        self.wiener_residual = wiener_residual\n        self.audio_channels = audio_channels\n        self.sources = sources\n        self.kernel_size = kernel_size\n        self.context = context\n        self.stride = stride\n        self.depth = depth\n        self.bottom_channels = bottom_channels\n        self.channels = channels\n        
self.samplerate = samplerate\n        self.segment = segment\n        self.use_train_segment = use_train_segment\n        self.nfft = nfft\n        self.hop_length = nfft // 4\n        self.wiener_iters = wiener_iters\n        self.end_iters = end_iters\n        self.freq_emb = None\n        assert wiener_iters == end_iters\n\n        self.encoder = nn.ModuleList()\n        self.decoder = nn.ModuleList()\n\n        self.tencoder = nn.ModuleList()\n        self.tdecoder = nn.ModuleList()\n\n        chin = audio_channels\n        chin_z = chin  # number of channels for the freq branch\n        if self.cac:\n            chin_z *= 2\n        chout = channels_time or channels\n        chout_z = channels\n        freqs = nfft // 2\n\n        for index in range(depth):\n            norm = index >= norm_starts\n            freq = freqs > 1\n            stri = stride\n            ker = kernel_size\n            if not freq:\n                assert freqs == 1\n                ker = time_stride * 2\n                stri = time_stride\n\n            pad = True\n            last_freq = False\n            if freq and freqs <= kernel_size:\n                ker = freqs\n                pad = False\n                last_freq = True\n\n            kw = {\n                \"kernel_size\": ker,\n                \"stride\": stri,\n                \"freq\": freq,\n                \"pad\": pad,\n                \"norm\": norm,\n                \"rewrite\": rewrite,\n                \"norm_groups\": norm_groups,\n                \"dconv_kw\": {\"depth\": dconv_depth, \"compress\": dconv_comp, \"init\": dconv_init, \"gelu\": True},\n            }\n            kwt = dict(kw)\n            kwt[\"freq\"] = 0\n            kwt[\"kernel_size\"] = kernel_size\n            kwt[\"stride\"] = stride\n            kwt[\"pad\"] = True\n            kw_dec = dict(kw)\n            multi = False\n            if multi_freqs and index < multi_freqs_depth:\n                multi = True\n                
kw_dec[\"context_freq\"] = False\n\n            if last_freq:\n                chout_z = max(chout, chout_z)\n                chout = chout_z\n\n            enc = HEncLayer(chin_z, chout_z, dconv=dconv_mode & 1, context=context_enc, **kw)\n            if freq:\n                tenc = HEncLayer(chin, chout, dconv=dconv_mode & 1, context=context_enc, empty=last_freq, **kwt)\n                self.tencoder.append(tenc)\n\n            if multi:\n                enc = MultiWrap(enc, multi_freqs)\n            self.encoder.append(enc)\n            if index == 0:\n                chin = self.audio_channels * len(self.sources)\n                chin_z = chin\n                if self.cac:\n                    chin_z *= 2\n            dec = HDecLayer(chout_z, chin_z, dconv=dconv_mode & 2, last=index == 0, context=context, **kw_dec)\n            if multi:\n                dec = MultiWrap(dec, multi_freqs)\n            if freq:\n                tdec = HDecLayer(chout, chin, dconv=dconv_mode & 2, empty=last_freq, last=index == 0, context=context, **kwt)\n                self.tdecoder.insert(0, tdec)\n            self.decoder.insert(0, dec)\n\n            chin = chout\n            chin_z = chout_z\n            chout = int(growth * chout)\n            chout_z = int(growth * chout_z)\n            if freq:\n                if freqs <= kernel_size:\n                    freqs = 1\n                else:\n                    freqs //= stride\n            if index == 0 and freq_emb:\n                self.freq_emb = ScaledEmbedding(freqs, chin_z, smooth=emb_smooth, scale=emb_scale)\n                self.freq_emb_scale = freq_emb\n\n        if rescale:\n            rescale_module(self, reference=rescale)\n\n        transformer_channels = channels * growth ** (depth - 1)\n        if bottom_channels:\n            self.channel_upsampler = nn.Conv1d(transformer_channels, bottom_channels, 1)\n            self.channel_downsampler = nn.Conv1d(bottom_channels, transformer_channels, 1)\n            
self.channel_upsampler_t = nn.Conv1d(transformer_channels, bottom_channels, 1)\n            self.channel_downsampler_t = nn.Conv1d(bottom_channels, transformer_channels, 1)\n\n            transformer_channels = bottom_channels\n\n        if t_layers > 0:\n            self.crosstransformer = CrossTransformerEncoder(\n                dim=transformer_channels,\n                emb=t_emb,\n                hidden_scale=t_hidden_scale,\n                num_heads=t_heads,\n                num_layers=t_layers,\n                cross_first=t_cross_first,\n                dropout=t_dropout,\n                max_positions=t_max_positions,\n                norm_in=t_norm_in,\n                norm_in_group=t_norm_in_group,\n                group_norm=t_group_norm,\n                norm_first=t_norm_first,\n                norm_out=t_norm_out,\n                max_period=t_max_period,\n                weight_decay=t_weight_decay,\n                lr=t_lr,\n                layer_scale=t_layer_scale,\n                gelu=t_gelu,\n                sin_random_shift=t_sin_random_shift,\n                weight_pos_embed=t_weight_pos_embed,\n                cape_mean_normalize=t_cape_mean_normalize,\n                cape_augment=t_cape_augment,\n                cape_glob_loc_scale=t_cape_glob_loc_scale,\n                sparse_self_attn=t_sparse_self_attn,\n                sparse_cross_attn=t_sparse_cross_attn,\n                mask_type=t_mask_type,\n                mask_random_seed=t_mask_random_seed,\n                sparse_attn_window=t_sparse_attn_window,\n                global_window=t_global_window,\n                sparsity=t_sparsity,\n                auto_sparsity=t_auto_sparsity,\n            )\n        else:\n            self.crosstransformer = None\n\n    def _spec(self, x):\n        hl = self.hop_length\n        nfft = self.nfft\n        x0 = x  # noqa\n\n        # We re-pad the signal in order to keep the property\n        # that the size of the output is exactly the 
size of the input\n        # divided by the stride (here hop_length), when divisible.\n        # This is achieved by padding by 1/4th of the kernel size (here nfft).\n        # which is not supported by torch.stft.\n        # Having all convolution operations follow this convention allow to easily\n        # align the time and frequency branches later on.\n        assert hl == nfft // 4\n        le = int(math.ceil(x.shape[-1] / hl))\n        pad = hl // 2 * 3\n        x = pad1d(x, (pad, pad + le * hl - x.shape[-1]), mode=\"reflect\")\n\n        z = spectro(x, nfft, hl)[..., :-1, :]\n        assert z.shape[-1] == le + 4, (z.shape, x.shape, le)\n        z = z[..., 2 : 2 + le]\n        return z\n\n    def _ispec(self, z, length=None, scale=0):\n        hl = self.hop_length // (4**scale)\n        z = F.pad(z, (0, 0, 0, 1))\n        z = F.pad(z, (2, 2))\n        pad = hl // 2 * 3\n        le = hl * int(math.ceil(length / hl)) + 2 * pad\n        x = ispectro(z, hl, length=le)\n        x = x[..., pad : pad + length]\n        return x\n\n    def _magnitude(self, z):\n        # return the magnitude of the spectrogram, except when cac is True,\n        # in which case we just move the complex dimension to the channel one.\n        if self.cac:\n            B, C, Fr, T = z.shape\n            m = torch.view_as_real(z).permute(0, 1, 4, 2, 3)\n            m = m.reshape(B, C * 2, Fr, T)\n        else:\n            m = z.abs()\n        return m\n\n    def _mask(self, z, m):\n        # Apply masking given the mixture spectrogram `z` and the estimated mask `m`.\n        # If `cac` is True, `m` is actually a full spectrogram and `z` is ignored.\n        niters = self.wiener_iters\n        if self.cac:\n            B, S, C, Fr, T = m.shape\n            out = m.view(B, S, -1, 2, Fr, T).permute(0, 1, 2, 4, 5, 3)\n            out = torch.view_as_complex(out.contiguous())\n            return out\n        if self.training:\n            niters = self.end_iters\n        if niters < 0:\n      
      z = z[:, None]\n            return z / (1e-8 + z.abs()) * m\n        else:\n            return self._wiener(m, z, niters)\n\n    def _wiener(self, mag_out, mix_stft, niters):\n        # apply wiener filtering from OpenUnmix.\n        init = mix_stft.dtype\n        wiener_win_len = 300\n        residual = self.wiener_residual\n\n        B, S, C, Fq, T = mag_out.shape\n        mag_out = mag_out.permute(0, 4, 3, 2, 1)\n        mix_stft = torch.view_as_real(mix_stft.permute(0, 3, 2, 1))\n\n        outs = []\n        for sample in range(B):\n            pos = 0\n            out = []\n            for pos in range(0, T, wiener_win_len):\n                frame = slice(pos, pos + wiener_win_len)\n                z_out = wiener(mag_out[sample, frame], mix_stft[sample, frame], niters, residual=residual)\n                out.append(z_out.transpose(-1, -2))\n            outs.append(torch.cat(out, dim=0))\n        out = torch.view_as_complex(torch.stack(outs, 0))\n        out = out.permute(0, 4, 3, 2, 1).contiguous()\n        if residual:\n            out = out[:, :-1]\n        assert list(out.shape) == [B, S, C, Fq, T]\n        return out.to(init)\n\n    def valid_length(self, length: int):\n        \"\"\"\n        Return a length that is appropriate for evaluation.\n        In our case, always return the training length, unless\n        it is smaller than the given length, in which case this\n        raises an error.\n        \"\"\"\n        if not self.use_train_segment:\n            return length\n        training_length = int(self.segment * self.samplerate)\n        if training_length < length:\n            raise ValueError(f\"Given length {length} is longer than \" f\"training length {training_length}\")\n        return training_length\n\n    def forward(self, mix):\n        length = mix.shape[-1]\n        length_pre_pad = None\n        if self.use_train_segment:\n            if self.training:\n                self.segment = Fraction(mix.shape[-1], self.samplerate)\n 
           else:\n                training_length = int(self.segment * self.samplerate)\n                if mix.shape[-1] < training_length:\n                    length_pre_pad = mix.shape[-1]\n                    mix = F.pad(mix, (0, training_length - length_pre_pad))\n        z = self._spec(mix)\n        mag = self._magnitude(z).to(mix.device)\n        x = mag\n\n        B, C, Fq, T = x.shape\n\n        # unlike previous Demucs, we always normalize because it is easier.\n        mean = x.mean(dim=(1, 2, 3), keepdim=True)\n        std = x.std(dim=(1, 2, 3), keepdim=True)\n        x = (x - mean) / (1e-5 + std)\n        # x will be the freq. branch input.\n\n        # Prepare the time branch input.\n        xt = mix\n        meant = xt.mean(dim=(1, 2), keepdim=True)\n        stdt = xt.std(dim=(1, 2), keepdim=True)\n        xt = (xt - meant) / (1e-5 + stdt)\n\n        # okay, this is a giant mess I know...\n        saved = []  # skip connections, freq.\n        saved_t = []  # skip connections, time.\n        lengths = []  # saved lengths to properly remove padding, freq branch.\n        lengths_t = []  # saved lengths for time branch.\n        for idx, encode in enumerate(self.encoder):\n            lengths.append(x.shape[-1])\n            inject = None\n            if idx < len(self.tencoder):\n                # we have not yet merged branches.\n                lengths_t.append(xt.shape[-1])\n                tenc = self.tencoder[idx]\n                xt = tenc(xt)\n                if not tenc.empty:\n                    # save for skip connection\n                    saved_t.append(xt)\n                else:\n                    # tenc contains just the first conv., so that now time and freq.\n                    # branches have the same shape and can be merged.\n                    inject = xt\n            x = encode(x, inject)\n            if idx == 0 and self.freq_emb is not None:\n                # add frequency embedding to allow for non equivariant 
convolutions\n                # over the frequency axis.\n                frs = torch.arange(x.shape[-2], device=x.device)\n                emb = self.freq_emb(frs).t()[None, :, :, None].expand_as(x)\n                x = x + self.freq_emb_scale * emb\n\n            saved.append(x)\n        if self.crosstransformer:\n            if self.bottom_channels:\n                b, c, f, t = x.shape\n                x = rearrange(x, \"b c f t-> b c (f t)\")\n                x = self.channel_upsampler(x)\n                x = rearrange(x, \"b c (f t)-> b c f t\", f=f)\n                xt = self.channel_upsampler_t(xt)\n\n            x, xt = self.crosstransformer(x, xt)\n\n            if self.bottom_channels:\n                x = rearrange(x, \"b c f t-> b c (f t)\")\n                x = self.channel_downsampler(x)\n                x = rearrange(x, \"b c (f t)-> b c f t\", f=f)\n                xt = self.channel_downsampler_t(xt)\n\n        for idx, decode in enumerate(self.decoder):\n            skip = saved.pop(-1)\n            x, pre = decode(x, skip, lengths.pop(-1))\n            # `pre` contains the output just before final transposed convolution,\n            # which is used when the freq. 
and time branch separate.\n\n            offset = self.depth - len(self.tdecoder)\n            if idx >= offset:\n                tdec = self.tdecoder[idx - offset]\n                length_t = lengths_t.pop(-1)\n                if tdec.empty:\n                    assert pre.shape[2] == 1, pre.shape\n                    pre = pre[:, :, 0]\n                    xt, _ = tdec(pre, None, length_t)\n                else:\n                    skip = saved_t.pop(-1)\n                    xt, _ = tdec(xt, skip, length_t)\n\n        # Let's make sure we used all stored skip connections.\n        assert len(saved) == 0\n        assert len(lengths_t) == 0\n        assert len(saved_t) == 0\n\n        S = len(self.sources)\n        x = x.view(B, S, -1, Fq, T)\n        x = x * std[:, None] + mean[:, None]\n\n        # to cpu as non-cuda GPUs don't support complex numbers\n        # demucs issue #435 ##432\n        # NOTE: in this case z already is on cpu\n        # TODO: remove this when mps supports complex numbers\n\n        device_type = x.device.type\n        device_load = f\"{device_type}:{x.device.index}\" if not device_type == \"mps\" else device_type\n        x_is_other_gpu = not device_type in [\"cuda\", \"cpu\"]\n\n        if x_is_other_gpu:\n            x = x.cpu()\n\n        zout = self._mask(z, x)\n        if self.use_train_segment:\n            if self.training:\n                x = self._ispec(zout, length)\n            else:\n                x = self._ispec(zout, training_length)\n        else:\n            x = self._ispec(zout, length)\n\n        # back to other device\n        if x_is_other_gpu:\n            x = x.to(device_load)\n\n        if self.use_train_segment:\n            if self.training:\n                xt = xt.view(B, S, -1, length)\n            else:\n                xt = xt.view(B, S, -1, training_length)\n        else:\n            xt = xt.view(B, S, -1, length)\n        xt = xt * stdt[:, None] + meant[:, None]\n        x = xt + x\n        if 
length_pre_pad:\n            x = x[..., :length_pre_pad]\n        return x\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/model.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nimport math\n\nimport torch as th\nfrom torch import nn\n\nfrom .utils import capture_init, center_trim\n\n\nclass BLSTM(nn.Module):\n    def __init__(self, dim, layers=1):\n        super().__init__()\n        self.lstm = nn.LSTM(bidirectional=True, num_layers=layers, hidden_size=dim, input_size=dim)\n        self.linear = nn.Linear(2 * dim, dim)\n\n    def forward(self, x):\n        x = x.permute(2, 0, 1)\n        x = self.lstm(x)[0]\n        x = self.linear(x)\n        x = x.permute(1, 2, 0)\n        return x\n\n\ndef rescale_conv(conv, reference):\n    std = conv.weight.std().detach()\n    scale = (std / reference) ** 0.5\n    conv.weight.data /= scale\n    if conv.bias is not None:\n        conv.bias.data /= scale\n\n\ndef rescale_module(module, reference):\n    for sub in module.modules():\n        if isinstance(sub, (nn.Conv1d, nn.ConvTranspose1d)):\n            rescale_conv(sub, reference)\n\n\ndef upsample(x, stride):\n    \"\"\"\n    Linear upsampling, the output will be `stride` times longer.\n    \"\"\"\n    batch, channels, time = x.size()\n    weight = th.arange(stride, device=x.device, dtype=th.float) / stride\n    x = x.view(batch, channels, time, 1)\n    out = x[..., :-1, :] * (1 - weight) + x[..., 1:, :] * weight\n    return out.reshape(batch, channels, -1)\n\n\ndef downsample(x, stride):\n    \"\"\"\n    Downsample x by decimation.\n    \"\"\"\n    return x[:, :, ::stride]\n\n\nclass Demucs(nn.Module):\n    @capture_init\n    def __init__(\n        self, sources=4, audio_channels=2, channels=64, depth=6, rewrite=True, glu=True, upsample=False, rescale=0.1, kernel_size=8, stride=4, growth=2.0, lstm_layers=2, context=3, samplerate=44100\n    ):\n        \"\"\"\n        Args:\n            sources (int): number of sources to separate\n  
          audio_channels (int): stereo or mono\n            channels (int): first convolution channels\n            depth (int): number of encoder/decoder layers\n            rewrite (bool): add 1x1 convolution to each encoder layer\n                and a convolution to each decoder layer.\n                For the decoder layer, `context` gives the kernel size.\n            glu (bool): use glu instead of ReLU\n            upsample (bool): use linear upsampling with convolutions\n                Wave-U-Net style, instead of transposed convolutions\n            rescale (int): rescale initial weights of convolutions\n                to get their standard deviation closer to `rescale`\n            kernel_size (int): kernel size for convolutions\n            stride (int): stride for convolutions\n            growth (float): multiply (resp divide) number of channels by that\n                for each layer of the encoder (resp decoder)\n            lstm_layers (int): number of lstm layers, 0 = no lstm\n            context (int): kernel size of the convolution in the\n                decoder before the transposed convolution. 
If > 1,\n                will provide some context from neighboring time\n                steps.\n        \"\"\"\n\n        super().__init__()\n        self.audio_channels = audio_channels\n        self.sources = sources\n        self.kernel_size = kernel_size\n        self.context = context\n        self.stride = stride\n        self.depth = depth\n        self.upsample = upsample\n        self.channels = channels\n        self.samplerate = samplerate\n\n        self.encoder = nn.ModuleList()\n        self.decoder = nn.ModuleList()\n\n        self.final = None\n        if upsample:\n            self.final = nn.Conv1d(channels + audio_channels, sources * audio_channels, 1)\n            stride = 1\n\n        if glu:\n            activation = nn.GLU(dim=1)\n            ch_scale = 2\n        else:\n            activation = nn.ReLU()\n            ch_scale = 1\n        in_channels = audio_channels\n        for index in range(depth):\n            encode = []\n            encode += [nn.Conv1d(in_channels, channels, kernel_size, stride), nn.ReLU()]\n            if rewrite:\n                encode += [nn.Conv1d(channels, ch_scale * channels, 1), activation]\n            self.encoder.append(nn.Sequential(*encode))\n\n            decode = []\n            if index > 0:\n                out_channels = in_channels\n            else:\n                if upsample:\n                    out_channels = channels\n                else:\n                    out_channels = sources * audio_channels\n            if rewrite:\n                decode += [nn.Conv1d(channels, ch_scale * channels, context), activation]\n            if upsample:\n                decode += [nn.Conv1d(channels, out_channels, kernel_size, stride=1)]\n            else:\n                decode += [nn.ConvTranspose1d(channels, out_channels, kernel_size, stride)]\n            if index > 0:\n                decode.append(nn.ReLU())\n            self.decoder.insert(0, nn.Sequential(*decode))\n            in_channels = 
channels\n            channels = int(growth * channels)\n\n        channels = in_channels\n\n        if lstm_layers:\n            self.lstm = BLSTM(channels, lstm_layers)\n        else:\n            self.lstm = None\n\n        if rescale:\n            rescale_module(self, reference=rescale)\n\n    def valid_length(self, length):\n        \"\"\"\n        Return the nearest valid length to use with the model so that\n        there are no time steps left over in the convolutions, e.g. for all\n        layers, (size of the input - kernel_size) % stride = 0.\n\n        If the mixture has a valid length, the estimated sources\n        will have exactly the same length when context = 1. If context > 1,\n        the two signals can be center trimmed to match.\n\n        For training, extracts should have a valid length. For evaluation\n        on full tracks we recommend passing `pad = True` to :method:`forward`.\n        \"\"\"\n        for _ in range(self.depth):\n            if self.upsample:\n                length = math.ceil(length / self.stride) + self.kernel_size - 1\n            else:\n                length = math.ceil((length - self.kernel_size) / self.stride) + 1\n            length = max(1, length)\n            length += self.context - 1\n        for _ in range(self.depth):\n            if self.upsample:\n                length = length * self.stride + self.kernel_size - 1\n            else:\n                length = (length - 1) * self.stride + self.kernel_size\n\n        return int(length)\n\n    def forward(self, mix):\n        x = mix\n        saved = [x]\n        for encode in self.encoder:\n            x = encode(x)\n            saved.append(x)\n            if self.upsample:\n                x = downsample(x, self.stride)\n        if self.lstm:\n            x = self.lstm(x)\n        for decode in self.decoder:\n            if self.upsample:\n                x = upsample(x, stride=self.stride)\n            skip = center_trim(saved.pop(-1), x)\n            x = x 
+ skip\n            x = decode(x)\n        if self.final:\n            skip = center_trim(saved.pop(-1), x)\n            x = th.cat([x, skip], dim=1)\n            x = self.final(x)\n\n        x = x.view(x.size(0), self.sources, self.audio_channels, x.size(-1))\n        return x\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/model_v2.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nimport math\n\nimport julius\nfrom torch import nn\nfrom .tasnet_v2 import ConvTasNet\n\nfrom .utils import capture_init, center_trim\n\n\nclass BLSTM(nn.Module):\n    def __init__(self, dim, layers=1):\n        super().__init__()\n        self.lstm = nn.LSTM(bidirectional=True, num_layers=layers, hidden_size=dim, input_size=dim)\n        self.linear = nn.Linear(2 * dim, dim)\n\n    def forward(self, x):\n        x = x.permute(2, 0, 1)\n        x = self.lstm(x)[0]\n        x = self.linear(x)\n        x = x.permute(1, 2, 0)\n        return x\n\n\ndef rescale_conv(conv, reference):\n    std = conv.weight.std().detach()\n    scale = (std / reference) ** 0.5\n    conv.weight.data /= scale\n    if conv.bias is not None:\n        conv.bias.data /= scale\n\n\ndef rescale_module(module, reference):\n    for sub in module.modules():\n        if isinstance(sub, (nn.Conv1d, nn.ConvTranspose1d)):\n            rescale_conv(sub, reference)\n\n\ndef auto_load_demucs_model_v2(sources, demucs_model_name):\n\n    if \"48\" in demucs_model_name:\n        channels = 48\n    elif \"unittest\" in demucs_model_name:\n        channels = 4\n    else:\n        channels = 64\n\n    if \"tasnet\" in demucs_model_name:\n        init_demucs_model = ConvTasNet(sources, X=10)\n    else:\n        init_demucs_model = Demucs(sources, channels=channels)\n\n    return init_demucs_model\n\n\nclass Demucs(nn.Module):\n    @capture_init\n    def __init__(\n        self,\n        sources,\n        audio_channels=2,\n        channels=64,\n        depth=6,\n        rewrite=True,\n        glu=True,\n        rescale=0.1,\n        resample=True,\n        kernel_size=8,\n        stride=4,\n        growth=2.0,\n        lstm_layers=2,\n        context=3,\n        normalize=False,\n        
samplerate=44100,\n        segment_length=4 * 10 * 44100,\n    ):\n        \"\"\"\n        Args:\n            sources (list[str]): list of source names\n            audio_channels (int): stereo or mono\n            channels (int): first convolution channels\n            depth (int): number of encoder/decoder layers\n            rewrite (bool): add 1x1 convolution to each encoder layer\n                and a convolution to each decoder layer.\n                For the decoder layer, `context` gives the kernel size.\n            glu (bool): use glu instead of ReLU\n            resample (bool): upsample x2 the input and downsample /2 the output.\n            rescale (float): rescale initial weights of convolutions\n                to get their standard deviation closer to `rescale`\n            kernel_size (int): kernel size for convolutions\n            stride (int): stride for convolutions\n            growth (float): multiply (resp divide) number of channels by that\n                for each layer of the encoder (resp decoder)\n            lstm_layers (int): number of lstm layers, 0 = no lstm\n            context (int): kernel size of the convolution in the\n                decoder before the transposed convolution. If > 1,\n                will provide some context from neighboring time\n                steps.\n            normalize (bool): normalize the input using the mean and standard\n                deviation of its mono downmix, and undo it on the output.\n            samplerate (int): stored as meta information for easing\n                future evaluations of the model. 
Length of the segments on which\n                the model was trained.\n        \"\"\"\n\n        super().__init__()\n        self.audio_channels = audio_channels\n        self.sources = sources\n        self.kernel_size = kernel_size\n        self.context = context\n        self.stride = stride\n        self.depth = depth\n        self.resample = resample\n        self.channels = channels\n        self.normalize = normalize\n        self.samplerate = samplerate\n        self.segment_length = segment_length\n\n        self.encoder = nn.ModuleList()\n        self.decoder = nn.ModuleList()\n\n        if glu:\n            activation = nn.GLU(dim=1)\n            ch_scale = 2\n        else:\n            activation = nn.ReLU()\n            ch_scale = 1\n        in_channels = audio_channels\n        for index in range(depth):\n            encode = []\n            encode += [nn.Conv1d(in_channels, channels, kernel_size, stride), nn.ReLU()]\n            if rewrite:\n                encode += [nn.Conv1d(channels, ch_scale * channels, 1), activation]\n            self.encoder.append(nn.Sequential(*encode))\n\n            decode = []\n            if index > 0:\n                out_channels = in_channels\n            else:\n                out_channels = len(self.sources) * audio_channels\n            if rewrite:\n                decode += [nn.Conv1d(channels, ch_scale * channels, context), activation]\n            decode += [nn.ConvTranspose1d(channels, out_channels, kernel_size, stride)]\n            if index > 0:\n                decode.append(nn.ReLU())\n            self.decoder.insert(0, nn.Sequential(*decode))\n            in_channels = channels\n            channels = int(growth * channels)\n\n        channels = in_channels\n\n        if lstm_layers:\n            self.lstm = BLSTM(channels, lstm_layers)\n        else:\n            self.lstm = None\n\n        if rescale:\n            rescale_module(self, reference=rescale)\n\n    def valid_length(self, length):\n        
\"\"\"\n        Return the nearest valid length to use with the model so that\n        there is no time steps left over in a convolutions, e.g. for all\n        layers, size of the input - kernel_size % stride = 0.\n\n        If the mixture has a valid length, the estimated sources\n        will have exactly the same length when context = 1. If context > 1,\n        the two signals can be center trimmed to match.\n\n        For training, extracts should have a valid length.For evaluation\n        on full tracks we recommend passing `pad = True` to :method:`forward`.\n        \"\"\"\n        if self.resample:\n            length *= 2\n        for _ in range(self.depth):\n            length = math.ceil((length - self.kernel_size) / self.stride) + 1\n            length = max(1, length)\n            length += self.context - 1\n        for _ in range(self.depth):\n            length = (length - 1) * self.stride + self.kernel_size\n\n        if self.resample:\n            length = math.ceil(length / 2)\n        return int(length)\n\n    def forward(self, mix):\n        x = mix\n\n        if self.normalize:\n            mono = mix.mean(dim=1, keepdim=True)\n            mean = mono.mean(dim=-1, keepdim=True)\n            std = mono.std(dim=-1, keepdim=True)\n        else:\n            mean = 0\n            std = 1\n\n        x = (x - mean) / (1e-5 + std)\n\n        if self.resample:\n            x = julius.resample_frac(x, 1, 2)\n\n        saved = []\n        for encode in self.encoder:\n            x = encode(x)\n            saved.append(x)\n        if self.lstm:\n            x = self.lstm(x)\n        for decode in self.decoder:\n            skip = center_trim(saved.pop(-1), x)\n            x = x + skip\n            x = decode(x)\n\n        if self.resample:\n            x = julius.resample_frac(x, 2, 1)\n        x = x * std + mean\n        x = x.view(x.size(0), len(self.sources), self.audio_channels, x.size(-1))\n        return x\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/pretrained.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"Loading pretrained models.\n\"\"\"\n\nimport logging\nfrom pathlib import Path\nimport typing as tp\n\n# from dora.log import fatal\n\nimport logging\n\nfrom diffq import DiffQuantizer\nimport torch.hub\n\nfrom .model import Demucs\nfrom .tasnet_v2 import ConvTasNet\nfrom .utils import set_state\n\nfrom .hdemucs import HDemucs\nfrom .repo import RemoteRepo, LocalRepo, ModelOnlyRepo, BagOnlyRepo, AnyModelRepo, ModelLoadingError  # noqa\n\nlogger = logging.getLogger(__name__)\nROOT_URL = \"https://dl.fbaipublicfiles.com/demucs/mdx_final/\"\nREMOTE_ROOT = Path(__file__).parent / \"remote\"\n\nSOURCES = [\"drums\", \"bass\", \"other\", \"vocals\"]\n\n\ndef demucs_unittest():\n    model = HDemucs(channels=4, sources=SOURCES)\n    return model\n\n\ndef add_model_flags(parser):\n    group = parser.add_mutually_exclusive_group(required=False)\n    group.add_argument(\"-s\", \"--sig\", help=\"Locally trained XP signature.\")\n    group.add_argument(\"-n\", \"--name\", default=\"mdx_extra_q\", help=\"Pretrained model name or signature. 
Default is mdx_extra_q.\")\n    parser.add_argument(\"--repo\", type=Path, help=\"Folder containing all pre-trained models for use with -n.\")\n\n\ndef _parse_remote_files(remote_file_list) -> tp.Dict[str, str]:\n    root: str = \"\"\n    models: tp.Dict[str, str] = {}\n    for line in remote_file_list.read_text().split(\"\\n\"):\n        line = line.strip()\n        if line.startswith(\"#\"):\n            continue\n        elif line.startswith(\"root:\"):\n            root = line.split(\":\", 1)[1].strip()\n        else:\n            sig = line.split(\"-\", 1)[0]\n            assert sig not in models\n            models[sig] = ROOT_URL + root + line\n    return models\n\n\ndef get_model(name: str, repo: tp.Optional[Path] = None):\n    \"\"\"`name` must be a bag of models name or a pretrained signature\n    from the remote AWS model repo or the specified local repo if `repo` is not None.\n    \"\"\"\n    if name == \"demucs_unittest\":\n        return demucs_unittest()\n    model_repo: ModelOnlyRepo\n    if repo is None:\n        models = _parse_remote_files(REMOTE_ROOT / \"files.txt\")\n        model_repo = RemoteRepo(models)\n        bag_repo = BagOnlyRepo(REMOTE_ROOT, model_repo)\n    else:\n        if not repo.is_dir():\n            # `fatal` from dora.log is not imported here, so raise directly\n            raise ModelLoadingError(f\"{repo} must exist and be a directory.\")\n        model_repo = LocalRepo(repo)\n        bag_repo = BagOnlyRepo(repo, model_repo)\n    any_repo = AnyModelRepo(model_repo, bag_repo)\n    model = any_repo.get_model(name)\n    model.eval()\n    return model\n\n\ndef get_model_from_args(args):\n    \"\"\"\n    Load local model package or pre-trained model.\n    \"\"\"\n    return get_model(name=args.name, repo=args.repo)\n\n\nlogger = logging.getLogger(__name__)\nROOT = \"https://dl.fbaipublicfiles.com/demucs/v3.0/\"\n\nPRETRAINED_MODELS = {\n    \"demucs\": \"e07c671f\",\n    \"demucs48_hq\": \"28a1282c\",\n    \"demucs_extra\": \"3646af93\",\n    \"demucs_quantized\": \"07afea75\",\n    \"tasnet\": \"beb46fac\",\n    
\"tasnet_extra\": \"df3777b2\",\n    \"demucs_unittest\": \"09ebc15f\",\n}\n\nSOURCES = [\"drums\", \"bass\", \"other\", \"vocals\"]\n\n\ndef get_url(name):\n    sig = PRETRAINED_MODELS[name]\n    return ROOT + name + \"-\" + sig[:8] + \".th\"\n\n\ndef is_pretrained(name):\n    return name in PRETRAINED_MODELS\n\n\ndef load_pretrained(name):\n    if name == \"demucs\":\n        return demucs(pretrained=True)\n    elif name == \"demucs48_hq\":\n        return demucs(pretrained=True, hq=True, channels=48)\n    elif name == \"demucs_extra\":\n        return demucs(pretrained=True, extra=True)\n    elif name == \"demucs_quantized\":\n        return demucs(pretrained=True, quantized=True)\n    elif name == \"demucs_unittest\":\n        return demucs_unittest(pretrained=True)\n    elif name == \"tasnet\":\n        return tasnet(pretrained=True)\n    elif name == \"tasnet_extra\":\n        return tasnet(pretrained=True, extra=True)\n    else:\n        raise ValueError(f\"Invalid pretrained name {name}\")\n\n\ndef _load_state(name, model, quantizer=None):\n    url = get_url(name)\n    state = torch.hub.load_state_dict_from_url(url, map_location=\"cpu\", check_hash=True)\n    set_state(model, quantizer, state)\n    if quantizer:\n        quantizer.detach()\n\n\ndef demucs_unittest(pretrained=True):\n    model = Demucs(channels=4, sources=SOURCES)\n    if pretrained:\n        _load_state(\"demucs_unittest\", model)\n    return model\n\n\ndef demucs(pretrained=True, extra=False, quantized=False, hq=False, channels=64):\n    if not pretrained and (extra or quantized or hq):\n        raise ValueError(\"if extra or quantized is True, pretrained must be True.\")\n    model = Demucs(sources=SOURCES, channels=channels)\n    if pretrained:\n        name = \"demucs\"\n        if channels != 64:\n            name += str(channels)\n        quantizer = None\n        if sum([extra, quantized, hq]) > 1:\n            raise ValueError(\"Only one of extra, quantized, hq, can be True.\")\n    
    if quantized:\n            quantizer = DiffQuantizer(model, group_size=8, min_size=1)\n            name += \"_quantized\"\n        if extra:\n            name += \"_extra\"\n        if hq:\n            name += \"_hq\"\n        _load_state(name, model, quantizer)\n    return model\n\n\ndef tasnet(pretrained=True, extra=False):\n    if not pretrained and extra:\n        raise ValueError(\"if extra is True, pretrained must be True.\")\n    model = ConvTasNet(X=10, sources=SOURCES)\n    if pretrained:\n        name = \"tasnet\"\n        if extra:\n            name = \"tasnet_extra\"\n        _load_state(name, model)\n    return model\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/repo.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"Represents a model repository, including pre-trained models and bags of models.\nA repo can either be the main remote repository stored in AWS, or a local repository\nwith your own models.\n\"\"\"\n\nfrom hashlib import sha256\nfrom pathlib import Path\nimport typing as tp\n\nimport torch\nimport yaml\n\nfrom .apply import BagOfModels, Model\nfrom .states import load_model\n\n\nAnyModel = tp.Union[Model, BagOfModels]\n\n\nclass ModelLoadingError(RuntimeError):\n    pass\n\n\ndef check_checksum(path: Path, checksum: str):\n    sha = sha256()\n    with open(path, \"rb\") as file:\n        while True:\n            buf = file.read(2**20)\n            if not buf:\n                break\n            sha.update(buf)\n    actual_checksum = sha.hexdigest()[: len(checksum)]\n    if actual_checksum != checksum:\n        raise ModelLoadingError(f\"Invalid checksum for file {path}, \" f\"expected {checksum} but got {actual_checksum}\")\n\n\nclass ModelOnlyRepo:\n    \"\"\"Base class for all model only repos.\"\"\"\n\n    def has_model(self, sig: str) -> bool:\n        raise NotImplementedError()\n\n    def get_model(self, sig: str) -> Model:\n        raise NotImplementedError()\n\n\nclass RemoteRepo(ModelOnlyRepo):\n    def __init__(self, models: tp.Dict[str, str]):\n        self._models = models\n\n    def has_model(self, sig: str) -> bool:\n        return sig in self._models\n\n    def get_model(self, sig: str) -> Model:\n        try:\n            url = self._models[sig]\n        except KeyError:\n            raise ModelLoadingError(f\"Could not find a pre-trained model with signature {sig}.\")\n        pkg = torch.hub.load_state_dict_from_url(url, map_location=\"cpu\", check_hash=True)\n        return load_model(pkg)\n\n\nclass LocalRepo(ModelOnlyRepo):\n  
  def __init__(self, root: Path):\n        self.root = root\n        self.scan()\n\n    def scan(self):\n        self._models = {}\n        self._checksums = {}\n        for file in self.root.iterdir():\n            if file.suffix == \".th\":\n                if \"-\" in file.stem:\n                    xp_sig, checksum = file.stem.split(\"-\")\n                    self._checksums[xp_sig] = checksum\n                else:\n                    xp_sig = file.stem\n                if xp_sig in self._models:\n                    print(\"Whats xp? \", xp_sig)\n                    raise ModelLoadingError(f\"Duplicate pre-trained model exist for signature {xp_sig}. \" \"Please delete all but one.\")\n                self._models[xp_sig] = file\n\n    def has_model(self, sig: str) -> bool:\n        return sig in self._models\n\n    def get_model(self, sig: str) -> Model:\n        try:\n            file = self._models[sig]\n        except KeyError:\n            raise ModelLoadingError(f\"Could not find pre-trained model with signature {sig}.\")\n        if sig in self._checksums:\n            check_checksum(file, self._checksums[sig])\n        return load_model(file)\n\n\nclass BagOnlyRepo:\n    \"\"\"Handles only YAML files containing bag of models, leaving the actual\n    model loading to some Repo.\n    \"\"\"\n\n    def __init__(self, root: Path, model_repo: ModelOnlyRepo):\n        self.root = root\n        self.model_repo = model_repo\n        self.scan()\n\n    def scan(self):\n        self._bags = {}\n        for file in self.root.iterdir():\n            if file.suffix == \".yaml\":\n                self._bags[file.stem] = file\n\n    def has_model(self, name: str) -> bool:\n        return name in self._bags\n\n    def get_model(self, name: str) -> BagOfModels:\n        try:\n            yaml_file = self._bags[name]\n        except KeyError:\n            raise ModelLoadingError(f\"{name} is neither a single pre-trained model or \" \"a bag of models.\")\n        bag = 
yaml.safe_load(open(yaml_file))\n        signatures = bag[\"models\"]\n        models = [self.model_repo.get_model(sig) for sig in signatures]\n        weights = bag.get(\"weights\")\n        segment = bag.get(\"segment\")\n        return BagOfModels(models, weights, segment)\n\n\nclass AnyModelRepo:\n    def __init__(self, model_repo: ModelOnlyRepo, bag_repo: BagOnlyRepo):\n        self.model_repo = model_repo\n        self.bag_repo = bag_repo\n\n    def has_model(self, name_or_sig: str) -> bool:\n        return self.model_repo.has_model(name_or_sig) or self.bag_repo.has_model(name_or_sig)\n\n    def get_model(self, name_or_sig: str) -> AnyModel:\n        # print('name_or_sig: ', name_or_sig)\n        if self.model_repo.has_model(name_or_sig):\n            return self.model_repo.get_model(name_or_sig)\n        else:\n            return self.bag_repo.get_model(name_or_sig)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/spec.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"Conveniance wrapper to perform STFT and iSTFT\"\"\"\n\nimport torch as th\n\n\ndef spectro(x, n_fft=512, hop_length=None, pad=0):\n    *other, length = x.shape\n    x = x.reshape(-1, length)\n\n    device_type = x.device.type\n    is_other_gpu = not device_type in [\"cuda\", \"cpu\"]\n\n    if is_other_gpu:\n        x = x.cpu()\n    z = th.stft(x, n_fft * (1 + pad), hop_length or n_fft // 4, window=th.hann_window(n_fft).to(x), win_length=n_fft, normalized=True, center=True, return_complex=True, pad_mode=\"reflect\")\n    _, freqs, frame = z.shape\n    return z.view(*other, freqs, frame)\n\n\ndef ispectro(z, hop_length=None, length=None, pad=0):\n    *other, freqs, frames = z.shape\n    n_fft = 2 * freqs - 2\n    z = z.view(-1, freqs, frames)\n    win_length = n_fft // (1 + pad)\n\n    device_type = z.device.type\n    is_other_gpu = not device_type in [\"cuda\", \"cpu\"]\n\n    if is_other_gpu:\n        z = z.cpu()\n    x = th.istft(z, n_fft, hop_length, window=th.hann_window(win_length).to(z.real), win_length=win_length, normalized=True, length=length, center=True)\n    _, length = x.shape\n    return x.view(*other, length)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/states.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\"\"\"\nUtilities to save and load models.\n\"\"\"\nfrom contextlib import contextmanager\n\nimport functools\nimport hashlib\nimport inspect\nimport io\nfrom pathlib import Path\nimport warnings\n\nfrom diffq import DiffQuantizer, UniformQuantizer, restore_quantized_state\nimport torch\n\n\ndef get_quantizer(model, args, optimizer=None):\n    \"\"\"Return the quantizer given the XP quantization args.\"\"\"\n    quantizer = None\n    if args.diffq:\n        quantizer = DiffQuantizer(model, min_size=args.min_size, group_size=args.group_size)\n        if optimizer is not None:\n            quantizer.setup_optimizer(optimizer)\n    elif args.qat:\n        quantizer = UniformQuantizer(model, bits=args.qat, min_size=args.min_size)\n    return quantizer\n\n\ndef load_model(path_or_package, strict=False):\n    \"\"\"Load a model from the given serialized model, either given as a dict (already loaded)\n    or a path to a file on disk.\"\"\"\n    if isinstance(path_or_package, dict):\n        package = path_or_package\n    elif isinstance(path_or_package, (str, Path)):\n        with warnings.catch_warnings():\n            warnings.simplefilter(\"ignore\")\n            path = path_or_package\n            package = torch.load(path, \"cpu\", weights_only=False)\n    else:\n        raise ValueError(f\"Invalid type for {path_or_package}.\")\n\n    klass = package[\"klass\"]\n    args = package[\"args\"]\n    kwargs = package[\"kwargs\"]\n\n    if strict:\n        model = klass(*args, **kwargs)\n    else:\n        sig = inspect.signature(klass)\n        for key in list(kwargs):\n            if key not in sig.parameters:\n                warnings.warn(\"Dropping inexistant parameter \" + key)\n                del kwargs[key]\n        model = klass(*args, **kwargs)\n\n 
   state = package[\"state\"]\n\n    set_state(model, state)\n    return model\n\n\ndef get_state(model, quantizer, half=False):\n    \"\"\"Get the state from a model, potentially with quantization applied.\n    If `half` is True, model are stored as half precision, which shouldn't impact performance\n    but half the state size.\"\"\"\n    if quantizer is None:\n        dtype = torch.half if half else None\n        state = {k: p.data.to(device=\"cpu\", dtype=dtype) for k, p in model.state_dict().items()}\n    else:\n        state = quantizer.get_quantized_state()\n        state[\"__quantized\"] = True\n    return state\n\n\ndef set_state(model, state, quantizer=None):\n    \"\"\"Set the state on a given model.\"\"\"\n    if state.get(\"__quantized\"):\n        if quantizer is not None:\n            quantizer.restore_quantized_state(model, state[\"quantized\"])\n        else:\n            restore_quantized_state(model, state)\n    else:\n        model.load_state_dict(state)\n    return state\n\n\ndef save_with_checksum(content, path):\n    \"\"\"Save the given value on disk, along with a sha256 hash.\n    Should be used with the output of either `serialize_model` or `get_state`.\"\"\"\n    buf = io.BytesIO()\n    torch.save(content, buf)\n    sig = hashlib.sha256(buf.getvalue()).hexdigest()[:8]\n\n    path = path.parent / (path.stem + \"-\" + sig + path.suffix)\n    path.write_bytes(buf.getvalue())\n\n\ndef copy_state(state):\n    return {k: v.cpu().clone() for k, v in state.items()}\n\n\n@contextmanager\ndef swap_state(model, state):\n    \"\"\"\n    Context manager that swaps the state of a model, e.g:\n\n        # model is in old state\n        with swap_state(model, new_state):\n            # model in new state\n        # model back to old state\n    \"\"\"\n    old_state = copy_state(model.state_dict())\n    model.load_state_dict(state, strict=False)\n    try:\n        yield\n    finally:\n        model.load_state_dict(old_state)\n\n\ndef capture_init(init):\n 
   @functools.wraps(init)\n    def __init__(self, *args, **kwargs):\n        self._init_args_kwargs = (args, kwargs)\n        init(self, *args, **kwargs)\n\n    return __init__\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/tasnet.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n#\n# Created on 2018/12\n# Author: Kaituo XU\n# Modified on 2019/11 by Alexandre Defossez, added support for multiple output channels\n# Here is the original license:\n# The MIT License (MIT)\n#\n# Copyright (c) 2018 Kaituo XU\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n# The above copyright notice and this permission notice shall be included in all\n# copies or substantial portions of the Software.\n#\n# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\n# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n# SOFTWARE.\n\nimport math\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom .utils import capture_init\n\nEPS = 1e-8\n\n\ndef overlap_and_add(signal, frame_step):\n    outer_dimensions = signal.size()[:-2]\n    frames, frame_length = signal.size()[-2:]\n\n    subframe_length = math.gcd(frame_length, frame_step)  # gcd=Greatest Common Divisor\n    subframe_step = frame_step // subframe_length\n    subframes_per_frame = frame_length // subframe_length\n    output_size = frame_step * (frames - 1) + frame_length\n    output_subframes = output_size // subframe_length\n\n    subframe_signal = signal.view(*outer_dimensions, -1, subframe_length)\n\n    frame = torch.arange(0, output_subframes, device=signal.device).unfold(0, subframes_per_frame, subframe_step)\n    frame = frame.long()  # signal may in GPU or CPU\n    frame = frame.contiguous().view(-1)\n\n    result = signal.new_zeros(*outer_dimensions, output_subframes, subframe_length)\n    result.index_add_(-2, frame, subframe_signal)\n    result = result.view(*outer_dimensions, -1)\n    return result\n\n\nclass ConvTasNet(nn.Module):\n    @capture_init\n    def __init__(self, N=256, L=20, B=256, H=512, P=3, X=8, R=4, C=4, audio_channels=1, samplerate=44100, norm_type=\"gLN\", causal=False, mask_nonlinear=\"relu\"):\n        \"\"\"\n        Args:\n            N: Number of filters in autoencoder\n            L: Length of the filters (in samples)\n            B: Number of channels in bottleneck 1 × 1-conv block\n            H: Number of channels in convolutional blocks\n            P: Kernel size in convolutional blocks\n            X: Number of convolutional blocks in each repeat\n            R: Number of repeats\n            C: Number 
of speakers\n            norm_type: BN, gLN, cLN\n            causal: causal or non-causal\n            mask_nonlinear: use which non-linear function to generate mask\n        \"\"\"\n        super(ConvTasNet, self).__init__()\n        # Hyper-parameter\n        self.N, self.L, self.B, self.H, self.P, self.X, self.R, self.C = N, L, B, H, P, X, R, C\n        self.norm_type = norm_type\n        self.causal = causal\n        self.mask_nonlinear = mask_nonlinear\n        self.audio_channels = audio_channels\n        self.samplerate = samplerate\n        # Components\n        self.encoder = Encoder(L, N, audio_channels)\n        self.separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type, causal, mask_nonlinear)\n        self.decoder = Decoder(N, L, audio_channels)\n        # init\n        for p in self.parameters():\n            if p.dim() > 1:\n                nn.init.xavier_normal_(p)\n\n    def valid_length(self, length):\n        return length\n\n    def forward(self, mixture):\n        \"\"\"\n        Args:\n            mixture: [M, T], M is batch size, T is #samples\n        Returns:\n            est_source: [M, C, T]\n        \"\"\"\n        mixture_w = self.encoder(mixture)\n        est_mask = self.separator(mixture_w)\n        est_source = self.decoder(mixture_w, est_mask)\n\n        # T changed after conv1d in encoder, fix it here\n        T_origin = mixture.size(-1)\n        T_conv = est_source.size(-1)\n        est_source = F.pad(est_source, (0, T_origin - T_conv))\n        return est_source\n\n\nclass Encoder(nn.Module):\n    \"\"\"Estimation of the nonnegative mixture weight by a 1-D conv layer.\"\"\"\n\n    def __init__(self, L, N, audio_channels):\n        super(Encoder, self).__init__()\n        # Hyper-parameter\n        self.L, self.N = L, N\n        # Components\n        # 50% overlap\n        self.conv1d_U = nn.Conv1d(audio_channels, N, kernel_size=L, stride=L // 2, bias=False)\n\n    def forward(self, mixture):\n        \"\"\"\n        Args:\n 
           mixture: [M, T], M is batch size, T is #samples\n        Returns:\n            mixture_w: [M, N, K], where K = (T-L)/(L/2)+1 = 2T/L-1\n        \"\"\"\n        mixture_w = F.relu(self.conv1d_U(mixture))  # [M, N, K]\n        return mixture_w\n\n\nclass Decoder(nn.Module):\n    def __init__(self, N, L, audio_channels):\n        super(Decoder, self).__init__()\n        # Hyper-parameter\n        self.N, self.L = N, L\n        self.audio_channels = audio_channels\n        # Components\n        self.basis_signals = nn.Linear(N, audio_channels * L, bias=False)\n\n    def forward(self, mixture_w, est_mask):\n        \"\"\"\n        Args:\n            mixture_w: [M, N, K]\n            est_mask: [M, C, N, K]\n        Returns:\n            est_source: [M, C, T]\n        \"\"\"\n        # D = W * M\n        source_w = torch.unsqueeze(mixture_w, 1) * est_mask  # [M, C, N, K]\n        source_w = torch.transpose(source_w, 2, 3)  # [M, C, K, N]\n        # S = DV\n        est_source = self.basis_signals(source_w)  # [M, C, K, ac * L]\n        m, c, k, _ = est_source.size()\n        est_source = est_source.view(m, c, k, self.audio_channels, -1).transpose(2, 3).contiguous()\n        est_source = overlap_and_add(est_source, self.L // 2)  # M x C x ac x T\n        return est_source\n\n\nclass TemporalConvNet(nn.Module):\n    def __init__(self, N, B, H, P, X, R, C, norm_type=\"gLN\", causal=False, mask_nonlinear=\"relu\"):\n        \"\"\"\n        Args:\n            N: Number of filters in autoencoder\n            B: Number of channels in bottleneck 1 × 1-conv block\n            H: Number of channels in convolutional blocks\n            P: Kernel size in convolutional blocks\n            X: Number of convolutional blocks in each repeat\n            R: Number of repeats\n            C: Number of speakers\n            norm_type: BN, gLN, cLN\n            causal: causal or non-causal\n            mask_nonlinear: use which non-linear function to generate mask\n        \"\"\"\n   
     super(TemporalConvNet, self).__init__()\n        # Hyper-parameter\n        self.C = C\n        self.mask_nonlinear = mask_nonlinear\n        # Components\n        # [M, N, K] -> [M, N, K]\n        layer_norm = ChannelwiseLayerNorm(N)\n        # [M, N, K] -> [M, B, K]\n        bottleneck_conv1x1 = nn.Conv1d(N, B, 1, bias=False)\n        # [M, B, K] -> [M, B, K]\n        repeats = []\n        for r in range(R):\n            blocks = []\n            for x in range(X):\n                dilation = 2**x\n                padding = (P - 1) * dilation if causal else (P - 1) * dilation // 2\n                blocks += [TemporalBlock(B, H, P, stride=1, padding=padding, dilation=dilation, norm_type=norm_type, causal=causal)]\n            repeats += [nn.Sequential(*blocks)]\n        temporal_conv_net = nn.Sequential(*repeats)\n        # [M, B, K] -> [M, C*N, K]\n        mask_conv1x1 = nn.Conv1d(B, C * N, 1, bias=False)\n        # Put together\n        self.network = nn.Sequential(layer_norm, bottleneck_conv1x1, temporal_conv_net, mask_conv1x1)\n\n    def forward(self, mixture_w):\n        \"\"\"\n        Keep this API same with TasNet\n        Args:\n            mixture_w: [M, N, K], M is batch size\n        returns:\n            est_mask: [M, C, N, K]\n        \"\"\"\n        M, N, K = mixture_w.size()\n        score = self.network(mixture_w)  # [M, N, K] -> [M, C*N, K]\n        score = score.view(M, self.C, N, K)  # [M, C*N, K] -> [M, C, N, K]\n        if self.mask_nonlinear == \"softmax\":\n            est_mask = F.softmax(score, dim=1)\n        elif self.mask_nonlinear == \"relu\":\n            est_mask = F.relu(score)\n        else:\n            raise ValueError(\"Unsupported mask non-linear function\")\n        return est_mask\n\n\nclass TemporalBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type=\"gLN\", causal=False):\n        super(TemporalBlock, self).__init__()\n        # [M, B, K] -> [M, H, K]\n 
       conv1x1 = nn.Conv1d(in_channels, out_channels, 1, bias=False)\n        prelu = nn.PReLU()\n        norm = chose_norm(norm_type, out_channels)\n        # [M, H, K] -> [M, B, K]\n        dsconv = DepthwiseSeparableConv(out_channels, in_channels, kernel_size, stride, padding, dilation, norm_type, causal)\n        # Put together\n        self.net = nn.Sequential(conv1x1, prelu, norm, dsconv)\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, B, K]\n        Returns:\n            [M, B, K]\n        \"\"\"\n        residual = x\n        out = self.net(x)\n        # TODO: when P = 3 here works fine, but when P = 2 maybe need to pad?\n        return out + residual  # look like w/o F.relu is better than w/ F.relu\n        # return F.relu(out + residual)\n\n\nclass DepthwiseSeparableConv(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type=\"gLN\", causal=False):\n        super(DepthwiseSeparableConv, self).__init__()\n        # Use `groups` option to implement depthwise convolution\n        # [M, H, K] -> [M, H, K]\n        depthwise_conv = nn.Conv1d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, dilation=dilation, groups=in_channels, bias=False)\n        if causal:\n            chomp = Chomp1d(padding)\n        prelu = nn.PReLU()\n        norm = chose_norm(norm_type, in_channels)\n        # [M, H, K] -> [M, B, K]\n        pointwise_conv = nn.Conv1d(in_channels, out_channels, 1, bias=False)\n        # Put together\n        if causal:\n            self.net = nn.Sequential(depthwise_conv, chomp, prelu, norm, pointwise_conv)\n        else:\n            self.net = nn.Sequential(depthwise_conv, prelu, norm, pointwise_conv)\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, H, K]\n        Returns:\n            result: [M, B, K]\n        \"\"\"\n        return self.net(x)\n\n\nclass Chomp1d(nn.Module):\n    \"\"\"To ensure the output 
length is the same as the input.\"\"\"\n\n    def __init__(self, chomp_size):\n        super(Chomp1d, self).__init__()\n        self.chomp_size = chomp_size\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, H, Kpad]\n        Returns:\n            [M, H, K]\n        \"\"\"\n        return x[:, :, : -self.chomp_size].contiguous()\n\n\ndef chose_norm(norm_type, channel_size):\n    \"\"\"The input of normalization will be (M, C, K), where M is batch size,\n    C is channel size and K is sequence length.\n    \"\"\"\n    if norm_type == \"gLN\":\n        return GlobalLayerNorm(channel_size)\n    elif norm_type == \"cLN\":\n        return ChannelwiseLayerNorm(channel_size)\n    elif norm_type == \"id\":\n        return nn.Identity()\n    else:  # norm_type == \"BN\":\n        # Given input (M, C, K), nn.BatchNorm1d(C) will accumulate statistics\n        # along M and K, so this BN usage is right.\n        return nn.BatchNorm1d(channel_size)\n\n\n# TODO: Use nn.LayerNorm to impl cLN to speed up\nclass ChannelwiseLayerNorm(nn.Module):\n    \"\"\"Channel-wise Layer Normalization (cLN)\"\"\"\n\n    def __init__(self, channel_size):\n        super(ChannelwiseLayerNorm, self).__init__()\n        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        self.gamma.data.fill_(1)\n        self.beta.data.zero_()\n\n    def forward(self, y):\n        \"\"\"\n        Args:\n            y: [M, N, K], M is batch size, N is channel size, K is length\n        Returns:\n            cLN_y: [M, N, K]\n        \"\"\"\n        mean = torch.mean(y, dim=1, keepdim=True)  # [M, 1, K]\n        var = torch.var(y, dim=1, keepdim=True, unbiased=False)  # [M, 1, K]\n        cLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta\n        return cLN_y\n\n\nclass 
GlobalLayerNorm(nn.Module):\n    \"\"\"Global Layer Normalization (gLN)\"\"\"\n\n    def __init__(self, channel_size):\n        super(GlobalLayerNorm, self).__init__()\n        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        self.gamma.data.fill_(1)\n        self.beta.data.zero_()\n\n    def forward(self, y):\n        \"\"\"\n        Args:\n            y: [M, N, K], M is batch size, N is channel size, K is length\n        Returns:\n            gLN_y: [M, N, K]\n        \"\"\"\n        # TODO: in torch 1.0, torch.mean() support dim list\n        mean = y.mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)  # [M, 1, 1]\n        var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)\n        gLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta\n        return gLN_y\n\n\nif __name__ == \"__main__\":\n    torch.manual_seed(123)\n    M, N, L, T = 2, 3, 4, 12\n    K = 2 * T // L - 1\n    B, H, P, X, R, C, norm_type, causal = 2, 3, 3, 3, 2, 2, \"gLN\", False\n    mixture = torch.randint(3, (M, T))\n    # test Encoder\n    encoder = Encoder(L, N)\n    encoder.conv1d_U.weight.data = torch.randint(2, encoder.conv1d_U.weight.size())\n    mixture_w = encoder(mixture)\n    print(\"mixture\", mixture)\n    print(\"U\", encoder.conv1d_U.weight)\n    print(\"mixture_w\", mixture_w)\n    print(\"mixture_w size\", mixture_w.size())\n\n    # test TemporalConvNet\n    separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type=norm_type, causal=causal)\n    est_mask = separator(mixture_w)\n    print(\"est_mask\", est_mask)\n\n    # test Decoder\n    decoder = Decoder(N, L)\n    est_mask = torch.randint(2, (B, K, C, N))\n    est_source = decoder(mixture_w, est_mask)\n    print(\"est_source\", est_source)\n\n    # test Conv-TasNet\n    conv_tasnet = ConvTasNet(N, 
L, B, H, P, X, R, C, norm_type=norm_type)\n    est_source = conv_tasnet(mixture)\n    print(\"est_source\", est_source)\n    print(\"est_source size\", est_source.size())\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/tasnet_v2.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n#\n# Created on 2018/12\n# Author: Kaituo XU\n# Modified on 2019/11 by Alexandre Defossez, added support for multiple output channels\n# Here is the original license:\n# The MIT License (MIT)\n#\n# Copyright (c) 2018 Kaituo XU\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n# The above copyright notice and this permission notice shall be included in all\n# copies or substantial portions of the Software.\n#\n# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\n# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n# SOFTWARE.\n\nimport math\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom .utils import capture_init\n\nEPS = 1e-8\n\n\ndef overlap_and_add(signal, frame_step):\n    outer_dimensions = signal.size()[:-2]\n    frames, frame_length = signal.size()[-2:]\n\n    subframe_length = math.gcd(frame_length, frame_step)  # gcd=Greatest Common Divisor\n    subframe_step = frame_step // subframe_length\n    subframes_per_frame = frame_length // subframe_length\n    output_size = frame_step * (frames - 1) + frame_length\n    output_subframes = output_size // subframe_length\n\n    subframe_signal = signal.view(*outer_dimensions, -1, subframe_length)\n\n    frame = torch.arange(0, output_subframes, device=signal.device).unfold(0, subframes_per_frame, subframe_step)\n    frame = frame.long()  # signal may in GPU or CPU\n    frame = frame.contiguous().view(-1)\n\n    result = signal.new_zeros(*outer_dimensions, output_subframes, subframe_length)\n    result.index_add_(-2, frame, subframe_signal)\n    result = result.view(*outer_dimensions, -1)\n    return result\n\n\nclass ConvTasNet(nn.Module):\n    @capture_init\n    def __init__(self, sources, N=256, L=20, B=256, H=512, P=3, X=8, R=4, audio_channels=2, norm_type=\"gLN\", causal=False, mask_nonlinear=\"relu\", samplerate=44100, segment_length=44100 * 2 * 4):\n        \"\"\"\n        Args:\n            sources: list of sources\n            N: Number of filters in autoencoder\n            L: Length of the filters (in samples)\n            B: Number of channels in bottleneck 1 × 1-conv block\n            H: Number of channels in convolutional blocks\n            P: Kernel size in convolutional blocks\n            X: Number of convolutional blocks 
in each repeat\n            R: Number of repeats\n            norm_type: BN, gLN, cLN\n            causal: causal or non-causal\n            mask_nonlinear: use which non-linear function to generate mask\n        \"\"\"\n        super(ConvTasNet, self).__init__()\n        # Hyper-parameter\n        self.sources = sources\n        self.C = len(sources)\n        self.N, self.L, self.B, self.H, self.P, self.X, self.R = N, L, B, H, P, X, R\n        self.norm_type = norm_type\n        self.causal = causal\n        self.mask_nonlinear = mask_nonlinear\n        self.audio_channels = audio_channels\n        self.samplerate = samplerate\n        self.segment_length = segment_length\n        # Components\n        self.encoder = Encoder(L, N, audio_channels)\n        self.separator = TemporalConvNet(N, B, H, P, X, R, self.C, norm_type, causal, mask_nonlinear)\n        self.decoder = Decoder(N, L, audio_channels)\n        # init\n        for p in self.parameters():\n            if p.dim() > 1:\n                nn.init.xavier_normal_(p)\n\n    def valid_length(self, length):\n        return length\n\n    def forward(self, mixture):\n        \"\"\"\n        Args:\n            mixture: [M, T], M is batch size, T is #samples\n        Returns:\n            est_source: [M, C, T]\n        \"\"\"\n        mixture_w = self.encoder(mixture)\n        est_mask = self.separator(mixture_w)\n        est_source = self.decoder(mixture_w, est_mask)\n\n        # T changed after conv1d in encoder, fix it here\n        T_origin = mixture.size(-1)\n        T_conv = est_source.size(-1)\n        est_source = F.pad(est_source, (0, T_origin - T_conv))\n        return est_source\n\n\nclass Encoder(nn.Module):\n    \"\"\"Estimation of the nonnegative mixture weight by a 1-D conv layer.\"\"\"\n\n    def __init__(self, L, N, audio_channels):\n        super(Encoder, self).__init__()\n        # Hyper-parameter\n        self.L, self.N = L, N\n        # Components\n        # 50% overlap\n        self.conv1d_U 
= nn.Conv1d(audio_channels, N, kernel_size=L, stride=L // 2, bias=False)\n\n    def forward(self, mixture):\n        \"\"\"\n        Args:\n            mixture: [M, T], M is batch size, T is #samples\n        Returns:\n            mixture_w: [M, N, K], where K = (T-L)/(L/2)+1 = 2T/L-1\n        \"\"\"\n        mixture_w = F.relu(self.conv1d_U(mixture))  # [M, N, K]\n        return mixture_w\n\n\nclass Decoder(nn.Module):\n    def __init__(self, N, L, audio_channels):\n        super(Decoder, self).__init__()\n        # Hyper-parameter\n        self.N, self.L = N, L\n        self.audio_channels = audio_channels\n        # Components\n        self.basis_signals = nn.Linear(N, audio_channels * L, bias=False)\n\n    def forward(self, mixture_w, est_mask):\n        \"\"\"\n        Args:\n            mixture_w: [M, N, K]\n            est_mask: [M, C, N, K]\n        Returns:\n            est_source: [M, C, T]\n        \"\"\"\n        # D = W * M\n        source_w = torch.unsqueeze(mixture_w, 1) * est_mask  # [M, C, N, K]\n        source_w = torch.transpose(source_w, 2, 3)  # [M, C, K, N]\n        # S = DV\n        est_source = self.basis_signals(source_w)  # [M, C, K, ac * L]\n        m, c, k, _ = est_source.size()\n        est_source = est_source.view(m, c, k, self.audio_channels, -1).transpose(2, 3).contiguous()\n        est_source = overlap_and_add(est_source, self.L // 2)  # M x C x ac x T\n        return est_source\n\n\nclass TemporalConvNet(nn.Module):\n    def __init__(self, N, B, H, P, X, R, C, norm_type=\"gLN\", causal=False, mask_nonlinear=\"relu\"):\n        \"\"\"\n        Args:\n            N: Number of filters in autoencoder\n            B: Number of channels in bottleneck 1 × 1-conv block\n            H: Number of channels in convolutional blocks\n            P: Kernel size in convolutional blocks\n            X: Number of convolutional blocks in each repeat\n            R: Number of repeats\n            C: Number of speakers\n            norm_type: BN, gLN, 
cLN\n            causal: causal or non-causal\n            mask_nonlinear: use which non-linear function to generate mask\n        \"\"\"\n        super(TemporalConvNet, self).__init__()\n        # Hyper-parameter\n        self.C = C\n        self.mask_nonlinear = mask_nonlinear\n        # Components\n        # [M, N, K] -> [M, N, K]\n        layer_norm = ChannelwiseLayerNorm(N)\n        # [M, N, K] -> [M, B, K]\n        bottleneck_conv1x1 = nn.Conv1d(N, B, 1, bias=False)\n        # [M, B, K] -> [M, B, K]\n        repeats = []\n        for r in range(R):\n            blocks = []\n            for x in range(X):\n                dilation = 2**x\n                padding = (P - 1) * dilation if causal else (P - 1) * dilation // 2\n                blocks += [TemporalBlock(B, H, P, stride=1, padding=padding, dilation=dilation, norm_type=norm_type, causal=causal)]\n            repeats += [nn.Sequential(*blocks)]\n        temporal_conv_net = nn.Sequential(*repeats)\n        # [M, B, K] -> [M, C*N, K]\n        mask_conv1x1 = nn.Conv1d(B, C * N, 1, bias=False)\n        # Put together\n        self.network = nn.Sequential(layer_norm, bottleneck_conv1x1, temporal_conv_net, mask_conv1x1)\n\n    def forward(self, mixture_w):\n        \"\"\"\n        Keep this API same with TasNet\n        Args:\n            mixture_w: [M, N, K], M is batch size\n        returns:\n            est_mask: [M, C, N, K]\n        \"\"\"\n        M, N, K = mixture_w.size()\n        score = self.network(mixture_w)  # [M, N, K] -> [M, C*N, K]\n        score = score.view(M, self.C, N, K)  # [M, C*N, K] -> [M, C, N, K]\n        if self.mask_nonlinear == \"softmax\":\n            est_mask = F.softmax(score, dim=1)\n        elif self.mask_nonlinear == \"relu\":\n            est_mask = F.relu(score)\n        else:\n            raise ValueError(\"Unsupported mask non-linear function\")\n        return est_mask\n\n\nclass TemporalBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, 
stride, padding, dilation, norm_type=\"gLN\", causal=False):\n        super(TemporalBlock, self).__init__()\n        # [M, B, K] -> [M, H, K]\n        conv1x1 = nn.Conv1d(in_channels, out_channels, 1, bias=False)\n        prelu = nn.PReLU()\n        norm = chose_norm(norm_type, out_channels)\n        # [M, H, K] -> [M, B, K]\n        dsconv = DepthwiseSeparableConv(out_channels, in_channels, kernel_size, stride, padding, dilation, norm_type, causal)\n        # Put together\n        self.net = nn.Sequential(conv1x1, prelu, norm, dsconv)\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, B, K]\n        Returns:\n            [M, B, K]\n        \"\"\"\n        residual = x\n        out = self.net(x)\n        # TODO: when P = 3 here works fine, but when P = 2 maybe need to pad?\n        return out + residual  # look like w/o F.relu is better than w/ F.relu\n        # return F.relu(out + residual)\n\n\nclass DepthwiseSeparableConv(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type=\"gLN\", causal=False):\n        super(DepthwiseSeparableConv, self).__init__()\n        # Use `groups` option to implement depthwise convolution\n        # [M, H, K] -> [M, H, K]\n        depthwise_conv = nn.Conv1d(in_channels, in_channels, kernel_size, stride=stride, padding=padding, dilation=dilation, groups=in_channels, bias=False)\n        if causal:\n            chomp = Chomp1d(padding)\n        prelu = nn.PReLU()\n        norm = chose_norm(norm_type, in_channels)\n        # [M, H, K] -> [M, B, K]\n        pointwise_conv = nn.Conv1d(in_channels, out_channels, 1, bias=False)\n        # Put together\n        if causal:\n            self.net = nn.Sequential(depthwise_conv, chomp, prelu, norm, pointwise_conv)\n        else:\n            self.net = nn.Sequential(depthwise_conv, prelu, norm, pointwise_conv)\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, H, K]\n        
Returns:\n            result: [M, B, K]\n        \"\"\"\n        return self.net(x)\n\n\nclass Chomp1d(nn.Module):\n    \"\"\"To ensure the output length is the same as the input.\"\"\"\n\n    def __init__(self, chomp_size):\n        super(Chomp1d, self).__init__()\n        self.chomp_size = chomp_size\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: [M, H, Kpad]\n        Returns:\n            [M, H, K]\n        \"\"\"\n        return x[:, :, : -self.chomp_size].contiguous()\n\n\ndef chose_norm(norm_type, channel_size):\n    \"\"\"The input of normalization will be (M, C, K), where M is batch size,\n    C is channel size and K is sequence length.\n    \"\"\"\n    if norm_type == \"gLN\":\n        return GlobalLayerNorm(channel_size)\n    elif norm_type == \"cLN\":\n        return ChannelwiseLayerNorm(channel_size)\n    elif norm_type == \"id\":\n        return nn.Identity()\n    else:  # norm_type == \"BN\":\n        # Given input (M, C, K), nn.BatchNorm1d(C) will accumulate statistics\n        # along M and K, so this BN usage is right.\n        return nn.BatchNorm1d(channel_size)\n\n\n# TODO: Use nn.LayerNorm to impl cLN to speed up\nclass ChannelwiseLayerNorm(nn.Module):\n    \"\"\"Channel-wise Layer Normalization (cLN)\"\"\"\n\n    def __init__(self, channel_size):\n        super(ChannelwiseLayerNorm, self).__init__()\n        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        self.gamma.data.fill_(1)\n        self.beta.data.zero_()\n\n    def forward(self, y):\n        \"\"\"\n        Args:\n            y: [M, N, K], M is batch size, N is channel size, K is length\n        Returns:\n            cLN_y: [M, N, K]\n        \"\"\"\n        mean = torch.mean(y, dim=1, keepdim=True)  # [M, 1, K]\n        var = torch.var(y, dim=1, keepdim=True, unbiased=False)  # [M, 
1, K]\n        cLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta\n        return cLN_y\n\n\nclass GlobalLayerNorm(nn.Module):\n    \"\"\"Global Layer Normalization (gLN)\"\"\"\n\n    def __init__(self, channel_size):\n        super(GlobalLayerNorm, self).__init__()\n        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        self.gamma.data.fill_(1)\n        self.beta.data.zero_()\n\n    def forward(self, y):\n        \"\"\"\n        Args:\n            y: [M, N, K], M is batch size, N is channel size, K is length\n        Returns:\n            gLN_y: [M, N, K]\n        \"\"\"\n        # TODO: in torch 1.0, torch.mean() support dim list\n        mean = y.mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)  # [M, 1, 1]\n        var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)\n        gLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta\n        return gLN_y\n\n\nif __name__ == \"__main__\":\n    torch.manual_seed(123)\n    M, N, L, T = 2, 3, 4, 12\n    K = 2 * T // L - 1\n    B, H, P, X, R, C, norm_type, causal = 2, 3, 3, 3, 2, 2, \"gLN\", False\n    mixture = torch.randint(3, (M, T))\n    # test Encoder\n    encoder = Encoder(L, N)\n    encoder.conv1d_U.weight.data = torch.randint(2, encoder.conv1d_U.weight.size())\n    mixture_w = encoder(mixture)\n    print(\"mixture\", mixture)\n    print(\"U\", encoder.conv1d_U.weight)\n    print(\"mixture_w\", mixture_w)\n    print(\"mixture_w size\", mixture_w.size())\n\n    # test TemporalConvNet\n    separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type=norm_type, causal=causal)\n    est_mask = separator(mixture_w)\n    print(\"est_mask\", est_mask)\n\n    # test Decoder\n    decoder = Decoder(N, L)\n    est_mask = torch.randint(2, (B, K, C, N))\n    est_source = 
decoder(mixture_w, est_mask)\n    print(\"est_source\", est_source)\n\n    # test Conv-TasNet\n    conv_tasnet = ConvTasNet(N, L, B, H, P, X, R, C, norm_type=norm_type)\n    est_source = conv_tasnet(mixture)\n    print(\"est_source\", est_source)\n    print(\"est_source size\", est_source.size())\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/transformer.py",
    "content": "# Copyright (c) 2019-present, Meta, Inc.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n# First author is Simon Rouard.\n\nimport random\nimport typing as tp\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nimport math\nfrom einops import rearrange\n\n\ndef create_sin_embedding(length: int, dim: int, shift: int = 0, device=\"cpu\", max_period=10000):\n    # We aim for TBC format\n    assert dim % 2 == 0\n    pos = shift + torch.arange(length, device=device).view(-1, 1, 1)\n    half_dim = dim // 2\n    adim = torch.arange(dim // 2, device=device).view(1, 1, -1)\n    phase = pos / (max_period ** (adim / (half_dim - 1)))\n    return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)\n\n\ndef create_2d_sin_embedding(d_model, height, width, device=\"cpu\", max_period=10000):\n    \"\"\"\n    :param d_model: dimension of the model\n    :param height: height of the positions\n    :param width: width of the positions\n    :return: d_model*height*width position matrix\n    \"\"\"\n    if d_model % 4 != 0:\n        raise ValueError(\"Cannot use sin/cos positional encoding with \" \"odd dimension (got dim={:d})\".format(d_model))\n    pe = torch.zeros(d_model, height, width)\n    # Each dimension use half of d_model\n    d_model = int(d_model / 2)\n    div_term = torch.exp(torch.arange(0.0, d_model, 2) * -(math.log(max_period) / d_model))\n    pos_w = torch.arange(0.0, width).unsqueeze(1)\n    pos_h = torch.arange(0.0, height).unsqueeze(1)\n    pe[0:d_model:2, :, :] = torch.sin(pos_w * div_term).transpose(0, 1).unsqueeze(1).repeat(1, height, 1)\n    pe[1:d_model:2, :, :] = torch.cos(pos_w * div_term).transpose(0, 1).unsqueeze(1).repeat(1, height, 1)\n    pe[d_model::2, :, :] = torch.sin(pos_h * div_term).transpose(0, 1).unsqueeze(2).repeat(1, 1, width)\n    pe[d_model + 1 :: 2, :, :] = torch.cos(pos_h * 
div_term).transpose(0, 1).unsqueeze(2).repeat(1, 1, width)\n\n    return pe[None, :].to(device)\n\n\ndef create_sin_embedding_cape(\n    length: int,\n    dim: int,\n    batch_size: int,\n    mean_normalize: bool,\n    augment: bool,  # True during training\n    max_global_shift: float = 0.0,  # delta max\n    max_local_shift: float = 0.0,  # epsilon max\n    max_scale: float = 1.0,\n    device: str = \"cpu\",\n    max_period: float = 10000.0,\n):\n    # We aim for TBC format\n    assert dim % 2 == 0\n    pos = 1.0 * torch.arange(length).view(-1, 1, 1)  # (length, 1, 1)\n    pos = pos.repeat(1, batch_size, 1)  # (length, batch_size, 1)\n    if mean_normalize:\n        pos -= torch.nanmean(pos, dim=0, keepdim=True)\n\n    if augment:\n        delta = np.random.uniform(-max_global_shift, +max_global_shift, size=[1, batch_size, 1])\n        delta_local = np.random.uniform(-max_local_shift, +max_local_shift, size=[length, batch_size, 1])\n        log_lambdas = np.random.uniform(-np.log(max_scale), +np.log(max_scale), size=[1, batch_size, 1])\n        pos = (pos + delta + delta_local) * np.exp(log_lambdas)\n\n    pos = pos.to(device)\n\n    half_dim = dim // 2\n    adim = torch.arange(dim // 2, device=device).view(1, 1, -1)\n    phase = pos / (max_period ** (adim / (half_dim - 1)))\n    return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1).float()\n\n\ndef get_causal_mask(length):\n    pos = torch.arange(length)\n    return pos > pos[:, None]\n\n\ndef get_elementary_mask(T1, T2, mask_type, sparse_attn_window, global_window, mask_random_seed, sparsity, device):\n    \"\"\"\n    When the input of the Decoder has length T1 and the output T2\n    The mask matrix has shape (T2, T1)\n    \"\"\"\n    assert mask_type in [\"diag\", \"jmask\", \"random\", \"global\"]\n\n    if mask_type == \"global\":\n        mask = torch.zeros(T2, T1, dtype=torch.bool)\n        mask[:, :global_window] = True\n        line_window = int(global_window * T2 / T1)\n        
mask[:line_window, :] = True\n\n    if mask_type == \"diag\":\n\n        mask = torch.zeros(T2, T1, dtype=torch.bool)\n        rows = torch.arange(T2)[:, None]\n        cols = (T1 / T2 * rows + torch.arange(-sparse_attn_window, sparse_attn_window + 1)).long().clamp(0, T1 - 1)\n        mask.scatter_(1, cols, torch.ones(1, dtype=torch.bool).expand_as(cols))\n\n    elif mask_type == \"jmask\":\n        mask = torch.zeros(T2 + 2, T1 + 2, dtype=torch.bool)\n        rows = torch.arange(T2 + 2)[:, None]\n        t = torch.arange(0, int((2 * T1) ** 0.5 + 1))\n        t = (t * (t + 1) / 2).int()\n        t = torch.cat([-t.flip(0)[:-1], t])\n        cols = (T1 / T2 * rows + t).long().clamp(0, T1 + 1)\n        mask.scatter_(1, cols, torch.ones(1, dtype=torch.bool).expand_as(cols))\n        mask = mask[1:-1, 1:-1]\n\n    elif mask_type == \"random\":\n        gene = torch.Generator(device=device)\n        gene.manual_seed(mask_random_seed)\n        mask = torch.rand(T1 * T2, generator=gene, device=device).reshape(T2, T1) > sparsity\n\n    mask = mask.to(device)\n    return mask\n\n\ndef get_mask(T1, T2, mask_type, sparse_attn_window, global_window, mask_random_seed, sparsity, device):\n    \"\"\"\n    Return a SparseCSRTensor mask that is a combination of elementary masks\n    mask_type can be a combination of multiple masks: for instance \"diag_jmask_random\"\n    \"\"\"\n    from xformers.sparse import SparseCSRTensor\n\n    # create a list\n    mask_types = mask_type.split(\"_\")\n\n    all_masks = [get_elementary_mask(T1, T2, mask, sparse_attn_window, global_window, mask_random_seed, sparsity, device) for mask in mask_types]\n\n    final_mask = torch.stack(all_masks).sum(axis=0) > 0\n\n    return SparseCSRTensor.from_dense(final_mask[None])\n\n\nclass ScaledEmbedding(nn.Module):\n    def __init__(self, num_embeddings: int, embedding_dim: int, scale: float = 1.0, boost: float = 3.0):\n        super().__init__()\n        self.embedding = nn.Embedding(num_embeddings, 
embedding_dim)\n        self.embedding.weight.data *= scale / boost\n        self.boost = boost\n\n    @property\n    def weight(self):\n        return self.embedding.weight * self.boost\n\n    def forward(self, x):\n        return self.embedding(x) * self.boost\n\n\nclass LayerScale(nn.Module):\n    \"\"\"Layer scale from [Touvron et al 2021] (https://arxiv.org/pdf/2103.17239.pdf).\n    This rescales diagonally residual outputs close to 0 initially, then learnt.\n    \"\"\"\n\n    def __init__(self, channels: int, init: float = 0, channel_last=False):\n        \"\"\"\n        channel_last = False corresponds to (B, C, T) tensors\n        channel_last = True corresponds to (T, B, C) tensors\n        \"\"\"\n        super().__init__()\n        self.channel_last = channel_last\n        self.scale = nn.Parameter(torch.zeros(channels, requires_grad=True))\n        self.scale.data[:] = init\n\n    def forward(self, x):\n        if self.channel_last:\n            return self.scale * x\n        else:\n            return self.scale[:, None] * x\n\n\nclass MyGroupNorm(nn.GroupNorm):\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n\n    def forward(self, x):\n        \"\"\"\n        x: (B, T, C)\n        if num_groups=1: Normalisation on all T and C together for each B\n        \"\"\"\n        x = x.transpose(1, 2)\n        return super().forward(x).transpose(1, 2)\n\n\nclass MyTransformerEncoderLayer(nn.TransformerEncoderLayer):\n    def __init__(\n        self,\n        d_model,\n        nhead,\n        dim_feedforward=2048,\n        dropout=0.1,\n        activation=F.relu,\n        group_norm=0,\n        norm_first=False,\n        norm_out=False,\n        layer_norm_eps=1e-5,\n        layer_scale=False,\n        init_values=1e-4,\n        device=None,\n        dtype=None,\n        sparse=False,\n        mask_type=\"diag\",\n        mask_random_seed=42,\n        sparse_attn_window=500,\n        global_window=50,\n        
auto_sparsity=False,\n        sparsity=0.95,\n        batch_first=False,\n    ):\n        factory_kwargs = {\"device\": device, \"dtype\": dtype}\n        super().__init__(\n            d_model=d_model,\n            nhead=nhead,\n            dim_feedforward=dim_feedforward,\n            dropout=dropout,\n            activation=activation,\n            layer_norm_eps=layer_norm_eps,\n            batch_first=batch_first,\n            norm_first=norm_first,\n            device=device,\n            dtype=dtype,\n        )\n        self.sparse = sparse\n        self.auto_sparsity = auto_sparsity\n        if sparse:\n            if not auto_sparsity:\n                self.mask_type = mask_type\n                self.sparse_attn_window = sparse_attn_window\n                self.global_window = global_window\n            self.sparsity = sparsity\n        if group_norm:\n            self.norm1 = MyGroupNorm(int(group_norm), d_model, eps=layer_norm_eps, **factory_kwargs)\n            self.norm2 = MyGroupNorm(int(group_norm), d_model, eps=layer_norm_eps, **factory_kwargs)\n\n        self.norm_out = None\n        if self.norm_first & norm_out:\n            self.norm_out = MyGroupNorm(num_groups=int(norm_out), num_channels=d_model)\n        self.gamma_1 = LayerScale(d_model, init_values, True) if layer_scale else nn.Identity()\n        self.gamma_2 = LayerScale(d_model, init_values, True) if layer_scale else nn.Identity()\n\n        if sparse:\n            self.self_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first, auto_sparsity=sparsity if auto_sparsity else 0)\n            self.__setattr__(\"src_mask\", torch.zeros(1, 1))\n            self.mask_random_seed = mask_random_seed\n\n    def forward(self, src, src_mask=None, src_key_padding_mask=None):\n        \"\"\"\n        if batch_first = False, src shape is (T, B, C)\n        the case where batch_first=True is not covered\n        \"\"\"\n        device = src.device\n        x = src\n        
T, B, C = x.shape\n        if self.sparse and not self.auto_sparsity:\n            assert src_mask is None\n            src_mask = self.src_mask\n            if src_mask.shape[-1] != T:\n                src_mask = get_mask(T, T, self.mask_type, self.sparse_attn_window, self.global_window, self.mask_random_seed, self.sparsity, device)\n                self.__setattr__(\"src_mask\", src_mask)\n\n        if self.norm_first:\n            x = x + self.gamma_1(self._sa_block(self.norm1(x), src_mask, src_key_padding_mask))\n            x = x + self.gamma_2(self._ff_block(self.norm2(x)))\n\n            if self.norm_out:\n                x = self.norm_out(x)\n        else:\n            x = self.norm1(x + self.gamma_1(self._sa_block(x, src_mask, src_key_padding_mask)))\n            x = self.norm2(x + self.gamma_2(self._ff_block(x)))\n\n        return x\n\n\nclass CrossTransformerEncoderLayer(nn.Module):\n    def __init__(\n        self,\n        d_model: int,\n        nhead: int,\n        dim_feedforward: int = 2048,\n        dropout: float = 0.1,\n        activation=F.relu,\n        layer_norm_eps: float = 1e-5,\n        layer_scale: bool = False,\n        init_values: float = 1e-4,\n        norm_first: bool = False,\n        group_norm: bool = False,\n        norm_out: bool = False,\n        sparse=False,\n        mask_type=\"diag\",\n        mask_random_seed=42,\n        sparse_attn_window=500,\n        global_window=50,\n        sparsity=0.95,\n        auto_sparsity=None,\n        device=None,\n        dtype=None,\n        batch_first=False,\n    ):\n        factory_kwargs = {\"device\": device, \"dtype\": dtype}\n        super().__init__()\n\n        self.sparse = sparse\n        self.auto_sparsity = auto_sparsity\n        if sparse:\n            if not auto_sparsity:\n                self.mask_type = mask_type\n                self.sparse_attn_window = sparse_attn_window\n                self.global_window = global_window\n            self.sparsity = sparsity\n\n       
 self.cross_attn: nn.Module\n        self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first)\n        # Implementation of Feedforward model\n        self.linear1 = nn.Linear(d_model, dim_feedforward, **factory_kwargs)\n        self.dropout = nn.Dropout(dropout)\n        self.linear2 = nn.Linear(dim_feedforward, d_model, **factory_kwargs)\n\n        self.norm_first = norm_first\n        self.norm1: nn.Module\n        self.norm2: nn.Module\n        self.norm3: nn.Module\n        if group_norm:\n            self.norm1 = MyGroupNorm(int(group_norm), d_model, eps=layer_norm_eps, **factory_kwargs)\n            self.norm2 = MyGroupNorm(int(group_norm), d_model, eps=layer_norm_eps, **factory_kwargs)\n            self.norm3 = MyGroupNorm(int(group_norm), d_model, eps=layer_norm_eps, **factory_kwargs)\n        else:\n            self.norm1 = nn.LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)\n            self.norm2 = nn.LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)\n            self.norm3 = nn.LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)\n\n        self.norm_out = None\n        if self.norm_first & norm_out:\n            self.norm_out = MyGroupNorm(num_groups=int(norm_out), num_channels=d_model)\n\n        self.gamma_1 = LayerScale(d_model, init_values, True) if layer_scale else nn.Identity()\n        self.gamma_2 = LayerScale(d_model, init_values, True) if layer_scale else nn.Identity()\n\n        self.dropout1 = nn.Dropout(dropout)\n        self.dropout2 = nn.Dropout(dropout)\n\n        # Legacy string support for activation function.\n        if isinstance(activation, str):\n            self.activation = self._get_activation_fn(activation)\n        else:\n            self.activation = activation\n\n        if sparse:\n            self.cross_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first, auto_sparsity=sparsity if auto_sparsity else 0)\n            if not 
auto_sparsity:\n                self.__setattr__(\"mask\", torch.zeros(1, 1))\n                self.mask_random_seed = mask_random_seed\n\n    def forward(self, q, k, mask=None):\n        \"\"\"\n        Args:\n            q: tensor of shape (T, B, C)\n            k: tensor of shape (S, B, C)\n            mask: tensor of shape (T, S)\n\n        \"\"\"\n        device = q.device\n        T, B, C = q.shape\n        S, B, C = k.shape\n        if self.sparse and not self.auto_sparsity:\n            assert mask is None\n            mask = self.mask\n            if mask.shape[-1] != S or mask.shape[-2] != T:\n                mask = get_mask(S, T, self.mask_type, self.sparse_attn_window, self.global_window, self.mask_random_seed, self.sparsity, device)\n                self.__setattr__(\"mask\", mask)\n\n        if self.norm_first:\n            x = q + self.gamma_1(self._ca_block(self.norm1(q), self.norm2(k), mask))\n            x = x + self.gamma_2(self._ff_block(self.norm3(x)))\n            if self.norm_out:\n                x = self.norm_out(x)\n        else:\n            x = self.norm1(q + self.gamma_1(self._ca_block(q, k, mask)))\n            x = self.norm2(x + self.gamma_2(self._ff_block(x)))\n\n        return x\n\n    # cross-attention block\n    def _ca_block(self, q, k, attn_mask=None):\n        x = self.cross_attn(q, k, k, attn_mask=attn_mask, need_weights=False)[0]\n        return self.dropout1(x)\n\n    # feed forward block\n    def _ff_block(self, x):\n        x = self.linear2(self.dropout(self.activation(self.linear1(x))))\n        return self.dropout2(x)\n\n    def _get_activation_fn(self, activation):\n        if activation == \"relu\":\n            return F.relu\n        elif activation == \"gelu\":\n            return F.gelu\n\n        raise RuntimeError(\"activation should be relu/gelu, not {}\".format(activation))\n\n\n# ----------------- MULTI-BLOCKS MODELS: -----------------------\n\n\nclass CrossTransformerEncoder(nn.Module):\n    def __init__(\n     
   self,\n        dim: int,\n        emb: str = \"sin\",\n        hidden_scale: float = 4.0,\n        num_heads: int = 8,\n        num_layers: int = 6,\n        cross_first: bool = False,\n        dropout: float = 0.0,\n        max_positions: int = 1000,\n        norm_in: bool = True,\n        norm_in_group: bool = False,\n        group_norm: int = False,\n        norm_first: bool = False,\n        norm_out: bool = False,\n        max_period: float = 10000.0,\n        weight_decay: float = 0.0,\n        lr: tp.Optional[float] = None,\n        layer_scale: bool = False,\n        gelu: bool = True,\n        sin_random_shift: int = 0,\n        weight_pos_embed: float = 1.0,\n        cape_mean_normalize: bool = True,\n        cape_augment: bool = True,\n        cape_glob_loc_scale: list = [5000.0, 1.0, 1.4],\n        sparse_self_attn: bool = False,\n        sparse_cross_attn: bool = False,\n        mask_type: str = \"diag\",\n        mask_random_seed: int = 42,\n        sparse_attn_window: int = 500,\n        global_window: int = 50,\n        auto_sparsity: bool = False,\n        sparsity: float = 0.95,\n    ):\n        super().__init__()\n        \"\"\"\n        \"\"\"\n        assert dim % num_heads == 0\n\n        hidden_dim = int(dim * hidden_scale)\n\n        self.num_layers = num_layers\n        # classic parity = 1 means that if idx%2 == 1 there is a\n        # classical encoder else there is a cross encoder\n        self.classic_parity = 1 if cross_first else 0\n        self.emb = emb\n        self.max_period = max_period\n        self.weight_decay = weight_decay\n        self.weight_pos_embed = weight_pos_embed\n        self.sin_random_shift = sin_random_shift\n        if emb == \"cape\":\n            self.cape_mean_normalize = cape_mean_normalize\n            self.cape_augment = cape_augment\n            self.cape_glob_loc_scale = cape_glob_loc_scale\n        if emb == \"scaled\":\n            self.position_embeddings = ScaledEmbedding(max_positions, dim, 
scale=0.2)\n\n        self.lr = lr\n\n        activation: tp.Any = F.gelu if gelu else F.relu\n\n        self.norm_in: nn.Module\n        self.norm_in_t: nn.Module\n        if norm_in:\n            self.norm_in = nn.LayerNorm(dim)\n            self.norm_in_t = nn.LayerNorm(dim)\n        elif norm_in_group:\n            self.norm_in = MyGroupNorm(int(norm_in_group), dim)\n            self.norm_in_t = MyGroupNorm(int(norm_in_group), dim)\n        else:\n            self.norm_in = nn.Identity()\n            self.norm_in_t = nn.Identity()\n\n        # spectrogram layers\n        self.layers = nn.ModuleList()\n        # temporal layers\n        self.layers_t = nn.ModuleList()\n\n        kwargs_common = {\n            \"d_model\": dim,\n            \"nhead\": num_heads,\n            \"dim_feedforward\": hidden_dim,\n            \"dropout\": dropout,\n            \"activation\": activation,\n            \"group_norm\": group_norm,\n            \"norm_first\": norm_first,\n            \"norm_out\": norm_out,\n            \"layer_scale\": layer_scale,\n            \"mask_type\": mask_type,\n            \"mask_random_seed\": mask_random_seed,\n            \"sparse_attn_window\": sparse_attn_window,\n            \"global_window\": global_window,\n            \"sparsity\": sparsity,\n            \"auto_sparsity\": auto_sparsity,\n            \"batch_first\": True,\n        }\n\n        kwargs_classic_encoder = dict(kwargs_common)\n        kwargs_classic_encoder.update({\"sparse\": sparse_self_attn})\n        kwargs_cross_encoder = dict(kwargs_common)\n        kwargs_cross_encoder.update({\"sparse\": sparse_cross_attn})\n\n        for idx in range(num_layers):\n            if idx % 2 == self.classic_parity:\n\n                self.layers.append(MyTransformerEncoderLayer(**kwargs_classic_encoder))\n                self.layers_t.append(MyTransformerEncoderLayer(**kwargs_classic_encoder))\n\n            else:\n                
self.layers.append(CrossTransformerEncoderLayer(**kwargs_cross_encoder))\n\n                self.layers_t.append(CrossTransformerEncoderLayer(**kwargs_cross_encoder))\n\n    def forward(self, x, xt):\n        B, C, Fr, T1 = x.shape\n        pos_emb_2d = create_2d_sin_embedding(C, Fr, T1, x.device, self.max_period)  # (1, C, Fr, T1)\n        pos_emb_2d = rearrange(pos_emb_2d, \"b c fr t1 -> b (t1 fr) c\")\n        x = rearrange(x, \"b c fr t1 -> b (t1 fr) c\")\n        x = self.norm_in(x)\n        x = x + self.weight_pos_embed * pos_emb_2d\n\n        B, C, T2 = xt.shape\n        xt = rearrange(xt, \"b c t2 -> b t2 c\")  # now B, T2, C\n        pos_emb = self._get_pos_embedding(T2, B, C, x.device)\n        pos_emb = rearrange(pos_emb, \"t2 b c -> b t2 c\")\n        xt = self.norm_in_t(xt)\n        xt = xt + self.weight_pos_embed * pos_emb\n\n        for idx in range(self.num_layers):\n            if idx % 2 == self.classic_parity:\n                x = self.layers[idx](x)\n                xt = self.layers_t[idx](xt)\n            else:\n                old_x = x\n                x = self.layers[idx](x, xt)\n                xt = self.layers_t[idx](xt, old_x)\n\n        x = rearrange(x, \"b (t1 fr) c -> b c fr t1\", t1=T1)\n        xt = rearrange(xt, \"b t2 c -> b c t2\")\n        return x, xt\n\n    def _get_pos_embedding(self, T, B, C, device):\n        if self.emb == \"sin\":\n            shift = random.randrange(self.sin_random_shift + 1)\n            pos_emb = create_sin_embedding(T, C, shift=shift, device=device, max_period=self.max_period)\n        elif self.emb == \"cape\":\n            if self.training:\n                pos_emb = create_sin_embedding_cape(\n                    T,\n                    C,\n                    B,\n                    device=device,\n                    max_period=self.max_period,\n                    mean_normalize=self.cape_mean_normalize,\n                    augment=self.cape_augment,\n                    
max_global_shift=self.cape_glob_loc_scale[0],\n                    max_local_shift=self.cape_glob_loc_scale[1],\n                    max_scale=self.cape_glob_loc_scale[2],\n                )\n            else:\n                pos_emb = create_sin_embedding_cape(T, C, B, device=device, max_period=self.max_period, mean_normalize=self.cape_mean_normalize, augment=False)\n\n        elif self.emb == \"scaled\":\n            pos = torch.arange(T, device=device)\n            pos_emb = self.position_embeddings(pos)[:, None]\n\n        return pos_emb\n\n    def make_optim_group(self):\n        group = {\"params\": list(self.parameters()), \"weight_decay\": self.weight_decay}\n        if self.lr is not None:\n            group[\"lr\"] = self.lr\n        return group\n\n\n# Attention Modules\n\n\nclass MultiheadAttention(nn.Module):\n    def __init__(self, embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None, batch_first=False, auto_sparsity=None):\n        super().__init__()\n        assert auto_sparsity is not None, \"sanity check\"\n        self.num_heads = num_heads\n        self.q = torch.nn.Linear(embed_dim, embed_dim, bias=bias)\n        self.k = torch.nn.Linear(embed_dim, embed_dim, bias=bias)\n        self.v = torch.nn.Linear(embed_dim, embed_dim, bias=bias)\n        self.attn_drop = torch.nn.Dropout(dropout)\n        self.proj = torch.nn.Linear(embed_dim, embed_dim, bias)\n        self.proj_drop = torch.nn.Dropout(dropout)\n        self.batch_first = batch_first\n        self.auto_sparsity = auto_sparsity\n\n    def forward(self, query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None, average_attn_weights=True):\n\n        if not self.batch_first:  # N, B, C\n            query = query.permute(1, 0, 2)  # B, N_q, C\n            key = key.permute(1, 0, 2)  # B, N_k, C\n            value = value.permute(1, 0, 2)  # B, N_k, C\n        B, N_q, C = query.shape\n        B, N_k, C = key.shape\n\n    
    q = self.q(query).reshape(B, N_q, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)\n        q = q.flatten(0, 1)\n        k = self.k(key).reshape(B, N_k, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)\n        k = k.flatten(0, 1)\n        v = self.v(value).reshape(B, N_k, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)\n        v = v.flatten(0, 1)\n\n        if self.auto_sparsity:\n            assert attn_mask is None\n            x = dynamic_sparse_attention(q, k, v, sparsity=self.auto_sparsity)\n        else:\n            x = scaled_dot_product_attention(q, k, v, attn_mask, dropout=self.attn_drop)\n        x = x.reshape(B, self.num_heads, N_q, C // self.num_heads)\n\n        x = x.transpose(1, 2).reshape(B, N_q, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        if not self.batch_first:\n            x = x.permute(1, 0, 2)\n        return x, None\n\n\ndef scaled_query_key_softmax(q, k, att_mask):\n    from xformers.ops import masked_matmul\n\n    q = q / (k.size(-1)) ** 0.5\n    att = masked_matmul(q, k.transpose(-2, -1), att_mask)\n    att = torch.nn.functional.softmax(att, -1)\n    return att\n\n\ndef scaled_dot_product_attention(q, k, v, att_mask, dropout):\n    att = scaled_query_key_softmax(q, k, att_mask=att_mask)\n    att = dropout(att)\n    y = att @ v\n    return y\n\n\ndef _compute_buckets(x, R):\n    qq = torch.einsum(\"btf,bfhi->bhti\", x, R)\n    qq = torch.cat([qq, -qq], dim=-1)\n    buckets = qq.argmax(dim=-1)\n\n    return buckets.permute(0, 2, 1).byte().contiguous()\n\n\ndef dynamic_sparse_attention(query, key, value, sparsity, infer_sparsity=True, attn_bias=None):\n    # assert False, \"The code for the custom sparse kernel is not ready for release yet.\"\n    from xformers.ops import find_locations, sparse_memory_efficient_attention\n\n    n_hashes = 32\n    proj_size = 4\n    query, key, value = [x.contiguous() for x in [query, key, value]]\n    with torch.no_grad():\n        R = torch.randn(1, 
query.shape[-1], n_hashes, proj_size // 2, device=query.device)\n        bucket_query = _compute_buckets(query, R)\n        bucket_key = _compute_buckets(key, R)\n        row_offsets, column_indices = find_locations(bucket_query, bucket_key, sparsity, infer_sparsity)\n    return sparse_memory_efficient_attention(query, key, value, row_offsets, column_indices, attn_bias)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/demucs/utils.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nfrom collections import defaultdict\nfrom contextlib import contextmanager\nimport math\nimport os\nimport tempfile\nimport typing as tp\n\nimport errno\nimport functools\nimport hashlib\nimport inspect\nimport io\nimport random\nimport socket\nimport warnings\nimport zlib\n\nfrom diffq import UniformQuantizer, DiffQuantizer\nimport torch as th\nimport tqdm\nfrom torch import distributed\nfrom torch.nn import functional as F\n\nimport torch\n\n\ndef unfold(a, kernel_size, stride):\n    \"\"\"Given input of size [*OT, T], output Tensor of size [*OT, F, K]\n    with K the kernel size, by extracting frames with the given stride.\n\n    This will pad the input so that `F = ceil(T / stride)`.\n\n    see https://github.com/pytorch/pytorch/issues/60466\n    \"\"\"\n    *shape, length = a.shape\n    n_frames = math.ceil(length / stride)\n    tgt_length = (n_frames - 1) * stride + kernel_size\n    a = F.pad(a, (0, tgt_length - length))\n    strides = list(a.stride())\n    assert strides[-1] == 1, \"data should be contiguous\"\n    strides = strides[:-1] + [stride, 1]\n    return a.as_strided([*shape, n_frames, kernel_size], strides)\n\n\ndef center_trim(tensor: torch.Tensor, reference: tp.Union[torch.Tensor, int]):\n    \"\"\"\n    Center trim `tensor` with respect to `reference`, along the last dimension.\n    `reference` can also be a number, representing the length to trim to.\n    If the size difference != 0 mod 2, the extra sample is removed on the right side.\n    \"\"\"\n    ref_size: int\n    if isinstance(reference, torch.Tensor):\n        ref_size = reference.size(-1)\n    else:\n        ref_size = reference\n    delta = tensor.size(-1) - ref_size\n    if delta < 0:\n        raise ValueError(\"tensor must be larger than 
reference. \" f\"Delta is {delta}.\")\n    if delta:\n        tensor = tensor[..., delta // 2 : -(delta - delta // 2)]\n    return tensor\n\n\ndef pull_metric(history: tp.List[dict], name: str):\n    out = []\n    for metrics in history:\n        metric = metrics\n        for part in name.split(\".\"):\n            metric = metric[part]\n        out.append(metric)\n    return out\n\n\ndef EMA(beta: float = 1):\n    \"\"\"\n    Exponential Moving Average callback.\n    Returns a single function that can be called to repeatedly update the EMA\n    with a dict of metrics. The callback will return\n    the new averaged dict of metrics.\n\n    Note that for `beta=1`, this is just plain averaging.\n    \"\"\"\n    fix: tp.Dict[str, float] = defaultdict(float)\n    total: tp.Dict[str, float] = defaultdict(float)\n\n    def _update(metrics: dict, weight: float = 1) -> dict:\n        nonlocal total, fix\n        for key, value in metrics.items():\n            total[key] = total[key] * beta + weight * float(value)\n            fix[key] = fix[key] * beta + weight\n        return {key: tot / fix[key] for key, tot in total.items()}\n\n    return _update\n\n\ndef sizeof_fmt(num: float, suffix: str = \"B\"):\n    \"\"\"\n    Given `num` bytes, return human readable size.\n    Taken from https://stackoverflow.com/a/1094933\n    \"\"\"\n    for unit in [\"\", \"Ki\", \"Mi\", \"Gi\", \"Ti\", \"Pi\", \"Ei\", \"Zi\"]:\n        if abs(num) < 1024.0:\n            return \"%3.1f%s%s\" % (num, unit, suffix)\n        num /= 1024.0\n    return \"%.1f%s%s\" % (num, \"Yi\", suffix)\n\n\n@contextmanager\ndef temp_filenames(count: int, delete=True):\n    names = []\n    try:\n        for _ in range(count):\n            names.append(tempfile.NamedTemporaryFile(delete=False).name)\n        yield names\n    finally:\n        if delete:\n            for name in names:\n                os.unlink(name)\n\n\ndef average_metric(metric, count=1.0):\n    \"\"\"\n    Average `metric` which should be a 
float across all hosts. `count` should be\n    the weight for this particular host (i.e. number of examples).\n    \"\"\"\n    metric = th.tensor([count, count * metric], dtype=th.float32, device=\"cuda\")\n    distributed.all_reduce(metric, op=distributed.ReduceOp.SUM)\n    return metric[1].item() / metric[0].item()\n\n\ndef free_port(host=\"\", low=20000, high=40000):\n    \"\"\"\n    Return a port number that is most likely free.\n    This could suffer from a race condition although\n    it should be quite rare.\n    \"\"\"\n    sock = socket.socket()\n    while True:\n        port = random.randint(low, high)\n        try:\n            sock.bind((host, port))\n        except OSError as error:\n            if error.errno == errno.EADDRINUSE:\n                continue\n            raise\n        return port\n\n\ndef human_seconds(seconds, display=\".2f\"):\n    \"\"\"\n    Given `seconds` seconds, return human readable duration.\n    \"\"\"\n    value = seconds * 1e6\n    ratios = [1e3, 1e3, 60, 60, 24]\n    names = [\"us\", \"ms\", \"s\", \"min\", \"hrs\", \"days\"]\n    last = names.pop(0)\n    for name, ratio in zip(names, ratios):\n        if value / ratio < 0.3:\n            break\n        value /= ratio\n        last = name\n    return f\"{format(value, display)} {last}\"\n\n\nclass TensorChunk:\n    def __init__(self, tensor, offset=0, length=None):\n        total_length = tensor.shape[-1]\n        assert offset >= 0\n        assert offset < total_length\n\n        if length is None:\n            length = total_length - offset\n        else:\n            
length = min(total_length - offset, length)\n\n        self.tensor = tensor\n        self.offset = offset\n        self.length = length\n        self.device = tensor.device\n\n    @property\n    def shape(self):\n        shape = list(self.tensor.shape)\n        shape[-1] = self.length\n        return shape\n\n    def padded(self, target_length):\n        delta = target_length - self.length\n        total_length = self.tensor.shape[-1]\n        assert delta >= 0\n\n        start = self.offset - delta // 2\n        end = start + target_length\n\n        correct_start = max(0, start)\n        correct_end = min(total_length, end)\n\n        pad_left = correct_start - start\n        pad_right = end - correct_end\n\n        out = F.pad(self.tensor[..., correct_start:correct_end], (pad_left, pad_right))\n        assert out.shape[-1] == target_length\n        return out\n\n\ndef tensor_chunk(tensor_or_chunk):\n    if isinstance(tensor_or_chunk, TensorChunk):\n        return tensor_or_chunk\n    else:\n        assert isinstance(tensor_or_chunk, th.Tensor)\n        return TensorChunk(tensor_or_chunk)\n\n\ndef apply_model_v1(model, mix, shifts=None, split=False, progress=False, set_progress_bar=None):\n    \"\"\"\n    Apply model to a given mixture.\n\n    Args:\n        shifts (int): if > 0, will shift in time `mix` by a random amount between 0 and 0.5 sec\n            and apply the opposite shift to the output. This is repeated `shifts` times and\n            all predictions are averaged. 
This effectively makes the model time equivariant\n            and improves SDR by up to 0.2 points.\n        split (bool): if True, the input will be broken down into 10-second chunks\n            and predictions will be performed individually on each and concatenated.\n            Useful for models with large memory footprint like Tasnet.\n        progress (bool): if True, show a progress bar (requires split=True)\n    \"\"\"\n\n    channels, length = mix.size()\n    device = mix.device\n    progress_value = 0\n\n    if split:\n        out = th.zeros(4, channels, length, device=device)\n        shift = model.samplerate * 10\n        offsets = range(0, length, shift)\n        scale = 10\n        if progress:\n            offsets = tqdm.tqdm(offsets, unit_scale=scale, ncols=120, unit=\"seconds\")\n        for offset in offsets:\n            chunk = mix[..., offset : offset + shift]\n            if set_progress_bar:\n                progress_value += 1\n                set_progress_bar(0.1, (0.8 / len(offsets) * progress_value))\n                chunk_out = apply_model_v1(model, chunk, shifts=shifts, set_progress_bar=set_progress_bar)\n            else:\n                chunk_out = apply_model_v1(model, chunk, shifts=shifts)\n            out[..., offset : offset + shift] = chunk_out\n            offset += shift\n        return out\n    elif shifts:\n        max_shift = int(model.samplerate / 2)\n        mix = F.pad(mix, (max_shift, max_shift))\n        offsets = list(range(max_shift))\n        random.shuffle(offsets)\n        out = 0\n        for offset in offsets[:shifts]:\n            shifted = mix[..., offset : offset + length + max_shift]\n            if set_progress_bar:\n                shifted_out = apply_model_v1(model, shifted, set_progress_bar=set_progress_bar)\n            else:\n                shifted_out = apply_model_v1(model, shifted)\n            out += shifted_out[..., max_shift - offset : max_shift - offset + length]\n        out /= shifts\n        
return out\n    else:\n        valid_length = model.valid_length(length)\n        delta = valid_length - length\n        padded = F.pad(mix, (delta // 2, delta - delta // 2))\n        with th.no_grad():\n            out = model(padded.unsqueeze(0))[0]\n        return center_trim(out, mix)\n\n\ndef apply_model_v2(model, mix, shifts=None, split=False, overlap=0.25, transition_power=1.0, progress=False, set_progress_bar=None):\n    \"\"\"\n    Apply model to a given mixture.\n\n    Args:\n        shifts (int): if > 0, will shift in time `mix` by a random amount between 0 and 0.5 sec\n            and apply the opposite shift to the output. This is repeated `shifts` times and\n            all predictions are averaged. This effectively makes the model time equivariant\n            and improves SDR by up to 0.2 points.\n        split (bool): if True, the input will be broken down into segments of `model.segment_length`\n            samples, predictions will be performed individually on each, and the overlapping\n            outputs will be blended with a triangular weighting window.\n            Useful for models with large memory footprint like Tasnet.\n        overlap (float): fraction of overlap between consecutive segments when `split=True`.\n        transition_power (float): exponent applied to the blending window; must be >= 1.\n            Larger values give sharper transitions between segments.\n        progress (bool): if True, show a progress bar (requires split=True)\n    \"\"\"\n\n    assert transition_power >= 1, \"transition_power < 1 leads to weird behavior.\"\n    device = mix.device\n    channels, length = mix.shape\n    progress_value = 0\n\n    if split:\n        out = th.zeros(len(model.sources), channels, length, device=device)\n        sum_weight = th.zeros(length, device=device)\n        segment = model.segment_length\n        stride = int((1 - overlap) * segment)\n        offsets = range(0, length, stride)\n        scale = stride / model.samplerate\n        if progress:\n            offsets = tqdm.tqdm(offsets, unit_scale=scale, ncols=120, unit=\"seconds\")\n        # We start from a triangle shaped weight, with maximal weight in the middle\n        # of the segment. 
Then we normalize and take to the power `transition_power`.\n        # Large values of transition power will lead to sharper transitions.\n        weight = th.cat([th.arange(1, segment // 2 + 1), th.arange(segment - segment // 2, 0, -1)]).to(device)\n        assert len(weight) == segment\n        # If the overlap < 50%, this will translate to linear transition when\n        # transition_power is 1.\n        weight = (weight / weight.max()) ** transition_power\n        for offset in offsets:\n            chunk = TensorChunk(mix, offset, segment)\n            if set_progress_bar:\n                progress_value += 1\n                set_progress_bar(0.1, (0.8 / len(offsets) * progress_value))\n                chunk_out = apply_model_v2(model, chunk, shifts=shifts, set_progress_bar=set_progress_bar)\n            else:\n                chunk_out = apply_model_v2(model, chunk, shifts=shifts)\n            chunk_length = chunk_out.shape[-1]\n            out[..., offset : offset + segment] += weight[:chunk_length] * chunk_out\n            sum_weight[offset : offset + segment] += weight[:chunk_length]\n            offset += segment\n        assert sum_weight.min() > 0\n        out /= sum_weight\n        return out\n    elif shifts:\n        max_shift = int(0.5 * model.samplerate)\n        mix = tensor_chunk(mix)\n        padded_mix = mix.padded(length + 2 * max_shift)\n        out = 0\n        for _ in range(shifts):\n            offset = random.randint(0, max_shift)\n            shifted = TensorChunk(padded_mix, offset, length + max_shift - offset)\n\n            if set_progress_bar:\n                progress_value += 1\n                shifted_out = apply_model_v2(model, shifted, set_progress_bar=set_progress_bar)\n            else:\n                shifted_out = apply_model_v2(model, shifted)\n            out += shifted_out[..., max_shift - offset :]\n        out /= shifts\n        return out\n    else:\n        valid_length = model.valid_length(length)\n        mix = 
tensor_chunk(mix)\n        padded_mix = mix.padded(valid_length)\n        with th.no_grad():\n            out = model(padded_mix.unsqueeze(0))[0]\n        return center_trim(out, length)\n\n\ndef get_quantizer(model, args, optimizer=None):\n    quantizer = None\n    if args.diffq:\n        quantizer = DiffQuantizer(model, min_size=args.q_min_size, group_size=8)\n        if optimizer is not None:\n            quantizer.setup_optimizer(optimizer)\n    elif args.qat:\n        quantizer = UniformQuantizer(model, bits=args.qat, min_size=args.q_min_size)\n    return quantizer\n\n\ndef load_model(path, strict=False):\n    with warnings.catch_warnings():\n        warnings.simplefilter(\"ignore\")\n        load_from = path\n        package = th.load(load_from, \"cpu\")\n\n    klass = package[\"klass\"]\n    args = package[\"args\"]\n    kwargs = package[\"kwargs\"]\n\n    if strict:\n        model = klass(*args, **kwargs)\n    else:\n        sig = inspect.signature(klass)\n        for key in list(kwargs):\n            if key not in sig.parameters:\n                warnings.warn(\"Dropping nonexistent parameter \" + key)\n                del kwargs[key]\n        model = klass(*args, **kwargs)\n\n    state = package[\"state\"]\n    training_args = package[\"training_args\"]\n    quantizer = get_quantizer(model, training_args)\n\n    set_state(model, quantizer, state)\n    return model\n\n\ndef get_state(model, quantizer):\n    if quantizer is None:\n        state = {k: p.data.to(\"cpu\") for k, p in model.state_dict().items()}\n    else:\n        state = quantizer.get_quantized_state()\n        buf = io.BytesIO()\n        th.save(state, buf)\n        state = 
{\"compressed\": zlib.compress(buf.getvalue())}\n    return state\n\n\ndef set_state(model, quantizer, state):\n    if quantizer is None:\n        model.load_state_dict(state)\n    else:\n        buf = io.BytesIO(zlib.decompress(state[\"compressed\"]))\n        state = th.load(buf, \"cpu\")\n        quantizer.restore_quantized_state(state)\n\n    return state\n\n\ndef save_state(state, path):\n    buf = io.BytesIO()\n    th.save(state, buf)\n    sig = hashlib.sha256(buf.getvalue()).hexdigest()[:8]\n\n    path = path.parent / (path.stem + \"-\" + sig + path.suffix)\n    path.write_bytes(buf.getvalue())\n\n\ndef save_model(model, quantizer, training_args, path):\n    args, kwargs = model._init_args_kwargs\n    klass = model.__class__\n\n    state = get_state(model, quantizer)\n\n    save_to = path\n    package = {\"klass\": klass, \"args\": args, \"kwargs\": kwargs, \"state\": state, \"training_args\": training_args}\n    th.save(package, save_to)\n\n\ndef capture_init(init):\n    @functools.wraps(init)\n    def __init__(self, *args, **kwargs):\n        self._init_args_kwargs = (args, kwargs)\n        init(self, *args, **kwargs)\n\n    return __init__\n\n\nclass DummyPoolExecutor:\n    class DummyResult:\n        def __init__(self, func, *args, **kwargs):\n            self.func = func\n            self.args = args\n            self.kwargs = kwargs\n\n        def result(self):\n            return self.func(*self.args, **self.kwargs)\n\n    def __init__(self, workers=0):\n        pass\n\n    def submit(self, func, *args, **kwargs):\n        return DummyPoolExecutor.DummyResult(func, *args, **kwargs)\n\n    def __enter__(self):\n        return self\n\n    def __exit__(self, exc_type, exc_value, exc_tb):\n        return\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/mdxnet.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .modules import TFC_TDF\nfrom pytorch_lightning import LightningModule\n\ndim_s = 4\n\nclass AbstractMDXNet(LightningModule):\n    def __init__(self, target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap):\n        super().__init__()\n        self.target_name = target_name\n        self.lr = lr\n        self.optimizer = optimizer\n        self.dim_c = dim_c\n        self.dim_f = dim_f\n        self.dim_t = dim_t\n        self.n_fft = n_fft\n        self.n_bins = n_fft // 2 + 1\n        self.hop_length = hop_length\n        self.window = nn.Parameter(torch.hann_window(window_length=self.n_fft, periodic=True), requires_grad=False)\n        self.freq_pad = nn.Parameter(torch.zeros([1, dim_c, self.n_bins - self.dim_f, self.dim_t]), requires_grad=False)\n\n    def get_optimizer(self):\n        if self.optimizer == 'rmsprop':\n            return torch.optim.RMSprop(self.parameters(), self.lr)\n        \n        if self.optimizer == 'adamw':\n            return torch.optim.AdamW(self.parameters(), self.lr)\n\nclass ConvTDFNet(AbstractMDXNet):\n    def __init__(self, target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length,\n                 num_blocks, l, g, k, bn, bias, overlap):\n\n        super(ConvTDFNet, self).__init__(\n            target_name, lr, optimizer, dim_c, dim_f, dim_t, n_fft, hop_length, overlap)\n        #self.save_hyperparameters()\n\n        self.num_blocks = num_blocks\n        self.l = l\n        self.g = g\n        self.k = k\n        self.bn = bn\n        self.bias = bias\n\n        # The optimizer name also selects the normalization layer used throughout the network.\n        if optimizer == 'rmsprop':\n            norm = nn.BatchNorm2d\n        elif optimizer == 'adamw':\n            norm = lambda input: nn.GroupNorm(2, input)\n        else:\n            raise ValueError(f\"Unsupported optimizer {optimizer!r}; expected 'rmsprop' or 'adamw'\")\n\n        self.n = num_blocks // 2\n        scale = (2, 2)\n\n        self.first_conv = nn.Sequential(\n            nn.Conv2d(in_channels=self.dim_c, out_channels=g, kernel_size=(1, 1)),\n            
norm(g),\n            nn.ReLU(),\n        )\n\n        f = self.dim_f\n        c = g\n        self.encoding_blocks = nn.ModuleList()\n        self.ds = nn.ModuleList()\n        for i in range(self.n):\n            self.encoding_blocks.append(TFC_TDF(c, l, f, k, bn, bias=bias, norm=norm))\n            self.ds.append(\n                nn.Sequential(\n                    nn.Conv2d(in_channels=c, out_channels=c + g, kernel_size=scale, stride=scale),\n                    norm(c + g),\n                    nn.ReLU()\n                )\n            )\n            f = f // 2\n            c += g\n\n        self.bottleneck_block = TFC_TDF(c, l, f, k, bn, bias=bias, norm=norm)\n\n        self.decoding_blocks = nn.ModuleList()\n        self.us = nn.ModuleList()\n        for i in range(self.n):\n            self.us.append(\n                nn.Sequential(\n                    nn.ConvTranspose2d(in_channels=c, out_channels=c - g, kernel_size=scale, stride=scale),\n                    norm(c - g),\n                    nn.ReLU()\n                )\n            )\n            f = f * 2\n            c -= g\n\n            self.decoding_blocks.append(TFC_TDF(c, l, f, k, bn, bias=bias, norm=norm))\n\n        self.final_conv = nn.Sequential(\n            nn.Conv2d(in_channels=c, out_channels=self.dim_c, kernel_size=(1, 1)),\n        )\n\n    def forward(self, x):\n\n        x = self.first_conv(x)\n\n        x = x.transpose(-1, -2)\n\n        ds_outputs = []\n        for i in range(self.n):\n            x = self.encoding_blocks[i](x)\n            ds_outputs.append(x)\n            x = self.ds[i](x)\n\n        x = self.bottleneck_block(x)\n\n        for i in range(self.n):\n            x = self.us[i](x)\n            x *= ds_outputs[-i - 1]\n            x = self.decoding_blocks[i](x)\n\n        x = x.transpose(-1, -2)\n\n        x = self.final_conv(x)\n\n        return x\n    \nclass Mixer(nn.Module):\n    def __init__(self, device, mixer_path):\n        \n        super(Mixer, 
self).__init__()\n        \n        self.linear = nn.Linear((dim_s+1)*2, dim_s*2, bias=False)\n        \n        self.load_state_dict(\n            torch.load(mixer_path, map_location=device)\n        )\n\n    def forward(self, x):\n        x = x.reshape(1,(dim_s+1)*2,-1).transpose(-1,-2)\n        x = self.linear(x)\n        return x.transpose(-1,-2).reshape(dim_s,2,-1)"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/modules.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass TFC(nn.Module):\n    def __init__(self, c, l, k, norm):\n        super(TFC, self).__init__()\n\n        self.H = nn.ModuleList()\n        for i in range(l):\n            self.H.append(\n                nn.Sequential(\n                    nn.Conv2d(in_channels=c, out_channels=c, kernel_size=k, stride=1, padding=k // 2),\n                    norm(c),\n                    nn.ReLU(),\n                )\n            )\n\n    def forward(self, x):\n        for h in self.H:\n            x = h(x)\n        return x\n\n\nclass DenseTFC(nn.Module):\n    def __init__(self, c, l, k, norm):\n        super(DenseTFC, self).__init__()\n\n        self.conv = nn.ModuleList()\n        for i in range(l):\n            self.conv.append(\n                nn.Sequential(\n                    nn.Conv2d(in_channels=c, out_channels=c, kernel_size=k, stride=1, padding=k // 2),\n                    norm(c),\n                    nn.ReLU(),\n                )\n            )\n\n    def forward(self, x):\n        for layer in self.conv[:-1]:\n            x = torch.cat([layer(x), x], 1)\n        return self.conv[-1](x)\n\n\nclass TFC_TDF(nn.Module):\n    def __init__(self, c, l, f, k, bn, dense=False, bias=True, norm=nn.BatchNorm2d):\n\n        super(TFC_TDF, self).__init__()\n\n        self.use_tdf = bn is not None\n\n        self.tfc = DenseTFC(c, l, k, norm) if dense else TFC(c, l, k, norm)\n\n        if self.use_tdf:\n            if bn == 0:\n                self.tdf = nn.Sequential(\n                    nn.Linear(f, f, bias=bias),\n                    norm(c),\n                    nn.ReLU()\n                )\n            else:\n                self.tdf = nn.Sequential(\n                    nn.Linear(f, f // bn, bias=bias),\n                    norm(c),\n                    nn.ReLU(),\n                    nn.Linear(f // bn, f, bias=bias),\n                    norm(c),\n                    nn.ReLU()\n                )\n\n    def 
forward(self, x):\n        x = self.tfc(x)\n        return x + self.tdf(x) if self.use_tdf else x\n\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/playsound.py",
    "content": "import logging\nlogger = logging.getLogger(__name__)\n\nclass PlaysoundException(Exception):\n    pass\n\ndef _canonicalizePath(path):\n    \"\"\"\n    Support passing in a pathlib.Path-like object by converting to str.\n    \"\"\"\n    import sys\n    if sys.version_info[0] >= 3:\n        return str(path)\n    else:\n        # On earlier Python versions, str is a byte string, so attempting to\n        # convert a unicode string to str will fail. Leave it alone in this case.\n        return path\n\ndef _playsoundWin(sound, block = True):\n    '''\n    Utilizes windll.winmm. Tested and known to work with MP3 and WAVE on\n    Windows 7 with Python 2.7. Probably works with more file formats.\n    Probably works on Windows XP thru Windows 10. Probably works with all\n    versions of Python.\n\n    Inspired by (but not copied from) Michael Gundlach <gundlach@gmail.com>'s mp3play:\n    https://github.com/michaelgundlach/mp3play\n\n    I never would have tried using windll.winmm without seeing his code.\n    '''\n    sound = '\"' + _canonicalizePath(sound) + '\"'\n\n    from ctypes import create_unicode_buffer, windll, wintypes\n    windll.winmm.mciSendStringW.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.UINT, wintypes.HANDLE]\n    windll.winmm.mciGetErrorStringW.argtypes = [wintypes.DWORD, wintypes.LPWSTR, wintypes.UINT]\n\n    def winCommand(*command):\n        bufLen = 600\n        buf = create_unicode_buffer(bufLen)\n        command = ' '.join(command)\n        errorCode = int(windll.winmm.mciSendStringW(command, buf, bufLen - 1, 0))  # use widestring version of the function\n        if errorCode:\n            errorBuffer = create_unicode_buffer(bufLen)\n            windll.winmm.mciGetErrorStringW(errorCode, errorBuffer, bufLen - 1)  # use widestring version of the function\n            exceptionMessage = ('\\n    Error ' + str(errorCode) + ' for command:'\n                                '\\n        ' + command +\n                           
     '\\n    ' + errorBuffer.value)\n            logger.error(exceptionMessage)\n            raise PlaysoundException(exceptionMessage)\n        return buf.value\n\n    try:\n        logger.debug('Starting')\n        winCommand(u'open {}'.format(sound))\n        winCommand(u'play {}{}'.format(sound, ' wait' if block else ''))\n        logger.debug('Returning')\n    finally:\n        try:\n            winCommand(u'close {}'.format(sound))\n        except PlaysoundException:\n            logger.warning(u'Failed to close the file: {}'.format(sound))\n            # If it fails, there's nothing more that can be done...\n            pass\n\ndef _handlePathOSX(sound):\n    sound = _canonicalizePath(sound)\n\n    if '://' not in sound:\n        if not sound.startswith('/'):\n            from os import getcwd\n            sound = getcwd() + '/' + sound\n        sound = 'file://' + sound\n\n    try:\n        # Don't double-encode it.\n        sound.encode('ascii')\n        return sound.replace(' ', '%20')\n    except UnicodeEncodeError:\n        try:\n            from urllib.parse import quote  # Try the Python 3 import first...\n        except ImportError:\n            from urllib import quote  # Try using the Python 2 import before giving up entirely...\n\n        parts = sound.split('://', 1)\n        return parts[0] + '://' + quote(parts[1].encode('utf-8')).replace(' ', '%20')\n\n\ndef _playsoundOSX(sound, block = True):\n    '''\n    Utilizes AppKit.NSSound. Tested and known to work with MP3 and WAVE on\n    OS X 10.11 with Python 2.7. Probably works with anything QuickTime supports.\n    Probably works on OS X 10.5 and newer. 
Probably works with all versions of\n    Python.\n\n    Inspired by (but not copied from) Aaron's Stack Overflow answer here:\n    http://stackoverflow.com/a/34568298/901641\n\n    I never would have tried using AppKit.NSSound without seeing his code.\n    '''\n    try:\n        from AppKit import NSSound\n    except ImportError:\n        logger.warning(\"playsound could not find a copy of AppKit - falling back to using macOS's system copy.\")\n        sys.path.append('/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/PyObjC')\n        from AppKit import NSSound\n\n    from Foundation import NSURL\n    from time       import sleep\n\n    sound = _handlePathOSX(sound)\n    url   = NSURL.URLWithString_(sound)\n    if not url:\n        raise PlaysoundException('Cannot find a sound with filename: ' + sound)\n\n    for i in range(5):\n        nssound = NSSound.alloc().initWithContentsOfURL_byReference_(url, True)\n        if nssound:\n            break\n        else:\n            logger.debug('Failed to load sound, although url was good... ' + sound)\n    else:\n        raise PlaysoundException('Could not load sound with filename, although URL was good... 
' + sound)\n    nssound.play()\n\n    if block:\n        sleep(nssound.duration())\n\ndef _playsoundNix(sound, block = True):\n    \"\"\"Play a sound using GStreamer.\n\n    Inspired by this:\n    https://gstreamer.freedesktop.org/documentation/tutorials/playback/playbin-usage.html\n    \"\"\"\n    sound = _canonicalizePath(sound)\n\n    # pathname2url escapes non-URL-safe characters\n    from os.path import abspath, exists\n    try:\n        from urllib.request import pathname2url\n    except ImportError:\n        # python 2\n        from urllib import pathname2url\n\n    import gi\n    gi.require_version('Gst', '1.0')\n    from gi.repository import Gst\n\n    Gst.init(None)\n\n    playbin = Gst.ElementFactory.make('playbin', 'playbin')\n    if sound.startswith(('http://', 'https://')):\n        playbin.props.uri = sound\n    else:\n        path = abspath(sound)\n        if not exists(path):\n            raise PlaysoundException(u'File not found: {}'.format(path))\n        playbin.props.uri = 'file://' + pathname2url(path)\n\n\n    set_result = playbin.set_state(Gst.State.PLAYING)\n    if set_result != Gst.StateChangeReturn.ASYNC:\n        raise PlaysoundException(\n            \"playbin.set_state returned \" + repr(set_result))\n\n    # FIXME: use some other bus method than poll() with block=False\n    # https://lazka.github.io/pgi-docs/#Gst-1.0/classes/Bus.html\n    logger.debug('Starting play')\n    if block:\n        bus = playbin.get_bus()\n        try:\n            bus.poll(Gst.MessageType.EOS, Gst.CLOCK_TIME_NONE)\n        finally:\n            playbin.set_state(Gst.State.NULL)\n            \n    logger.debug('Finishing play')\n\ndef _playsoundAnotherPython(otherPython, sound, block = True, macOS = False):\n    '''\n    Mostly written so that when this is run on python3 on macOS, it can invoke\n    python2 on macOS... 
but maybe this idea could be useful on linux, too.\n    '''\n    from inspect    import getsourcefile\n    from os.path    import abspath, exists\n    from subprocess import check_call\n    from threading  import Thread\n\n    sound = _canonicalizePath(sound)\n\n    class PropogatingThread(Thread):\n        def run(self):\n            self.exc = None\n            try:\n                self.ret = self._target(*self._args, **self._kwargs)\n            except BaseException as e:\n                self.exc = e\n\n        def join(self, timeout = None):\n            super().join(timeout)\n            if self.exc:\n                raise self.exc\n            return self.ret\n\n    # Check if the file exists...\n    if not exists(abspath(sound)):\n        raise PlaysoundException('Cannot find a sound with filename: ' + sound)\n\n    playsoundPath = abspath(getsourcefile(lambda: 0))\n    t = PropogatingThread(target = lambda: check_call([otherPython, playsoundPath, _handlePathOSX(sound) if macOS else sound]))\n    t.start()\n    if block:\n        t.join()\n\nfrom platform import system\nsystem = system()\n\nif system == 'Windows':\n    playsound_func = _playsoundWin\nelif system == 'Darwin':\n    playsound_func = _playsoundOSX\n    import sys\n    if sys.version_info[0] > 2:\n        try:\n            from AppKit import NSSound\n        except ImportError:\n            logger.warning(\"playsound is relying on a python 2 subprocess. 
Please use `pip3 install PyObjC` if you want playsound to run more efficiently.\")\n            playsound_func = lambda sound, block = True: _playsoundAnotherPython('/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python', sound, block, macOS = True)\nelse:\n    playsound_func = _playsoundNix\n    if __name__ != '__main__':  # Ensure we don't infinitely recurse trying to get another python instance.\n        try:\n            import gi\n            gi.require_version('Gst', '1.0')\n            from gi.repository import Gst\n        except:\n            logger.warning(\"playsound is relying on another python subprocess. Please use `pip install pygobject` if you want playsound to run more efficiently.\")\n            playsound_func = lambda sound, block = True: _playsoundAnotherPython('/usr/bin/python3', sound, block, macOS = False)\n\ndel system\n\ndef play(audio_filepath):\n    playsound_func(audio_filepath)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/pyrb.py",
    "content": "import os\nimport subprocess\nimport tempfile\nimport six\nimport numpy as np\nimport soundfile as sf\nimport sys\n\nif getattr(sys, 'frozen', False):\n    BASE_PATH_RUB = sys._MEIPASS\nelse:\n    BASE_PATH_RUB = os.path.dirname(os.path.abspath(__file__))\n\n__all__ = ['time_stretch', 'pitch_shift']\n\n__RUBBERBAND_UTIL = os.path.join(BASE_PATH_RUB, 'rubberband')\n\nif six.PY2:\n    DEVNULL = open(os.devnull, 'w')\nelse:\n    DEVNULL = subprocess.DEVNULL\n\ndef __rubberband(y, sr, **kwargs):\n\n    assert sr > 0\n\n    # Get the input and output tempfile\n    fd, infile = tempfile.mkstemp(suffix='.wav')\n    os.close(fd)\n    fd, outfile = tempfile.mkstemp(suffix='.wav')\n    os.close(fd)\n\n    # dump the audio\n    sf.write(infile, y, sr)\n\n    try:\n        # Execute rubberband\n        arguments = [__RUBBERBAND_UTIL, '-q']\n\n        for key, value in six.iteritems(kwargs):\n            arguments.append(str(key))\n            arguments.append(str(value))\n\n        arguments.extend([infile, outfile])\n\n        subprocess.check_call(arguments, stdout=DEVNULL, stderr=DEVNULL)\n\n        # Load the processed audio.\n        y_out, _ = sf.read(outfile, always_2d=True)\n\n        # make sure that output dimensions matches input\n        if y.ndim == 1:\n            y_out = np.squeeze(y_out)\n\n    except OSError as exc:\n        six.raise_from(RuntimeError('Failed to execute rubberband. 
'\n                                    'Please verify that rubberband-cli '\n                                    'is installed.'),\n                       exc)\n\n    finally:\n        # Remove temp files\n        os.unlink(infile)\n        os.unlink(outfile)\n\n    return y_out\n\ndef time_stretch(y, sr, rate, rbargs=None):\n    if rate <= 0:\n        raise ValueError('rate must be strictly positive')\n\n    if rate == 1.0:\n        return y\n\n    if rbargs is None:\n        rbargs = dict()\n\n    rbargs.setdefault('--tempo', rate)\n\n    return __rubberband(y, sr, **rbargs)\n\ndef pitch_shift(y, sr, n_steps, rbargs=None):\n\n    if n_steps == 0:\n        return y\n\n    if rbargs is None:\n        rbargs = dict()\n\n    rbargs.setdefault('--pitch', n_steps)\n\n    return __rubberband(y, sr, **rbargs)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/results.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nMatchering - Audio Matching and Mastering Python Library\nCopyright (C) 2016-2022 Sergree\n\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License\nalong with this program.  If not, see <https://www.gnu.org/licenses/>.\n\"\"\"\n\nimport os\nimport soundfile as sf\n\n\nclass Result:\n    def __init__(\n        self, file: str, subtype: str, use_limiter: bool = True, normalize: bool = True\n    ):\n        _, file_ext = os.path.splitext(file)\n        file_ext = file_ext[1:].upper()\n        if not sf.check_format(file_ext):\n            raise TypeError(f\"{file_ext} format is not supported\")\n        if not sf.check_format(file_ext, subtype):\n            raise TypeError(f\"{file_ext} format does not have {subtype} subtype\")\n        self.file = file\n        self.subtype = subtype\n        self.use_limiter = use_limiter\n        self.normalize = normalize\n\n\ndef pcm16(file: str) -> Result:\n    return Result(file, \"PCM_16\")\n\ndef pcm24(file: str) -> Result:\n    return Result(file, \"FLOAT\")\n\ndef save_audiofile(file: str, wav_set=\"PCM_16\") -> Result:\n    return Result(file, wav_set)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/roformer/attend.py",
    "content": "from functools import wraps\nfrom packaging import version\nfrom collections import namedtuple\n\nimport torch\nfrom torch import nn, einsum\nimport torch.nn.functional as F\n\nfrom einops import rearrange, reduce\n\n# constants\n\nFlashAttentionConfig = namedtuple(\"FlashAttentionConfig\", [\"enable_flash\", \"enable_math\", \"enable_mem_efficient\"])\n\n# helpers\n\n\ndef exists(val):\n    return val is not None\n\n\ndef once(fn):\n    called = False\n\n    @wraps(fn)\n    def inner(x):\n        nonlocal called\n        if called:\n            return\n        called = True\n        return fn(x)\n\n    return inner\n\n\nprint_once = once(print)\n\n# main class\n\n\nclass Attend(nn.Module):\n    def __init__(self, dropout=0.0, flash=False, scale=None):\n        super().__init__()\n        # Optional fixed softmax scale (LinearAttention passes scale=8); None means q.shape[-1] ** -0.5\n        self.scale = scale\n        self.dropout = dropout\n        self.attn_dropout = nn.Dropout(dropout)\n\n        self.flash = flash\n        assert not (flash and version.parse(torch.__version__) < version.parse(\"2.0.0\")), \"in order to use flash attention, you must be using pytorch 2.0 or above\"\n\n        # determine efficient attention configs for cuda and cpu\n\n        self.cpu_config = FlashAttentionConfig(True, True, True)\n        self.cuda_config = None\n\n        if not torch.cuda.is_available() or not flash:\n            return\n\n        device_properties = torch.cuda.get_device_properties(torch.device(\"cuda\"))\n\n        if device_properties.major == 8 and device_properties.minor == 0:\n            print_once(\"A100 GPU detected, using flash attention if input tensor is on cuda\")\n            self.cuda_config = FlashAttentionConfig(True, False, False)\n        else:\n            self.cuda_config = FlashAttentionConfig(False, True, True)\n\n    def flash_attn(self, q, k, v):\n        _, heads, q_len, _, k_len, is_cuda, device = *q.shape, k.shape[-2], q.is_cuda, q.device\n\n        # scaled_dot_product_attention applies q.shape[-1] ** -0.5 by default, so rescale q to realize a custom scale\n        if exists(self.scale):\n            default_scale = q.shape[-1] ** -0.5\n            q = q * (self.scale / default_scale)\n\n        # Check if there is a compatible device for flash attention\n\n        config = self.cuda_config if is_cuda 
else self.cpu_config\n\n        # sdpa_flash kernel only supports float16 on sm80+ architecture gpu\n        if is_cuda and q.dtype != torch.float16:\n            config = FlashAttentionConfig(False, True, True)\n\n        # pytorch 2.0 flash attn: q, k, v, mask, dropout, softmax_scale\n        with torch.backends.cuda.sdp_kernel(**config._asdict()):\n            out = F.scaled_dot_product_attention(q, k, v, dropout_p=self.dropout if self.training else 0.0)\n\n        return out\n\n    def forward(self, q, k, v):\n        \"\"\"\n        einstein notation\n        b - batch\n        h - heads\n        n, i, j - sequence length (base sequence length, source, target)\n        d - feature dimension\n        \"\"\"\n\n        q_len, k_len, device = q.shape[-2], k.shape[-2], q.device\n\n        scale = self.scale if exists(self.scale) else q.shape[-1] ** -0.5\n\n        if self.flash:\n            return self.flash_attn(q, k, v)\n\n        # similarity\n\n        sim = einsum(f\"b h i d, b h j d -> b h i j\", q, k) * scale\n\n        # attention\n\n        attn = sim.softmax(dim=-1)\n        attn = self.attn_dropout(attn)\n\n        # aggregate values\n\n        out = einsum(f\"b h i j, b h j d -> b h i d\", attn, v)\n\n        return out\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/roformer/bs_roformer.py",
    "content": "from functools import partial\n\nimport torch\nfrom torch import nn, einsum, Tensor\nfrom torch.nn import Module, ModuleList\nimport torch.nn.functional as F\n\nfrom .attend import Attend\n\nfrom beartype.typing import Tuple, Optional, List, Callable\nfrom beartype import beartype\n\nfrom rotary_embedding_torch import RotaryEmbedding\n\nfrom einops import rearrange, pack, unpack\nfrom einops.layers.torch import Rearrange\n\n# helper functions\n\n\ndef exists(val):\n    return val is not None\n\n\ndef default(v, d):\n    return v if exists(v) else d\n\n\ndef pack_one(t, pattern):\n    return pack([t], pattern)\n\n\ndef unpack_one(t, ps, pattern):\n    return unpack(t, ps, pattern)[0]\n\n\n# norm\n\n\ndef l2norm(t):\n    return F.normalize(t, dim=-1, p=2)\n\n\nclass RMSNorm(Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.scale = dim**0.5\n        self.gamma = nn.Parameter(torch.ones(dim))\n\n    def forward(self, x):\n        x = x.to(self.gamma.device)\n        return F.normalize(x, dim=-1) * self.scale * self.gamma\n\n\n# attention\n\n\nclass FeedForward(Module):\n    def __init__(self, dim, mult=4, dropout=0.0):\n        super().__init__()\n        dim_inner = int(dim * mult)\n        self.net = nn.Sequential(RMSNorm(dim), nn.Linear(dim, dim_inner), nn.GELU(), nn.Dropout(dropout), nn.Linear(dim_inner, dim), nn.Dropout(dropout))\n\n    def forward(self, x):\n        return self.net(x)\n\n\nclass Attention(Module):\n    def __init__(self, dim, heads=8, dim_head=64, dropout=0.0, rotary_embed=None, flash=True):\n        super().__init__()\n        self.heads = heads\n        self.scale = dim_head**-0.5\n        dim_inner = heads * dim_head\n\n        self.rotary_embed = rotary_embed\n\n        self.attend = Attend(flash=flash, dropout=dropout)\n\n        self.norm = RMSNorm(dim)\n        self.to_qkv = nn.Linear(dim, dim_inner * 3, bias=False)\n\n        self.to_gates = nn.Linear(dim, heads)\n\n        self.to_out = 
nn.Sequential(nn.Linear(dim_inner, dim, bias=False), nn.Dropout(dropout))\n\n    def forward(self, x):\n        x = self.norm(x)\n\n        q, k, v = rearrange(self.to_qkv(x), \"b n (qkv h d) -> qkv b h n d\", qkv=3, h=self.heads)\n\n        if exists(self.rotary_embed):\n            q = self.rotary_embed.rotate_queries_or_keys(q)\n            k = self.rotary_embed.rotate_queries_or_keys(k)\n\n        out = self.attend(q, k, v)\n\n        gates = self.to_gates(x)\n        out = out * rearrange(gates, \"b n h -> b h n 1\").sigmoid()\n\n        out = rearrange(out, \"b h n d -> b n (h d)\")\n        return self.to_out(out)\n\n\nclass LinearAttention(Module):\n    \"\"\"\n    this flavor of linear attention proposed in https://arxiv.org/abs/2106.09681 by El-Nouby et al.\n    \"\"\"\n\n    @beartype\n    def __init__(self, *, dim, dim_head=32, heads=8, scale=8, flash=False, dropout=0.0):\n        super().__init__()\n        dim_inner = dim_head * heads\n        self.norm = RMSNorm(dim)\n\n        self.to_qkv = nn.Sequential(nn.Linear(dim, dim_inner * 3, bias=False), Rearrange(\"b n (qkv h d) -> qkv b h d n\", qkv=3, h=heads))\n\n        self.temperature = nn.Parameter(torch.ones(heads, 1, 1))\n\n        self.attend = Attend(scale=scale, dropout=dropout, flash=flash)\n\n        self.to_out = nn.Sequential(Rearrange(\"b h d n -> b n (h d)\"), nn.Linear(dim_inner, dim, bias=False))\n\n    def forward(self, x):\n        x = self.norm(x)\n\n        q, k, v = self.to_qkv(x)\n\n        q, k = map(l2norm, (q, k))\n        q = q * self.temperature.exp()\n\n        out = self.attend(q, k, v)\n\n        return self.to_out(out)\n\n\nclass Transformer(Module):\n    def __init__(self, *, dim, depth, dim_head=64, heads=8, attn_dropout=0.0, ff_dropout=0.0, ff_mult=4, norm_output=True, rotary_embed=None, flash_attn=True, linear_attn=False):\n        super().__init__()\n        self.layers = ModuleList([])\n\n        for _ in range(depth):\n            if linear_attn:\n                
attn = LinearAttention(dim=dim, dim_head=dim_head, heads=heads, dropout=attn_dropout, flash=flash_attn)\n            else:\n                attn = Attention(dim=dim, dim_head=dim_head, heads=heads, dropout=attn_dropout, rotary_embed=rotary_embed, flash=flash_attn)\n\n            self.layers.append(ModuleList([attn, FeedForward(dim=dim, mult=ff_mult, dropout=ff_dropout)]))\n\n        self.norm = RMSNorm(dim) if norm_output else nn.Identity()\n\n    def forward(self, x):\n\n        for attn, ff in self.layers:\n            x = attn(x) + x\n            x = ff(x) + x\n\n        return self.norm(x)\n\n\n# bandsplit module\n\n\nclass BandSplit(Module):\n    @beartype\n    def __init__(self, dim, dim_inputs: Tuple[int, ...]):\n        super().__init__()\n        self.dim_inputs = dim_inputs\n        self.to_features = ModuleList([])\n\n        for dim_in in dim_inputs:\n            net = nn.Sequential(RMSNorm(dim_in), nn.Linear(dim_in, dim))\n\n            self.to_features.append(net)\n\n    def forward(self, x):\n        x = x.split(self.dim_inputs, dim=-1)\n\n        outs = []\n        for split_input, to_feature in zip(x, self.to_features):\n            split_output = to_feature(split_input)\n            outs.append(split_output)\n\n        return torch.stack(outs, dim=-2)\n\n\ndef MLP(dim_in, dim_out, dim_hidden=None, depth=1, activation=nn.Tanh):\n    dim_hidden = default(dim_hidden, dim_in)\n\n    net = []\n    dims = (dim_in, *((dim_hidden,) * (depth - 1)), dim_out)\n\n    for ind, (layer_dim_in, layer_dim_out) in enumerate(zip(dims[:-1], dims[1:])):\n        is_last = ind == (len(dims) - 2)\n\n        net.append(nn.Linear(layer_dim_in, layer_dim_out))\n\n        if is_last:\n            continue\n\n        net.append(activation())\n\n    return nn.Sequential(*net)\n\n\nclass MaskEstimator(Module):\n    @beartype\n    def __init__(self, dim, dim_inputs: Tuple[int, ...], depth, mlp_expansion_factor=4):\n        super().__init__()\n        self.dim_inputs = 
dim_inputs\n        self.to_freqs = ModuleList([])\n        dim_hidden = dim * mlp_expansion_factor\n\n        for dim_in in dim_inputs:\n            net = []\n\n            mlp = nn.Sequential(MLP(dim, dim_in * 2, dim_hidden=dim_hidden, depth=depth), nn.GLU(dim=-1))\n\n            self.to_freqs.append(mlp)\n\n    def forward(self, x):\n        x = x.unbind(dim=-2)\n\n        outs = []\n\n        for band_features, mlp in zip(x, self.to_freqs):\n            freq_out = mlp(band_features)\n            outs.append(freq_out)\n\n        return torch.cat(outs, dim=-1)\n\n\n# main class\n\nDEFAULT_FREQS_PER_BANDS = (\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    2,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    4,\n    12,\n    12,\n    12,\n    12,\n    12,\n    12,\n    12,\n    12,\n    24,\n    24,\n    24,\n    24,\n    24,\n    24,\n    24,\n    24,\n    48,\n    48,\n    48,\n    48,\n    48,\n    48,\n    48,\n    48,\n    128,\n    129,\n)\n\n\nclass BSRoformer(Module):\n\n    @beartype\n    def __init__(\n        self,\n        dim,\n        *,\n        depth,\n        stereo=False,\n        num_stems=1,\n        time_transformer_depth=2,\n        freq_transformer_depth=2,\n        linear_transformer_depth=0,\n        freqs_per_bands: Tuple[int, ...] 
= DEFAULT_FREQS_PER_BANDS,\n        # in the paper, they divide into ~60 bands, test with 1 for starters\n        dim_head=64,\n        heads=8,\n        attn_dropout=0.0,\n        ff_dropout=0.0,\n        flash_attn=True,\n        # New parameters for updated implementation\n        mlp_expansion_factor=4,\n        sage_attention=False,\n        zero_dc=True,\n        use_torch_checkpoint=False,\n        skip_connection=False,\n        # Original parameters continue\n        dim_freqs_in=1025,\n        stft_n_fft=2048,\n        stft_hop_length=512,\n        # 10ms at 44100Hz, from sections 4.1, 4.4 in the paper - @faroit recommends // 2 or // 4 for better reconstruction\n        stft_win_length=2048,\n        stft_normalized=False,\n        stft_window_fn: Optional[Callable] = None,\n        mask_estimator_depth=2,\n        multi_stft_resolution_loss_weight=1.0,\n        multi_stft_resolutions_window_sizes: Tuple[int, ...] = (4096, 2048, 1024, 512, 256),\n        multi_stft_hop_size=147,\n        multi_stft_normalized=False,\n        multi_stft_window_fn: Callable = torch.hann_window,\n    ):\n        super().__init__()\n\n        self.stereo = stereo\n        self.audio_channels = 2 if stereo else 1\n        self.num_stems = num_stems\n        \n        # Store new parameters as instance variables\n        self.mlp_expansion_factor = mlp_expansion_factor\n        self.sage_attention = sage_attention\n        self.zero_dc = zero_dc\n        self.use_torch_checkpoint = use_torch_checkpoint\n        self.skip_connection = skip_connection\n\n        self.layers = ModuleList([])\n\n        # Add parameters to transformer kwargs (excluding sage_attention for now)\n        transformer_kwargs = dict(\n            dim=dim, \n            heads=heads, \n            dim_head=dim_head, \n            attn_dropout=attn_dropout, \n            ff_dropout=ff_dropout, \n            flash_attn=flash_attn, \n            norm_output=False\n        )\n        \n        # Print sage 
attention status if enabled (as per research findings)\n        if sage_attention:\n            print(\"Use Sage Attention\")\n\n        time_rotary_embed = RotaryEmbedding(dim=dim_head)\n        freq_rotary_embed = RotaryEmbedding(dim=dim_head)\n\n        for _ in range(depth):\n            tran_modules = []\n            if linear_transformer_depth > 0:\n                tran_modules.append(Transformer(depth=linear_transformer_depth, linear_attn=True, **transformer_kwargs))\n            tran_modules.append(Transformer(depth=time_transformer_depth, rotary_embed=time_rotary_embed, **transformer_kwargs))\n            tran_modules.append(Transformer(depth=freq_transformer_depth, rotary_embed=freq_rotary_embed, **transformer_kwargs))\n            self.layers.append(nn.ModuleList(tran_modules))\n\n        self.final_norm = RMSNorm(dim)\n\n        self.stft_kwargs = dict(n_fft=stft_n_fft, hop_length=stft_hop_length, win_length=stft_win_length, normalized=stft_normalized)\n\n        self.stft_window_fn = partial(default(stft_window_fn, torch.hann_window), stft_win_length)\n\n        freqs = torch.stft(torch.randn(1, 4096), **self.stft_kwargs, return_complex=True).shape[1]\n\n        assert len(freqs_per_bands) > 1\n        assert sum(freqs_per_bands) == freqs, f\"the number of freqs in the bands must equal {freqs} based on the STFT settings, but got {sum(freqs_per_bands)}\"\n\n        freqs_per_bands_with_complex = tuple(2 * f * self.audio_channels for f in freqs_per_bands)\n\n        self.band_split = BandSplit(dim=dim, dim_inputs=freqs_per_bands_with_complex)\n\n        self.mask_estimators = nn.ModuleList([])\n\n        for _ in range(num_stems):\n            mask_estimator = MaskEstimator(\n                dim=dim, \n                dim_inputs=freqs_per_bands_with_complex, \n                depth=mask_estimator_depth,\n                mlp_expansion_factor=mlp_expansion_factor  # Use the new parameter\n            )\n\n            
self.mask_estimators.append(mask_estimator)\n\n        # for the multi-resolution stft loss\n\n        self.multi_stft_resolution_loss_weight = multi_stft_resolution_loss_weight\n        self.multi_stft_resolutions_window_sizes = multi_stft_resolutions_window_sizes\n        self.multi_stft_n_fft = stft_n_fft\n        self.multi_stft_window_fn = multi_stft_window_fn\n\n        self.multi_stft_kwargs = dict(hop_length=multi_stft_hop_size, normalized=multi_stft_normalized)\n\n    def forward(self, raw_audio, target=None, return_loss_breakdown=False):\n        \"\"\"\n        einops\n\n        b - batch\n        f - freq\n        t - time\n        s - audio channel (1 for mono, 2 for stereo)\n        n - number of 'stems'\n        c - complex (2)\n        d - feature dimension\n        \"\"\"\n\n        original_device = raw_audio.device\n        x_is_mps = True if original_device.type == \"mps\" else False\n\n        # if x_is_mps:\n        #     raw_audio = raw_audio.cpu()\n\n        device = raw_audio.device\n\n        if raw_audio.ndim == 2:\n            raw_audio = rearrange(raw_audio, \"b t -> b 1 t\")\n\n        channels = raw_audio.shape[1]\n        assert (not self.stereo and channels == 1) or (\n            self.stereo and channels == 2\n        ), \"stereo needs to be set to True if passing in audio signal that is stereo (channel dimension of 2). 
also need to be False if mono (channel dimension of 1)\"\n\n        # to stft\n\n        raw_audio, batch_audio_channel_packed_shape = pack_one(raw_audio, \"* t\")\n\n        stft_window = self.stft_window_fn().to(device)\n\n        stft_repr = torch.stft(raw_audio, **self.stft_kwargs, window=stft_window, return_complex=True)\n        stft_repr = torch.view_as_real(stft_repr)\n\n        stft_repr = unpack_one(stft_repr, batch_audio_channel_packed_shape, \"* f t c\")\n        stft_repr = rearrange(stft_repr, \"b s f t c -> b (f s) t c\")  # merge stereo / mono into the frequency, with frequency leading dimension, for band splitting\n\n        x = rearrange(stft_repr, \"b f t c -> b t (f c)\")\n\n        x = self.band_split(x)\n\n        # axial / hierarchical attention\n\n        for transformer_block in self.layers:\n\n            if len(transformer_block) == 3:\n                linear_transformer, time_transformer, freq_transformer = transformer_block\n\n                x, ft_ps = pack([x], \"b * d\")\n                x = linear_transformer(x)\n                (x,) = unpack(x, ft_ps, \"b * d\")\n            else:\n                time_transformer, freq_transformer = transformer_block\n\n            x = rearrange(x, \"b t f d -> b f t d\")\n            x, ps = pack([x], \"* t d\")\n\n            x = time_transformer(x)\n\n            (x,) = unpack(x, ps, \"* t d\")\n            x = rearrange(x, \"b f t d -> b t f d\")\n            x, ps = pack([x], \"* f d\")\n\n            x = freq_transformer(x)\n\n            (x,) = unpack(x, ps, \"* f d\")\n\n        x = self.final_norm(x)\n\n        mask = torch.stack([fn(x) for fn in self.mask_estimators], dim=1)\n        mask = rearrange(mask, \"b n t (f c) -> b n f t c\", c=2)\n\n        # if x_is_mps:\n        #     mask = mask.to('cpu')\n\n        # modulate frequency representation\n\n        stft_repr = rearrange(stft_repr, \"b f t c -> b 1 f t c\")\n\n        # complex number multiplication\n\n        stft_repr = 
torch.view_as_complex(stft_repr)\n        mask = torch.view_as_complex(mask)\n\n        stft_repr = stft_repr * mask\n\n        # istft\n\n        stft_repr = rearrange(stft_repr, \"b n (f s) t -> (b n s) f t\", s=self.audio_channels)\n\n        recon_audio = torch.istft(stft_repr.cpu() if x_is_mps else stft_repr, **self.stft_kwargs, window=stft_window.cpu() if x_is_mps else stft_window, return_complex=False).to(device)\n\n        recon_audio = rearrange(recon_audio, \"(b n s) t -> b n s t\", s=self.audio_channels, n=self.num_stems)\n\n        if self.num_stems == 1:\n            recon_audio = rearrange(recon_audio, \"b 1 s t -> b s t\")\n\n        # if a target is passed in, calculate loss for learning\n\n        if not exists(target):\n            return recon_audio\n\n        if self.num_stems > 1:\n            assert target.ndim == 4 and target.shape[1] == self.num_stems\n\n        if target.ndim == 2:\n            target = rearrange(target, \"... t -> ... 1 t\")\n\n        target = target[..., : recon_audio.shape[-1]]\n\n        loss = F.l1_loss(recon_audio, target)\n\n        multi_stft_resolution_loss = 0.0\n\n        for window_size in self.multi_stft_resolutions_window_sizes:\n            res_stft_kwargs = dict(\n                n_fft=max(window_size, self.multi_stft_n_fft), win_length=window_size, return_complex=True, window=self.multi_stft_window_fn(window_size, device=device), **self.multi_stft_kwargs\n            )\n\n            recon_Y = torch.stft(rearrange(recon_audio, \"... s t -> (... s) t\"), **res_stft_kwargs)\n            target_Y = torch.stft(rearrange(target, \"... s t -> (... 
s) t\"), **res_stft_kwargs)\n\n            multi_stft_resolution_loss = multi_stft_resolution_loss + F.l1_loss(recon_Y, target_Y)\n\n        weighted_multi_resolution_loss = multi_stft_resolution_loss * self.multi_stft_resolution_loss_weight\n\n        total_loss = loss + weighted_multi_resolution_loss\n\n        if not return_loss_breakdown:\n            # Move the result back to the original device if it was moved to CPU for MPS compatibility\n            # if x_is_mps:\n            #     total_loss = total_loss.to(original_device)\n            return total_loss\n\n        # For detailed loss breakdown, ensure all components are moved back to the original device for MPS\n        # if x_is_mps:\n        #     loss = loss.to(original_device)\n        #     multi_stft_resolution_loss = multi_stft_resolution_loss.to(original_device)\n        #     weighted_multi_resolution_loss = weighted_multi_resolution_loss.to(original_device)\n\n        return total_loss, (loss, multi_stft_resolution_loss)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/roformer/mel_band_roformer.py",
    "content": "from functools import partial\n\nimport torch\nfrom torch import nn, einsum, Tensor\nfrom torch.nn import Module, ModuleList\nimport torch.nn.functional as F\n\nfrom .attend import Attend\n\nfrom beartype.typing import Tuple, Optional, List, Callable\nfrom beartype import beartype\n\nfrom rotary_embedding_torch import RotaryEmbedding\n\nfrom einops import rearrange, pack, unpack, reduce, repeat\n\nfrom librosa import filters\n\n\ndef exists(val):\n    return val is not None\n\n\ndef default(v, d):\n    return v if exists(v) else d\n\n\ndef pack_one(t, pattern):\n    return pack([t], pattern)\n\n\ndef unpack_one(t, ps, pattern):\n    return unpack(t, ps, pattern)[0]\n\n\ndef pad_at_dim(t, pad, dim=-1, value=0.0):\n    dims_from_right = (-dim - 1) if dim < 0 else (t.ndim - dim - 1)\n    zeros = (0, 0) * dims_from_right\n    return F.pad(t, (*zeros, *pad), value=value)\n\n\nclass RMSNorm(Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.scale = dim**0.5\n        self.gamma = nn.Parameter(torch.ones(dim))\n\n    def forward(self, x):\n        x = x.to(self.gamma.device)\n        return F.normalize(x, dim=-1) * self.scale * self.gamma\n\n\nclass FeedForward(Module):\n    def __init__(self, dim, mult=4, dropout=0.0):\n        super().__init__()\n        dim_inner = int(dim * mult)\n        self.net = nn.Sequential(RMSNorm(dim), nn.Linear(dim, dim_inner), nn.GELU(), nn.Dropout(dropout), nn.Linear(dim_inner, dim), nn.Dropout(dropout))\n\n    def forward(self, x):\n        return self.net(x)\n\n\nclass Attention(Module):\n    def __init__(self, dim, heads=8, dim_head=64, dropout=0.0, rotary_embed=None, flash=True):\n        super().__init__()\n        self.heads = heads\n        self.scale = dim_head**-0.5\n        dim_inner = heads * dim_head\n\n        self.rotary_embed = rotary_embed\n\n        self.attend = Attend(flash=flash, dropout=dropout)\n\n        self.norm = RMSNorm(dim)\n        self.to_qkv = nn.Linear(dim, 
dim_inner * 3, bias=False)\n\n        self.to_gates = nn.Linear(dim, heads)\n\n        self.to_out = nn.Sequential(nn.Linear(dim_inner, dim, bias=False), nn.Dropout(dropout))\n\n    def forward(self, x):\n        x = self.norm(x)\n\n        q, k, v = rearrange(self.to_qkv(x), \"b n (qkv h d) -> qkv b h n d\", qkv=3, h=self.heads)\n\n        if exists(self.rotary_embed):\n            q = self.rotary_embed.rotate_queries_or_keys(q)\n            k = self.rotary_embed.rotate_queries_or_keys(k)\n\n        out = self.attend(q, k, v)\n\n        gates = self.to_gates(x)\n        out = out * rearrange(gates, \"b n h -> b h n 1\").sigmoid()\n\n        out = rearrange(out, \"b h n d -> b n (h d)\")\n        return self.to_out(out)\n\n\nclass Transformer(Module):\n    def __init__(self, *, dim, depth, dim_head=64, heads=8, attn_dropout=0.0, ff_dropout=0.0, ff_mult=4, norm_output=True, rotary_embed=None, flash_attn=True):\n        super().__init__()\n        self.layers = ModuleList([])\n\n        for _ in range(depth):\n            self.layers.append(\n                ModuleList(\n                    [Attention(dim=dim, dim_head=dim_head, heads=heads, dropout=attn_dropout, rotary_embed=rotary_embed, flash=flash_attn), FeedForward(dim=dim, mult=ff_mult, dropout=ff_dropout)]\n                )\n            )\n\n        self.norm = RMSNorm(dim) if norm_output else nn.Identity()\n\n    def forward(self, x):\n\n        for attn, ff in self.layers:\n            x = attn(x) + x\n            x = ff(x) + x\n\n        return self.norm(x)\n\n\nclass BandSplit(Module):\n    @beartype\n    def __init__(self, dim, dim_inputs: Tuple[int, ...]):\n        super().__init__()\n        self.dim_inputs = dim_inputs\n        self.to_features = ModuleList([])\n\n        for dim_in in dim_inputs:\n            net = nn.Sequential(RMSNorm(dim_in), nn.Linear(dim_in, dim))\n\n            self.to_features.append(net)\n\n    def forward(self, x):\n        x = x.split(self.dim_inputs, dim=-1)\n\n        
outs = []\n        for split_input, to_feature in zip(x, self.to_features):\n            split_output = to_feature(split_input)\n            outs.append(split_output)\n\n        return torch.stack(outs, dim=-2)\n\n\ndef MLP(dim_in, dim_out, dim_hidden=None, depth=1, activation=nn.Tanh):\n    dim_hidden = default(dim_hidden, dim_in)\n\n    net = []\n    dims = (dim_in, *((dim_hidden,) * depth), dim_out)\n\n    for ind, (layer_dim_in, layer_dim_out) in enumerate(zip(dims[:-1], dims[1:])):\n        is_last = ind == (len(dims) - 2)\n\n        net.append(nn.Linear(layer_dim_in, layer_dim_out))\n\n        if is_last:\n            continue\n\n        net.append(activation())\n\n    return nn.Sequential(*net)\n\n\nclass MaskEstimator(Module):\n    @beartype\n    def __init__(self, dim, dim_inputs: Tuple[int, ...], depth, mlp_expansion_factor=4):\n        super().__init__()\n        self.dim_inputs = dim_inputs\n        self.to_freqs = ModuleList([])\n        dim_hidden = dim * mlp_expansion_factor\n\n        for dim_in in dim_inputs:\n            net = []\n\n            mlp = nn.Sequential(MLP(dim, dim_in * 2, dim_hidden=dim_hidden, depth=depth), nn.GLU(dim=-1))\n\n            self.to_freqs.append(mlp)\n\n    def forward(self, x):\n        x = x.unbind(dim=-2)\n\n        outs = []\n\n        for band_features, mlp in zip(x, self.to_freqs):\n            freq_out = mlp(band_features)\n            outs.append(freq_out)\n\n        return torch.cat(outs, dim=-1)\n\n\nclass MelBandRoformer(Module):\n\n    @beartype\n    def __init__(\n        self,\n        dim,\n        *,\n        depth,\n        stereo=False,\n        num_stems=1,\n        time_transformer_depth=2,\n        freq_transformer_depth=2,\n        num_bands=60,\n        dim_head=64,\n        heads=8,\n        attn_dropout=0.1,\n        ff_dropout=0.1,\n        flash_attn=True,\n        # New parameters for updated implementation\n        mlp_expansion_factor=4,\n        sage_attention=False,\n        
zero_dc=True,\n        use_torch_checkpoint=False,\n        skip_connection=False,\n        # Original parameters continue\n        dim_freqs_in=1025,\n        sample_rate=44100,\n        stft_n_fft=2048,\n        stft_hop_length=512,\n        stft_win_length=2048,\n        stft_normalized=False,\n        stft_window_fn: Optional[Callable] = None,\n        mask_estimator_depth=1,\n        multi_stft_resolution_loss_weight=1.0,\n        multi_stft_resolutions_window_sizes: Tuple[int, ...] = (4096, 2048, 1024, 512, 256),\n        multi_stft_hop_size=147,\n        multi_stft_normalized=False,\n        multi_stft_window_fn: Callable = torch.hann_window,\n        match_input_audio_length=False,\n    ):\n        super().__init__()\n\n        self.stereo = stereo\n        self.audio_channels = 2 if stereo else 1\n        self.num_stems = num_stems\n        \n        # Store new parameters as instance variables\n        self.mlp_expansion_factor = mlp_expansion_factor\n        self.sage_attention = sage_attention\n        self.zero_dc = zero_dc\n        self.use_torch_checkpoint = use_torch_checkpoint\n        self.skip_connection = skip_connection\n\n        self.layers = ModuleList([])\n\n        # Add parameters to transformer kwargs (excluding sage_attention for now)\n        transformer_kwargs = dict(\n            dim=dim, \n            heads=heads, \n            dim_head=dim_head, \n            attn_dropout=attn_dropout, \n            ff_dropout=ff_dropout, \n            flash_attn=flash_attn\n        )\n        \n        # Print sage attention status if enabled (as per research findings)\n        if sage_attention:\n            print(\"Use Sage Attention\")\n\n        time_rotary_embed = RotaryEmbedding(dim=dim_head)\n        freq_rotary_embed = RotaryEmbedding(dim=dim_head)\n\n        for _ in range(depth):\n            self.layers.append(\n                nn.ModuleList(\n                    [\n                        Transformer(depth=time_transformer_depth, 
rotary_embed=time_rotary_embed, **transformer_kwargs),\n                        Transformer(depth=freq_transformer_depth, rotary_embed=freq_rotary_embed, **transformer_kwargs),\n                    ]\n                )\n            )\n\n        self.stft_window_fn = partial(default(stft_window_fn, torch.hann_window), stft_win_length)\n\n        self.stft_kwargs = dict(n_fft=stft_n_fft, hop_length=stft_hop_length, win_length=stft_win_length, normalized=stft_normalized)\n\n        freqs = torch.stft(torch.randn(1, 4096), **self.stft_kwargs, return_complex=True).shape[1]\n\n        mel_filter_bank_numpy = filters.mel(sr=sample_rate, n_fft=stft_n_fft, n_mels=num_bands)\n\n        mel_filter_bank = torch.from_numpy(mel_filter_bank_numpy)\n\n        mel_filter_bank[0][0] = 1.0\n\n        mel_filter_bank[-1, -1] = 1.0\n\n        freqs_per_band = mel_filter_bank > 0\n        assert freqs_per_band.any(dim=0).all(), \"all frequencies need to be covered by all bands for now\"\n\n        repeated_freq_indices = repeat(torch.arange(freqs), \"f -> b f\", b=num_bands)\n        freq_indices = repeated_freq_indices[freqs_per_band]\n\n        if stereo:\n            freq_indices = repeat(freq_indices, \"f -> f s\", s=2)\n            freq_indices = freq_indices * 2 + torch.arange(2)\n            freq_indices = rearrange(freq_indices, \"f s -> (f s)\")\n\n        self.register_buffer(\"freq_indices\", freq_indices, persistent=False)\n        self.register_buffer(\"freqs_per_band\", freqs_per_band, persistent=False)\n\n        num_freqs_per_band = reduce(freqs_per_band, \"b f -> b\", \"sum\")\n        num_bands_per_freq = reduce(freqs_per_band, \"b f -> f\", \"sum\")\n\n        self.register_buffer(\"num_freqs_per_band\", num_freqs_per_band, persistent=False)\n        self.register_buffer(\"num_bands_per_freq\", num_bands_per_freq, persistent=False)\n\n        freqs_per_bands_with_complex = tuple(2 * f * self.audio_channels for f in num_freqs_per_band.tolist())\n\n        
self.band_split = BandSplit(dim=dim, dim_inputs=freqs_per_bands_with_complex)\n\n        self.mask_estimators = nn.ModuleList([])\n\n        for _ in range(num_stems):\n            mask_estimator = MaskEstimator(dim=dim, dim_inputs=freqs_per_bands_with_complex, depth=mask_estimator_depth)\n\n            self.mask_estimators.append(mask_estimator)\n\n        self.multi_stft_resolution_loss_weight = multi_stft_resolution_loss_weight\n        self.multi_stft_resolutions_window_sizes = multi_stft_resolutions_window_sizes\n        self.multi_stft_n_fft = stft_n_fft\n        self.multi_stft_window_fn = multi_stft_window_fn\n\n        self.multi_stft_kwargs = dict(hop_length=multi_stft_hop_size, normalized=multi_stft_normalized)\n\n        self.match_input_audio_length = match_input_audio_length\n\n    def forward(self, raw_audio, target=None, return_loss_breakdown=False):\n        \"\"\"\n        einops\n\n        b - batch\n        f - freq\n        t - time\n        s - audio channel (1 for mono, 2 for stereo)\n        n - number of 'stems'\n        c - complex (2)\n        d - feature dimension\n        \"\"\"\n\n        original_device = raw_audio.device\n        x_is_mps = True if original_device.type == \"mps\" else False\n\n        if x_is_mps:\n            raw_audio = raw_audio.cpu()\n\n        device = raw_audio.device\n\n        if raw_audio.ndim == 2:\n            raw_audio = rearrange(raw_audio, \"b t -> b 1 t\")\n\n        batch, channels, raw_audio_length = raw_audio.shape\n\n        istft_length = raw_audio_length if self.match_input_audio_length else None\n\n        assert (not self.stereo and channels == 1) or (\n            self.stereo and channels == 2\n        ), \"stereo needs to be set to True if passing in audio signal that is stereo (channel dimension of 2). 
also need to be False if mono (channel dimension of 1)\"\n\n        raw_audio, batch_audio_channel_packed_shape = pack_one(raw_audio, \"* t\")\n\n        stft_window = self.stft_window_fn().to(device)\n\n        stft_repr = torch.stft(raw_audio, **self.stft_kwargs, window=stft_window, return_complex=True)\n        stft_repr = torch.view_as_real(stft_repr)\n\n        stft_repr = unpack_one(stft_repr, batch_audio_channel_packed_shape, \"* f t c\")\n        stft_repr = rearrange(stft_repr, \"b s f t c -> b (f s) t c\")  # merge stereo / mono into the frequency, with frequency leading dimension, for band splitting\n\n        batch_arange = torch.arange(batch, device=device)[..., None]\n\n        x = stft_repr[batch_arange, self.freq_indices.cpu()] if x_is_mps else stft_repr[batch_arange, self.freq_indices]\n\n        x = rearrange(x, \"b f t c -> b t (f c)\")\n\n        x = self.band_split(x)\n\n        for time_transformer, freq_transformer in self.layers:\n            x = rearrange(x, \"b t f d -> b f t d\")\n            x, ps = pack([x], \"* t d\")\n\n            x = time_transformer(x)\n\n            (x,) = unpack(x, ps, \"* t d\")\n            x = rearrange(x, \"b f t d -> b t f d\")\n            x, ps = pack([x], \"* f d\")\n\n            x = freq_transformer(x)\n\n            (x,) = unpack(x, ps, \"* f d\")\n\n        masks = torch.stack([fn(x) for fn in self.mask_estimators], dim=1)\n        masks = rearrange(masks, \"b n t (f c) -> b n f t c\", c=2)\n\n        if x_is_mps:\n            masks = masks.cpu()\n\n        stft_repr = rearrange(stft_repr, \"b f t c -> b 1 f t c\")\n\n        stft_repr = torch.view_as_complex(stft_repr)\n        masks = torch.view_as_complex(masks)\n\n        masks = masks.type(stft_repr.dtype)\n\n        if x_is_mps:\n            scatter_indices = repeat(self.freq_indices.cpu(), \"f -> b n f t\", b=batch, n=self.num_stems, t=stft_repr.shape[-1])\n        else:\n            scatter_indices = repeat(self.freq_indices, \"f -> b n f t\", 
b=batch, n=self.num_stems, t=stft_repr.shape[-1])\n\n        stft_repr_expanded_stems = repeat(stft_repr, \"b 1 ... -> b n ...\", n=self.num_stems)\n        masks_summed = (\n            torch.zeros_like(stft_repr_expanded_stems.cpu() if x_is_mps else stft_repr_expanded_stems)\n            .scatter_add_(2, scatter_indices.cpu() if x_is_mps else scatter_indices, masks.cpu() if x_is_mps else masks)\n            .to(device)\n        )\n\n        denom = repeat(self.num_bands_per_freq, \"f -> (f r) 1\", r=channels)\n\n        if x_is_mps:\n            denom = denom.cpu()\n\n        masks_averaged = masks_summed / denom.clamp(min=1e-8)\n\n        stft_repr = stft_repr * masks_averaged\n\n        stft_repr = rearrange(stft_repr, \"b n (f s) t -> (b n s) f t\", s=self.audio_channels)\n\n        recon_audio = torch.istft(stft_repr.cpu() if x_is_mps else stft_repr, **self.stft_kwargs, window=stft_window.cpu() if x_is_mps else stft_window, return_complex=False, length=istft_length)\n\n        recon_audio = rearrange(recon_audio, \"(b n s) t -> b n s t\", b=batch, s=self.audio_channels, n=self.num_stems)\n\n        if self.num_stems == 1:\n            recon_audio = rearrange(recon_audio, \"b 1 s t -> b s t\")\n\n        if not exists(target):\n            return recon_audio\n\n        if self.num_stems > 1:\n            assert target.ndim == 4 and target.shape[1] == self.num_stems\n\n        if target.ndim == 2:\n            target = rearrange(target, \"... t -> ... 
1 t\")\n\n        target = target[..., : recon_audio.shape[-1]]\n\n        loss = F.l1_loss(recon_audio, target)\n\n        multi_stft_resolution_loss = 0.0\n\n        for window_size in self.multi_stft_resolutions_window_sizes:\n            res_stft_kwargs = dict(\n                n_fft=max(window_size, self.multi_stft_n_fft), win_length=window_size, return_complex=True, window=self.multi_stft_window_fn(window_size, device=device), **self.multi_stft_kwargs\n            )\n\n            recon_Y = torch.stft(rearrange(recon_audio, \"... s t -> (... s) t\"), **res_stft_kwargs)\n            target_Y = torch.stft(rearrange(target, \"... s t -> (... s) t\"), **res_stft_kwargs)\n\n            multi_stft_resolution_loss = multi_stft_resolution_loss + F.l1_loss(recon_Y, target_Y)\n\n        weighted_multi_resolution_loss = multi_stft_resolution_loss * self.multi_stft_resolution_loss_weight\n\n        total_loss = loss + weighted_multi_resolution_loss\n\n        # Move the total loss back to the original device if necessary\n        if x_is_mps:\n            total_loss = total_loss.to(original_device)\n\n        if not return_loss_breakdown:\n            return total_loss\n\n        # If detailed loss breakdown is requested, ensure all components are on the original device\n        return total_loss, (loss.to(original_device) if x_is_mps else loss, multi_stft_resolution_loss.to(original_device) if x_is_mps else multi_stft_resolution_loss)\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/spec_utils.py",
    "content": "import audioread\r\nimport librosa\r\nimport numpy as np\r\nimport soundfile as sf\r\nimport math\r\nimport platform\r\nimport traceback\r\nfrom audio_separator.separator.uvr_lib_v5 import pyrb\r\nfrom scipy.signal import correlate, hilbert\r\nimport io\r\n\r\nOPERATING_SYSTEM = platform.system()\r\nSYSTEM_ARCH = platform.platform()\r\nSYSTEM_PROC = platform.processor()\r\nARM = \"arm\"\r\n\r\nAUTO_PHASE = \"Automatic\"\r\nPOSITIVE_PHASE = \"Positive Phase\"\r\nNEGATIVE_PHASE = \"Negative Phase\"\r\nNONE_P = (\"None\",)\r\nLOW_P = (\"Shifts: Low\",)\r\nMED_P = (\"Shifts: Medium\",)\r\nHIGH_P = (\"Shifts: High\",)\r\nVHIGH_P = \"Shifts: Very High\"\r\nMAXIMUM_P = \"Shifts: Maximum\"\r\n\r\nprogress_value = 0\r\nlast_update_time = 0\r\nis_macos = False\r\n\r\n\r\nif OPERATING_SYSTEM == \"Darwin\":\r\n    wav_resolution = \"polyphase\" if SYSTEM_PROC == ARM or ARM in SYSTEM_ARCH else \"sinc_fastest\"\r\n    wav_resolution_float_resampling = \"kaiser_best\" if SYSTEM_PROC == ARM or ARM in SYSTEM_ARCH else wav_resolution\r\n    is_macos = True\r\nelse:\r\n    wav_resolution = \"sinc_fastest\"\r\n    wav_resolution_float_resampling = wav_resolution\r\n\r\nMAX_SPEC = \"Max Spec\"\r\nMIN_SPEC = \"Min Spec\"\r\nLIN_ENSE = \"Linear Ensemble\"\r\n\r\nMAX_WAV = MAX_SPEC\r\nMIN_WAV = MIN_SPEC\r\n\r\nAVERAGE = \"Average\"\r\n\r\n\r\ndef crop_center(h1, h2):\r\n    \"\"\"\r\n    This function crops the center of the first input tensor to match the size of the second input tensor.\r\n    It is used to ensure that the two tensors have the same size in the time dimension.\r\n    \"\"\"\r\n    h1_shape = h1.size()\r\n    h2_shape = h2.size()\r\n\r\n    # If the time dimensions are already equal, return the first tensor as is\r\n    if h1_shape[3] == h2_shape[3]:\r\n        return h1\r\n    # If the time dimension of the first tensor is smaller, raise an error\r\n    elif h1_shape[3] < h2_shape[3]:\r\n        raise ValueError(\"h1_shape[3] must be greater than 
h2_shape[3]\")\r\n\r\n    # Calculate the start and end indices for cropping\r\n    s_time = (h1_shape[3] - h2_shape[3]) // 2\r\n    e_time = s_time + h2_shape[3]\r\n    # Crop the first tensor\r\n    h1 = h1[:, :, :, s_time:e_time]\r\n\r\n    return h1\r\n\r\n\r\ndef preprocess(X_spec):\r\n    \"\"\"\r\n    This function preprocesses a spectrogram by separating it into magnitude and phase components.\r\n    This is a common preprocessing step in audio processing tasks.\r\n    \"\"\"\r\n    X_mag = np.abs(X_spec)\r\n    X_phase = np.angle(X_spec)\r\n\r\n    return X_mag, X_phase\r\n\r\n\r\ndef make_padding(width, cropsize, offset):\r\n    \"\"\"\r\n    This function calculates the padding needed to make the width of an image divisible by the crop size.\r\n    It is used in the process of splitting an image into smaller patches.\r\n    \"\"\"\r\n    left = offset\r\n    roi_size = cropsize - offset * 2\r\n    if roi_size == 0:\r\n        roi_size = cropsize\r\n    right = roi_size - (width % roi_size) + left\r\n\r\n    return left, right, roi_size\r\n\r\n\r\ndef normalize(wave, max_peak=1.0, min_peak=None):\r\n    \"\"\"Normalize (or amplify) audio waveform to a specified peak value.\r\n\r\n    Args:\r\n        wave (array-like): Audio waveform.\r\n        max_peak (float): Maximum peak value for normalization.\r\n\r\n    Returns:\r\n        array-like: Normalized or original waveform.\r\n    \"\"\"\r\n    maxv = np.abs(wave).max()\r\n    if maxv > max_peak:\r\n        wave *= max_peak / maxv\r\n    elif min_peak is not None and maxv < min_peak:\r\n        wave *= min_peak / maxv\r\n\r\n    return wave\r\n\r\n\r\ndef auto_transpose(audio_array: np.ndarray):\r\n    \"\"\"\r\n    Ensure that the audio array is in the (channels, samples) format.\r\n\r\n    Parameters:\r\n        audio_array (ndarray): Input audio array.\r\n\r\n    Returns:\r\n        ndarray: Transposed audio array if necessary.\r\n    \"\"\"\r\n\r\n    # If the second dimension is 2 (indicating stereo 
channels), transpose the array\r\n    if audio_array.shape[1] == 2:\r\n        return audio_array.T\r\n    return audio_array\r\n\r\n\r\ndef write_array_to_mem(audio_data, subtype):\r\n    if isinstance(audio_data, np.ndarray):\r\n        audio_buffer = io.BytesIO()\r\n        sf.write(audio_buffer, audio_data, 44100, subtype=subtype, format=\"WAV\")\r\n        audio_buffer.seek(0)\r\n        return audio_buffer\r\n    else:\r\n        return audio_data\r\n\r\n\r\ndef spectrogram_to_image(spec, mode=\"magnitude\"):\r\n    if mode == \"magnitude\":\r\n        if np.iscomplexobj(spec):\r\n            y = np.abs(spec)\r\n        else:\r\n            y = spec\r\n        y = np.log10(y**2 + 1e-8)\r\n    elif mode == \"phase\":\r\n        if np.iscomplexobj(spec):\r\n            y = np.angle(spec)\r\n        else:\r\n            y = spec\r\n\r\n    y -= y.min()\r\n    y *= 255 / y.max()\r\n    img = np.uint8(y)\r\n\r\n    if y.ndim == 3:\r\n        img = img.transpose(1, 2, 0)\r\n        img = np.concatenate([np.max(img, axis=2, keepdims=True), img], axis=2)\r\n\r\n    return img\r\n\r\n\r\ndef reduce_vocal_aggressively(X, y, softmask):\r\n    v = X - y\r\n    y_mag_tmp = np.abs(y)\r\n    v_mag_tmp = np.abs(v)\r\n\r\n    v_mask = v_mag_tmp > y_mag_tmp\r\n    y_mag = np.clip(y_mag_tmp - v_mag_tmp * v_mask * softmask, 0, np.inf)\r\n\r\n    return y_mag * np.exp(1.0j * np.angle(y))\r\n\r\n\r\ndef merge_artifacts(y_mask, thres=0.01, min_range=64, fade_size=32):\r\n    mask = y_mask\r\n\r\n    try:\r\n        if min_range < fade_size * 2:\r\n            raise ValueError(\"min_range must be >= fade_size * 2\")\r\n\r\n        idx = np.where(y_mask.min(axis=(0, 1)) > thres)[0]\r\n        start_idx = np.insert(idx[np.where(np.diff(idx) != 1)[0] + 1], 0, idx[0])\r\n        end_idx = np.append(idx[np.where(np.diff(idx) != 1)[0]], idx[-1])\r\n        artifact_idx = np.where(end_idx - start_idx > min_range)[0]\r\n        weight = np.zeros_like(y_mask)\r\n        if len(artifact_idx) 
> 0:\r\n            start_idx = start_idx[artifact_idx]\r\n            end_idx = end_idx[artifact_idx]\r\n            old_e = None\r\n            for s, e in zip(start_idx, end_idx):\r\n                if old_e is not None and s - old_e < fade_size:\r\n                    s = old_e - fade_size * 2\r\n\r\n                if s != 0:\r\n                    weight[:, :, s : s + fade_size] = np.linspace(0, 1, fade_size)\r\n                else:\r\n                    s -= fade_size\r\n\r\n                if e != y_mask.shape[2]:\r\n                    weight[:, :, e - fade_size : e] = np.linspace(1, 0, fade_size)\r\n                else:\r\n                    e += fade_size\r\n\r\n                weight[:, :, s + fade_size : e - fade_size] = 1\r\n                old_e = e\r\n\r\n        v_mask = 1 - y_mask\r\n        y_mask += weight * v_mask\r\n\r\n        mask = y_mask\r\n    except Exception as e:\r\n        error_name = f\"{type(e).__name__}\"\r\n        traceback_text = \"\".join(traceback.format_tb(e.__traceback__))\r\n        message = f'{error_name}: \"{e}\"\\n{traceback_text}\"'\r\n        print(\"Post Process Failed: \", message)\r\n\r\n    return mask\r\n\r\n\r\ndef align_wave_head_and_tail(a, b):\r\n    l = min([a[0].size, b[0].size])\r\n\r\n    return a[:l, :l], b[:l, :l]\r\n\r\n\r\ndef convert_channels(spec, mp, band):\r\n    cc = mp.param[\"band\"][band].get(\"convert_channels\")\r\n\r\n    if \"mid_side_c\" == cc:\r\n        spec_left = np.add(spec[0], spec[1] * 0.25)\r\n        spec_right = np.subtract(spec[1], spec[0] * 0.25)\r\n    elif \"mid_side\" == cc:\r\n        spec_left = np.add(spec[0], spec[1]) / 2\r\n        spec_right = np.subtract(spec[0], spec[1])\r\n    elif \"stereo_n\" == cc:\r\n        spec_left = np.add(spec[0], spec[1] * 0.25) / 0.9375\r\n        spec_right = np.add(spec[1], spec[0] * 0.25) / 0.9375\r\n    else:\r\n        return spec\r\n\r\n    return np.asfortranarray([spec_left, spec_right])\r\n\r\n\r\ndef 
combine_spectrograms(specs, mp, is_v51_model=False):\r\n    l = min([specs[i].shape[2] for i in specs])\r\n    spec_c = np.zeros(shape=(2, mp.param[\"bins\"] + 1, l), dtype=np.complex64)\r\n    offset = 0\r\n    bands_n = len(mp.param[\"band\"])\r\n\r\n    for d in range(1, bands_n + 1):\r\n        h = mp.param[\"band\"][d][\"crop_stop\"] - mp.param[\"band\"][d][\"crop_start\"]\r\n        spec_c[:, offset : offset + h, :l] = specs[d][:, mp.param[\"band\"][d][\"crop_start\"] : mp.param[\"band\"][d][\"crop_stop\"], :l]\r\n        offset += h\r\n\r\n    if offset > mp.param[\"bins\"]:\r\n        raise ValueError(\"Too much bins\")\r\n\r\n    # lowpass fiter\r\n\r\n    if mp.param[\"pre_filter_start\"] > 0:\r\n        if is_v51_model:\r\n            spec_c *= get_lp_filter_mask(spec_c.shape[1], mp.param[\"pre_filter_start\"], mp.param[\"pre_filter_stop\"])\r\n        else:\r\n            if bands_n == 1:\r\n                spec_c = fft_lp_filter(spec_c, mp.param[\"pre_filter_start\"], mp.param[\"pre_filter_stop\"])\r\n            else:\r\n                gp = 1\r\n                for b in range(mp.param[\"pre_filter_start\"] + 1, mp.param[\"pre_filter_stop\"]):\r\n                    g = math.pow(10, -(b - mp.param[\"pre_filter_start\"]) * (3.5 - gp) / 20.0)\r\n                    gp = g\r\n                    spec_c[:, b, :] *= g\r\n\r\n    return np.asfortranarray(spec_c)\r\n\r\n\r\ndef wave_to_spectrogram(wave, hop_length, n_fft, mp, band, is_v51_model=False):\r\n\r\n    if wave.ndim == 1:\r\n        wave = np.asfortranarray([wave, wave])\r\n\r\n    if not is_v51_model:\r\n        if mp.param[\"reverse\"]:\r\n            wave_left = np.flip(np.asfortranarray(wave[0]))\r\n            wave_right = np.flip(np.asfortranarray(wave[1]))\r\n        elif mp.param[\"mid_side\"]:\r\n            wave_left = np.asfortranarray(np.add(wave[0], wave[1]) / 2)\r\n            wave_right = np.asfortranarray(np.subtract(wave[0], wave[1]))\r\n        elif mp.param[\"mid_side_b2\"]:\r\n  
          wave_left = np.asfortranarray(np.add(wave[1], wave[0] * 0.5))\r\n            wave_right = np.asfortranarray(np.subtract(wave[0], wave[1] * 0.5))\r\n        else:\r\n            wave_left = np.asfortranarray(wave[0])\r\n            wave_right = np.asfortranarray(wave[1])\r\n    else:\r\n        wave_left = np.asfortranarray(wave[0])\r\n        wave_right = np.asfortranarray(wave[1])\r\n\r\n    spec_left = librosa.stft(wave_left, n_fft=n_fft, hop_length=hop_length)\r\n    spec_right = librosa.stft(wave_right, n_fft=n_fft, hop_length=hop_length)\r\n\r\n    spec = np.asfortranarray([spec_left, spec_right])\r\n\r\n    if is_v51_model:\r\n        spec = convert_channels(spec, mp, band)\r\n\r\n    return spec\r\n\r\n\r\ndef spectrogram_to_wave(spec, hop_length=1024, mp={}, band=0, is_v51_model=True):\r\n    spec_left = np.asfortranarray(spec[0])\r\n    spec_right = np.asfortranarray(spec[1])\r\n\r\n    wave_left = librosa.istft(spec_left, hop_length=hop_length)\r\n    wave_right = librosa.istft(spec_right, hop_length=hop_length)\r\n\r\n    if is_v51_model:\r\n        cc = mp.param[\"band\"][band].get(\"convert_channels\")\r\n        if \"mid_side_c\" == cc:\r\n            return np.asfortranarray([np.subtract(wave_left / 1.0625, wave_right / 4.25), np.add(wave_right / 1.0625, wave_left / 4.25)])\r\n        elif \"mid_side\" == cc:\r\n            return np.asfortranarray([np.add(wave_left, wave_right / 2), np.subtract(wave_left, wave_right / 2)])\r\n        elif \"stereo_n\" == cc:\r\n            return np.asfortranarray([np.subtract(wave_left, wave_right * 0.25), np.subtract(wave_right, wave_left * 0.25)])\r\n    else:\r\n        if mp.param[\"reverse\"]:\r\n            return np.asfortranarray([np.flip(wave_left), np.flip(wave_right)])\r\n        elif mp.param[\"mid_side\"]:\r\n            return np.asfortranarray([np.add(wave_left, wave_right / 2), np.subtract(wave_left, wave_right / 2)])\r\n        elif mp.param[\"mid_side_b2\"]:\r\n            return 
np.asfortranarray([np.add(wave_right / 1.25, 0.4 * wave_left), np.subtract(wave_left / 1.25, 0.4 * wave_right)])\r\n\r\n    return np.asfortranarray([wave_left, wave_right])\r\n\r\n\r\ndef cmb_spectrogram_to_wave(spec_m, mp, extra_bins_h=None, extra_bins=None, is_v51_model=False):\r\n    bands_n = len(mp.param[\"band\"])\r\n    offset = 0\r\n\r\n    for d in range(1, bands_n + 1):\r\n        bp = mp.param[\"band\"][d]\r\n        spec_s = np.zeros(shape=(2, bp[\"n_fft\"] // 2 + 1, spec_m.shape[2]), dtype=complex)\r\n        h = bp[\"crop_stop\"] - bp[\"crop_start\"]\r\n        spec_s[:, bp[\"crop_start\"] : bp[\"crop_stop\"], :] = spec_m[:, offset : offset + h, :]\r\n\r\n        offset += h\r\n        if d == bands_n:  # higher\r\n            if extra_bins_h:  # if --high_end_process bypass\r\n                max_bin = bp[\"n_fft\"] // 2\r\n                spec_s[:, max_bin - extra_bins_h : max_bin, :] = extra_bins[:, :extra_bins_h, :]\r\n            if bp[\"hpf_start\"] > 0:\r\n                if is_v51_model:\r\n                    spec_s *= get_hp_filter_mask(spec_s.shape[1], bp[\"hpf_start\"], bp[\"hpf_stop\"] - 1)\r\n                else:\r\n                    spec_s = fft_hp_filter(spec_s, bp[\"hpf_start\"], bp[\"hpf_stop\"] - 1)\r\n            if bands_n == 1:\r\n                wave = spectrogram_to_wave(spec_s, bp[\"hl\"], mp, d, is_v51_model)\r\n            else:\r\n                wave = np.add(wave, spectrogram_to_wave(spec_s, bp[\"hl\"], mp, d, is_v51_model))\r\n        else:\r\n            sr = mp.param[\"band\"][d + 1][\"sr\"]\r\n            if d == 1:  # lower\r\n                if is_v51_model:\r\n                    spec_s *= get_lp_filter_mask(spec_s.shape[1], bp[\"lpf_start\"], bp[\"lpf_stop\"])\r\n                else:\r\n                    spec_s = fft_lp_filter(spec_s, bp[\"lpf_start\"], bp[\"lpf_stop\"])\r\n\r\n                try:\r\n                    wave = librosa.resample(spectrogram_to_wave(spec_s, bp[\"hl\"], mp, d, is_v51_model), 
orig_sr=bp[\"sr\"], target_sr=sr, res_type=wav_resolution)\r\n                except ValueError as e:\r\n                    print(f\"Error during resampling: {e}\")\r\n                    print(f\"Spec_s shape: {spec_s.shape}, SR: {sr}, Res type: {wav_resolution}\")\r\n\r\n            else:  # mid\r\n                if is_v51_model:\r\n                    spec_s *= get_hp_filter_mask(spec_s.shape[1], bp[\"hpf_start\"], bp[\"hpf_stop\"] - 1)\r\n                    spec_s *= get_lp_filter_mask(spec_s.shape[1], bp[\"lpf_start\"], bp[\"lpf_stop\"])\r\n                else:\r\n                    spec_s = fft_hp_filter(spec_s, bp[\"hpf_start\"], bp[\"hpf_stop\"] - 1)\r\n                    spec_s = fft_lp_filter(spec_s, bp[\"lpf_start\"], bp[\"lpf_stop\"])\r\n\r\n                wave2 = np.add(wave, spectrogram_to_wave(spec_s, bp[\"hl\"], mp, d, is_v51_model))\r\n\r\n                try:\r\n                    wave = librosa.resample(wave2, orig_sr=bp[\"sr\"], target_sr=sr, res_type=wav_resolution)\r\n                except ValueError as e:\r\n                    print(f\"Error during resampling: {e}\")\r\n                    print(f\"Spec_s shape: {spec_s.shape}, SR: {sr}, Res type: {wav_resolution}\")\r\n\r\n    return wave\r\n\r\n\r\ndef get_lp_filter_mask(n_bins, bin_start, bin_stop):\r\n    mask = np.concatenate([np.ones((bin_start - 1, 1)), np.linspace(1, 0, bin_stop - bin_start + 1)[:, None], np.zeros((n_bins - bin_stop, 1))], axis=0)\r\n\r\n    return mask\r\n\r\n\r\ndef get_hp_filter_mask(n_bins, bin_start, bin_stop):\r\n    mask = np.concatenate([np.zeros((bin_stop + 1, 1)), np.linspace(0, 1, 1 + bin_start - bin_stop)[:, None], np.ones((n_bins - bin_start - 2, 1))], axis=0)\r\n\r\n    return mask\r\n\r\n\r\ndef fft_lp_filter(spec, bin_start, bin_stop):\r\n    g = 1.0\r\n    for b in range(bin_start, bin_stop):\r\n        g -= 1 / (bin_stop - bin_start)\r\n        spec[:, b, :] = g * spec[:, b, :]\r\n\r\n    spec[:, bin_stop:, :] *= 0\r\n\r\n    return 
spec\r\n\r\n\r\ndef fft_hp_filter(spec, bin_start, bin_stop):\r\n    g = 1.0\r\n    for b in range(bin_start, bin_stop, -1):\r\n        g -= 1 / (bin_start - bin_stop)\r\n        spec[:, b, :] = g * spec[:, b, :]\r\n\r\n    spec[:, 0 : bin_stop + 1, :] *= 0\r\n\r\n    return spec\r\n\r\n\r\ndef spectrogram_to_wave_old(spec, hop_length=1024):\r\n    if spec.ndim == 2:\r\n        wave = librosa.istft(spec, hop_length=hop_length)\r\n    elif spec.ndim == 3:\r\n        spec_left = np.asfortranarray(spec[0])\r\n        spec_right = np.asfortranarray(spec[1])\r\n\r\n        wave_left = librosa.istft(spec_left, hop_length=hop_length)\r\n        wave_right = librosa.istft(spec_right, hop_length=hop_length)\r\n        wave = np.asfortranarray([wave_left, wave_right])\r\n\r\n    return wave\r\n\r\n\r\ndef wave_to_spectrogram_old(wave, hop_length, n_fft):\r\n    wave_left = np.asfortranarray(wave[0])\r\n    wave_right = np.asfortranarray(wave[1])\r\n\r\n    spec_left = librosa.stft(wave_left, n_fft=n_fft, hop_length=hop_length)\r\n    spec_right = librosa.stft(wave_right, n_fft=n_fft, hop_length=hop_length)\r\n\r\n    spec = np.asfortranarray([spec_left, spec_right])\r\n\r\n    return spec\r\n\r\n\r\ndef mirroring(a, spec_m, input_high_end, mp):\r\n    if \"mirroring\" == a:\r\n        mirror = np.flip(np.abs(spec_m[:, mp.param[\"pre_filter_start\"] - 10 - input_high_end.shape[1] : mp.param[\"pre_filter_start\"] - 10, :]), 1)\r\n        mirror = mirror * np.exp(1.0j * np.angle(input_high_end))\r\n\r\n        return np.where(np.abs(input_high_end) <= np.abs(mirror), input_high_end, mirror)\r\n\r\n    if \"mirroring2\" == a:\r\n        mirror = np.flip(np.abs(spec_m[:, mp.param[\"pre_filter_start\"] - 10 - input_high_end.shape[1] : mp.param[\"pre_filter_start\"] - 10, :]), 1)\r\n        mi = np.multiply(mirror, input_high_end * 1.7)\r\n\r\n        return np.where(np.abs(input_high_end) <= np.abs(mi), input_high_end, mi)\r\n\r\n\r\ndef adjust_aggr(mask, is_non_accom_stem, 
aggressiveness):\r\n    aggr = aggressiveness[\"value\"] * 2\r\n\r\n    if aggr != 0:\r\n        if is_non_accom_stem:\r\n            aggr = 1 - aggr\r\n\r\n        if np.any(aggr > 10) or np.any(aggr < -10):\r\n            print(f\"Warning: Extreme aggressiveness values detected: {aggr}\")\r\n\r\n        aggr = [aggr, aggr]\r\n\r\n        if aggressiveness[\"aggr_correction\"] is not None:\r\n            aggr[0] += aggressiveness[\"aggr_correction\"][\"left\"]\r\n            aggr[1] += aggressiveness[\"aggr_correction\"][\"right\"]\r\n\r\n        for ch in range(2):\r\n            mask[ch, : aggressiveness[\"split_bin\"]] = np.power(mask[ch, : aggressiveness[\"split_bin\"]], 1 + aggr[ch] / 3)\r\n            mask[ch, aggressiveness[\"split_bin\"] :] = np.power(mask[ch, aggressiveness[\"split_bin\"] :], 1 + aggr[ch])\r\n\r\n    return mask\r\n\r\n\r\ndef stft(wave, nfft, hl):\r\n    wave_left = np.asfortranarray(wave[0])\r\n    wave_right = np.asfortranarray(wave[1])\r\n    spec_left = librosa.stft(wave_left, n_fft=nfft, hop_length=hl)\r\n    spec_right = librosa.stft(wave_right, n_fft=nfft, hop_length=hl)\r\n    spec = np.asfortranarray([spec_left, spec_right])\r\n\r\n    return spec\r\n\r\n\r\ndef istft(spec, hl):\r\n    spec_left = np.asfortranarray(spec[0])\r\n    spec_right = np.asfortranarray(spec[1])\r\n    wave_left = librosa.istft(spec_left, hop_length=hl)\r\n    wave_right = librosa.istft(spec_right, hop_length=hl)\r\n    wave = np.asfortranarray([wave_left, wave_right])\r\n\r\n    return wave\r\n\r\n\r\ndef spec_effects(wave, algorithm=\"Default\", value=None):\r\n    if np.isnan(wave).any() or np.isinf(wave).any():\r\n        print(f\"Warning: Detected NaN or infinite values in wave input. 
Shape: {wave.shape}\")\r\n\r\n    spec = [stft(wave[0], 2048, 1024), stft(wave[1], 2048, 1024)]\r\n    if algorithm == \"Min_Mag\":\r\n        v_spec_m = np.where(np.abs(spec[1]) <= np.abs(spec[0]), spec[1], spec[0])\r\n        wave = istft(v_spec_m, 1024)\r\n    elif algorithm == \"Max_Mag\":\r\n        v_spec_m = np.where(np.abs(spec[1]) >= np.abs(spec[0]), spec[1], spec[0])\r\n        wave = istft(v_spec_m, 1024)\r\n    elif algorithm == \"Default\":\r\n        wave = (wave[1] * value) + (wave[0] * (1 - value))\r\n    elif algorithm == \"Invert_p\":\r\n        X_mag = np.abs(spec[0])\r\n        y_mag = np.abs(spec[1])\r\n        max_mag = np.where(X_mag >= y_mag, X_mag, y_mag)\r\n        v_spec = spec[1] - max_mag * np.exp(1.0j * np.angle(spec[0]))\r\n        wave = istft(v_spec, 1024)\r\n\r\n    return wave\r\n\r\n\r\ndef spectrogram_to_wave_no_mp(spec, n_fft=2048, hop_length=1024):\r\n    wave = librosa.istft(spec, n_fft=n_fft, hop_length=hop_length)\r\n\r\n    if wave.ndim == 1:\r\n        wave = np.asfortranarray([wave, wave])\r\n\r\n    return wave\r\n\r\n\r\ndef wave_to_spectrogram_no_mp(wave):\r\n\r\n    spec = librosa.stft(wave, n_fft=2048, hop_length=1024)\r\n\r\n    if spec.ndim == 1:\r\n        spec = np.asfortranarray([spec, spec])\r\n\r\n    return spec\r\n\r\n\r\ndef invert_audio(specs, invert_p=True):\r\n\r\n    ln = min([specs[0].shape[2], specs[1].shape[2]])\r\n    specs[0] = specs[0][:, :, :ln]\r\n    specs[1] = specs[1][:, :, :ln]\r\n\r\n    if invert_p:\r\n        X_mag = np.abs(specs[0])\r\n        y_mag = np.abs(specs[1])\r\n        max_mag = np.where(X_mag >= y_mag, X_mag, y_mag)\r\n        v_spec = specs[1] - max_mag * np.exp(1.0j * np.angle(specs[0]))\r\n    else:\r\n        specs[1] = reduce_vocal_aggressively(specs[0], specs[1], 0.2)\r\n        v_spec = specs[0] - specs[1]\r\n\r\n    return v_spec\r\n\r\n\r\ndef invert_stem(mixture, stem):\r\n    mixture = wave_to_spectrogram_no_mp(mixture)\r\n    stem = 
wave_to_spectrogram_no_mp(stem)\r\n    output = spectrogram_to_wave_no_mp(invert_audio([mixture, stem]))\r\n\r\n    return -output.T\r\n\r\n\r\ndef ensembling(a, inputs, is_wavs=False):\r\n\r\n    for i in range(1, len(inputs)):\r\n        if i == 1:\r\n            input = inputs[0]\r\n\r\n        if is_wavs:\r\n            ln = min([input.shape[1], inputs[i].shape[1]])\r\n            input = input[:, :ln]\r\n            inputs[i] = inputs[i][:, :ln]\r\n        else:\r\n            ln = min([input.shape[2], inputs[i].shape[2]])\r\n            input = input[:, :, :ln]\r\n            inputs[i] = inputs[i][:, :, :ln]\r\n\r\n        if MIN_SPEC == a:\r\n            input = np.where(np.abs(inputs[i]) <= np.abs(input), inputs[i], input)\r\n        if MAX_SPEC == a:\r\n            #input = np.array(np.where(np.greater_equal(np.abs(inputs[i]), np.abs(input)), inputs[i], input), dtype=object)\r\n            input = np.where(np.abs(inputs[i]) >= np.abs(input), inputs[i], input)\r\n            #max_spec = np.array([np.where(np.greater_equal(np.abs(inputs[i]), np.abs(input)), s, specs[0]) for s in specs[1:]], dtype=object)[-1]\r\n\r\n    # linear_ensemble\r\n    # input = ensemble_wav(inputs, split_size=1)\r\n\r\n    return input\r\n\r\n\r\ndef ensemble_for_align(waves):\r\n\r\n    specs = []\r\n\r\n    for wav in waves:\r\n        spec = wave_to_spectrogram_no_mp(wav.T)\r\n        specs.append(spec)\r\n\r\n    wav_aligned = spectrogram_to_wave_no_mp(ensembling(MIN_SPEC, specs)).T\r\n    wav_aligned = match_array_shapes(wav_aligned, waves[1], is_swap=True)\r\n\r\n    return wav_aligned\r\n\r\n\r\ndef ensemble_inputs(audio_input, algorithm, is_normalization, wav_type_set, save_path, is_wave=False, is_array=False):\r\n\r\n    wavs_ = []\r\n\r\n    if algorithm == AVERAGE:\r\n        output = average_audio(audio_input)\r\n        samplerate = 44100\r\n    else:\r\n        specs = []\r\n\r\n        for i in range(len(audio_input)):\r\n            wave, samplerate = 
librosa.load(audio_input[i], mono=False, sr=44100)\r\n            wavs_.append(wave)\r\n            spec = wave if is_wave else wave_to_spectrogram_no_mp(wave)\r\n            specs.append(spec)\r\n\r\n        wave_shapes = [w.shape[1] for w in wavs_]\r\n        target_shape = wavs_[wave_shapes.index(max(wave_shapes))]\r\n\r\n        if is_wave:\r\n            output = ensembling(algorithm, specs, is_wavs=True)\r\n        else:\r\n            output = spectrogram_to_wave_no_mp(ensembling(algorithm, specs))\r\n\r\n        output = to_shape(output, target_shape.shape)\r\n\r\n    sf.write(save_path, normalize(output.T, is_normalization), samplerate, subtype=wav_type_set)\r\n\r\n\r\ndef to_shape(x, target_shape):\r\n    padding_list = []\r\n    for x_dim, target_dim in zip(x.shape, target_shape):\r\n        pad_value = target_dim - x_dim\r\n        pad_tuple = (0, pad_value)\r\n        padding_list.append(pad_tuple)\r\n\r\n    return np.pad(x, tuple(padding_list), mode=\"constant\")\r\n\r\n\r\ndef to_shape_minimize(x: np.ndarray, target_shape):\r\n\r\n    padding_list = []\r\n    for x_dim, target_dim in zip(x.shape, target_shape):\r\n        pad_value = target_dim - x_dim\r\n        pad_tuple = (0, pad_value)\r\n        padding_list.append(pad_tuple)\r\n\r\n    return np.pad(x, tuple(padding_list), mode=\"constant\")\r\n\r\n\r\ndef detect_leading_silence(audio, sr, silence_threshold=0.007, frame_length=1024):\r\n    \"\"\"\r\n    Detect silence at the beginning of an audio signal.\r\n\r\n    :param audio: np.array, audio signal\r\n    :param sr: int, sample rate\r\n    :param silence_threshold: float, magnitude threshold below which is considered silence\r\n    :param frame_length: int, the number of samples to consider for each check\r\n\r\n    :return: float, duration of the leading silence in milliseconds\r\n    \"\"\"\r\n\r\n    if len(audio.shape) == 2:\r\n        # If stereo, pick the channel with more energy to determine the silence\r\n        channel = 
np.argmax(np.sum(np.abs(audio), axis=1))\r\n        audio = audio[channel]\r\n\r\n    for i in range(0, len(audio), frame_length):\r\n        if np.max(np.abs(audio[i : i + frame_length])) > silence_threshold:\r\n            return (i / sr) * 1000\r\n\r\n    return (len(audio) / sr) * 1000\r\n\r\n\r\ndef adjust_leading_silence(target_audio, reference_audio, silence_threshold=0.01, frame_length=1024):\r\n    \"\"\"\r\n    Adjust the leading silence of the target_audio to match the leading silence of the reference_audio.\r\n\r\n    :param target_audio: np.array, audio signal that will have its silence adjusted\r\n    :param reference_audio: np.array, audio signal used as a reference\r\n    :param sr: int, sample rate\r\n    :param silence_threshold: float, magnitude threshold below which is considered silence\r\n    :param frame_length: int, the number of samples to consider for each check\r\n\r\n    :return: np.array, target_audio adjusted to have the same leading silence as reference_audio\r\n    \"\"\"\r\n\r\n    def find_silence_end(audio):\r\n        if len(audio.shape) == 2:\r\n            # If stereo, pick the channel with more energy to determine the silence\r\n            channel = np.argmax(np.sum(np.abs(audio), axis=1))\r\n            audio_mono = audio[channel]\r\n        else:\r\n            audio_mono = audio\r\n\r\n        for i in range(0, len(audio_mono), frame_length):\r\n            if np.max(np.abs(audio_mono[i : i + frame_length])) > silence_threshold:\r\n                return i\r\n        return len(audio_mono)\r\n\r\n    ref_silence_end = find_silence_end(reference_audio)\r\n    target_silence_end = find_silence_end(target_audio)\r\n    silence_difference = ref_silence_end - target_silence_end\r\n\r\n    try:\r\n        ref_silence_end_p = (ref_silence_end / 44100) * 1000\r\n        target_silence_end_p = (target_silence_end / 44100) * 1000\r\n        silence_difference_p = ref_silence_end_p - target_silence_end_p\r\n        
print(\"silence_difference: \", silence_difference_p)\r\n    except Exception as e:\r\n        pass\r\n\r\n    if silence_difference > 0:  # Add silence to target_audio\r\n        if len(target_audio.shape) == 2:  # stereo\r\n            silence_to_add = np.zeros((target_audio.shape[0], silence_difference))\r\n        else:  # mono\r\n            silence_to_add = np.zeros(silence_difference)\r\n        return np.hstack((silence_to_add, target_audio))\r\n    elif silence_difference < 0:  # Remove silence from target_audio\r\n        if len(target_audio.shape) == 2:  # stereo\r\n            return target_audio[:, -silence_difference:]\r\n        else:  # mono\r\n            return target_audio[-silence_difference:]\r\n    else:  # No adjustment needed\r\n        return target_audio\r\n\r\n\r\ndef match_array_shapes(array_1: np.ndarray, array_2: np.ndarray, is_swap=False):\r\n\r\n    if is_swap:\r\n        array_1, array_2 = array_1.T, array_2.T\r\n\r\n    # print(\"before\", array_1.shape, array_2.shape)\r\n    if array_1.shape[1] > array_2.shape[1]:\r\n        array_1 = array_1[:, : array_2.shape[1]]\r\n    elif array_1.shape[1] < array_2.shape[1]:\r\n        padding = array_2.shape[1] - array_1.shape[1]\r\n        array_1 = np.pad(array_1, ((0, 0), (0, padding)), \"constant\", constant_values=0)\r\n\r\n    # print(\"after\", array_1.shape, array_2.shape)\r\n\r\n    if is_swap:\r\n        array_1, array_2 = array_1.T, array_2.T\r\n\r\n    return array_1\r\n\r\n\r\ndef match_mono_array_shapes(array_1: np.ndarray, array_2: np.ndarray):\r\n\r\n    if len(array_1) > len(array_2):\r\n        array_1 = array_1[: len(array_2)]\r\n    elif len(array_1) < len(array_2):\r\n        padding = len(array_2) - len(array_1)\r\n        array_1 = np.pad(array_1, (0, padding), \"constant\", constant_values=0)\r\n\r\n    return array_1\r\n\r\n\r\ndef change_pitch_semitones(y, sr, semitone_shift):\r\n    factor = 2 ** (semitone_shift / 12)  # Convert semitone shift to factor for 
resampling\r\n    y_pitch_tuned = []\r\n    for y_channel in y:\r\n        y_pitch_tuned.append(librosa.resample(y_channel, orig_sr=sr, target_sr=sr * factor, res_type=wav_resolution_float_resampling))\r\n    y_pitch_tuned = np.array(y_pitch_tuned)\r\n    new_sr = sr * factor\r\n    return y_pitch_tuned, new_sr\r\n\r\n\r\ndef augment_audio(export_path, audio_file, rate, is_normalization, wav_type_set, save_format=None, is_pitch=False, is_time_correction=True):\r\n\r\n    wav, sr = librosa.load(audio_file, sr=44100, mono=False)\r\n\r\n    if wav.ndim == 1:\r\n        wav = np.asfortranarray([wav, wav])\r\n\r\n    if not is_time_correction:\r\n        wav_mix = change_pitch_semitones(wav, 44100, semitone_shift=-rate)[0]\r\n    else:\r\n        if is_pitch:\r\n            wav_1 = pyrb.pitch_shift(wav[0], sr, rate, rbargs=None)\r\n            wav_2 = pyrb.pitch_shift(wav[1], sr, rate, rbargs=None)\r\n        else:\r\n            wav_1 = pyrb.time_stretch(wav[0], sr, rate, rbargs=None)\r\n            wav_2 = pyrb.time_stretch(wav[1], sr, rate, rbargs=None)\r\n\r\n        if wav_1.shape > wav_2.shape:\r\n            wav_2 = to_shape(wav_2, wav_1.shape)\r\n        if wav_1.shape < wav_2.shape:\r\n            wav_1 = to_shape(wav_1, wav_2.shape)\r\n\r\n        wav_mix = np.asfortranarray([wav_1, wav_2])\r\n\r\n    sf.write(export_path, normalize(wav_mix.T, is_normalization), sr, subtype=wav_type_set)\r\n    save_format(export_path)\r\n\r\n\r\ndef average_audio(audio):\r\n\r\n    waves = []\r\n    wave_shapes = []\r\n    final_waves = []\r\n\r\n    for i in range(len(audio)):\r\n        wave = librosa.load(audio[i], sr=44100, mono=False)\r\n        waves.append(wave[0])\r\n        wave_shapes.append(wave[0].shape[1])\r\n\r\n    wave_shapes_index = wave_shapes.index(max(wave_shapes))\r\n    target_shape = waves[wave_shapes_index]\r\n    waves.pop(wave_shapes_index)\r\n    final_waves.append(target_shape)\r\n\r\n    for n_array in waves:\r\n        wav_target = 
to_shape(n_array, target_shape.shape)\r\n        final_waves.append(wav_target)\r\n\r\n    waves = sum(final_waves)\r\n    waves = waves / len(audio)\r\n\r\n    return waves\r\n\r\n\r\ndef average_dual_sources(wav_1, wav_2, value):\r\n\r\n    if wav_1.shape > wav_2.shape:\r\n        wav_2 = to_shape(wav_2, wav_1.shape)\r\n    if wav_1.shape < wav_2.shape:\r\n        wav_1 = to_shape(wav_1, wav_2.shape)\r\n\r\n    wave = (wav_1 * value) + (wav_2 * (1 - value))\r\n\r\n    return wave\r\n\r\n\r\ndef reshape_sources(wav_1: np.ndarray, wav_2: np.ndarray):\r\n\r\n    if wav_1.shape > wav_2.shape:\r\n        wav_2 = to_shape(wav_2, wav_1.shape)\r\n    if wav_1.shape < wav_2.shape:\r\n        ln = min([wav_1.shape[1], wav_2.shape[1]])\r\n        wav_2 = wav_2[:, :ln]\r\n\r\n    ln = min([wav_1.shape[1], wav_2.shape[1]])\r\n    wav_1 = wav_1[:, :ln]\r\n    wav_2 = wav_2[:, :ln]\r\n\r\n    return wav_2\r\n\r\n\r\ndef reshape_sources_ref(wav_1_shape, wav_2: np.ndarray):\r\n\r\n    if wav_1_shape > wav_2.shape:\r\n        wav_2 = to_shape(wav_2, wav_1_shape)\r\n\r\n    return wav_2\r\n\r\n\r\ndef combine_arrarys(audio_sources, is_swap=False):\r\n    source = np.zeros_like(max(audio_sources, key=np.size))\r\n\r\n    for v in audio_sources:\r\n        v = match_array_shapes(v, source, is_swap=is_swap)\r\n        source += v\r\n\r\n    return source\r\n\r\n\r\ndef combine_audio(paths: list, audio_file_base=None, wav_type_set=\"FLOAT\", save_format=None):\r\n\r\n    source = combine_arrarys([load_audio(i) for i in paths])\r\n    save_path = f\"{audio_file_base}_combined.wav\"\r\n    sf.write(save_path, source.T, 44100, subtype=wav_type_set)\r\n    save_format(save_path)\r\n\r\n\r\ndef reduce_mix_bv(inst_source, voc_source, reduction_rate=0.9):\r\n    # Reduce the volume\r\n    inst_source = inst_source * (1 - reduction_rate)\r\n\r\n    mix_reduced = combine_arrarys([inst_source, voc_source], is_swap=True)\r\n\r\n    return mix_reduced\r\n\r\n\r\ndef organize_inputs(inputs):\r\n    
input_list = {\"target\": None, \"reference\": None, \"reverb\": None, \"inst\": None}\r\n\r\n    for i in inputs:\r\n        if i.endswith(\"_(Vocals).wav\"):\r\n            input_list[\"reference\"] = i\r\n        elif \"_RVC_\" in i:\r\n            input_list[\"target\"] = i\r\n        elif i.endswith(\"reverbed_stem.wav\"):\r\n            input_list[\"reverb\"] = i\r\n        elif i.endswith(\"_(Instrumental).wav\"):\r\n            input_list[\"inst\"] = i\r\n\r\n    return input_list\r\n\r\n\r\ndef check_if_phase_inverted(wav1, wav2, is_mono=False):\r\n    # Downmix stereo (channels, samples) to mono so a single correlation coefficient can be computed\r\n    if not is_mono:\r\n        wav1 = np.mean(wav1, axis=0)\r\n        wav2 = np.mean(wav2, axis=0)\r\n\r\n    # Correlate the first 1000 samples; a negative coefficient suggests inverted phase\r\n    correlation = np.corrcoef(wav1[:1000], wav2[:1000])\r\n\r\n    return correlation[0, 1] < 0\r\n\r\n\r\ndef align_audio(\r\n    file1,\r\n    file2,\r\n    file2_aligned,\r\n    file_subtracted,\r\n    wav_type_set,\r\n    is_save_aligned,\r\n    command_Text,\r\n    save_format,\r\n    align_window: list,\r\n    align_intro_val: list,\r\n    db_analysis: tuple,\r\n    set_progress_bar,\r\n    phase_option,\r\n    phase_shifts,\r\n    is_match_silence,\r\n    is_spec_match,\r\n):\r\n\r\n    global progress_value\r\n    progress_value = 0\r\n    is_mono = False\r\n\r\n    def get_diff(a, b):\r\n        corr = np.correlate(a, b, \"full\")\r\n        diff = corr.argmax() - (b.shape[0] - 1)\r\n\r\n        return diff\r\n\r\n    def progress_bar(length):\r\n        global progress_value\r\n        progress_value += 1\r\n\r\n        if (0.90 / length * progress_value) >= 0.9:\r\n            length = progress_value + 1\r\n\r\n        set_progress_bar(0.1, (0.9 / length * progress_value))\r\n\r\n    # read tracks\r\n\r\n    if file1.endswith(\".mp3\") and is_macos:\r\n        length1 = rerun_mp3(file1)\r\n        wav1, sr1 = librosa.load(file1, duration=length1, sr=44100, mono=False)\r\n    else:\r\n        wav1, sr1 = librosa.load(file1, sr=44100, 
mono=False)\r\n\r\n    if file2.endswith(\".mp3\") and is_macos:\r\n        length2 = rerun_mp3(file2)\r\n        wav2, sr2 = librosa.load(file2, duration=length2, sr=44100, mono=False)\r\n    else:\r\n        wav2, sr2 = librosa.load(file2, sr=44100, mono=False)\r\n\r\n    if wav1.ndim == 1 and wav2.ndim == 1:\r\n        is_mono = True\r\n    elif wav1.ndim == 1:\r\n        wav1 = np.asfortranarray([wav1, wav1])\r\n    elif wav2.ndim == 1:\r\n        wav2 = np.asfortranarray([wav2, wav2])\r\n\r\n    # Check if phase is inverted\r\n    if phase_option == AUTO_PHASE:\r\n        if check_if_phase_inverted(wav1, wav2, is_mono=is_mono):\r\n            wav2 = -wav2\r\n    elif phase_option == POSITIVE_PHASE:\r\n        wav2 = +wav2\r\n    elif phase_option == NEGATIVE_PHASE:\r\n        wav2 = -wav2\r\n\r\n    if is_match_silence:\r\n        wav2 = adjust_leading_silence(wav2, wav1)\r\n\r\n    wav1_length = int(librosa.get_duration(y=wav1, sr=44100))\r\n    wav2_length = int(librosa.get_duration(y=wav2, sr=44100))\r\n\r\n    if not is_mono:\r\n        wav1 = wav1.transpose()\r\n        wav2 = wav2.transpose()\r\n\r\n    wav2_org = wav2.copy()\r\n\r\n    command_Text(\"Processing files... 
\\n\")\r\n    seconds_length = min(wav1_length, wav2_length)\r\n\r\n    wav2_aligned_sources = []\r\n\r\n    for sec_len in align_intro_val:\r\n        # pick a position at 1 second in and get diff\r\n        sec_seg = 1 if sec_len == 1 else int(seconds_length // sec_len)\r\n        index = sr1 * sec_seg  # 1 second in, assuming sr1 = sr2 = 44100\r\n\r\n        if is_mono:\r\n            samp1, samp2 = wav1[index : index + sr1], wav2[index : index + sr1]\r\n            diff = get_diff(samp1, samp2)\r\n            # print(f\"Estimated difference: {diff}\\n\")\r\n        else:\r\n            index = sr1 * sec_seg  # 1 second in, assuming sr1 = sr2 = 44100\r\n            samp1, samp2 = wav1[index : index + sr1, 0], wav2[index : index + sr1, 0]\r\n            samp1_r, samp2_r = wav1[index : index + sr1, 1], wav2[index : index + sr1, 1]\r\n            diff, diff_r = get_diff(samp1, samp2), get_diff(samp1_r, samp2_r)\r\n            # print(f\"Estimated difference Left Channel: {diff}\\nEstimated difference Right Channel: {diff_r}\\n\")\r\n\r\n        # make aligned track 2\r\n        if diff > 0:\r\n            zeros_to_append = np.zeros(diff) if is_mono else np.zeros((diff, 2))\r\n            wav2_aligned = np.append(zeros_to_append, wav2_org, axis=0)\r\n        elif diff < 0:\r\n            wav2_aligned = wav2_org[-diff:]\r\n        else:\r\n            wav2_aligned = wav2_org\r\n            # command_Text(f\"Audio files already aligned.\\n\")\r\n\r\n        if not any(np.array_equal(wav2_aligned, source) for source in wav2_aligned_sources):\r\n            wav2_aligned_sources.append(wav2_aligned)\r\n\r\n    # print(\"Unique Sources: \", len(wav2_aligned_sources))\r\n\r\n    unique_sources = len(wav2_aligned_sources)\r\n\r\n    sub_mapper_big_mapper = {}\r\n\r\n    for s in wav2_aligned_sources:\r\n        wav2_aligned = match_mono_array_shapes(s, wav1) if is_mono else match_array_shapes(s, wav1, is_swap=True)\r\n\r\n        if align_window:\r\n            wav_sub = 
time_correction(\r\n                wav1, wav2_aligned, seconds_length, align_window=align_window, db_analysis=db_analysis, progress_bar=progress_bar, unique_sources=unique_sources, phase_shifts=phase_shifts\r\n            )\r\n            wav_sub_size = np.abs(wav_sub).mean()\r\n            sub_mapper_big_mapper = {**sub_mapper_big_mapper, **{wav_sub_size: wav_sub}}\r\n        else:\r\n            wav2_aligned = wav2_aligned * np.power(10, db_analysis[0] / 20)\r\n            db_range = db_analysis[1]\r\n\r\n            for db_adjustment in db_range:\r\n                # Adjust the dB of track2\r\n                s_adjusted = wav2_aligned * (10 ** (db_adjustment / 20))\r\n                wav_sub = wav1 - s_adjusted\r\n                wav_sub_size = np.abs(wav_sub).mean()\r\n                sub_mapper_big_mapper = {**sub_mapper_big_mapper, **{wav_sub_size: wav_sub}}\r\n\r\n        # print(sub_mapper_big_mapper.keys(), min(sub_mapper_big_mapper.keys()))\r\n\r\n    sub_mapper_value_list = list(sub_mapper_big_mapper.values())\r\n\r\n    if is_spec_match and len(sub_mapper_value_list) >= 2:\r\n        # print(\"using spec ensemble with align\")\r\n        wav_sub = ensemble_for_align(list(sub_mapper_big_mapper.values()))\r\n    else:\r\n        # print(\"using linear ensemble with align\")\r\n        wav_sub = ensemble_wav(list(sub_mapper_big_mapper.values()))\r\n\r\n    # print(f\"Mix Mean: {np.abs(wav1).mean()}\\nInst Mean: {np.abs(wav2).mean()}\")\r\n    # print('Final: ', np.abs(wav_sub).mean())\r\n    wav_sub = np.clip(wav_sub, -1, +1)\r\n\r\n    command_Text(f\"Saving inverted track... 
\")\r\n\r\n    if is_save_aligned or is_spec_match:\r\n        wav1 = match_mono_array_shapes(wav1, wav_sub) if is_mono else match_array_shapes(wav1, wav_sub, is_swap=True)\r\n        wav2_aligned = wav1 - wav_sub\r\n\r\n        if is_spec_match:\r\n            if wav1.ndim == 1 and wav2.ndim == 1:\r\n                wav2_aligned = np.asfortranarray([wav2_aligned, wav2_aligned]).T\r\n                wav1 = np.asfortranarray([wav1, wav1]).T\r\n\r\n            wav2_aligned = ensemble_for_align([wav2_aligned, wav1])\r\n            wav_sub = wav1 - wav2_aligned\r\n\r\n        if is_save_aligned:\r\n            sf.write(file2_aligned, wav2_aligned, sr1, subtype=wav_type_set)\r\n            save_format(file2_aligned)\r\n\r\n    sf.write(file_subtracted, wav_sub, sr1, subtype=wav_type_set)\r\n    save_format(file_subtracted)\r\n\r\n\r\ndef phase_shift_hilbert(signal, degree):\r\n    analytic_signal = hilbert(signal)\r\n    return np.cos(np.radians(degree)) * analytic_signal.real - np.sin(np.radians(degree)) * analytic_signal.imag\r\n\r\n\r\ndef get_phase_shifted_tracks(track, phase_shift):\r\n    if phase_shift == 180:\r\n        return [track, -track]\r\n\r\n    step = phase_shift\r\n    end = 180 - (180 % step) if 180 % step == 0 else 181\r\n    phase_range = range(step, end, step)\r\n\r\n    flipped_list = [track, -track]\r\n    for i in phase_range:\r\n        flipped_list.extend([phase_shift_hilbert(track, i), phase_shift_hilbert(track, -i)])\r\n\r\n    return flipped_list\r\n\r\n\r\ndef time_correction(mix: np.ndarray, instrumental: np.ndarray, seconds_length, align_window, db_analysis, sr=44100, progress_bar=None, unique_sources=None, phase_shifts=NONE_P):\r\n    # Function to align two tracks using cross-correlation\r\n\r\n    def align_tracks(track1, track2):\r\n        # A dictionary to store each version of track2_shifted and its mean absolute value\r\n        shifted_tracks = {}\r\n\r\n        # Loop to adjust dB of track2\r\n        track2 = track2 * 
np.power(10, db_analysis[0] / 20)\r\n        db_range = db_analysis[1]\r\n\r\n        if phase_shifts == 190:\r\n            track2_flipped = [track2]\r\n        else:\r\n            track2_flipped = get_phase_shifted_tracks(track2, phase_shifts)\r\n\r\n        for db_adjustment in db_range:\r\n            for t in track2_flipped:\r\n                # Adjust the dB of track2\r\n                track2_adjusted = t * (10 ** (db_adjustment / 20))\r\n                corr = correlate(track1, track2_adjusted)\r\n                delay = np.argmax(np.abs(corr)) - (len(track1) - 1)\r\n                track2_shifted = np.roll(track2_adjusted, shift=delay)\r\n\r\n                # Score the candidate by the mean absolute value of the residual (track1 - track2_shifted)\r\n                track2_shifted_sub = track1 - track2_shifted\r\n                mean_abs_value = np.abs(track2_shifted_sub).mean()\r\n\r\n                # Store track2_shifted keyed by its residual score\r\n                shifted_tracks[mean_abs_value] = track2_shifted\r\n\r\n        # Return the track2_shifted whose residual has the smallest mean absolute value\r\n\r\n        return shifted_tracks[min(shifted_tracks.keys())]\r\n\r\n    # Make sure the audio files have the same shape\r\n\r\n    assert mix.shape == instrumental.shape, f\"Audio files must have the same shape - Mix: {mix.shape}, Inst: {instrumental.shape}\"\r\n\r\n    seconds_length = seconds_length // 2\r\n\r\n    sub_mapper = {}\r\n\r\n    progress_update_interval = 120\r\n    total_iterations = 0\r\n\r\n    if len(align_window) > 2:\r\n        progress_update_interval = 320\r\n\r\n    for secs in align_window:\r\n        step = secs / 2\r\n        window_size = int(sr * secs)\r\n        step_size = int(sr * step)\r\n\r\n        if len(mix.shape) == 1:\r\n            total_mono = (len(range(0, len(mix) - window_size, step_size)) // progress_update_interval) * unique_sources\r\n            total_iterations += total_mono\r\n        else:\r\n            total_stereo_ 
= len(range(0, len(mix[:, 0]) - window_size, step_size)) * 2\r\n            total_stereo = (total_stereo_ // progress_update_interval) * unique_sources\r\n            total_iterations += total_stereo\r\n\r\n    # print(total_iterations)\r\n\r\n    for secs in align_window:\r\n        sub = np.zeros_like(mix)\r\n        divider = np.zeros_like(mix)\r\n        step = secs / 2\r\n        window_size = int(sr * secs)\r\n        step_size = int(sr * step)\r\n        window = np.hanning(window_size)\r\n\r\n        # For the mono case:\r\n        if len(mix.shape) == 1:\r\n            # The files are mono\r\n            counter = 0\r\n            for i in range(0, len(mix) - window_size, step_size):\r\n                counter += 1\r\n                if counter % progress_update_interval == 0:\r\n                    progress_bar(total_iterations)\r\n                window_mix = mix[i : i + window_size] * window\r\n                window_instrumental = instrumental[i : i + window_size] * window\r\n                window_instrumental_aligned = align_tracks(window_mix, window_instrumental)\r\n                sub[i : i + window_size] += window_mix - window_instrumental_aligned\r\n                divider[i : i + window_size] += window\r\n        else:\r\n            # The files are stereo\r\n            counter = 0\r\n            for ch in range(mix.shape[1]):\r\n                for i in range(0, len(mix[:, ch]) - window_size, step_size):\r\n                    counter += 1\r\n                    if counter % progress_update_interval == 0:\r\n                        progress_bar(total_iterations)\r\n                    window_mix = mix[i : i + window_size, ch] * window\r\n                    window_instrumental = instrumental[i : i + window_size, ch] * window\r\n                    window_instrumental_aligned = align_tracks(window_mix, window_instrumental)\r\n                    sub[i : i + window_size, ch] += window_mix - window_instrumental_aligned\r\n                    
divider[i : i + window_size, ch] += window\r\n\r\n        # Normalize the result by the overlap count\r\n        sub = np.where(divider > 1e-6, sub / divider, sub)\r\n        sub_size = np.abs(sub).mean()\r\n        sub_mapper = {**sub_mapper, **{sub_size: sub}}\r\n\r\n    # print(\"SUB_LEN\", len(list(sub_mapper.values())))\r\n\r\n    sub = ensemble_wav(list(sub_mapper.values()), split_size=12)\r\n\r\n    return sub\r\n\r\n\r\ndef ensemble_wav(waveforms, split_size=240):\r\n    # Split each waveform into split_size chunks, keyed by the waveform's index\r\n    waveform_thirds = {i: np.array_split(waveform, split_size) for i, waveform in enumerate(waveforms)}\r\n\r\n    # Initialize the final waveform\r\n    final_waveform = []\r\n\r\n    # For each chunk position, keep the chunk with the lowest mean absolute value\r\n    for third_idx in range(split_size):\r\n        # Compute the mean absolute value of this chunk in each waveform\r\n        means = [np.abs(waveform_thirds[i][third_idx]).mean() for i in range(len(waveforms))]\r\n\r\n        # Find the index of the waveform with the lowest mean absolute value for this chunk\r\n        min_index = np.argmin(means)\r\n\r\n        # Add the least noisy chunk to the final waveform\r\n        final_waveform.append(waveform_thirds[min_index][third_idx])\r\n\r\n    # Concatenate all the chunks to create the final waveform\r\n    final_waveform = np.concatenate(final_waveform)\r\n\r\n    return final_waveform\r\n\r\n\r\ndef ensemble_wav_min(waveforms):\r\n    # Start from the first waveform so a single-waveform input is returned unchanged\r\n    wave = waveforms[0]\r\n\r\n    for i in range(1, len(waveforms)):\r\n        # Trim both waveforms to their common length\r\n        ln = min(len(wave), len(waveforms[i]))\r\n        wave = wave[:ln]\r\n        waveforms[i] = waveforms[i][:ln]\r\n\r\n        # Keep the sample with the smaller magnitude at each position\r\n        wave = np.where(np.abs(waveforms[i]) <= np.abs(wave), waveforms[i], wave)\r\n\r\n    return wave\r\n\r\n\r\ndef align_audio_test(wav1, wav2, sr1=44100):\r\n    def get_diff(a, b):\r\n        corr = np.correlate(a, b, \"full\")\r\n        diff = corr.argmax() - (b.shape[0] - 1)\r\n
        return diff\r\n\r\n    # transpose tracks to (samples, channels)\r\n    wav1 = wav1.transpose()\r\n    wav2 = wav2.transpose()\r\n\r\n    # print(f\"Audio file shapes: {wav1.shape} / {wav2.shape}\\n\")\r\n\r\n    wav2_org = wav2.copy()\r\n\r\n    # pick a position at 1 second in and get diff\r\n    index = sr1  # *seconds_length  # 1 second in, assuming sr1 = sr2 = 44100\r\n    samp1 = wav1[index : index + sr1, 0]  # currently use left channel\r\n    samp2 = wav2[index : index + sr1, 0]\r\n    diff = get_diff(samp1, samp2)\r\n\r\n    # make aligned track 2 (prepend silence on every channel, or trim the start)\r\n    if diff > 0:\r\n        wav2_aligned = np.append(np.zeros((diff, wav2_org.shape[1])), wav2_org, axis=0)\r\n    elif diff < 0:\r\n        wav2_aligned = wav2_org[-diff:]\r\n    else:\r\n        wav2_aligned = wav2_org\r\n\r\n    return wav2_aligned\r\n\r\n\r\ndef load_audio(audio_file):\r\n    wav, sr = librosa.load(audio_file, sr=44100, mono=False)\r\n\r\n    if wav.ndim == 1:\r\n        wav = np.asfortranarray([wav, wav])\r\n\r\n    return wav\r\n\r\n\r\ndef rerun_mp3(audio_file):\r\n    with audioread.audio_open(audio_file) as f:\r\n        track_length = int(f.duration)\r\n\r\n    return track_length\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/stft.py",
    "content": "import torch\n\n\nclass STFT:\n    \"\"\"\n    This class performs the Short-Time Fourier Transform (STFT) and its inverse (ISTFT).\n    These functions are essential for converting the audio between the time domain and the frequency domain,\n    which is a crucial aspect of audio processing in neural networks.\n    \"\"\"\n\n    def __init__(self, logger, n_fft, hop_length, dim_f, device):\n        self.logger = logger\n        self.n_fft = n_fft\n        self.hop_length = hop_length\n        self.dim_f = dim_f\n        self.device = device\n        # Create a Hann window tensor for use in the STFT.\n        self.hann_window = torch.hann_window(window_length=self.n_fft, periodic=True)\n\n    def __call__(self, input_tensor):\n        # Determine if the input tensor's device is not a standard computing device (i.e., not CPU or CUDA).\n        is_non_standard_device = not input_tensor.device.type in [\"cuda\", \"cpu\"]\n\n        # If on a non-standard device, temporarily move the tensor to CPU for processing.\n        if is_non_standard_device:\n            input_tensor = input_tensor.cpu()\n\n        # Transfer the pre-defined window tensor to the same device as the input tensor.\n        stft_window = self.hann_window.to(input_tensor.device)\n\n        # Extract batch dimensions (all dimensions except the last two which are channel and time).\n        batch_dimensions = input_tensor.shape[:-2]\n\n        # Extract channel and time dimensions (last two dimensions of the tensor).\n        channel_dim, time_dim = input_tensor.shape[-2:]\n\n        # Reshape the tensor to merge batch and channel dimensions for STFT processing.\n        reshaped_tensor = input_tensor.reshape([-1, time_dim])\n\n        # Perform the Short-Time Fourier Transform (STFT) on the reshaped tensor.\n        stft_output = torch.stft(reshaped_tensor, n_fft=self.n_fft, hop_length=self.hop_length, window=stft_window, center=True, return_complex=False)\n\n        # Rearrange the 
dimensions of the STFT output so the real/imaginary axis comes before the frequency and time axes.\n        permuted_stft_output = stft_output.permute([0, 3, 1, 2])\n\n        # Restore the original batch dimensions and fold the real/imaginary axis into the channel axis (channel_dim * 2 channels), keeping the frequency and time axes.\n        final_output = permuted_stft_output.reshape([*batch_dimensions, channel_dim, 2, -1, permuted_stft_output.shape[-1]]).reshape(\n            [*batch_dimensions, channel_dim * 2, -1, permuted_stft_output.shape[-1]]\n        )\n\n        # If the original tensor was on a non-standard device, move the processed tensor back to that device.\n        if is_non_standard_device:\n            final_output = final_output.to(self.device)\n\n        # Return the transformed tensor, keeping only the first `dim_f` frequency bins.\n        return final_output[..., : self.dim_f, :]\n\n    def pad_frequency_dimension(self, input_tensor, batch_dimensions, channel_dim, freq_dim, time_dim, num_freq_bins):\n        \"\"\"\n        Zero-pads the frequency dimension of the input tensor up to `num_freq_bins` bins.\n        \"\"\"\n        # Create a padding tensor for the frequency dimension\n        freq_padding = torch.zeros([*batch_dimensions, channel_dim, num_freq_bins - freq_dim, time_dim]).to(input_tensor.device)\n\n        # Concatenate the padding to the input tensor along the frequency dimension.\n        padded_tensor = torch.cat([input_tensor, freq_padding], -2)\n\n        return padded_tensor\n\n    def calculate_inverse_dimensions(self, input_tensor):\n        # Extract the batch dimensions and the channel, frequency and time dimensions.\n        batch_dimensions = input_tensor.shape[:-3]\n        channel_dim, freq_dim, time_dim = input_tensor.shape[-3:]\n\n        # Calculate the number of frequency bins for the inverse STFT.\n        num_freq_bins = self.n_fft // 2 + 1\n\n        return batch_dimensions, channel_dim, freq_dim, time_dim, num_freq_bins\n\n    def prepare_for_istft(self, padded_tensor, 
batch_dimensions, channel_dim, num_freq_bins, time_dim):\n        \"\"\"\n        Prepares the tensor for Inverse Short-Time Fourier Transform (ISTFT) by reshaping\n        and creating a complex tensor from the real and imaginary parts.\n        \"\"\"\n        # Reshape the tensor to separate real and imaginary parts and prepare for ISTFT.\n        reshaped_tensor = padded_tensor.reshape([*batch_dimensions, channel_dim // 2, 2, num_freq_bins, time_dim])\n\n        # Flatten the batch and audio-channel dimensions into a single leading axis.\n        flattened_tensor = reshaped_tensor.reshape([-1, 2, num_freq_bins, time_dim])\n\n        # Move the real/imaginary axis to the last position: (N, freq, time, 2).\n        permuted_tensor = flattened_tensor.permute([0, 2, 3, 1])\n\n        # Combine real and imaginary parts into a complex tensor.\n        complex_tensor = permuted_tensor[..., 0] + permuted_tensor[..., 1] * 1.0j\n\n        return complex_tensor\n\n    def inverse(self, input_tensor):\n        # Determine if the input tensor's device is not a standard computing device (i.e., not CPU or CUDA).\n        is_non_standard_device = not input_tensor.device.type in [\"cuda\", \"cpu\"]\n\n        # If on a non-standard device, temporarily move the tensor to CPU for processing.\n        if is_non_standard_device:\n            input_tensor = input_tensor.cpu()\n\n        # Transfer the pre-defined Hann window tensor to the same device as the input tensor.\n        stft_window = self.hann_window.to(input_tensor.device)\n\n        batch_dimensions, channel_dim, freq_dim, time_dim, num_freq_bins = self.calculate_inverse_dimensions(input_tensor)\n\n        padded_tensor = self.pad_frequency_dimension(input_tensor, batch_dimensions, channel_dim, freq_dim, time_dim, num_freq_bins)\n\n        complex_tensor = self.prepare_for_istft(padded_tensor, batch_dimensions, channel_dim, num_freq_bins, time_dim)\n\n        # Perform the Inverse Short-Time Fourier Transform (ISTFT).\n
        istft_result = torch.istft(complex_tensor, n_fft=self.n_fft, hop_length=self.hop_length, window=stft_window, center=True)\n\n        # Reshape the ISTFT result to restore the batch dimensions; this assumes two audio channels (stereo).\n        final_output = istft_result.reshape([*batch_dimensions, 2, -1])\n\n        # If the original tensor was on a non-standard device, move the processed tensor back to that device.\n        if is_non_standard_device:\n            final_output = final_output.to(self.device)\n\n        return final_output\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/tfc_tdf_v3.py",
    "content": "import torch\nimport torch.nn as nn\nfrom functools import partial\n\nclass STFT:\n    def __init__(self, n_fft, hop_length, dim_f, device):\n        self.n_fft = n_fft\n        self.hop_length = hop_length\n        self.window = torch.hann_window(window_length=self.n_fft, periodic=True)\n        self.dim_f = dim_f\n        self.device = device\n\n    def __call__(self, x):\n        \n        x_is_mps = not x.device.type in [\"cuda\", \"cpu\"]\n        if x_is_mps:\n            x = x.cpu()\n\n        window = self.window.to(x.device)\n        batch_dims = x.shape[:-2]\n        c, t = x.shape[-2:]\n        x = x.reshape([-1, t])\n        x = torch.stft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True,return_complex=False)\n        x = x.permute([0, 3, 1, 2])\n        x = x.reshape([*batch_dims, c, 2, -1, x.shape[-1]]).reshape([*batch_dims, c * 2, -1, x.shape[-1]])\n\n        if x_is_mps:\n            x = x.to(self.device)\n\n        return x[..., :self.dim_f, :]\n\n    def inverse(self, x):\n        \n        x_is_mps = not x.device.type in [\"cuda\", \"cpu\"]\n        if x_is_mps:\n            x = x.cpu()\n\n        window = self.window.to(x.device)\n        batch_dims = x.shape[:-3]\n        c, f, t = x.shape[-3:]\n        n = self.n_fft // 2 + 1\n        f_pad = torch.zeros([*batch_dims, c, n - f, t]).to(x.device)\n        x = torch.cat([x, f_pad], -2)\n        x = x.reshape([*batch_dims, c // 2, 2, n, t]).reshape([-1, 2, n, t])\n        x = x.permute([0, 2, 3, 1])\n        x = x[..., 0] + x[..., 1] * 1.j\n        x = torch.istft(x, n_fft=self.n_fft, hop_length=self.hop_length, window=window, center=True)\n        x = x.reshape([*batch_dims, 2, -1])\n\n        if x_is_mps:\n            x = x.to(self.device)\n\n        return x\n\ndef get_norm(norm_type):\n    def norm(c, norm_type):\n        if norm_type is None:\n            return nn.Identity()\n        elif norm_type == 'BatchNorm':\n            return 
nn.BatchNorm2d(c)\n        elif norm_type == 'InstanceNorm':\n            return nn.InstanceNorm2d(c, affine=True)\n        elif 'GroupNorm' in norm_type:\n            g = int(norm_type.replace('GroupNorm', ''))\n            return nn.GroupNorm(num_groups=g, num_channels=c)\n        else:\n            return nn.Identity()\n\n    return partial(norm, norm_type=norm_type)\n\n\ndef get_act(act_type):\n    if act_type == 'gelu':\n        return nn.GELU()\n    elif act_type == 'relu':\n        return nn.ReLU()\n    elif act_type[:3] == 'elu':\n        alpha = float(act_type.replace('elu', ''))\n        return nn.ELU(alpha)\n    else:\n        raise ValueError(f\"Unsupported activation type: {act_type}\")\n\n\nclass Upscale(nn.Module):\n    def __init__(self, in_c, out_c, scale, norm, act):\n        super().__init__()\n        self.conv = nn.Sequential(\n            norm(in_c),\n            act,\n            nn.ConvTranspose2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)\n        )\n\n    def forward(self, x):\n        return self.conv(x)\n\n\nclass Downscale(nn.Module):\n    def __init__(self, in_c, out_c, scale, norm, act):\n        super().__init__()\n        self.conv = nn.Sequential(\n            norm(in_c),\n            act,\n            nn.Conv2d(in_channels=in_c, out_channels=out_c, kernel_size=scale, stride=scale, bias=False)\n        )\n\n    def forward(self, x):\n        return self.conv(x)\n\n\nclass TFC_TDF(nn.Module):\n    def __init__(self, in_c, c, l, f, bn, norm, act):\n        super().__init__()\n\n        self.blocks = nn.ModuleList()\n        for i in range(l):\n            block = nn.Module()\n\n            block.tfc1 = nn.Sequential(\n                norm(in_c),\n                act,\n                nn.Conv2d(in_c, c, 3, 1, 1, bias=False),\n            )\n            block.tdf = nn.Sequential(\n                norm(c),\n                act,\n                nn.Linear(f, f // bn, bias=False),\n                norm(c),\n                act,\n                
nn.Linear(f // bn, f, bias=False),\n            )\n            block.tfc2 = nn.Sequential(\n                norm(c),\n                act,\n                nn.Conv2d(c, c, 3, 1, 1, bias=False),\n            )\n            block.shortcut = nn.Conv2d(in_c, c, 1, 1, 0, bias=False)\n\n            self.blocks.append(block)\n            in_c = c\n\n    def forward(self, x):\n        for block in self.blocks:\n            s = block.shortcut(x)\n            x = block.tfc1(x)\n            x = x + block.tdf(x)\n            x = block.tfc2(x)\n            x = x + s\n        return x\n\n\nclass TFC_TDF_net(nn.Module):\n    def __init__(self, config, device):\n        super().__init__()\n        self.config = config\n        self.device = device\n\n        # Defensive checks for normalization configuration\n        try:\n            norm_type = config.model.norm\n        except (AttributeError, KeyError):\n            norm_type = None\n            print(\"Warning: Model configuration missing 'norm' attribute, using Identity normalization\")\n        \n        norm = get_norm(norm_type=norm_type)\n        \n        try:\n            act_type = config.model.act\n        except (AttributeError, KeyError):\n            act_type = 'gelu'\n            print(\"Warning: Model configuration missing 'act' attribute, using GELU activation\")\n            \n        act = get_act(act_type=act_type)\n\n        self.num_target_instruments = 1 if config.training.target_instrument else len(config.training.instruments)\n        self.num_subbands = config.model.num_subbands\n\n        dim_c = self.num_subbands * config.audio.num_channels * 2\n        n = config.model.num_scales\n        scale = config.model.scale\n        l = config.model.num_blocks_per_scale\n        c = config.model.num_channels\n        g = config.model.growth\n        bn = config.model.bottleneck_factor\n        f = config.audio.dim_f // self.num_subbands\n\n        self.first_conv = nn.Conv2d(dim_c, c, 1, 1, 0, 
bias=False)\n\n        self.encoder_blocks = nn.ModuleList()\n        for i in range(n):\n            block = nn.Module()\n            block.tfc_tdf = TFC_TDF(c, c, l, f, bn, norm, act)\n            block.downscale = Downscale(c, c + g, scale, norm, act)\n            f = f // scale[1]\n            c += g\n            self.encoder_blocks.append(block)\n\n        self.bottleneck_block = TFC_TDF(c, c, l, f, bn, norm, act)\n\n        self.decoder_blocks = nn.ModuleList()\n        for i in range(n):\n            block = nn.Module()\n            block.upscale = Upscale(c, c - g, scale, norm, act)\n            f = f * scale[1]\n            c -= g\n            block.tfc_tdf = TFC_TDF(2 * c, c, l, f, bn, norm, act)\n            self.decoder_blocks.append(block)\n\n        self.final_conv = nn.Sequential(\n            nn.Conv2d(c + dim_c, c, 1, 1, 0, bias=False),\n            act,\n            nn.Conv2d(c, self.num_target_instruments * dim_c, 1, 1, 0, bias=False)\n        )\n\n        self.stft = STFT(config.audio.n_fft, config.audio.hop_length, config.audio.dim_f, self.device)\n\n    def cac2cws(self, x):\n        k = self.num_subbands\n        b, c, f, t = x.shape\n        x = x.reshape(b, c, k, f // k, t)\n        x = x.reshape(b, c * k, f // k, t)\n        return x\n\n    def cws2cac(self, x):\n        k = self.num_subbands\n        b, c, f, t = x.shape\n        x = x.reshape(b, c // k, k, f, t)\n        x = x.reshape(b, c // k, f * k, t)\n        return x\n\n    def forward(self, x):\n\n        x = self.stft(x)\n\n        mix = x = self.cac2cws(x)\n\n        first_conv_out = x = self.first_conv(x)\n\n        x = x.transpose(-1, -2)\n\n        encoder_outputs = []\n        for block in self.encoder_blocks:\n            x = block.tfc_tdf(x)\n            encoder_outputs.append(x)\n            x = block.downscale(x)\n\n        x = self.bottleneck_block(x)\n\n        for block in self.decoder_blocks:\n            x = block.upscale(x)\n            x = torch.cat([x, 
encoder_outputs.pop()], 1)\n            x = block.tfc_tdf(x)\n\n        x = x.transpose(-1, -2)\n\n        x = x * first_conv_out  # reduce artifacts\n\n        x = self.final_conv(torch.cat([mix, x], 1))\n\n        x = self.cws2cac(x)\n\n        if self.num_target_instruments > 1:\n            b, c, f, t = x.shape\n            x = x.reshape(b, self.num_target_instruments, -1, f, t)\n\n        x = self.stft.inverse(x)\n\n        return x\n\n\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/__init__.py",
    "content": "# VR init.\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/layers.py",
    "content": "import torch\r\nfrom torch import nn\r\nimport torch.nn.functional as F\r\n\r\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\r\n\r\n\r\nclass Conv2DBNActiv(nn.Module):\r\n    \"\"\"\r\n    This class implements a convolutional layer followed by batch normalization and an activation function.\r\n    It is a common pattern in deep learning for processing images or feature maps. The convolutional layer\r\n    applies a set of learnable filters to the input. Batch normalization then normalizes the output of the\r\n    convolution, and finally, an activation function introduces non-linearity to the model, allowing it to\r\n    learn more complex patterns.\r\n\r\n    Attributes:\r\n        conv (nn.Sequential): A sequential container of Conv2d, BatchNorm2d, and an activation layer.\r\n\r\n    Args:\r\n        nin (int): Number of input channels.\r\n        nout (int): Number of output channels.\r\n        ksize (int, optional): Size of the kernel. Defaults to 3.\r\n        stride (int, optional): Stride of the convolution. Defaults to 1.\r\n        pad (int, optional): Padding added to all sides of the input. Defaults to 1.\r\n        dilation (int, optional): Spacing between kernel elements. Defaults to 1.\r\n        activ (type, optional): Activation layer class, instantiated with no arguments. 
Defaults to nn.ReLU.\r\n    \"\"\"\r\n\r\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, dilation=1, activ=nn.ReLU):\r\n        super(Conv2DBNActiv, self).__init__()\r\n\r\n        # The nn.Sequential container allows us to stack the Conv2d, BatchNorm2d, and activation layers\r\n        # into a single module, simplifying the forward pass.\r\n        self.conv = nn.Sequential(nn.Conv2d(nin, nout, kernel_size=ksize, stride=stride, padding=pad, dilation=dilation, bias=False), nn.BatchNorm2d(nout), activ())\r\n\r\n    def __call__(self, input_tensor):\r\n        # Defines the computation performed at every call.\r\n        # Simply passes the input through the sequential container.\r\n        return self.conv(input_tensor)\r\n\r\n\r\nclass SeperableConv2DBNActiv(nn.Module):\r\n    \"\"\"\r\n    This class implements a separable convolutional layer followed by batch normalization and an activation function.\r\n    Separable convolutions are a type of convolution that splits the convolution operation into two simpler operations:\r\n    a depthwise convolution and a pointwise convolution. This can reduce the number of parameters and computational cost,\r\n    making the network more efficient while maintaining similar performance.\r\n\r\n    The depthwise convolution applies a single filter per input channel (input depth). The pointwise convolution,\r\n    which follows, applies a 1x1 convolution to combine the outputs of the depthwise convolution across channels.\r\n    Batch normalization is then applied to stabilize learning and reduce internal covariate shift. 
Finally,\r\n    an activation function introduces non-linearity, allowing the network to learn complex patterns.\r\n\r\n    Attributes:\r\n        conv (nn.Sequential): A sequential container of depthwise Conv2d, pointwise Conv2d, BatchNorm2d, and an activation layer.\r\n\r\n    Args:\r\n        nin (int): Number of input channels.\r\n        nout (int): Number of output channels.\r\n        ksize (int, optional): Size of the kernel for the depthwise convolution. Defaults to 3.\r\n        stride (int, optional): Stride of the depthwise convolution. Defaults to 1.\r\n        pad (int, optional): Padding added to all sides of the input for the depthwise convolution. Defaults to 1.\r\n        dilation (int, optional): Spacing between kernel elements for the depthwise convolution. Defaults to 1.\r\n        activ (type, optional): Activation layer class, instantiated with no arguments. Defaults to nn.ReLU.\r\n    \"\"\"\r\n\r\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, dilation=1, activ=nn.ReLU):\r\n        super(SeperableConv2DBNActiv, self).__init__()\r\n\r\n        # Initialize the sequential container with the depthwise convolution.\r\n        # The number of groups in the depthwise convolution is set to nin, which means each input channel is treated separately.\r\n        # The pointwise convolution then combines these separate channels into nout channels.\r\n        # Batch normalization is applied to the output of the pointwise convolution.\r\n        # Finally, the activation function is applied to introduce non-linearity.\r\n        self.conv = nn.Sequential(\r\n            nn.Conv2d(\r\n                nin,\r\n                nin,  # For depthwise convolution, in_channels = out_channels = nin\r\n                kernel_size=ksize,\r\n                stride=stride,\r\n                padding=pad,\r\n                dilation=dilation,\r\n                groups=nin,  
# This makes it a depthwise convolution\r\n                bias=False,  # Bias is not used because it will be handled by BatchNorm2d\r\n            ),\r\n            nn.Conv2d(\r\n                nin,\r\n                nout,  # Pointwise convolution to combine channels\r\n                kernel_size=1,  # Kernel size of 1 for pointwise convolution\r\n                bias=False,  # Bias is not used because it will be handled by BatchNorm2d\r\n            ),\r\n            nn.BatchNorm2d(nout),  # Normalize the output of the pointwise convolution\r\n            activ(),  # Apply the activation function\r\n        )\r\n\r\n    def __call__(self, input_tensor):\r\n        # Pass the input through the sequential container.\r\n        # This performs the depthwise convolution, followed by the pointwise convolution,\r\n        # batch normalization, and finally applies the activation function.\r\n        return self.conv(input_tensor)\r\n\r\n\r\nclass Encoder(nn.Module):\r\n    \"\"\"\r\n    The Encoder class is a part of the neural network architecture that is responsible for processing the input data.\r\n    It consists of two convolutional layers, each followed by batch normalization and an activation function.\r\n    The purpose of the Encoder is to transform the input data into a higher-level, abstract representation.\r\n    This is achieved by applying filters (through convolutions) that can capture patterns or features in the data.\r\n    The Encoder can be thought of as a feature extractor that prepares the data for further processing by the network.\r\n\r\n    Attributes:\r\n        conv1 (Conv2DBNActiv): The first convolutional layer in the encoder.\r\n        conv2 (Conv2DBNActiv): The second convolutional layer in the encoder.\r\n\r\n    Args:\r\n        nin (int): Number of input channels for the first convolutional layer.\r\n        nout (int): Number of output channels for both convolutional layers.\r\n
        ksize (int, optional): Kernel size for both convolutional layers. Defaults to 3.\r\n        stride (int, optional): Stride of the second convolution; the first always uses stride 1. Defaults to 1.\r\n        pad (int, optional): Padding added to all sides of the input for both convolutional layers. Defaults to 1.\r\n        activ (type, optional): Activation layer class used after each convolution. Defaults to nn.LeakyReLU.\r\n    \"\"\"\r\n\r\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, activ=nn.LeakyReLU):\r\n        super(Encoder, self).__init__()\r\n\r\n        # The first convolutional layer takes the input and applies a convolution (always with stride 1),\r\n        # followed by batch normalization and the activation specified by `activ`.\r\n        # This layer is responsible for capturing the initial set of features from the input data.\r\n        self.conv1 = Conv2DBNActiv(nin, nout, ksize, 1, pad, activ=activ)\r\n\r\n        # The second convolutional layer further processes the output from the first layer,\r\n        # applying another convolution (with the given stride), batch normalization, and activation.\r\n        # This layer helps in capturing more complex patterns in the data by building upon the initial features extracted by conv1.\r\n        self.conv2 = Conv2DBNActiv(nout, nout, ksize, stride, pad, activ=activ)\r\n\r\n    def __call__(self, input_tensor):\r\n        # The input data `input_tensor` is passed through the first convolutional layer.\r\n        # The output of this layer serves as a 'skip connection' that can be used later in the network to preserve spatial information.\r\n        skip = self.conv1(input_tensor)\r\n\r\n        # The output from the first layer is then passed through the second convolutional layer.\r\n        # This processed data `hidden` is the final output of the Encoder, representing the abstracted features of the input.\r\n        hidden = self.conv2(skip)\r\n\r\n        # The Encoder returns two outputs: `hidden`, the abstracted feature representation, and `skip`, the intermediate representation 
from conv1.\r\n        return hidden, skip\r\n\r\n\r\nclass Decoder(nn.Module):\r\n    \"\"\"\r\n    The Decoder class is part of the neural network architecture, specifically designed to perform the inverse operation of an encoder.\r\n    Its main role is to reconstruct or generate data from encoded representations, which is crucial in tasks like image segmentation or audio processing.\r\n    This class uses upsampling, convolution, optional dropout for regularization, and concatenation of skip connections to achieve its goal.\r\n\r\n    Attributes:\r\n        conv (Conv2DBNActiv): A convolutional layer with batch normalization and activation function.\r\n        dropout (nn.Dropout2d or None): An optional dropout layer for regularization to prevent overfitting.\r\n\r\n    Args:\r\n        nin (int): Number of input channels for the convolutional layer (after concatenating the skip connection, if any).\r\n        nout (int): Number of output channels for the convolutional layer.\r\n        ksize (int, optional): Kernel size for the convolutional layer. Defaults to 3.\r\n        stride (int, optional): Accepted but unused; the convolution always uses stride 1. Defaults to 1.\r\n        pad (int, optional): Padding added to all sides of the input for the convolutional layer. Defaults to 1.\r\n        activ (type, optional): Activation layer class used after the convolutional layer. Defaults to nn.ReLU.\r\n        dropout (bool, optional): Whether to apply Dropout2d (p=0.1) after the convolution. Defaults to False.\r\n    \"\"\"\r\n\r\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, activ=nn.ReLU, dropout=False):\r\n        super(Decoder, self).__init__()\r\n\r\n        # Initialize the convolutional layer with specified parameters.\r\n        self.conv = Conv2DBNActiv(nin, nout, ksize, 1, pad, activ=activ)\r\n\r\n        # Initialize the dropout layer if dropout is set to True\r\n        self.dropout = nn.Dropout2d(0.1) if dropout else None\r\n\r\n    def __call__(self, input_tensor, skip=None):\r\n        # Upsample the input tensor to a higher resolution using bilinear 
interpolation.\r\n        input_tensor = F.interpolate(input_tensor, scale_factor=2, mode=\"bilinear\", align_corners=True)\r\n        # If a skip connection is provided, crop it to match the size of input_tensor and concatenate them along the channel dimension.\r\n        if skip is not None:\r\n            skip = spec_utils.crop_center(skip, input_tensor)  # Crop skip_connection to match input_tensor's dimensions.\r\n            input_tensor = torch.cat([input_tensor, skip], dim=1)  # Concatenate input_tensor and skip_connection along the channel dimension.\r\n\r\n        # Pass the concatenated tensor (or just input_tensor if no skip_connection is provided) through the convolutional layer.\r\n        output_tensor = self.conv(input_tensor)\r\n\r\n        # If dropout is enabled, apply it to the output of the convolutional layer.\r\n        if self.dropout is not None:\r\n            output_tensor = self.dropout(output_tensor)\r\n\r\n        # Return the final output tensor.\r\n        return output_tensor\r\n\r\n\r\nclass ASPPModule(nn.Module):\r\n    \"\"\"\r\n    Atrous Spatial Pyramid Pooling (ASPP) Module is designed for capturing multi-scale context by applying\r\n    atrous convolution at multiple rates. This is particularly useful in segmentation tasks where capturing\r\n    objects at various scales is beneficial. 
The module applies several parallel dilated convolutions with\r\n    different dilation rates to the input feature map, allowing it to efficiently capture information at\r\n    multiple scales.\r\n\r\n    Attributes:\r\n        conv1 (nn.Sequential): Average-pools the height (frequency) axis down to 1, then applies a 1x1 convolution.\r\n        nn_architecture (int): Identifier for the neural network architecture being used.\r\n        six_layer (list): List containing architecture identifiers that require six layers.\r\n        seven_layer (list): List containing architecture identifiers that require seven layers.\r\n        conv2 (Conv2DBNActiv): A 1x1 convolution applied directly to the input.\r\n        conv3-conv7 (SeperableConv2DBNActiv): Separable convolutions with varying dilation rates for multi-scale feature extraction (conv6 and conv7 exist only for the six- and seven-layer architectures).\r\n        bottleneck (nn.Sequential): A 1x1 convolutional layer that combines all features followed by dropout for regularization.\r\n    \"\"\"\r\n\r\n    def __init__(self, nn_architecture, nin, nout, dilations=(4, 8, 16), activ=nn.ReLU):\r\n        \"\"\"\r\n        Initializes the ASPP module with specified parameters.\r\n\r\n        Args:\r\n            nn_architecture (int): Identifier for the neural network architecture.\r\n            nin (int): Number of input channels.\r\n            nout (int): Number of output channels.\r\n            dilations (tuple): Tuple of dilation rates for the atrous convolutions.\r\n            activ (type): Activation layer class to use after convolutional layers.\r\n        \"\"\"\r\n        super(ASPPModule, self).__init__()\r\n\r\n        # Adaptive average pooling collapses the height (frequency) axis to 1 while keeping the width (time) axis,\r\n        # capturing global context along that axis; a 1x1 convolution then processes the pooled features (nin channels in and out).\r\n        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, None)), Conv2DBNActiv(nin, nin, 1, 1, 0, activ=activ))\r\n\r\n        self.nn_architecture = nn_architecture\r\n        # Architecture identifiers for models requiring additional layers.\r\n
        self.six_layer = [129605]\r\n        self.seven_layer = [537238, 537227, 33966]\r\n\r\n        # Extra convolutional layer used for six and seven layer configurations.\r\n        # Note: conv6 and conv7 below both reference this single instance, so they share weights.\r\n        extra_conv = SeperableConv2DBNActiv(nin, nin, 3, 1, dilations[2], dilations[2], activ=activ)\r\n\r\n        # 1x1 convolution (channel count unchanged).\r\n        self.conv2 = Conv2DBNActiv(nin, nin, 1, 1, 0, activ=activ)\r\n\r\n        # Separable convolutions with different dilation rates for multi-scale feature extraction.\r\n        self.conv3 = SeperableConv2DBNActiv(nin, nin, 3, 1, dilations[0], dilations[0], activ=activ)\r\n        self.conv4 = SeperableConv2DBNActiv(nin, nin, 3, 1, dilations[1], dilations[1], activ=activ)\r\n        self.conv5 = SeperableConv2DBNActiv(nin, nin, 3, 1, dilations[2], dilations[2], activ=activ)\r\n\r\n        # Depending on the architecture, include the extra convolutional layers.\r\n        if self.nn_architecture in self.six_layer:\r\n            self.conv6 = extra_conv\r\n            nin_x = 6\r\n        elif self.nn_architecture in self.seven_layer:\r\n            self.conv6 = extra_conv\r\n            self.conv7 = extra_conv\r\n            nin_x = 7\r\n        else:\r\n            nin_x = 5\r\n\r\n        # Bottleneck layer combines all the multi-scale features into the desired number of output channels.\r\n        self.bottleneck = nn.Sequential(Conv2DBNActiv(nin * nin_x, nout, 1, 1, 0, activ=activ), nn.Dropout2d(0.1))\r\n\r\n    def forward(self, input_tensor):\r\n        \"\"\"\r\n        Forward pass of the ASPP module.\r\n\r\n        Args:\r\n            input_tensor (Tensor): Input tensor.\r\n\r\n        Returns:\r\n            Tensor: Output tensor after applying ASPP.\r\n        \"\"\"\r\n        _, _, h, w = input_tensor.size()\r\n\r\n        # Apply the first convolutional sequence and upsample to the original resolution.\r\n        feat1 = F.interpolate(self.conv1(input_tensor), size=(h, w), mode=\"bilinear\", 
align_corners=True)\r\n\r\n        # Apply the remaining convolutions directly on the input.\r\n        feat2 = self.conv2(input_tensor)\r\n        feat3 = self.conv3(input_tensor)\r\n        feat4 = self.conv4(input_tensor)\r\n        feat5 = self.conv5(input_tensor)\r\n\r\n        # Concatenate features from all layers. Depending on the architecture, include the extra features.\r\n        if self.nn_architecture in self.six_layer:\r\n            feat6 = self.conv6(input_tensor)\r\n            out = torch.cat((feat1, feat2, feat3, feat4, feat5, feat6), dim=1)\r\n        elif self.nn_architecture in self.seven_layer:\r\n            feat6 = self.conv6(input_tensor)\r\n            feat7 = self.conv7(input_tensor)\r\n            out = torch.cat((feat1, feat2, feat3, feat4, feat5, feat6, feat7), dim=1)\r\n        else:\r\n            out = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1)\r\n\r\n        # Apply the bottleneck layer to combine and reduce the channel dimensions.\r\n        bottleneck_output = self.bottleneck(out)\r\n        return bottleneck_output\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/layers_new.py",
    "content": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nfrom audio_separator.separator.uvr_lib_v5 import spec_utils\n\n\nclass Conv2DBNActiv(nn.Module):\n    \"\"\"\n    Conv2DBNActiv Class:\n    This class implements a convolutional layer followed by batch normalization and an activation function.\n    It is a fundamental building block for constructing neural networks, especially useful in image and audio processing tasks.\n    The class encapsulates the pattern of applying a convolution, normalizing the output, and then applying a non-linear activation.\n    \"\"\"\n\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, dilation=1, activ=nn.ReLU):\n        super(Conv2DBNActiv, self).__init__()\n\n        # Sequential model combining Conv2D, BatchNorm, and activation function into a single module\n        self.conv = nn.Sequential(nn.Conv2d(nin, nout, kernel_size=ksize, stride=stride, padding=pad, dilation=dilation, bias=False), nn.BatchNorm2d(nout), activ())\n\n    def __call__(self, input_tensor):\n        # Forward pass through the sequential model\n        return self.conv(input_tensor)\n\n\nclass Encoder(nn.Module):\n    \"\"\"\n    Encoder Class:\n    This class defines an encoder module typically used in autoencoder architectures.\n    It consists of two convolutional layers, each followed by batch normalization and an activation function.\n    \"\"\"\n\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, activ=nn.LeakyReLU):\n        super(Encoder, self).__init__()\n\n        # First convolutional layer of the encoder\n        self.conv1 = Conv2DBNActiv(nin, nout, ksize, stride, pad, activ=activ)\n        # Second convolutional layer of the encoder\n        self.conv2 = Conv2DBNActiv(nout, nout, ksize, 1, pad, activ=activ)\n\n    def __call__(self, input_tensor):\n        # Applying the first and then the second convolutional layers\n        hidden = self.conv1(input_tensor)\n        hidden = 
self.conv2(hidden)\n\n        return hidden\n\n\nclass Decoder(nn.Module):\n    \"\"\"\n    Decoder Class:\n    This class defines a decoder module, which is the counterpart of the Encoder class in autoencoder architectures.\n    It applies a convolutional layer followed by batch normalization and an activation function, with an optional dropout layer for regularization.\n    \"\"\"\n\n    def __init__(self, nin, nout, ksize=3, stride=1, pad=1, activ=nn.ReLU, dropout=False):\n        super(Decoder, self).__init__()\n        # Convolutional layer with optional dropout for regularization\n        self.conv1 = Conv2DBNActiv(nin, nout, ksize, 1, pad, activ=activ)\n        # self.conv2 = Conv2DBNActiv(nout, nout, ksize, 1, pad, activ=activ)\n        self.dropout = nn.Dropout2d(0.1) if dropout else None\n\n    def __call__(self, input_tensor, skip=None):\n        # Forward pass through the convolutional layer and optional dropout\n        input_tensor = F.interpolate(input_tensor, scale_factor=2, mode=\"bilinear\", align_corners=True)\n\n        if skip is not None:\n            skip = spec_utils.crop_center(skip, input_tensor)\n            input_tensor = torch.cat([input_tensor, skip], dim=1)\n\n        hidden = self.conv1(input_tensor)\n        # hidden = self.conv2(hidden)\n\n        if self.dropout is not None:\n            hidden = self.dropout(hidden)\n\n        return hidden\n\n\nclass ASPPModule(nn.Module):\n    \"\"\"\n    ASPPModule Class:\n    This class implements the Atrous Spatial Pyramid Pooling (ASPP) module, which is useful for semantic image segmentation tasks.\n    It captures multi-scale contextual information by applying convolutions at multiple dilation rates.\n    \"\"\"\n\n    def __init__(self, nin, nout, dilations=(4, 8, 12), activ=nn.ReLU, dropout=False):\n        super(ASPPModule, self).__init__()\n\n        # Global context convolution captures the overall context\n        self.conv1 = nn.Sequential(nn.AdaptiveAvgPool2d((1, None)), 
Conv2DBNActiv(nin, nout, 1, 1, 0, activ=activ))\n        self.conv2 = Conv2DBNActiv(nin, nout, 1, 1, 0, activ=activ)\n        self.conv3 = Conv2DBNActiv(nin, nout, 3, 1, dilations[0], dilations[0], activ=activ)\n        self.conv4 = Conv2DBNActiv(nin, nout, 3, 1, dilations[1], dilations[1], activ=activ)\n        self.conv5 = Conv2DBNActiv(nin, nout, 3, 1, dilations[2], dilations[2], activ=activ)\n        self.bottleneck = Conv2DBNActiv(nout * 5, nout, 1, 1, 0, activ=activ)\n        self.dropout = nn.Dropout2d(0.1) if dropout else None\n\n    def forward(self, input_tensor):\n        _, _, h, w = input_tensor.size()\n\n        # Upsample global context to match input size and combine with local and multi-scale features\n        feat1 = F.interpolate(self.conv1(input_tensor), size=(h, w), mode=\"bilinear\", align_corners=True)\n        feat2 = self.conv2(input_tensor)\n        feat3 = self.conv3(input_tensor)\n        feat4 = self.conv4(input_tensor)\n        feat5 = self.conv5(input_tensor)\n        out = torch.cat((feat1, feat2, feat3, feat4, feat5), dim=1)\n        out = self.bottleneck(out)\n\n        if self.dropout is not None:\n            out = self.dropout(out)\n\n        return out\n\n\nclass LSTMModule(nn.Module):\n    \"\"\"\n    LSTMModule Class:\n    This class defines a module that combines convolutional feature extraction with a bidirectional LSTM for sequence modeling.\n    It is useful for tasks that require understanding temporal dynamics in data, such as speech and audio processing.\n    \"\"\"\n\n    def __init__(self, nin_conv, nin_lstm, nout_lstm):\n        super(LSTMModule, self).__init__()\n        # Convolutional layer for initial feature extraction\n        self.conv = Conv2DBNActiv(nin_conv, 1, 1, 1, 0)\n\n        # Bidirectional LSTM for capturing temporal dynamics\n        self.lstm = nn.LSTM(input_size=nin_lstm, hidden_size=nout_lstm // 2, bidirectional=True)\n\n        # Dense layer for output dimensionality matching\n        
self.dense = nn.Sequential(nn.Linear(nout_lstm, nin_lstm), nn.BatchNorm1d(nin_lstm), nn.ReLU())\n\n    def forward(self, input_tensor):\n        N, _, nbins, nframes = input_tensor.size()\n\n        # Extract features and prepare for LSTM\n        hidden = self.conv(input_tensor)[:, 0]  # N, nbins, nframes\n        hidden = hidden.permute(2, 0, 1)  # nframes, N, nbins\n        hidden, _ = self.lstm(hidden)\n\n        # Apply dense layer and reshape to match expected output format\n        hidden = self.dense(hidden.reshape(-1, hidden.size()[-1]))  # nframes * N, nbins\n        hidden = hidden.reshape(nframes, N, 1, nbins)\n        hidden = hidden.permute(1, 2, 3, 0)\n\n        return hidden\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/model_param_init.py",
    "content": "import json\r\n\r\ndefault_param = {}\r\ndefault_param[\"bins\"] = -1\r\ndefault_param[\"unstable_bins\"] = -1  # training only\r\ndefault_param[\"stable_bins\"] = -1  # training only\r\ndefault_param[\"sr\"] = 44100\r\ndefault_param[\"pre_filter_start\"] = -1\r\ndefault_param[\"pre_filter_stop\"] = -1\r\ndefault_param[\"band\"] = {}\r\n\r\nN_BINS = \"n_bins\"\r\n\r\n\r\ndef int_keys(d):\r\n    \"\"\"\r\n    Builds a dictionary from (key, value) pairs, converting string keys that represent integers into integer keys.\r\n\r\n    This function is particularly useful when dealing with JSON data that may represent\r\n    integer keys as strings due to the nature of JSON encoding. By converting these keys\r\n    back to integers, it ensures that the data can be used in a manner consistent with\r\n    its original representation, especially in contexts where the distinction between\r\n    string and integer keys is important.\r\n\r\n    Args:\r\n        d (list of tuples): (key, value) pairs, as passed by json's object_pairs_hook; keys are strings\r\n                            that may represent integers.\r\n\r\n    Returns:\r\n        dict: A dictionary with keys converted to integers where applicable.\r\n    \"\"\"\r\n    # Initialize an empty dictionary to hold the converted key-value pairs.\r\n    result_dict = {}\r\n    # Iterate through each key-value pair in the input list.\r\n    for key, value in d:\r\n        # Check if the key consists only of digits (i.e., represents a non-negative integer).\r\n        if key.isdigit():\r\n            # Convert the key from a string to an integer.\r\n            key = int(key)\r\n        result_dict[key] = value\r\n    return result_dict\r\n\r\n\r\nclass ModelParameters(object):\r\n    \"\"\"\r\n    A class to manage model parameters, including loading from a configuration file.\r\n\r\n    Attributes:\r\n        param (dict): Dictionary holding all parameters for the model.\r\n    \"\"\"\r\n\r\n    def __init__(self, config_path=\"\"):\r\n        \"\"\"\r\n        Initializes the ModelParameters object by loading parameters from a JSON configuration file.\r\n\r\n        Args:\r\n            config_path (str): Path to the JSON configuration file.\r\n        \"\"\"\r\n\r\n        # Load parameters from the given configuration file path.\r\n        with open(config_path, \"r\") as f:\r\n            self.param = json.loads(f.read(), object_pairs_hook=int_keys)\r\n\r\n        # Ensure certain parameters are set to False if not specified in the configuration.\r\n        for k in [\"mid_side\", \"mid_side_b\", \"mid_side_b2\", \"stereo_w\", \"stereo_n\", \"reverse\"]:\r\n            if k not in self.param:\r\n                self.param[k] = False\r\n\r\n        # If 'n_bins' is specified in the parameters, it's used as the value for 'bins'.\r\n        if N_BINS in self.param:\r\n            self.param[\"bins\"] = self.param[N_BINS]\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr16000_hl512.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 16000,\n\t\t\t\"hl\": 512,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 1024,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 16000,\n\t\"pre_filter_start\": 1023,\n\t\"pre_filter_stop\": 1024\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr32000_hl512.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 32000,\n\t\t\t\"hl\": 512,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 1024,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"kaiser_fast\"\n\t\t}\n\t},\n\t\"sr\": 32000,\n\t\"pre_filter_start\": 1000,\n\t\"pre_filter_stop\": 1021\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr33075_hl384.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 33075,\n\t\t\t\"hl\": 384,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 1024,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 33075,\n\t\"pre_filter_start\": 1000,\n\t\"pre_filter_stop\": 1021\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr44100_hl1024.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 44100,\n\t\t\t\"hl\": 1024,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 1024,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 44100,\n\t\"pre_filter_start\": 1023,\n\t\"pre_filter_stop\": 1024\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr44100_hl256.json",
    "content": "{\n\t\"bins\": 256,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 44100,\n\t\t\t\"hl\": 256,\n\t\t\t\"n_fft\": 512,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 256,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 44100,\n\t\"pre_filter_start\": 256,\n\t\"pre_filter_stop\": 256\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr44100_hl512.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 44100,\n\t\t\t\"hl\": 512,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 1024,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 44100,\n\t\"pre_filter_start\": 1023,\n\t\"pre_filter_stop\": 1024\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr44100_hl512_cut.json",
    "content": "{\n\t\"bins\": 1024,\n\t\"unstable_bins\": 0,\n\t\"reduction_bins\": 0,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 44100,\n\t\t\t\"hl\": 512,\n\t\t\t\"n_fft\": 2048,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 700,\n\t\t\t\"hpf_start\": -1,\n\t\t\t\"res_type\": \"sinc_best\"\n\t\t}\n\t},\n\t\"sr\": 44100,\n\t\"pre_filter_start\": 1023,\n\t\"pre_filter_stop\": 700\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/1band_sr44100_hl512_nf1024.json",
    "content": "{\r\n\t\"bins\": 512,\r\n\t\"unstable_bins\": 0,\r\n\t\"reduction_bins\": 0,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 512,\r\n\t\t\t\"hpf_start\": -1,\r\n\t\t\t\"res_type\": \"sinc_best\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 511,\r\n\t\"pre_filter_stop\": 512\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/2band_32000.json",
    "content": "{\n\t\"bins\": 768,\n\t\"unstable_bins\": 7,\n\t\"reduction_bins\": 705,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 6000,\n\t\t\t\"hl\": 66,\n\t\t\t\"n_fft\": 512,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 240,\n\t\t\t\"lpf_start\": 60,\n\t\t\t\"lpf_stop\": 118,\n\t\t\t\"res_type\": \"sinc_fastest\"\n\t\t},\n\t\t\"2\": {\n\t\t\t\"sr\": 32000,\n\t\t\t\"hl\": 352,\n\t\t\t\"n_fft\": 1024,\n\t\t\t\"crop_start\": 22,\n\t\t\t\"crop_stop\": 505,\n\t\t\t\"hpf_start\": 44,\n\t\t\t\"hpf_stop\": 23,\n\t\t\t\"res_type\": \"sinc_medium\"\n\t\t}\n\t},\n\t\"sr\": 32000,\n\t\"pre_filter_start\": 710,\n\t\"pre_filter_stop\": 731\n}\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/2band_44100_lofi.json",
    "content": "{\n\t\"bins\": 512,\n\t\"unstable_bins\": 7,\n\t\"reduction_bins\": 510,\n\t\"band\": {\n\t\t\"1\": {\n\t\t\t\"sr\": 11025,\n\t\t\t\"hl\": 160,\n\t\t\t\"n_fft\": 768,\n\t\t\t\"crop_start\": 0,\n\t\t\t\"crop_stop\": 192,\n\t\t\t\"lpf_start\": 41,\n\t\t\t\"lpf_stop\": 139,\n\t\t\t\"res_type\": \"sinc_fastest\"\n\t\t},\n\t\t\"2\": {\n\t\t\t\"sr\": 44100,\n\t\t\t\"hl\": 640,\n\t\t\t\"n_fft\": 1024,\n\t\t\t\"crop_start\": 10,\n\t\t\t\"crop_stop\": 320,\n\t\t\t\"hpf_start\": 47,\n\t\t\t\"hpf_stop\": 15,\n\t\t\t\"res_type\": \"sinc_medium\"\n\t\t}\n\t},\n\t\"sr\": 44100,\n\t\"pre_filter_start\": 510,\n\t\"pre_filter_stop\": 512\n}\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/2band_48000.json",
    "content": "{\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 705,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 6000,\r\n\t\t\t\"hl\": 66,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 240,\r\n\t\t\t\"lpf_start\": 60,\r\n\t\t\t\"lpf_stop\": 240,\r\n\t\t\t\"res_type\": \"sinc_fastest\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 48000,\r\n\t\t\t\"hl\": 528,\r\n\t\t\t\"n_fft\": 1536,\r\n\t\t\t\"crop_start\": 22,\r\n\t\t\t\"crop_stop\": 505,\r\n\t\t\t\"hpf_start\": 82,\r\n\t\t\t\"hpf_stop\": 22,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 48000,\r\n\t\"pre_filter_start\": 710,\r\n\t\"pre_filter_stop\": 731\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/3band_44100.json",
    "content": "{\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 5,\r\n\t\"reduction_bins\": 733,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 278,\r\n\t\t\t\"lpf_start\": 28,\r\n\t\t\t\"lpf_stop\": 140,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 14,\r\n\t\t\t\"crop_stop\": 322,\r\n\t\t\t\"hpf_start\": 70,\r\n\t\t\t\"hpf_stop\": 14,\r\n\t\t\t\"lpf_start\": 283,\r\n\t\t\t\"lpf_stop\": 314,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 131,\r\n\t\t\t\"crop_stop\": 313,\r\n\t\t\t\"hpf_start\": 154,\r\n\t\t\t\"hpf_stop\": 141,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 757,\r\n\t\"pre_filter_stop\": 768\r\n}\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/3band_44100_mid.json",
    "content": "{\r\n\t\"mid_side\": true,\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 5,\r\n\t\"reduction_bins\": 733,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 278,\r\n\t\t\t\"lpf_start\": 28,\r\n\t\t\t\"lpf_stop\": 140,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 14,\r\n\t\t\t\"crop_stop\": 322,\r\n\t\t\t\"hpf_start\": 70,\r\n\t\t\t\"hpf_stop\": 14,\r\n\t\t\t\"lpf_start\": 283,\r\n\t\t\t\"lpf_stop\": 314,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 131,\r\n\t\t\t\"crop_stop\": 313,\r\n\t\t\t\"hpf_start\": 154,\r\n\t\t\t\"hpf_stop\": 141,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 757,\r\n\t\"pre_filter_stop\": 768\r\n}\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/3band_44100_msb2.json",
    "content": "{\r\n\t\"mid_side_b2\": true,\r\n\t\"bins\": 640,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 565,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 108,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 187,\r\n\t\t\t\"lpf_start\": 92,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 216,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 212,\r\n\t\t\t\"hpf_start\": 68,\r\n\t\t\t\"hpf_stop\": 34,\r\n\t\t\t\"lpf_start\": 174,\r\n\t\t\t\"lpf_stop\": 209,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 432,\r\n\t\t\t\"n_fft\": 640,\r\n\t\t\t\"crop_start\": 66,\r\n\t\t\t\"crop_stop\": 307,\r\n\t\t\t\"hpf_start\": 86,\r\n\t\t\t\"hpf_stop\": 72,\r\n\t\t\t\"res_type\": \"kaiser_fast\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 639,\r\n\t\"pre_filter_stop\": 640\r\n}\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100.json",
    "content": "{\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100_mid.json",
    "content": "{\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"mid_side\": true,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100_msb.json",
    "content": "{\r\n\t\"mid_side_b\": true,\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100_msb2.json",
    "content": "{\r\n\t\"mid_side_b\": true,\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100_reverse.json",
    "content": "{\r\n\t\"reverse\": true,\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_44100_sw.json",
    "content": "{\r\n\t\"stereo_w\": true,\r\n\t\"bins\": 768,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 668,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 1024,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"lpf_start\": 37,\r\n\t\t\t\"lpf_stop\": 73,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 128,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 185,\t\t\t\r\n\t\t\t\"hpf_start\": 36,\r\n\t\t\t\"hpf_stop\": 18,\r\n\t\t\t\"lpf_start\": 93,\r\n\t\t\t\"lpf_stop\": 185,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 256,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 46,\r\n\t\t\t\"crop_stop\": 186,\r\n\t\t\t\"hpf_start\": 93,\r\n\t\t\t\"hpf_stop\": 46,\r\n\t\t\t\"lpf_start\": 164,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 512,\r\n\t\t\t\"n_fft\": 768,\r\n\t\t\t\"crop_start\": 121,\r\n\t\t\t\"crop_stop\": 382,\r\n\t\t\t\"hpf_start\": 138,\r\n\t\t\t\"hpf_stop\": 123,\r\n\t\t\t\"res_type\": \"sinc_medium\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 740,\r\n\t\"pre_filter_stop\": 768\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_v2.json",
    "content": "{\r\n\t\"bins\": 672,\r\n\t\"unstable_bins\": 8,\r\n\t\"reduction_bins\": 637,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 640,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 85,\r\n\t\t\t\"lpf_start\": 25,\r\n\t\t\t\"lpf_stop\": 53,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 320,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 87,\r\n\t\t\t\"hpf_start\": 25,\r\n\t\t\t\"hpf_stop\": 12,\r\n\t\t\t\"lpf_start\": 31,\r\n\t\t\t\"lpf_stop\": 62,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 14700,\r\n\t\t\t\"hl\": 160,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 17,\r\n\t\t\t\"crop_stop\": 216,\r\n\t\t\t\"hpf_start\": 48,\r\n\t\t\t\"hpf_stop\": 24,\r\n\t\t\t\"lpf_start\": 139,\r\n\t\t\t\"lpf_stop\": 210,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 480,\r\n\t\t\t\"n_fft\": 960,\r\n\t\t\t\"crop_start\": 78,\r\n\t\t\t\"crop_stop\": 383,\r\n\t\t\t\"hpf_start\": 130,\r\n\t\t\t\"hpf_stop\": 86,\r\n\t\t\t\"res_type\": \"kaiser_fast\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 668,\r\n\t\"pre_filter_stop\": 672\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_v2_sn.json",
    "content": "{\r\n\t\"bins\": 672,\r\n\t\"unstable_bins\": 8,\r\n\t\"reduction_bins\": 637,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 640,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 85,\r\n\t\t\t\"lpf_start\": 25,\r\n\t\t\t\"lpf_stop\": 53,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 320,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 87,\r\n\t\t\t\"hpf_start\": 25,\r\n\t\t\t\"hpf_stop\": 12,\r\n\t\t\t\"lpf_start\": 31,\r\n\t\t\t\"lpf_stop\": 62,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 14700,\r\n\t\t\t\"hl\": 160,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 17,\r\n\t\t\t\"crop_stop\": 216,\r\n\t\t\t\"hpf_start\": 48,\r\n\t\t\t\"hpf_stop\": 24,\r\n\t\t\t\"lpf_start\": 139,\r\n\t\t\t\"lpf_stop\": 210,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 480,\r\n\t\t\t\"n_fft\": 960,\r\n\t\t\t\"crop_start\": 78,\r\n\t\t\t\"crop_stop\": 383,\r\n\t\t\t\"hpf_start\": 130,\r\n\t\t\t\"hpf_stop\": 86,\r\n\t\t\t\"convert_channels\": \"stereo_n\",\r\n\t\t\t\"res_type\": \"kaiser_fast\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 668,\r\n\t\"pre_filter_stop\": 672\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_v3.json",
    "content": "{\r\n\t\"bins\": 672,\r\n\t\"unstable_bins\": 8,\r\n\t\"reduction_bins\": 530,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 640,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 85,\r\n\t\t\t\"lpf_start\": 25,\r\n\t\t\t\"lpf_stop\": 53,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 320,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 87,\r\n\t\t\t\"hpf_start\": 25,\r\n\t\t\t\"hpf_stop\": 12,\r\n\t\t\t\"lpf_start\": 31,\r\n\t\t\t\"lpf_stop\": 62,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 14700,\r\n\t\t\t\"hl\": 160,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 17,\r\n\t\t\t\"crop_stop\": 216,\r\n\t\t\t\"hpf_start\": 48,\r\n\t\t\t\"hpf_stop\": 24,\r\n\t\t\t\"lpf_start\": 139,\r\n\t\t\t\"lpf_stop\": 210,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 480,\r\n\t\t\t\"n_fft\": 960,\r\n\t\t\t\"crop_start\": 78,\r\n\t\t\t\"crop_stop\": 383,\r\n\t\t\t\"hpf_start\": 130,\r\n\t\t\t\"hpf_stop\": 86,\r\n\t\t\t\"res_type\": \"kaiser_fast\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 668,\r\n\t\"pre_filter_stop\": 672\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_v3_sn.json",
    "content": "{\r\n\t\"n_bins\": 672,\r\n\t\"unstable_bins\": 8,\r\n\t\"stable_bins\": 530,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 640,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 85,\r\n\t\t\t\"lpf_start\": 25,\r\n\t\t\t\"lpf_stop\": 53,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 7350,\r\n\t\t\t\"hl\": 80,\r\n\t\t\t\"n_fft\": 320,\r\n\t\t\t\"crop_start\": 4,\r\n\t\t\t\"crop_stop\": 87,\r\n\t\t\t\"hpf_start\": 25,\r\n\t\t\t\"hpf_stop\": 12,\r\n\t\t\t\"lpf_start\": 31,\r\n\t\t\t\"lpf_stop\": 62,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 14700,\r\n\t\t\t\"hl\": 160,\r\n\t\t\t\"n_fft\": 512,\r\n\t\t\t\"crop_start\": 17,\r\n\t\t\t\"crop_stop\": 216,\r\n\t\t\t\"hpf_start\": 48,\r\n\t\t\t\"hpf_stop\": 24,\r\n\t\t\t\"lpf_start\": 139,\r\n\t\t\t\"lpf_stop\": 210,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"4\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 480,\r\n\t\t\t\"n_fft\": 960,\r\n\t\t\t\"crop_start\": 78,\r\n\t\t\t\"crop_stop\": 383,\r\n\t\t\t\"hpf_start\": 130,\r\n\t\t\t\"hpf_stop\": 86,\r\n\t\t\t\"convert_channels\": \"stereo_n\",\r\n\t\t\t\"res_type\": \"kaiser_fast\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 668,\r\n\t\"pre_filter_stop\": 672\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/4band_v4_ms_fullband.json",
    "content": "{\r\n    \"n_bins\": 896,\r\n    \"unstable_bins\": 9,\r\n    \"stable_bins\": 530,\r\n    \"band\": {\r\n        \"1\": {\r\n            \"sr\": 7350,\r\n            \"hl\": 96,\r\n            \"n_fft\": 768,\r\n            \"crop_start\": 0,\r\n            \"crop_stop\": 102,\r\n            \"lpf_start\": 30,\r\n            \"lpf_stop\": 62,\r\n            \"res_type\": \"polyphase\",\r\n            \"convert_channels\": \"mid_side\"\r\n        },\r\n        \"2\": {\r\n            \"sr\": 7350,\r\n            \"hl\": 96,\r\n            \"n_fft\": 384,\r\n            \"crop_start\": 5,\r\n            \"crop_stop\": 104,\r\n            \"hpf_start\": 30,\r\n            \"hpf_stop\": 14,\r\n            \"lpf_start\": 37,\r\n            \"lpf_stop\": 73,\r\n            \"res_type\": \"polyphase\",\r\n\t\t\t\"convert_channels\": \"mid_side\"\r\n        },\r\n        \"3\": {\r\n            \"sr\": 14700,\r\n            \"hl\": 192,\r\n            \"n_fft\": 640,\r\n            \"crop_start\": 20,\r\n            \"crop_stop\": 259,\r\n            \"hpf_start\": 58,\r\n            \"hpf_stop\": 29,\r\n            \"lpf_start\": 191,\r\n            \"lpf_stop\": 262,\r\n            \"res_type\": \"polyphase\",\r\n\t\t\t\"convert_channels\": \"mid_side\"\r\n        },\r\n        \"4\": {\r\n            \"sr\": 44100,\r\n            \"hl\": 576,\r\n            \"n_fft\": 1152,\r\n            \"crop_start\": 119,\r\n            \"crop_stop\": 575,\r\n            \"hpf_start\": 157,\r\n            \"hpf_stop\": 110,\r\n            \"res_type\": \"kaiser_fast\",\r\n\t\t\t\"convert_channels\": \"mid_side\"\r\n        }\r\n    },\r\n    \"sr\": 44100,\r\n    \"pre_filter_start\": -1,\r\n    \"pre_filter_stop\": -1\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/modelparams/ensemble.json",
    "content": "{\r\n\t\"mid_side_b2\": true,\r\n\t\"bins\": 1280,\r\n\t\"unstable_bins\": 7,\r\n\t\"reduction_bins\": 565,\r\n\t\"band\": {\r\n\t\t\"1\": {\r\n\t\t\t\"sr\": 11025,\r\n\t\t\t\"hl\": 108,\r\n\t\t\t\"n_fft\": 2048,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 374,\r\n\t\t\t\"lpf_start\": 92,\r\n\t\t\t\"lpf_stop\": 186,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\r\n\t\t\"2\": {\r\n\t\t\t\"sr\": 22050,\r\n\t\t\t\"hl\": 216,\r\n\t\t\t\"n_fft\": 1536,\r\n\t\t\t\"crop_start\": 0,\r\n\t\t\t\"crop_stop\": 424,\r\n\t\t\t\"hpf_start\": 68,\r\n\t\t\t\"hpf_stop\": 34,\r\n\t\t\t\"lpf_start\": 348,\r\n\t\t\t\"lpf_stop\": 418,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t},\t\r\n\t\t\"3\": {\r\n\t\t\t\"sr\": 44100,\r\n\t\t\t\"hl\": 432,\r\n\t\t\t\"n_fft\": 1280,\r\n\t\t\t\"crop_start\": 132,\r\n\t\t\t\"crop_stop\": 614,\r\n\t\t\t\"hpf_start\": 172,\r\n\t\t\t\"hpf_stop\": 144,\r\n\t\t\t\"res_type\": \"polyphase\"\r\n\t\t}\r\n\t},\r\n\t\"sr\": 44100,\r\n\t\"pre_filter_start\": 1280,\r\n\t\"pre_filter_stop\": 1280\r\n}"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/nets.py",
    "content": "import torch\r\nfrom torch import nn\r\nimport torch.nn.functional as F\r\n\r\nfrom . import layers\r\n\r\n\r\nclass BaseASPPNet(nn.Module):\r\n    \"\"\"\r\n    BaseASPPNet Class:\r\n    This class defines the base architecture for an Atrous Spatial Pyramid Pooling (ASPP) network.\r\n    It is designed to extract features from input data at multiple scales by using dilated convolutions.\r\n    This is particularly useful for tasks that benefit from understanding context at different resolutions,\r\n    such as semantic segmentation. The network consists of a series of encoder layers for downsampling and feature extraction,\r\n    followed by an ASPP module for multi-scale feature extraction, and finally a series of decoder layers for upsampling.\r\n    \"\"\"\r\n\r\n    def __init__(self, nn_architecture, nin, ch, dilations=(4, 8, 16)):\r\n        super(BaseASPPNet, self).__init__()\r\n        self.nn_architecture = nn_architecture\r\n\r\n        # Encoder layers progressively increase the number of channels while reducing spatial dimensions.\r\n        self.enc1 = layers.Encoder(nin, ch, 3, 2, 1)\r\n        self.enc2 = layers.Encoder(ch, ch * 2, 3, 2, 1)\r\n        self.enc3 = layers.Encoder(ch * 2, ch * 4, 3, 2, 1)\r\n        self.enc4 = layers.Encoder(ch * 4, ch * 8, 3, 2, 1)\r\n\r\n        # Depending on the network architecture, an additional encoder layer and a specific ASPP module are initialized.\r\n        if self.nn_architecture == 129605:\r\n            self.enc5 = layers.Encoder(ch * 8, ch * 16, 3, 2, 1)\r\n            self.aspp = layers.ASPPModule(nn_architecture, ch * 16, ch * 32, dilations)\r\n            self.dec5 = layers.Decoder(ch * (16 + 32), ch * 16, 3, 1, 1)\r\n        else:\r\n            self.aspp = layers.ASPPModule(nn_architecture, ch * 8, ch * 16, dilations)\r\n\r\n        # Decoder layers progressively decrease the number of channels while increasing spatial dimensions.\r\n        self.dec4 = layers.Decoder(ch * (8 + 
16), ch * 8, 3, 1, 1)\r\n        self.dec3 = layers.Decoder(ch * (4 + 8), ch * 4, 3, 1, 1)\r\n        self.dec2 = layers.Decoder(ch * (2 + 4), ch * 2, 3, 1, 1)\r\n        self.dec1 = layers.Decoder(ch * (1 + 2), ch, 3, 1, 1)\r\n\r\n    def __call__(self, input_tensor):\r\n        # The input tensor is passed through a series of encoder layers.\r\n        hidden_state, encoder_output1 = self.enc1(input_tensor)\r\n        hidden_state, encoder_output2 = self.enc2(hidden_state)\r\n        hidden_state, encoder_output3 = self.enc3(hidden_state)\r\n        hidden_state, encoder_output4 = self.enc4(hidden_state)\r\n\r\n        # Depending on the network architecture, the hidden state is processed by an additional encoder layer and the ASPP module.\r\n        if self.nn_architecture == 129605:\r\n            hidden_state, encoder_output5 = self.enc5(hidden_state)\r\n            hidden_state = self.aspp(hidden_state)\r\n            # The decoder layers use skip connections from the encoder layers for better feature integration.\r\n            hidden_state = self.dec5(hidden_state, encoder_output5)\r\n        else:\r\n            hidden_state = self.aspp(hidden_state)\r\n\r\n        # The hidden state is further processed by the decoder layers, using skip connections for feature integration.\r\n        hidden_state = self.dec4(hidden_state, encoder_output4)\r\n        hidden_state = self.dec3(hidden_state, encoder_output3)\r\n        hidden_state = self.dec2(hidden_state, encoder_output2)\r\n        hidden_state = self.dec1(hidden_state, encoder_output1)\r\n\r\n        return hidden_state\r\n\r\n\r\ndef determine_model_capacity(n_fft_bins, nn_architecture):\r\n    \"\"\"\r\n    The determine_model_capacity function is designed to select the appropriate model configuration\r\n    based on the frequency bins and network architecture. 
It maps specific architectures to predefined\r\n    model capacities, which dictate the structure and parameters of the CascadedASPPNet model.\r\n    \"\"\"\r\n\r\n    # Predefined model architectures categorized by their precision level.\r\n    sp_model_arch = [31191, 33966, 129605]\r\n    hp_model_arch = [123821, 123812]\r\n    hp2_model_arch = [537238, 537227]\r\n\r\n    # Mapping network architectures to their corresponding model capacity data.\r\n    if nn_architecture in sp_model_arch:\r\n        model_capacity_data = [(2, 16), (2, 16), (18, 8, 1, 1, 0), (8, 16), (34, 16, 1, 1, 0), (16, 32), (32, 2, 1), (16, 2, 1), (16, 2, 1)]\r\n\r\n    if nn_architecture in hp_model_arch:\r\n        model_capacity_data = [(2, 32), (2, 32), (34, 16, 1, 1, 0), (16, 32), (66, 32, 1, 1, 0), (32, 64), (64, 2, 1), (32, 2, 1), (32, 2, 1)]\r\n\r\n    if nn_architecture in hp2_model_arch:\r\n        model_capacity_data = [(2, 64), (2, 64), (66, 32, 1, 1, 0), (32, 64), (130, 64, 1, 1, 0), (64, 128), (128, 2, 1), (64, 2, 1), (64, 2, 1)]\r\n\r\n    # Initializing the CascadedASPPNet model with the selected model capacity data.\r\n    cascaded = CascadedASPPNet\r\n    model = cascaded(n_fft_bins, model_capacity_data, nn_architecture)\r\n\r\n    return model\r\n\r\n\r\nclass CascadedASPPNet(nn.Module):\r\n    \"\"\"\r\n    CascadedASPPNet Class:\r\n    This class implements a cascaded version of the ASPP network, designed for processing audio signals\r\n    for tasks such as vocal removal. It consists of multiple stages, each with its own ASPP network,\r\n    to process different frequency bands of the input signal. 
This allows the model to effectively\r\n    handle the full spectrum of audio frequencies by focusing on different frequency bands separately.\r\n    \"\"\"\r\n\r\n    def __init__(self, n_fft, model_capacity_data, nn_architecture):\r\n        super(CascadedASPPNet, self).__init__()\r\n        # The first stage processes the low and high frequency bands separately.\r\n        self.stg1_low_band_net = BaseASPPNet(nn_architecture, *model_capacity_data[0])\r\n        self.stg1_high_band_net = BaseASPPNet(nn_architecture, *model_capacity_data[1])\r\n\r\n        # Bridge layers connect different stages of the network.\r\n        self.stg2_bridge = layers.Conv2DBNActiv(*model_capacity_data[2])\r\n        self.stg2_full_band_net = BaseASPPNet(nn_architecture, *model_capacity_data[3])\r\n\r\n        self.stg3_bridge = layers.Conv2DBNActiv(*model_capacity_data[4])\r\n        self.stg3_full_band_net = BaseASPPNet(nn_architecture, *model_capacity_data[5])\r\n\r\n        # Output layers for the final mask prediction and auxiliary outputs.\r\n        self.out = nn.Conv2d(*model_capacity_data[6], bias=False)\r\n        self.aux1_out = nn.Conv2d(*model_capacity_data[7], bias=False)\r\n        self.aux2_out = nn.Conv2d(*model_capacity_data[8], bias=False)\r\n\r\n        # Parameters for handling the frequency bins of the input signal.\r\n        self.max_bin = n_fft // 2\r\n        self.output_bin = n_fft // 2 + 1\r\n\r\n        self.offset = 128\r\n\r\n    def forward(self, input_tensor):\r\n        # The forward pass processes the input tensor through each stage of the network,\r\n        # combining the outputs of different frequency bands and stages to produce the final mask.\r\n        mix = input_tensor.detach()\r\n        input_tensor = input_tensor.clone()\r\n\r\n        # Preparing the input tensor by selecting the maximum frequency bin.\r\n        input_tensor = input_tensor[:, :, : self.max_bin]\r\n\r\n        # Processing the low and high frequency bands separately in the first stage.\r\n        bandwidth = input_tensor.size()[2] // 2\r\n        aux1 = torch.cat([self.stg1_low_band_net(input_tensor[:, :, :bandwidth]), self.stg1_high_band_net(input_tensor[:, :, bandwidth:])], dim=2)\r\n\r\n        # Combining the outputs of the first stage and passing through the second stage.\r\n        hidden_state = torch.cat([input_tensor, aux1], dim=1)\r\n        aux2 = self.stg2_full_band_net(self.stg2_bridge(hidden_state))\r\n\r\n        # Further processing the combined outputs through the third stage.\r\n        hidden_state = torch.cat([input_tensor, aux1, aux2], dim=1)\r\n        hidden_state = self.stg3_full_band_net(self.stg3_bridge(hidden_state))\r\n\r\n        # Applying the final output layer to produce the mask.\r\n        mask = torch.sigmoid(self.out(hidden_state))\r\n\r\n        # Padding the mask to match the output frequency bin size.\r\n        mask = F.pad(input=mask, pad=(0, 0, 0, self.output_bin - mask.size()[2]), mode=\"replicate\")\r\n\r\n        # During training, auxiliary outputs are also produced and padded accordingly.\r\n        if self.training:\r\n            aux1 = torch.sigmoid(self.aux1_out(aux1))\r\n            aux1 = F.pad(input=aux1, pad=(0, 0, 0, self.output_bin - aux1.size()[2]), mode=\"replicate\")\r\n            aux2 = torch.sigmoid(self.aux2_out(aux2))\r\n            aux2 = F.pad(input=aux2, pad=(0, 0, 0, self.output_bin - aux2.size()[2]), mode=\"replicate\")\r\n            return mask * mix, aux1 * mix, aux2 * mix\r\n        else:\r\n            return mask  # * mix\r\n\r\n    def predict_mask(self, input_tensor):\r\n        # This method predicts the mask for the input tensor by calling the forward method\r\n        # and applying any necessary padding adjustments.\r\n        mask = self.forward(input_tensor)\r\n\r\n        # Adjusting the mask by removing padding offsets if present.\r\n        if self.offset > 0:\r\n            mask = mask[:, :, :, self.offset : -self.offset]\r\n\r\n        return mask\r\n"
  },
  {
    "path": "audio_separator/separator/uvr_lib_v5/vr_network/nets_new.py",
    "content": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nfrom . import layers_new as layers\n\n\nclass BaseNet(nn.Module):\n    \"\"\"\n    BaseNet Class:\n    This class defines the base network architecture for vocal removal. It includes a series of encoders for feature extraction,\n    an ASPP module for capturing multi-scale context, and a series of decoders for reconstructing the output. Additionally,\n    it incorporates an LSTM module for capturing temporal dependencies.\n    \"\"\"\n\n    def __init__(self, nin, nout, nin_lstm, nout_lstm, dilations=((4, 2), (8, 4), (12, 6))):\n        super(BaseNet, self).__init__()\n        # Initialize the encoder layers with increasing output channels for hierarchical feature extraction.\n        self.enc1 = layers.Conv2DBNActiv(nin, nout, 3, 1, 1)\n        self.enc2 = layers.Encoder(nout, nout * 2, 3, 2, 1)\n        self.enc3 = layers.Encoder(nout * 2, nout * 4, 3, 2, 1)\n        self.enc4 = layers.Encoder(nout * 4, nout * 6, 3, 2, 1)\n        self.enc5 = layers.Encoder(nout * 6, nout * 8, 3, 2, 1)\n\n        # ASPP module for capturing multi-scale features with different dilation rates.\n        self.aspp = layers.ASPPModule(nout * 8, nout * 8, dilations, dropout=True)\n\n        # Decoder layers for upscaling and merging features from different levels of the encoder and ASPP module.\n        self.dec4 = layers.Decoder(nout * (6 + 8), nout * 6, 3, 1, 1)\n        self.dec3 = layers.Decoder(nout * (4 + 6), nout * 4, 3, 1, 1)\n        self.dec2 = layers.Decoder(nout * (2 + 4), nout * 2, 3, 1, 1)\n\n        # LSTM module for capturing temporal dependencies in the sequence of features.\n        self.lstm_dec2 = layers.LSTMModule(nout * 2, nin_lstm, nout_lstm)\n        self.dec1 = layers.Decoder(nout * (1 + 2) + 1, nout * 1, 3, 1, 1)\n\n    def __call__(self, input_tensor):\n        # Sequentially pass the input through the encoder layers.\n        encoded1 = self.enc1(input_tensor)\n        
encoded2 = self.enc2(encoded1)\n        encoded3 = self.enc3(encoded2)\n        encoded4 = self.enc4(encoded3)\n        encoded5 = self.enc5(encoded4)\n\n        # Pass the deepest encoder output through the ASPP module.\n        bottleneck = self.aspp(encoded5)\n\n        # Sequentially upscale and merge the features using the decoder layers.\n        bottleneck = self.dec4(bottleneck, encoded4)\n        bottleneck = self.dec3(bottleneck, encoded3)\n        bottleneck = self.dec2(bottleneck, encoded2)\n        # Concatenate the LSTM module output for temporal feature enhancement.\n        bottleneck = torch.cat([bottleneck, self.lstm_dec2(bottleneck)], dim=1)\n        bottleneck = self.dec1(bottleneck, encoded1)\n\n        return bottleneck\n\n\nclass CascadedNet(nn.Module):\n    \"\"\"\n    CascadedNet Class:\n    This class defines a cascaded network architecture that processes input in multiple stages, each stage focusing on different frequency bands.\n    It utilizes the BaseNet for processing, and combines outputs from different stages to produce the final mask for vocal removal.\n    \"\"\"\n\n    def __init__(self, n_fft, nn_arch_size=51000, nout=32, nout_lstm=128):\n        super(CascadedNet, self).__init__()\n        # Calculate frequency bins based on FFT size.\n        self.max_bin = n_fft // 2\n        self.output_bin = n_fft // 2 + 1\n        self.nin_lstm = self.max_bin // 2\n        self.offset = 64\n        # Adjust output channels based on the architecture size.\n        nout = 64 if nn_arch_size == 218409 else nout\n\n        # print(nout, nout_lstm, n_fft)\n\n        # Initialize the network stages, each focusing on different frequency bands and progressively refining the output.\n        self.stg1_low_band_net = nn.Sequential(BaseNet(2, nout // 2, self.nin_lstm // 2, nout_lstm), layers.Conv2DBNActiv(nout // 2, nout // 4, 1, 1, 0))\n        self.stg1_high_band_net = BaseNet(2, nout // 4, self.nin_lstm // 2, nout_lstm // 2)\n\n        
self.stg2_low_band_net = nn.Sequential(BaseNet(nout // 4 + 2, nout, self.nin_lstm // 2, nout_lstm), layers.Conv2DBNActiv(nout, nout // 2, 1, 1, 0))\n        self.stg2_high_band_net = BaseNet(nout // 4 + 2, nout // 2, self.nin_lstm // 2, nout_lstm // 2)\n\n        self.stg3_full_band_net = BaseNet(3 * nout // 4 + 2, nout, self.nin_lstm, nout_lstm)\n\n        # Output layer for generating the final mask.\n        self.out = nn.Conv2d(nout, 2, 1, bias=False)\n        # Auxiliary output layer for intermediate supervision during training.\n        self.aux_out = nn.Conv2d(3 * nout // 4, 2, 1, bias=False)\n\n    def forward(self, input_tensor):\n        # Preprocess input tensor to match the maximum frequency bin.\n        input_tensor = input_tensor[:, :, : self.max_bin]\n\n        # Split the input into low and high frequency bands.\n        bandw = input_tensor.size()[2] // 2\n        l1_in = input_tensor[:, :, :bandw]\n        h1_in = input_tensor[:, :, bandw:]\n\n        # Process each band through the first stage networks.\n        l1 = self.stg1_low_band_net(l1_in)\n        h1 = self.stg1_high_band_net(h1_in)\n\n        # Combine the outputs for auxiliary supervision.\n        aux1 = torch.cat([l1, h1], dim=2)\n\n        # Prepare inputs for the second stage by concatenating the original and processed bands.\n        l2_in = torch.cat([l1_in, l1], dim=1)\n        h2_in = torch.cat([h1_in, h1], dim=1)\n\n        # Process through the second stage networks.\n        l2 = self.stg2_low_band_net(l2_in)\n        h2 = self.stg2_high_band_net(h2_in)\n\n        # Combine the outputs for auxiliary supervision.\n        aux2 = torch.cat([l2, h2], dim=2)\n\n        # Prepare input for the third stage by concatenating all previous outputs with the original input.\n        f3_in = torch.cat([input_tensor, aux1, aux2], dim=1)\n\n        # Process through the third stage network.\n        f3 = self.stg3_full_band_net(f3_in)\n\n        # Apply the output layer to generate the 
final mask and apply sigmoid for normalization.\n        mask = torch.sigmoid(self.out(f3))\n\n        # Pad the mask to match the output frequency bin size.\n        mask = F.pad(input=mask, pad=(0, 0, 0, self.output_bin - mask.size()[2]), mode=\"replicate\")\n\n        # During training, generate and pad the auxiliary output for additional supervision.\n        if self.training:\n            aux = torch.cat([aux1, aux2], dim=1)\n            aux = torch.sigmoid(self.aux_out(aux))\n            aux = F.pad(input=aux, pad=(0, 0, 0, self.output_bin - aux.size()[2]), mode=\"replicate\")\n            return mask, aux\n        else:\n            return mask\n\n    # Method for predicting the mask given an input tensor.\n    def predict_mask(self, input_tensor):\n        mask = self.forward(input_tensor)\n\n        # If an offset is specified, crop the mask to remove edge artifacts.\n        if self.offset > 0:\n            mask = mask[:, :, :, self.offset : -self.offset]\n            assert mask.size()[3] > 0\n\n        return mask\n\n    # Method for applying the predicted mask to the input tensor to obtain the predicted magnitude.\n    def predict(self, input_tensor):\n        mask = self.forward(input_tensor)\n        pred_mag = input_tensor * mask\n\n        # If an offset is specified, crop the predicted magnitude to remove edge artifacts.\n        if self.offset > 0:\n            pred_mag = pred_mag[:, :, :, self.offset : -self.offset]\n            assert pred_mag.size()[3] > 0\n\n        return pred_mag\n"
  },
  {
    "path": "audio_separator/utils/__init__.py",
    "content": ""
  },
  {
    "path": "audio_separator/utils/cli.py",
    "content": "#!/usr/bin/env python\nimport argparse\nimport logging\nimport json\nimport sys\nimport os\nfrom importlib import metadata\n\n\ndef main():\n    \"\"\"Main entry point for the CLI.\"\"\"\n    logger = logging.getLogger(__name__)\n    log_handler = logging.StreamHandler()\n    log_formatter = logging.Formatter(fmt=\"%(asctime)s.%(msecs)03d - %(levelname)s - %(module)s - %(message)s\", datefmt=\"%Y-%m-%d %H:%M:%S\")\n    log_handler.setFormatter(log_formatter)\n    logger.addHandler(log_handler)\n\n    parser = argparse.ArgumentParser(description=\"Separate audio file into different stems.\", formatter_class=lambda prog: argparse.RawTextHelpFormatter(prog, max_help_position=60))\n\n    parser.add_argument(\"audio_files\", nargs=\"*\", help=\"The audio file paths or directory to separate, in any common format.\", default=argparse.SUPPRESS)\n\n    package_version = metadata.distribution(\"audio-separator\").version\n\n    version_help = \"Show the program's version number and exit.\"\n    debug_help = \"Enable debug logging, equivalent to --log_level=debug.\"\n    env_info_help = \"Print environment information and exit.\"\n    list_models_help = \"List all supported models and exit. Use --list_filter to filter/sort the list and --list_limit to show only top N results.\"\n    log_level_help = \"Log level, e.g. 
info, debug, warning (default: %(default)s).\"\n\n    info_params = parser.add_argument_group(\"Info and Debugging\")\n    info_params.add_argument(\"-v\", \"--version\", action=\"version\", version=f\"%(prog)s {package_version}\", help=version_help)\n    info_params.add_argument(\"-d\", \"--debug\", action=\"store_true\", help=debug_help)\n    info_params.add_argument(\"-e\", \"--env_info\", action=\"store_true\", help=env_info_help)\n    info_params.add_argument(\"-l\", \"--list_models\", action=\"store_true\", help=list_models_help)\n    info_params.add_argument(\"--log_level\", default=\"info\", help=log_level_help)\n    info_params.add_argument(\"--list_filter\", help=\"Filter and sort the model list by 'name', 'filename', or any stem e.g. vocals, instrumental, drums\")\n    info_params.add_argument(\"--list_limit\", type=int, help=\"Limit the number of models shown\")\n    info_params.add_argument(\"--list_format\", choices=[\"pretty\", \"json\"], default=\"pretty\", help=\"Format for listing models: 'pretty' for formatted output, 'json' for raw JSON dump\")\n\n    model_filename_help = \"Model to use for separation (default: %(default)s). Example: -m model1.ckpt\"\n    extra_models_help = \"Additional models for ensembling. Requires -m for the primary model. Example: --extra_models model2.onnx model3.ckpt\"\n    output_format_help = \"Output format for separated files, any common format (default: %(default)s). Example: --output_format=MP3\"\n    output_bitrate_help = \"Output bitrate for separated files, any ffmpeg-compatible bitrate (default: %(default)s). Example: --output_bitrate=320k\"\n    output_dir_help = \"Directory to write output files (default: <current dir>). Example: --output_dir=/app/separated\"\n    model_file_dir_help = \"Model files directory (default: %(default)s or AUDIO_SEPARATOR_MODEL_DIR env var if set). 
Example: --model_file_dir=/app/models\"\n    download_model_only_help = \"Download a single model file only, without performing separation.\"\n\n    io_params = parser.add_argument_group(\"Separation I/O Params\")\n    io_params.add_argument(\"-m\", \"--model_filename\", default=\"model_bs_roformer_ep_317_sdr_12.9755.ckpt\", help=model_filename_help)\n    io_params.add_argument(\"--extra_models\", nargs=\"+\", default=None, help=extra_models_help)\n    io_params.add_argument(\"--output_format\", default=\"FLAC\", help=output_format_help)\n    io_params.add_argument(\"--output_bitrate\", default=None, help=output_bitrate_help)\n    io_params.add_argument(\"--output_dir\", default=None, help=output_dir_help)\n    io_params.add_argument(\"--model_file_dir\", default=\"/tmp/audio-separator-models/\", help=model_file_dir_help)\n    io_params.add_argument(\"--download_model_only\", action=\"store_true\", help=download_model_only_help)\n\n    invert_spect_help = \"Invert secondary stem using spectrogram (default: %(default)s). Example: --invert_spect\"\n    normalization_help = \"Max peak amplitude to normalize input and output audio to (default: %(default)s). Example: --normalization=0.7\"\n    amplification_help = \"Min peak amplitude to amplify input and output audio to (default: %(default)s). Example: --amplification=0.4\"\n    single_stem_help = \"Output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental\"\n    sample_rate_help = \"Modify the sample rate of the output audio (default: %(default)s). Example: --sample_rate=44100\"\n    use_soundfile_help = \"Use soundfile to write audio output (default: %(default)s). Example: --use_soundfile\"\n    use_autocast_help = \"Use PyTorch autocast for faster inference (default: %(default)s). Do not use for CPU inference. Example: --use_autocast\"\n    chunk_duration_help = \"Split audio into chunks of this duration in seconds (default: %(default)s = no chunking). 
Useful for processing very long audio files on systems with limited memory. Recommended: 600 (10 minutes) for files >1 hour. Chunks are concatenated without overlap/crossfade. Example: --chunk_duration=600\"\n    ensemble_algorithm_help = \"Algorithm to use for ensembling multiple models (default: avg_wave). Choices: avg_wave, median_wave, min_wave, max_wave, avg_fft, median_fft, min_fft, max_fft, uvr_max_spec, uvr_min_spec, ensemble_wav. Example: --ensemble_algorithm=uvr_max_spec\"\n    ensemble_weights_help = \"Weights for ensembling multiple models (default: equal). Number of weights must match number of models. Example: --ensemble_weights 1.0 0.5\"\n    ensemble_preset_help = \"Use a named ensemble preset (e.g. vocal_balanced, karaoke). Presets define models + algorithm. Use --list_presets to see all. Example: --ensemble_preset=vocal_balanced\"\n    list_presets_help = \"List all available ensemble presets and exit.\"\n    custom_output_names_help = 'Custom names for all output files in JSON format (default: %(default)s). 
Example: --custom_output_names=\\'{\"Vocals\": \"vocals_output\", \"Drums\": \"drums_output\"}\\''\n\n    common_params = parser.add_argument_group(\"Common Separation Parameters\")\n    common_params.add_argument(\"--invert_spect\", action=\"store_true\", help=invert_spect_help)\n    common_params.add_argument(\"--normalization\", type=float, default=0.9, help=normalization_help)\n    common_params.add_argument(\"--amplification\", type=float, default=0.0, help=amplification_help)\n    common_params.add_argument(\"--single_stem\", default=None, help=single_stem_help)\n    common_params.add_argument(\"--sample_rate\", type=int, default=44100, help=sample_rate_help)\n    common_params.add_argument(\"--use_soundfile\", action=\"store_true\", help=use_soundfile_help)\n    common_params.add_argument(\"--use_autocast\", action=\"store_true\", help=use_autocast_help)\n    common_params.add_argument(\"--chunk_duration\", type=float, default=None, help=chunk_duration_help)\n    common_params.add_argument(\n        \"--ensemble_algorithm\",\n        default=None,\n        choices=[\"avg_wave\", \"median_wave\", \"min_wave\", \"max_wave\", \"avg_fft\", \"median_fft\", \"min_fft\", \"max_fft\", \"uvr_max_spec\", \"uvr_min_spec\", \"ensemble_wav\"],\n        help=ensemble_algorithm_help,\n    )\n    common_params.add_argument(\"--ensemble_weights\", nargs=\"+\", type=float, default=None, help=ensemble_weights_help)\n    common_params.add_argument(\"--ensemble_preset\", default=None, help=ensemble_preset_help)\n    common_params.add_argument(\"--list_presets\", action=\"store_true\", help=list_presets_help)\n    common_params.add_argument(\"--custom_output_names\", type=json.loads, default=None, help=custom_output_names_help)\n\n    mdx_segment_size_help = \"Larger consumes more resources, but may give better results (default: %(default)s). Example: --mdx_segment_size=256\"\n    mdx_overlap_help = \"Amount of overlap between prediction windows, 0.001-0.999. 
Higher is better but slower (default: %(default)s). Example: --mdx_overlap=0.25\"\n    mdx_batch_size_help = \"Larger consumes more RAM but may process slightly faster (default: %(default)s). Example: --mdx_batch_size=4\"\n    mdx_hop_length_help = \"Usually called stride in neural networks, only change if you know what you're doing (default: %(default)s). Example: --mdx_hop_length=1024\"\n    mdx_enable_denoise_help = \"Enable denoising during separation (default: %(default)s). Example: --mdx_enable_denoise\"\n\n    mdx_params = parser.add_argument_group(\"MDX Architecture Parameters\")\n    mdx_params.add_argument(\"--mdx_segment_size\", type=int, default=256, help=mdx_segment_size_help)\n    mdx_params.add_argument(\"--mdx_overlap\", type=float, default=0.25, help=mdx_overlap_help)\n    mdx_params.add_argument(\"--mdx_batch_size\", type=int, default=1, help=mdx_batch_size_help)\n    mdx_params.add_argument(\"--mdx_hop_length\", type=int, default=1024, help=mdx_hop_length_help)\n    mdx_params.add_argument(\"--mdx_enable_denoise\", action=\"store_true\", help=mdx_enable_denoise_help)\n\n    vr_batch_size_help = \"Number of batches to process at a time. Higher = more RAM, slightly faster processing (default: %(default)s). Example: --vr_batch_size=16\"\n    vr_window_size_help = \"Balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: %(default)s). Example: --vr_window_size=320\"\n    vr_aggression_help = \"Intensity of primary stem extraction, -100 - 100. Typically, 5 for vocals & instrumentals (default: %(default)s). Example: --vr_aggression=2\"\n    vr_enable_tta_help = \"Enable Test-Time-Augmentation; slow but improves quality (default: %(default)s). Example: --vr_enable_tta\"\n    vr_high_end_process_help = \"Mirror the missing frequency range of the output (default: %(default)s). 
Example: --vr_high_end_process\"\n    vr_enable_post_process_help = \"Identify leftover artifacts within vocal output; may improve separation for some songs (default: %(default)s). Example: --vr_enable_post_process\"\n    vr_post_process_threshold_help = \"Threshold for post_process feature: 0.1-0.3 (default: %(default)s). Example: --vr_post_process_threshold=0.1\"\n\n    vr_params = parser.add_argument_group(\"VR Architecture Parameters\")\n    vr_params.add_argument(\"--vr_batch_size\", type=int, default=1, help=vr_batch_size_help)\n    vr_params.add_argument(\"--vr_window_size\", type=int, default=512, help=vr_window_size_help)\n    vr_params.add_argument(\"--vr_aggression\", type=int, default=5, help=vr_aggression_help)\n    vr_params.add_argument(\"--vr_enable_tta\", action=\"store_true\", help=vr_enable_tta_help)\n    vr_params.add_argument(\"--vr_high_end_process\", action=\"store_true\", help=vr_high_end_process_help)\n    vr_params.add_argument(\"--vr_enable_post_process\", action=\"store_true\", help=vr_enable_post_process_help)\n    vr_params.add_argument(\"--vr_post_process_threshold\", type=float, default=0.2, help=vr_post_process_threshold_help)\n\n    demucs_segment_size_help = \"Size of segments into which the audio is split, 1-100. Higher = slower but better quality (default: %(default)s). Example: --demucs_segment_size=256\"\n    demucs_shifts_help = \"Number of predictions with random shifts, higher = slower but better quality (default: %(default)s). Example: --demucs_shifts=4\"\n    demucs_overlap_help = \"Overlap between prediction windows, 0.001-0.999. Higher = slower but better quality (default: %(default)s). Example: --demucs_overlap=0.25\"\n    demucs_segments_enabled_help = \"Enable segment-wise processing (default: %(default)s). 
Example: --demucs_segments_enabled=False\"\n\n    demucs_params = parser.add_argument_group(\"Demucs Architecture Parameters\")\n    demucs_params.add_argument(\"--demucs_segment_size\", type=str, default=\"Default\", help=demucs_segment_size_help)\n    demucs_params.add_argument(\"--demucs_shifts\", type=int, default=2, help=demucs_shifts_help)\n    demucs_params.add_argument(\"--demucs_overlap\", type=float, default=0.25, help=demucs_overlap_help)\n    # type=bool would treat any non-empty string, including \"False\", as True, so parse the text explicitly\n    demucs_params.add_argument(\"--demucs_segments_enabled\", type=lambda value: str(value).lower() in (\"true\", \"1\", \"yes\"), default=True, help=demucs_segments_enabled_help)\n\n    mdxc_segment_size_help = \"Larger consumes more resources, but may give better results (default: %(default)s). Example: --mdxc_segment_size=256\"\n    mdxc_override_model_segment_size_help = \"Override model default segment size instead of using the model default value. Example: --mdxc_override_model_segment_size\"\n    mdxc_overlap_help = \"Amount of overlap between prediction windows, 2-50. Higher is better but slower (default: %(default)s). Example: --mdxc_overlap=8\"\n    mdxc_batch_size_help = \"Larger consumes more RAM but may process slightly faster (default: %(default)s). Example: --mdxc_batch_size=4\"\n    mdxc_pitch_shift_help = \"Shift audio pitch by a number of semitones while processing. May improve output for deep/high vocals. (default: %(default)s). 
Example: --mdxc_pitch_shift=2\"\n\n    mdxc_params = parser.add_argument_group(\"MDXC Architecture Parameters\")\n    mdxc_params.add_argument(\"--mdxc_segment_size\", type=int, default=256, help=mdxc_segment_size_help)\n    mdxc_params.add_argument(\"--mdxc_override_model_segment_size\", action=\"store_true\", help=mdxc_override_model_segment_size_help)\n    mdxc_params.add_argument(\"--mdxc_overlap\", type=int, default=8, help=mdxc_overlap_help)\n    mdxc_params.add_argument(\"--mdxc_batch_size\", type=int, default=1, help=mdxc_batch_size_help)\n    mdxc_params.add_argument(\"--mdxc_pitch_shift\", type=int, default=0, help=mdxc_pitch_shift_help)\n\n    args = parser.parse_args()\n\n    if args.debug:\n        log_level = logging.DEBUG\n    else:\n        log_level = getattr(logging, args.log_level.upper())\n    logger.setLevel(log_level)\n\n    from audio_separator.separator import Separator\n\n    if args.env_info:\n        separator = Separator()\n        sys.exit(0)\n\n    if args.list_models:\n        separator = Separator(info_only=True)\n\n        if args.list_format == \"json\":\n            model_list = separator.list_supported_model_files()\n            print(json.dumps(model_list, indent=2))\n        else:\n            models = separator.get_simplified_model_list(filter_sort_by=args.list_filter)\n\n            # Apply limit if specified\n            if args.list_limit and args.list_limit > 0:\n                models = dict(list(models.items())[: args.list_limit])\n\n            # Calculate maximum widths for each column\n            filename_width = max(len(\"Model Filename\"), max(len(filename) for filename in models.keys()))\n            arch_width = max(len(\"Arch\"), max(len(info[\"Type\"]) for info in models.values()))\n            stems_width = max(len(\"Output Stems (SDR)\"), max(len(\", \".join(info[\"Stems\"])) for info in models.values()))\n            name_width = max(len(\"Friendly Name\"), max(len(info[\"Name\"]) for info in 
models.values()))\n\n            # Calculate total width for separator line\n            total_width = filename_width + arch_width + stems_width + name_width + 15  # 15 accounts for spacing between columns\n\n            # Format the output with dynamic widths and extra spacing\n            print(\"-\" * total_width)\n            print(f\"{'Model Filename':<{filename_width}}  {'Arch':<{arch_width}}  {'Output Stems (SDR)':<{stems_width}}  {'Friendly Name'}\")\n            print(\"-\" * total_width)\n\n            for filename, info in models.items():\n                stems = \", \".join(info[\"Stems\"])\n                print(f\"{filename:<{filename_width}}  {info['Type']:<{arch_width}}  {stems:<{stems_width}}  {info['Name']}\")\n\n        sys.exit(0)\n\n    if args.list_presets:\n        separator = Separator(info_only=True)\n        presets = separator.list_ensemble_presets()\n\n        if not presets:\n            print(\"No ensemble presets available.\")\n            sys.exit(0)\n\n        # Calculate column widths\n        id_width = max(len(\"Preset ID\"), max(len(k) for k in presets.keys()))\n        desc_width = max(len(\"Description\"), max(len(p.get(\"description\", \"\")[:60]) for p in presets.values()))\n        models_width = len(\"Models\")\n        algo_width = max(len(\"Algorithm\"), max(len(p.get(\"algorithm\", \"\")) for p in presets.values()))\n        total_width = id_width + desc_width + models_width + algo_width + 12\n\n        print(\"-\" * total_width)\n        print(f\"{'Preset ID':<{id_width}}  {'Description':<{desc_width}}  {'Models':<{models_width}}  {'Algorithm'}\")\n        print(\"-\" * total_width)\n\n        for preset_id, preset in presets.items():\n            desc = preset.get(\"description\", \"\")[:60]\n            num_models = len(preset.get(\"models\", []))\n            algo = preset.get(\"algorithm\", \"\")\n            print(f\"{preset_id:<{id_width}}  {desc:<{desc_width}}  {num_models:<{models_width}}  {algo}\")\n\n        
sys.exit(0)\n\n    if args.download_model_only:\n        models_to_download = [args.model_filename] + (args.extra_models or [])\n        separator = Separator(log_formatter=log_formatter, log_level=log_level, model_file_dir=args.model_file_dir)\n        for model in models_to_download:\n            logger.info(f\"Separator version {package_version} downloading model {model} to directory {args.model_file_dir}\")\n            separator.download_model_and_data(model)\n\n        logger.info(f\"Model {', '.join(models_to_download)} downloaded successfully.\")\n        sys.exit(0)\n\n    audio_files = list(getattr(args, \"audio_files\", []))\n    if not audio_files:\n        parser.print_help()\n        sys.exit(1)\n\n    logger.info(f\"Separator version {package_version} beginning with input path(s): {', '.join(audio_files)}\")\n\n    separator = Separator(\n        log_formatter=log_formatter,\n        log_level=log_level,\n        model_file_dir=args.model_file_dir,\n        output_dir=args.output_dir,\n        output_format=args.output_format,\n        output_bitrate=args.output_bitrate,\n        normalization_threshold=args.normalization,\n        amplification_threshold=args.amplification,\n        output_single_stem=args.single_stem,\n        invert_using_spec=args.invert_spect,\n        sample_rate=args.sample_rate,\n        use_soundfile=args.use_soundfile,\n        use_autocast=args.use_autocast,\n        chunk_duration=args.chunk_duration,\n        ensemble_algorithm=args.ensemble_algorithm,\n        ensemble_weights=args.ensemble_weights,\n        ensemble_preset=args.ensemble_preset,\n        mdx_params={\n            \"hop_length\": args.mdx_hop_length,\n            \"segment_size\": args.mdx_segment_size,\n            \"overlap\": args.mdx_overlap,\n            \"batch_size\": args.mdx_batch_size,\n            \"enable_denoise\": args.mdx_enable_denoise,\n        },\n        vr_params={\n            \"batch_size\": args.vr_batch_size,\n            
\"window_size\": args.vr_window_size,\n            \"aggression\": args.vr_aggression,\n            \"enable_tta\": args.vr_enable_tta,\n            \"enable_post_process\": args.vr_enable_post_process,\n            \"post_process_threshold\": args.vr_post_process_threshold,\n            \"high_end_process\": args.vr_high_end_process,\n        },\n        demucs_params={\n            \"segment_size\": args.demucs_segment_size,\n            \"shifts\": args.demucs_shifts,\n            \"overlap\": args.demucs_overlap,\n            \"segments_enabled\": args.demucs_segments_enabled,\n        },\n        mdxc_params={\n            \"segment_size\": args.mdxc_segment_size,\n            \"batch_size\": args.mdxc_batch_size,\n            \"overlap\": args.mdxc_overlap,\n            \"override_model_segment_size\": args.mdxc_override_model_segment_size,\n            \"pitch_shift\": args.mdxc_pitch_shift,\n        },\n    )\n\n    # Combine primary model with any extra models for ensembling\n    # If a preset is active and no explicit models were provided, use preset models via default\n    if args.ensemble_preset and args.model_filename == \"model_bs_roformer_ep_317_sdr_12.9755.ckpt\" and not args.extra_models:\n        separator.load_model()\n    else:\n        model_filenames = [args.model_filename] + (args.extra_models or [])\n        if len(model_filenames) == 1:\n            model_filenames = model_filenames[0]\n        separator.load_model(model_filename=model_filenames)\n\n    output_files = separator.separate(audio_files, custom_output_names=args.custom_output_names)\n    logger.info(f\"Separation complete! Output file(s): {' '.join(output_files)}\")\n"
  },
  {
    "path": "cloudbuild.yaml",
    "content": "# Cloud Build config for building the audio-separator Docker image with baked models.\n# Run manually: gcloud builds submit --config cloudbuild.yaml --region=us-east4\n# Uses e2-highcpu-32 machine type for fast builds with enough RAM for model loading.\n\nsteps:\n  - name: 'gcr.io/cloud-builders/docker'\n    args:\n      - 'build'\n      - '-f'\n      - 'Dockerfile.cloudrun'\n      - '-t'\n      - 'us-east4-docker.pkg.dev/$PROJECT_ID/audio-separator/api:$SHORT_SHA'\n      - '-t'\n      - 'us-east4-docker.pkg.dev/$PROJECT_ID/audio-separator/api:latest'\n      - '.'\n\nimages:\n  - 'us-east4-docker.pkg.dev/$PROJECT_ID/audio-separator/api:$SHORT_SHA'\n  - 'us-east4-docker.pkg.dev/$PROJECT_ID/audio-separator/api:latest'\n\noptions:\n  machineType: 'E2_HIGHCPU_32'\n  logging: 'CLOUD_LOGGING_ONLY'\n\ntimeout: '3600s'\n"
  },
  {
    "path": "docs/BIT_DEPTH_IMPLEMENTATION_SUMMARY.md",
    "content": "# Summary: Bit Depth Preservation Implementation\n\n## Issue\n[GitHub Issue #243](https://github.com/nomadkaraoke/python-audio-separator/issues/243) - Users reported that audio-separator was reducing audio quality by always outputting 16-bit audio, even when the input was 24-bit or 32-bit.\n\n## Solution\nImplemented automatic bit depth preservation that matches the output audio bit depth to the input audio file's bit depth. This ensures no quality loss when processing high-resolution audio files.\n\n## Key Changes\n\n### 1. **Dependencies** (`pyproject.toml`)\n- Added `soundfile >= 0.12` for reading audio file metadata\n\n### 2. **Core Implementation** (`audio_separator/separator/common_separator.py`)\n- Added `input_bit_depth` and `input_subtype` attributes to track input audio properties\n- Modified `prepare_mix()` to detect bit depth using soundfile\n- Updated `write_audio_pydub()` to use appropriate scaling and ffmpeg codecs for each bit depth\n- Updated `write_audio_soundfile()` to preserve subtype when writing\n\n### 3. 
**Comprehensive Tests**\nCreated 4 automated test files with 18 tests total, plus a manual test script:\n\n**Unit Tests:**\n- `tests/unit/test_bit_depth_detection.py` - 5 tests for bit depth detection\n- `tests/unit/test_bit_depth_writing.py` - 5 tests for write functions\n\n**Integration Tests:**\n- `tests/integration/test_bit_depth_e2e.py` - 2 end-to-end tests\n- `tests/integration/test_bit_depth_preservation.py` - 6 comprehensive integration tests\n\n**Manual Test:**\n- `tests/manual_test_bit_depth.py` - Demonstrates functionality\n\n## Test Results\n\n✅ **All tests pass:**\n```\n16-bit (pydub)      ✅ PASS\n24-bit (pydub)      ✅ PASS\n32-bit (pydub)      ✅ PASS\n16-bit (soundfile)  ✅ PASS\n24-bit (soundfile)  ✅ PASS\n32-bit (soundfile)  ✅ PASS\n```\n\n## Behavior\n\n| Input Bit Depth | Previous Output | New Output |\n|----------------|-----------------|------------|\n| 16-bit         | 16-bit         | 16-bit ✅  |\n| 24-bit         | **16-bit** ❌  | 24-bit ✅  |\n| 32-bit         | **16-bit** ❌  | 32-bit ✅  |\n\n## Impact\n\n- ✅ **Quality Preservation:** No more quality loss when processing high-resolution audio\n- ✅ **Backward Compatible:** Existing 16-bit workflows unchanged\n- ✅ **Automatic:** No configuration required; works out of the box\n- ✅ **Transparent:** Logs show detected and output bit depths\n- ✅ **Robust:** Graceful fallback to 16-bit for unknown formats\n\n## Technical Details\n\nThe implementation:\n- Reads audio metadata before loading with librosa\n- Maps PCM subtypes to bit depths (PCM_16→16, PCM_24→24, PCM_32→32)\n- Scales audio data appropriately for each bit depth\n- Passes correct codec parameters to ffmpeg/pydub\n- Works with both pydub (default) and soundfile backends\n- Handles multiple files with different bit depths correctly\n\n## Files Modified\n\n1. `pyproject.toml` - Added soundfile dependency\n2. `audio_separator/separator/common_separator.py` - Core implementation\n\n## Files Added\n\n1. 
`tests/unit/test_bit_depth_detection.py` - Unit tests for detection\n2. 
`tests/unit/test_bit_depth_writing.py` - Unit tests for writing\n3. `tests/integration/test_bit_depth_e2e.py` - End-to-end tests\n4. `tests/integration/test_bit_depth_preservation.py` - Integration tests\n5. `tests/manual_test_bit_depth.py` - Manual test script\n6. `BIT_DEPTH_PRESERVATION.md` - Detailed documentation\n\n## No Breaking Changes\n\nThis implementation is fully backward compatible:\n- No API changes required\n- No new parameters needed\n- Existing functionality unchanged\n- Only affects output bit depth to match input\n\n## Resolves\n\n✅ Closes #243 - Output bit depth now matches input automatically\n\n"
  },
  {
    "path": "docs/BIT_DEPTH_PRESERVATION.md",
    "content": "# Bit Depth Preservation Implementation\n\n## Summary\n\nThis implementation ensures that the output audio files from audio-separator preserve the bit depth of the input audio files, preventing quality loss when processing 24-bit or 32-bit audio files.\n\n## Changes Made\n\n### 1. Added `soundfile` to Dependencies (`pyproject.toml`)\n- Added `soundfile >= 0.12` as a dependency to enable reading audio file metadata\n\n### 2. Modified `CommonSeparator` Class (`audio_separator/separator/common_separator.py`)\n\n#### Added Bit Depth Tracking\n- Added `self.input_bit_depth` attribute to store the detected bit depth (16, 24, or 32)\n- Added `self.input_subtype` attribute to store the audio file subtype\n\n#### Modified `prepare_mix()` Method\n- Added code to read audio file metadata using `soundfile.info()` before loading with librosa\n- Detects and stores the input audio file's bit depth based on the subtype:\n  - PCM_16, PCM_S8 → 16-bit\n  - PCM_24 → 24-bit\n  - PCM_32, FLOAT, DOUBLE → 32-bit\n- Defaults to 16-bit for numpy arrays or unknown formats\n- Logs the detected bit depth for debugging\n\n#### Modified `write_audio_pydub()` Method\n- Determines output bit depth from `self.input_bit_depth`\n- Scales audio data appropriately for each bit depth:\n  - 16-bit: scale by 32767, use int16, sample_width=2\n  - 24-bit: scale by 8388607, use int32, sample_width=3\n  - 32-bit: scale by 2147483647, use int32, sample_width=4\n- Passes appropriate ffmpeg codec parameters for WAV files:\n  - 16-bit: `pcm_s16le`\n  - 24-bit: `pcm_s24le`\n  - 32-bit: `pcm_s32le`\n\n#### Modified `write_audio_soundfile()` Method\n- Determines output subtype from `self.input_subtype` or `self.input_bit_depth`\n- Passes the subtype to `sf.write()` to preserve bit depth\n- Removed manual interleaving (soundfile handles multi-channel properly)\n\n## Tests Created\n\n### Unit Tests\n\n#### `tests/unit/test_bit_depth_detection.py`\nTests bit depth detection logic:\n- 
`test_16bit_detection`: Verifies 16-bit files are correctly detected\n- `test_24bit_detection`: Verifies 24-bit files are correctly detected\n- `test_32bit_detection`: Verifies 32-bit files are correctly detected\n- `test_numpy_array_input_defaults_to_16bit`: Verifies numpy arrays default to 16-bit\n- `test_bit_depth_preserved_across_multiple_files`: Verifies bit depth updates between files\n\n#### `tests/unit/test_bit_depth_writing.py`\nTests bit depth preservation in write functions:\n- `test_write_16bit_with_pydub`: Tests 16-bit output with pydub backend\n- `test_write_24bit_with_pydub`: Tests 24-bit output with pydub backend\n- `test_write_32bit_with_pydub`: Tests 32-bit output with pydub backend\n- `test_write_24bit_with_soundfile`: Tests 24-bit output with soundfile backend\n- `test_write_16bit_with_soundfile`: Tests 16-bit output with soundfile backend\n\n### Integration Tests\n\n#### `tests/integration/test_bit_depth_e2e.py`\nEnd-to-end tests with full separation workflow:\n- `test_e2e_24bit_preservation`: Full test with 24-bit input\n- `test_e2e_16bit_preservation`: Full test with 16-bit input\n\n#### `tests/integration/test_bit_depth_preservation.py`\nComprehensive integration tests (skipped if package not installed):\n- Tests for 16-bit, 24-bit, and 32-bit preservation\n- Tests with FLAC output format\n- Tests with soundfile backend\n- Tests with multiple files of different bit depths\n\n## Test Results\n\nAll unit tests pass successfully:\n- 5/5 bit depth detection tests ✅\n- 5/5 bit depth writing tests ✅\n- 10/10 total unit tests ✅\n\n## Behavior\n\n### Before\n- All output files were written as 16-bit PCM, regardless of input bit depth\n- This resulted in quality loss when processing 24-bit or 32-bit audio files\n\n### After\n- Output bit depth matches input bit depth automatically\n- 16-bit input → 16-bit output\n- 24-bit input → 24-bit output\n- 32-bit input → 32-bit output\n- No quality loss when processing high-quality audio files\n- Backward 
compatible - 16-bit inputs still produce 16-bit outputs\n\n## Supported Formats\n\nBit depth preservation works with:\n- **WAV files**: Full support for 16, 24, and 32-bit\n- **FLAC files**: Full support for 16, 24, and 32-bit\n- **Other formats**: Depends on ffmpeg support through pydub\n\nBoth output backends are supported:\n- **pydub/ffmpeg** (default): Uses codec parameters to enforce bit depth\n- **soundfile**: Uses subtype parameter to enforce bit depth\n\n## Notes\n\n- The implementation is backward compatible - no changes to the API are required\n- Bit depth is detected automatically from input files\n- Logs provide visibility into detected and output bit depths\n- Falls back to 16-bit for unknown formats or errors\n- Works correctly when processing multiple files with different bit depths\n\n"
  },
  {
    "path": "docs/CI-GPU-RUNNERS.md",
    "content": "# CI GPU Runner Infrastructure\n\nThis document explains how the GPU-based integration test infrastructure works for this repo.\n\n## Overview\n\nIntegration tests require GPU hardware to run ML model inference. GPU VMs are expensive (~$1.62/hr for 3x T4), so they auto-scale to zero when idle. The system automatically starts runners when CI jobs need them and stops them after 15 minutes of inactivity.\n\n## Architecture\n\n```\nGitHub webhook (workflow_job.queued)\n    │\n    ▼\nCloud Function (github-runner-manager)\n    │\n    ├── Job has \"gpu\" label? → Start GPU runners (3x n1-standard-4 + T4)\n    ├── Job has \"self-hosted\" label? → Start CPU runners\n    └── Neither? → Ignore\n\nCloud Scheduler (every 15 min)\n    │\n    ▼\nCloud Function (?action=check_idle)\n    │\n    └── No pending jobs + runner idle > 15 min? → Stop runner\n```\n\n### Components\n\n| Component | Location | Purpose |\n|-----------|----------|---------|\n| Cloud Function | `karaoke-gen/infrastructure/functions/runner_manager/main.py` | Starts/stops runner VMs based on demand |\n| Pulumi module | `karaoke-gen/infrastructure/modules/runner_manager.py` | Deploys the function, scheduler, and IAM |\n| GPU VM definitions | `karaoke-gen/infrastructure/compute/github_runners.py` | 3x n1-standard-4 with T4 GPU |\n| GPU startup script | `karaoke-gen/infrastructure/compute/startup_scripts/github_runner_gpu.sh` | Installs NVIDIA drivers, Python, registers runner |\n| Config | `karaoke-gen/infrastructure/config.py` | Runner count, labels, idle timeout |\n| GitHub webhook | Org-level (`nomadkaraoke`) | Sends `workflow_job` events to Cloud Function |\n\n### GPU Runner VMs\n\n- **Count**: 3 (configurable via `NUM_GPU_RUNNERS` in config.py)\n- **Machine type**: n1-standard-4 (4 vCPU, 15GB RAM) + 1x NVIDIA T4\n- **Zone**: us-central1-a\n- **Labels**: `self-hosted, linux, x64, gcp, gpu`\n- **Startup time**: ~15-20 min (NVIDIA driver install, Python build, model download)\n- **Model cache**: 
~14GB of ML models pre-downloaded to `/opt/audio-separator-models/`\n\n### Required GitHub Branch Protection Checks\n\nThe `Protect main` ruleset (ID: 529535) requires these checks to pass before merge:\n\n- `unit-tests` — from `run-unit-tests.yaml` (runs on GitHub-hosted runners)\n- `ensemble-presets` — from `run-integration-tests.yaml` (runs on GPU runners)\n- `core-models` — from `run-integration-tests.yaml` (runs on GPU runners)\n- `stems-and-quality` — from `run-integration-tests.yaml` (runs on GPU runners)\n\n**IMPORTANT**: If integration test job names change (e.g., splitting or renaming jobs), you MUST update the ruleset to match. The ruleset is configured at:\nhttps://github.com/nomadkaraoke/python-audio-separator/settings/rules/529535\n\nTo update via API, send the complete list of required checks. Every job that must keep passing belongs in `required_status_checks`, not just the renamed one:\n```bash\ngh api repos/nomadkaraoke/python-audio-separator/rulesets/529535 \\\n  --method PUT --input - <<'EOF'\n{\n  \"name\": \"Protect main\",\n  \"enforcement\": \"active\",\n  \"target\": \"branch\",\n  \"conditions\": {\"ref_name\": {\"include\": [\"~DEFAULT_BRANCH\"], \"exclude\": []}},\n  \"rules\": [\n    {\"type\": \"deletion\"},\n    {\"type\": \"pull_request\", \"parameters\": {\n      \"required_approving_review_count\": 0,\n      \"allowed_merge_methods\": [\"squash\"]\n    }},\n    {\"type\": \"required_status_checks\", \"parameters\": {\n      \"required_status_checks\": [\n        {\"context\": \"unit-tests\", \"integration_id\": 15368},\n        {\"context\": \"ensemble-presets\", \"integration_id\": 15368},\n        {\"context\": \"core-models\", \"integration_id\": 15368},\n        {\"context\": \"stems-and-quality\", \"integration_id\": 15368}\n      ]\n    }}\n  ]\n}\nEOF\n```\n\n## Troubleshooting\n\n### Integration tests stuck in \"queued\"\n\n**Symptoms**: PR checks show `pending` for `ensemble-presets`, `core-models`, `stems-and-quality`.\n\n**Diagnosis 
steps**:\n\n1. Check if GPU runners are online:\n   ```bash\n   gh api orgs/nomadkaraoke/actions/runners \\\n     --jq '.runners[] | select(.labels[].name == \"gpu\") | {name, status, busy}'\n   ```\n\n2. 
Check if GPU VMs exist:\n   ```bash\n   gcloud compute instances list --project=nomadkaraoke --filter=\"name~gpu\"\n   ```\n\n3. Check Cloud Function logs for webhook delivery:\n   ```bash\n   gcloud logging read 'resource.labels.service_name=\"github-runner-manager\"' \\\n     --project=nomadkaraoke --limit=20 \\\n     --format=\"value(timestamp,textPayload,jsonPayload.message)\"\n   ```\n\n4. Check GPU runner startup logs (if VMs are RUNNING but GitHub shows offline):\n   ```bash\n   gcloud compute ssh github-gpu-runner-1 --zone=us-central1-a --project=nomadkaraoke \\\n     --command=\"tail -50 /var/log/github-runner-startup.log\"\n   ```\n\n### GPU VMs don't exist\n\nIf `gcloud compute instances list` shows no GPU runners but Pulumi state thinks they exist:\n\n```bash\n# 1. Remove stale state (from karaoke-gen/infrastructure/ dir)\npulumi state delete \"urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-1\" --target-dependents --yes\npulumi state delete \"urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-2\" --target-dependents --yes\npulumi state delete \"urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-3\" --target-dependents --yes\n\n# 2. Recreate\npulumi up --yes\n\n# 3. Re-import dependent resources that got removed (runner-manager function, IAM, scheduler)\n# Check `pulumi preview` for what needs importing\n```\n\n### GPU runner startup fails (NVIDIA driver issues)\n\nThe startup script handles kernel header mismatches by upgrading the kernel and rebooting once. 
If the runner still fails:\n\n```bash\n# SSH in and check\ngcloud compute ssh github-gpu-runner-1 --zone=us-central1-a --project=nomadkaraoke \\\n  --command=\"nvidia-smi; dkms status; uname -r\"\n```\n\nSee `karaoke-gen` memory file `project_gpu_runner_drivers.md` for known issues.\n\n### Webhook not firing\n\nCheck the org-level webhook configuration:\n```bash\ngh api orgs/nomadkaraoke/hooks \\\n  --jq '.[] | select(.events[] == \"workflow_job\") | {id, active, config: {url: .config.url}}'\n```\n\nThe webhook URL should point to: `https://us-central1-nomadkaraoke.cloudfunctions.net/github-runner-manager`\n\n## Cost\n\n| Scenario | Cost |\n|----------|------|\n| Per GPU runner hour | ~$0.54/hr (n1-standard-4 + T4) |\n| 3 runners × 15 min CI run | ~$0.41 |\n| Idle (scale to zero) | $0 |\n| Typical daily cost (5 PRs) | ~$2 |\n"
  },
  {
    "path": "docs/archive/2026-03-22-modal-to-gcp-migration-plan.md",
    "content": "# Plan: Modal → GCP Audio Separation Migration\n\n**Created:** 2026-03-22\n**Branch:** feat/sess-20260321-2314-modal-gcp-migration\n**Worktrees:** `karaoke-gen-modal-gcp-migration` (infra + backend), `python-audio-separator-modal-gcp-migration` (server)\n**Status:** Draft → Ready for implementation\n\n## Overview\n\nMigrate audio stem separation from Modal to a Cloud Run Service with L4 GPU on GCP. This eliminates the only third-party compute dependency, fixes intermittent Modal API failures (\"no files were downloaded\"), upgrades to latest ensemble models for better quality, and decouples separation from the lyrics review critical path so users can start reviewing lyrics faster.\n\n### Architecture Decision: Cloud Run GPU Service\n\n| Factor | Cloud Run GPU | GCE VM + auto-stop |\n|--------|--------------|-------------------|\n| Idle cost | $0 (scales to zero) | $0 (when stopped) |\n| Cold start | ~30-60s (model load from GCS) | ~60-120s (VM boot + model load) |\n| Ops overhead | None (serverless) | Moderate (start/stop scripts, health monitoring) |\n| GPU available | L4 (24GB VRAM) in us-central1 | T4/L4/A100 |\n| Scaling | Automatic | Manual orchestration |\n| Cost/job (~12 min GPU) | ~$0.13 | ~$0.07-0.10 (T4) |\n| Deployment | Docker image push | Packer image + GCS wheel + SSH restart |\n\nCloud Run GPU wins on simplicity. 
L4 is faster than T4, cold start is acceptable, and per-job cost well under $1.\n\n### Model Upgrade: Ensemble Presets as Default\n\n**Current models (single-model):**\n| Stage | Model | SDR | Notes |\n|-------|-------|-----|-------|\n| 1 (instrumental) | `model_bs_roformer_ep_317_sdr_12.9755.ckpt` | 12.97 | Older BS-Roformer |\n| 1 (other stems) | `htdemucs_6s.yaml` | — | Demucs 6-stem — **dropping** |\n| 2 (karaoke/BV) | `mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt` | 10.20 | Single karaoke model |\n\n**New defaults (ensemble presets):**\n| Stage | Preset | Models | SDR | Notes |\n|-------|--------|--------|-----|-------|\n| 1 | `instrumental_clean` | Fv7z + Resurrection | ~17.5 | +35% quality, bleedless |\n| 2 | `karaoke` | 3 karaoke models (aufr33+gabox_v2+becruily) | ~10.6 | +4% quality, 3-model ensemble |\n\n**Key design: preset-name references, not model filenames.** karaoke-gen references preset names (`instrumental_clean`, `karaoke`). The audio-separator package resolves preset → models + ensemble algorithm. 
When better models come out, update presets in audio-separator and release a new version — no karaoke-gen changes needed.\n\n### Pipeline Decoupling: Separation Off Critical Path\n\n**Current flow (both gate review):**\n```\nJob created\n├── Audio worker (separation) ──→ audio_complete=True ─┐\n│                                                       ├→ GENERATING_SCREENS → AWAITING_REVIEW\n└── Lyrics worker (transcription) → lyrics_complete=True┘\n```\n\n**New flow (lyrics gates review, separation runs in background):**\n```\nJob created\n├── Audio worker (separation) ──→ audio_complete=True (background, not gating)\n│\n└── Lyrics worker (transcription) → lyrics_complete=True → GENERATING_SCREENS → AWAITING_REVIEW\n                                                                                     │\n                                                                              User reviews lyrics\n                                                                                     │\n                                                                              Instrumental review\n                                                                              (waits for audio_complete\n                                                                               if not ready yet)\n```\n\n**Why this works:**\n- Lyrics review (`/app/jobs#/{jobId}/review`) only needs transcription output — no stems needed\n- Instrumental review (`/app/jobs#/{jobId}/instrumental`) needs stems — but user typically spends 5+ min on lyrics review, buying time for separation to finish\n- In the rare case separation isn't done when user reaches instrumental review, show a \"Separation in progress...\" waiting state\n- Screens worker only truly needs lyrics to generate title/end screens\n\n### Estimated Timeline\n\n| Scenario | Stage 1 | Stage 2 | Cold start | Total |\n|----------|---------|---------|------------|-------|\n| Current (Modal, single models) | 3-5 min | 2-3 min | 0 | 7-11 min 
|\n| New ensemble (Cloud Run L4) | ~4-6 min | ~3-5 min | ~30-60s | ~8-12 min |\n| **User-perceived wait** (new) | — | — | — | **0 min** (decoupled) |\n\nSeparation takes slightly longer with ensembles, but users never wait for it — they're reviewing lyrics while it runs.\n\n## Requirements\n\n- [ ] Audio separation runs on GCP Cloud Run with L4 GPU\n- [ ] Same HTTP API contract as Modal deployment (endpoints, request/response format)\n- [ ] `audio-separator-remote` CLI and `AudioSeparatorAPIClient` work unchanged\n- [ ] Default models use ensemble presets (`instrumental_clean` + `karaoke`)\n- [ ] karaoke-gen references preset names, not model filenames\n- [ ] Demucs 6-stem separation dropped from pipeline\n- [ ] Scale-to-zero when not processing (no idle GPU cost)\n- [ ] Cold start < 60 seconds\n- [ ] Per-job cost < $1\n- [ ] Models stored in GCS, loaded on container startup\n- [ ] Publicly accessible endpoint with auth token (reuse `admin-tokens` secret)\n- [ ] Infrastructure managed via Pulumi in karaoke-gen\n- [ ] Separation decoupled from lyrics review critical path\n- [ ] Instrumental review page handles \"separation still in progress\" gracefully\n- [ ] Docker image CI lives in python-audio-separator repo, pushes to Artifact Registry\n\n## Implementation Steps\n\n### Phase 1: Cloud Run GPU Server (python-audio-separator repo)\n\n#### Step 1.1 — Create Cloud Run-compatible FastAPI server\n- [ ] Create `audio_separator/remote/deploy_cloudrun.py` adapted from `deploy_modal.py`\n- [ ] Replace Modal-specific code:\n  - `modal.Dict` → in-memory `dict` (single instance handles one job at a time)\n  - `modal.Volume` → local `/tmp` storage + GCS for model cache\n  - `modal.Function.spawn()` → synchronous processing (no background tasks needed)\n  - `modal.Image` → Dockerfile\n  - `modal.App` → standard FastAPI + uvicorn\n- [ ] Keep all existing API endpoints identical:\n  - `POST /separate` — submit separation job\n  - `GET /status/{task_id}` — return job status\n  - 
`GET /download/{task_id}/{file_hash}` — download result file\n  - `GET /models-json`, `GET /models` — list models\n  - `GET /health` — health check (with model readiness indicator)\n  - `GET /` — root info\n- [ ] Add model download on startup from GCS bucket (`gs://nomadkaraoke-audio-separator-models/`)\n- [ ] Add ensemble preset support: accept `preset` parameter in `/separate` that resolves to model list + algorithm\n- [ ] Add startup probe endpoint for Cloud Run GPU readiness\n\n**Design:** Make `/separate` effectively synchronous — process inline, store results in-memory dict + local filesystem. Cloud Run instance stays alive for scale-down timeout (600s), so Stage 2 hits the same warm instance. Async polling API contract preserved for client compatibility.\n\n#### Step 1.2 — Create Dockerfile\n- [ ] Create `Dockerfile.cloudrun` in repo root\n- [ ] Base: `nvidia/cuda:12.6.3-runtime-ubuntu22.04` (matches Cloud Run L4 driver support)\n- [ ] Install: Python 3.13, FFmpeg, libsndfile, sox, system audio libs\n- [ ] Install: `audio-separator[gpu]` from current repo\n- [ ] Entrypoint: `python -m audio_separator.remote.deploy_cloudrun`\n- [ ] Expose port 8080\n- [ ] Set env: `MODEL_DIR=/models`, `STORAGE_DIR=/tmp/storage`\n\n#### Step 1.3 — Upload models to GCS\n- [ ] Create GCS bucket `nomadkaraoke-audio-separator-models` (us-central1, standard storage)\n- [ ] Upload all models needed by default ensemble presets:\n  - `mel_band_roformer_instrumental_fv7z_gabox.ckpt` (instrumental_clean preset)\n  - `bs_roformer_instrumental_resurrection_unwa.ckpt` (instrumental_clean preset)\n  - `mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt` (karaoke preset)\n  - `mel_band_roformer_karaoke_gabox_v2.ckpt` (karaoke preset)\n  - `mel_band_roformer_karaoke_becruily.ckpt` (karaoke preset)\n- [ ] Total: ~1-1.5 GB of models\n\n#### Step 1.4 — Local testing\n- [ ] Build Docker image locally\n- [ ] Test with `docker run --gpus all` (if local GPU available) or CPU mode\n- [ ] Verify 
API compatibility: submit job with `preset=instrumental_clean`, poll status, download files\n- [ ] Verify output filename format matches expected pattern: `filename_(StemType)_modelname.ext`\n- [ ] Verify ensemble output: ensembled stems have correct naming\n- [ ] Compare output quality with Modal (A/B test on reference songs)\n\n#### Step 1.5 — CI/CD for Docker image\n- [ ] Create `.github/workflows/deploy-to-cloudrun.yml` in python-audio-separator repo\n- [ ] Triggers: PyPI release, changes to Dockerfile.cloudrun, manual dispatch\n- [ ] Steps: build Docker image → push to Artifact Registry (`us-central1-docker.pkg.dev/nomadkaraoke/audio-separator`)\n- [ ] Use Workload Identity Federation for GCP auth\n\n### Phase 2: GCP Infrastructure (karaoke-gen repo)\n\n#### Step 2.1 — Artifact Registry\n- [ ] Add Artifact Registry Docker repo to Pulumi\n- [ ] Repository: `audio-separator` in `us-central1`\n\n#### Step 2.2 — GCS Model Bucket\n- [ ] Create `nomadkaraoke-audio-separator-models` bucket via Pulumi\n- [ ] Standard storage class, us-central1\n- [ ] Grant read access to Cloud Run service account\n\n#### Step 2.3 — Cloud Run GPU Service\n- [ ] Create `infrastructure/modules/audio_separator_service.py`\n- [ ] Cloud Run Service configuration:\n  - Image: from Artifact Registry\n  - GPU: 1x NVIDIA L4\n  - CPU: 4 vCPU (minimum required for L4)\n  - Memory: 16 GiB\n  - Min instances: 0 (scale to zero)\n  - Max instances: 2 (handle concurrent jobs)\n  - Request timeout: 1800s (30 min)\n  - Scale-down delay: 600s (keep warm between Stage 1 → Stage 2)\n  - Startup probe: HTTP GET /health, 120s initial delay, 10s period\n  - Env vars:\n    - `MODEL_BUCKET=nomadkaraoke-audio-separator-models`\n    - `MODEL_DIR=/models`\n    - `ADMIN_TOKEN` (from Secret Manager, reuse existing `admin-tokens`)\n  - Region: us-central1\n  - Ingress: all traffic (public endpoint with auth)\n\n#### Step 2.4 — Service Account & IAM\n- [ ] Create `audio-separator` service account\n- [ ] Grant: 
`storage.objectViewer` on model bucket\n- [ ] Grant: `secretmanager.secretAccessor` for admin-tokens\n- [ ] Grant: `logging.logWriter`, `monitoring.metricWriter`\n\n#### Step 2.5 — Wire into Pulumi\n- [ ] Add to `infrastructure/__main__.py`\n- [ ] Add config constants to `infrastructure/config.py`\n\n### Phase 3: Pipeline Decoupling + Model Upgrade (karaoke-gen repo)\n\n#### Step 3.1 — Decouple separation from lyrics review path\n- [ ] In `backend/services/job_manager.py`:\n  - Change `check_parallel_processing_complete()` to only check `lyrics_complete` (not `audio_complete`)\n  - `mark_lyrics_complete()` triggers screens worker on its own (no need to wait for audio)\n  - `mark_audio_complete()` no longer triggers screens — just sets the flag\n- [ ] In `backend/workers/screens_worker.py`:\n  - Remove validation that `audio_complete` must be True\n  - Screens only needs lyrics data to generate title/end screens\n  - Skip instrumental analysis step if audio isn't complete yet (or make it a no-op)\n- [ ] Verify: lyrics review page works without stems present\n\n#### Step 3.2 — Add \"waiting for separation\" state to instrumental review\n- [ ] In frontend instrumental review page (`/app/jobs#/{jobId}/instrumental`):\n  - Check `state_data.audio_complete` on page load\n  - If false, show \"Audio separation in progress...\" with a spinner/progress indicator\n  - Poll job status every 5-10 seconds until `audio_complete=True`\n  - Once complete, load and display instrumental options as normal\n- [ ] Backend: ensure instrumental review API endpoint returns separation status\n\n#### Step 3.3 — Switch to preset-based model configuration\n- [ ] In `backend/workers/audio_worker.py`:\n  - Replace `DEFAULT_CLEAN_MODEL` with `DEFAULT_INSTRUMENTAL_PRESET = \"instrumental_clean\"`\n  - Replace `DEFAULT_BACKING_MODELS` with `DEFAULT_KARAOKE_PRESET = \"karaoke\"`\n  - Remove `DEFAULT_OTHER_MODELS` (Demucs dropped)\n  - Pass `preset=` parameter to API client instead of `models=`\n- [ 
] In `karaoke_gen/audio_processor.py`:\n  - Update `_process_audio_separation_remote()` to pass presets\n  - Stage 1: `api_client.separate_audio_and_wait(audio_file, preset=\"instrumental_clean\", ...)`\n  - Stage 2: `api_client.separate_audio_and_wait(vocals_file, preset=\"karaoke\", ...)`\n  - Remove `other_stems_models` parameter (or default to empty)\n  - Update result organization for ensemble outputs (stem names may include ensemble info)\n- [ ] In `audio_separator/remote/api_client.py` (python-audio-separator repo):\n  - Add `preset` parameter to `separate_audio()` and `separate_audio_and_wait()`\n  - Client passes `preset` field in multipart form data to API\n  - API server resolves preset → models + algorithm\n\n#### Step 3.4 — Update tests\n- [ ] Update `tests/unit/test_audio_remote.py`:\n  - Test preset-based separation calls\n  - Remove Demucs 6-stem references\n  - Test new default model/preset names\n- [ ] Add test for pipeline decoupling:\n  - Verify `mark_lyrics_complete()` triggers screens without `audio_complete`\n  - Verify `mark_audio_complete()` sets flag but doesn't trigger screens\n- [ ] Add frontend test for instrumental review waiting state\n\n### Phase 4: Cutover & Cleanup\n\n#### Step 4.1 — Deploy and test\n- [ ] Deploy Cloud Run GPU service via `pulumi up`\n- [ ] Run separation on 3-5 test songs with ensemble presets\n- [ ] Compare output quality to Modal (listen test)\n- [ ] Verify timing: ensemble separation completes within ~8-12 min\n- [ ] Test cold start scenario (wait for scale-down, then submit)\n- [ ] Test back-to-back jobs (Stage 1 → Stage 2 hits warm instance)\n- [ ] Test pipeline decoupling: verify lyrics review available before separation completes\n- [ ] Test instrumental review waiting state\n\n#### Step 4.2 — Update Cloud Run audio worker config\n- [ ] Change `AUDIO_SEPARATOR_API_URL` in `infrastructure/modules/cloud_run.py` from Modal URL to Cloud Run URL\n- [ ] Deploy via `pulumi up`\n- [ ] Run 5-10 production jobs, 
monitor for errors\n\n#### Step 4.3 — Monitor (1 week)\n- [ ] Watch Cloud Run logs for errors\n- [ ] Monitor separation timing in job state_data\n- [ ] Check Cloud Run billing (verify per-job cost < $1)\n- [ ] Verify scale-to-zero works (no idle GPU charges)\n- [ ] Watch for users hitting the \"waiting for separation\" state — measure frequency\n\n#### Step 4.4 — Decommission Modal\n- [ ] Remove Modal deployment workflow from python-audio-separator repo\n- [ ] Delete Modal app\n- [ ] Close Modal account\n- [ ] Remove `modal` from python-audio-separator dependencies\n- [ ] Update `AUDIO_SEPARATOR_API_URL` env var in local `.envrc` files\n\n## Files to Create/Modify\n\n### python-audio-separator repo (`python-audio-separator-modal-gcp-migration` worktree)\n| File | Action | Description |\n|------|--------|-------------|\n| `audio_separator/remote/deploy_cloudrun.py` | Create | Cloud Run-compatible FastAPI server (adapted from deploy_modal.py) |\n| `audio_separator/remote/api_client.py` | Modify | Add `preset` parameter to separate methods |\n| `Dockerfile.cloudrun` | Create | Docker image for Cloud Run GPU deployment |\n| `.github/workflows/deploy-to-cloudrun.yml` | Create | CI/CD: build image → push to Artifact Registry |\n\n### karaoke-gen repo (`karaoke-gen-modal-gcp-migration` worktree)\n| File | Action | Description |\n|------|--------|-------------|\n| `infrastructure/modules/audio_separator_service.py` | Create | Pulumi: Cloud Run GPU service + model bucket + IAM |\n| `infrastructure/__main__.py` | Modify | Wire up audio separator service |\n| `infrastructure/config.py` | Modify | Add audio separator constants |\n| `infrastructure/modules/cloud_run.py` | Modify | Update `AUDIO_SEPARATOR_API_URL` to Cloud Run URL |\n| `backend/services/job_manager.py` | Modify | Decouple: lyrics_complete alone triggers screens |\n| `backend/workers/screens_worker.py` | Modify | Remove audio_complete prerequisite |\n| `backend/workers/audio_worker.py` | Modify | Switch to 
preset-based config, drop Demucs |\n| `karaoke_gen/audio_processor.py` | Modify | Pass presets instead of model filenames |\n| `frontend/` (instrumental review) | Modify | Add \"waiting for separation\" state |\n| `tests/unit/test_audio_remote.py` | Modify | Update for presets, remove Demucs tests |\n| `.github/workflows/deploy-audio-separator.yml` | Create | CI: deploy Cloud Run revision on image push |\n\n## Testing Strategy\n\n- **Unit tests:** Preset resolution, pipeline decoupling (lyrics triggers screens alone), model name updates\n- **Integration test:** Deploy Cloud Run service, run full separation with ensemble presets, verify output files\n- **A/B comparison:** Same songs through Modal (single model) and Cloud Run (ensemble) — quality should be better\n- **Pipeline test:** Submit job, verify lyrics review available before separation completes\n- **Frontend test:** Playwright E2E for instrumental review waiting state\n- **Cold start test:** Wait for scale-down, submit job, measure total time\n- **Production E2E:** After cutover, run 10 production jobs through full pipeline\n\n## Cost Estimate\n\n| Scenario | Monthly cost |\n|----------|-------------|\n| 10 jobs/day × 12 min GPU = 2 hrs/day | ~$40/mo |\n| 30 jobs/day × 12 min GPU = 6 hrs/day | ~$120/mo |\n| Per-job cost (12 min L4 @ $0.67/hr) | ~$0.13 |\n\nWell under $1/job budget. 
No idle cost due to scale-to-zero.\n\n## Resolved Questions\n\n- [x] Cloud Run vs GCE VM → **Cloud Run GPU Service** (simplest, scale-to-zero)\n- [x] Which GPU → **L4** (only option on Cloud Run, 24GB VRAM)\n- [x] Model upgrade → **Ensemble presets as default** (quality > speed)\n- [x] Demucs 6-stem → **Drop it**\n- [x] Auth → **Reuse existing `admin-tokens` secret**\n- [x] Docker CI → **python-audio-separator repo builds + pushes image**\n- [x] Ensemble presets UI → **Backend-only; presets defined in audio-separator package**\n- [x] Speed vs quality → **Quality wins; decouple separation from critical path so user never waits**\n\n## Rollback Plan\n\n1. **Quick rollback:** Change `AUDIO_SEPARATOR_API_URL` back to Modal URL in Pulumi config, `pulumi up`. Takes ~2 minutes.\n2. **Pipeline rollback:** Revert job_manager changes to re-gate screens on `audio_complete`. One commit.\n3. **Keep Modal running** during the monitoring period (Phase 4.3). Don't decommission until confident.\n4. **Model rollback:** Preset config can be changed back to direct model filenames in one commit.\n"
  },
  {
    "path": "docs/deton24-audio-separation-info-2026-03-15.md",
    "content": "## *edit.* 14.03.26\n\n*deton24’s*\n\n**Instrumental and vocal & stems separation &** [***mastering***](#_k34y1vaaneb1)\n\n*(UVR 5 GUI:* [VR](#_atxff7m4vp8n)/[MDX-Net](#_7znr3r5gprdy)/[MDX23C](#_7znr3r5gprdy)/[Demucs](#_m9ndauawzs5f) 1-4, and BS/Mel-Roformer in [beta](#_6y2plb943p9v) / [MSST](#_2y2nycmmf53)\n\n[MVSEP-MDX23-Colab](#_jmb1yj7x3kj7)/[Drumsep](#_m55fp5i7rdpm)/[SCNet](#_sjf0vefmplt)/[Apollo](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07)/[MedleyVox](#_s4sjh68fo1sw)\n\n[x-minus](#_wbc0pja7faof).pro ([uvronline.app](https://uvronline.app/ai))/[mvsep](#_wbc0pja7faof).com/[Colabs](#_wbc0pja7faof)\n\n[Gaudio](#_yy2jex1n5sq)/[Dango](#_xdux18tet3x9).ai/[Audioshake](#_tc4az79fufkn)/[Music ai](#_vyz1ol39n8d2))\n\n[General reading advice](#_jx9um5zd7fnp) | [Discord](https://discord.gg/ZPtAU5R6rP) (ask, or suggest edits only there) | Table of [content](#_sm5m61aib1vx) (up-to-date in Options>Document outline on PC or in the app) | [Training](#_bg6u0y2kn4ui)\n\nStraight to the currently best [models list](#_rz0d5zk9ms4w)\n\n[Instrumentals](#_2vdz5zlpb27h) ([ensembles](#_nk4nvhlv1pnt)) | [Vocals](#_n8ac32fhltgg) ([ensembles](#_i7k483hodhhu)) | [De-bleeding](#_tv0x7idkh1ua) | [Karaoke](#_vg1wnx1dc4g0) | [BVs](#_kkeba46q17rq) | [Two singers](#_7bakw3ajb3ii) | [Harmonies](#_9h585vwqgcvg) | [Speech](#_o6au7k9vcmk6) | [Various speakers](#_ea9fj444mg3m) | [Phantom center](#_3c6n9m7vjxul) | [for RVC](#_phr57dt8x5mj) | [4-6 stems](#_sjf0vefmplt) | [Bass](#_hslkyvpqcfi0) | [Drums](#_m8brt73sq15r) | [Drumsep](#_93froqq0fnq6) | [Electric guitar](#_io845jglru5c) | [Acoustic guitar](#_99mh96i7mlar) | [Sep. 
both](#_og1lah39r7sc) | [Lead and rhythm guitar](#_uhs4ackpxojq) | [Piano](#_m6621q81n96t) | [Synths](#_1igtx0lprdsx) | [Organs](#_2ud4h14piyt8) | [Bells](#_860rzprafq1e) | [Strings](#_fdqaq0q4s020) | [Trumpets](#_5r5gvhhmqt8n) | [SFX](#_owqo9q2d774z) | [Bird sounds](#_c1uhro1ipnbv) | [De-crowd](#_h5cpiy7ueljn) | [De-breath](#_7otrtv77n5kq) | [De-reverb](#_kcswx79hi856) | [De-noising](#_hyzts95m298o) | [De-clipping](#_hk34hc4d1ah7) | [Upscalers](#_kmvf6iw5hfvm) | [Mastering](#_k34y1vaaneb1)\n\n[DOCX](https://tinyurl.com/gdocdocx) | [PDF](https://tinyurl.com/gdocpdf):\n\n<http://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/export?format=pdf>\n\n\\_\\_\\_\n\n##### *Last updates and news*\n\n- gilliaan released a new dual phantom center model [gilliaan\\_MonoStereo\\_Dual\\_Beta2](https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality/blob/main/gilliaan_MonoStereo_Dual_Beta2.ckpt) | [yaml](https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality). It has better metrics than Beta 1. Newer models might be trained on other architectures that have a better understanding of stereo mapping and spatialisation. [More info](https://discord.com/channels/708579735583588363/708580573697933382/1482589903253409982).\n\n- The doc previously lacked the SCNet difference/sides model by Dry Paint Dealer Undr; it has now been added to the relevant section.\n\n<https://drive.google.com/drive/folders/1ZSUw6ZuhJusv7HE5eMa-MORKA0XbSEht?usp=sharing>\n\n- gilliaan released a leaderboard for phantom center models with an explanation of the important metrics - [click](#_3c6n9m7vjxul)\n\n- gilliaan (heauxdontlast) released\n\ngilliaan\\_MonoStereo\\_Dual\\_Beta1\n\n<https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality>\n\n“Phantom center dual extractor trained on a custom synthetic dataset of 1000+ 4 mins songs. 
This model is trained to isolate only the correlated center content, so hard-panned signals stay in the side output and don't leak into the center.\n\nBleedless/Fullness are on the same level.\n\nOn my validation set it scores higher SDR than what's currently out there. Still training, this is a demo beta.\n\nFeedbacks are welcomed\n\nmid — SDR 10.41, aura\\_mrstft 16.79, bleedless 35.29\n\nside — SDR 10.39, aura\\_mrstft 13.68, bleedless 27.95\n\navg — SDR 10.40, aura\\_mrstft 15.24, bleedless 31.62\n\nI recommend overlap 8 (or more) for best quality.”\n\n- anvuew released a new “vocal separation model specialized for magnitude spectrum accuracy.” called [BS\\_RoFormer\\_mag](https://huggingface.co/anvuew/BS_RoFormer_mag) | [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\nBleedless 32.17, fullness 22.15, SDR 11.09['](https://mvsep.com/quality_checker/entry/9684)\n\nWorked in UVR, but compatibility with RTX 5000 patch is not guaranteed.\n\n“managed to pull out harmonies I knew were missing from my test track as well as not extracting some vocal sampled perc that had been in other models previously” - cristouk\n\n“pretty noisy to me compared to fv7beta2\n\nbut I only tried it on two songs. def was keeping the harmonies at a more proper volume and less muddy, though. but I think it has the same problem as deux.. a lot of noise sometimes during quieter/silent parts” - rainboomdash\n\nFor comparison: Becruily “[deux](https://huggingface.co/becruily/mel-band-roformer-deux/tree/main)” for vocals:\nvoc. bleedless: 28.30, fullness: 23.25, SDR: 11.37[’](https://mvsep.com/quality_checker/entry/9482)\nAll the bleedless, fullness, SDR metrics of the “mag” model are a bit better than Revive3e.\n\n- drypaintdealerundr shared two previously unreleased Phantom Center/Similarity extraction models in BS-Roformer architecture, from some training sessions from September or October. 
[DL](https://drive.google.com/drive/folders/1jSQ3FdbQOjC6PnIWmqMByPmlK482iAKT?usp=sharing)\n\n- unwa released [BS-EXP-SiameseRoformer](https://huggingface.co/pcunwa/BS-EXP-SiameseRoformer) vocal model | [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\n“seems pretty high bleedless... muddying/breaking up the voice a bit” - rainboomdash\n\nIncompatible with UVR. You need to replace the corresponding .py file in your MSST installation with the attached one.\n\nSome training details are in the repo.\n\n- It looks like the model below doesn't work with UVR's RTX 5000 patch. It first gives an inference error, and then a torchdynamo error when you use the fixed config (probably already linked below). Here is another working config by analogspiderweb, just in case: [DL](https://drive.google.com/file/d/1UCTBgYalu4FbPt5HR0XWSORUDeAdK6mV/view?usp=drivesdk).\n\nAs always with torchdynamo issues, we recommend [MSST](#_2y2nycmmf53) for faster separation on RTX 5000 GPUs rather than the regular (non-RTX 5000 patch) UVR version, which gives near-zero GPU acceleration on those GPUs.\n\n- New Mel-Roformer [small\\_karaoke\\_gaboxauf](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/small_karaoke_gaboxaufr.ckpt) by Gabox and Aufr33 was released | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-small/blob/main/config_melbandroformer_small.yaml) | [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY) (fixed)\n\nCompatible with UVR (but not with the RTX 5000 patch). [Metrics](https://mvsep.com/quality_checker/entry/9661).\n\n“already noticing that it sounds much more fuller than other karaoke models i use [anvuew bs roformer, bs karaoke gabox, and becruily karaoke] and clears out bg vocal much more accurately” - baptizedinfear\n\n“great! 
got to hear vocals I could not hear before.” - makesomenoiseyuh\n\n“it's not great with duets btw” - Gabox\n\nLowering chunk\\_size to 352800 makes it muddier but less noisy; “maybe in between would be nice” - rainboomdash\n\n- (uvronline.app) The dereverb\\_bs\\_roformer\\_anvuew\\_sdr\\_22.5050 model was added\n\n- Gabox released the experimental [last\\_bs\\_roformer](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/bsroformers/last_bs_roformer.ckpt). It “removes lead vocals focusing in bv”\n\n“vocals has backing vocals and instrumental, instrumental has just lead vocals”\n\n“some lead vocal leaked into instrumental so i'm cleaning the dataset for the fifth time” - Gabox\n\n- (MVSEP) “New wind model MVSep Bagpipes (bagpipes, other) has been added.\n\nDescription: <https://mvsep.com/algorithms/113>\n\nDemo: <https://mvsep.com/result/20260221123255-f0bb276157-mixture.wav>\n\nNew model MVSep Braam (braam, other) has been added. I put it in the \"Effect\" section.\n\nDescription: <https://mvsep.com/algorithms/115>\n\nDemo: <https://mvsep.com/result/20260221124005-f0bb276157-mixture.wav>” - ZFTurbo\n\nBraam is a loud, potentially distorted, low-sounding effect, most commonly known from the movie Inception (the composer used piano with 10x brass for it).\n\nBagpipes - “not too bad, you can still faintly hear the pipes though.” - fal\\_2067\n\n- [splifft](https://github.com/undef13/splifft) now has support for most community [models](https://discord.com/channels/708579735583588363/1220364005034561628/1473047214409515138) (BS/Mel-Roformers and MDX23 supported). 
[More](https://discord.com/channels/708579735583588363/1220364005034561628/1473047214409515138).\n\n- unwa released Big Beta 7 Mel-Roformer vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big) | [Mkd Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nBleedless 38.77, fullness 16.20, SDR 11.20[.](https://mvsep.com/quality_checker/entry/9633)\n\n“SDR is lower than the BS model, but personally I prefer this one. (...) Although not reflected in the metrics, noise has been reduced in sections without vocals.” - unwa\n\nIt's more bleedless than Big Beta 6X, which scores:\n\nBleedless: 35.16, fullness: 17.77, SDR: 11.12\n\n- lucidrains’ GitHub account was suspended without warning, and along with it, all the repos are gone. They have been moved [here](https://codeberg.org/lucidrains) and [here](https://gitlab.com/lucidrains).\nDear Microsoft,\n\nHave you fucking lost your mind?\n\n- Anvuew released a new BS-Roformer 22.5050 de-reverb [model](https://huggingface.co/anvuew/dereverb_bs_roformer/tree/main)\n\n“Very interesting model, sounds like the Mel-Roformer one but even more aggressive, it's good”, it works on “reverb effect on vocals” at least - isling\n\n“sounds cleaner than the `mono sdr 20.4029` one” - rainboomdash\n\n“The dataset is the same as mel, all reverb is generated by VST plugins and includes waves IR1 presets. 
so it depends on whether you consider IR1 to be room reverb” - anvuew\n\n- It turns out someone found a way to get GPU conversion working under Wine by using Bottles to run the newer [UVR](#_6y2plb943p9v) Windows codebase, which supports the newest Roformers better than the outdated native Linux code (which is also convoluted to install).\nIt might be slower than [MSST](#_2y2nycmmf53) due to the additional translation layer, so consider MSST instead, but it was still relatively fast on an RTX 5060 using Demucs (other users have independently confirmed that it works):\n\n\"My way to easily run UVR on Linux:\n\nI just downloaded Bottles\n\n<https://usebottles.com/> (which uses WINE) and used the provided .exe file from the repository's releases. I created a new bottle with a gaming profile (to utilize GPU) and moved the exe file into \"drive\\_c\" (otherwise it won't work), than just ran through the installer and it worked like a charm!\" ([src](https://github.com/Anjok07/ultimatevocalremovergui/issues/2108#issuecomment-386623143))\n\nBTW, “Soda is bottles internal WINE fork”\n\n- “Discord will require a face scan or ID for full access next month (...) Users who aren’t verified as adults will not be able to access age-restricted servers and channels, won’t be able to speak in Discord’s livestream-like “stage” channels, and will see content filters for any content Discord detects as graphic or sensitive. (...) However, some users may not have to go through either form of age verification. Discord is also rolling out an age inference model that analyzes metadata like the types of games a user plays, their activity on Discord, and behavioral signals like signs of working hours or the amount of time they spend on Discord.” - [theverge.com](https://www.theverge.com/tech/875309/discord-age-verification-global-roll-out)\n\n- In February 2026, Chrome on Windows started closing after opening this document. 
Reopening it a few times when it closes helps. Earlier, I also tried this together with uninstalling the offline GDoc extension (restart the browser afterwards) and, later, going to chrome://settings/content/all?sort=data-stored > Google > docs.google.com (I did all of these), but the issue kept recurring. Also, merely disabling the extension didn't stick, as it was able to re-enable itself; the issue might also be caused by an incompatibility with another browser extension. The same thing even started happening after opening YT, and reopening the browser multiple times ultimately helped in both cases. A few days later, the same happened on Gmail.\nYou can also just download this document as a PDF, or as a DOCX to preserve an up-to-date table of contents (although the DOCX lists every heading used in the document, so it will be messy).\n\n- “Github is facing issues since some hours\n\n<https://www.githubstatus.com/>”. It causes at least some problems with cloning, and random 500 and 128 errors in Colabs.\n\n- MVSEP introduced a limit of 50 separations per day for free users ([more](https://discord.com/channels/708579735583588363/911050124661227542/1468312326385176812)). Some free users had been clogging the queue substantially.\n\n“There won't be reset [time], it will just count all separations in the last 24 hours.”\n\n- Jasper (a.k.a. 
jazzpear) made an auxiliary/helper model for use alongside DNRv3 and a vocal model for [speech](#_o6au7k9vcmk6), which extracts foreground and background SFX (it wasn't trained on voice and music; [more info](https://discord.com/channels/708579735583588363/708580573697933382/1467461947187527814))\n\nMDX23C: [model](https://drive.google.com/file/d/1ftKK4G9aOPknAm5wvGsJpZpFJ5L4rjBe/view?usp=sharing) | [yaml](https://drive.google.com/file/d/1xo37CIqjmpgSBM5ZENsVYysQs_sILPiv/view?usp=sharing)\n\n- Mel-Roformer for background music from a movie: [model](https://drive.google.com/file/d/10eTjyjmUwdjTgfEBS4Cwks-XxBLEkEEu/view?usp=sharing) | [yaml](https://drive.google.com/file/d/1WDgbQPTykaEVNrG4qRy1FlwE_YCiOE6V/view?usp=sharing) (stems: voice with SFX/BGM without singing voices); you might want to use a vocal model as a preprocessor here too.\n\nCompatibility with UVR is not guaranteed; consider using [MSST](#_2y2nycmmf53) in case of any issues.\n\n- Gabox released [voc\\_fv7](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv7.ckpt) Mel-Roformer | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/v7.yaml) | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nBleedless: 33.85, fullness: 17.99, SDR: 11.16[’](https://mvsep.com/quality_checker/entry/9588)\n\nBeta 6X metrics for comparison:\n\nBleedless: 35.16, fullness: 17.77, SDR: 11.12\n\nHyperACE voc V2 for comparison:\n\nBleedless 34.08, fullness 19.10, SDR: 11.40\n\nResults sometimes sound less noisy than BigBeta5e (but it depends on the song) and voc HyperACE (which might sound less muddy, but noisier), and it catches harmonies much better than BigBeta6X:\n\nvoc\\_fv7 is \"a little noisy, quiet, but at least it's capturing [the harmonies] (...) a little more muddy than big beta 6x, big beta 6x is already muddy... so it's just.. 
meh...\n\ndefinitely captures harmonies a LOT better than big beta 6x.\n\nelsewhere from those, it's pretty mixed... sometimes big beta 6x was better, sometimes fv7. (...)\n\nI tested a few more songs, a couple fv7 was fuller, but most songs it's more bleedless compared to big beta 6x.\n\nAs with all models, it varies drastically song to song on how full it is compared to other models\n\nfv7 was also cutting the reverb really aggressively on one song compared to big beta 6x. I'd personally like if it was a little fuller, like in-between fv7beta2 and fv7.\n\nBleedless models usually have significant issues with backing vocals, they make the vocal way quieter than it's supposed to be to suppress noise (...) I am noticing fv7 frequently creates bursts of noise, but it's \\*usually\\* good.. it's doing that to capture something difficult\n\nwhile big beta 6x doesn't do that as much\n\noverall, fv7 does seem better than big beta 6x.. it might have a \\*slightly\\* higher chance of capturing instruments.. but far less than fv7beta did.. I would need to do more testing to really know. and it's pretty comparable to big beta 6x in noise level, so I'd say a good upgrade overall.. big beta 6x is usually good, but it falls apart sometimes, especially some backing vocals it just completely falls apart\n\nvoc hyperace is noisy in comparison (...) bleedless score is about the same between fv7 and voc hyperace.. but the fullness of voc hyperace is about 1 higher in metrics\n\nwhich more would line up with how much noise I'm hearing.. it's def perceptually noisier (...) “def picks up more instruments than big beta 6x :/ but it picks up harmonies waaaay better... meh (...) 
it's picking up a lot of partial intact stuff from the instrumental\n\nbut the overall noise level is relatively low\" - rainboomdash\n\n- “I released example of Telegram bot: <https://github.com/ZFTurbo/MVSep-Telegram-Bot-Example>\n\nIt was mostly prepared by a student, so be [forgiving] please.\n\nAlso I put bot on the permanent run here: <https://t.me/MVSep_com_bot>\n\nReport [here](https://discord.com/channels/708579735583588363/911050124661227542/1465068360009253135) what changes you want if any.” - ZFTurbo\n\n“it's always 120bpm. if you want it to play at the speed of the original file, set your project bpm to 120” - isling\n\n- Unwa released [BS-Roformer-Large-Inst](https://huggingface.co/pcunwa/BS-Roformer-Large-Inst) a.k.a. bs\\_large\\_v2\\_inst (there was no v1) | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nFullness: 32.06, bleedless: 43.95, SDR: 17.61\n\nIt uses a custom bs\\_roformer.py file attached - just replace it in the MSST installation (UVR is incompatible).\n\nTraining details:\n\n\"Instead of increasing the depth to 16, I added a four-layer TransformerBlock to the MaskEstimator.\" 238MB weight.\n\n“sometimes was good and other times it completely missed the separations”, more noise than flowersv10 model - mesk ([spectrogram](https://discord.com/channels/708579735583588363/708579735583588366/1465030828336611573) example with lacking fragments missed by the model). 
“It leaves residue in some places” - fabio06844\n\n“Just looking at metrics, fullness is ranked about the same as resurrection FNO” - rainboomdash\n\n- The Gabox Flowers FV10 model has been added to uvronline.app, available via the following links:\n\n<https://uvronline.app/ai?discordtest>\n\n- link for free accounts\n\n<https://uvronline.app/ai?hp&test>\n\n- link for premium accounts\n\n- The phase fixer Colab seems to be fixed [here](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm)\n\n- xlancelab, who won the last raw [stems restoration competition](https://www.arxiv.org/abs/2601.04343), released .pth models based on the BS-Roformer SW model, along with dedicated code for chained separation (separation > dereverb > denoise).\n\nIn addition to the SW stems, they had to train separate percussion and drums stems, as well as orchestra and synthesizer stems, for 9 stems overall (besides the included Aufr33 27.99 denoise and anvuew 19.17 dereverb models, which were also converted to single .pth files).\n\n[Inference](https://huggingface.co/spaces/chenxie95/xlance-msr/tree/main) | [models](https://huggingface.co/chenxie95/xlance-msr-ckpt/tree/main)\n\n“Their training used L1 loss combined with multi-resolution STFT loss on MoisesDB and a manually cleaned version of the RawStems dataset (...) 
single H200 GPU for large scale training and a single 4090 GPU for small scale training [was used]” [more info](https://huggingface.co/spaces/chenxie95/xlance-msr/blob/main/system_card.md) (thx jarredou, becruily, anvuew)\n\n“i tested vocals with one song, turns out it's supposed to give you dry lead vocals (didn't sound very good anyway)” - becruily\n\n- (MVSEP) “New model MVSep SATB Choir (soprano, alt, tenor, bass) has been added.\n\nDescription: <https://mvsep.com/algorithms/104>\n\nDemo 1 vocals: <https://mvsep.com/result/20260108154639-f0bb276157-mixture.wav>\n\nDemo 2 vocals: <https://mvsep.com/result/20260108155023-f0bb276157-mixture.wav>\n\nDemo strings: <https://mvsep.com/result/20260108154828-f0bb276157-mixture.wav>\n\nVery big thanks to Dry Paint Dealer Und for helping me to create this model.\n\nP.S. Model works not only with vocals but with strings too\" - ZFTurbo\n\nIt also works for instruments, e.g. piano layers, being \"able to split chords into each individual layer (...) and such\" - pitbulldale305\n\nIt's a BS-Roformer, not a fine-tune of the previous model: “Metrics [are] much better, so I'm not sure if it's reasonable to use the old model.”\n\nBS Roformer 11.89 is currently used for the \"Extract vocals first\" option.\n\n\"it could be worth exploring running karaoke model first and then SATB, since SATB might combine the lead with some bvs (unless this is expected)\" - becruily\n\n\"Im sure if you used a combination of a karaoke model to get a clean lead/backing stem, then put the backing through this model you would be so much better off than we have been\" - dynamic64\n\n\"Works pretty well on both inverted acapellas or official acapellas, official acapellas definitely would have cleaned BV\".\n\nBesides vocals, since it was trained on DPDU’s dataset, it was also trained on MIDI strings. Plus, from what ZF once said, he usually doesn't train from scratch, but rather retrains a model which already knows some patterns. 
It's also good for dubstep instrumentals, and \"I think this is really useful for audio to midi\" and sheet music transcriptions - pitbulldale305, dynamic64\n\nQ: How close are we to actual harmony separation?\n\nA: We're there. With a combo of karaoke and this SATB model. It's probably more manual than you'd like it to be, though - dynamic64\n\nQ: What's the difference then between the Choir and Choir SATB models, i.e. why would anyone choose Choir if you can use Choir SATB? Are the accuracy numbers higher on Choir?\n\nA: Choir separates out only the choir, SATB splits the whole track into those stems.\n\nIf you want to separate a choir out of a song, you need the Choir model. - dynamic64\n\n- “New model MVSep Choir (choir, other) has been added.\n\nDemo: <https://mvsep.com/result/20260107221631-f0bb276157-mixture.wav>” - ZFTurbo\n\n(works e.g. for choirs buried under the main vocal in pop music)\n\n- Gabox released [inst\\_gaboxFlowersV10](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gaboxFlowersV10.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/v10.yaml)) Mel-Roformer | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY)\n\nInst. bleedless 36.68, fullness 37.12, SDR 16.95 ([entry](https://mirror.mvsep.com/quality_checker/entry/9505))\n\nAll the metrics are better than Inst\\_FV8b\n\n(Inst. 
bleedless: 36.90, fullness: 35.05, SDR 16.59)\n\nConsider phase fixing it using the instrumental result of the Becruily vocal model - prodbyluke/ghostofmiami\\_\n\nIncompatible with the RTX 5000 UVR patch (users will encounter “ModuleNotFoundError: No module named 'torch.\\_dynamo.polyfills.fx'”)\n\nChunk size for the flowers model in the Colab should be 352800.\n\n“it was going to be \"inst\\_gabox\\_deuxnt\"\n\nit is deux, just with \"other\" target (...)\n\nsome pop/rock n'roll added” to the dataset - Gabox.\n\nThe model is planned to be trained further in the future.\n\nBe aware that the yaml is new, and you cannot use any old one for that model.\n\n“a huge improvement over what Gabox has accustomed us to!!!! I got amazing results. It truly achieves a stability that previous models lacked; the clarity is much greater in these.” - billieoconnell.\n\n“sounds pretty good tbh (at least for my songs)” - neoculture\n\nMore sensitive in keeping some tiny sounds compared to gaboxflowersv10 - sakkuhantano\n\n- (MVSEP) “Three new models have been added.\n\nIn BS Roformer (vocals, instrumental):\n\n1) unwa BS Roformer HyperACE v2 instrum (SDR instrum: 17.40)\n\n2) unwa BS Roformer HyperACE v2 vocals (SDR vocals: 11.39)\n\nIn MelBand Roformer (vocals, instrumental):\n\n1) becruily deux (SDR vocals: 11.35, SDR instrum: 17.66)”\n\nWhen you search for “deux”, it doesn't appear for some reason; you need to go to the MelBand section and find it near the end of the model list.\n\n- jarredou evaluated different chunk\\_size settings and how they affect different evaluation metrics on the deux model. 
The vocal stem was tested, but he said the inst stem shows similar curves.\n\nBest SDR: 705600 (16 s at 44.1 kHz; possibly the least crossbleeding)\n\nBest fullness: 1102500 (25 s; requires lots of VRAM)\n\nBest bleedless: 441000 (10 s; still not low enough for 4GB VRAM on AMD and Intel GPUs in UVR)\n\nDefault in the yaml: 573300 (13 s; becruily’s recommendation, at least for now)\n\n“I also did the same test as jarredou, before releasing and indeed 705600 gave the highest SDR but it also added a bit more noise (for instrumentals it might be good, for vocals not so much)\n\nso I found 573300 to be a good middle ground between fullness/bleedness/SDR” - becruily\n\n“882000 - seems the maximum viable one if you aim for best fullness before such diminishing returns” - makidanyee\n\n“I do recommend trying higher chunk sizes for instrumentals with deux\n\nI find 661500 works for a lot of songs, 749700 for a good amount of others\n\nhigher=more fullness, less bleedless” - rainboomdash\n\n[More](https://discord.com/channels/708579735583588363/708579735583588366/1456544561336946772)\n\n- (uvronline) “I updated the pre/post processing for the Male/Female and Mel-RoFormer Lead/Back models. Now the Deux model is used, so the instrumentals will be less muddy.” - Aufr33\n\n- (uvronline) The new Deux model by becruily has been added. ~~It uses vocals stem and instrumental is inversion, not dedicated instrumental stem, and that's how becruily recommends to use it.~~ This was actually a mistake, and it has been fixed.\n\n“Another improvement: since the deux model creates two stems, phase correction is now applied immediately. You don't need the correct phase button” - Aufr33. Previously, at least, the phase fixer was available only for premium users.\n\n“First, phase inversion is performed, resulting in a muddy instrumental. 
But the result is only used for phase correction. The second stem, which is available for download/listening, is more full.”\n\n- (MVSEP) New model MVSep Celesta (celesta, other) has been added.\n\nDemo: <https://mvsep.com/result/20251230133507-f0bb276157-mixture.wav>\n\n- Thanks to Ari/arxynr, we have rewritten the [phase fixer](#_8rocw7cwj55) section, so it no longer contains the mistakes that prevented you from getting correct results.\n\n- Becruily released a dual (two-stem) Mel-Roformer model called “[deux](https://huggingface.co/becruily/mel-band-roformer-deux/tree/main)” for vocal:\nVoc. bleedless: 28.30, fullness: 23.25, SDR: 11.37\nand instrumental separation:\nInst. bleedless 41.36, fullness 34.25, SDR 17.55\n\nCompatible with the UVR Roformer [patch](#_6y2plb943p9v), including the RTX 5000 one | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“Unfortunately it won't null to the mixture perfectly, this applies to any multi stem model” - becruily\n“[for instrumental] the best fullness model and does not even need phase fix. SOTA even (...) my fave fullness model” - dca100fb8\n“[for inst] more bleedless than resurrection inst (...) on a song with mostly piano, sounds considerably less muddy than resurrection inst... maybe overall slightly more noise... but the noise isn't bothersome” Resurrection inst seems to be fuller at times, but also noisier, and when the song has noise in it intentionally, ress inst tends to pick it up - rainboomdash. Continued:\n\n“deux just seems to make some things super muddy and/or break them up... like an instrument here is just really wobbly with deux, probably because it doesn't have the fullness required.\n\nhyperace sounds better in areas here, but noisy in others.. 
I guess you really gotta use all 3 models if you want a great result”\n\n“Love the new model, besides it being on par with v1e in terms of fullness (on some tracks it's even fuller) without the noise tradeoff, it's also noticeably better in terms of bleedlessness as well, it managed to completely remove some faint choir/vocals pretty much all other models weren't able to remove. Most definitely my favorite model so far.” - Shintaro\n\nDeux doesn't work well with VHS recordings - MarsEverythingTech\n\n“vocal model picks up some things super well, like certain backing vocals or there was yelling in the background that it picked up that fv7beta didn't (...) generally slightly less full than fv7beta3 from the songs I tested, but has less instrumental bleed and slightly less noise.” - rainboomdash\n“[voc] sounds great! I don't hear the dips in the vocals from some songs that are overly compressed” - Rage313\n\n“deux vocal stem has more backing vocals than gabox fv7 beta 1-3 and other models I've ever tried, but it may be rather noisy on silent parts or fadeouts” - makidanyee\n\n“I exported it using fp16 just for the smaller size [only 432 MB, the] quality is the same.” - becruily. The model was trained “almost” from scratch on a rented GPU.\n\nGabox tried to convert it into an instrumental-only model, but the quality got worse.\n\n- (MVSEP) “New model MVSep Xylophone (xylophone, other) has been added. Demo: <https://mvsep.com/result/20251223210226-f0bb276157-mixture.wav>”\n\n- Gabox released the [Inst\\_GaboxFv9](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv9.ckpt) Mel-Roformer\n\nInst. 
bleedless 36.20, fullness 37.19, SDR 16.56\n\n- unwa released the bs\\_roformer\\_inst\\_hyperacev2 [model](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main/v2_inst) (alongside the voc v2)\nIncompatible with UVR | use [MSST](#_2y2nycmmf53) | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nInst. bleedless 37.87, fullness 38.03, SDR 17.40\n\n“HyperACE is def way better than v1e+. but I have also noticed it's more noisy” - rainboomdash\n\n“The model files (bs\\_roformer.py) for v2\\_voc and v2\\_inst are the same. (...)\n\nThe aura\\_mrstft score was further improved, and the SDR also increased. (...)\n\nIncidentally, this new model outperformed v1e+ on all metrics. (...) holds the highest aura\\_mrstft score on the instrumental side of the Multisong dataset.” - unwa\n\n<https://mvsep.com/quality_checker/entry/9475>\n\n“kept the chants in [one song], resurrection inst isn't really much better, though...\n\nit takes a high vocal fullness model to extract these (like fv7beta1, even fv7beta2 isn't enough), maybe that's why the instrumental models are getting confused... Surprised it's still very much there with resurrection inst…\n\nI'm surprised; hyperacev2 fixed vocal bleed on one song compared to the previous version. 
Does seem like vocal bleed with hyperace v2 is a bit better\n\nit's not removed, just quieter in areas where it did happen” - rainboomdash\n\n“picks up vocals better than the first model, but along with the vocals, it muffles other instruments, such as a synthesizer.” - Halif\n\n- The previously released FNO inst model by unwa now has a separate [Colab](https://colab.research.google.com/drive/14kuHLSZm4QjqMHVg14l956joSQjzafea?usp=sharing) #3 to run it\n\n- Along with the inst variant, unwa released the BS-Roformer HyperACE v2 vocal [model](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main/v2_voc)\nseparate [Colab](https://colab.research.google.com/drive/1bd8qmLaE6WSix7M-TNs9Oj948SpHqjJc?usp=sharing) #2 | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link) | (incompatible with UVR, use [MSST](#_2y2nycmmf53))\nVoc. bleedless 34.08, fullness 19.10, SDR 11.40 ([entry](https://mvsep.com/quality_checker/entry/9470)) (there was no v1 vocal variant of HyperACE)\n\n“The model files (bs\\_roformer.py) for v2\\_voc and v2\\_inst are the same.”\n\nIs it better than Beta 6X?\n\n“hard to say, imo, tbh\n\nit has some added noise, especially during some quieter parts that big beta 6x doesn't have. Noisier in those areas a lot of times than even fuller models. unwa thinks it's overall better” - rainboomdash\n\n“Compared to the Resurrection vocal model, this new model achieves higher scores across the board except for the bleedless score. (...) 
This model aims to outperform Big Beta 6X overall, and I believe it has achieved that goal.\n\nScores for all metrics except bleedless and log\\_wmse have improved compared to 6X.” - unwa\n\n“I feel like most songs hyperace gets a good bleedless result but ill probably stick w voc fv7 beta3 or revive e3” - 5b\n\n“It seems to be like.. trying really hard to pull certain stuff out that is causing noise, I think\n\nit doesn't sound that bleedless to me, due to that reason.\n\n(...) there's some noise during some quieter parts but I'll def be adding this to the list of models I use. I was testing against fv7beta2, during louder parts there seemed to be less noise, but during quieter parts there seemed to be more noise (...) I'll prob use fv7beta2 for the most part still, but I'll try adding hyperace vocal model to the mix of models I use. lol, soon I'll be using 10 models throughout one song” - Rainboomdash\n\nNote: It uses its own model code (bs\\_roformer.py), different from the previous one, and it’s also incompatible with UVR. “You can use this model by replacing the MSST repository's models/bs\\_roformer.py with the repository's bs\\_roformer.py.”\n\nReplacing that file may break other BS-Roformer models, so to keep older BS-Roformers working, you can instead add it as a new model\\_type by editing utils/settings.py and models/bs\\_roformer/\\_\\_init\\_\\_.py as shown [here](https://imgur.com/a/dkGXo2r) (thx anvuew).\n\nIf you get the following error while installing the .py file for the HyperACE model in Sucial’s WebUI:\n\nfrom models.bs\\_roformer.attend import Attend\n\nModuleNotFoundError: No module named 'models'\n\nThe fix: “SUC-DriverOld/MSST-WebUI use the name \"modules\" and ZFTurbo/Music-Source-Separation-Training use the name \"models\". And Unwa's bs\\_roformer.py that you replace with, also use \"models\". 
So you'll have to do some coding and symlink to make it work.” - fjordfish\n\nTraining details: “In v2, the TFC-TDF module used in models like the MDX23C has been added to the FreqPixelShuffle module.\n\nAdditionally, frequency-domain downsampling is now performed downstream of the Backbone module.”\n\nSome components in the SegmModel module were implemented based on this paper: <https://arxiv.org/abs/2506.17733>\n\n“HyperACE is the core of YOLOv13 (and when i asked about that, he replied with [this](https://miro.medium.com/v2/resize%3Afit%3A720/format%3Awebp/1%2AuJ2USYRtHQ7S7Uhn6WiUoA.png) graph, from [here](https://sh-tsang.medium.com/brief-review-yolov13-real-time-object-detection-with-hypergraph-enhanced-adaptive-visual-a93200963687))”\n\n- (MVSEP) “I added a new Crowd removal model based on BSRoformer architecture. It's available in \"MVSep Crowd removal (crowd, other)\" with name \"BS Roformer (SDR crowd: 7.21)\". SDR increased from 6.27 up to 7.21.” - ZFTurbo\n\nSome people have issues with it:\n\n\"The 6.27 one removed all kinds of crowd noises, sound effects and general noise, this one only removes random bits of music!\"\n\n- The HyperACE model has been added to uvronline.app\n\n- Gabox uploaded an experimental “test” model, “instmel.ckpt”, to a file-hosting site: “(muddy, no losses)” - Gabox\n(it won’t work in the custom import Colab, as that host doesn’t support direct links):\n<https://gofile.io/d/jJbBtm>\n\nAlso, a small clarification about the older models was provided:\n\nfv7b - uses MSE loss\n\nfv7z - uses L1 loss\n\nfv7 - uses STFT loss\n\n- cyatarow, together with unwa, found a way to run MSST natively on Windows on an RX 9060 using ROCm, and released PyTorch for ROCm on Windows that works without WSL - [click](https://discord.com/channels/708579735583588363/708595418400817162/1449395523449520209)\n\n- Instructions for MSST-WebUI have been written too - [click](https://discord.com/channels/708579735583588363/708595418400817162/1449671635350192280)\n\n- Our user had some success in 
porting our Colabs to Kaggle. So far, the Inference and Apollo Colabs have been ported, and the porting process seems rather straightforward.\n\nBe aware that the ports might gradually fall behind as newer models are published.\n\n<https://www.kaggle.com/code/zzryndm/music-source-separation-training-inference-webui>\n\n<https://www.kaggle.com/code/zzryndm/apollo-colab-inference-i-fucking-give-up-with-this>\n\nHow it was done:\n\n“I didn't even have to debug anything. Pasting the Colab's code without changes worked miraculously.\n\nKaggle doesn't support markdown IN code cells but I found a workaround using U+200C for the variable names.\n\nAlso set the acceleration to GPU P100. It's better than t4x2. I learned that through the \"patience is a virtue\" route.\n\nAnd trust me when i say it was NOT fun” - ryn (xxml)\n\n- Gabox voc\\_fv7 beta 3 has been added to the [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- (MVSEP) “Four new models have been added:\n\n1) MVSep Percussion (percussion, other) Demo: <https://mvsep.com/result/20251128141738-f0bb276157-mixture.wav>\n\n2) MVSep Keys (keys, other) Demo: <https://mvsep.com/result/20251128142835-f0bb276157-mixture.wav>\n\n3) MVSep Brass (brass, other) Demo: <https://mvsep.com/result/20251128142905-f0bb276157-mixture.wav>\n\n4) MVSep Woodwind (woodwind, other) Demo: <https://mvsep.com/result/20251128143157-f0bb276157-mixture.wav>”\n\n- The unwa BS-Roformer HyperACE inst model has been added to a [separate Colab](https://colab.research.google.com/drive/1lqHRm_h122qgpxLyx3xfHsrVei6ASx1t?usp=sharing)\n\n- Gabox released two new models (vocal and instrumental):\n\nMel [vocfv7beta3](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta3.ckpt) | 
[yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml)\n\nVoc. fullness 21.82 | bleedless 30.83 | SDR 10.80\n\n“beta 1 and 2... eh, pretty close to same instrumental bleed,\n\nbut beta 3 def a step up from the two songs I compared (...)\n\nmost songs so far, fv7beta3 is fuller than fv7beta1,\n\ndef less robotic sounding at times (when a voice gets quiet/hard to capture, and it just fails).\n\nJust had another song where fv7beta1 was fuller than fv7beta3, but it was also a lot noisier\n\nlarge majority of the songs I tested, fv7beta3 was fuller... I think fv7beta3 is usually a bit noisier than fv7beta1? But also sounds fuller in those cases, I'd say it's generally worth it\n\ninstrumental bleed, usually worse with fv7beta3 versus fv7beta1, but it depends\n\nfv7beta2 is always less full/less noise, but only slightly less instrumental bleed than fv7beta1” - rainboomdash\n\nMel [inst\\_fv7b](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/inst_fv7b.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)\n\nInst. fullness 27.07 | **bleedless** 47.49 | SDR 16.71\n\n“this may be the last beta before the final model” - Gabox\n\nIt has the highest bleedless metric of all instrumental models so far, but its fullness is worse than even most vocal models (including BS-RoFormer SW and the Mel Kim OG model).\n\n“On the fuller side, somewhere around inst v1e+, maybe a tiny bit below. The main thing I notice is it captures more instruments than v1e+, but isn't muddy like [inst Resurrection] (which also captures more instruments) (...) It can add a lot of crackling noise, though, more than v1e+ (...) can be a little on the noisy side sometimes... but it at least isn't muddy and sounds natural (...) 
I'd still ensemble if you want the noise reduced” - rainboomdash\n\nCompared to fv7z, it has slightly more buzzing, but it's fuller with better transients, and fv7b doesn’t eat hi-hats in sparse mixes.\n\n([src](https://discord.com/channels/708579735583588363/708580573697933382/1441680934129631325))\n\n- Unwa released the [BS-Roformer-HyperACE](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main) instrumental model | [separate Colab](https://colab.research.google.com/drive/1lqHRm_h122qgpxLyx3xfHsrVei6ASx1t?usp=sharing)\n\nInst. fullness 36.91 | bleedless 38.77 | SDR 17.27\n\n(lower fullness than v1e+, but higher bleedless and SDR; v1e+ scores fullness 37.89, bleedless 36.53, SDR 16.65)\n\nNote: It uses its own model code (bs\\_roformer.py). “You can use this model by replacing the MSST repository's models/bs\\_roformer.py with the repository's bs\\_roformer.py.”\n\nReplacing that file may break other BS-Roformer models, so to keep them working, you can instead add it as a new model\\_type by editing utils/settings.py and models/bs\\_roformer/\\_\\_init\\_\\_.py as shown [here](https://imgur.com/a/dkGXo2r) (thx anvuew).\n\nIf you get the following error while installing the .py file for the HyperACE model in Sucial’s WebUI:\n\nfrom models.bs\\_roformer.attend import Attend\n\nModuleNotFoundError: No module named 'models'\n\nThe fix: “SUC-DriverOld/MSST-WebUI use the name \"modules\" and ZFTurbo/Music-Source-Separation-Training use the name \"models\". And Unwa's bs\\_roformer.py that you replace with, also use \"models\". So you'll have to do some coding and symlink to make it work.” - fjordfish\n\n“Currently, this model holds the highest aura\\_mrstft score on the instrumental side of the Multisong dataset. (...)\n\nSome components in the SegmModel module were implemented based on this paper: <https://arxiv.org/abs/2506.17733>\n\nSimply put, it's a module that utilizes hypergraphs to capture global relationships and standard convolutions to capture local relationships, thereby generating the final “Correlation-Enhanced” feature map.\n\nThis weight is based on the following weights. 
Thank you, anvuew!\n\n<https://huggingface.co/anvuew/BS-RoFormer>” - unwa\n\n- The inference file has been updated to fix an error.\n\nConsider changing overlap in the model's yaml from the default 4 to 2. The difference won’t be really noticeable for most people, but separation will be faster.\n\n“thirty minutes of audio [on 4090]:\n\nFNO was 71.38 seconds, HyperACE was 120.37\n\nso HyperACE is about 2x longer than FNO [~1.7x by these numbers] (...) Does seem like HyperACE is picking up more instruments than v1e+\n\ndoes seem like slightly worse vocal bleed overall (still need to test this more, though)... haven't encountered the super tinny vocal bleed like v1e+, at least\n\nstill fails to pick up that brass instrument on one song... Not really any worse than v1e+, though (...) resurrection inst does sound more muddy, but also a lot less noise... which makes sense... IDK, a little muddy for my tastes.\n\nI did find one song/spot and resurrection inst was on par with HyperACE in picking up the wind instrument, v1e+ lost it for a bit.\n\nI have found in the past that resurrection inst generally picks up more instruments than v1e+ (...) fullness of HyperACE is much closer to v1e+ than resurrection inst (...) it gets pretty staticy compared to v1e+ [on some drums] (...) v1e+ does this to a lot less extent\n\nit's not super common, though… (...) 
I'm very confident in saying HyperACE picks up more stuff than v1e+.\n\nResurrection inst does pick it up much better than v1e+, but I think it's still too quiet\n\nresurrection inst really does just pick up so much more instruments, despite having a lot less fullness” - rainboomdash\n\n“fullness that is comparable to v1e+, but has significant more vocal crossbleeding in instrumental than BS Roformer Resurrection Inst, but still less than v1e+ and v1e” - dca100fb8\n\n“the best instrumental model ive ever heard\n\nUnbelievable how realistic it sounds\n\nespecially with bass and piano” - PezZHasACat/pezz23\n\nMight have problems with flute in specific songs - Hen\n\n- (MVSEP) “We have released a new model 'MVSep Lead/Rhythm Guitar (lead-guitar, rhythm-guitar)'. It has two variants:\n\n1) Two-stage model (SDR: 9.21) - Best guitar model applied, and then 2-stem model is used which can separate lead/rhythm guitar.\n\n2) One-stage model (SDR: 9.02) - Single model is applied, which was trained on a 3 stem dataset.\n\nThey can give pretty different results, so worth trying both.\n\nDemo: <https://mvsep.com/result/20251120090832-f0bb276157-mixture.wav>\n\n- (MVSEP) We have released the \"MVSep Plucked Strings (plucked-strings, other)\" model.\n\nDemo: <https://mvsep.com/result/20251120092757-f0bb276157-mixture.wav>” - ZFTurbo\n\n- fr4z49 reported that they managed to use MSST with ROCm 7 and 6 on Linux with an AMD RX 7600 for fast separations. This GPU isn't officially [supported](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) by AMD, but it works, although your mileage may vary from one GFX target to another ([range](https://llvm.org/docs/AMDGPUUsage.html#processors) of GPU models across various generations/archs).\n- The 5700 XT might also work with some older ROCm versions (e.g. older than 6.3.3), such as HIP 5.7 and ROCm around 5.2.\\* - [src](https://www.reddit.com/r/ROCm/comments/1gcf3x4/comment/ma62qop/). 
You can also try 6.2.1 or another 6.2.x release to make sure, as some earlier 6.x versions might not have supported the RX 5700 XT correctly (while e.g. ROCm 6.2.4 worked for the RX 6000 series in some apps at some point). More up-to-date information might be found in ZLUDA guides, as ZLUDA needs ROCm too - some suggestions [here](https://discord.com/channels/708579735583588363/875539590373572648/1429855493668732989).\n\n- The largest performance gains on ROCm 7 will probably be seen on officially supported GPUs like the Instinct MI350 (CDNA 4), with up to 3.8x gains over ROCm 6.0 in some applications ([more](https://www.techpowerup.com/341074/amd-launches-rocm-7-0-up-to-3-8x-performance-uplift-over-rocm-6-0)).\n\n- Official ROCm support for RX 400/500 (a.k.a. Polaris/GCN 4/GFX803) GPUs was dropped, but you can follow [this](https://github.com/robertrosenbusch/gfx803_rocm) repo for unofficial ROCm 6 support.\n\nFor ROCm 5, follow [this](https://github.com/nikos230/Run-Pytorch-with-AMD-Radeon-GPU) Ubuntu guide instead (it might even work from Windows using WSL [with at least Ubuntu 22.04 LTS] with almost no GPU performance overhead). There seem to be some issues building Torch (UtilsAVX512.cc/tensorpipe) on Python 3.13, which were fixed by using Python 3.10 (and maybe 3.11.9).\n\nThere also seems to be an Arch Linux community package for installing a PyTorch build still compatible with these GPUs ([click](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux)).\nThese GPUs might also be supported by some other specific ROCm versions, e.g. 5.7.2 (also described above), using:\n\n*export ROC\\_ENABLE\\_PRE\\_VEGA=1* (deprecated in ROCm 6; might help with missing dependencies or wheel building issues). 
Also, check this out:\n\n<https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/10435#issuecomment-1555399844>, or alternatively follow the instructions below:\n\n<https://pytorch.org/get-started/locally/> and then execute:\n\n*pip3 install torch torchvision torchaudio --index-url* [*https://download.pytorch.org/whl/rocm5.4.2*](https://download.pytorch.org/whl/rocm5.4.2)\n\n- Since then, ROCm 6.4.4 has also been released, allowing PyTorch to run natively on Linux and Windows on RX 7000 and 9000 GPUs ([more](https://www.techpowerup.com/341329/amd-enables-pytorch-on-radeon-rx-7000-9000-gpus-with-windows-and-linux-preview)), but it hasn't been tested yet ([DL](https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-PREVIEW.html)). You might get this [error](https://discord.com/channels/708579735583588363/708595418400817162/1439447187183505541), at least in WebUI.\n\n- You might also want to experiment with ZLUDA (a CUDA>ROCm translation layer) in UVR - some suggestions [here](https://discord.com/channels/708579735583588363/875539590373572648/1429855493668732989).\n\n>fr4z49 ROCm report:\n\n- “I managed to make [MSST-WebUI](https://github.com/SUC-DriverOld/MSST-WebUI) work [on Linux] with:\n\n*Torch 2.10.0.dev20251110+rocm7.0*\n\non RX 7600\n\n(...) 
it seems like ROCm 7.0 is about a second faster [than 6.x]”\n(probably installed by simply prefixing that version with pip install)\n\nTurns out that if you do:\n\n*export TORCH\\_ROCM\\_AOTRITON\\_ENABLE\\_EXPERIMENTAL=1*\n\nit uses waay less VRAM and processes even faster.\n\ninst\\_V1e\\_plus batch\\_size=2 overlap=3 chunk\\_size=485100, 51.78s/it [3:50 of audio in 61 seconds]\n\nFor ROCm 6.x (a tad slower, might work on more GPUs) use:\n\n*torch 2.9.0+rocm6.3 torchvision 0.24.0+rocm6.3 [--index-url https://download.pytorch.org/whl/rocm6.3]*\n\nOr the older version suggested above.\n\nThanks, fr4z49.\n\n- yxlllc’s harmonic-noise separation VR model (v6 or v5.x arch, unsure), in case anyone is interested:\n<https://github.com/yxlllc/vocal-remover/releases/tag/hnsep_240512> (July 2025)\n\n- Gabox released beta 2 of the vocfv7 Mel-Roformer: “fullness went down a little bit”\n\nVoc. bleedless: 31.55, fullness: 20.44, SDR: 10.87\n\n<https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta2.ckpt> | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (TL;DR in the vocals section)\n\n“still quite a bit fuller than big beta 6x, but has less noise than even fv4 (also a bit less fullness, of course)” at least when the instruments are loud, fv7beta2 is usually quite a bit less noisy than fv4, while still maintaining a decent amount of fullness... it is a bit less, but not too much (...) both are pretty noisy with fv4 (...)\nstill gonna have an issue with backing vocals compared to fv7beta1 sometimes... (makes sense, it's a less full model). (...) Fv7beta2 has still been significantly better with BV than fv4, despite quite a bit less noise” but “significant issues on one song, while fv6/fv7beta1 didn't (...) Def an improvement over fv4. 
I'm really liking the balance of fullness and noise for most songs. fv4 and fv6/fv7beta1 are usually pretty noisy... this is less noisy, but still has a good amount of fullness.” “Where the noise was undesirable and I ensembled fv4/fv6/fv7beta1 with big beta 6x, now I can just use this instead”.\n\n“If the noise isn't an issue, and you just want fullness, fv6/fv7beta1 are still the best models. I'd say fv6 and fv7beta1 are better models than fv4, fullness/noise aside. It depends with fv7beta1 versus fv7beta2, sometimes the noise can be pretty significant with fv7beta1, and fv7beta2 may have the fullness you desire.\n\nfv6 is usually more noisy/full than fv7beta1, but it just depends... I've had instances where it's less noisy/full than fv7beta1. But if you really want high fullness, fv6 and fv7beta1 are the choices. Sometimes fv6 can be quite a bit more noisy and the gain in fullness isn't worth it”.\n- rainboomdash (thx).\n“vocals sound very robotic with those models, however. 
Compared to fv4” - pipedream\n\n- Anvuew released a new BS-Roformer vocal model:\n\n<https://huggingface.co/anvuew/BS-RoFormer>\n\nIt also doesn't work with UVR’s RTX 5000 patch; use [MSST](#_2y2nycmmf53) instead.\n\nOn an M1 Mac, you will probably need to decrease chunk\\_size in the yaml a bit.\n\n\"On one song it was on par with the 2025.07 BSRoformer model on MVSep, at least to my ears (Tentative by System Of A Down)\n\nThe other song [Linkin Park - Part of Me] has some background vocals that are hard to get for a lot of voc/inst models, 2025.07 manages to get them while this model doesn't.\n\nThe instrumentals and vocals seem pretty good other than that\" - ryanz48\n\n“Guessing it's not high enough fullness\n\nthe [l o s t](https://drive.google.com/file/d/18mGbHJM8KEqKHpqLzNDpT9mXPbgO_XYM/view?usp=sharing) is extremely muddied, and the other chanting is just gone.\n\nThe lower harmony at 0:49-0:53 is mostly gone, making it sound very thin\n\nand a lot of the vocals just sound like they are breaking apart.\n\nHmm, big beta 6x is significantly better, it's def a fullness issue.. probably too high of a bleedless model for my tastes\n\nbig beta 6x still isn't super great here, the one I used for the other one I posted was fv7beta1, which is a fullness model.\n\nyeah, big beta 6x seems more balanced, it's a bit fuller but not noisy to my ears, either\n\nbut I'm not using headphones, so I won't hear any minor noise easily.\n\nyoooo, it properly doesn't capture the instrument [here](https://drive.google.com/file/d/1KdZEEMTezU4iQhv9-_zLpwPL9A84G8_m/view?usp=sharing).\n\neven FT2 bleedless gets tricked by this part, but this does just fine.\n\nMaybe I'll try it out for this song.. 
most I'll still use higher fullness models” - rainboomdash\n\n- Anvuew’s BS-Roformer Karaoke Model added to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- (MVSEP) “Eleven new models have been added:\n\n1) MVSep Triangle (triangle, other) Demo: <https://mvsep.com/result/20251104082053-f0bb276157-mixture.wav>\n\n2) MVSep Sitar (sitar, other) Demo: <https://mvsep.com/result/20251104082317-f0bb276157-mixture.wav>\n\n3) MVSep Harpsichord (harpsichord, other)\n\nDemo: <https://mvsep.com/result/20251104082809-f0bb276157-mixture.wav>\n\n4) MVSep Tuba (tuba, other) Demo: <https://mvsep.com/result/20251104082845-f0bb276157-mixture.wav>\n\n5) MVSep Bassoon (bassoon, other) Demo: <https://mvsep.com/result/20251104083211-f0bb276157-mixture.wav>\n\n6) MVSep Congas (congas, other) Demo: <https://mvsep.com/result/20251104083239-f0bb276157-mixture.wav>\n\n7) MVSep Bells (bells, other) Demo: <https://mvsep.com/result/20251104083305-f0bb276157-mixture.wav>\n\n8) MVSep Ukulele (ukulele, other) Demo: <https://mvsep.com/result/20251104083332-f0bb276157-mixture.wav>\n\n9) MVSep Dobro (dobro, other) Demo: <https://mvsep.com/result/20251104115825-f0bb276157-mixture.wav>\n\n10) MVSep Wind Chimes (wind-chimes, other) Demo: <https://mvsep.com/result/20251104115849-f0bb276157-mixture.wav>\n\n11) MVSep Accordion (accordion, other)\n\nDemo: <https://mvsep.com/result/20251104115916-f0bb276157-mixture.wav>” - ZFTurbo\n\n- “Yeah, I just tried [the bells] on a drum loop sample with sleigh bells I wanted to isolate, and I got a rude awakening lol”\n\n- “That's why that model name is confusing, lol. What they mean is tubular Bells or chimes. There's currently no sleigh bells model, but the Tambourine model may work”\n\n- “It worked (...) 
I used drumsep on it before”\n\n“I finally was able to extract the lead guitar in this song using the dobro model, but i noticed how the bass synth is leaking in the dobro stem”\n\n“So I thought, what if I just remove the bass in the source and try again? I did, and now it doesn't pick the lead guitar anymore”\n\n- (uvronline) “Added two new models:\n\nBS-RoFormer Kar (anvuew)\n\nDe-reverb Room (anvuew)” - Aufr33\n\nThe latter is also added to the [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb).\n\n- (MVSEP) “We added MVSep Synth (synth, other) model. Synth is included the following stems: Synth, Synthesizer, Synth Pad, Synth Bass, Synth Vocals, Synth Strings, Synth Percussion, Synth FX, Synth Keys, Synth Brass, Synth Guitar, Synth Flute, Synth Ambiant.\n\nDemo: <https://mvsep.com/result/20251031214429-f0bb276157-mixture.wav>” - ZFTurbo\n\n“So, my initial thoughts. The model works great for certain kinds of sounds e.g. leads, pads, plucks. But it's tricky to predict what it'll do, so might be safer to get rid of the other stems first, and then using synth on what's left if you need further separation.\nSome examples:\n\nIt isn't picking up synth basses for me, so use BS-Roformer SW for that.\n\nIt also sort of picks up synth brass, but wind model is catching that better, at least on the stuff I ran. The same could possibly be said for synth guitars.” - Musicalman\n\n“I tested it on a front channel rip from a cue from the TV show CHUCK, and it ripped the synth lead. But it did not isolate the synth \"effect\" at the end of it” - fal\\_2067\n\n“is good, but sometimes it can't detect some bass synths for some reason, still not a big problem since an SW does the work almost every time. And sometimes it picks the strumming of guitars. 
And also it seems to fail if they are vocals or harmonies.” - smilewasfound\n\n“Most of the time gets more out of the song rather than taking out guitar, keys, bass separately until synths are left over. Sometimes very few misses or stem bleeds through, but overall very impressive!! It also picks up Vibraphone” - Tobias51\n\n“Synth stem seems way too muddy on full songs, (...) But much better than I was expecting, I'll be honest. It messes up a lot on full songs, it seems. It seems like it's more like a stem remover than an isolator to me. The no synth stems sounds very clean. Tried it on Dolby stems, same thing, the no synth stem was very clean, synth stem sounded a bit muddy though. That's fine though. Gets very confused with bass though.\n\nSeems to also have a lot of the classic MVSep phase issue, where for some reason half the stem is in the synth stem and the other half is in the no synth stem\n\nand inverting it cancels it out (literally almost all models have this issue on MVSep it's very strange). It's much less than the other models, but yeah, it still happens. (...) Using any other model in UVR or uvronline don't have this issue. (...) 
I put a bass guitar stem from Fortnite festival to test and it has the phasing issue, it inverts” - Isling\n\n“I put it the synth model through a really packed song with no synth to see if it would get tripped up, it didn't other than some bass at the end.\n\nWhich actually didn't get picked up by the bass model, so even that is a win” - dynamic64\n\n- “Several SATB (soprano, alto, tenor, bass) choir models I trained ages ago, currently only a scnet\\_masked model is available, but I did have Demucs, MDX23C, and standard SCNet models that I will upload to this link when/if I find them, although I'm pretty sure the scnet\\_masked model was the best in the end:\n\n<https://drive.google.com/drive/folders/1BpPgtlDk0yqrlArmrq9vnYErixb8I8zJ?usp=sharing>” - Dry Paint Dealer\n\nTreat it as a proof of concept.\n\n“I've tried the SCNet one, it's really noisy, and it has a lot of bleed, it kinda works. I can see the potential on this kind of model ngl.” - smilewasfound\n\n“You can’t install [the VR ones] into UVR since that only supports VR v5 [and 5.1] not [VR v6](https://github.com/tsurumeso/vocal-remover/releases)”\n\n- (MVSEP) “Seven new models have been added:\n\n1) MVSep Electric Guitar (electric-guitar, other). Demo: <https://mvsep.com/result/20251031064813-f0bb276157-mixture.wav>\n\n2) MVSep French Horn (french-horn, other). Demo: <https://mvsep.com/result/20251031072529-f0bb276157-mixture.wav>\n\n3) MVSep Banjo (banjo, other). Demo: <https://mvsep.com/result/20251031095934-f0bb276157-mixture.wav>\n\n4) MVSep Marimba (marimba, other). Demo: <https://mvsep.com/result/20251031100024-f0bb276157-mixture.wav>\n\n5) MVSep Glockenspiel (glockenspiel, other). Demo: <https://mvsep.com/result/20251031100134-f0bb276157-mixture.wav>\n\n6) MVSep Timpani (timpani, other). Demo: <https://mvsep.com/result/20251031100232-f0bb276157-mixture.wav>\n\n7) MVSep Harmonica (harmonica, other). 
Demo: <https://mvsep.com/result/20251031100508-f0bb276157-mixture.wav>” - ZFTurbo\n\n“The Harmonica model is hit or miss.” - musicbybrooks\n\n“Wow, the electric guitar model is really neat. One thing I noticed is that it seems to be better than other models at picking up midi/synth lead guitars. At least on stuff I tried.\n\nI think it also gets tripped up a bit more by weird FX and synth sounds being partially flagged as guitar. An interesting model, though, for sure.” - Musicalman\n\n- (MVSEP) “The karaoke model by anvuew has been added under the algorithm \"MVSep Karaoke (lead/back vocals)\". It is available as the option \"BS Roformer by anvuew (SDR: 10.22)\"” - ZFTurbo\n\nFor some reason, it seems to give worse results than the checkpoint anvuew shared.\n\n- Dear friends at Apple Music, please stop harassing labels and their sound engineers for making Atmos mixes using our and your awesome AI models for audio separation. The artifacts you're solely looking for in spectrograms of separate channels are inaudible in the full tracks. The tracks are well mixed and accepted by major labels, but rejected by your lazy ass incompetent bullshit. The quality of Atmos mixes has gotten better since the very beginning, and either your employees, your algorithms, or both do a lazy job without even listening to the shit themselves, while still rejecting stuff without a sensible reason! You're making things nastily difficult for artists who lost their multitracks for certain legacy songs, potentially making it impossible to re-release their albums in Atmos. Bring it up with the executives. 
Get your shit together, for fuck's sake!\n\n- Full release of mesk’s rifforge Mel-Roformer [model](https://huggingface.co/meskvlla33/rifforge/tree/main) focused on inst/voc separation for metal music\n\n“The model can have some quirks (just like most models) but it's all around clean for me to release.”\n\nTraining details:\n“Characteristics:\n\nThis is a dimension 512 depth 24 model (so fairly large file size at 1.9 GB!), with an SDR of 14.2436.\n\nIt's finetuned from an older Melband Roformer checkpoint with an SDR of 13.7.”\n\n- Gabox released an experimental BS-Roformer karaoke [model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/bsroformers) | [metrics](https://mvsep.com/quality_checker/entry/9198)\n\nIt gives RTX 5000 UVR patch users the same error as anvuew’s model.\n\n- A new ensemble (avg) of anvuew’s and becruily & frazer’s karaoke models was evaluated on the leaderboard ([metrics](https://mvsep.com/quality_checker/entry/9190) are lower SDR-wise than BSkarfrazerBecruily+BSkarMVSEP+MBkarGaboxV2). You could probably make a [fusion model](#_900rfc8gjynn) out of the two to save on inference time at the cost of a slight SDR decrease (both use the same config, so it might work).\n\n- erosunica found that BS-Roformer SW drums is “really good to remove some SFX and foley, way better than DnR v3”\n\n- Gabox released voc\\_fv7 beta 1 Mel-Roformer [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta1.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVoc. fullness: 21.21, bleedless: 30.81, SDR: 10.96\n\n\"Just a better fv4 it seems, better bleedless\" (fullness: 21.33, bleedless: 29.07, SDR 10.58)\n\nvs voc\\_fv4 \"It is noisier.. 
Kinda closer to beta 5e?” “It's slightly less noise and fullness than beta 5e but picking up the backing vocals REALLY well, significantly better than beta 5e”\n\nBut it's pulling the backing vocals out even better than 5e” “the backing vocals are so good!\n\n“it does have significant synth bleed, too... it at least wasn't coming through at full volume\n\nwhen I say fullness, I specifically mean how muddy it sounds” - rainboomdash\n\n- Anvuew released a new Karaoke BS-Roformer model\n\n<https://huggingface.co/anvuew/karaoke_bs_roformer>\n\n<https://mvsep.com/quality_checker/entry/9180>\n\nUVR users will encounter “ModuleNotFoundError: No module named 'torch.\\_dynamo.polyfills.fx'” with this model (consider [MSST](#_2y2nycmmf53) instead). ~~Maybe users of RTX 5000 won’t encounter that issue due to newer PyTorch in dedicated patch.~~ Sadly, they do, even with CPU only. What's more, the issue seems to exist only with the RTX 5000 patch.\n\n“karaoke anvuew extracts lead vocals a bit better than karaoke becruily frazer, and in some parts, the lead vocals from karaoke anvuew still sound brighter compared to karaoke becruily frazer, which sounds a bit more compressed. oh, and for some reason, the becruily frazer model doesn’t detect vocals with radio effects, while anvuew’s model handles them just fine” - neoculture\n\n“lead vocals leak into instrumental (...) Mel Becruily and Frazer’s BS don’t have this problem”\n\nIn that case, maybe “isolate the acapella first in almost all cases of using a karaoke model”\n\nDemo:\n\n<https://pillows.su/f/671f60ffdd615eb2613c78dca70319fe>\n\n<https://pillows.su/f/391ec7ba8353a086989c9c0934321260>\n\n- (MVSEP) “I added a new DeReverb model <https://huggingface.co/anvuew/dereverb_room>\n\nby avuew. It's available in Reverb Removal (noreverb) [by choosing] DeReverb room by anvuew (BSRoformer). It works only for vocals. 
Since it is a mono model, it processes 2 stereo channels independently.” - ZFTurbo\n\nDemo: <https://mvsep.com/result/20251017064532-53be20aa17-10seconds-song.wav>\n\n- (MVSEP) Four new models have been added:\n\n1) MVSep Tambourine (tambourine, other). Demo: <https://mvsep.com/result/20251015221411-f0bb276157-mixture.wav>\n\n2) MVSep Oboe (oboe, other). Demo: <https://mvsep.com/result/20251015221618-f0bb276157-mixture.wav>\n\n3) MVSep Clarinet (clarinet, other). Demo: <https://mvsep.com/result/20251015221718-f0bb276157-mixture.wav>\n\n4) MVSep Digital Piano (digital-piano, other). Demo: <https://mvsep.com/result/20251015221944-f0bb276157-mixture.wav>\n\nThe Digital Piano model can sometimes work better for regular piano too, doing an even better job than SW when it works.\n\n“Absolutely fantastic for an epiano. i just put your example song through mvsep with the bs sw piano model as a comparison and bs sw did terribly. Ofc your epiano model picked up all of the epiano in a clean way”\n\n“Pretty impressive. Besides it being more full than other piano models (in most cases), it's also by far the only piano model that doesn’t mistakenly pick up other instruments like tubular bells as piano.”\n\n“From what i tested is more for midi piano, i tested with some tracks with that kind of midi sound and it worked way better that SW.”\n\n- (training) Becruily made a modification of the [DTTNet](https://github.com/junyuchen-cjy/DTTNet-Pytorch) arch that works in MSST ([DL](https://drive.google.com/drive/folders/1e2NmyxxJU1h2wGxomBl7D-NXJBYHMXsU?usp=sharing)).\n\n“They report very good performance on vocals with low parameters” - Kim\n\nBack at the end of 2023, one indie pop song from the multisong dataset (one of the two there) received the best SDR - Bas Curtiz\n\n“Better than SCNet imo, remains to see if it can beat rofos” - Becruily\n\n“Not fast to train. I'm back with vanilla mdx23c. Trying a config to train model with less than 4GB VRAM, (...) 
with my 1080Ti and batch\\_size=1, chunk\\_size is around 1.5sec)” - jarredou\n\nInstallation instructions:\n\n“In the latest MSST [at least for 13.10.25]\n\nadd the ddtnet folder to \"models\" and replace your settings file in utils with this”\n\nThe mod breaks compatibility with the authors' checkpoint.\n\n“The weird thing is, it sounds like a fullness model despite not being one, I barely can find dips in instrumentals. [ddtnet vs kim melband](https://drive.google.com/drive/folders/12an8wnKC-FKE48gVu9pHvUaLSxzpC6C8?usp=sharing), if anyone is curious” - bcr\n\n“Also keep in mind authors trained with l1 loss only, default in MSST is masked loss”\n\n“l1 loss when dataset is noisy, mse loss when dataset is clean”\n\n“the loss is defined from msst, but in the original dttnet it was in the code itself\n\nyou can just --loss l1\\_loss”\n\n@jarredou “I copied your tfc and tfc\\_tdf classes to my files (and used that latest stft/istft I sent) - and seems to be better, just like the og dttnet\n\nthe tfc/tdf fixed the nan issue for me (...)\n\nKeep in mind, ddtnet was trained only with musdb and has 10-20x less params while being comparable in quality”\n\n“the authors checkpoints had 16khz cutoff because dim\\_f was smaller than nfft/2\n\nif you want to train model with cutoff it's fine, if you want fullband then dim\\_f must be half of nfft + 1” - becruily\n\nHit our [#dev-talk](https://discord.com/channels/708579735583588363/1220364005034561628/1414698972752248904) for more.\n\n- New sites added to [Site and rippers](#_ataywcoviqx0) ([deezmate.com](https://deezmate.com/) and [tidal.qqdl.site](https://tidal.qqdl.site/)).\nQobuz remains defunct/problematic for now.\n\n- We have numerous reports of some models, like Unwa Resurrection inst, having problems on AMD (and probably Intel) GPUs in UVR, returning an “Invalid parameter” error. In that case, uncheck GPU Conversion (separation will be slower). 
If you find a fix, please let us know on the Discord (link at the top of the doc).\n\n- If you experience slow separation times with the becruily & Frazer karaoke model, decrease chunk\\_size to 160000 on 8GB GPUs. While decreasing chunk\\_size on CUDA (NVIDIA) doesn't seem to affect separation times, that's not the case with DirectML (AMD/Intel) when you exceed your VRAM (although it still doesn’t crash).\n\n- Added anvuew BS-Roformer Dereverb Room (mono) [model](https://huggingface.co/anvuew/dereverb_room) to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- Sir Joseph released a Colab for NVIDIA’s A2SB: Audio Restoration upscaler.\n\nIt’s very slow: it was already slow on a 4070 Super, and free Colab gives you a Tesla T4 with RTX 3050-level performance and 12GB of VRAM instead of 8, so memory issues might occasionally occur; Colab Pro is recommended.\nOnly inpainting (the feature for filling silences or missing parts) doesn’t work: “I couldn't fix the error. If anyone solves it, I'd be glad if they let me know so I can update it too.” A2SB should surpass AudioSR, Apollo and FlashSR (it does, at least metrically).\n\n<https://colab.research.google.com/drive/1ThenZDCRTJKV1I_ax17XGWmkB1qoKrFs?usp=sharing>\n\n- Gabox released the inst\\_fv4 model. Don’t confuse it with inst\\_fv4noise (the regular variant was never released before) or with voc\\_fv4.\n\n<https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_Fv4.ckpt> ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“Seems to be erasing a xylophone instrument. 
Does sound not too noisy and not muddy, I like it. (...) A little noisy with piano (I split the song up and process with resurrection inst there). (...) Does have some issues that resurrection inst doesn't have, but it doesn't sound muddy! It usually works great. (...) In my opinion, fv4 still has vocal traces, I don't know if in all of its songs and v1e plus doesn't have them, but the noise can bother you even though it's not much. Does have more vocal bleed at times. I think a lot of what I thought was vocal bleed was a synth, it did a pretty good job... There was one segment on a song where it caught vocal residues, though” - rainboomdash\n\n- neoculture released a Mel-Roformer instrumental model focused on preserving vocal chops\n\nInst. fullness 39.88, bleedless: 32.56, SDR: 14.35\n\n<https://huggingface.co/natanworkspace/melband_roformer/blob/main/Neo_InstVFX.ckpt> ([yaml](https://huggingface.co/natanworkspace/melband_roformer/blob/main/config_neo_inst.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“great model (at least for K-pop it achieved the clarity and quality that no other model managed to have) it should be noted that it has a bit of noise even in its latest update, its stability is impressive, how it captures vocal chops, in blank spaces it does not leave a vocal record, sometimes the voice on certain occasions tries to eliminate them confusing them with noise, but in general it was a model that impressed me. It captures the instruments very clearly” - billieoconnell.\n\n“NOISY AF, this is probably the dumbest idea ever had for an instrumental model. Don’t use it as your main one, some vocals will leak because I added tracks with vocal chops to the dataset. 
Just use this model for songs that have vocal chops” - neoculture\n\nIt was trained on just an RTX 4060 8GB.\n\n- Aname trained a Mel-Roformer model called [Full Scratch](https://huggingface.co/Aname-Tommy/Mel_Band_Roformer_Full_Scratch)\n\nInst. fullness: 25.10, bleedless: 37.13, SDR: 14.32\n\nVoc. fullness: 13.24, bleedless: 30.75, SDR: 8.01\n\n(“trained from scratch on a custom-built dataset targeting vocals. It can be used as a base model or for direct inference. Estimated Training cost: ~$100”)\n\nFor the state\\_dict error, update MSST to the latest repo version:\n\n!rm -rf /content/Music-Source-Separation-Training\n\n!git clone https://github.com/ZFTurbo/Music-Source-Separation-Training\n\n“and you must reinstall main branch's requirement.txt. (before it, edit requirements.txt to remove wxpython)” - Essid\n\nKim Mel model for reference:\n\nInst. fullness 27.44, bleedless 46.56, SDR: 17.32\n\nVoc. bleedless: 36.75, fullness: 16.26, SDR: 11.07\n\n- (MVSEP) “1) Model by baicai1145 was added in Apollo Enhancers (by JusperLee, Lew, baicai1145) with name Universal Super Resolution (by baicai1145) (...)\n\n2) New option added for Apollo Enhancers (by JusperLee, Lew, baicai1145) - Cutoff (Hz). Sometimes it can be useful to cut higher frequencies before applying model.” - ZFTurbo\n\n- baicai1145 released their own Apollo vocal restoration model, which surpassed Lew’s vocal V2 model metrically.\n\n“with a 92-hour high-quality vocal dataset trained for 1 million steps.”\n\n<https://huggingface.co/baicai1145/Apollo-vocal-msst/tree/main>\n\n<https://mvsep.com/quality_checker/entry/9105>\n\n21.24 vs 13.09 Aura MR STFT\n\n(thx Essid)\n\nReminder: Apollo arch support was added to UVR too (acceleration works with NVIDIA GPUs only). 
Installing the model should also be possible there, although UVR currently seems to be incompatible with the model, outputting the following error:\n*KeyError: \"'infos'\"*\n\n- (MVSEP) “Two new algorithms have been added:\n\n1) MVSep Mandolin (mandolin, other). Demo: <https://mvsep.com/result/20250927132339-f0bb276157-mixture.wav>\n\n2) MVSep Trombone (trombone, other). Demo: <https://mvsep.com/result/20250927132547-f0bb276157-mixture.wav>” - ZFTurbo\n\n- ROCm 6.4.4 now allows using PyTorch natively on Linux and Windows on RX 7000 and 9000, so you don’t need WSL for them - [link](https://www.techpowerup.com/341329/amd-enables-pytorch-on-radeon-rx-7000-9000-gpus-with-windows-and-linux-preview)\n\n- BS-Roformer 6 stems added to uvronline\n\n- “I added new version of MVSep Organ (organ, other) model: \"BS Roformer (SDR organ: 5.08)\". SDR increased from 3.05 to 5.08.\n\nDemo: <https://mvsep.com/result/20250924223759-0efd607228-song-organ-000-mixture.wav>” - ZFTurbo\n\n“I think the model is remarkable improvement” - totalmentenormal\n\n“the result is great, no bleed from what I tested it on” - dynamic64\n\n“much better isolation of the Hammond organs compared to the previous model. In places where the organ sound was not picked up before, it is now separated in the track” - lukasz2286\n\n- Google Colab now allows pinning your environment to a specific version with fixed package versions, so your notebook may no longer break due to package updates that Google introduces in the Colab environment.\n\nFor now, there are only the 2025.07 and the latest environments to choose from, and it's hard to tell whether, e.g., the 
2025.07 environment will gradually be replaced over time as new changes are made to the latest Colab environment:\n\n<https://developers.googleblog.com/en/google-colab-adds-more-back-to-school-improvements/>\n\nTo use it, go to Environment>Change environment type>Environment type version and choose the 2025.07 option\n\n- Essid reevaluated GAudio (a.k.a. [GSEP](#_yy2jex1n5sq)) for the leaderboard.\n\n<https://mvsep.com/quality_checker/entry/9095>\n\nInst fullness: 28.83, bleedless: 31.18, SDR: 12.59\n\nThe result seems to confirm my observations that instrumentals have gotten worse over the years (at least since the last 2023 Bas' evaluation or even earlier, at least for certain songs). However, the vocals appear to have gotten better.\n\n<https://mvsep.com/quality_checker/multisong_leaderboard?algo_name_filter=Gsep&sort=instrum&ranking_metrics=>\n\nAlthough its metrics are worse than even the least bleedless free community models like V1e, GSEP might still be worth trying to a limited extent on specific songs where bleeding isn't too bad, as it's a different architecture and may sound less filtered. Also, a mixdown of a multi-stem extraction, used instead, should have a higher bleedless metric, but since the appearance of instrumental Roformers, GSEP's relevance for separation has largely faded.\n\n- \"Ensemble of 3 [karaoke] models \"Mvsep + gabox + frazer/becruily\" gives 10.6 SDR on leaderboard. 
I didn't upload it yet, but I had local testing.” - ZFTurbo\n\n- fabio06844 recently shared his method for getting a “very clean and full” instrumental:\n\n1) Go to MVSep and separate your song with the latest Karaoke BS-Roformer by the MVSep Team\n\n2) Use DEBLEED-MelBand-Roformer (by unwa/97chris) on its instrumental stem\n\n([model](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/bleed_suppressor_v1.ckpt) | [yaml](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/config_bleed_suppressor_v1.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb))\n\nEven though “the MVSep Team Karaoke uses the MVSep BS model to extract/remove vocals, then applies [the] karaoke model to that”, simply using the BS 2025.07 model instead was reported to be insufficient, leaving slightly more residues.\n\n- Aname released a Mel-Roformer duality [model](https://huggingface.co/Aname-Tommy/Mel-Band-Roformer_Duality).\n\n“it's odd why the model is named duality, but it has a single target (and the file size of the ckpt confirms it further)” - becruily\nUnlike unwa’s duality v2 model, it focuses more on the bleedless than the fullness metric, and it has a higher SDR.\n\nInst. fullness 24.36, bleedless 46.52, SDR: 17.15\n\n“instrumental is really muddy” - Gabox\n\nFor comparison:\n\nMel Duality v2 by unwa\n\nInst. fullness 28.03, bleedless 44.16, SDR: 16.69\n\nMelBand Roformer vocals by Kim\n\nInst. fullness 27.44, bleedless 46.56, SDR: 17.39\n\nInstrumental public models with the biggest fullness metric:\n\nGabox Mel Roformer Inst\\_GaboxFv7z\n\nInst. fullness: 29.96, bleedless: 44.61, SDR: 16.62\n\nUnwa BS-Roformer-Inst-FNO\n\nInst. fullness: 32.03, bleedless: 42.87, SDR: 17.60\n\n- (MVSEP) “I added new SCNet vocals model: SCNet XL IHF (high instrum fullness by becruily). 
It's high fullness version for instrumental prepared by becruily.”\n\nInst. fullness 32.31, inst. bleedless 38.15, SDR 17.20\n\n“One of my favorite instrumental models, Roformer-like quality.\n\nFor busy songs it works great, for trap/acoustic etc. Roformer is better due to SCNet bleed” - becruily\n\n“It's better than BS Roformer (mvsep 2025.07 and inst Resurrection) at low frequencies, but bad at highs due the bleeding. I think it has better phase understanding, because it keeps the harmonics that were masked behind vocals cleaner (but it might not necessary be true to the source, it might just interpret/make up the harmonics instead of actual unmasking)” - IntroC\n\n- Dry Paint Dealer Undr released Melband Roformer and Demucs (“or at least I think this is the correct model file”) Lead and Rhythm guitar [models](https://drive.google.com/drive/folders/1JH2tjhgDcJgvdi-hrQT82RsaV2XhE8hn?usp=sharing).\n\n“My own very mediocre model for it that I never shared. It does work but has issues that I imagine any better executed model won't.”\n\n“Wait, I think it separated doubles in vocals” - isling\n\nThe Demucs model doesn't work in UVR, as it was trained with MSST rather than the original code (I tried to work around the bag\\_num issue [before](#_3c6n9m7vjxul), and failed)\n\n- (MVSEP) “Two new instrumental models have been added:\n\nMVSep Harp (harp, other)\n\nDemo: <https://mvsep.com/result/20250921131108-f0bb276157-mixture.wav>\n\nMVSep Double Bass (double-bass, other)\n\nDemo: <https://mvsep.com/result/20250921131129-f0bb276157-mixture.wav>” - ZFTurbo\n\n“The BS-Roformer SW bass model should probably be used first to extract the double bass. Creates a better sound. This does not apply to bowed double bass.\n\nBowed double bass doesn't get picked up by BS-Roformer and therefore needs the double bass model. 
Good news is that bowed double bass is picked up in the strings stem so if you run the strings model you're good either way.” - dynamic64\n\n- (MVSEP) “A new saxophone model based on BSRoformer was added. It has a much better metric compared to the previous [model]. SDR grew from 7.13 up to 9.77.\n\nIt's available in \"MVSep Saxophone (saxophone, other)\" with option \"BS Roformer (SDR saxophone: 9.77)\"\n\nDemo: <https://mvsep.com/result/20250920151232-f0bb276157-mixture.wav>” - ZFTurbo\n\n“after testing this on a song where trumpet and sax play in unison, doing the trumpet model is cleaner than doing the sax model” - dynamic64\n\n“Amazing. Tested it on one song, it got every single Saxophone Part from the song it seems lit. Can hear one small little bitty part of it where it tries to come in off the Sax part, however I can barely hear it” - cali\\_tay98\n\n- GAudio (a.k.a. GSEP) announced their SFX (DnR) model in their API:\n“DME Separation (Dialogue, Music, Effects)”\nSo far it’s not available to everyone on their regular site:\n\n<https://studio.gaudiolab.io/>\n\nBut the link on their Discord redirects to a site with an inquiry form:\n\n<https://www.gaudiolab.com/developers>\n\nShortly after visiting one or both of the links and logging in on the first, you might get an email saying that $20 of free credits for their API have been added to your account, along with a link to the API documentation:\n\n<https://www.gaudiolab.com/docs>\n\n- A new Dango Karaoke model was released\n\n<https://tuanziai.com/en-US/blog/68ca20c87c8c85686c1b4511>\n\nIt has a lot of problems when songs don't have lead vocals in the center.\n\n- (MVSEP) “I added new Karaoke model: \"BS Roformer by MVSep Team (SDR: 10.41)\" it's available under option \"MVSep MelBand Karaoke (lead/back vocals)\". 
[Metrics](https://mvsep.com/quality_checker/entry/9068).\n\n###### In contrast with other Karaoke models, it returns 3 stems: \"lead\", \"back\" and \"instrumental\".\n\n###### Example: <https://mvsep.com/result/20250915192251-53be20aa17-10seconds-song.wav>” - ZFTurbo\n\n“If I had to compare it to any of the models, it is similar to the frazer and becruily model. Sometimes it does not detect the lead vocals specially if there's some heavy hard panning, but when it does, there is almost no bleed, and it works very well with heavy harmonies in mono from what I tested.” - smilewasfound\n\n“becruily & frazer is better a little when the main voice is stereo” - daylightgay\n\n“On tracks I tested, harmony preservation was better in becruily & frazer (...) the new model isn't worse, I ended up finding examples like Chan Chan by Buena Vista Social Club or The Way I Are by Timbaland where it is better than the previous kar model. The thing is, with the Kar models, it's just track per track. Difficult to find a model for batch processing as it's really different from one track to another” - dca100fb8\n\n“I also found the new model to not keep some BGVs, mainly mono/low octave ones, despite higher SDR” - becruily\n\n“I think I've found a solution for people who don't like the new model.\n\nIf you put an audio file through the karaoke model and then put the lead vocal result through that, it usually picks up doubles.\nWhich you can then put in your BGV stem if you'd like” - dynamic64\n\n“it's definitely not as good as the one by frazer and becruily. SDR can be misleading sometimes” - ryanz48\n\nbecruily: “[our model] uses 11.9 SDR vocal model as a base”\n\nZFTurbo: “I started from SW weights”\n\n“I've had fantastic results with it so far. Much MUCH better at holding the 'S' & 'T' sounds than the Rofo oke (for backing vox). Generally seems to provide fuller results .. 
but also the typical 'ghost' residue from the main vox can end up in the backing vox sometimes, but it's usually not enough to be an issue. I won't go so far as so say that it's replacing the other backing vox models for me entirely .. but it feels like the best of both worlds that Rofo and UVR2 provide.” - CC Karaoke\n\n###### - (MVSEP) “We’ve added a mirror of MVSep (big thanks to okhostok): <https://mirror.mvsep.com>\n\n###### If you have a problem with upload/download speed or can't reach the main site then try the mirror.\n\n###### Report please if it helped you to speed things up.” - ZFTurbo\n\nIssues that left some users unable to click the Separate button were fixed.\n\n###### - Gabox released BS\\_ResurrectioN [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/BS_ResurrectioN.ckpt) | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml)\n\n###### “It is a finetune of BS Roformer Resurrection Inst but with higher fullness (like v1e for example), it needs [MVSEP’s] BS 2025.07 (as a source/reference) phase fix [so you “should process the instrumental result using BS 2025.07 then put [it] as source in UVR GUI phase fix tool”]. I requested it because I found some songs where Resur Inst was producing muddy instrum results (...) I requested it not just for me because I saw other people were looking for something like v1e++” - dca\n\n###### - anvuew released BS-Roformer Dereverb Room [model](https://huggingface.co/anvuew/dereverb_room) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n###### “specifically for mono vocal room reverb”, as most vocals are recorded in mono.\n\nInference isn't that long compared to other Roformers.\n\n“Really liking the fullness in the noreverb stem. 
Virtually all dereverb roformers I've tried sound muddy, but this one is just the opposite. (...) Other noises may interfere, and in my experience, makes the model underestimate the reverb. [The previous anvuew’s mono model] is way different [from] this one in every way. So, like I say, worth a shot.” - Musicalman. “WOAH this is insane. This would go viral if someone implemented in a plugin” - heuhew\n\nWe have reports of errors in UVR when using this model. Consider using [MSST](#_2y2nycmmf53) instead.\nIf you get stereo-related errors using MSST on stereo files, update MSST (git clone and git pull commands) or apply the fix below (it might work in your current version too, not only in the linked repo, but the code may be located at a different line; the change will be pushed there later).\n\n“Edit inference.py from my [repo](https://github.com/jarredou/Music-Source-Separation-Training/tree/colab-inference) line 59” - jarredou\n\nReplace:\n\n```python\n# Convert mono to stereo if needed\nif len(mix.shape) == 1:\n    mix = np.stack([mix, mix], axis=0)\n```\n\nwith:\n\n```python\n# If mono audio we must adjust it depending on model\nif len(mix.shape) == 1:\n    mix = np.expand_dims(mix, axis=0)\n    if 'num_channels' in config.audio:\n        if config.audio['num_channels'] == 2:\n            print(f'Convert mono track to stereo...')\n            mix = np.concatenate([mix, mix], axis=0)\n```\n\n###### - BS-Roformer Karaoke [model](https://huggingface.co/becruily/bs-roformer-karaoke/tree/main) by becruily & frazer released | MVSEP | uvronline ([metrics](https://mvsep.com/quality_checker/entry/9013) better than even the fused gabox + aufr33/viperx model and SCNet IHF below).\n\n###### Make sure you don’t have the option “Vocals only” checked in UVR.\n\n“After dozens of tests I can tell this (...) 
is the best (better harmony detection, better differentiation between LVs and BVs, sounds fuller, less background roformer bleed, better uncommon panning handling etc)” - dca\n\n“it also can detect the double vocals” - black\\_as\\_night\n\nIt works best for some previously difficult songs. The aufr33 and viperx model seems more consistent, but the new BS is still the best overall - Musicalman\n\n“my og Mel also catches some of the FX/drums, I guess quite a difficult one due to how it’s mixed” - becruily\n\n“it does do better on mono than previous\n\nsometimes confuses which voice should be the lead, but all models do that on mono in the exact use-case I normally test” - Dry Paint Dealer Undr\n\n“In my opinion, this model is in no way inferior to the ViperX (Play da Segunda) — it's really very good. (...) I noticed that in the separation, the first voice still appears mixed with the second. The second voice, however, stands out more, but not completely isolated—in some passages, it still appears alongside the first. 
In short: the model better separates the second voice, but still presents some mixing between them.” - fabio5284\n\n“The new karaoke model doesn't actually differentiate between lvs & bvs and there's some lead vocal bleeding in the instrumental stem” - scdxtherevolution\n\nDataset fixes and expansion, and a retrain of the model, are possible in the future.\n\n“The dataset isn't correctly labelled, so in some training examples it was literally training the model to treat the backing vocal as the lead” - frazer\n\nVs. the newer BS-Roformer MVSEP team model above: “sound isn't as clear, but it does an infinitely better job at telling lead/BGV apart”\n\nBecruily:\n\n“I want to remind something regarding my (and the frazer) models\n\nthey're made to separate true lead vocals, meaning either all of the main singer's vocals, or if it's multiple singers - theirs too\n\nthis means if the main singer has stuff like adlibs on top of the main vocals, these are considered lead vocals too - they go together\n\nif there are multiple singers singing on top of each other, including harmonise each other, and if there are additional background vocals behind those - all the singers will be separated as one main lead vocal, leaving only the true background vocals\n\nthink of them like concert ready models - the output instrumentals will be ready to play in cases where all main vocalists are going to sing on top of the karaoke instrumental\n\nps: and yes, double/stereo lead vocals are still lead vocals, they're not bgvs (only in rare cases)\n\nps 2: if there are two singers singing the same melody and they don't harmonise each other - the model will most likely consider both singers as one lead vocal (again in rare cases one singer could be left)”\n\n###### - (MVSEP) “New Karaoke model based on SCNet XL IHF was added on site in \"MVSep MelBand Karaoke (lead/back vocals)\". 
Name of model \"SCNet XL IHF by becruily (SDR: 9.53, [metrics](https://mvsep.com/quality_checker/entry/8962))\". It has slightly worse metrics than the top Roformer model, but since it's different architecture it can give better results in some cases where the Rofo failed.\n\n###### Demo: <https://mvsep.com/result/20250908072226-f0bb276157-mixture.wav>” - ZFTurbo\n\nIIRC, it's a retrain of an unreleased ZFTurbo BVE or IHF model, and the ckpt won't be made public until further notice, as becruily said.\n\n“SCNet is more bleedy in general despite me trying to reduce the leakage\n\nit's recommended for busy songs, often captures proper lead vocals better than Roformer. Another use case is to ensemble it with Roformer to improve fullness” - becruily\n\n“Oh, might be related to the lead vocals panning, it seems this model doesn't like when it's not center (...) I'm indeed noticing this model works really great on some songs that the Mel Rofo Karaoke had trouble with (...) I noticed that, this model, instead of creating crossbleeding between LVs and BVs, make them both quieter. I prefer that compared to previous models Plus, it handle songs which have lead vocals in the sides and BVs also in the sides better”\n\nTo fix bleed in the back-instrum stem, use “Extract vocals first”, but “I noticed a pattern that if you hear the lead vocals in the back-instrum track already (SCNet bleed), dont try to use Extract vocals first because there will be even more lead vocal bleed” - dca\n\n“Separates lead vocals better than Mel-Roformer karaoke becruily. 
It's not perfectly clean, sometimes a bit of the backing vocals slips through, but for now, scent karaoke model still the most reliable for lead vocals separation (imo)\n\n<https://pillows.su/f/df8c1791bceba5fe3ef6b16d310ec123>\n\n<https://pillows.su/f/e1272a02c56e3d3eb7ba4007bbb0c4bd>” - neoculture.\n\n“the model seems to handle mono vocals better than melband but isn't as clean, lot of bleed” (“Extract vocals first” was also used to test this) - Dry Paint Dealer Undr\n\nSince becruily's Mel Kar model, the dataset has become “larger”, but it's still not “great”, and it might eventually get fixed, becruily said.\n\n###### - (MVSEP) Four “new models for independent instruments were added:\n\n###### 1) MVSep Viola (viola, other) Demo: <https://mvsep.com/result/20250907234931-f0bb276157-mixture.wav>\n\n###### 2) MVSep Cello (cello, other) Demo: <https://mvsep.com/result/20250907235225-f0bb276157-mixture.wav>\n\n“quite impressive” - dynamic64\n\n###### 3) MVSep Trumpet (trumpet, other) Demo: <https://mvsep.com/result/20250907235543-f0bb276157-mixture.wav>\n\n“I can't get over how good the trumpet model is, it's so cleannn” - Shintaro\n\n“trumpet struggles a bit on muted trumpet” - dynamic64\n\n###### 4) MVSep Strings BS-Roformer (strings, other) Demo: <https://mvsep.com/result/20250907225920-f0bb276157-mixture.wav>\n\n###### The SDR has increased significantly compared to the previous MDX23C model, from 3.84 to 5.41. It is currently the best model on the leaderboard: <https://mvsep.com/quality_checker/leaderboard/strings/?sort=strings>” - ZFTurbo\n\n“From some quick testing, it does not disappoint. 
Still playing with it, but atm it's exactly what I hoped for.” - Musicalman\n\n“Yeah, I’m running some tests too with a few tracks that were really hard to separate, mostly ones with cellos or vocals that were too blended with the strings to isolate even with the latest inst/voc models, and it’s been working out surprisingly well.”\n\n###### - anvuew released an experimental BS-Roformer vocal model (nfft 4096, stft\\_hop\\_length 1024 “so not that large”) with 12 SDR measured on the musdb18hq dataset. Might be worth checking: [download](https://drive.google.com/drive/folders/18Va9tQqh_mkUHe_R2syl19bB9q_zzUre?usp=drive_link) (dead; a [newer model](https://huggingface.co/anvuew/BS-RoFormer/tree/main) has been released since then).\n\n###### 11.60 SDR on the same test set was previously achieved by one of the first Mel-Roformers trained by Bytedance on musdbhq + 500 songs ([paper](https://arxiv.org/pdf/2310.01809)), although it wasn't nfft 4096.\n\nIt uses a very high chunk\\_size of 1024000 in the yaml, so consider decreasing it if you run into memory issues. The ckpt is 500MB.\n\n###### - introC released a Python [script](https://file.garden/Z3gSJFxsb21HAqp6/scripts/v1ep_resonance_remover.zip) to get rid of vocal leakage in the v1e+ model\n\n###### - iZotope released Ozone 12. Separation still has Spleeter-like quality, but “it's unclear what they use” - Spleeter references disappeared from their readme (jarredou).\n\n“the stems sound very bleedy and not at all usable” - becruily.\n\nA notable new feature that works competitively is their Delimiter.\n\n###### - Ableton added its own stem separation feature in Live 12.3, made in cooperation with Moises.ai. <https://www.youtube.com/watch?v=uSahY-HGKt4>\n\n“doesn’t even sound good” - isling\n\nIt doesn’t use the GPU and also has a slower High Quality setting (a single model for each stem, as opposed to the default multi-stem model); processing can take even 20 minutes for a 1-minute file on a slower CPU. 
At least the default option sounds more similar to Demucs than to the Roformer or SCNet archs, although the files look like BS-Roformer judging by a memory dump (the model files are encrypted). [Here](https://mvsep.com/quality_checker/entry/8977) are the (low) metrics of the default model's stems, e.g. vocals only 8.71 SDR. The HQ option has higher SDR than the public SCNet weights released by ZFTurbo in the MSST repo, but it leans toward the bleedless side: fullness is lower than in the public, undertrained SCNet XL [4 stem](#_sjf0vefmplt) model. Average SDR on the first 12 songs of the multisong dataset vs. public SCNet XL: drums 11.58 vs 11.22, bass 12.25 vs 11.27 (thx jarredou).\n\n“The boring thing is that you have to launch separation for each file manually (no batch processing). To nice things is that the separated stems are automatically saved individually in folder (no need to export them individually through Live's rendering and all issue that this can produce; different length...)” - jarredou\n\n- It seems we’ve received a step-by-step tutorial on how to install NVIDIA’s new upscaler: [click](https://discord.com/channels/708579735583588363/814405660325969942/1412356906030334003) (thanks Pipedream)\n\n- (MVSEP) “I added BS Roformer flute model. It's available in \"MVSep Flute (flute, other)\". It superior comparing to SCNet version. SDR: 9.45 vs 6.27. 
More than 3 SDR difference.\n\nExample: <https://mvsep.com/result/20250830211041-f0bb276157-mixture.wav>” - ZFTurbo\n\n###### - Thanks to Essid, metrics for the following instrumental models were added to the models [list](#_2vdz5zlpb27h): INSTV7N, inst\\_fv8 (v2), inst\\_gabox3, Rifforge model, mesk’s older metal model, FVX, Bv1, Bv2 (b - bleedless, v - for version)\n\n###### - (MVSEP) “New Wind model based on BS Roformer has been added in MVSep Wind (wind, other):\n\n###### Demo: <https://mvsep.com/result/20250829230056-f0bb276157-mixture.wav>\n\n###### Results on quality checker: <https://mvsep.com/quality_checker/entry/8933>\n\n###### It increased SDR +2.5 comparing to previous best model.” - ZFTurbo\n\n“this one does not disappoint. At least not with the stuff I've tried so far. (...) the improvement is most noticeable with orchestral music. In heavy mixes eg. with lots of strings, the old models trip out. [The] new one is a lot more robust.” - Musicalman\n\n“the model is not only cleaner but also detects some wind instruments that the previous one couldn't (specially baritone saxophones, I need to test it a bit more)” - smilewasfound\n\n“the bs roformer wind model does really well with the other result and the violin model really is quite useful” - dio7500, dynamic64\n\n###### - Suno now has a stem separation feature. “It's generative, so the separation isn't exact. Also, you apparently can't use it on like famous songs because they'll get flagged.” - Musicalman\n\n“it sounds like shit tbh, tried it out” - dynamic64\n\n###### - Gabox released experimental inst Mel-Roformer [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Fullness.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) called just “fullness”.\n\n“this isn't called fullness.ckpt for nothing.” - Musicalman\n\nInst. 
fullness: 37.66, bleedless: 35.53, SDR: 15.91 (thx Essid)\n\n###### - (MVSEP) “We added 2 new algorithms for Acoustic Guitar (based on BS Roformer) and for Flute (based on SCNet XL)\n\n###### 1) `MVSep Acoustic Guitar (acoustic-guitar, other)` Example: <https://mvsep.com/result/20250825095613-f0bb276157-mixture.wav>”\n\n“excellent, it's separating acoustic from electric very well, even in fuzzy, lo-fi recordings” - Input Output (A5)\n\n“outperforms moises' model like crazy” - Sausum\n\n###### “2) `MVSep Flute (fulte, other)` Example: <https://mvsep.com/result/20250825095856-f0bb276157-mixture.wav>” - ZFTurbo\n\n“I tried the fulte model on stairway to heaven and it was so disappointing” - santilli\\_\n\n###### - We have the first lucky person on the server who actually succeeded in running NVIDIA’s new upscaler (with its messy AF code) on Windows using Docker.\n\nThe output is mono, so you need to process each channel manually.\n\nAlso, it's extremely slow, even on a 4070 Super, but the results are “impressive”. [More](https://discord.com/channels/708579735583588363/814405660325969942/1409223906828750878) (don't expect a step-by-step tutorial for now, because the guy is “not tech support”).\n\n###### - Unwa released the [BS-Roformer-Inst-FNO](https://huggingface.co/pcunwa/BS-Roformer-Inst-FNO) model (incompatible with UVR; use [MSST](#_2y2nycmmf53) and read the special installation instructions below).\n\ninst. bleedless: 42.87, fullness: 32.03, SDR: 17.60\n\n“very small amount of noise compared to other fullness inst models, while keeping enough fullness IMO. I don't even know if phase fix is needed. Maybe it's still needed a little bit.” - dca\n\n“seems less full than resurrection, which I would expect given the MVSEP [metric] results. (...) 
I'd say it's roughly comparable to gabox inst v7”\n\n“I replaced the MLP of the BS-Roformer mask estimator with FNO1d [Fourier Neural Operator], froze everything except the mask estimator, and trained it, which yielded good results. (...) While MLP is a universal function approximator, FNO learns mappings (operators) on function spaces.”\n\n“(The base weight is Resurrection Inst)”\n\n*Installing the model - instructions*:\n1. For PyTorch 2.6 and newer, replace utils/model\\_utils.py with this [model\\_utils.py](https://drive.google.com/file/d/1SeCG31kz6yDa2A9gPS0YVfOGM-lAdYt7/view?usp=sharing) (neoculture), or edit it manually:\n\n“I had many errors with torch.load and load\\_state\\_dict, but I managed to solve them.\nPyTorch 2.6 and later have improved security when loading checkpoints, which causes the problem. torch.\\_C.\\_nn.gelu must be set to exception”\n> “Add the following line above torch.load (at utils/model\\_utils.py line 479; 531/532 in updated MSST - old one doesn’t have utils folder and that py file)” - unwa\n\n```python\nwith torch.serialization.safe_globals([torch._C._nn.gelu]):\n    ...  # the existing torch.load(...) call, indented under this with-block\n```\n\n> Or use PyTorch older than 2.6.\n\nFor old MSST versions without utils/model\\_utils.py, replace inference.py in the root MSST directory with this [inference.py](https://github.com/deton24/Music-Source-Separation-Training/blob/main/FNO/inference.py).\n\n2. Replace bs\\_roformer.py in the models\\bs\\_roformer folder with this [bs\\_roformer.py](https://github.com/deton24/Music-Source-Separation-Training/blob/main/models/bs_roformer/FNO/bs_roformer.py) (see the linked [model card](https://huggingface.co/pcunwa/BS-Roformer-Inst-FNO) for reference), or edit it manually:\n\n“You need to replace the entire \"MaskEstimator\" class in original bs\\_roformer.py from ZFTurbo (in models/bs\\_roformer folder) with the code provided by unwa [indentation error fixed].\n\n3. 
Also install this lib <https://pypi.org/project/neuraloperator/>” so:\n\n`pip install neuraloperator==1.0.2`\n\n“Please note that since FNO1d appears to have been removed in the new version of neuraloperator, you will need to install an older version. [so not current 2.x]” - unwa\n\n4. “Errors may also occur when using load\\_state\\_dict. In such cases, specify strict=False as an argument. (at utils/model\\_utils.py line 532)”\n\n5\\*. To avoid affecting other BS-Roformer models with that file (so older BS-Roformers will still work), you can add it as a new model\\_type by editing utils/settings.py and models/bs\\_roformer/\\_\\_init\\_\\_.py as shown [here](https://imgur.com/a/dkGXo2r) (thx anvuew).\n\nFor the following error after installing the bs\\_roformer.py file in Sucial’s WebUI:\n\n`from models.bs_roformer.attend import Attend`\n\n`ModuleNotFoundError: No module named 'models'`\n\nThe fix: “SUC-DriverOld/MSST-WebUI use the name \"modules\" and ZFTurbo/Music-Source-Separation-Training use the name \"models\". And Unwa's bs\\_roformer.py that you replace with, also use \"models\". So you'll have to do some coding and symlink to make it work.” - fjordfish\n\n6\\*. [MSST](#_2y2nycmmf53) might have issues with GPUs whose architectures differ from those of the RTX 5000, 4000 and 3000 series, H100 and H200 (or possibly when using ROCm), resulting in a SageAttention error and forcing slower CPU separation.\n\nIn that case, ensure you have compatible CUDA/torch/torchvision/torchaudio versions installed:\n\nThe compatible CUDA version for the GTX 1660 is 10 (e.g. on a GTX 1060, Torch 2.5.1+cu121 can be used), but pip doesn’t find such a Torch package. 
To fix it:\n\n\\*a) Check out the index-url method described below:\n\n`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`\n\nor\n\n`pip install torch==2.3.0+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118`\n\nor\n\n`pip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118`\n\nand\n\n`pip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118`\n\nReplacing cu118 with the newer cu121 or even cu129 also seems to give a working URL.\n\nMaybe replacing 2.3.0 with 2.3.1 will work too.\n\n\\*a2) Alternatively, you can try to install it from wheels downloaded from [here](https://download.pytorch.org/whl/torch_stable.html) with the following command:\n\n`pip install SomePackage-1.0-py2.py3-none-any.whl` - providing the full path with the file name should do the trick. If the location contains spaces, you also need to wrap the path in quotes (\" \").\n\nOn GTX 1660 and other Turing GPUs, you might look for e.g. cu121/torch-2.3.1 and its various cp (Python version) wheels (there are no newer versions).\n\nJFYI, the official PyTorch page (<https://pytorch.org/get-started/previous-versions/>) lacks links to CUDA 10-compatible versions for older GPUs other than v1.12.1 (which is pretty old and might be a bit slower, if compatible at all), so the only way to install newer versions for CUDA 10 is the --extra-index-url trick, as simply running `pip install torch==2.3.0+cu118` ends with a version-not-found error.\n\n\\*b) You might still get a SageAttention not found error. 
Perform the following:\n\n“Had to replace cufft64\\_10.dll from C:\\Users\\user\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\torch\\lib\n\nby the one from C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin”\n\nIt is even compatible with the newest Torch 2.8.0 (if you followed the instructions to fix the dict issue above) if you grab that apparently “fixed version of cufft64\\_10.dll from CUDA v10.0” - dca\n\n~~“I guess it's possible to use it with the Colab inference custom model one, you run install cell, install the neuralop thing with \"!pip\" in a cell code (on Colab, system command needs the \"!\" before them), then edit the existing Roformer code accordingly to unwa's guidelines on his repo” - jarredou~~ [Tried](https://discord.com/channels/708579735583588363/1310358701831487508/1451481765158977547), doesn’t work.\n\nIf you want to train with FNO1d or Conformer, you might check out this [repo](https://github.com/dillfrescott/mvsep-beta/tree/main).\n\n- It turns out the Duality model is very good at handling pops and clicks from 45 RPM vinyl, moving them to the instrumental stem (bratmix)\n\n- Google broke dependency installation in many Colabs.\n\nFor the inference Colab by jarredou, see [here](https://discord.com/channels/708579735583588363/1310358701831487508/1407544876026957966) for troubleshooting (pushed the changes - not tested, might be useful for other Colabs; fixed).\n\nFor AudioSR, use the [huggingface](https://huggingface.co/spaces/Nick088/Audio-SR) space.\n\n- NVIDIA released their own audio upscaler, which can also do inpainting (so it can fill short silences between damaged segments of audio).\n\n<https://github.com/NVIDIA/diffusion-audio-restoration>\n\nBut maybe don’t try the upscaler just yet: the code is currently so messy and difficult to deploy (e.g. on macOS) that two of our users spent 9 hours on it and still didn't succeed, even with the help of AI chats. 
And to make it work on Colab, the code needs to be completely rewritten, jarredou says.\n\n- mesk released a beta version of his metal Mel-Roformer instrumental fine-tune called “Rifforge”, focused more on bleedless results.\n\n“training is still in progress, that's why it's a beta test of the model; It should work fine for a lot of things, but it HAS quirks on some tracks + to me there's some vocal stuff still audible on some tracks, I'm mostly trying to get feedback on how I could improve it” [known issues](https://discord.com/channels/708579735583588363/708580573697933382/1405719212592464003).\n\n<https://drive.proton.me/urls/5XM3PR1M7G#F3UhCU8RDGhX>\n\n- The custom model import Colab might currently have some issues with the model above. Using mesk's old MSST version below will probably work (at least locally).\n\n\"My old MSST repo I'm using, but I removed all the training stuff\n\n<https://drive.proton.me/urls/P530GFQR4W#VCAsF0E1TPje>\n\npip install -r requirements.txt (u gotta have Python and PyTorch installed as well) for the script to work.\n\nYou just gotta put all the tracks you want to test on in the \"tracks\" folder then double-click on \"inference.bat\" to run the inference script\n\nit's like if you were to type in the command in cmd, but it's simpler, and I'm lazy\" - mesk\n\n- The shared bias added during weight conversion was removed from the SW model, making it compatible with UVR and the regular MSST repo code (it was just a leftover that did nothing, all zeroes). Also, delete the shared bias line from the yaml.\n\nIt was also possible to trim the model down to vocals only (although that can probably be achieved more quickly in the config). 
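Such a trim can be sketched in Python as follows. It rests on two assumptions: the checkpoint is a flat state dict whose per-stem heads are stored under keys starting with `mask_estimators.<index>.`, as in the BS-Roformer code MSST uses, and index 0 is vocals, as noted right below. The function name `keep_single_mask_estimator` and the file names are hypothetical. The yaml would presumably also need `num_stems: 1` and a single instrument so the model builds only one mask estimator, but that is untested here, so check it before relying on it.

```python
"""Sketch: trim a BS-Roformer state dict down to a single stem head.

Assumptions (not verified against any specific checkpoint):
- keys of the per-stem heads start with 'mask_estimators.<index>.'
- all other keys are shared weights and must be kept
"""
import re
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")

# Captures the stem index in keys like 'mask_estimators.3.to_freqs.0.weight'
_MASK_KEY = re.compile(r"^mask_estimators\.(\d+)\.")


def keep_single_mask_estimator(state_dict: Mapping[str, T], keep: int = 0) -> Dict[str, T]:
    """Drop every mask estimator except `keep`, renumbering it to index 0."""
    trimmed: Dict[str, T] = {}
    for key, value in state_dict.items():
        match = _MASK_KEY.match(key)
        if match is None:
            trimmed[key] = value  # shared weights (band split, transformer layers, ...)
        elif int(match.group(1)) == keep:
            # Re-home the kept head at index 0 so a 1-stem model can load it
            trimmed["mask_estimators.0." + key[match.end():]] = value
    return trimmed


# Hypothetical usage with PyTorch (file names are placeholders):
#   import torch
#   sd = torch.load("model.ckpt", map_location="cpu")
#   torch.save(keep_single_mask_estimator(sd, keep=0), "model_vocals_only.ckpt")
```

The helper works on any mapping, so it can be checked without PyTorch. The regex anchors on the trailing dot so that `mask_estimators.10.` is never mistaken for `mask_estimators.1.`.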
mask\\_estimators.0 is responsible for vocals (each mask estimator is responsible for a different stem).\n\n- The new violin model on MVSEP sometimes does better than the strings model for strings (dynamic64)\n\n- Aufr33’s Mel-Roformer Denoise average variant ([link](https://mega.nz/file/vM4mHTYQ#f_uCxxS_olfTR4iAsOc-XS6sfUecfbF-ZKXrk3IjbnY) | [yaml](https://drive.google.com/file/d/1uwInhwgjOMIdOMTgj_oNR_dmaq7E-b3g/view?usp=sharing) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)) can also be used for crowd removal (Gabox)\n\n- (MVSEP) “I released a new MVSep Violin (violin, other). It’s based on BS-Roformer model with SDR: 7.29 for violin on my internal validation.\n\nLink: <https://mvsep.com/home?sep_type=65>\n\nExample: <https://mvsep.com/result/20250809120109-f0bb276157-mixture.wav>” - ZFTurbo\n\n“I've only played around with it a little bit, but it can even separate violin quartets from cellos, so cool.” - smilewasfound\n\n“Very neat model. (...) 
Sometimes the model does seem to pick up more than just violins imo, but yeah for separating high strings in particular it is really cool.” - Musicalman\n\n- MVSEP now also has an official YouTube channel:\n\n[https://www.youtube.com/@MVSEP](https://www.youtube.com/%40MVSEP)\n\n- Issues with <https://huggingface.co/spaces/TheStinger/UVR5_UI> have been fixed.\n\nThe mirror is still functional: <https://huggingface.co/spaces/qtzmusic/UVR5_UI>\n\n- Unwa's BS-Roformer Resurrection instrumental model was added on MVSEP, and on uvronline under these links for [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) accounts.\n\n- Gabox released experimental voc\\_fv6 [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv6.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml)\n\n“Sounds like b5e with vocal enhancer. Needs more training, some instruments are confused as vocals” - Gabox. “fv6 = fv4 but with better background vocal capture” - neoculture\n\nbleedless: 26.61 | fullness: 24.93 | SDR: 10.64\n\nFor comparison, SCNet XL very high fullness on MVSEP has the following metrics:\n\nVocals bleedless: 25.30, fullness: 23.50, SDR: 10.40\n\n- yt-dlp and frontends like cobalt.tools are currently defunct. 
It might affect the YouTube downloading features of some Colabs, although JDownloader 2 still works.\n\n- The model below was added on x-minus/uvronline:\n\n<https://uvronline.app/ai?discordtest> - free accounts\n\n<https://uvronline.app/ai?hp&test> - premium accounts\n\n- Unwa released a new BS-Roformer Resurrection instrumental [model](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst.ckpt) | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nSDR: 17.25, bleedless: 40.14, fullness: 34.93\n\nCompatible with UVR (model type v1). “Fast model to inference (204 MB only)”.\n\n“One of my favorite fullness inst models ATM. Sounds like v1e to me, but cleaner. Especially with guitar/piano where v1e tended to add more phase distortion, I guess that's what you'd call it lol. This model preserves their purity better IMO” - Musicalman\n\n“the way it sounds, is indeed the best fullness model, it's like between v1e and v1e+, so not so noisy and full enough, though it creates problems with instruments gone in the instrumental sadly, but apparently it seems Roformer inst models will always have problems with instruments it seems, seems like a rule. (...) Instrument preservation (...) is between v1e and v1e+ (...) Fixes crossbleeding of vocals in instrumental in a lot of songs, compared to previous models (...) No robotic voice bug at silent instrumental moments” - dca100fb8\n\n“Some songs leaves vocal residue. It is heard little but felt” - Fabio\n\n“Almost loses some sounds that v1e+ picks up just fine” - neoculture\n\nIt mushes some synths a bit in e.g. 
a trap/drill tune compared to inst Mel-Roformers like INSTV7/Becruily/FVX/inst3, but the residues/vocal shells are a bit quieter, although the clarity is also decreased a bit. Kind of a trade-off.\n\nSo far, no models besides 1296/1297 by viperx and unwa BS Large V1 work with the phase fixer/swapper to alleviate the remaining noise ~ dca. The SW model hasn't been tested.\n\nLess crossbleeding than paid Dango 11.\n\n- Gabox released a bunch of new models:\n\na) Gabox Inst\\_ExperimentalV1 model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)\n\nb) Gabox Kar v2 Mel-Roformer | [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Karaoke_GaboxV2.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/karaokegabox_1750911344.yaml)\n\nSDR is very similar to the v1 Gabox model: 9.7699 vs 9.7661.\n\nLead: bleedless 27.58 vs 28.18, fullness 15.24 vs 14.79\n\nBack-instrum: bleedless 50.67 vs 50.74, fullness 32.46 vs 32.84\n\n(but you’ll most likely get better results with the Gabox denoise/debleed Mel-Roformer model instead ~Gabox, although it can’t remove vocal residues\n\n[model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/denoisedebleed.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet))\n\nc) Gabox Lead Vocal De-Reverb Mel-Roformer | [DL](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Lead_VocalDereverb.ckpt) | [config](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/karaokegabox_1750911344.yaml) | 
[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“just use it on the mixture” - Gabox, “sounds great” - Rage313\n\nIt sometimes removes back vocals, especially if they're panned to the sides.\n\n“(...) also a vocal/inst separator. Dry vocals go in vocal stem, everything else goes to reverb. Don't think anvuew's models do that.\n\nI might still preprocess with vocal isolation before dereverb. But only really worth it if you're after high fullness vocals.” - Musicalman\n\n- There are issues with <https://huggingface.co/spaces/TheStinger/UVR5_UI>.\n\n“We have a problem with Zero GPU atm, waiting for a fix from HF staff\n\nisn't related to the code or last commit” - Not Eddy\n\nMeanwhile, you can use: <https://huggingface.co/spaces/qtzmusic/UVR5_UI>\n\n- (MVSEP) WAV output now uses 32-bit float only when the signal level exceeds the 1.0 range (to prevent clipping); otherwise, 16-bit PCM is used. If you really need it anyway, unconditional 32-bit float output for all files is available for paid users.\n\nIf you have trouble with nulling due to the new changes in the free version, consider lowering the volume of your mixtures by e.g. 3-5 dB so you won’t be affected, although it might slightly change the separation results.\n\nAlso, FLAC now uses 16-bit instead of 24-bit.\n\n- (MVSEP) “Sometimes we have complaints on speed from different parts of the world. The best way is to use VPN to solve them.” - ZFTurbo\n\n- Gabox’s voc\\_Fv5, Inst\\_GaboxFv7z, Unwa’s voc Resurrection, voc\\_gabox2 and jarredou’s new drumsep 5 stem model were added to the [inference Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n(Resurrection [now with the config] and Fv7z fixed). 
Also, the previously released anvuew mono dereverb model was added.\n\n- If you feel overwhelmed by this GDoc’s list, isling has released his own shorter version with recommended models for audio separation - [click](https://docs.google.com/document/d/19A9Z32LgqTUdI7z0LUAoGXjm07BO-IkRNS_QNnUxaYA/edit?usp=sharing).\n\n- Also, [here](https://docs.google.com/document/d/1tSOkau6iZ8DxenKs8ZD-7gmgEc5GaXuOSfl9ULioCcs/edit?usp=sharing) you’ll find an excerpt of the current document with only models and their links, if you find it hard to navigate through the whole document (edit. 24.07.25)\n\n- (MVSEP) BS-Roformer 2025.06, described below, received two updates (11.81 -> 11.86 and 11.86 -> 11.89) and has been renamed to:\n\nBS-Roformer 2025.07. [Full metrics](https://mvsep.com/quality_checker/entry/8693).\n“All Ensembles and models where this model is involved improved a little bit too.” - ZFTurbo\n\nThe MVSEP Multichannel BS feature started using the 11.81 model at some point; not sure if it now uses 11.89.\n\nIf you want to achieve results similar to BS-Roformer 2025.07, you could try out [splifft](https://github.com/undef13/splifft/releases) or its [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Colab_Inference_BSRofo_SW_fp16.ipynb). 
If you can't get splifft working, try the pre-conversion [model](https://drive.google.com/drive/folders/1ee9HBdwygactWLi_7hdZiFgFNv45Y22m) with the MSST repo ([more](https://drive.google.com/file/d/1mHbBZGcjXHwVfV5hxfyLY2d6ZhaZofkB/view?usp=sharing))\n\n- (MVSEP) Gabox INSTV7 instrumental model added\n\n- (MVSEP) MelBand Karaoke (lead/back vocals) Gabox model added (SDR: 9.67)\n\n- A fused model of the Gabox and Aufr33/viperx weights (0.5 + 0.5) was added (SDR: 9.85)\nIt may give only slightly worse results than normal ensembling, with the separation time of a single model, although “it doesn't have the same quality and definition as Gabox Karaoke, fused doesn't separate well.” - Billie O’Connell.\n\nYou can fuse models using the [ZFTurbo script](https://drive.google.com/file/d/18E5uTSVJV6rn8gTsOc0RC1m12lJFJGmP/view?usp=sharing) ([src](https://discord.com/channels/708579735583588363/1220364005034561628/1386610707042271243)) or the [Sucial script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) (they’re similar if not the same). 
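The following is a minimal Python sketch of what a 0.5 + 0.5 fusion does mechanically. It assumes that fusion means a plain per-parameter weighted average of two checkpoints with the same architecture. That assumption also explains why matching dim and depth would matter: the parameter names and tensor shapes have to line up. This is not the ZFTurbo or Sucial script, and `fuse_state_dicts` and `weight_a` are hypothetical names.

```python
def fuse_state_dicts(sd_a, sd_b, weight_a=0.5):
    """Weighted average of two checkpoints that share one architecture.

    Values may be torch tensors, NumPy arrays or plain floats: anything that
    supports scalar multiplication and addition. Assumes every entry is a
    float-like parameter; integer buffers would need special handling.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    if sd_a.keys() != sd_b.keys():
        differing = sorted(set(sd_a) ^ set(sd_b))
        raise ValueError(f"checkpoints have different keys, e.g. {differing[:5]}")

    fused = {}
    for name, a in sd_a.items():
        b = sd_b[name]
        # A different dim/depth shows up as mismatched shapes (or missing keys).
        shape_a, shape_b = getattr(a, "shape", None), getattr(b, "shape", None)
        if shape_a != shape_b:
            raise ValueError(f"shape mismatch for {name}: {shape_a} vs {shape_b}")
        fused[name] = weight_a * a + (1.0 - weight_a) * b
    return fused
```

To use this with real checkpoints, load both with `torch.load(path, map_location="cpu")`, pass the two state dicts in, and save the result with `torch.save`. Some checkpoint files may wrap the weights in an outer dict (e.g. under a `state_dict` key), so inspect them first.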
“I think the models need to have at least the same dim and depth but I'm not sure about that” - mesk.\n\nDespite the higher SDR, the fusion model seems to confuse lead/back vocals more.\n- The same goes for the public Karaoke fusion models released by Gonzaluigi [here](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-3094728853)\n\n- Gabox released the Mel-Roformer [voc\\_gabox2](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox2.ckpt) vocal model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVocal bleedless: 33.13, fullness: 18.98, SDR: 10.98\n\n- Unwa released a BS-Roformer vocal model called \"[Resurrection](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection.ckpt)\" | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection-Config.yaml), which shares some similarities with the SW model (it might be a retrain). The default chunk\\_size is pretty big, so if you run out of memory, decrease it to e.g. 
523776.\n\nVocal bleedless: 39.99, fullness: 15.14, SDR: 11.34.\n\n\"Omg, this model is doing a really good job at capturing backing vocals (...)\n\nHonestly, it sounds a bit muddy, and there's some instrumental bleeding into the vocal stems\" - neoculture\n\nNot as good for speech denoising as some other models (Musicalman).\n\n- mesk’s training guide was updated and its [link](https://docs.google.com/document/d/1jUcwiPfrJ8CpHqXIRHuOu70cFDMv_n-UzW53iaFuM9w/edit) changed\n\n- (MVSEP) New All-in and 5 stem ensembles have been added for paid users\n\n- AudioSR WebUI [Colab](https://colab.research.google.com/drive/1e9CjxMDyYnggKKQBPpGLufuVzt_yJJrL) by Sir Joseph got fixed\n\n- MVSep Ensemble 11.93 (vocals, instrum) (2025.06.28) added.\nIt eventually surpassed sami-bytedance-v.1.1 on the multisong dataset SDR-wise.\n\nInstrumental bleedless: 47.65, fullness: 28.76, SDR: 18.24\n\nVocal bleedless: 36.30, fullness: 17.73, SDR: 11.93\n\n- Corrected typos in the metrics (thx yakomotoo)\n\n- (MVSEP) MultiChannel now uses the 11.81 BS Roformer model.\n\n- (MVSEP) A new BS Roformer model is now available on the site; it’s called 2025.06 (don’t confuse it with SW).\n\nVocals bleedless: 48.59, fullness: 27.85, SDR: 11.82\n\nInstrumental bleedless: 37.83, fullness: 17.30, SDR: 18.12\n\n“It has +0.5 SDR to the previous best [24.08] model. We reached ByteDance's best model quality [only 0.1 SDR difference]. It is also TOP1 on the Synth dataset. It's balanced between both [instrumental and vocals]. I used metal dataset during training as well”\n\nCompared to previous models, it picks up backing vocals and vocal chops well where 6X struggles, and it fixes crossbleeding and reverb in some songs where previous models struggled. Sometimes you might still get better results with Beta 6X or voc\\_fv4 (depending on the song). “Very similar to SCNet very high fullness without the crazy noise” - dynamic64, “handles speech very well. 
Most models get confused by stuff like birds churping (they put it in the vocal stem), but this model keeps them out of the vocal stem way more than most. I love it!”\n\n“not a fan of the inst result. I feel like unwa and gabox sound better despite being less accurate” - dynamic64. It might be better than Fv7n: “I think gabox tends to sound better but the new BS-Roformer is more accurate” - dynamic64, “instrumentals are muddy” - santilli\\_\n\n“I think the Gabox [fv7n] model sounded more crispier than BS” - REYYY. “[voc\\_]fv4 sounds better” - neoculture, “instrumentals sound very good” - GameAgainPL.\n\n“it did things i never thought it could before” “this model is insane wtf (...) never seen a model accurately do the ayahuasca experience before” - mesk.\n\n“the first model to not produce vocal bleed in instrumental for \"Supersonic\" by Jamiroquai (not even Dango does it). It is also the case with \"Samsam (Chanson du générique)\" and \"Porcelain\" by Moby.” and \"In the Air Tonight\" by Phil Collins, also “removes very most of Daft Punk vocoder vocals” - dca. “my new favorite for vocals. It sounds fantastic” - dynamic64. “for the first time ever it managed to remove the reverb from one specific song. it is not perfect, but still much better than previous attempts” - santilli\\_\n\n
“sometimes 6x is better sometimes bs is better” - isling, “for me it's picked up a lot that 6x hadn't for backing vocals”\n\n- Using this [repo](https://github.com/RyanMetcalfeInt8/Music-Source-Separation-Training/tree/openvino_conversion/openvino_conversion), you can convert Mel-Roformers, HTDemucs and Apollo models to OpenVINO (so to onnx)\n\n- Lew, if you're reading this, someone wants to add your Apollo Uni model to a plugin for OpenVINO and Intel’s HF, but the model lacks an open-source licence. If you could re-release it with the proper licence, it would be appreciated. [More](https://github.com/intel/openvino-plugins-ai-audacity/discussions/356#discussioncomment-13600033)\n\n- (x-minus/uvronline) “I added two new models to remove vocals and hid a few old ones.\n\nSo there are now only three main models in the menu for different purposes:\n\nMel-RoFormer by Gabox Fv7z - best bleedless, good fullness, almost noiseless\n\nMel-RoFormer by unwa v1e+ - best fullness, average bleedless\n\nMel-RoFormer unwa big beta6x - best vocals\n\nOlder models are still available at the link:\n\n<https://uvronline.app/ai?hp&test> (premium)\n\n<https://uvronline.app/ai?test> (free)” - Aufr33\n\n“Oh! Lead vocal panning has been added for Mel Kar Old! (...)\n\nAlong with MDX Kar old and UVR Kar old to the test page!!” - dca\n\n- Gabox released a new experimental [Karaoke model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/karaoke). It has a single target stem, so keep extract\\_instrumental enabled to get the remaining stem.\n\n“really hard to tell the difference between this and becruily's karaoke model”, except that the latter has more target stems.\n\n- jarredou released his new MDX23C drumsep 5 stem model, which is public for everyone to [download](https://github.com/jarredou/models/releases/tag/DrumSep). 
All SDR metrics are better than in the previous model (“on kick/snare/toms it's around +2 SDR better than previous version”):\n\nSDR: kick: 16.66, snare: 11.54, toms: 12.34, hihat: 4.04, cymbals: 6.36 ([all metrics](https://mvsep.com/quality_checker/entry/8460)).\n\nMetric fullness for snare: 25.0361, bleedless for hh: 12.3470, log\\_wmse for snare: 13.8959\n\n“Quite cleaner than the previous one”, “it's more on the fullness side than bleedless”.\n\nOf all the metrics, only snare bleedless is worse than in the previous model:\n26.8420 vs 30.4149, and indeed “snare has a bit of bleed sometimes” - isling, “as well as cymbals bleed in hi hat track, but the stems sound clean” - dca.\n\n“a lot noisier than other drumpsep models, but that's not necessarily a bad thing.”\n\n“Surprisingly, it's the 2nd best model for hi hat and 2nd best model for cymbals on mvsep leaderboard. It's a bit biased because ZF's top mel model is 4 stem only.”\nFor comparison, [metrics](https://mvsep.com/quality_checker/entry/8195) of the old 6 stem jarredou/Aufr33 MDX23C model\n\n(which has cymbals divided into ride and crash, which are not evaluated):\n\nSDR: kick: 14.55, snare: 9.79, toms: 10.64, hihat: 3.20, cymbals: 6.08\n\nMetric fullness for snare: 25.0361, bleedless for hh: 10.2765, log\\_wmse for snare: 12.4258\n\nThe model was trained with a lightweight config so it could be trained on subpar T4 GPUs across 10 free Colab accounts (and it's “CRAAZY fast” for inference). The metrics don't surpass the exclusive drumsep Mel-Roformer and SCNet models on MVSEP, but at least you can use this one locally.\n\n“Most of the issues with my model are already known issues with mdx23c arch, it's bleedy and has band splitting artifacts. Like I said a few days ago, if I would have to redo it now, it would have probably gone with SCNet Masked. It's using 4x times lower n\\_fft resolution than InstVocHQ while using 2 times longer chunk\\_size (and with MDX23C, whatever number of stems, it's the same inference speed). 
A bit like the fruit’s model is doing”.\nIt was trained on 511 tracks; the MVSEP models were trained on almost the same dataset.\n\nMaybe if we separate just the snare with the old MDX23C model from an already separated drums stem, invert it against the drums to get the rest, and then pass that through the new model, the bleed would be gone.\n\nRemember that you need already separated [drums](#_sjf0vefmplt) in one track to use this model effectively.\n\nAbout the dataset used: “It was around 2/3 acoustic drums and 1/3 electro drums dataset at start of training, I've added more electro drums at end of training to balance it a bit more.” - jarredou\n\n- septcoco released [macvsep](https://github.com/septcoco/macvsep/), which is a “macOS client for the Mvsep music separation API”\n\n- Added Clear Voice in [speech separation](#_o6au7k9vcmk6)\n\n- Gabox released [Inst\\_GaboxFv7z](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv7z.ckpt) Mel Roformer | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)\n\nInst. fullness: 29.38, bleedless: 44.95\n\n“Focusing on the less amount of noise keeping fullness”\n\n“The results were similar to INSTV7 but with less noise” - neoculture\n\nIt has metrically better bleedless than Unwa v2 (although it’s even muddier); for comparison, Unwa v2 has:\n\nFullness: 31.85, bleedless: 41.73\n\n- (MVSEP) “I added a new SCNet vocal model. It's called SCNet XL IHF. It has a better SDR than previous versions. Very close to Roformers now”.\nIts vocal bleedless is the best among all SCNet variants on MVSEP. [Metrics](https://imgur.com/a/2E9rI5G).\nIHF stands for “Improved high frequencies”.\n\nVocal bleedless 28.31, fullness 17.98\n\n“certainly sounds better than classic SCNet XL (...) less crossbleeding of vocals in instrumental so far, and handle complex vocals better (...) problems with instruments, compared to high fullness one. 
XL high fullness remain the one without too many instruments cut”, but with some difficult songs, previous models can still yield better results - dca\n\n- Great news! MVSEP now allows sorting scores on the [Multisong Leaderboard](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum&ranking_metrics=fullness) by SDR, fullness, bleedless, aura\\_stft, aura\\_mrstft, log\\_wmse, l1\\_freq, si\\_sdr.\nBe aware that Gabox (and probably sometimes becruily) used to give funny names to their evaluations, so finding proper model names on the leaderboard is sometimes impossible. But I’ve tracked down all possible models with their metrics and proper names in the [instrumentals](#_2vdz5zlpb27h) and [vocal](#_n8ac32fhltgg) models sections, so no worries.\n\nAlso, metrics besides SDR are not available for old evaluations, where they weren’t yet listed in the model details. You can find more info about bleedless/fullness metrics [here](#_le80353knnv5).\n\nThe Log WMSE metric is good “at least for drums or anything rich in low frequency content” - jarredou\n\n- Our server members send their warm regards to A5, whose account disappeared for the ~5th time :) It later reappeared, weirdly mutated.\n\n- “Dango launched their new instrumental model\n\n<https://tuanziai.com/en-US/blog/684841907c8c85686c1b3da6>” It’s version 11.\n\n“there is no opportunity to try at least 3 complete tracks for free.”\n\nSome crossbleeding issues from v10 are still present, and some songs even get worse results than in v10. You might want to use v1e (with phase fix) + the Becruily vocal model (Max Spec) instead, although some people might still like Dango anyway.\n\n“Some tracks are fuller than Gabox v8”. Conservative mode is less full than V1e.\n\n“They have a tool called \"edit & improve\" [or “Advanced Repair tool”] that lets you use 'Conservative mode' for some of more complex parts of a song and 'Smart mode' for other parts. 
I find that way more convenient than processing the entire track in 'Conservative' mode.”\n\nThey plan to release a karaoke model in two months.\n\n- Gabox released a “[small](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/small_inst.ckpt)” version of his Mel instrumental model for faster inference | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-small/resolve/main/config_melbandroformer_small.yaml)\nBe aware that it can leave faint but audible constant residues.\n\n- ZFTurbo: “I added BS Roformer SW to \"MVSep Piano\", \"MVSep Guitar\", \"MVSep Bass\", \"MVSep Drums\" algorithms. For Bass and Drums available new Ensembles.”\n\n“drum SDR jumped by .6 on the ensemble! Atho fullness took a hit” - heuhew\n\n“Same with bass, sdr leaped but fullness shot down 4 points” - dynamic64\n\n- The BS-Roformer SW 6 stem model replaced the old one.\nIirc, the model didn’t change, just the inference code. SW stands for “shared weights”.\n\n“I got better drums/bass separation with that model than with any others when input is some live/rehearsal recordings with shitty sound”\n\nAlso, you get better SDR and fullness for instrumentals when you invert vocals against the mixture instead of mixing down the drums/bass/other stems.\n\n- undef13 released these “bs-roformer weights stored in fp16 precision, half the size of frazer's initial version. quality is the exact same as the fp32 version”.\n\n- The first vocal retrain was published about a day later: “May not perform very well” (at least for now)\n\nIf you want to fine-tune it, “this model generalizes like crazy (...) hasn't failed yet to confuse instruments and just chews through whatever you put through it (i ignore the overall mudiness)”, and you can retrain it into a plain inst/voc model: “I’m currently training it to my 2 stem (...) dataset (...) 
I was pleasantly surprised)”, iirc even on a laptop RTX 3070.\n\n- Added on MVSEP as BS-Roformer 6 stem (no clicking issues)\n\n- The new Logic Pro model has been reverse-engineered/cracked and shared as a standalone model for inference. Full metrics for all stems were added below. It has the best SDR on the multisong dataset for all stems besides vocals (which are still not bad).\nIt uses the BS-Roformer arch. The .MIL (CoreML) model file was converted to .PT.\n\n“The only change they made was a global parameter for bias which I've never seen before so I guess it's Apple secret sauce”. No quantization was used; “they had a shared bias across QKV and the out\\_proj”.\n\nSince then, UVR-compatible weights with the shared bias removed have been shared (the bias contained only zeroes anyway). Also, using the mask estimator method, a single stem can be extracted from the full weights. Only vocals were shared, and the UVR config will probably require some tweaks.\n\nA bit scared to share it, but seek and ye shall find.\n\n“It is wonderful to achieve such results with dim 256. It seems that what was still needed was depth.”\n\nUsage of the old model-specific inference code (from before the shared bias was removed):\n\npython inference.py --audio\\_path=\"./sample.flac\"\n\nFor: ModuleNotFoundError: No module named 'hyper\\_connections'\n\nRun: pip install hyper\\_connections\n\n“looks like chunks aren't overlapping? 
Getting clicks in output.”\n\n“A very small edit, line 13:\n\nparser.add\\_argument('--chunk\\_size', type=int, default=588800)\n\n- this produces 99% identical results with the DAW.\n\nPrevious 117760 chunk size was adding clicks and was lower quality in general.”\n\nStill, the code doesn’t use overlap, so it will still produce clicks, just fewer than before.\nAlso, you can run out of memory with 588800 if you have only 5 GB of free VRAM.\nIn testing, 882000 gave the highest SDR with that model (both lower and higher values scored worse).\n\nOn a CPU (without an NVIDIA GPU), inference will probably be slow.\n\nThe inference script and model probably still need validation to confirm the metrics match the [validation](https://mvsep.com/quality_checker/entry/8340) made recently from the DAW output, but they are most likely the same (other inference code based on the same converted weights got either identical results or a 0.03 SDR difference).\n\nTo use the old model version with the ZFTurbo MSST repo:\n“You need to replace bs\\_roformer.py in the repo with file from the archive (...) and change line 8 to:\n\nfrom models.bs\\_roformer.attend import Attend” and then use the config shared separately for the MSST repo, together with the model. 
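Here is why missing overlap causes clicks. If every chunk is separated independently and the outputs are simply concatenated, each chunk boundary is a hard seam where the estimate can jump. That jump can be heard as a click, and bigger chunks only mean fewer seams. Overlapping, windowed chunks (as the MSST inference code does with its overlap setting) smooth those seams out.

Below is a minimal NumPy sketch of that idea (weighted overlap-add). It assumes a hypothetical `model` callable that maps a `(channels, chunk_size)` array to a same-shaped stem estimate. It is an illustration only, not the MSST or Logic-model inference code, and `separate_overlap_add` and `num_overlap` are illustrative names.

```python
import numpy as np


def separate_overlap_add(mix, model, chunk_size, num_overlap=4):
    """Run `model` on overlapping windowed chunks and cross-fade the results.

    mix:   array of shape (channels, samples).
    model: hypothetical callable mapping a (channels, chunk_size) array to an
           array of the same shape (e.g. one separated stem).
    """
    if num_overlap < 2:
        raise ValueError("num_overlap must be at least 2 for chunks to cross-fade")
    if chunk_size % num_overlap:
        raise ValueError("chunk_size must be divisible by num_overlap")
    hop = chunk_size // num_overlap
    length = mix.shape[1]

    # Zero-pad both ends so the first and last real samples sit inside
    # several chunks instead of on a chunk edge.
    padded = np.pad(mix, ((0, 0), (chunk_size, 2 * chunk_size)))
    out = np.zeros(padded.shape, dtype=np.float64)
    norm = np.zeros(padded.shape[1], dtype=np.float64)

    # Hann taper: each chunk fades in and out, so neighbouring chunks blend
    # instead of meeting at a hard seam.
    window = np.hanning(chunk_size)

    for start in range(0, padded.shape[1] - chunk_size + 1, hop):
        stop = start + chunk_size
        out[:, start:stop] += model(padded[:, start:stop]) * window
        norm[start:stop] += window

    # Divide by the summed window so overlapping regions are not over-weighted;
    # zeros only occur in the padding at the very edges.
    norm[norm == 0] = 1.0
    return (out / norm)[:, chunk_size:chunk_size + length]
```

Each sample ends up covered by `num_overlap` chunks, so this costs roughly `num_overlap` times as many model calls as plain chunking. The Hann taper is just one reasonable cross-fade shape among several.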
Using the MSST repo for inference fixes the clicking issue.\n\nFor the “unrecognized arguments” issue, “you must put your path inside quotation marks or apostrophes”.\n\n“If you're using the script GUI, be aware that the browser popup window when choosing checkpoint has some predefined extension and .pt is not part of it”\n\n- Becruily guitar model added to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb); bleed suppressor model by unwa/97chris fixed, denoise-debleed by Gabox added, Revive 3e fixed, Revive 2 added\n\n- Gabox released the [instv7plus](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/instv7plus.ckpt) bleedless model (experimental)\n\nfullness: 29.83, bleedless: 39.36, SDR: 16.51\n\n- And [Inst\\_FV8b](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Inst_FV8b.ckpt)\n\nfullness: 35.05, bleedless: 36.90, SDR: 16.59\n\n“Very clean”, although muddier than V1E+.\n\n- wesleyr36/Dry Paint Dealer Undr HTDemucs Phantom Center model was added to the [inference Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- (Unwa) “After a long time, I'm uploading a vocal model specialized in fullness.\n\nRevive 3e is the opposite of version 2 — it pushes fullness to the extreme.\n\nAlso, the training dataset was provided by Aufr33. 
Many thanks for that.”\n\n[bs\\_roformer\\_revive3e](https://huggingface.co/pcunwa/BS-Roformer-Revive/blob/main/bs_roformer_revive3e.ckpt) | [config](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (should be fixed now)\nVoc. SDR: 10.98, fullness: 21.43, bleedless: 30.51\n\n- Logic Pro updated its stem separation feature, which now includes guitar\n\nOverall, it’s “surprisingly good” - dynamic64. A piano separator was also added. [More](https://9to5mac.com/2025/05/28/logic-pro-update-adds-guitar-and-piano-stems-new-sound-packs-and-can-even-recover-tracks-you-didnt-save/)\n\n“Guitar & Piano separation seems to be really on point. So far it separated super well, also didn’t confuse organs for guitars and certain piano sounds as well.” - Tobias51\n\n“guitar model sounds better than demucs, mvsep, and moises” - Sausum\n\n“it's not a fullness emphasis or anything, but it's shockingly good at understanding different types of instruments and keeping them consistent sounding” - becruily\nUnlike with other models, you don’t need to process L and R separately to deal with bleeding across channels; there isn’t any in this one - A5\n\nFull [evaluation](https://mvsep.com/quality_checker/entry/8355) on the multisong dataset (besides instrumental):\n\nSDR piano 7.79, bleedless 31.96, fullness 14.42\n\nSDR other 19.90, bleedless 58.68, fullness 49.85\n\nSDR guitar 9.00, bleedless 31.54, fullness 15.95\n\nSDR other 15.94, bleedless 49.36, fullness 31.57\n\nSDR drums 14.05 (although with lower fullness than MVSep SCNet XL drums: 14.26 vs 21.21),\nSDR bass 14.57 (-||-), other 8.66, vocals 11.27 (only vocals are not SOTA)\n\nThe MVSep Piano Ensemble (SCNet + Mel) is higher only in other fullness: 56.96 ([click](https://mvsep.com/quality_checker/entry/7396))\n\n- Since 23.05.25 jarredou (Discord: rigo2) and 
dca100fb8 (Discord) also have editing privileges for this document. You can find it mirrored as of this date [here](https://drive.google.com/drive/folders/1_ShCnI3Qvp2Q7l_0R55jJuXzzKRtc_Ye?usp=sharing) in docx, pdf and html.\n\n- Becruily released a Melband guitar [model](https://huggingface.co/becruily/mel-band-roformer-guitar/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“Not SOTA, but much more efficient and comparable to existing guitar models, and for some songs it might work better because it picks up more guitars (though it can also pick some other instruments).\n\nFor better results you might try first removing vocals.”\n\n- (MVSEP) “We added a new GUI example to work with the MVSep API. Now it allows to use multiple files and multiple algorithms at once.\n\nIt exists as standalone .exe file, so it doesn't require python installation\n\nRepository: <https://github.com/ZFTurbo/MVSep-API-Examples>\n\nExe for Windows: <https://github.com/ZFTurbo/MVSep-API-Examples/raw/refs/heads/main/python_example5_gui/mvsep_client_gui_win.exe>” - ZFTurbo\n\nTL;DR “You can process a song with multiple models, and process multiple songs”\n\n- Sucial released v1/2 de-breath VR models:\n\n<https://huggingface.co/Sucial/De-Breathe-Models/tree/main>\n\nAlternatively, you can also try this free abandonware for the purpose:\n<https://archive.org/details/accusonus-era-bundle-v-6.2.00>\n\n- (x-minus/uvronline) Aufr33 added a new Lead and Backing vocal separator.\n\n~~It uses big beta 5e model as preprocessor for becruily Mel Karaoke model~~ “In fact, the big beta 5e model is run after becruily Mel Karaoke” - Aufr33 (so you don’t need an additional step to use this separator). It also offers the lead vocal panning option, as in BVE v2 (it’s “to \"tell\" the AI where the main vocals are located (how they are mixed).”. 
Becruily’s model “doesn't even need Lead vocal panning a lot of the time, [the] ability to recognize what is LV and what is BV [is] impressive” - dca).\n\nThe difference from using the becruily Kar model alone (without the preprocessor) is that here “you get the third track, backing vocals.”.\n\n“The new separator is available in the free version, however, due to its resource intensity, only the first minute of the song will be processed.” That limit applies if you don’t have premium.\n\nBecruily:\n\n“Probably too resource-intensive, but you could try adding demudders to each step\n\n1) karaoke model + demudding\n\n2) separate vocals of bgv + demudidng\n\nBut not sure how much noise this will bring\n\n(Or even a 50:50 ensemble with BVE OG)”\n\n- Unwa released the [Revive 2](https://huggingface.co/pcunwa/BS-Roformer-Revive/tree/main) variant of his BS-Roformer fine-tune of the viperx 1297 model\n\nVoc. bleedless: 40.07, fullness: 15.13, SDR: 10.97\n\n“has a Bleedless score that surpasses the FT2 Bleedless”, with fullness lower by 0.64.\n\n“can keep the string well” better than viperx 1297 (...) in my country they have some song with Ethnic instruments. Only 1297 and Revive2 can keep them in Instrumental while other model notice them as Vocal” ~daylight\n\n“it does capture more than viperx's” - mesk\n\nIt’s depth 12 and dim 512, so inference is much slower (even two times) than with some newer Mel-Roformers like voc\\_fv4, with the exception of Mel 1143, which is as slow as BS 1297 (thx dca, neoculture).\n\n- Unwa’s BS-Roformer Revive vocal [model](https://huggingface.co/pcunwa/BS-Roformer-Revive/tree/main) (a fine-tune of the viperx 1297 model) was released.\n\nVoc. bleedless: 38.80, fullness: 15.48, SDR: 11.03\n\n“Less instrument bleed in vocal track compared to BS 1296/1297” but it still has many [issues](https://discord.com/channels/708579735583588363/1226334240250269797/1371215438352224307), “has fewer problems with instruments bleeding it seems compared to Mel. (...) 
1297 had very few instrument bleeding in vocal, and that Revive model is even better at this\n\n(...). Works great as a phase fixer reference to remove Mel Roformer inst models noise”; unlike FT3 Preview, it doesn’t seem to remove instruments when used for phase fixing (thx dca100fb8)\n\nAdded to the [phase fixer Colab](https://colab.research.google.com/drive/1uDXiZAHYk7dQajOLtaq8QmYXL1VtybM2).\n\n- The [Inst\\_GaboxFv8](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv8.ckpt) model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) checkpoint has been updated; the metrics may have changed, but the model's overall character should remain similar\n\n- (MVSEP) “I added new Drumsep MelBand Roformer (4 stems) model on MVSep (old one was removed). It gives the best metrics with big gap for kick, snare and cymbals.” - ZFTurbo\n\n([metrics](https://imgur.com/a/h924uBF); only toms are worse SDR-wise vs previous SCNet Drumsep models)\n\n- Gabox released the [voc\\_fv5](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_fv5.ckpt) vocal model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml)\n\nvoc bleedless: 29.50, fullness: 20.67, SDR: 10.56\n\n“fv5 sounds a bit fuller than fv4, but the vocal chops end up in the vocal stem. In my opinion, fv4 is better for removing vocal chops from the vocal stem” - neoculture. [Examples](https://discord.com/channels/708579735583588363/708580573697933382/1369232029291511881)\n\n“v5 is slightly fuller, v4 is less full but also slightly more careful about what it considers as vocals. I think b5e is the fullest overall, but it's a bit much sometimes. 
Pretty sure the gabox models are a little more accurate with vocal/instrument detection.” - Musicalman\n\nIt passes the Gregory Brothers - Dudes a Beast test (previously, trumpets ended up in the vocal stem at 0:51; unwa’s beta4 and inst v1e tested) - maxi74x1\n\n- *Some of our less active users have been accidentally kicked out of our Discord server during some administrative tasks. You’re free to rejoin using* [*this*](https://discord.gg/ZPtAU5R6rP) *invite link (unless you were banned before in some other unrelated event).*\n\n- Dry Paint Dealer Undr (a.k.a. wesleyr36) released new Phantom Centre Models:\n\nHTDemucs Similarity/Phantom Centre Extraction model:\n\n<https://drive.google.com/drive/folders/10PRuNxAc_VOcdZLHxawAfEdPCO6bYli3?usp=sharing> (it tends to be more “correct” in center extraction than the last MDX23C model)\n\nThe Demucs model won’t work with UVR (it gives a bag\\_num error), even with the yaml prepared in the same way as for Imagoy Drumsep and after renaming the .ckpt to .th (probably because it needs the ZFTurbo inference code).\n\nSCNet Similarity/Phantom Centre Extraction model:\n\n<https://drive.google.com/drive/folders/1CM0uKDf60vhYyYOCg2G1Ft4aAiK1sLwZ?usp=sharing>\n\nA difference/Side Extraction model based on the SCNet arch was also released:\n\n<https://drive.google.com/drive/folders/1ZSUw6ZuhJusv7HE5eMa-MORKA0XbSEht?usp=sharing>\n\n- Aufr33 released his UVR Backing Vocals Extractor v2 [model](https://mega.nz/file/XZpXyJAD#if8wRRDxHZ0T-HiH8ZRLhXUloNIm87kpGKrBRMlHoq8), previously available only on x-minus/uvronline (VR arch).\n\n“Note that this model should be used with a rebalanced mix.\n\nThe recommended music level is no more than 25% or -12 dB.\n\nIf you use this model in your project, please credit me.”\n\nIt should work in UVR. Just place the model file in Ultimate Vocal Remover\\models\\VR\\_Models and the [config](https://drive.google.com/file/d/1aGjDkPhqPLLlOKXeIfWDw09LElHMJfos/view?usp=sharing) file in lib\\_v5\\vr\\_network\\modelparams. 
Then pick “4band\\_v4\\_ms\\_fullband.json” when asked to recognize the model (it has the same checksum as the file already in the lib\\_v5\\vr\\_network\\modelparams folder, if it’s there). Also, I think it's not a VR 5.1 model, and it was used with a vocal model as a preprocessor.\n\nMore about its usage in the [Karaoke](#_vg1wnx1dc4g0) section (scroll down a bit).\n\n- squid.wtf doesn't work anymore (“it just downloads 30 seconds of a song, just a random 30 second snippet”), but lucida works.\n\n- [USS-Bytedance](#_4svuy3bzvi1t) Colab has been fixed (Python “No such file or directory” fix) - thx epiphery.\n\n<https://colab.research.google.com/drive/1rfl0YJt7cwxdT_pQlgobJNuX3fANyYmx?usp=sharing>\n\n- (MVSEP) “I added a new MVSep Saxophone (saxophone, other) model. It has 3 versions:\n\nSCNet XL (SDR saxophone: 6.15, other: 18.87)\n\nMelBand Roformer (SDR saxophone: 6.97, other 19.70)\n\nEnsemble Mel + SCNet (SDR saxophone: 7.13, other 19.77)” ZFTurbo\n\n“SCNet XL take[s] wurlitzer as sax tho. Mel Rofo one (...) didn't” - dca\n\n- (x-minus) The server code was updated [it might have fixed the bleeding in the first seconds in e.g. Mel Decrowd; edit: it didn’t]\n\nAdded a Lead vocal panning setting for the Mel-RoFormer Kar by becruily model.\n\n[It’s] “to \"tell\" the AI where the main vocals are located (how they are mixed).\n\nAdded Demudder for the Mel-RoFormer Kar by becruily model.” - Aufr33\n\n“doesn't even need Lead vocal panning a lot of the time, [the] ability to recognize what is LV and what is BV [is] impressive” - dca\n\n- Anjok released a new UVR Roformer patch #15 fixing CUDA support for Windows users with RTX 5000 Series GPUs (it’s based on CUDA 12.8 and a newer PyTorch). It might not be backward compatible with older GPUs, so be aware ([src](https://github.com/CarlGao4/Demucs-Gui/issues/115#issuecomment-2819652160)).\n[Download](https://www.mediafire.com/file_premium/4jg10r9wa3tujav/UVR_Patch_4_24_25_20_11_BETA_full_cuda_12.8.zip/file)\n\n- (MVSEP) “I added becruily Karaoke model. 
It's available as option in MelBand Karaoke (lead/back vocals) algorithm.” ZFTurbo\n\n- (MVSEP) Since at least February, MVSEP has normalized all outputs unless WAV is chosen as the output format.\n\nSometimes it can be \"annoying when you have to combine the outputs later\".\n\n“No, if you turn off normalization, FLAC will cut all above 1.0\n\nAnd if it was normalized, it means you had these values.”\n\nFLAC doesn’t support 32-bit float, only integer samples (up to 32-bit int), so normalization is still needed.\n\nSo if your stems don’t invert correctly, just use the WAV output format - it's 32-bit float.\n\n- Audioshake now has a strings model\n\n- [Fast Separation](https://colab.research.google.com/drive/1u9oUj3T0Z5F-Jl3J6Nm0dtAGsEESx80F?usp=sharing) Colab by Sir Joseph has been updated with the following models:\n\nMelBand Roformers: FT 3 by unwa, Karaoke by becruily, FVX by Gabox, INSTV8N by Gabox, INSTV8 by Gabox, INSTV7N by Gabox, Instrumental Bleedless V3 by Gabox, Inst V1 (E) Plus by Unwa, Inst V1 Plus by Unwa\n\n- (stephanie/UVR) “Those of you on Linux running the current *roformer\\_add+directml* branch that cant get becruily's karaoke model working due to the same error:\n\nit seems editing line 790 in separate.py setting the keyword argument strict to False when calling load\\_state\\_dict seems to make the karaoke model load and infer properly, so i think it will work\n\n*model.load\\_state\\_dict(checkpoint, strict=False)*\n\nI don't know if this is a robust workaround, but I haven't observed anything behaving differently than it should yet, so if you want to give it a shot I think it will work\n\nTL;DR change line 790 in separate.py to the codeblock and then run again and karaoke model should work”\n\n- Aname’s Mel-Roformer 4 stems Large was added to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- The Apollo Lew Uni model can also be used as a 
denoiser.\n\nIt tends to smooth out some noise in higher frequencies, making the spectrum more even there and smoothing out the sound in general ([example](https://drive.google.com/drive/folders/1HuDRh5dkkNbMbIhcybDTTusnWwZOgp4V?usp=sharing)).\n\nMore about the model and its usage - [click](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.7nivwczgciev).\n\n- Becruily’s Mel-Roformer Karaoke model was added on x-minus/uvronline under the “Keep backing vocals” option and to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nMost likely, you’ll get a ‘norm’ AttributeError when trying out that model in UVR. Read [here](#_vrtjkbqu3t9r) for troubleshooting. Use the melband-roformer model type, not v2.\n\nMake sure you use the latest UVR Roformer [patch](#_6y2plb943p9v) - older patches like #2 will show a RuntimeError about layers.\n\n- (becruily) “I'm releasing my first karaoke [model](https://huggingface.co/becruily/mel-band-roformer-karaoke/tree/main).\n\nIt's a dual model trained for both vocals and instrumental. It sounds fuller + understands better what is lead and background vocal, and to me, it is better than any other karaoke model.”\n\n“Compared to Aufr33’s Melband model, it can achieve e.g. cleaner pronunciation in some songs” ([examples](https://discord.com/channels/708579735583588363/708580573697933382/1361694470403260556)) - neoculture\n\n“It is the best available, better than Mel Kar, UVR BVE v2, lalal.ai, Dango...” - dca\n\n“This sounds amazing” - Rege 313\n\n“It performs very well with male/female duets, nice work” - Gabox\n\n“Important note: This is not a duet or male/female model. If 2 singers are singing simultaneously + background vocals, it will count both singers as lead vocals. The model strictly keeps only actual background vocals. 
The same goes for \"adlibs\" such as high notes or other overlapping lead vocals.\n\nThe model is not foolproof. Some songs might not sound that much improved compared to others. It's very hard to find a dataset for this kind of task.\n\nTip: For even better results, first extract the vocals with a fullness model (like mine) and combine the results with a fullness instrumental model.” becruily\n\nThe model outputs 2 stems, like duality models, so you might end up with three outputs if you enable the invert stem option. Don’t use it: the inverted stem will likely have worse quality than what the model outputs directly.\n\n- (MVSEP) “I added 2 more models for DrumSep based on MelBand Roformer architecture.”\n\na) 4 stems (kick, snare, toms, cymbals) - the average [SDR](https://imgur.com/a/n9WMkSY) of hihat, ride and crash is 11.52 (but in one stem), and so far it’s the best SDR out of all models (even vs the previous ensemble consisting of three MDX23C and SCNet models).\n\nb) 6 stems (kick, snare, toms, hihat, ride, crash) - the average SDR of hihat, ride and crash is 8.18 (but from separated stems).\n\nThe snare in a) has the best SDR out of all available models.\nKick and toms are still the best SDR-wise in the previous 3x MDX23C and SCNet ensemble (new ensemble with these new Mel-Roformers so far)\n\n- The new models “are very great for ride/crash/hh. And overall they have the best metrics almost for all stems.” - ZFTurbo\n\n- Aname released two 4-stem Mel-Roformer models:\n\n<https://huggingface.co/Aname-Tommy/melbandroformer4stems/tree/main>\n\na) Large (4GB) SDR drums: 9.72, bass: 9.40, other: 5.11, vocals 8.65 (multisong dataset)\n\nb) XL (7GB) SDR drums: 9.83, bass: 9.37, other: 5.31, vocals 8.57 (multisong dataset)\n\nThe latter doesn’t work in the custom model import Colab, at least with the default chunk\\_size, and runs slowly on e.g. a 3060 (12GB?). 
Both models were trained with chunks set to 15 seconds (chunk\\_size = 661500).\n\n“I tried a song on 4070 Super it took like 6 mins on XL 4 stems compared to 30 seconds on Large 4 stems” On a 3060, XL is very slow.\n\nDespite a lower average SDR than demucs\\_ft on the musdb18 dataset (8.54 vs 9), it seems to outperform that model (its SDR is better only for the other stem). Public SCNet, SCNet XL and BS-Roformer models have better [metrics](https://imgur.com/a/1pGC8En) (still on musdb18, not the multisong dataset on MVSEP)\n\n“Drums are sounding really good in particular, tested a couple songs with the large model after using unwa's v1e+ for instrumental” “drums are absolutely the standout”\n\n“Large works in like 99% use case” “Large split sounds amazing so far tho”\n\nXL “result would take so much longer, but the large results sounded better imo” 5B\n\n“The Colab is forcing a different value than the one from the config. You can try to edit the inference cell code and add 661500 as possible value and see if it goes better.\n\nThe Colab only changes chunk\\_size (value from GUI), batch\\_size (forcing =1) and overlap (value from GUI), it doesn't touch other settings from config.” - jarredou\n\n“It may change audio setting, chunk\\_size=485100, n\\_fft=2048 will work, but it will go lower SDR maybe”, while the lowest reasonable value is probably 112455 (2.55 s).\n\nThe Large model uses 7GB of VRAM on an Nvidia GPU in UVR with the default config settings.\n\n- Sir Joseph released the [SESA Fast Separation](https://colab.research.google.com/drive/1u9oUj3T0Z5F-Jl3J6Nm0dtAGsEESx80F?usp=sharing) Colab based on UVR. It’s faster than the regular [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) Colab (which now has “added Apollo to Auto Ensemble and fixed a few technical glitches. 
It’s running smoother now!”)\nMore changes in the fast Colab:\n\nV1e+ and Gabox inst fv8 are missing because the model list cannot be updated in the Fast Colab yet.\n“auto-ensemble feature is included here too.\n\nBackground noise suppression is a bit more polished.\n\nYou can specify unwanted stems to filter out.”\n\n- (x-minus/uvronline) “1. A new Mel-RoFormer by unwa v1e+ model has been added. It removes vocals very gently while preserving instruments. It is recommended to use it with correct\\_phase post-processing.\n\n2. Mel-RoFormer by Kim & unwa ft3 and some other models are hidden. As before, you can find them here: <https://uvronline.app/ai?hp&test>” - Aufr33\n\n“The only problem is the phase correction, it still uses FT2 as a reference [for phase fixer], and FT2 cuts instruments still, so I'm waiting for FT3 release by unwa so it can be added as phase fixer reference and preserve instruments well” dca\n\n“Results are still better with phase fixer though, right”\n\nSince the latest website layout changes, make sure you’re “clicking on \"Ensemble\"” - it should \"reveal\" that option.\n\n“phase fixer [on the site] swaps the v1e+ vocals with the ft2 vocals”\n\nIIRC, the phase fixer feature requires premium.\n\n- SESA Colab by Sir Joseph is back! The Colab link has changed - [click](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\nApollo Integration: Added Apollo audio enhancement feature. 
Supports Normal and Mid/Side methods.\n\nUI Updates: Added new Apollo settings components under the Settings tab.\n\nBug Fixes:\n\nFixed Apollo output not showing in the terminal.\n\nCorrected \"Phase Remix\" and \"Overlap Info\" display in the UI.\n\nTranslation Updates: Added new translation keys for Apollo, removed unused keys.\n\nColab Support: Added 10 new languages: EN\\_US (English), TR\\_TR (Turkish), AR\\_SA (Arabic), RU\\_RU (Russian), ES\\_ES (Spanish), DE\\_DE (German), ZN\\_CN (Chinese), HI\\_IN (Hindi), JA\\_JP (Japanese), IT\\_IT (Italian).\n\nNew models were also added.\n\nNote: Enhanced UI and processing stability.\n\n- Gabox released the [Inst\\_GaboxFv8](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv8.ckpt) model ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) [weights have been replaced by v2]\n\nInst. bleedless: 38.06, fullness: 35.57, SDR: 16.51 [outdated]\n\nIt might have some “ugly vocal residues” at times (Phil Collins - In The Air Tonight) - 00:46, 02:56 - dca.\n\nVs v1e: “it seems to pick up some instruments better” - Gabox\n\n“a bit cleaner-sounding and has less filtering/watery artifacts.\n\nBoth models are prone to very strange vocal leakage [“especially in the chorus.”].\n\nAnd because Fv8 can be so clean at times, the leakage can be fairly obvious. 
For now, my vote is for Fv8, but I'll still probably be switching back and forth a lot” - Musicalman\n\n“sometimes v1e+ have vocal residues which sound like you were speaking through a fan/low quality mp3” - dca\n\n- Added to the inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb): Mesk's Metal Model Preview, Unwa's v1+ Preview and v1e+ Mel instrumental models, Unwa's Beta6X and FT3 Preview vocal models, and the Bandit v2 multilingual model\n\n- Unwa released a new V1e+ Mel-Roformer instrumental [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/blob/main/inst_v1e_plus.ckpt) | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/blob/main/config_melbandroformer_inst.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst bleedless: 36.53, fullness: 37.89, SDR: 16.65\n\nIt has less noise than v1e (esp. in the lower frequencies), but it’s also less full - “somewhere between v1 and v1e”. It has fewer problems with quiet vocals in instrumentals than the V1+, “issues with harmonica, saxophone, elec guitar and synth seem to have been fixed. Theremin and kazoo are still problematic [like] for models from MDX-Net or SCNet [archs]). 
Only dango seems to correctly detect kazoo as an instrument it seems” - dca, “The loss function was changed to be more fullness-oriented, and trained a further 50k steps from the v1+ test.” Unwa\n\n“v1e keeps better instruments like trump[et]s than v1e+\n\nWith v1e+ there is less noise, but some instruments are hidden” koseidon72\n\n“v1e+ has a strange problem of almost vocoding the vocals and keeping them in quietly” even with phase fixer\n\n“has some problems with cymbals bleed in vocals (not the case with other instrumental roformer models)” dca\n\n“trained with additional phase loss which helps remove some of that metallic fullness noise, and also has higher sdr I believe” - becruily\n\n- Unwa released the V1+ Mel-Roformer instrumental [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/blob/main/config_melbandroformer_inst.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst. bleedless: 38.26, fullness: 35.31, SDR: 16.72\n\n\"It is based on v1e, but the Fullness is not as high as v1e, so it is positioned as an improved version of v1.\" Unwa\n\n\"very nice model, the multistft noise is gone\"\n\nIt's probably due to:\n\n\"Unwrapped phase loss function added\" Unwa\n\nBTW, it was already shown before that adding artificial noise to separations increases the fullness metric.\n\n\"Seems to have significantly less sax and harmonica bleed in vocal, which is an awesome thing (...) It still struggles with other things like FX and Kazoo.\" dca\n\n\"It sounds clean. 
The only thing [is] that some instruments are deleted, and in some tracks leaves remnants of voice in the instrumental.\" Fabio\n\n\"Screams are not removed from the track\" Halif\n\nTraining details\n\n\"I made a small improvement to the dataset and trained about 50k steps with a batch size of 2.\n\n8192 was added to multi\\_stft\\_resolutions\\_window\\_sizes.\n\nAs it was, the memory usage increased too much, so it was rewritten to use hop\\_length = 147 when window\\_size is 4096 or less and 441 when window\\_size is greater than that.\" Unwa\n\n- Mesk released a preview of his instrumental model, retrained from Mel Kim on a metal dataset of a few thousand songs.\n\n<https://huggingface.co/meskvlla33/metal_roformer_preview/tree/main> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nThese are not multisong metrics; they were measured on a private dataset!\n\nInstr bleedless: 48.81, fullness: 42.85, SDR: 13.7621\n\n\"currently restarting from scratch because I think I know what all the problematic vocal tracks were, and I removed them, we'll see if it's gonna be better\"\n\n\"vocals could follow if requested.\n\nShould work fine for all genres of metal, but doesn't work on:\n\n- hard compressed screams\n\n- some background vocals\n\n- weird tracks (think Meshuggah's \"The Ayahuasca Experience\")\n\nP.S: Use the training repo [(MSST)](#_2y2nycmmf53) if you want to [separate] with it. 
UVR will be abysmally slow (because of chunk\\_size [introduced since [UVR Roformer beta](#_6y2plb943p9v) #3])”\n\n- Yusuf fixed the [Apollo](https://colab.research.google.com/drive/1lHnu9-rVvNp5VtU7MFjWx92501Qwfdyf) and [AudioSR](https://colab.research.google.com/drive/1e9CjxMDyYnggKKQBPpGLufuVzt_yJJrL) WebUI Colabs, and a mid/side upscaling method was added to Apollo\n\n- Unwa released the Big Beta 6X vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6x.ckpt) ([yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6x.yaml))\nVocal bleedless: 35.16, fullness: 17.77, SDR: 11.12\n\n“it is probably the highest SDR or log wmse score in my model to date.”\nSome leakage into vocals might occur.\n\n“dim 512, depth 12.\n\nIt is the largest Mel-Band Roformer model I have ever uploaded.”\n\n“I've added dozens of samples and songs that use a lot of them to the dataset”\n\n- (MVSEP) “I added new Apollo model with Aura MR STFT: 22.42\n\nIt's available under \"Apollo Enhancers (by JusperLee and Lew)\" with option:\n\"Universal Super Resolution (by MVSep Team)\".\n\nIt requires a hard cutoff on frequency for best experience.” - ZFTurbo\n\nIIRC, it was trained by his student.\n\n“It's doing well on more transient stuff like snare hits, but it seems to really struggle to actually add harmonics. Has this really weird quality of sounding high quality and low quality at the same time”\n\n“It doesn't seem to like 8 kHz cutoff, it has generated almost nothing”\n\n“I tried with a 10 kHz cutoff and just got quiet-ish transients”\n\n“Lew told me the same while training his, the model would learn transients/drums but struggle with harmonics. Maybe it’s an Apollo limitation. 
I don’t recall if the OG model by jusper lee has this issue too, since it rarely works”\n\nAdvice\n\nYou might even want to process your song 4 times to potentially get better results.\n\nAlso, you can split mids and sides and upscale them separately to get better results, although it’s not always the better solution ([spectrograms](https://imgur.com/a/9hGtihd) | [tutorial](https://docs.google.com/document/d/1G7EXEQh9oEWtBXId_OSiCy-qXC6X2xve4BTRnyQlvJQ/edit?usp=sharing)), thx AG89.\n\nUsing e.g. the MDX23C Similarity/Phantom Centre extraction model instead, with 2x slowdown (to reduce smearing artefacts), gives less high-end recovery but also less noise, resulting in better cancellation of both channels ([spectrograms](https://imgur.com/a/uVbNGJ7) by AG89).\nAn Avg ensemble will likely give diminishing returns, so consider a manually weighted ensemble in a DAW.\nGetting rid of noise or dithering above the real frequencies by applying a cutoff can make a night and day difference for the result ([example](https://imgur.com/a/roAJR2G))\n\nSometimes cutting off some more of the existing frequencies might be beneficial too (the model was trained with a hard cutoff)\nFor noise artefacts after upscaling, you can use some [denoisers](#_hyzts95m298o)\n\n- Gabox released a new [INSTV8N](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/INSTV8N.ckpt) instrumental model in the experimental folder ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml))\n“noticed too many vocal residues. (...) there is no noise” although N stands for noise in its name.\n\n- Some upscaling Colabs are also affected by the latest Colab runtime changes made by Google. Downgrading with !pip install torch==2.5 might help.\n\n- We're aware of the issues in some Colabs like [MDX by HV](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb) (numpy errors caused by an incompatible numpy version). 
Any fixes will be announced. Stay tuned.\n\n- Fixed, but initialization is slow until further notice, and you need to click the initialization cell a second time when you’re prompted to restart the environment.\n- Fixed, but now you need to click the initialization cell again after Numpy has been installed (happens shortly after launching the initialization cell).\n\n- Unwa's FT3 test vocal model was added on x-minus/uvronline\n\n“make vocals sound a bit lower at chorus compared to other parts of songs”; this doesn’t happen with big beta 5e - oak\n\n- ZFTurbo: “I added 2 new super resolution algorithms on MVSep in Experimental section:\n\n1) AudioSR. Metrics: <https://mvsep.com/quality_checker/entry/8067>\n\n2) FlashSR. Metrics: <https://mvsep.com/quality_checker/entry/8071>”\n\nBe aware that both may occasionally give errors. Some problems with mono audio have already been fixed.\n\n- Unwa released the FT3 preview vocal [model](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/blob/main/kimmel_unwa_ft3_prev.ckpt) | [yaml](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/resolve/main/config_kimmel_unwa_ft.yaml)\n\nVocal bleedless: 36.11, fullness: 16.80, SDR: 11.05\n\n“primarily aimed at reducing leakage of wind instruments to vocals.\n\nI will upload a further fine-tuned version as FT3 in the near future.”\n\nFor now, FT2 has less leakage for some songs (at least until the next FT is released).\n\n- Gabox added some new experimental instrumental models in a separate repo [folder](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/experimental).\n\nThey are called V8, V9 and V10. Don’t consider them newer/better; they just weren't uploaded earlier.\nThey’re less full than V7, but have fewer vocal residues. 
Also, the results from V8 and V10 are the same (“inverted polarity between 2 results, and it's just silence”), and also for V9.\n“Both remove some instruments from the music, like V7.\n\nAs for noise, however, they are less noisy”\n\n- Gabox released the [inst\\_gaboxBv3](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gaboxBv3.ckpt) instrumental model (B for bleedless)\n\nInst. bleedless: 41.69, fullness: 32.13\n\n“can be muddy sometimes”\n\n- [mesk’s training model guide](https://docs.google.com/document/d/15CDkDugqBKGMFe98ltQ6XGZEh8pdDOCTMT4ue89GQIM/edit?usp=sharing) link has been changed (the previous one has been deleted)\n\n- Apart from the new drumsep models on MVSEP, moises.ai also has its own drumsep model (paid).\nTheir base drums model used for drumsep is probably not better than other solutions, so check [this](#_sjf0vefmplt) section of the doc to get a better drums stem first and then test drumsep on it, although one user reported that moises’ (free) drums model, probably compared to Mel-Roformer on MVSEP or x-minus (not sure), can give “better results (...) 
if the input material is for example cassette-tape sourced or post-FM”.\n\n- Joseph made the SESA Colab private until some issues are fixed.\nIn the meantime, consider using [this](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) Colab, which currently has the newer models added.\n\n- (x-minus) The Inst V7 model by Gabox replaced the v1e model by Unwa.\n\nThe v1e model can still be accessed via these links:\n\n<https://uvronline.app/ai?hp&test> (premium)\n\n<https://uvronline.app/ai?test> (free)\n\n(v1e might still be fuller and damage fewer instruments at the cost of more noise; also be aware that separations on x-minus might differ from Colabs, MSST or UVR, possibly due to different inference parameters)\n\n- Training (and inference) locally on Radeon GPUs using [MSST](#_2y2nycmmf53), specifically the RX 7900 XTX, was confirmed to work by Unwa on Ubuntu 24.04 LTS using PyTorch 2.6 for ROCm 6.3.3.\n\nCurrently, the consumer GPUs officially [supported](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) by ROCm are:\n\nRX 7900 XTX, RX 7900 XT, RX 7900 GRE and AMD Radeon VII. However, more consumer Radeons have already been confirmed to work too.\n\n“No special editing of the code was necessary. All we had to do was install a ROCm-compatible version of the OS, install the AMD driver, create a venv, and install ROCm-compatible PyTorch, Torchaudio, and other dependencies on it.” [More](#_bg6u0y2kn4ui)\n\n“So far I have not had any problems. 
Running the same thing appears to use a little more VRAM than when running on the NVIDIA GPU, but this is not a problem since my budget is not that large and if I choose NVIDIA I end up with 16GB of VRAM (4070 Ti S/4080 S).\n\nProcessing speeds are also noticeably faster, but I did not record the results on the previous GPU, so I can't compare them exactly.” [More](#_bg6u0y2kn4ui)\n\n- The [Inst\\_GaboxFVX](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/Inst_GaboxFVX.ckpt) model was released (which is “instv7+3” - so probably fuller than instv3) and\n\n- [INSTV7N](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/INSTV7N.ckpt) (noisier than INSTV7; “it's [even] closer to fv7 than inst3”) [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)\n\n- The Gabox Karaoke model got updated (links have been replaced, and the old version deleted from the repo),\n\n- and the final [INSTV7](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/Inst_GaboxV7.ckpt) was also released (“I hear less noise compared to v1e, but it has worse bleedless metric”)\n\n- Gabox released instv7 [beta 2](https://gofile.io/d/aiZiEl)\nInst. 
bleedless: 34.66, fullness: 38.96\n\nand instv7 [beta 3](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/instv7beta3.ckpt)\n“Both are noisy with small vocal residuals in places where music is low and deletions of some musical instruments.”\n\n- A new 4-stem drumsep SCNet model (kick, snare, toms, cymbals) has been added on MVSEP (best SDR for kick, and SDR for toms similar to the previous 6-stem model, a -0.01 SDR difference), along with an 8-stem ensemble of all other drumsep models (except the older Demucs model by Imagoy) [metrics](https://imgur.com/a/aXxSgV7)\n\n- Gabox released the [instv7beta](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/instv7beta.ckpt) model [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)\n\nInst. bleedless: 35.01, fullness: 38.39\n“sound is good, but sometimes some instruments are lowered or deleted”\n“while the annoying buzzing/noise is still present, it seems to be more contained.”\n\n- Gabox released the Mel [KaraokeGabox](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/kar_gabox.ckpt) model (uses Aufr’s [config](https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/config_mel_band_roformer_karaoke.yaml)) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\n“the lead vocals are good and clean!\nWhile the backing tracks are lossy for this model, [it still] provide[s] great convenien[ce] for those who need LdV”\n\n“The model doesn't keep the backing vocals below the main vocals, sometimes the backing vocals will be lost even though there are backing vocals there.”\n\n- A new [FullnessVocalModel](https://huggingface.co/Aname-Tommy/MelBandRoformers/blob/main/FullnessVocalModel.ckpt) ([yaml](https://huggingface.co/Aname-Tommy/MelBandRoformers/blob/main/config.yaml)) vocal model was released by Aname | 
[Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\nVoc. bleedless: 32.98 (less than beta 4), fullness: 18.83 (less than big beta 5e/voc\\_fv4/becruily, more than beta 4)\n\n“While it emphasizes fullness, the noise is well-balanced and does not interfere much. (...)\n\nin sections without vocals, faint, rustling vocals can be heard.”\n\nWe have some reports of very long separation times with this model in UVR on Macs.\n\n> Try changing chunk\\_size from 529200 to 112455 in that model's yaml (it’s the dim\\_t 256 equivalent, so testing something higher might be a better idea too)\n\n- (SESA) The “No audio file found” bug has been fixed\n\n- “In my testing, I've found that SCNet very high fullness (on mvsep) put through Mel-Roformer denoise (average) and UVR denoise (minimum) has the best acapella result\n\nwould love to see people's thoughts” dynamic\n\n- Gabox released [voc\\_fv4](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv4.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVoc. bleedless 29.07, fullness 21.33\n\n“Very clean, non-muddy vocals. 
Loving this model so far” (mrmason347)\n\n“lost some of the trumpet sound while on Becruily model can keep it, but some also was lost”\n\n- Joseph fixed some bugs and errors in the SESA Colab, and also added a new interface\n\nAuto ensemble still has some issues until further notice.\n\n- Fixed\n\n- unwa released Big Beta 6 vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6.ckpt) | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/blob/main/big_beta6.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n“Although it belongs to the Big series, the characteristics of the model are similar to those of the FT series. (...) this model is based on FT2 bleedless with the dim increased to 512”\n\nIt's muddier than Big Beta 5, but might be better than FT2 at times.\n“If you liked the output of the Big Beta 5e model, you may not like 6 as much; it does not have the output noise problem of 5e, but instead sacrifices Fullness. (...) 
Simply put, it is a more conservative model” unwa\n\n- To get rid of noise in INSTV6N, use [denoisedebleed.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) on the mixture first, then use INSTV6N - “for some reason it gives cleaner results” (Gabox)\n\n- New Gabox model released: [INSTV6N](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/INSTV6N.ckpt) (noisy) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [metrics](https://mvsep.com/quality_checker/entry/7915):\ninst bleedless: 32.63, fullness: 41.68 (more than v1e)\nInterestingly, some people find it has less noise and more fullness than v1e.\nIt also has more fullness and more noise than INSTV6.\n“v1e sounds like an \"overall\" noise on the song, while v6n kind of mixes into it.\n\nv6n also sounds like two layers, one of noise that's just there. And the other one mixes into the song somehow.\n\nUsing the phase swap barely makes it any better than phase swapping with v1e though” - vernight\nAlso, using the Kim model for phase swap seems to give less noise than unwa's ft2 bleedless\n\n- Demudder in UVR, at least with DirectML (Intel/AMD), works only if \"Match freq cut-off\" is enabled in MDX settings. 
Otherwise, you’ll get the “Format not recognised” error.\n\n- The SESA Colab might have some issues with hyper\\_connections at the moment.\nIt might be fixed tomorrow.\n- Done\n\n- [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) Colab update:\nVoc\\_Fv3 (by Gabox)\n\ndereverb\\_mel\\_band\\_roformer\\_mono (by anvuew)\n\nMelBandRoformer4StemFTLarge\n\nINSTV5N (by Gabox)\n\ndenoisedebleed (by Gabox)\n\n- Gabox released [denoisedebleed.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) for removing noise from fullness models (tested on v5n) - it can't remove vocal residues\n\n- Aname released a small 200MB inst/voc Mel-Roformer with a null target stem ([link](https://huggingface.co/Aname-Tommy/Mel_Band_Roformer_small/tree/main))\n\n- [v5\\_noise](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/INSTV5N.ckpt) inst model released | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [metrics](https://mvsep.com/quality_checker/entry/7884) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\n- New Gabox vocal model released: [voc\\_Fv3.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_Fv3.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nEnthusiastic opinions so far\n\n- 
[INSTV6](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/instrumental) by Gabox and De-reverb (Mono) by anvuew models added on x-minus | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\nV6 “is slightly better than v5” (although not for everyone), but “v1e still gives better fullness, but noise [in v1e] is a problem”. Old viperx 12xx models have fewer problems with sax.\n\n- (added on MVSEP as SDR 13.72) ZFTurbo trained a new SCNet XL model for drums.\n\n“I have 2 versions: one is slightly higher SDR and avg Bleedless.\n\nSecond is better for fullness and L1Freq.\nPrevious best SDR model had 13.01 (it's SCNet Large).” [Metrics](https://discord.com/channels/708579735583588363/911050124661227542/1338528118973005855)\n\nThe 15.7180 (13.72) one has a much better fullness metric.\n\n“It's far superior to the other one, but I still hear some weird parts.\n\nIt still messes up on some percussion.\n\nThe drums stem sounds really weird.\n\nThe no drums is alright except for some bleeding but yeah the drums is quite muddy” - insling\n\n- Gabox released new fine-tunes of his inst Mel-Roformer models ([click](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/instrumental)):\ninst\\_gabox2.ckpt, inst\\_gabox3.ckpt, INSTV5.ckpt, INSTV6.ckpt\nwith one opinion that the last one is his best inst model so far.\n“seems like a mix between brecuily and unwa's models”\n\n“confuses way less instruments for vocals than v1e, but it's still not as full as v1e (...) 
But it's a very good model”\n\nIn rare cases, installing it with the new Model install option in UVR gives a “Ran out of input” error (the moved model has 0 bytes), while V5 installed correctly. In that case, move the ckpt to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models manually.\n\n- We’re aware that the x86-64 version of the latest UVR patch for Mac went offline.\nAnjok was pinged about it.\n\n- anvuew released a new [dereverb\\_mel\\_band\\_roformer\\_mono\\_anvuew\\_sdr\\_20.4029](https://huggingface.co/anvuew/dereverb_mel_band_roformer/blob/main/dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt) model.\n“supports mono, but ability to remove bleed and BV is decreased\nshould not matter whether it's singing or speech, because my dataset contains speech.”\n\n[Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\n- The MedleyVox Colab is currently broken (you can use MVSEP instead)\n\n> fixed:\n\n<https://colab.research.google.com/drive/10x8mkZmpqiu-oKAd8oBv_GSnZNKfa8r2?usp=sharing> (initialization now takes 7 minutes; GDrive integration added)\n\n- Phase remix functionality was added to the SESA model inference Colab\n\n<https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet>\n\n- (MVSEP) ZFTurbo added new SCNet XL “high fullness” and “very high fullness” models on the site ([metrics](https://imgur.com/a/ahLe1Cz)).\n\nThey’re good for both vocals and instrumentals, and are sometimes fuller than v1e, although with more noise, which can be too strong for some people (but not all).\n\nThe “very high fullness” variant has better vocal and instrumental fullness and bleedless metrics than the “high fullness” one.\n\n“they also correctly detect \"complex\" (for the AI) instruments as part of the instrumental track rather than in vocals (like flute or sax for example), which isn't the case for v1e and Fv5. 
Example: sax solo of Shine On You Crazy Diamond detected the sax solo as part of acapella using v1e or Fv5 or becruily inst.” dca100fb8\n\nThe noise \"gonna go nuts with distortion, compression and other vct plugins\" John.\n\nBoth variants \"have the same amount of buzzing noise\"\n\n\"Instrumentals are very good. It's holy shit level. Unwa v1e/Gabox Fv5 are still amazing, it's just nice to have such a decent model like these new ones on a different arch\" dca\n\n\"From the songs I’ve tested, SCNet is incredible. Very full sounding\" mrmason347\n\n\"Regular high fullness though has a less full instrumental but quite good acapella\" theamogusguy\n\nVHF leaves some vocal residues in metal, but seems to do well for e.g. alt-pop.\n\n\"scnet doesn't pick up the drone backing vocals, but 10.2024 has mad violin bleed in the vocals\" dynamic64\n\nFor \"mainly orchestral tracks with choir\" \"it gave me noticeably fuller results than v1e\" Shintaro5034.\n\n\"For noisy/dense mixes though, Roformers are probably better, especially for inst.\n\nscnet seems better at preserving treble in some vocals. These high fullness models especially so. So maybe teaming SCNet up with Roformer might give a nice middle ground\"\n\n“Rofos are really bad for some kinds of EDM that are very aggressive (Dubstep, Trance, Breakcore, etc...), also it has a very hard time with Experimental (IDM)”\n\nVHF, like the basic XL model, seems to have more crossbleed in some songs. Some songs sound full enough even with the basic SCNet XL, while others sound muddy (dca)\n\n- (X-Minus) For premium users, the Mel Kim model used for phase correction has been replaced by Unwa’s Kim FT2 model\n\n- New [sites and rippers](#_ataywcoviqx0) added:\n\n<https://yams.tf/> (Qobuz, Tidal, Spotify, Apple Music [currently 320kbps], Deezer) - for URLs\n\n<https://us.deezer.squid.wtf/> (Deezer only) - for queries\n\n<https://github.com/ImAiiR/QobuzDownloaderX> (local ripper for premium accounts or provided ARLs)\n\n- [FlashSR](https://github.com/jakeoneijk/FlashSR_Inference) has been released ([Colab](https://colab.research.google.com/github/jarredou/FlashSR-Colab-Inference/blob/main/FlashSR_Colab.ipynb) with chunking and overlap by jarredou).\n\nIt’s a diffusion distillation of AudioSR. It has a lower Aura MR STFT [metric](https://mvsep.com/quality_checker/leaderboard/super_res_music/?sort=restored) and usually lower quality as well, but some people might get better results with it for music.\n\n- (MVSEP) “We trained new DrumSep models (5 stem and 6 stem) based on SCNet XL.\n\n\\* 5 stems: cymbals, hh, kick, snare, toms\n\n\\* 6 stems: ride, crash, hh, kick, snare, toms”\n\nBoth have better SDR than the previous MDX23C model by jarredou and Aufr33.\n\nThe 5-stem variant has, e.g., better snare SDR than the 6-stem variant. [Full metrics](https://discord.com/channels/708579735583588363/911050124661227542/1334429689527402567).\n\nIt doesn't work correctly on the site yet; ZFTurbo will announce it in the link above once it's fixed.\n\n- They work already\n\n- Unwa released the ft2 bleedless vocal model | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing)\n\n<https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main>\n\nvoc bleedless 39.30 | fullness 15.77 | SDR 11.05\n\n- instv5 model released by Gabox (39.40 inst fullness | inst. 
bleedless 33.49) [link](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/instrumental) | [yaml](https://huggingface.co/becruily/mel-band-roformer-instrumental/resolve/main/config_instrumental_becruily.yaml) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | x-minus\n\n“it seems that most vocal leakage is gone, and the noise did significantly decrease, although there's still a bit more noise presence than v1e.\n\nIn terms of fullness though, for some reason it sounds as if it's actually less full than v1e, despite the higher instrumental fullness SDR.\n\nDespite v4's significant amount of noise, it seems to be the only model that gave me a fuller sounding result compared to v1e that's actually perceivable by my ears.” Shintaro\n\n- New inst/voc SYH99999 models released\n\n<https://huggingface.co/SYH99999/MelBandRoformerSYHFTB1/tree/main>\n\n- (x-minus) Phase fixer was added for the Gabox fv3 and becruily models for premium users\n\n- For vocals, you can alleviate some of the noise/residues in unwa’s 5e model by using the phase fixer/swapper with the becruily vocal model as the reference (imogen).\n\n- For instrumentals, you can try unwa's v1e with phase swap at 500/500, using Kim's original Mel-Band Roformer as the reference. It consistently gives less noise - midol\n\n\"500 / 500 means you use original phase below 500 Hz and hard cutoff/swap to transferred phase above 500hz. (this can potentially create phase artifacts at 500Hz because of hard swap)\n\n500 / 20000 means you use original phase below 500hz and progressively crossfade to transferred phase until 20000hz and transferred phase is used above 20000hz. So it's softer phase swap below 20kHz\" - jarredou\n\n“using 500 on both parameters really does make me have the illusion that I have produced the official instrumental. Even tho it's unofficial haha” - midol\n\n- Deezer on Lucida doesn’t work. 
Doubledouble.top came back (probably temporarily), but it now returns 128kbps MP3s from Deezer. It also supported Apple Music, unlike Lucida, but that no longer works (check the current services [status](https://doubledouble.top/stats/)). Occasionally, Amazon rips on doubledouble (and only there) have quality higher than 44/16. Also, downloading full albums frequently fails, while downloading single songs works.\n\n- Lots of new Gabox models have been added since then, including:\n\na) BS-Roformer instrumental variant, which doesn’t struggle with choirs as much as most Mel-Roformers do, although it may not help in all cases ([link](https://huggingface.co/GaboxR67/BSRoformerVocTest/tree/main))\n\nb) [inst\\_gaboxFv3.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gaboxFv3.ckpt) - like v1e when it comes to fullness (added on x-minus)\n\nInst SDR 16.43 | inst. fullness 38.71 | inst. bleedless 35.62\nIt might pick up the entire sax in the vocal stem.\n\n- Gabox models have been added to the SESA [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) (you’ll find more info about them further below).\n\n- Along with UVR Roformer beta patch #14, Anjok released the long-anticipated **demudder**.\nIt's in Settings > Advanced MDX options (so it works only with Roformer and MDX models).\nIt consists of three methods to choose from (each separates your track twice):\n\n- Phase Rotate\n\n- Phase Remix (Similar to X-Minus) - “the fullest sounding, but can leave a lot of artifacts with certain models. I only recommend that method for the muddiest models. Otherwise, Combined Methods is the best” “I don't recommend using phase remix on the Instrumental v1e model. I recommend combined methods or phase rotate for models produce fuller instrumentals.” Anjok\nIt might leave some choruses when using V1E (Fabio)\n\n- Combine Methods (weighted mix of the final instrumentals generated by the above). 
More in the [full changelog](https://discord.com/channels/708579735583588363/785664354427076648/1331189771078471721). You cannot use the demudder on 4GB AMD GPUs with 800MB Roformers, even with a 2-second chunk size (memory allocation error).\n\n“It's meant to solely target instrumentals. The vocals should stay exactly as before.”\nFor Roformer models, it must detect a stem called “Instrumental”, so for some models like Mel-Kim, you need to open the model’s corresponding yaml and change “other” to “instrumental”.\n\n“I've noticed with the few amounts of tracks I've tried, demudding can sometimes accentuate instances of bleeding or otherwise entirely missed vocal-like sounds”\n\nIf you get a “file not found” error when trying to use the demudder, reinstall UVR.\n\n“I put the demudded instrumental in the bleed suppressor, and it sounds really good, almost noise free. I either do a bleed suppressor or a V1/bleed suppressor ensemble” gilliaan\n\n“With the new config editor feature you could probably edit the configs of models to have the vocal stem labelled as the Instrumental stem so the demudder demuds the vocal stem, it definitely still makes a difference.\n\nI accidentally did this when installing another model, but it seems to actually have an effect on vocal stems too.\n\nYou just change the target instrument from vocals to instrumental I think (don't move the stems around)\n\nYou can verify it works if the stems are the other way around when processing (vocals are in the file labelled as Instrumental). Then you can use the demudder on the vocals that way\n\nI think. 
If you want to use the demudder with other models that aren't labeled with instrumental, you'll have to select the stem you want to demud and replace it with Instrumental.\n\nThough demudding the vocal stem will definitely make it quite noisy depending on what model you use, though there [appears](https://discord.com/channels/708579735583588363/767947630403387393/1331633822088954010) to be instances where demudding the vocal stem can mildly help with certain effects but i did not test this enough” stephanie\n\nAnjok: “Just a few quick notes on the Demudder:\n\nIt works best on tracks that are spectrally dense (ex. Metal, Rock, Alternative, EDM, etc.)\n\nI don't recommend it for acoustic or light tracks.\n\nI don't recommend using it with models that emphasize fuller instrumentals (like Unwa's v1e model).\n\nI do plan on adding options to tweak the phase rotation.\n\nI also plan on adding another combination method that may work better on certain tracks.”\n\nUVR\\_Patch\\_1\\_21\\_25\\_2\\_28\\_BETA:\nSmall patch (you must have a [Roformer Patch](#_6y2plb943p9v) [e.g. #13] previously installed for this to work): [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_21_25_2_28_BETA_small_rofo.exe)\nMinor bugs were also fixed, and compensation calculation for MDX-Net v1 models was added.\nThe MacOS version will be released later ([observe](https://discord.com/channels/708579735583588363/785664354427076648/1331189771078471721)).\n\nBe aware that at least Phase Rotate doesn’t work on 4GB VRAM AMD GPUs with 800MB Roformers like Becruily’s, even with an 88200 chunk size (prev. dim\\_t 201, 2 seconds), while 112455 (2.55s, prev. 
dim\\_t = 256) works just fine for normal separation.\n\n- A 4-stem BS-RoFormer model by yukunelatyh / SYH99999 was added on x-minus\n\nSince then, a new version was added (later epoch, but it has lower SDR for all stems).\n\n<https://uvronline.app/ai?discordtest>\nSome people liked the v1 more than Demucs, but “it's like demucs v4 but worse i think\n\nthe vocals have a ton of bleed, the bass is disappointing tbh\n\nthe other stem has a ton of bgv and adlib bleed in it” Isling\nIts [SDR metrics](https://imgur.com/a/O2qDgTQ) for all stems are worse than those of the 4-stem BS-Roformer by ZFTurbo and demucs\\_ft.\n\n- Aname also released a 4-stem BS-Roformer [model](https://mega.nz/file/GwY1TQoB#UmeMGO2BBtgrUXkmXXQQVXqwR_hwxaAmkycDr-fitWg) | [yaml](https://mega.nz/file/Pwp32DrY#Kyl5sK3j6l5kXCe7Br52gN2CXn9c8N6lzOYT1g0FOS0)\n\nIt has better SDR than the above (see the SDR metrics link above), but worse than the other two mentioned\n\n- Gabox released a Mel-Roformer instrumental model (Kim/Unwa/Becruily FT): <https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers>\n\ninst bleedless: 37.40 (better than v1e by 1.8), fullness 37.07 (better than unwa inst v1 and v2)\n“It’s like the v1 model with phase fixer, but it gets more instruments,\n\nlike, it prevents some instruments getting into the vocals”, “sometimes both models don't get choirs”.\n\n- instrumental variant called fullness v1 (“noisier but fuller”)\n\ninst bleedless: 37.19, fullness 37.26\n\n(thanks to Bas Curtiz for the evaluation and his [GSheet](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?usp=sharing) with all models.)\n\n- fullness v2 released\n\n- fullness v3 released\n\n- B (bleedless) v1/v2 variants released\n\n- voc\\_gabox.ckpt:\n\nvoc bleedless: 34.66 (better than 5e), fullness 18.10 (on par with beta 4)\n\n- Vocal model F v1\n\n- Vocal model F v2\n\nvoc bleedless: 33.4013, fullness: 19.3064\n\n- Issues with dataset type 4 in the MSST repo were fixed\n“I think that issue 
could also explain why training de-reverb models with pregenerated reverb audio files was not working that well, as reverb was not aligned with clean dry audio as it should have been.” jarredou ([more](https://discord.com/channels/708579735583588363/911050124661227542/1330983209885896757))\n\n- Aufr33 BS-Roformer Male/Female beta ([model](https://mega.nz/file/XZwV2QwB#5nvWpmvtoBMTJkpor-lMUZCbBZWDH-3i52ELJS_JmcU) | [config](https://huggingface.co/Sucial/Chorus_Male_Female_BS_Roformer/blob/main/config_chorus_male_female_bs_roformer.yaml) | [config](https://drive.google.com/file/d/15dxMvEanC8h_djEuHoQXHKMk0ERN02_y/view?usp=sharing) for UVR | tensor match error [fix](https://github.com/nomadkaraoke/python-audio-separator/releases/download/model-configs/deverb_bs_roformer_8_384dim_10depth_config.yaml)) was added to the [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (based on BS-RoFormer Chorus Male Female by Sucial) along with Unwa’s Kim FT2\n\n- Anjok released the MacOS versions of UVR Roformer beta patch #13.1, a hotfix addressing a few graphics issues:\n\n- Mac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_arm64_hf.dmg)\n\n- Mac Intel (x86\\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64_hf.dmg)\n\n- Anjok released UVR beta Roformer patch #13 for Mac (Windows further below):\n\nUVR\\_Patch\\_1\\_15\\_25\\_22\\_30\\_BETA:\n\n- Mac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_arm64.dmg)\n\n- Mac Intel (x86\\_64) users - 
[Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64.dmg)\n\n- mesk wrote a good, comprehensive [training guide](https://docs.google.com/document/d/12KzNIojKSWLA7uvHhhxydXvBLZgussewc1aRzWxqGwY/edit) for beginner model trainers. Afterwards, you can proceed to the further [section](#_bg6u0y2kn4ui) of this doc for more details and arch explanations\n\n- Apple Music bot link added in [this](#_ataywcoviqx0) section (thx mesk)\n\n- mrmason347 and Havoc shared an interesting method to get cleaner vocals (the last point of the tips to enhance separation [here](#_929g1wjjaxz7)):\nSeparate with the becruily Mel vocal model and its instrumental model variant, take the vocals from the vocal model and the instrumental from the instrumental model, and import both stems into the DAW of your choice (Audacity works) so you get a file sounding like the original. Then export a mixdown of both stems and separate it with the vocal model\n\n- If you somehow still struggle with “norn” issues in UVR, see the bottom of the section [here](#_c4nrb8x886ob)\n\n- Dango released “Reverb Remover” - [click](https://tuanziai.com/en-US/de-reverb)\n\n“it's very similar to RX11 Dialogue Isolate, good/real-time set to 5\n\nit's like listening to the same inference files” John. It probably also works in mono, and you can get 30 seconds for free.\n\n- [filegarden](https://filegarden.com/) added to the [list of cloud services](#_5haztbxg91rt) (seems to be unlimited, registration required, link shortener with custom name available)\n- [your-good-results](https://discord.com/channels/708579735583588363/1325284373087391845) and [your-bad-results](https://discord.com/channels/708579735583588363/1325284456382075041) channels have been reopened on the server, but you need to paste links to uploads instead of uploading audio files directly on Discord due to the copyright issues the server was facing\n\n- If you want to use the Phase fixer Colab 
with cut-offs suggested by CC Karaoke, check [here](https://colab.research.google.com/drive/14HIQRhOcMmC8RCKUzo_t-ogIfjORbs2K)\n\n- Unwa’s Kim FT2 model was added to the [inference Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (both the inst and voc becruily models were added too)\n\n- jarredou released a [Custom Model Import](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29_CustomModel.ipynb) version of the inference Colab. You can use it if a new model hasn't been added to the main Colab yet, or to test your own models.\n\nJust make sure the pasted link points to the file itself; otherwise the Colab may have “downloaded the webpage presenting the model instead of the model itself.”\n\nSo, e.g. for yamls pasted from GH, use:\nhttps://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/config\\_vocals\\_mdx23c.yaml\n\nInstead of:\nhttps://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config\\_vocals\\_mdx23c.yaml\nAnd for HF, follow the pattern presented in the Colab example (i.e. with \"resolve\" instead of \"blob\" in the file address)\n\n- [model\\_fusion.py](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) by Sucial\n\nThis script seems to save the weighted ensemble of three models into a checkpoint called \"fused\". 
The resulting file is no bigger than a single model.\n\nSo you could probably create one checkpoint that gives the same or similar results as manually weighting the models, instead of running inference with each of them one by one.\n\n- Becruily models were added on MVSEP, and the instrumental variant on x-minus\n\n- ZFTurbo “added new organ model: MVSep Organ (organ, other).\n\nDemo: <https://mvsep.com/result/20250116160630-f0bb276157-mixture.wav>”\n\n- Anjok released patch #13, fixing the following issue with no sound on some Roformer models (like anvuew’s and Sucial’s de-reverb) on GTX 10XX or older (Windows):\n\nUVR\\_Patch\\_1\\_15\\_25\\_22\\_30\\_BETA:\n\n“- Full Install: [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_15_25_22_30_BETA_full.exe)\n\n- Patch Install (use if you still have non-beta UVR installed): [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_1_15_25_22_30_BETA_rofo.exe)\n\n- Small Patch Install (have a Roformer patch previously installed for this to work): [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_15_25_22_30_BETA.exe)\n\nThe issue was some older GPU's are not compatible with Torches \"Inference Mode,\" (which is apparently faster) so it's now using \"No Grad\" mode instead. Users can switch back to using \"Inference Mode\" via the advanced multi-network options.\n\nThe MacOS version will be released in a few days. I just need to finish testing out all the models and networks and ensure all the kinks are worked out.” [More](https://discord.com/channels/708579735583588363/785664354427076648/1329312294521405493)\n\n- Users are having some issues (no sound) with the Mel-Roformer de-reverb by anvuew (a.k.a. v2/19.1729 SDR) since the latest UVR beta #11/12 updates (the issue seems to occur only on the GTX 10XX series, and maybe older). 
Anjok’s working on the issue.\n\nMeanwhile, you should be able to use more than one UVR installation at the same time if you copied one before updating (patch #10 still works), or use the MSST repo and/or its GUIs.\n\n- Anjok released patch #12, a hotfix for the [4 stem](#_sjf0vefmplt) BS-Roformer model by ZFTurbo (trained on MUSDB)\n\n[UVR\\_Patch\\_1\\_13\\_0\\_23\\_46\\_BETA\\_rofo\\_fixed.exe](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_13_0_23_46_BETA_rofo_fixed.exe) (Windows only)\n\n- Anjok released a new UVR beta Roformer patch #11 (Windows only for now):\n\nUVR\\_Patch\\_1\\_13\\_0\\_23\\_46\\_BETA\\_rofo\n\nIt fixes 4 bugs: the VR post-processing threshold, the Segment default in the multi-arch menu, the CMD window popping up during operations, and an error in the phase swapper.\n\n[More](https://discord.com/channels/708579735583588363/785664354427076648/1328267582871961620) details/potential updates.\n\nStandalone (if you don't have UVR installed)\n\n[UVR\\_1\\_13\\_0\\_23\\_46\\_BETA\\_full.exe](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_13_0_23_46_BETA_full.exe)\n\nFor 5.6 stable (so for a non-beta Roformer installation)\n\n[UVR\\_Patch\\_1\\_13\\_0\\_23\\_46\\_BETA\\_rofo.exe](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_1_13_0_23_46_BETA_rofo.exe)\n\nSmall (for an existing Roformer beta patch installation)\n\n[UVR\\_Patch\\_1\\_13\\_25\\_0\\_23\\_46\\_rofo\\_small\\_patch.exe](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_13_25_0_23_46_rofo_small_patch.exe)\n\n- New beta UVR Roformer patch #10 released by Anjok (for now, only a small patch for an existing [beta Roformer](#_6y2plb943p9v) installation is available, and only for Windows; check [here](https://discord.com/channels/708579735583588363/785664354427076648/1327237549114261524) for Mac 
later)\n\nUVR\\_Patch\\_1\\_9\\_25\\_23\\_46\\_BETA\\_rofo\\_small\\_patch - [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_9_25_23_46_BETA_rofo_small_patch.exe)\n\nIt adds SCNet and Bandit archs with models in the Download Center (SCNet models using ZFTurbo’s unofficial code update will not work, since they appear to require the \"mamba\\_ssm\" library, which is only available on Linux). It also fixes compatibility with some newer Roformer models (wesley’s MDX23C and Roformer Phantom center models, and the 400MB inst small by Unwa), adds a new Model Installer option, enhances the model configuration menu (allowing aliases for selected models), adds compatibility for Roformer/MDX23C Karaoke models with the vocal splitter, removes the VIP code issue, and addresses issues with secondary model options as well as minor bugs and interface annoyances. It also “improved the \"Change Model Settings\" menu. Now, any existing settings associated with a selected model are automatically populated, making it easier for users to review and adjust settings (previously, these settings were not visible even if applied).”\n\nIf you get a Python DLL error on startup, reinstall the last beta update using the full package instead, then the small installer from the newer patch.\n\n“If you see a different usage of VRAM than with previous Roformer beta version, it could also be because the new beta version doesn't rely on 'inference.dim\\_t' value anymore (if you were using edited \"dim\\_t\" value)\n\nYou have to edit audio.chunk\\_size now (see [here](https://discord.com/channels/708579735583588363/767947630403387393/1324901146002849925) for conversion between dim\\_t and chunk\\_size)” it’s “In model yaml config file, at top of it, chunk\\_size is first parameter (...) 
you can edit model config files directly inside UVR now.”\n\n“Unfortunately, SCnet is not compatible with DirectML, so AMD GPU users will have to use the CPU for those models.\n\nBandit models are not compatible with MPS or DirectML. For those with AMD GPU's and Apple Silicon, those will be CPU only.\n\nThe good news is those models aren't all that slow on CPU.” - Anjok\n\nThe annoying CMD window randomly pops up again when ffmpeg and Rubber Band are used. This regression will be fixed.\n\n- A newer Mel-Roformer Male/Female model was added by ZFTurbo on MVSEP (SDR: 13.03 vs 11.83 for the previous SCNet one, and a much better bleedless metric, 41.9392 vs 26.0247, with only a 0.2 fullness decrease)\n“I find it acts differently from Rofo or UVR2. Sometimes it's the one of the three that gets it right., and not strictly for male/female.” CC Karaoke\n\n- Aufr33 released his own BS-Roformer Male/Female (currently beta) model based on BS-RoFormer Chorus Male Female by Sucial.\n“this model only works with vocals. You need to pre-isolate the vocals.”\n\nAdded on MVSEP and x-minus for premium (in the new Other menu).\n\nWeights: <https://mega.nz/file/XZwV2QwB#5nvWpmvtoBMTJkpor-lMUZCbBZWDH-3i52ELJS_JmcU>\n\nConfig: <https://huggingface.co/Sucial/Chorus_Male_Female_BS_Roformer/blob/main/config_chorus_male_female_bs_roformer.yaml>\n\n- Unwa released a new version of his Mel-Kim fine-tune (ft2)\n<https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main>\nIt tends to muddy instrumental outputs at times, similarly to the OG Kim model, which didn’t happen with Unwa’s previous ft model.\n\n[Metrics](https://mvsep.com/quality_checker/entry/7714). PS. All unwa models were trained on a 3060 Ti!\n\n- Unwa released a 400MB experimental BS-Roformer inst model\n<https://huggingface.co/pcunwa/BS-Roformer-Inst-EXP-Value-Residual>\nIt uses Value Residual Learning, newly added to the Roformer arch by Lucidrains in the OG Roformer repo. 
If it hasn't been made compatible with the MSST repo yet, replace bs\\_roformer.py with the one from this [repo](https://github.com/lucidrains/BS-RoFormer/tree/main/bs_roformer) and change\nfrom bs\\_roformer.attend import attend\n\n⇩\n\nfrom models.bs\\_roformer.attend import attend\n\nin the bs\\_roformer.py file\n\n“I think it sounds better than large rn but still not good, needs some [more] epoch[s]!”\n\n[Later, VRL was added as the Mel-Roformer v2 model type in [UVR](#_6y2plb943p9v), so it’s compatible with the model]\n\n- New dereverb model(s) released by Sucial - “fused”: [model](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt) | [yaml](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/config_dereverb_echo_mbr_v2.yaml)\n“trained two new models specifically targeting large reverb removal. After training, I combined these two models with my v2 model through a blending process, to better handle all scenarios. At this stage, I am still unsure whether my new models outperform the anvuew's v2 model overall, but I can confidently say that they are more effective in removing large reverb.” [More](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer)\n\n- Becruily Mel inst and voc models were added on MVSEP, and the inst variant on x-minus\n\n- ZFTurbo released new models on MVSEP:\na) a new Male/Female separation model based on SCNet XL\n\nSDR on the same dataset: 11.8346 vs 6.5259 (Sucial)\nThe model only works on vocals. If the track contains music, use the option to \"extract vocals\" first. 
The old Sucial model might still do a better job at times, so feel free to experiment.\n\nb) SCNet XL (vocals, instrum)\n\nInst SDR: 17.2785\n\nVocals have a similar SDR to the viperx 1297 model, and the instrumental scores a tiny bit worse than the Mel-Kim model.\n\n- “All Ensembles on MVSep were updated with latest release [SCNet XL] increasing vocals SDR to 11.50 -> 11.61 and instrum SDR: 17.81 -> 17.92”.\n\n- ([MSST](https://github.com/ZFTurbo/Music-Source-Separation-Training)) You can now run inference on mono files without any issues\n- You can now use “batch\\_size=1 without clicks issues (with overlap >= 2 of course)” - jarredou\n\n- *Becruily released instrumental and vocal Mel-Roformer models* | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [UVR](#_6y2plb943p9v) beta |\n[Instrumental](https://huggingface.co/becruily/mel-band-roformer-instrumental/tree/main) model files | Inst SDR 16.4719 | inst fullness 33.9763 | bleedless 40.4849\n[Vocal](https://huggingface.co/becruily/mel-band-roformer-vocals/resolve/main/mel_band_roformer_vocals_becruily.ckpt?download=true) model file | Vocals SDR 10.5547 | voc fullness 20.7284 | bleedless 31.2549 |\n[config](https://drive.google.com/file/d/1V__MDSd9h47tgk3CCZUhbfcZz9A_oJee/view?usp=sharing) with ensemble fix in UVR.\n\nThe instrumental model is as clean as unwa’s v1 but has less noise, and what remains can be removed well with Mel [denoise](#_hyzts95m298o) and/or the Roformer [bleed suppressor](https://drive.google.com/file/d/1zWOrzPKd-6x7vjHNqjK_bK2UOYfPzCuu/view). The inst variant “removed some of the faint vocals that even the bleed suppressor didn't manage to filter out before”. It doesn’t require the phase fix from Mel-Kim like the unwa models below.\n“it handles the busy instrumentals in a way that makes VR finally an arch of the past”\nIt correctly removes SFX voices. 
More instruments are correctly recognized as instruments and not vocals; not as many as with Mel 2024.10 & BS 2024.08 on MVSEP, but still more than with unwa’s inst v1e/v1/v2 (dca100fb8).\nTrumpet or sax sounds that were lost with the unwa model can be recovered with becruily's model (hendry.setiadi)\n\nThe instrumental model pulled out more adlibs than the released vocal model variant, which pulled out none (isling).\n“Vocal model pulling almost studio quality metal screams effortlessly. Wow, I've NEVER heard that scream so cleanly” (mesk)\nThe model was trained on dataset type 2 on a single RTX 3090 for two days (after months of prior experimentation). Its SDR metrics are lower than those of the Mel-Kim model.\n\nIf you use a lower dim\\_t, like 256, at the bottom of the config for a slower GPU, be aware that these are the first models to give very bad results with that setting.\n\nYou can experiment with the phase fixer using santilli\\_'s suggestion: “Using becruily's vocals as source and inst [model] as target, and changing high frequency weight from 0.8 to 2 makes for impressive results”.\n\n- santilli\\_ released a [Phase fixer Colab](https://colab.research.google.com/github/lucassantillifuck2fa/Music-Source-Separation-Training/blob/main/Phase_Fixer.ipynb) (update 2) - it can use e.g. 
Mel-Kim model phase for unwa’s v1e/v1/v2 models to automatically get rid of some noise during separation (it might no longer work due to recent changes in the MSST repo). It also includes becruily's models\n\n- The small UVR Roformer beta patch #9, mainly fixing the Apollo arch, was also released for Mac (UVR\\_Patch\\_12\\_8\\_24\\_23\\_30\\_BETA):\n\nMac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1208_MacOS_arm64.dmg)\n\nMac Intel (x86\\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1208_MacOS_x86_64.dmg)\n\n- A new “MVSep Bass (bass, other)” SCNet model is available on MVSEP\n\n“It achieved SDR: 13.81. In Ensemble it gives 14.07 - which is a new record on the Leaderboard.” ZFTurbo\n“It passes Food Mart - Tomodachi Life test. That's the first model to.”\n“All bass models have problems with fretless bass”\nThere’s already an option to combine all SCNet+BS Roformer+HTDemucs bass models for 14.07 SDR.\n\nEnsembles have been updated with this model too.\n\n- Sucial's v2 reverb removal (Mel-Roformer) [model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2534381169) was added on MVSEP (an update of the previous model)\n\n- Lew's universal upscaling [model](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/tag/uni) has been added on x-minus/uvronline too (premium users).\nJust a reminder - it’s not for badly mixed music, it’s for lossy files (also on [Colab](https://colab.research.google.com/github/jarredou/Apollo-Colab-Inference/blob/main/Apollo_Audio_Restoration_Colab.ipynb)/MVSEP/UVR [beta](#_6y2plb943p9v) [at least support for a model file])\n\n- ZFTurbo released a new 4-stem XL [model](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.13) based on the SCNet arch.\n“I have great results comparing with SCNet Large model (by 
starrytong).”\n\nSCNet Large MUSDB test avg: 9.70 (bass: 9.38, drums: 11.15, vocals: 10.94, other: 7.31)\n\nSCNet XL MUSDB test avg: 9.80 (bass: 9.23, drums: 11.51, vocals: 11.05, other: 7.41)\n\nSCNet Large Multisong avg: 9.28 (bass: 11.27, drums: 11.23, vocals: 9.05, other: 5.57)\nSCNet XL Multisong avg: 9.72 (bass: 11.87, drums: 11.49, vocals: 9.32, other: 6.19)\n\nA new SCNet bass model is incoming and has already surpassed the metrics of ZFTurbo’s HTDemucs and BSRoformer bass models.\n\n- Anjok released a small UVR Roformer beta patch #9, mainly fixing the Apollo arch:\n\nUVR\\_Patch\\_12\\_8\\_24\\_23\\_30\\_BETA\n\nWindows only for now: [Full](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_12_8_24_23_30_BETA_rofo_full_install.exe) | [Patch](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_12_8_24_23_30_BETA_large.exe) (Use if you still have non-beta UVR installed) |\n\n[Small Patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_8_24_23_30_BETA_small_rofo.exe) (You must have a Roformer patch previously installed for this to work)\nChangelog:\n\nApollo fixes: “Chunk sizes can now be set to lower values (between 1-6)\n\nOverlap can be turned off (set to 0)”\n\nFix for both Apollo and Roformers: input files 5 seconds long or shorter no longer cause errors.\n\nOpenCL was wrongly referenced in UVR. 
It was actually DirectML all the way, and Anjok changed all the OpenCL names in the app to DirectML.\n\n- Unwa released a new Kim-Mel-Band-Roformer-FT vocal [model](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nIt improves both our new bleedless (36.95 vs 36.75) and fullness (16.40 vs 16.26) [metrics](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?usp=sharing) for vocals vs the original Mel Kim model, while [SDR](https://mvsep.com/quality_checker/entry/7585)-wise it’s a tad lower (10.97 vs 11.02)\n(thx Bas Curtiz)\n\n- A male/female BS-Roformer separation model has been released by Sucial\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2525052333>\n\nIf they sing at intervals (one by one), they cannot be separated.\nIt works pretty well, though bleed might occur occasionally. It also seems to pick up dialogue from various people.\n\nIf you want to use the model in UVR, use [this](https://drive.google.com/file/d/15dxMvEanC8h_djEuHoQXHKMk0ERN02_y/view?usp=sharing) config (thx Essid)\n\nIf you get the \"The size of tensor a (352768) must match the size of tensor b (352800) at non-singleton dimension 1\" error, e.g. 
in python-audio-separator, use [this](https://github.com/nomadkaraoke/python-audio-separator/releases/download/model-configs/deverb_bs_roformer_8_384dim_10depth_config.yaml) config (thx Eddycrack864)\n\n- Anjok released UVR Roformer beta patch #8 for Win: [full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_3_24_1_18_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_3_24_1_18_BETA_small_patch_rofo.exe) | Mac: [M1](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1203_MacOS_arm64.dmg) | [x86-64](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1203_MacOS_x86_64.dmg)\n- UVR\\_Patch\\_12\\_3\\_24\\_1\\_18\\_BETA\nThe Apollo arch was made compatible with MacOS MPS (Metal) and OpenCL, but on those backends it might be unstable and very RAM-intensive - use a chunk size over 7 to prevent errors (currently it’s not certain that all models will work with less than 12GB of VRAM).\nApollo is now compatible with all Lew models (fixed the incompatibility with models other than those previously available in the Download Center). Fixed (presumably a regression in) Matchering.\n\nHow would I assign a yaml config to an Apollo model on the new UVR [patch]?\n“1. Open the Apollo models folder\n\n2. Drop the model into the folder\n\n3. From the Apollo models folder, drop the yaml into the model\\_configs directory\n\n4. From the GUI, choose the model you just added and if the model is not recognized, a pop-up window will appear, and you'll have the option to choose the yaml to associate with the model.” - Anjok\n\n“I found some overlapping issues in the UVR [using Apollo vs Colab]. like some short parts sounding duplicated overlaid” The issue is caused by different chunking, which on Colab is preconfigured to use 15GB of VRAM. 
“chunk\\_size has influence on results, colab uses 25sec (or 19 for latest lew model)”\n\n- Anjok released UVR Roformer beta patch #7 for Windows: [full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_2_24_2_20_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_2_24_2_20_BETA_patch_rofo.exe)\n- UVR\\_Patch\\_12\\_2\\_24\\_2\\_20\\_BETA (Mac version probably tonight, [observe](https://discord.com/channels/708579735583588363/785664354427076648) or check [GH](https://github.com/TRvlvr/model_repo/releases/tag/uvr_update_patches))\n\nIt introduces support for the Apollo arch. The OG mp3 enhancer and Lew v1 vocal enhancer were added to the Download Center. You’ll probably also be able to add the newer Lew uni enhancer and v2 vocal enhancer manually. The arch is located in Audio Tools. Sadly, this arch cannot be GPU-accelerated with OpenCL, so AMD and Intel GPU users are forced to use the CPU, which might take long.\nAlso, “Phase Swapper” a.k.a. Phase fixer for Unwa inst models was added to Audio Tools.\n\n- New SCNet and MelBand DnR v3 (SFX) models were added on MVSEP (along with an optional ensemble). 
“The metrics turned out to be better than those of the similar model Bandit v2” (25.11)\n\n- We fixed some issues (IndexError) with jarredou’s inference Colab caused by the recent updates in the ZFTurbo code (thx for the heads-up, MrG).\n\n- Anjok released UVR Roformer beta patch #6 for MacOS as well: [M1](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_arm64.dmg) | [x86-64](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_x86_64.dmg)\n\n- Lew released a new Apollo [universal model](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/tag/uni) for upscaling various lossy audio files (added on MVSEP and x-minus premium).\nUnlike the previous mp3 model, it can enhance any format, not only mp3, including files with a hard cutoff like AAC 128 kbps ([see](https://imgur.com/a/AzagEuJ)), but it struggles with 48 kbps files.\n“If anyone wants to run the new model in [Colab](https://colab.research.google.com/drive/1Iry5A_l6Ubk6bUnVjy99vbILPXAG9L5Y) [already added], set chunk\\_size to 19. Then the model uses 14.7GB VRAM” (Essid). Sometimes a lower setting is necessary (e.g. for a 3-minute song), otherwise a memory error will appear.\nAs of 27.11.24 it doesn’t work with MSST yet; support was later added in the UVR [patch](#_6y2plb943p9v).\n\n“Actually much better than the original Apollo model. 
It handles artifacts really well and also noise; it understands noise while [the] OG model doesn't for some reason” John UVR/simplcup\n\nFor muddy Roformer vocals specifically, still use Lew's vocal enhancer v1/v2, as they're better for this task, though they can be noisy (available in the [Colab](https://colab.research.google.com/drive/1Iry5A_l6Ubk6bUnVjy99vbILPXAG9L5Y)).\n\n“I also included [checkpoint](https://easyupload.io/r5qzk5) you can continue training from” ([mirror](https://drive.google.com/file/d/1FAokt7DfLTkaadnxTa4bUkGQoNWutW58/view?usp=drivesdk))\n“Q: segments: 5.4 - can I assume that chunk\\_size is 5.4 \\* 44100?\nA: Yes, and dims 384\n\nThe smaller one for inference and bigger one for training\nQ: Does that model have a dataset of a wide variety of compression noise and artifacts?\n\nA: mp2, mp3, ogg, wma, aac, opus, low-bandwidth wavs. Random speed change augmentations were used too.” “It was mostly trained on music” Lew\n\n- Two weeks ago, unwa and 97chris released a [bleed suppressor](https://drive.google.com/file/d/1zWOrzPKd-6x7vjHNqjK_bK2UOYfPzCuu/view) Mel Roformer model dedicated to instrumentals (made with unwa v1 in mind). It can work with e.g. v1 and v1e. 
It can sometimes remove some bleed even after using the [phase fixer](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=sharing) (by becruily) made for the v1 model, which is also used on x-minus for premium users\n\n- Anjok [released](https://discord.com/channels/708579735583588363/785664354427076648/1310529792461770814) a new UVR Roformer beta patch #6\n\nUVR\\_Patch\\_11\\_25\\_24\\_1\\_48\\_BETA (Windows: [standalone](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_25_24_1_48_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_25_24_1_48_BETA_patch_rofo.exe) | Mac: [M1](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_arm64.dmg) | [x86-64](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_x86_64.dmg))\n\naddressing the “All stem” error with viperx’ models.\n\nAnd with it, the long-anticipated MDX-Net HQ\\_5 model has been released | [Colab](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb) | MVSEP\n(it was also added to the Download Center in previous UVR patch versions).\n~~New version of the HQ\\_5 model is announced to be released in two weeks already.~~\nInstrumentals are slightly muddier than in HQ\\_4, but vocal residues are also a bit quieter (although they're mostly still present where they were before, with some possible exceptions).\n\nE.g. some hi-hats might get a tad quieter in the mix.\nThe new model variant announced for two weeks later was said to have fuller instrumentals.\n\nvs unwa’s v1e “HQ5 has less bleed but is prone to dips in certain situations. (...). Unwa has more stability, but the faint bleed is more audible. So I'd say it's situational. Use both. (...) 
Splice the two into one track depending on which part works better in whichever part of the song is what I'd do.” CC Karaoke\n\n[Model](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-MDX-NET-Inst_HQ_5.onnx) | config: \"compensate\": 1.010, \"mdx\\_dim\\_f\\_set\": 2560, \"mdx\\_dim\\_t\\_set\": 8, \"mdx\\_n\\_fft\\_scale\\_set\": 5120\n\n- We have some reports of custom user ensemble presets from older versions no longer working (since the 11/17/24 patch and in newer ones). Sadly, you need to get rid of them (don’t restore their files manually), or the ensemble will not work and the model choice will be greyed out. You need to start from scratch.\n\n- Sucial released a new Mel-Roformer dereverb/echo model ([model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2493431454) | MVSEP).\nIt’s good but doesn’t seem to be better than the more aggressive variant of anvuew's Mel model ([models list](#_5zlfuhnreff5)).\nStill, it might depend on the use case.\n\n- People are getting the “All stems” error with viperx’ 12xx models in newer versions of the UVR Beta Roformer patch (patch #2 was the last one confirmed to work with these older models)\n\n- Lucida.to is having some issues with Qobuz links. Tidal and Deezer work, but poorly, occasionally giving errors too; just retry. Doubledouble redirects to Lucida now. 
In case of problems accessing the domain in your country, check out lucida.su or use a VPN.\nIf you have any problems downloading files, try incognito mode without any browser extensions; download accelerators might also cause issues (FAQ).\n\n- Unwa released a new beta 5 model dedicated to vocals | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR instr](#_6y2plb943p9v)\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main> | yaml: big\\_beta5e.yaml\n\nIt seems to fix some issues with trumpets in the vocal stem (maxi74x1).\nIt handles reverb tails much better (jarredou/Rage123).\n\n“It's noisy and, IDK, grainy? When the accompaniment gets too loud. (...) Definitely not muddy though, which is a welcome change IMHO. I think I prefer beta 4 overall” - Musicalman\n“to me, the noise sounds similar to how VR arch models sounded, except it's not poor quality”\n“Perhaps a phase problem is occurring (...) The noise is terrible when that model is used for very intense songs” - unwa\n\nThe phase fixer for the v1 inst model doesn’t help with the noise here (becruily).\n“it's a miracle LMAO, slow instrumentation like violin, piano, not too many drums...\nit's perfect... but unfortunately it can't process Pop or Rock correctly” gilliaan\n\n“feel so full AF, but it has noticeable noise similar to [Apollo] lew's vocal enhancer”\n“the vocal stem of beta5e may have fullness and noise level like duality v1, but it may also suffer kind of robotic phase distortion, yet may also remove some kind of bleed present in other melrofo's.” Alisa/makidanyee\n\n“bigbeta5e is particularly helpful when you invert an instrumental and then process the track with it. It really keeps the quality. 
Even if the instrumental was a lossy mp3 inverted to a lossless flac file, it cleans it up without making a mess. (...) some songs gets their instrumentals leaked online. And a lot of the time it's a lossy 160kbps mp3 file or even worse, you invert that instrumental file to the real song and process the result using bigbeta5e [to clean the invert]” gilliaan/heauxdontlast\n\n“Ensemble AVG Big Beta 4 + Big Beta 5e is really good to reduce the noise while keeping the fullness” - heauxdontlast\n\n- Unwa released a new Inst v1e model (“The model [yaml] configuration is the same as v1”)\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR instructions](#_6y2plb943p9v) (added to the Download Center) | x-minus ([link](https://uvronline.app/ai?hp?test) for premium users) | MVSEP\n\n“The \"e\" stands for emphasis, indicating that this is a model that emphasizes fullness.”\n“However, compared to v1, while the fullness score has increased, there is a possibility that noise has also increased.” “lighter compared to v2.”\n\nWhile SDR-wise it’s [worse](https://imgur.com/a/1FMtosl) than unwa’s previous models, it has the best [fullness](https://imgur.com/a/aJ0nNdc) factor (you can read more about this new method of evaluation later in [this](#_sc2lgq9t4p19) section).\nThe phase fixer doesn’t really fix the noise in this model as it did in v1.\n\nLike other unwa models, it can also confuse flute, trumpet and saxophone with vocals.\n- You might want to use this max ensemble by dca100fb8 (e.g. 
the BS model here detects flute correctly, and the Mel one detects sax and trumpet):\nunwa’s v1e + Mel 2024.10 + BS 2024.08 (Max FFT; the latter two models are on MVSEP. Unwa's big beta5e can also sometimes retrieve instruments missing from v1e when those two fail)\n\n- You might want to check a max ensemble of instv1, instv2 and inst v1e - erdzo125\n\n(for even better fullness but more noise - you can consider the [phase fix](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=drive_link) for instv1)\n\n- Anjok released a new beta Roformer [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_19_24_1_23_BETA_patch_rofo.exe) #5 for UVR (Windows only): UVR\\_Patch\\_UVR\\_11\\_17\\_24\\_21\\_4\\_BETA\\_patch\\_roformer\n\n“- Fixed OpenCL compatibility issue with Roformer & MDX23C models.\n\n- Fixed stem swap issue with Roformer instrumental models in Ensemble Mode.”\n\nLike patch #3, this patch is probably not standalone, so you need an existing UVR installation.\n\n- Anjok released a new beta Roformer patch #4 for UVR: UVR\\_Patch\\_11\\_17\\_24\\_21\\_4\\_BETA (Windows: [full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_11_17_24_21_4_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_11_17_24_21_4_BETA_rofo.exe) | Mac: [M1](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_17_24_21_4_BETA_rofo_MacOS_arm64.dmg) | [x86-64](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_17_24_21_4_BETA_rofo_MacOS_x86_64.dmg))\nMinor [bug fixes](https://discord.com/channels/708579735583588363/785664354427076648/1307933134574063706). 
Most importantly, the MacOS version fix:\n“Roformer checkbox now visible for unrecognized Roformer models”, so you can now use custom Roformer models on the MacOS Roformer patch without copying/modifying configuration files from the Windows version or from other users. That workaround was previously needed because the Mac version lacked the Windows option to mark an unrecognized model as a Roformer, so that separation works with it. It also includes all the fixes from the previously released patch (the overlap code was fixed, so no stem misalignment should occur with certain overlap settings; higher overlap probably now means longer separation)\n\n- Anjok released a new beta Roformer patch #3 for UVR (Windows version for now) [UVR\\_Patch\\_11\\_14\\_24\\_20\\_21\\_BETA\\_patch\\_roformer](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_11_14_24_20_21_BETA_patch_roformer.exe) - “this is a patch and requires an existing UVR installation” (so either the [previous](#_6y2plb943p9v) beta Roformer patch or the stable 5.6 version).\n\nThe new patch fixes the stem misalignment issue when using an incorrect overlap setting for Roformers. It now uses ZFTurbo's code (also for MDX23C), which probably means that increasing overlap for Roformers will now increase separation time and potentially improve SDR (the opposite of how it worked in the previous beta Roformer patch). It might also allow using faster settings without stem misalignment or segment popping (e.g. when dim\\_t was set to 201 and overlap to 2) on 4GB VRAM cards and with some heavier models.\nAmong other minor fixes: “Roformer stem naming issues resolved. Fixed manual download link issues in the Download Center. Roformer models can now be downloaded without issue.”. 
Implementation of the SCNet and Bandit archs is still in the works.\n[Full changelog](https://discord.com/channels/708579735583588363/785664354427076648/1306819453857566811).\n\n- Becruily made a Python script that fixes the phase of unwa's v1 model output, removing its noise.\n\n[Download](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=sharing)\n\nIn case of a “no module named librosa found” error, run: pip install librosa\n\n“The results are almost, if not the same as x-minus' phase correction.\n\nTo use, you need to have the song separated with Kim's melband model and unwa's v1 model.” A 32-bit output switch was added.\n\n“the output length is few ms shorter than the input\n\nthe output has little popping in the end”\n\n- SYH99999/yukunelatyh released a MelBandRoformerSYHFTV3Epsilon [model](https://huggingface.co/SYH99999/MelBandRoformerSYHFTV3Epsilon).\n\nVs previous SYH models: “this version is more consistent with separation. It's not what I'd call a clean model; It sometimes lets background noise bleed into the vocal stem. But only somewhat, and depending on how you look at it, it can be a good thing since it makes the vocals sound less muddy.” Musicalman\n\nSince then, a newer [MelBandRoformerBigSYHFTV1Fast](https://huggingface.co/SYH99999/MelBandRoformerBigSYHFTV1Fast) model has also been released.\n\n- Lew released a v2 of the vocal enhancer model for Apollo, trained on Roformer vocal outputs\n\nIt was added for paid users on x-minus in the Ensemble menu or in the Restoration menu (formerly De-noise), and on [Colab](https://colab.research.google.com/github/jarredou/Apollo-Colab-Inference/blob/main/Apollo_Audio_Restoration_Colab.ipynb). 
Model [files](https://huggingface.co/jarredou/lew_apollo_vocal_enhancer/resolve/main/apollo_model_v2.ckpt) | [config](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/download/2.0/config_apollo_vocal.yaml).\n\nIt potentially works best on a BS and Mel Roformer ensemble, but it might add some noise as well.\n\nThe model stopped progressing during training, so there probably won’t be any newer epochs of this model.\n\n- Unwa released a v2 version of the inst Mel-Roformer model.\n“Sounds very similar to v1 but has less noise, pretty good”\n\n“the aforementioned noise from the V1 is less noticeable to none at all, depending on the track”.\n\n“V2 is more muddy than V1 (on some songs), but less muddy than the Kim model.\n\n(...) [As for V1,] sometimes it's better at high frequencies” Aufr33\n\nSDR also got a bit higher (16.845 vs 16.595)\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md)\n“It's the same size as the big model with depth 12 and mask\\_estimator\\_depth 3.\n\nThe improvement was stagnant with the same model size as v1.” - unwa\n\n- The model has been added to the UVR [Beta Roformer](#_6y2plb943p9v) Download Center and to x-minus.\n\n- MSST-GUI is now included in ZFTurbo's repo as the \"gui-wx.py\" file. Don’t run it by double-clicking; run it from CMD instead.\n\nGPU acceleration works only with Nvidia GPUs, and 4GB VRAM GPUs will give out-of-memory errors with Roformers (you can use the CPU instead).\n\n\"UnicodeEncodeError\" means there is a disallowed character in your input file name, e.g. 
“doesn't work with [ and ] in the foldername - known bug”.\n\n- Both duality models and inst v1/2 have now been added to the UVR [Beta Roformer](#_6y2plb943p9v) Download Center (problems with duality models in UVR have been fixed)\n\n- Unwa released a v2 version of the duality model, with slightly better SDR and fewer residues (available in the link below)\n\n\"[other](https://mvsep.com/quality_checker/entry/7321)\" is the output from the model\n\n\"[Instrumental](https://mvsep.com/quality_checker/entry/7322)\" is the vocals inverted against the input audio.\n\nThe latter has lower SDR and more holes in the spectrum.\n\nSo when using MSST-GUI, leave the “extract instrumental” checkbox disabled for duality models.\n\n- Unwa released a new inst-voc Mel-Roformer called “duality”, focused on both the instrumental and vocal stems.\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-InstVoc-Duality/tree/main> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVocals sound similar to the beta 4 model, and instrumentals are free of the noise present in the inst v1 model, but as a downside, they don't sound similarly muddy to previous Roformers.\nYou can use it in the [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) for ZFTurbo's script (already added) or with the OG ZF repo code. The model will now work in UVR (added to the Download Center; the problem was also fixed by Anjok in the OG repo’s yaml)\n\n- A new Ensemble button was added on x-minus for premium users for unwa's new inst model. It corrects the phase and almost entirely removes the noise present in this model\n“This post-processing uses Kim's model. 
After post-processing, the vocals will be replaced with those of this model.” [Examples](https://discord.com/channels/708579735583588363/900904142669754399/1299210627780444202)\n\nUsing Mel-Roformer de-noise might be a better alternative:\n\n“removes more noise from the song, keeping overall instrument quality more than the new button” koseidon72. But the more aggressive variant of the model sometimes deletes parts of the mix, like snares.\n\n- A new fine-tune of the Bas Curtiz model on MVSEP, and unwa’s inst Mel-Band model, were added on MVSEP and x-minus.\nHowever, only 5 submissions were sent to ZFTurbo for fine-tuning, while 30+ are needed, so there is not much of a difference in the new FT.\n\n“I suggest to all of you, if there is any voice left [in inst v1], use the Mel-Roformer de-noise with minimal aggression, not only for little voices left, but also for some background noise.\n\nUnfortunately, this new [unwa’s] model doesn't eliminate vocoder voices well from an instrumental”\n\nThe model is much faster than beta 4.\n\n- unwa released a new Mel-Roformer model focused on the instrumental stem this time (a.k.a. v1):\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d) | UVR [instructions](#_6y2plb943p9v) | MVSEP\n\"much less muddy (...) but carries the exact same UVR noise from the [MDX-Net v2] models\"\n\nBut it's a different type of noise, so aufr33's denoiser won't work on it.\n\n“you can \"remove\" [the] noise with UVR-Denoise, aggr. -10 or 0” although, at least with -10, it makes the result muddier, like the Kim model, and synths and bass are sometimes removed by the denoiser (~becruily). UVR-Denoise-Lite doesn’t seem to damage instruments that badly, but still more than Mel denoise (recommended aggr. - 4; with a 272 vs 512 window size it’s less muddy; TTA can stress the noise more; somewhere above 10 aggr. it gets too muddy). 
UVR-Denoise on x-minus is even less aggressive (it’s a medium-aggression model for free users, without an aggression setting), but it might occasionally catch the ends of some instruments, like bass. The premium minimum-aggression model is somewhat muddier, but doesn’t damage instruments. Minus the noise, this is a **groundbreaking** instrumental model among public models or existing Roformers.\n(more [training details](https://discord.com/channels/708579735583588363/767947630403387393/1298225992636174346))\n\n“Flipping the target seems to definitely have effect on the instrumental part!” Bas Curtiz\n\n“I got an error when I set num\\_stems to 2.” unwa\nYou can use “target\\_instrument: null” instead, which is also required for multistem training, as in [this](https://imgur.com/a/eOSW8I7) example ~jarredou\n“It's because of the PHASE. I found a way to fix it. Today I will add a new ensemble button.”\n\n- A Similarity / Phantom Center Extractor [model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2417116936) by wesleyr36 was added on MVSEP (Experimental section) and x-minus.pro (Extract backing vocals).\n\n“This model is similar to the Center Channel Extractor effect in Adobe Audition or Center Extract in iZotope RX [and Audacity/Bertom], but works better.\n\nAlthough it does not isolate vocals, it can be useful.” Aufr33\nYou can find more on the topic in the [Similarity Extractor](#_3c6n9m7vjxul) section.\n\n- ZFTurbo released the MVSEP Wind model on the site (MelBand/SCNet/ensemble)\n\nSome songs, but not all, might be separated better than with the model on x-minus.\n\n- A GUI for ZFTurbo's Music Source Separation inference script, called MSST-GUI, was released by Bas Curtiz (link with instructions in the description):\n<https://www.youtube.com/watch?v=M8JKFeN7HfU> (reupload)\n\nIt is screen reader compatible, although for now you can't navigate with the arrow keys in the web view; at least you have the HTML source of the page, so you can just download models 
from there.\n- Multiple updates have been made since this entry was written, and new models have been added regularly.\n\n- If you get “ERROR: Could not build wheels for diffq, pesq, which is required to install pyproject.toml-based projects”, then:\n“Edit the requirements.txt file and remove or comment out that line with asteroid” [click](https://imgur.com/a/rh553CR)\nThen rerun pip install -r requirements.txt\n\n- If you have a decent Nvidia GPU but no GPU acceleration, try one of these commands to install a torch build with CUDA support:\n\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\n- New beta 4 of unwa’s Mel-Roformer fine-tune of Kim’s voc/inst model released:\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)\n\nBe aware that the yaml config has changed, so you need to download the new beta4 yaml.\n\n“Metrics on my test dataset have improved over beta3, but are probably not accurate due to the small test dataset. (...) The high frequencies of vocals are now extracted more aggressively. However, leakage may have increased.” - unwa\n\n“one of the best at isolating most vocals with very little vocal bleed and still doesn't sound muddy” “gives fuller vocals”. On its own, it can be a better choice than some ensembles.\n\n- ZFTurbo, the owner of MVSEP, is looking for help improving Bas Curtiz’ fine-tuned Mel-Roformer model on MVSEP. How can you help?\n1) Find a song that this model separates badly (e.g. 
bleeding)\n2) Find another model that separates your song correctly\n3) Send the good results (instrumental stem + vocal stem in stereo/44kHz with the same length) to ZFTurbo on Discord (or by email)\n\nThe results will be used as training data for a new fine-tuned model.\n\n- “1) All [MVSEP] ensembles now use Bas Curtiz MelRoformer model. SDR for Multi dataset stayed almost the same, but greatly increased for Synth\n\n2) Drums model were updated for all Ensembles too.\n\n<https://mvsep.com/quality_checker/entry/7197>” ZFTurbo\n\n- New SCNet Large Drums model added to MVSEP\n\n- <https://studio.gaudiolab.io> introduced a new Noise Reduction feature\n\n- For those having problems with lucida.to being too slow, you can use <https://mp3-daddy.com/>. Sometimes the FLAC option might not work; in that case, download the mp3 first, and then FLAC will work (the files have a full 22kHz spectrum). It may still fail sometimes (not always in incognito mode with third-party cookies allowed, or after a long wait following the error). A manual download via your browser's Inspect option might be possible (the download starts and then gets interrupted, like on GSEP in the old days). Don’t even bother reading their site description - it’s full of AI-written sh&t. Contrary to what they say, it doesn’t support YT or YT Music/Tidal/Deezer links, so you need to use their search engine. The maximum output quality is probably 44kHz/16-bit. It doesn’t seem to use Tidal (maybe Deezer).\n\n<https://doubledouble.top/> is also back online and, unlike Lucida, supports Apple Music, but it might be slower and may eventually go offline again, as before.\n\n- A strings model based on the MDX23C arch was added to MVSEP. Its SDR is still low (3.84), so it’s hit or miss whether it will work for your song, but some people have had good results at times. 
ZFTurbo plans to work on it further.\n\n- Finally, HQ\\_4, released in March, has also been added to MVSEP (not long ago, it was also added to x-minus/uvronline.app, at least via [this](https://uvronline.app/ai?hp&test-mdx) link)\n\n- Beta 3 of unwa’s Mel-Roformer fine-tune of Kim’s model was released. Fine-tuning was started from scratch on an enhanced dataset made with the help of Bas Curtiz. As a result, the model is free of the high-frequency ringing present in the previous beta models.\n\n“I've added hundreds of GB worth of data to my dataset”.\n\n[Download](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main) | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)\n\n\"definitely better than Kim's now\", although vocal residues may still occur; in that case, use unwa’s BS-Roformer fine-tune instead. Its SDR is slightly lower than Kim’s Mel-Roformer. It’s good for RVC.\n\n- Lew’s model was added to jarredou’s [Colab](https://colab.research.google.com/github/jarredou/Apollo-Colab-Inference/blob/main/Apollo_Audio_Restoration_Colab.ipynb)\n\n- Lew released an Apollo model for enhancing the vocal results of Roformers: <https://ufile.io/09560o34>\n\n“You can use it in Music Source Separation Training [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training/), and it should be compatible with jarredou Apollo Colab”. Links are a bit below (not compatible with UVR).\n\n- Beta 2 of unwa’s fine-tuned Mel-Roformer model was released ([Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)).\n\nBe aware that both beta models have some ringing issues in the higher frequencies. It's hard to say whether this will be fixed with further training; unwa said the model was made mainly with vocals in mind, so it’s uncertain.\n\n- Unwa released a beta version of a still-in-training Mel-Roformer fine-tune of Kim’s model. It hasn't been tested SDR-wise yet, but it might already give better results than unwa’s older BS-Roformer model. 
Download:\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main>\n\nIn UVR, consider setting dim\\_t = 1501 at the bottom of the yaml (can be slow); 1333 or 1301 can be better for e.g. 40-second snippets, while 1101 gives the highest SDR for all Roformers. Still, which value gives the best results depends on the song (in reality, SDR differs for each song, a higher SDR doesn't always mean better quality, and the quality with specific parameters might even differ between fragments of the same song).\n\n- (uvronline.app/x-minus) New electric and acoustic guitar models by viperx were added on the site for premium users.\n\nThe acoustic model seems good, while the electric one might be more problematic at times.\n\n- lalal.ai now has some voc/inst models that sound like an ensemble of public Roformers: close, but still not as good. Some of their specialized models are worth trying, e.g. lead guitar (the model has improved over time) or piano.\n\n- <https://github.com/JusperLee/Apollo> | jarredou [Colab](https://colab.research.google.com/github/jarredou/Apollo-Colab-Inference/blob/main/Apollo_Audio_Restoration_Colab.ipynb)\n\njarredou: “New tool for heavily compressed mp3 restoration, using bandsplitting and roformers. It does work really great if the audio was compressed at 44.1khz sample rate, whatever bitrate [<=128kbps]. BUT if there was some resampling leading to hard cutoff, it will wrongly behave.\n\nThe current model of Apollo was only trained on mp3 compressed audio. If you use ogg/opus/m4a/whatever else compressed audio as input, it's not guaranteed that it will work as expected.”\n\nIt was also added to MVSEP as \"Apollo MP3 Enhancer\":\n\nDemo: <https://mvsep.com/result/20240919224117-f0bb276157-mixture.wav>\n\n*Advice*\n\n\"[Good use case](https://imgur.com/a/REvevJh):\n\nInput has no hard cutoff (quality slowly degrades toward high freq).\n\nGenerated output is as expected. 
It can fill holes, and it can remove artifacts (and probably bleeding too) and is working great with highly degraded audio here. If trained on clean source vs separated stem, which is not as much degraded content than 32kbps mp3 like previous example, I think it could work really great” [[bad use case](https://imgur.com/a/4ePVWYP)]\n\n“so far the overlap magic is needed, cause u hear the transition”\n\n“It seems to alter the tempo. It's not a constant alteration, it just shifts stuff, and you can't invert” becruily\n\n> “I've seen this too in my tests, but it seems to happen only at the end of the chunk.\n\nIn the updated version (in which the end of chunks is ditched), I haven't seen that issue again.\n\n>Overlap feature added [to the Colab].\n\nNew inference.py created for easier local CLI use.\n\nI have set chunk\\_size at 3 seconds as default in the Colab because it was the chunk\\_size used to train the model, but it seems that the highest is the best.” jarredou\n\nIt was also added to ZFTurbo’s training dataset (read [more](https://discord.com/channels/708579735583588363/911050124661227542/1286122948318724127)).\n\nAlso, for non-mp3 input files, you might want to experiment with compressing them to 64kbps mp3 first.\n\n- TS-BSmamba2 was also added to the repo, so it's now available for training too. 
However, it currently works only on Linux.\n\n- Aufr33 added MDX HQ4 to x-minus/uvronline via this link: <https://uvronline.app/ai?hp&test-mdx>\n\n- (x-minus/uvronline.app) “viperx has updated the piano model!\n\nI just replaced it on the website.” Aufr33\n\n“The new piano model is incredible, I have even been able to separate a harpsichord by passing it over and over again through the model until the other instruments are left alone and it doesn't sound bad at all.”\n\nThe MVSEP piano models were also updated recently: SCNet and viperx models plus an ensemble were added, with metrics on the website (at least on a separate page, apart from the multisong dataset chart).\n\n“both similar but mvsep has a teeny bit more bleed during the choruses and whatnot”\n\n- (MVSEP) “I added possibility to use Bas Curtiz’ MelRoformer model with large score on Synth dataset. You must choose it from MelRoformer options. By default, my model is used.\n\nThe problem with Bas's model is that it's very heavy and slow, with almost the same score on Multi dataset.” Aufr33\n\n“I've tried some songs and have great result! Music sounds fuller than original Kim's one & the finetuned version from ZFTurbo. Even [though] the SDR is smaller than BS Roformer finetuned last version, but almost song has the best result in instrumental.\n\n1 song I found is bad result is from Wham - Where did your hearts go. The trumpet or sax whatever sound was lost, the model detects it as vocal, and the 1st beginning of vocal still heard. On other mel roformer, that trumpet or sax sound can still separate it as well.” Henry\n\n- (MVSEP) “Guitar model was updated. I added BSRoformer model by viperx with SDR: 7.16. And\n\nI replaced [guitar] Ensemble. Earlier it was MDX23C + Mel. Now its BS + Mel. 
SDR increased from 7.18 to 7.51.\n\nDemo: <https://mvsep.com/result/20240914110542-7ab0356600-song-000-mixture.wav>\n\nAll these models are available for all users.” ZFTurbo\n\n- The new MDX HQ5 beta model is now online!\n\nUse this link to access it:\n\n<https://uvronline.app/ai?hp&test-mdx> - link for premium users\n\nGo to \"music and vocals\" and scroll down to find it.\n\nIt's not the final model yet; it has been in training since April, and training is still in progress.\n\nIt seems muddier than HQ\\_4 (and than Kim’s and MVSEP’s Mel-Roformers); it has less vocal bleeding than before, but more than Kim’s Mel-Roformer. It sometimes struggles with reverb.\n\n\"Almost perfectly placed all the guitar in the vocal stem\"; this might be fixed in the final version of the model, which is planned for release in mid-November, as Anjok said on 04.11.24.\n\nThe model is not available in UVR yet (only on uvronline.app).\n\n- The [UVR Roformer](#_6y2plb943p9v) beta patch for Mac doesn’t let you tick the Roformer parameter for Roformer models manually copied to UVR (e.g. Kim’s Mel-Roformer or unwa’s Roformer): only the config name can be chosen, and there is no confirm button to make the model work. As a workaround, place the model file in MDX\\_Net\\_Models and the non-hashed model’s yaml in mdx\\_c\\_configs, then place the [corresponding](https://drive.google.com/drive/folders/14IfdqN3tDjXVe0hQ9i5-1KejIJTO09xX?usp=sharing) hash-named file in models\\MDX\\_Net\\_Models\\model\\_data and start UVR.\n\n- Aufr33 released the files for the new UVR de-reverb model made with jarredou (based on the VR 5.1 arch).\n\n“1. Download [this](https://mega.nz/file/CFRBHLRK#uhRexQFJVo8_Owr8x9sEEohDcCNZbl3UgeX5eyD7IFA) and unzip into your Ultimate Vocal Remover folder\n\n2. Select VR architecture and DeReverb model from the menu\n\n3. 
Set the parameters as shown [here](https://imgur.com/a/dZAJwef)”\n\n(PS: Dry, Bal: 0, VR 5.1, Out: 32/128, Param: 4band\\_v4\\_ms\\_fullband; an already existing json config file in the modelparams folder has the same checksum)\n\nBas Curtiz’ “Conclusion so far:\n\n- MDX[23C] De-Reverb seems to be cleaner, takes the reverb away, also between the words, whereas VR leaves a little reverb\n\n- [The new] VR De-Reverb seems to sound more natural, maybe therefore actually.\n\nAlso, MDX tends to 'pinch' some stuff away to the background, which sounds unnatural.\n\nThis is just based on my experience with 3 songs/comparisons, but both points are a pattern.\n\nOverall, they're both great when u compare them against the original reverbed/untouched vocals.” [Video](https://discord.com/channels/708579735583588363/708580573697933382/1279223128848863338)\n\n- SCNet Large vocal model published on MVSep.\n\nMultisong dataset:\n\nSDR vocals: 10.74\n\nSDR other: 17.05\n\n“just like the new bs roformer ft model, but with more bleed. [BS] catches vocals with more harmonies/bgv” isling\n\n- Cyrus fixed pip issues in the [Medley Vox](#_s4sjh68fo1sw) Colab\n\n- Aufr33 released MDX23C de-reverb model files:\n\n<https://a19p.uvronline.app/public/dereverb_mdx23c_sdr_6.9096.ckpt> | [config](https://drive.google.com/file/d/1dQHfce4VKYSmWZ3IgIj4SZq_uTicD4At/view?usp=sharing)\n\n“If you will use this model in your project, please credit us (me and jarredou)”\n\nAlso added to MVSEP.\n\nUVR instructions:\n\n“1. Just copy model to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models\n\n2. Copy .yaml config to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models\\model\\_data\\mdx\\_c\\_configs\n\n3. When opening UVR, selecting dereverb\\_mdx23c\\_sdr\\_6.9096 from the MDX-Net process method, don't click 'RoFormer model' cause it's not.\n\n4. Select config\\_dereverb\\_mdx23c from the dropdown. Done.” ~Bas Curtiz\n\n5\\*. 
In case of a “no key” error in UVR, change line 30 in the config to:\n\nNo dry\n\nBut this error doesn’t happen to everyone.\n\n- A new UVR De-reverb model was added on uvronline.app for premium users.\n\nIt seems to handle room reverb better than the previous MDX23C model, and Foxy’s model sometimes cuts “way too much” compared to this new model.\n\n- Since August 12th, people have been unable to separate using Ripple. There's an error: \"couldn't complete processing please try again\"\n\n- (x-minus.pro/uvronline.app) “Hipolink was a temporary solution. Now I can accept payment via Patreon as well.” Aufr33\n\n- MDX23C De-reverb model by Aufr33 released for premium users of uvronline.app.\n\n“Thanks to jarredou for helping me create the dataset”\n\n- jarredou released v2.5 of the MDX23 Colab, adding the new Kim Mel-Roformer model. The final SDR is higher (17.64 vs 17.41 for instrumentals; for comparison, the 2024.08.15 [MVSEP](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum) Ensemble scores 17.81).\n\n<https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb>\n\n“Baseline ensemble is made with Kim Melband rofo, InstVocHQ and selected 1296 or 1297 BS Rofo”. Switching from 1296 to 1297 produces muddier/worse instrumentals in this Colab (more sudden jumps in dynamics from residues). VitLarge is no longer used by default.\n\n“I've opened a donation account for those who would want to support me: <https://ko-fi.com/jarredou>”\n\n- unwa’s fine-tuned BS-Roformer model was released (12.59 SDR for instrumentals): lower SDR than ZFTurbo's fine-tuned models on MVSEP, but higher than Kim’s Mel-Roformer and the viperx base model <https://drive.google.com/file/d/1Q_M9rlEjYlBZbG2qHScvp4Sa0zfdP9TL/view>\n\n- Mel-RoFormer Karaoke / Lead vocal isolation model files released by Aufr33 and viperx\n\n“If you will use this model in your project, please credit us” ([download](https://mega.nz/file/qQA1XTrb#LUNCfUMUwg4m4LZeicQwq_VdKSq9IQN34l0E1bb0fz4))\n\n[UVR 
instructions](#_6y2plb943p9v). Be aware that the online version on uvronline/x-minus seems to work better.\n\n- doubledouble.top will soon be replaced by <https://lucida.to/>\n\n- Kimberley Jensen publicly released her Mel-Band Roformer vocal model ([download](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model))\n\n(simple [Colab](https://colab.research.google.com/drive/1tyP3ZgcD443d4Q3ly7LcS3toJroLO5o1?usp=sharing)/[CLI inference](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model)/[x-minus](https://x-minus.pro/)/[MVSEP](https://mvsep.com/)/[jarredou Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) now too)\n\nIt works in the UVR [beta Roformer](#_6y2plb943p9v) ([model](https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt?download=true) | [config](https://drive.google.com/file/d/1U1FnACm-ontQSjhneq-WKk1GHEiTW97s/view?usp=sharing) - place the model file in models\\MDX\\_Net\\_Models and the config in the model\\_data\\mdx\\_c\\_configs subfolder; “when it will ask you for the unrecognised model when you run it for the first time, you'll get some box that you'll need to tick \"roformer model\" and choose it's yaml”.\n\nUse overlap 2 for best SDR, or 8 for faster inference in UVR)\n\n- SCNet model published on MVSEP. 
It has similar metrics to the MDX23C model, but seems to leave a lot of vocal residues.\n\n“It is based on SCNet-small config from the paper, the SCNet-large config is almost 1 SDR above in the reported eval, so hopefully, next SCNet model trained by ZFTurbo with that large config will be better too.” So far, he has sadly had some problems training with the large config.\n\n- A slightly better Roformer 2024.08 model (+0.1 SDR) was added to MVSEP.\n\nCompared to the 2024.04 model, “it seems to be much better at taking the vocals when there are a lot of vocal harmonies.”\n\n- (x-minus.pro/uvronline.app) “In the new interface, the BS-RoFormer model now also has De-mudder\n\nSelect the Music and vocals, BS-RoFormer and after processing you will see the De-mudder button appear.” Aufr33\n\nIt works for premium users.\n\n- If you get an error while using jarredou’s Drumsep Colab (*object is not subscriptable*), replace line 144 in inference.py with the following (the second line indented under the if):\n\n*if type(args.device\\_ids) != int:*\n\n*model = nn.DataParallel(model, device\\_ids = args.device\\_ids)*\n\n(thx DJ NUO)\n\n- (GSEP) I received an email on one of my inactive accounts saying my files would be deleted unless I buy premium (I haven’t received it on my main account with premium), so it's probably due to inactivity and lack of premium. It probably applies to accounts that haven't used the service since the release of the new paid site and/or haven't had premium since then (the email is from July 9th, 3 months after the release of the new site, so your files may get deleted 3 months after premium was disabled on your account). Normally, new separations for free users are now deleted after 3 days, but until now, older files were preserved, at least for accounts that used the beta. 
The account hadn’t been used since the end of October 2022.\n\nCheck your mailbox to be sure. I didn’t find that email, even in spam, on my main account with premium, so hopefully it doesn’t apply to everyone (at least not to those with premium or who have used the site in the last 3 months):\n\n“All files from Gaudio Studio will be deleted on August 7, 2024 [Wednesday]. (...) If you purchase a Studio Plan, your files will be preserved.”\n\nBe aware that they operate in South Korea, which is GMT+9, so it’s 7 hours ahead of CEST (Warsaw, Skopje, Zagreb).\n\n~~If you currently have premium, you can download all your previous separations in WAV without any charge (at least that’s how it used to be), without premium~~ it says ~~(misleadingly, I assume)~~ “The song processed in the beta service do not support WAV file downloads.” ~~but probably you’ll be able to do that if you buy premium if nothing has changed.~~ It’s no longer possible, and, unlike before, there are no references to WAV in dev tools.\n\n- Aufr33 publicly released the files of his Mel-Roformer de-noise models:\n\n[Less aggressive](https://mega.nz/file/rIRQGJ4D#9SHaPIXt8GRoi2SL29WUILW0g9dk26I5njyFPZuPJQ8) & [More aggressive](https://mega.nz/file/vM4mHTYQ#f_uCxxS_olfTR4iAsOc-XS6sfUecfbF-ZKXrk3IjbnY) | [yaml file](https://drive.google.com/file/d/1uwInhwgjOMIdOMTgj_oNR_dmaq7E-b3g/view?usp=sharing)\n\n“If you will use this model in your project, please credit me”\n\nAlso added to jarredou's [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (and on x-minus.pro/uvronline.app for premium users, and on MVSEP).\n\nBoth models work in UVR too (as with other Roformers in the UVR Roformer beta, don’t forget to set overlap to 2 to avoid stem misalignment issues; overlap 3 or above will break the separation).\n\n- jarredou released a manual ensemble Colab with drop-down menus (based on ZFTurbo’s 
code)\n\n<https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Manual_Ensemble_Colab.ipynb>\n\n- To fix issues with the BS variant of anvuew’s de-reverb model in UVR, “change stft\\_hop\\_length: 512 to stft\\_hop\\_length: 441 so it matches the hop\\_length above” in the yaml file. It doesn’t happen on (thx lew).\n\nIf that line is not present in your model config, go to the settings, choose MDX in the advanced menu, and click the \"clear auto-set cache\" button.\n\nThen go back to the main settings, click \"reset all settings to default\" and restart the app (thx santilli\\_).\n\n- If you still get an error on every attempt to use GPU Conversion in UVR on an AMD GPU (you might be using outdated drivers and/or Windows), go to Ultimate Vocal Remover\\torch\\_directml and replace DirectML.dll there with the one from C:\\Windows\\System32\\AMD\\ANR (make a backup first). Experimentally, you can use this older [1.9.1.0](https://drive.google.com/file/d/1dxmG0cfclGMkFLfdoKAfQaDkx3Rcspku/view?usp=sharing) version of the library. Restart UVR after replacing the file!\n\nBe aware that results achieved with GPU Conversion this way, at least on certain configurations, might have noisy static instead of bleeding in the quieter parts of stems, compared to CPU-only processing (basically, MDX noise can be somewhat different on GPU; the Standard denoise option only alleviates the issue to some extent, so you need the Denoise Model option to get rid of this noise, or, as a better solution, a Min Spec manual ensemble of the result with denoise disabled and the Denoise Model result to remove more noise). Aufr’s Mel-Roformer minimum denoise works worse for this.\n\n- GSEP introduced a new model called “Vocal Remover”, dedicated to vocal extraction; it's only used for the vocal stem, while the instrumental stem still uses the old model. It might be good at extracting SFX as well. 
(becruily/wancite)\n\n- ([uvronline.app](https://uvronline.app)) Mel-Roformer De-noise released for premium users.\n\n“This model is optimized for music and vocals. You can choose between two aggressiveness settings:\n\nminimum - removes fewer effects such as thunder rolls\n\naverage - usually removes more noise”\n\n“The new model works as good as my UVR De-noise model, or even better.”\n\n- The [drumsep](#_2u19k7ty9b00) model by aufr33 and jarredou was added to [MVSEP](https://mvsep.com/) and [uvronline.app](https://uvronline.app/ai) too.\n\n- Not Eddy’s multi-arch Colab was released in the form of a UI (like e.g. KaraFan)\n\n<https://colab.research.google.com/github/Eddycrack864/UVR5-UI/blob/main/UVR_UI.ipynb>\n\nIn case of “FileNotFoundError: [Errno 2]”, try a location other than “input”, or another Google account in case of “ERROR - mdxc\\_separator” (helps for both).\n\n- A new Mel-Roformer de-reverb model by anvuew was released:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2226805511>\n\n(to make it work with UVR, delete “linear\\_transformer\\_depth: 0” from the YAML file, copy the model to MDX\\_Net\\_Models and the YAML config to model\\_data\\mdx\\_c\\_configs)\n\nAlso added to MVSEP.\n\n“I'm definitely hanging onto it. It reminds me of the equivalent dereverb mdx model, which I've always liked (when it works). The roformer model is cleaner in some ways, though slightly more filtered and aggressive.\n\nNeither the roformer or mdx models respond to mono reverb. However, adding a stereo reverb on top solves that, especially with roformer.” (Musicalman)\n\n“anvuew's models can remove reverb effect only from vocals. Old FoxJoy's model works with full track.”\n\n- A BS-Roformer variant of the above model, with slightly better SDR:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2229279531>\n\n(To fix “The size of tensor a”... 
error with the BS variant of anvuew’s de-reverb model, “change stft\\_hop\\_length: 512 to stft\\_hop\\_length: 441 so it matches the hop\\_length above” in the yaml file.) thx lew\n\nAdded to the Colab:\n\n[https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music\\_Source\\_Separation\\_Training\\_(Colab\\_Inference).ipynb](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- The model below has been added, and the ensembles were updated as well.\n\nSome users report “bleed from some synths and bass guitar”, “Some drums instruments are low volume on drums only. While mel roformer makes a good clean one”, “On some parts it's almost like it doesn't separate anything for a few seconds and on some other parts, it's working just really great. The demucs one is way more stable when listening to individual model separations on the same song.” (or simply older ensemble)\n\n- (MVSEP) “I finished my drums models. 
Results:\n\nMelRoformer SDR: 12.76\n\nDemucs4 (finetuned) SDR: 12.04\n\nEnsemble Mel + Demucs4 SDR: 13.05\n\nfor comparison:\n\nOld Best Demucs4 SDR: 11.41\n\nOld Best Ensemble SDR: 11.99\n\nNew models will be added on site soon.” ZFTurbo\n\nFor comparison, the Mel-Roformer trained by viperx, available on x-minus, has 12.5375 SDR.\n\n- (for model trainers) “Official SCNet repo has been updated by the author with training code: <https://github.com/starrytong/SCNet>”\n\n“ZF's script already can train SCNet, but currently it doesn't give good results”\n\n[https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v.1.0.6)\n\nThe author’s checkpoint:\n\n<https://drive.google.com/file/d/1CdEIIqsoRfHn1SJ7rccPfyYioW3BlXcW/view>\n\n“One diff I see between author config and ZF's one, is that dev has used learning rate of 5e-04 while it's 4e-05 in ZF config. And main issue ZF was facing was slow progress (while author said it worked as expected using ZF training script <https://github.com/starrytong/SCNet/issues/1#issuecomment-2063025663>)”\n\nThe author:\n\n“All our experiments are conducted on 8 Nvidia V100 GPUs. When training solely on the MUSDB18-HQ dataset, the model is trained for 130 epochs with the Adam [22] optimizer with an initial learning rate of 5e-4 and batch size of 4 for each GPU. 
Nevertheless, we adjust the learning rate to 3e-4 when introducing additional data to mitigate potential gradient explosion.”\n\n“Q: So that mean that you have to modulate the learning rate depending on the size of the dataset?\n\nI think it's the first time I read something in that way.\n\nA: Yea, I suppose because the dataset is larger you need to ensure the model sees the whole distribution instead of just learning the first couple of batches” (jarredou/frazer)\n\nSCNet paper: <https://arxiv.org/abs/2401.13276>\n\nOn the same dataset (MUSDB18-HQ), it performs a lot better than Demucs 4 (Demucs HT).\n\n“Melband is still SOTA cause if you increase the feature dimensions and blocks it gets better\n\nyou can't scale up scnet cause it isn't a transformer. It's a good cheap alt version tho”\n\nStill, SCNet might give interesting results once training is mastered to the point where its SDR is on par with at least MDX-Net models, since those can still be better than Roformers for instrumentals in many cases (e.g. 
MDX-Net tends to produce less muddy instrumentals; every arch can have its own unique sound characteristics, which might be useful for ensembling).\n\n- (jarredou) “I've released the Drums Separation model trained by aufr33 (on my not-that-clean drums dataset).\n\nStems: kick, snare, toms, hihat, ride, crash\n\nIt can already be used, but training is not fully finished yet.\n\nThe config allows training on not so big GPUs [n\\_fft 2048 instead of 8192], it's open to anyone to resume/fine-tune it.\n\nFor now, it's struggling a bit to differentiate ride/hh/crash correctly, kick/snare/toms are more clean.\n\nDownload\n\n[attached config includes also necessary training parameters for training further using ZFTurbo [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training/tree/main)]: <https://github.com/jarredou/models/releases/tag/aufr33-jarredou_MDX23C_DrumSep_model_v0.1>\n\nUse on Colab: [https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music\\_Source\\_Separation\\_Training\\_(Colab\\_Inference).ipynb](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)”\n\nIt works in UVR too. All models should be located in the following folder:\n\nUltimate Vocal Remover\\models\\MDX\\_Net\\_Models\n\nDon't forget to copy the config file to model\\_data\\mdx\\_c\\_configs.\n\nOn jarredou's small private evaluation dataset, the model achieved much better SDR than the previous drumsep model by Inagoy, which was based on a worse dataset and the older Demucs 3 arch.\n\nThe dataset for further training is available in the drums section of [Repository of stems/multitracks](#_k3cm3bvgsf4j); you can clean it further and/or expand it, so that resuming training from the checkpoint might give better results. 
With the current dataset, the SDR might stall for quite a few epochs or even decrease, but it usually increases later, so training further, to 300, 500, or even 1000 epochs, might be beneficial.\n\n“I’ve had models where SDR changes by 0.01 but fullness/bleedless change with 10-15 points, I wouldn’t trust it that much” - becruily\n\nCurrent model metrics:\n\n“Instr SDR kick: 18.4312\n\nInstr SDR snare: 13.6083\n\nInstr SDR toms: 13.2693\n\nInstr SDR hh: 6.6887\n\nInstr SDR ride: 5.3227\n\nInstr SDR crash: 7.5152\n\nSDR Avg: 10.8059” Aufr33\n\nAnd, if the evaluation dataset hasn't changed since then, the old Drumsep SDR:\n\n“kick : 13.9216\n\nsnare : 8.2344\n\ntoms : 5.4471\n\n(I can't compare cymbals score as it's different stem types)” - jarredou\n\nAfter jarredou’s initial training in Colab, Aufr33 decided to train the model for an additional 7 days, to at least beyond epoch 113 (perhaps around 150; the exact number wasn't given), using the same config but on faster GPUs (2x 4090).\n\nEven epoch 5, trained casually on jarredou's dataset in free Colab (which uses a Tesla T4 15GB, with the performance of an RTX 3050 but more VRAM) across multiple Colab accounts with very light and fast training settings, already achieved better SDR than Drumsep:\n\n“epoch 5:\n\nInstr SDR kick: 13.9763\n\nInstr SDR snare: 8.4376\n\nInstr SDR toms: 6.7399\n\nInstr SDR hh: 0.7277\n\nInstr SDR ride: 0.8014\n\nInstr SDR crash: 4.4053\n\nSDR Avg: 5.8480\n\nepoch 15:\n\nInstr SDR kick: 15.3523\n\nInstr SDR snare: 10.8604\n\nInstr SDR toms: 10.3834\n\nInstr SDR hh: 4.0184\n\nInstr SDR ride: 2.7248\n\nInstr SDR crash: 6.1663\n\nSDR Avg: 8.2509”\n\nDon't forget to use already well-separated drums as input for this model, extracted from a well-separated instrumental (e.g. with the Mel-Roformer drums model for premium users on x-minus, jarredou's MDX23 Colab fork v2.4, or the MVSEP 4/+ ensemble (premium)).\n\nApplied directly to instrumentals instead of separated drums, the model might not give good results. 
It was trained only on percussion sounds, not on vocals or anything else.\n\nAlso, the kick and toms stems, for example, might have somewhat weird-looking spectrograms. This is due to:\n\n“mdx23c subbands splitting + unfinished training, these artifacts are [normally] reduced/removed along [further] training.” [Examples](https://discord.com/channels/708579735583588363/900904142669754399/1258441408109613209)\n\n- BTW, just for inference (separation), “ONNX and Demucs models don't work with multi-GPU”\n\n- In the “experimental” section of MVSEP, a new multispeaker model has been added at the bottom.\n\nE.g. it can work well for splitting rapping and singing that overlap in the same, previously well-separated vocal stem, but:\n\n“It works more or less ok on my validation [5 quite different \"songs\"], but it's a disaster on real data. I opened it for everyone, but don't expect really good results” ZFTurbo\n\n- Also, a new multichannel section has been added in “experimental”. It’s only for songs with 3 or more audio channels, e.g. Dolby Atmos (FLAC/WAV input supported). It’s just BS-Roformer, and there’s “no reason to process stereo tracks with it”. Also, the original sample rate of the input file is preserved here.\n\n- One of MVSEP’s GPUs died recently, so separations will probably be slower than usual.\n\n- jarredou updated his [AudioSR Colab](https://colab.research.google.com/github/jarredou/AudioSR-Colab-Fork/blob/main/AudioSR_Colab_Fork.ipynb). Now “each processed chunk is normalised at same LUFS level (fixes the volume drop issue)” plus “input audio is resampled accordingly to 'input\\_cutoff' (instead of lowpass filtering)”\n\nSome errors associated with mono files are now fixed as well.\n\n- New drums model available on x-minus.pro\n\n“SDR is: 12.4066.\n\nThanks to @viperx for model training! The model is trained on 995 songs. 
A small number of my pairs were included in the dataset.” Aufr33\n\nVery positive reviews so far.\n\n- New guitar model added on MVSEP\n\n“Previous old model mdx23c: 4.87\n\nNew mdx23c model: 6.34\n\nNew MelRoformer model: 6.91\n\nEnsemble MDX23c + MelRoformer: 7.10\n\nExtract vocals and after apply ensemble MDX23C + MelRoformer: 7.28”\n\n- If the x-minus.pro site doesn’t work for you, use the clone instead:\n\n<https://uvronline.app/ai?hp>\n\n- Demudder on x-minus was updated on 13.06 (cosmetic differences)\n\n- “Some interesting updates to SL 11\n\n<https://www.youtube.com/watch?v=2BoEgBGiafM>”\n\nSeemingly, the separation features got better. Coming on 19th June.\n\nTheir new algo was [evaluated](https://mvsep.com/quality_checker/entry/6689), and its SDR is a bit worse than the [htdemucs](https://mvsep.com/quality_checker/entry/287) 4-stem non-ft model.\n\nEvery stem has some bleed; vocals are decent and actually have better SDR than Demucs\\_ft. GPU processing in options has low utilization and is slow; they say a fix is planned in a patch. At least 16GB VRAM is recommended when using brass and saxophone. Around 18 models can be used in total.\n\nUnmix Multiple Voices is meant for speech, not singing.\n\nThe Unmix Drums option can be used for further separation of drums\n\n“the residual kick/snare problem is much better, but the cymbal split does still contain bleed from the rest of the song sadly” [vs drumsep] - jasper waffles\n\n- [Multi-arch Colab by Not Eddy](https://colab.research.google.com/github/Eddycrack864/UVR5-NO-UI/blob/main/UVR5_NO_UI.ipynb)\n\nincorporates MDX-Net, MDX23C, Roformers (incl. 1053), Demucs, and all VR models, plus YouTube support and batch separation. It uses the broken overlap from the OG beta UVR code. 
Use the one below instead for just Roformers (now also including 1053):\n\n[Colab with Roformers](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n- New Mel-Roformer De-Crowd model released on MVSEP. It slightly surpassed the SDR of the previous MDX23C model.\n\nIt's also publicly available in the repository below:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\nTo use it in UVR, go to the UVR\\models folder and paste [that](https://drive.google.com/drive/folders/1eO6rDhxh77eC-l0IHF16mQNwrrWOX31h) there.\n\nThen change the \"dim\\_t\" value to 801 at the very bottom of “model\\_mel\\_band\\_roformer\\_crowd.yaml” in the mdx\\_c\\_configs subfolder. Don’t use overlap above 4.\n\n- Drums Roformer model shared publicly by Yolkis\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2156069553>\n\nThe results aren't bad for 7.68 SDR, but it was trained for only 5 days on a GPU that is subpar for Roformers. To use it in UVR, delete the linear\\_transformer\\_depth line in the config.\n\n- (x-minus.pro) “The new Strings model by viperx has been added!”, based on the Mel-Roformer arch.\n\nGood results reported so far.\n\nSometimes it can pick up brass.\n\n- (x-minus.pro) “Demudder has been added!\n\nThis only works with the mel-roformer inst/vocal model. You need premium to use it.”\n\nIt works only for instrumentals. Vocals are unaffected. It fills holes in the spectrum based on both the vocal and instrumental stems (so it can't be used just to restore a lossy mp3).\n\nThe option appears after you upload/process a track (at least with the mel-roformer model).\n\nIt’s capable of providing better results than max\\_mag a.k.a. 
BS and Mel Roformer ensemble (premium), depending on the song.\n\nSDR-wise, it’s not much worse than the original model's results (16.48 vs 17.32).\n\n- “New VST for real time source separation (probably same models [like in] MPC stems)\n\n<https://www.youtube.com/watch?v=0Js5bWQWY7M>\n\n<https://products.zplane.de/products/peelstems/>”\n\n- The Mel-RoFormer de-crowd model files by aufr33 and viperx have been released publicly.\n\nDL: <https://buzzheavier.com/f/GV6ppLupAAA>\n\nConf: <https://buzzheavier.com/f/GV6psmJpAAA>\n\n“You can use ZFTurbo's code [check his GitHub] to run this model. If you use it in your project, please credit us”\n\nTo use it in UVR 5, rename the model file itself to match the name of the YAML file.\n\nThis model works best with overlap 2; with anything higher, it stops isolating parts of the song entirely.\n\nAlternatively, you can set the “inference.dim\\_t” parameter at the bottom of the yaml file to 801. “Leaving dim\\_t at 256 (2.5seconds) makes the model only usable with overlap=2 (2 seconds) with current beta code. Higher value will result in missing/non-processed parts.” - jarredou\n\n“The Roformer model does a better job at retaining the instruments and vocals as well as some sound effects and synths better than the MDX-NET decrowd, but at the cost of crowd bleed. While the MDX-NET decrowd model does a better job at removing most of the crowd at the cost of instrumental bleed into the crowd stem.\n\nSometimes [the old model] mistakes the fuzzy sounds of guitar as crowd noise\n\nAlso isolates some kicks in songs” - Kashi\n\n“For really difficult live songs (where the crowd is overwhelmingly loud to the point where you can't hear the band properly) sometimes filtering vocals with mel roformer on xminus THEN running the vocals stem through the mdx decrowd model sometimes helps” - isling\n\n- (x-minus) “The new Wind / Saxophone model has been added! It completely replaces the old UVR model [on the site]. 
Thanks to viperx for model training.”\n\n“Really great model! Big step since the last VR winds model.” It works better for brass instruments than for wind instruments.\n\n- (x-minus) “BS-RoFormer Bass model added! This is a model by viperx.” Aufr33\n\n“much better at treble-heavy bass tones than demucs”\n\nIt’s different from the latest MVSEP bass model. Viperx’ model might be cleaner, but picks up less bass at times. Both are improvements over demucs\\_ft ~drypaintdealerundr\n\nIt’s best in no piano stem.\n\n- (x-minus) “Piano beta model added! Thanks to viperx for model training.”\n\n- (x-minus) “Now all models except BVE are available for free, even without registration!\n\nThe only restrictions:\n\nOnly mp3 downloads are available\n\nNo ensemble\n\n10 minutes of audio per day (past 24 hours). This is enough for testing 2-3 songs.\n\nMax song duration is 8 minutes\n\nIt's not available through Tor, and it's not available in some countries.” Aufr33\n\nQ: Why are BVE models excluded? [from the free option]\n\nA: “Because in the free version, wav files are deleted immediately after processing is complete. This makes it impossible to download some stems. In addition, this model necessarily uses MDX for preprocessing, which is very compute-intensive.”\n\n- (mvsep) Bass model is online. The metrics:\n\nSingle models:\n\nHTDemucs4 bass SDR: 12.5295\n\nBSRoformer bass SDR: 12.4964\n\nMelRoformer bass SDR: 11.86\n\nMDX23C bass SDR: 11.20\n\nModels on site:\n\nHTDemucs4 + BSRoformer Ensemble (It's available on site as MVSep Bass (bass, other)): 13.25\n\nEnsemble 4 stems and All-In (from site): 13.34\n\nFor comparison:\n\nRipple lossless (bass): 13.49\n\nSami-ByteDance v1.0: 13.82\n\n- GSEP announced work on a new model\n\n- Mel-RoFormer Karaoke model added on [x-minus.pro](https://x-minus.pro/)\n\n“one of the cleanest lead vocals result[s]”\n\n“I noticed that the new karaoke model considers vocals as lead vocals, even if they are quite wide. 
In other words, it has a much larger tolerance for vocal width than other karaoke models. This means that backing vocals that sound almost centered can be removed along with the lead vocal. If I apply a stereo expander, the model produces more adequate results. So when I add the Lead vocal panning setting, the \"center\" will actually work as \"stereo -20%\" (for example).”\n\n“Q: wouldn’t this mean that there will be more backing vocal bleed in the lead vocal stem too?\n\nA: The model behaves differently. In some cases, it completely isolates vocals, in other cases it gets confused and vocals appear in both stems at once, in other cases it doesn't isolate at all.”\n\nQ: What are the differences between mel-roformer karaoke and the last model?\n\nA: “If the vocals don't contain harmonies, this model (Mel) is better. In other cases, it is better to use the MDX+UVR Chain ensemble for now.”\n\nYour mileage will still vary depending on the song, though, e.g. “For most of the songs I tried it worked very well. Example: \"From Souvenirs to Souvenirs\" by Demis Roussos. It's the only model as of now which can seperate the lead from the back vocals correctly” dca100fb8\n\n- iZotope RX11 has been officially released. Judging by the unencrypted onnx model file names, it now uses demucs\\_ft for stem separation, but it might not be the same model as the public one, as all stems “null back to the input stereo which is something standard demucsht doesn't appear to do (...) mdx seems to always null luckily.” The feature still doesn’t use CUDA/NVIDIA GPUs for processing (and there’s no such option anywhere). It’s still an improvement over RX8-10, which used in-house 22kHz Spleeter models.\n\n[Comparison](https://discord.com/channels/708579735583588363/708579735583588366/1240418057491447829)\n\n<https://www.youtube.com/watch?v=MhUEmvneerc>\n\nNew [features](https://www.izotope.com/en/products/rx.html) (e.g. 
clean up dialogue in real time).\n\n- Logic Pro 11 now incorporates a [Stem Splitter](https://support.apple.com/pl-pl/guide/logicpro/lgcp61bae908/mac). Results vary from good to bad (and worse than known solutions) depending on the song.\n\n- Mel-Roformer De-crowd model added on x-minus.pro.\n\nResults are more accurate than with the old MDX model.\n\n- [GSEP](#_yy2jex1n5sq) has been updated.\n\nThe free option for all stems has been removed. There's only a 20-minute free trial. WAV is only available to paid users.\n\nVocals and all other stems (including instrumentals/others) are paid, and the length is deducted from your account separately for each model.\n\nNo credit card is required for the trial.\n\nFree users get only mp3 output and a 10-minute input limit.\n\nPaid users get a 20-minute limit and mp3/wav output, plus a faster queue, shareable links, and long-term results storage.\n\n[Pricing](https://studio.gaudiolab.io/pricing)\n\n7$/60 minutes\n\n16$/240 minutes\n\n50$/1200 minutes\n\nIt seems there weren't many changes to the model (if anything, even more vocal residues might have been introduced since then). People still have similar complaints about it. [Comparison](https://www.youtube.com/watch?v=OGWaoBOkiMg) video.\n\nThere was an average SDR increase of 0.13 for mp3 output on the first 19 songs of the multisong dataset evaluation, but since most people hear no audible difference, they might have simply changed some inference parameters.\n\nThe old files from previous separations on your account haven't been deleted so far.\n\n- (x-minus) max\\_mag of (?-)Roformer and Demucs (drums only) added\n\n“now the synths and everything else feels muddy\n\nnoticed the drums in some places (mainly louder-ish bits) sound a bit weird\n\nmostly lower end like bass drum instead of hi hats\n\ngreat improvement overall” isling\n\n- Doubledouble might have occasional hiccups with downloading. 
If a download is very slow, don’t retry the same download; generate a new download query instead. Do it even three times in a row if necessary, or wait half an hour and retry. You can also check the option to upload your result to external hosting.\n\n- (x-minus) “Added max\\_mag ensemble for Mel-RoFormer model! It combines Mel and BS results, making the instrumentals even less muddy, while better preserving saxophone and other instruments.”\n\n- New Mel-Roformer model trained by Kimberley Jensen on the Aufr33 dataset dropped exclusively on [x-minus](https://x-minus.pro/).\n\n“This model will now be used by default and in ensemble with MDX23C (avg).”\n\nIt’s less muddy than the viperx model, but it can have more vocal residues (e.g. in silent parts of instrumentals), can be more problematic with wind instruments (putting them in vocals), and might leave more instrumental residues in vocals.\n\n“godsend for voice modulated in synth/electronic songs”\n\nIts SDR is higher than that of the viperx model (UVR/MVSEP) but lower than that of the fine-tuned 04.24 model on MVSEP.\n\n- A new UVR patch has been released. It fixes using OpenCL on AMD and Intel GPUs (just make sure you have GPU processing turned on in the main window and (perhaps only in some cases) OpenCL turned on in the settings).\n\nPlus, it fixes errors when the notification chimes option is turned on.\n\n<https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_4_14_24_18_7_BETA_full_Roformer.exe> (be aware that you can lose your current UVR settings after the update)\n\nTo use BS-Roformer models, go to the Download Center and download them in the MDX-Net menu (probably a temporary solution).\n\nFor 4GB VRAM GPUs, at least AMD/Intel ones, you can try segments 32, overlap 2 and dim\_t 201 with num\_o 2 (dim\_t is at the bottom of e.g. 
model\\_bs\\_roformer\\_ep\\_368\\_sdr\\_12.9628.yaml) to avoid crashes.\n\nYou might want to check out a new recommended ensemble:\n\n1296+1297+MDX23C HQ\n\nFor faster processing with a similar result, make a manual ensemble using a copy of the 1296 result instead of 1297. It might work similarly to the weighting in the 2.4 Colab and the model ensemble on MVSEP ([source](https://discord.com/channels/708579735583588363/708579735583588366/1228818808261840976)).\n\n- The VIP code allowing access to extra models in UVR currently doesn’t work with [Roformer beta patch](#_6y2plb943p9v) versions older than #10, and the MDX23C Inst Voc HQ 2 models disappeared from the Download Center and GH. You can try to download the VIP model files manually from the links below and place them in the Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models directory:\n\n<https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/UVR-MDX-NET_Main_406.onnx>\n\n<https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/UVR-MDX-NET_Main_427.onnx>\n\n<https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/MDX23C-8KFFT-InstVoc_HQ_2.ckpt>\n\nOf course, that’s not all of them. E.g. the 390 and 340 models and the old beta MDX-Net v2 fullband inst model epochs have not been reuploaded. This situation might cause errors when trying to use Inst Voc HQ 2 in the AI Hub fork of Karafan.\n\nThe decrypted VIP repo points to links that are offline, and it doesn’t contain all the models either. Possibly the only way to access all the VIP models in beta UVR is to roll back to the stable 5.6 version from the official UVR repo and, after downloading all the desired VIP models, update to the latest patch.\n\n- According to a leak on their forum, iZotope RX11 might be released between May and July and contain some “pretty big changes”; among others, a novel separation arch is rumored, and a lot of options are reworked. 
(cali\\_tay98)\n\nThe official announcement is out:\n\n<https://www.izotope.com/en/learn/rx-11-coming-soon.html>\n\n(overhauled repair assistant, real-time dialogue isolation for better separation of noise and reverb from voice recordings)\n\n- GSEP announced an update on May 9th with a WAV download option and a redesigned UI.\n\nThe site will be unavailable on 8th May.\n\nThe Noraebang (karaoke) service will be shut down “due to low usage”, and your separated files deleted (you can back up your files beforehand).\n\nA paid plan will be offered with faster processing times and “additional features”.\n\nNo model changes have been announced so far. The update schedule might change.\n\n- MDX23-Colab Fork [v2.4](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb) is out. Changes:\n\n“BS-Roformer models from viperx added, MDX-InstHQ4 model added as optional, FLAC output, control input volume gain, filter vocals below 50Hz option, better chunking algo (no clicks), some code cleaning” - jarredou\n\n- (x-minus) “Added mixing of MDX23C and BS-RoFormer results (avg/bs-roformer option). So far, it works only for MDX23C.” Aufr33\n\n- “Output has released a free AI based generator that create multitrack stem packs\n\n<https://coproducer.output.com/pack-generator>”\n\n12-second audio clips, fullband, 8 stems (drums in one stem, electric and rhythm guitar, hammond organ, trumpet, vocals) with 8 variations\n\n“this looks more like it's mixing different real instruments, rather than actually making up songs (like a diffusion based generator)” ~jarredou/becruily\n\n- Ensemble on MVSEP updated\n\n- The site is up and running after an outage\n\n- ZFTurbo released a fine-tuned viperx model (“ver. 2024.04”) on MVSEP (further trained from the checkpoint on a different dataset). Ensembles will be updated tomorrow. The clicking issue has been fixed.\n\nSDR vocals: 11.24, instrumental: 17.55 (from 17.17 in the base model)\n\nWhether it’s better depends on the song. 
Some vocals can be worse than with the previous model.\n\n- Test out the ensemble 1296 + 1143 (BS-Roformer in beta UVR) + Inst HQ4 (dopfunk)\n\nEnsembles with BS-Roformer models might not work for everyone; use a manual ensemble if needed.\n\n- The viperx model was also added to the beta Colab by jarredou. It outputs only vocals, so perform inversion on your own to get the instrumental\n\n<https://colab.research.google.com/drive/1pd5Eonbre-khKK_gn5kQPFtB1T1a-27p?usp=sharing>\n\nUpdate: BS-Roformer is now also included in the newer v.2.4 [Colab](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb)\n\n- Viperx’ BS-RoFormer models have been implemented in UVR by Anjok\n\n##### - **BS/Mel-Roformer UVR beta patch**\n\n##### For GPU acceleration, UVR currently supports: a) CUDA (NVIDIA GPUs) b) DirectML (AMD and Intel GPUs; previously misnamed as OpenCL) c) MPS (Mac M1 [ARM]/x86-64) - and it even runs on old CPUs without GPU acceleration (AMD A6-9225 Dual-Core or Intel Core 2 Quad [models with SSE4.1 tested]), at least with MDX-Net HQ (v2) models. DirectML is not supported for Apollo, Bandit (also incompatible with MPS), SCNet and probably Demucs 2 archs - CPU will be used automatically.\n\n##### The minimum reasonably good NVIDIA GPU for Roformers might be the desktop RTX 3050 6GB with 2304 CUDA cores (preferably the 8GB variant with 2560 CUDA cores), while e.g. the Colab Tesla T4 15GB also has 2560 (which, without TTA [not implemented for Roformers in UVR], is just alright - see [separation times](#_6uak70lspqhf)). The old 980 Ti 6GB with 2816 CUDA cores will be rather slower than these due to its older architecture. I'd refrain from getting a mobile RTX 3050 - the Ti variant has 2560 CUDA cores, but both have 4GB, and using shared memory is slower - Roformers generally use more than that. 
Using CUDA, even the currently biggest inst model, the 2GB Rifforge, doesn’t reach 6GB VRAM usage, but its separation time on a T4 using TTA in Colab is a long 36 minutes for a 5-minute file, so consider 6GB VRAM the bare minimum for reasonable separation times, with around 2560 CUDA cores. For AMD/Intel GPUs, if you want to use the default high chunk\\_size with certain models, 16GB VRAM is recommended, so you won’t have to decrease it manually (DirectML is far less memory-efficient than CUDA). Full list of model architectures supported by UVR: MDX-Net, MDX23C (archs by kuielab), VR (voice-remover by tsurumeso, v. 4, 5 [UVR fork], 5.1), Demucs (arch by Meta; v. 1-4, only models trained on OG code, not MSST ones), BS-Roformer, Mel-Roformer (arch by Bytedance & impl. by lucidrains; issues on Linux explained later), SCNet, Apollo (in Tools; for upscaling, no DirectML acceleration), BandIt (SFX, no DirectML support). It also has an ensemble feature, e.g. for combining MDX models with models of different archs like MDX23C/v3, Roformers, or VR. If some models don't appear on the ensemble list, see the appropriate section below (or use the document outline in options). Demudder was added in newer patches (currently not on Linux and MacOS).\n\n##### Models for the Roformer/SCNet/Bandit archs are all located in the MDX-Net menu.\n\nThe Apollo audio upscaler is located in Tools.\n\nDon’t forget to enable GPU Conversion - if it works, it speeds up separation hugely. Even Tiger Lake iGPUs can work with at least MDX-Net HQ (v2) models.\n\n- Min. 4GB VRAM AMD GPUs tested/recommended (with chunk\\_size between 112455 and 200655 depending on the model in the Roformer yaml; [table of chunks](#_57eblyiq5076)), and min. 
2GB VRAM NVIDIA GPU (even the 2GB Rifforge worked with low enough chunks).\n\n- The bare minimum for GPU acceleration on NVIDIA is Maxwell/900 series GPUs (compute capability 5.0); unsupported for CUDA in UVR are at least the NVIDIA GTX 600 and GT 700 series and older, returning “AssertionError: \"\" Traceback Error:” or \"CUDNN\_STATUS\_NOT\_INITIALIZED\", although DirectML should theoretically be supported by all DX12 GPUs (I’m not sure if it’s still possible to switch to the DirectML setting on NVIDIA GPUs in versions newer than 5.6.0).\n- For AMD, at least RX 4GB models were tested (not sure about R9 200 4GB GPUs, either on newer modded Radeon-ID drivers and/or with a downgraded DirectML.dll shipped with the drivers, copied to the UVR\torch\_directml folder; someone seems to have had occasional memory issues on an HD 7870 2GB, but GPU Conversion still worked). “AssertionError: \"\" Traceback Error:” also appears when your AMD driver/Windows is outdated (then use e.g. the [1.9.1.0](https://drive.google.com/file/d/1dxmG0cfclGMkFLfdoKAfQaDkx3Rcspku/view?usp=sharing) library).\n- Intel was confirmed to work with ARC GPUs and Xe integrated graphics (e.g. Tiger Lake 2021), with at least MDX-Net HQ (v2) models.\n\n- If your separation on DirectML (AMD/Intel) is stuck and takes an enormously long time, decrease [chunk\\_size](#_57eblyiq5076).\n\n- RTX 5000 series support on Windows was added in a separate UVR [patch](https://www.mediafire.com/file_premium/4jg10r9wa3tujav/UVR_Patch_4_24_25_20_11_BETA_full_cuda_12.8.zip/file) (or possibly you can use OpenCL (DirectML) in options instead [slower]). 
The patch is not compatible with Intel/AMD GPUs, and potentially also with older NVIDIA GPUs, giving the following error:\n\n“AttributeError: module 'torch\_directml' has no attribute 'is\_available'”\n\nBe aware that some newer models are incompatible with this version, giving:\n“torch.\_dynamo.polyfills.fx” [error](#_alby6c59i1tk)\n\n- In patch #12, a new “Inference mode” option in Advanced MDX-Net>Multi Network was implemented (disabled by default). In the current state, it fixes silent separations on GTX 10XX and maybe older GPUs, but might make separations slower on other compatible GPUs. So if your separations are slower after updating UVR, check whether GPU Conversion is still enabled (it’s usually disabled by default on new installations), and you can try turning on Inference Mode if you have an RTX GPU, or potentially a GTX 16XX.\n\nDownload\n\n(*if your download speed gets slow, use e.g. “Free Download Manager” on Windows or any other download manager that increases the connection count)*.\n\n- for *Linux*\n\nSadly, the official build instructions on GitHub are outdated, and the current branch working on Linux with Roformers is old and doesn’t support all those models (at least without the workarounds below). Also, normally you can’t use WINE with DirectML GPU acceleration ([error](https://github.com/Anjok07/ultimatevocalremovergui/issues/1888)), and I doubt VKD3D would help.\n\nIt turns out someone found a way to use GPU Conversion with Wine by using Bottles with the newer UVR Windows codebase (although it might be slower than [MSST](#_2y2nycmmf53) due to the additional translation layer, so consider MSST instead):\n\n\"My way to easily run UVR on Linux:\n\nI just downloaded Bottles\n\n<https://usebottles.com/> (which uses WINE) and used the provided .exe file from the repository's releases. 
I created a new bottle with a gaming profile (to utilize GPU) and moved the exe file into \"drive\_c\" (otherwise it won't work), then just ran through the installer and it worked like a charm!\" ([src](https://github.com/Anjok07/ultimatevocalremovergui/issues/2108#issuecomment-386623143))\n\nIf you still want to use the native Linux codebase:\n\n- Some installation [directions](https://discord.com/channels/708579735583588363/767947630403387393/1224237442488602684) (from our [Discord](https://discord.gg/ZPtAU5R6rP)) from the Roformer patch #1/2 period (or also [here](https://discord.com/channels/708579735583588363/708579735583588366/1259017843211636816)); additionally, comment out:\n\nsegmentation-models-pytorch 0.3.3 in requirements.txt [here](https://discord.com/channels/708579735583588363/767947630403387393/1223850400482988143) (line 136) (for Nvidia/CPU).\nThe current code repository for Roformers with DirectML support is located here (although at certain periods it might lack the current patches):\n(might work for non-Nvidia GPUs):\n<https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add%2Bdirectml>\n\nJudging by the files' date in the repo at the moment (9 December 2024), they seem to derive from the outdated 12\_8\_24\_23\_30\_BETA (beta #9) patch (some newer models will fail with that codebase; plus, it predates the chunk\_size implementation, so it still uses dim\_t).\n- Some other potentially useful information:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/1674>\n\n- Or “you just need to use export PYGLET\_SHADOW\_WINDOW=0 and it'll work”\n\n- See [here](https://github.com/Anjok07/ultimatevocalremovergui/issues/1890#issue-3174151206) if you get the following errors:\n“Getting requirements to build wheel did not run successfully. ... 
ModuleNotFoundError: No module named 'imp'”\n\nedit requirements.txt:\npyrubberband==0.3.0\n\nPyYAML==6.0\n\nscipy==1.9.3\n\nplaysound\n\nnumpy==1.23.5\n\n- Working around issues with Python 3.12:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/1789>\n\n- Becruily karaoke model error fix:\n\n“Those of you on linux running the current roformer\_add+directml branch that cant get becruily's karaoke model working due to the same error: it seems editing line 790 in separate.py setting the keyword argument strict to False when calling load\_state\_dict seems to make the karaoke model load and infer properly, so I think it will work\n\nmodel.load\_state\_dict(checkpoint, strict=False)\n\nI don't know if this is a robust workaround, but I haven't observed anything behaving differently than it should yet, so if you want to give it a shot I think it will work\n\nTL;DR change line 790 in separate.py to the codeblock and then run again and karaoke model should work” stephanie\n\nROCm instructions for AMD (also for Windows, but currently only using WSL)\n\n- For better separation speed than DirectML:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/1822#issuecomment-2824363747>\n\n- Fix for “ModuleNotFoundError: No module named 'audioread'”\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/1797#issuecomment-3315191055>\n\n- Working around issues with playsound & sklearn dependencies:\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/2107>\n\n- Fixing issues in Matchering on Linux:\n\n“Succeeded to run after modifying UVR.py:\n\nmatch.process(\n\ntarget=target,\n\nreference=reference,\n\nresults=[match.save\_audiofile(save\_path, wav\_set=self.wav\_type\_set),],\n\n)\n\nto\n\nmatch.process(\n\ntarget=target,\n\nreference=reference,\n\nresults=[match.pcm16(save\_path)]\n\n)\n\n”\n\n- Fixing matching errors\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/2018#issue-3564850746>\n\n- Python 3.11 dependencies 
fix\n\n<https://github.com/Anjok07/ultimatevocalremovergui/issues/2108#issuecomment-3808617707>\n\n\_\_\_\_\_\_\_\_\_\_\n\nBelow you’ll find download links for ready-made packages for other platforms\n*(they contain an isolated Python environment, so you don’t have to mess with your local Python installation, or even have Python installed on your computer)*\n\n- *for macOS*\n\n*(it doesn’t incorporate the demudder from Windows patch #14)*\n\n- UVR Roformer beta patch #13.1\n\n(beta\_0115\_MacOS\_arm64\_hf)\n\nwhich applies a hotfix to address a few graphics issues.\n\n- Mac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_arm64_hf.dmg)\n\n- Mac Intel (x86\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64_hf.dmg) (it went offline for some reason; older version below)\n\n*Note: Some people get an error about a failed malicious software check. Then check out patch #9 below or read the “MacOS Users: Having Trouble Opening UVR?” section in the GH* [*repo*](https://github.com/Anjok07/ultimatevocalremovergui?tab=readme-ov-file#macos-installation)*. Plus:*\n\n*“Functionality for systems running macOS Catalina or lower is not guaranteed”.*\n\n*Also: “What ended up working for me was to make sure that UVR lived in my root level Applications folder. I normally have an 'Installs' folder inside of Applications to help me keep track of things I have downloaded and installed. 
Nothing would load from the Installs folder, but worked fine from the root Applications folder.”*\n\nOlder Mac packages:\n\nUVR beta Roformer patch #13\n\nUVR\_Patch\_1\_15\_25\_22\_30\_BETA:\n\n- Mac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_arm64.dmg)\n\n- Mac Intel (x86\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64.dmg)\n\nUVR Roformer beta patch #9:\nUVR\_Patch\_12\_8\_24\_23\_30\_BETA:\n\nMac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1208_MacOS_arm64.dmg)\n\nMac Intel (x86\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1208_MacOS_x86_64.dmg)\n(Patch #9 mainly fixes Apollo arch issues)\n\nThe Apollo arch was made compatible with MacOS MPS (Metal), but it might be unstable and very RAM-intensive - use a chunk size over 7 to prevent errors.\nApollo is now compatible with all Lew models (fixing incompatibility with models other than those previously available in the Download Center). Matchering was fixed (presumably a regression).\n\nChangelog for Mac since patch #2 (older patches later below):\n\n“Roformer checkbox now visible for unrecognized Roformer models”, so now you can use custom Roformer models on the MacOS Roformer patch without copying/modifying configuration files from the Windows version or from other users. 
Plus it includes all the fixes from the previously released patch (the overlap code was fixed, so no stem misalignment should occur with certain overlap settings; higher overlap now means longer separation, which is the opposite of before)\n\n**For *Windows***\n\n- (optional) Standalone UVR Roformer beta patch #15 for RTX 5000 (or more precisely 14b), fixing CUDA not working, or running slower than CPU, on those GPUs. It’s not compatible with older GPUs.\n\n- Full Install: [Download](https://www.mediafire.com/file_premium/4jg10r9wa3tujav/UVR_Patch_4_24_25_20_11_BETA_full_cuda_12.8.zip/file)\n\n(standalone, although still called a patch, just so its codebase can be referred to by its patch number)\n\nNote: Some newer models are not compatible with this version specifically and give a torch dynamo error (then you need to use the patches below with CPU or the possibly slower DirectML; you can keep both installations after a manual backup).\n\n- UVR Roformer beta patch #13\n\nUVR\\_Patch\\_1\\_15\\_25\\_22\\_30\\_BETA\n\nIt fixes the issue with no sound on some Roformer models (like anvuew’s de-reverb) on GTX 10XX or older:\n\n- Full Install: [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_15_25_22_30_BETA_full.exe)\n\n(meaning it's standalone, and doesn't need 5.6.0 or any other existing UVR installation to work)\n\n- Patch Install (for an installed non-beta UVR, e.g. 5.6, not a Roformer 5.6.x patch): [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_1_15_25_22_30_BETA_rofo.exe)\n\n- Small Patch Install (for any previously installed Roformer patch): [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_15_25_22_30_BETA.exe)\n\nThe issue was that some older GPUs are not compatible with PyTorch's \"Inference Mode\" (which is apparently faster), so it's now using \"No Grad\" mode instead. 
Users can switch back to using \"Inference Mode\" via the advanced multi-network options. [More](https://discord.com/channels/708579735583588363/785664354427076648/1329312294521405493)\n\n###### *UVR demudder*\n\n- UVR Roformer beta small patch #14 - the long-anticipated **demudder** added:\n\nUVR\\_Patch\\_1\\_21\\_25\\_2\\_28\\_BETA: [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_21_25_2_28_BETA_small_rofo.exe)\nIt’s a small patch (you must have a [Roformer Installation](#_6y2plb943p9v) [e.g. full #13 above] previously installed for this to work).\n\n- Included in the standalone RTX 5000 patch above ([download](https://www.mediafire.com/file_premium/4jg10r9wa3tujav/UVR_Patch_4_24_25_20_11_BETA_full_cuda_12.8.zip/file))\nFurthermore, minor bugs were fixed, and compensation calculation for MDX-Net v1 models was added.\n\nThe MacOS version of the patch has not been released yet.\n\nTo enable the demudder, go to Settings>Choose Advanced Menu>Advanced MDX-Net Options>Enable Demudder (in Demudder options, pick one of the three methods described below)\n\n*Troubleshooting*:\n\n- At least Phase Rotate doesn’t work on AMD and 4GB VRAM GPUs, even at 88200 chunk size (prev. dim\\_t 201, 2 seconds), with 800MB Roformers like Becruily’s, while 112455 (2,55s, prev. 
dim\\_t = 256) works fine for normal separation.\n\n- If you get a file not found error when trying to use the demudder, reinstall UVR.\n- If you get a Format not recognised error with the demudder, keep Match freq cut-off enabled in MDX settings.\n\n- “For Roformer models, it must detect a stem called “Instrumental”, so for some models like Mel-Kim, you need to open the model's corresponding yaml, and change “other” to “instrumental”.”\n\n“With the new config editor feature you could probably edit the configs of models to have the vocal stem labelled as the Instrumental stem so the demudder demuds the vocal stem, it definitely still makes a difference\n\nI accidentally did this when installing another model, but it seems to actually have an effect on vocal stems too” stephanie\n\nThe demudder offers three methods to choose from:\n\n* Phase Rotate\n* Phase Remix (Similar to X-Minus) - “the fullest sounding, but can leave a lot of artifacts with certain models. I only recommend that method for the muddiest models. Otherwise, Combined Methods is the best” “I don't recommend using phase remix on the Instrumental v1e model. I recommend combined methods or phase rotate for models produce fuller instrumentals.” Anjok\n* Combine Methods (weighted mix of the final instrumentals generated by the above). More in the [full changelog](https://discord.com/channels/708579735583588363/785664354427076648/1331189771078471721).\n* You can also use phase remix in the [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) Colab\n\nThe demudder is “meant to solely target instrumentals. The vocals should stay exactly as before.”\n\n“It works best on tracks that are spectrally dense (ex. 
Metal, Rock, Alternative, EDM, etc.)\n\nI don't recommend it for acoustic or light tracks.\n\nI don't recommend using it with models that emphasize fuller instrumentals (like Unwa's v1e model).” Anjok\n“I've noticed with the few amounts of tracks I've tried, demudding can sometimes accentuate instances of bleeding or otherwise entirely missed vocal-like sounds”. More in the [full changelog](https://discord.com/channels/708579735583588363/785664354427076648/1331189771078471721).\n“I put the demudded instrumental in the bleed suppressor, and it sounds really good, almost noise free. I either do a bleed suppressor or a V1/bleed suppressor ensemble” gilliaan\n“I found that Phase Remix also works well on pop and other genres of music, but it only works using the vocal models (Phase Remix and VOICE-MelBand-Roformer Kim FT (from Unwa) or\n\nVOICE-MelBand-Roformer Kim FT 2 (by Unwa).” Fabio\n\n“I do plan on adding options to tweak the phase rotation.\n\nI also plan on adding another combination method that may work better on certain tracks.” - Anjok\n\nIf you set 64-bit float output in Options>Additional settings, the results might be slightly less muddy, but the files will also be very large.\n\nThe demudder can also be used in a Colab and on x-minus.pro/uvronline.app ([more](#_bviye361m0v))\n\n*OG Discord* [*channel*](https://discord.com/channels/708579735583588363/785664354427076648) *to follow for updates*\n\n*Older patches*\n\n*\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_*\n\n(old) Potential fix for RTX 5000 series CUDA acceleration with patch #14,\nnow unnecessary since a dedicated patch was released above\n(the steps below didn't fully succeed, so also refer [here](https://github.com/CarlGao4/Demucs-Gui/issues/115))\n\nSome people reported that following the steps below can still result in slow separations:\n\nYou’ll probably be able to install the required CUDA 12.8 and nightly PyTorch to fix the compatibility issue by following the steps for [manual 
installation](https://github.com/Anjok07/ultimatevocalremovergui#manual-windows-installation) (unfold it), so UVR won't use its own Python environment. In addition to the above link, use [this](https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add%2Bdirectml) repo with newer code (although for now it doesn't include the newest code from the patches below), provided you can fix the \"Getting requirements to build wheel ... error\" afterwards. You'll then hit a bug that leaves the GPU Conversion option non-functional; to fix it, use:\n\npython.exe -m pip install --upgrade torch --extra-index-url https://download.pytorch.org/whl/cu118\n\ninstead of cu117 as in the instructions above (although for the RTX 5090 this probably won’t work; use https://download.pytorch.org/whl/nightly/cu128 instead).\n\n- A similar issue might occur when you don’t install \"onnxruntime-gpu\"\nand only the \"onnxruntime\" library (which does not support GPU) is installed.\n\n- For the Demucs UnpicklingError with a manual installation, modify line 46 in demucs/states.py to:\n\n*package = torch.load(path, 'cpu', weights\\_only=False)*\n\n- Sometimes the same happens with e.g. the Becruily inst model (different arch). 
It can also indicate that the model file is corrupted and has a wrong CRC (most likely from a bad download); redownload the model.\n\nAt least in the old 5.6.0 version, the options say OpenCL instead of DirectML as in newer patches (although it's actually DirectML).\n\nAlternatively, you can use [MSST-GUI](#_2y2nycmmf53).\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n- Anjok released a new UVR beta Roformer patch #11 (Windows only for now)\n\nIt fixes 4 bugs: the VR post-processing threshold, Segment Default in the multi-arch menu, CMD popping up during operations (it no longer does), and an error in the phase swapper.\n\n[More](https://discord.com/channels/708579735583588363/785664354427076648/1328267582871961620) details/potential updates.\n\nStandalone (if no UVR is installed)\n\n[UVR\\_1\\_13\\_0\\_23\\_46\\_BETA\\_full.exe](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_13_0_23_46_BETA_full.exe)\n\nFor 5.6 stable (i.e. for a non-beta, non-Roformer installation)\n\n[UVR\\_Patch\\_1\\_13\\_0\\_23\\_46\\_BETA\\_rofo.exe](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_1_13_0_23_46_BETA_rofo.exe)\n\nSmall (for an existing Roformer beta patch installation)\n\n[UVR\\_Patch\\_1\\_13\\_25\\_0\\_23\\_46\\_rofo\\_small\\_patch.exe](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_13_25_0_23_46_rofo_small_patch.exe)\n\n- Patch #12, a hotfix for the [4 stem](#_sjf0vefmplt) BS-Roformer model by ZFTurbo (trained on MUSDB)\n\n[UVR\\_Patch\\_1\\_13\\_0\\_23\\_46\\_BETA\\_rofo\\_fixed.exe](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_13_0_23_46_BETA_rofo_fixed.exe) (Windows only)\n\nUsers experience issues (no sound) with the Mel-Roformer de-reverb by anvuew (a.k.a. v2/19.1729 SDR) since the UVR beta #11 or #12 update. 
Patch #10 works.\n\nThe issue seems to occur only on the GTX 10XX series, and maybe older.\n\nYou should be able to use more than one UVR installation at the same time if one was copied before updating (so the patch #10 copy will likely still work), or use the MSST repo and/or its GUIs.\n\nUVR Roformer beta patch #9:\n\nUVR\\_Patch\\_12\\_8\\_24\\_23\\_30\\_BETA\n\nWindows: [Full](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_12_8_24_23_30_BETA_rofo_full_install.exe) | [Patch](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_Patch_12_8_24_23_30_BETA_large.exe) (use if you still have a non-beta UVR installed) |\n\n[Small Patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_8_24_23_30_BETA_small_rofo.exe) (requires a previously installed Roformer patch)\n\nUVR Roformer beta patch #10\n\nUVR\\_Patch\\_1\\_9\\_25\\_23\\_46\\_BETA\\_rofo\\_small\\_patch - [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_9_25_23_46_BETA_rofo_small_patch.exe)\n\nFor now, only a small patch for an existing beta Roformer installation (above) is available, and only for Windows.\n\nIf you get a Python DLL error on startup, reinstall the last beta update using the full package, then apply the small installer from the newer patch.\n\nSince beta version #10, **UVR doesn't rely on the 'inference.dim\\_t' value for Roformers anymore** (relevant if you were using an edited \"dim\\_t\" value in yaml configuration files).\n\nYou have to edit audio.chunk\\_size instead if you need to (e.g. for 4-12GB VRAM on AMD/Intel).\n\nIt’s located “In model yaml config file, at top of it, chunk\\_size is first parameter (...) 
you can edit model config files directly inside UVR now.”, or use the new config editor in newer versions.\n\n“Conversion between dim\\_t and chunk\\_size\n\ndim\\_t = 801 is chunk\\_size = 352800 (8.00s)\n\ndim\\_t = 1101 is chunk\\_size = 485100 (11.00s)\n\ndim\\_t = 256 is chunk\\_size = 112455 (2,55s)\n\ndim\\_t = 1333 is chunk\\_size = 587412 (13,32s)\n\n[more values later below]\n\nThe formula is: chunk\\_size = (dim\\_t - 1) \\* hop\\_length” - jarredou (for Roformers with hop\\_length = 441 in the yaml)\n\nGenerally, for the best SDR, use chunks of no less than 11s in the yaml for inference, which is usually the training chunk value (rarely higher). That said, people sometimes get better results with 2,55s chunks, although some models behave worse than others with such small values.\n\n“most of the time using higher chunk\\_size than the one used during training gives a bit better SDR score, until a peak value, and then quality degrades.\n\nFor Roformers trained with 8sec chunk\\_size, 11 sec is giving best SDR (then it degrades with higher chunk size)\n\nFor MDX23C, when trained with ~6sec chunks, iirc, peak SDR value was around 24 sec chunks (I think it was same for vit\\_large, you could make chunks 4 times longer)\n\nHow much chunk\\_size can be extended during inference seems to be arch dependant.” - jarredou\n\nChangelog #10:\n\nAdded SCNet and Bandit archs with models in the Download Center; fixed compatibility with some newer Roformer models (probably the Phantom center and 400MB small Unwa models, not sure yet); added a new Model Installer option; enhanced the model configuration menu, allowing aliases for selected models; added compatibility for Roformer/MDX23C Karaoke models with the vocal splitter; the VIP code issue is gone; issues with secondary model options, minor bugs and interface annoyances are addressed; and “improved the \"Change Model Settings\" menu. 
Now, any existing settings associated with a selected model are automatically populated, making it easier for users to review and adjust settings (previously, these settings were not visible even if applied).”\n\n“Unfortunately, SCnet is not compatible with DirectML, so AMD GPU users will have to use the CPU for those models.\n\nBandit models are not compatible with MPS or DirectML. For those with AMD GPU's and Apple Silicon, those will be CPU only.\n\nThe good news is those models aren't all that slow on CPU.” - Anjok\n\nChangelog #9:\n\nApollo fixes: “Chunk sizes can now be set to lower values (between 1-6)\n\nOverlap can be turned off (set to 0)”\n\nFix for both Apollo and Roformers: input files of 5 seconds or shorter no longer cause errors.\n\nOpenCL was wrongly referenced in UVR. It was actually DirectML all along, and Anjok renamed all OpenCL references in the app to DirectML.\n\nChangelog for all platforms:\nPatch #3 fixed the stem misalignment issue when using an incorrect overlap setting for Roformers. Now it uses ZFTurbo's code (also for MDX23C), meaning that **now increasing overlap for Roformers will result in increasing separation times** and potentially better SDR [the opposite of what it used to be in the previous beta Roformer patches #1 and #2]. Also, it “Fixed manual download link issues in the Download Center. Roformer models can now be downloaded without issue.” 
Also, new Roformer models were added to the Download Center, so you don't have to download them manually.\n\n- UVR Roformer beta patch #8 for Win: [full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_3_24_1_18_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_3_24_1_18_BETA_small_patch_rofo.exe) | Mac: [M1](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1203_MacOS_arm64.dmg) | [x86-64](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_1203_MacOS_x86_64.dmg):\n\nUVR\\_Patch\\_12\\_3\\_24\\_1\\_18\\_BETA\nThe Apollo arch was made compatible with OpenCL too, but it might be unstable and very RAM-intensive; use a chunk size over 7 to prevent errors (currently it’s not certain that all models will work with less than 12GB of VRAM). At least in newer patches, UVR may state outright that Apollo is not compatible with DirectML and fall back to CPU mode.\nApollo is now compatible with all Lew models (fixing incompatibility with models other than those previously available in the Download Center). Fixed Matchering (presumably a regression).\n\nUVR Roformer beta patch #7 ([full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_2_24_2_20_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_12_2_24_2_20_BETA_patch_rofo.exe)) Win\nUVR\\_Patch\\_12\\_2\\_24\\_2\\_20\\_BETA.\n\nIt introduces support for the Apollo arch. The OG mp3 enhancer and Lew v1 vocal enhancer were added to the Download Center. The arch is located in Audio Tools. Sadly, this arch cannot be GPU-accelerated with OpenCL, so on AMD and Intel cards you’re forced to use the CPU, which can take a long time.\nAlso, “Phase Swapper” a.k.a. 
Phase fixer for Unwa inst models was added to Audio Tools.\n\nRoformer beta patch #6: [M1](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_arm64.dmg) | [x86-64](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_MacOS_x86_64.dmg)\n\nUVR Roformer beta patch #6: Win\n\nUVR\\_Patch\\_11\\_25\\_24\\_1\\_48\\_BETA ([standalone](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_25_24_1_48_BETA_full_rofo.exe) or [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_25_24_1_48_BETA_patch_rofo.exe) - you can install it over an already installed stable 5.6 version)\nFixes issues with viperx’s models.\n\nAnd with it, the long-anticipated MDX-Net HQ\\_5 model was released (also available for older versions in the Download Center). [Changelog](https://discord.com/channels/708579735583588363/785664354427076648/1310529792461770814)\n\nUVR\\_Patch\\_UVR\\_11\\_17\\_24\\_21\\_4\\_BETA\\_patch\\_roformer (Beta [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_19_24_1_23_BETA_patch_rofo.exe) #5 for UVR, Windows only):\n\n“- Fixed OpenCL compatibility issue with Roformer & MDX23C models.\n\n- Fixed stem swap issue with Roformer instrumental models in Ensemble Mode.”\n\nThat patch is probably not standalone (like patch #3), so you need an existing UVR installation.\n\nRoformer patch #4 for MacOS: [M1](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_17_24_21_4_BETA_rofo_MacOS_arm64.dmg) | [x86-64](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/Ultimate_Vocal_Remover_v5_6_1_11_17_24_21_4_BETA_rofo_MacOS_x86_64.dmg)\n\nUVR\\_Patch\\_11\\_17\\_24\\_21\\_4\\_BETA (Windows: 
[full](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_11_17_24_21_4_BETA_full_rofo.exe) | [patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_11_17_24_21_4_BETA_rofo.exe) | [changelog](https://discord.com/channels/708579735583588363/785664354427076648/1307933134574063706)) beta patch #4\n\n[UVR\\_Patch\\_11\\_14\\_24\\_20\\_21\\_BETA\\_patch\\_roformer](http://uvr_patch_11_14_24_20_21_beta_patch_roformer.exe) (beta patch #3 “requires an existing UVR installation”, so either the previous beta Roformer patch above or the stable [5.6 patch](#_k3vca4e9ena8)).\n[Full changelog](https://discord.com/channels/708579735583588363/785664354427076648/1306819453857566811).\n\n[UVR\\_Patch\\_4\\_14\\_24\\_18\\_7\\_BETA\\_full\\_Roformer](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_4_14_24_18_7_BETA_full_Roformer.exe) | [mirror](https://buzzheavier.com/f/GWqIXjFpAAA) (standalone Roformer beta patch #2, fixed OpenCL separation for AMD/Intel GPUs)\n\n[UVR\\_Patch\\_3\\_29\\_24\\_5\\_11\\_BETA\\_full\\_roformer.exe](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_3_29_24_5_11_BETA_full_roformer.exe) (older Roformer patch #1)\n\nWith the following issues fixed in the newer patch #2 above:\n\nif you have playsound.py errors, disable notification chimes in Settings>Additional Settings; OpenCL GPU acceleration (AMD) for BS-Roformer doesn’t work (or at least not for everyone)\n\nOlder Roformer patches #1/2 for *MacOS* (ARM only) were deleted from the Discord server\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n(Roformer models are also available on MVSEP, [x-minus.pro/uvronline.app](http://x-minus.pro/uvronline.app), [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md), the [inference](https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference) Colab and the MDX23 v.2.4/2.5 
[Colab](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb))\n\n###### **Instructions** for UVR Roformer patches and installing custom models\n\n- Your current settings might be lost after patching your current UVR installation\n\n- Applying newer UVR Roformer versions over some older UVR versions might cause errors on startup (e.g. Python39.dll); perform a clean installation to fix the issue. Just make sure that, after uninstalling UVR, nothing is left in the old UVR folder\n\n- To perform a clean installation of the latest UVR version, for now you need:\n\n***Windows*:**\nRoformer full patch #13\n\n(it's actually standalone, and not just a patch - it doesn't need 5.6.0 on top):\n\nUVR\\_Patch\\_1\\_15\\_25\\_22\\_30\\_BETA: [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_15_25_22_30_BETA_full.exe)\n\nAnd then install this small patch #14 (demudder and other fixes; it needs patch #13 in order to work):\n\nUVR\\_Patch\\_1\\_21\\_25\\_2\\_28\\_BETA: [Link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_21_25_2_28_BETA_small_rofo.exe)\n(it’s not standalone)\n\n(Only for) RTX 5000 full patch #14:\n\nUVR\\_Patch\\_4\\_24\\_25\\_20\\_11\\_BETA: [Link](https://www.mediafire.com/file_premium/4jg10r9wa3tujav/UVR_Patch_4_24_25_20_11_BETA_full_cuda_12.8.zip/file)\n(iirc uses newer PyTorch and CUDA; standalone)\n\n***MacOS*:**\n- Roformer full patch #13.1 (standalone)\n\n(beta\\_0115\\_MacOS\\_arm64\\_hf)\n\nwhich applies a hotfix to address a few graphics issues:\n\n- Mac M1 (arm64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_arm64_hf.dmg)\n\n- Mac Intel (x86\\_64) users - [Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64_hf.dmg) (it went offline for some 
reason; older version #13 below:\n\n[Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_Roformer_beta_0115_MacOS_x86_64.dmg))\n\n***Linux:*** Old patch #9 only for now (no demudder, some fixable issues with certain models):\n\n<https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add%2Bdirectml>\n\nOr use Bottles with the gaming profile to run the newer Windows codebase through Wine (utilizes GPU)\n\n(for older patches, more info and troubleshooting, refer [here](#_6y2plb943p9v))\n\n- Some newer Mel/BS models really require the newest Roformer patches (otherwise you’ll get an MLP error).\n\n- In the Download Center you’ll find some BS/Mel models, but not all. Refer to the full list of models [here](#_2vdz5zlpb27h).\n\n***Installing custom Roformer models in UVR*** *(those unavailable in Download More Models a.k.a. Download Center)*\n\n- Since patch #10, a new “Install Model” option is available (e.g. in the MDX-Net menu)\nRight-click the model list to access the option and follow the on-screen instructions.\n\n- Models for Mel/BS Roformer, SCNet and Bandit Plus/v2 are located in the MDX-Net menu.\n\n- For most models, you need both the ckpt and the yaml file for the model to work (unless its config is the same as one you’ve already downloaded; some models share the same config file, e.g. 
one folder with models in the repository might have only one yaml)\n\nIf you click a yaml file to download it, but its content opens in the browser instead, press CTRL+S or use your browser’s “Save As” option.\n\nThe file may get a txt extension, but it needs to be yaml (otherwise UVR won’t detect it), so choose All files as the file type and change the extension to yaml manually (or rename it after downloading).\n\n- If you don’t see ckpt on the extensions list, perform a clean UVR installation from the patches above\n\n- In “Set Model Type”, most Roformers will use the non-v2 Roformer type (and most are Mel).\nFor now, you should probably pick the v2 Roformer type only for unwa’s 400MB experimental model (if you get lots of *layers* errors using Roformers, it means you picked the v2 config unnecessarily).\n\n- Manual model installation, e.g. for Demucs models, which don’t have the “Install model” option (or if you use a patch older than #10), using MDX/Roformer/SCNet/Bandit models from the MDX-Net menu as an example:\n\nTo install models from external sources (those unavailable in the Download Center), copy the model file to models\\MDX\\_Net\\_Models and the .yaml config to models\\model\\_data\\mdx\\_c\\_configs, then choose the model in the app, press Yes to recognize the model, and wait a while. In older betas, also check the “roformer model” option when asked for the configuration file, and confirm (on the oldest Mac Roformer patch you cannot press Confirm or check the option; the issue is explained below and fixed in newer versions).\n\n(or [step by step](https://discord.com/channels/708579735583588363/708595418400817162/1366720855824142458))\n\n*Misc*\n\n- You might want to decrease the default chunk\\_size in Edit Model Param (or the yaml) for AMD/Intel GPUs with less than 16GB of VRAM if you get memory errors with GPU Conversion enabled, or if your separation is stuck at e.g. 
5% (read more in [Common issues](#_c4nrb8x886ob) later below)\n\n- How would I assign a yaml config to an Apollo model on the new UVR [patch]?\n\n“1. Open the Apollo models folder\n\n2. Drop the model into the folder\n\n3. From the Apollo models folder, drop the yaml into the model\\_configs directory\n\n4. From the GUI, choose the model you just added and if the model is not recognized, a pop-up window will appear, and you'll have the option to choose the yaml to associate with the model.” - Anjok\n\n- “batch\\_size = 2 (not less or more)” - “1 can lead to some clicks in output, while with batch\\_size>=2, there are no clicks. Clicks are obvious in low freq of log spectrogram”\n\nEdit: iirc the clicks with batch\\_size=1 may have been fixed since. They were later fixed in MSST, whose inference code was implemented in newer UVR patches, although iirc batch\\_size isn’t used there.\n\n- Segment size in the UVR UI does nothing for these Roformer models, because the advanced arch options are set to Segment Default by default, which makes the value be read from the yaml file. While \"Segment\\_Default\" in Advanced MDX-NET23 settings is checked, UVR uses the dim\\_t value from the bottom of the config. Simply put, dim\\_t is the segment size. In newer patches, chunk\\_size is used instead.\n\n- “The overlap value in yaml files is never used by UVR, only the value in GUI is used.”\n\n“UVR uses inference.dim\\_t from config as segment\\_size, but the inference.num\\_overlap is not used by UVR, it's always using the value in the GUI\n\n(while ZFTurbo original script is using audio.chunk\\_size and inference.num\\_overlap but not inference.dim\\_t . 
That's a mess) ” jarredou\n\n###### *Overlap comparisons for Roformers*\n\nOverlap 4 is a balanced value in terms of speed/SDR according to [measurements](https://imgur.com/a/KyCtncG). Since beta patch #3 (and in all later patches above), overlap 16 is the [slowest](https://imgur.com/a/JtxzRZD) in UVR (overlap 2 is no longer the slowest, as it was when the setting worked the opposite way), and overlap 4 now has a higher SDR than overlap 2.\nSome people still prefer overlap 8, while for others it’s already overkill.\nThere’s very little SDR improvement with overlap 32; with 50 there’s even a decrease to the level of overlap 4, and 999 gave inferior results to overlap 16.\nCompared to overlap 2, for 8: “I noticed a bit more consistency on 8 compared to 2 (less cut parts in the spectrogram).” Instrumentals with overlap higher than 2 can get gradually muddier.\nThe info is based on evaluations conducted on the multisong dataset on MVSEP. Search for e.g. overlap 32 and overlap 16 below to compare the results:\n\n<https://mvsep.com/quality_checker/multisong_leaderboard?algo_name_filter=kim>\n\n“overlap=1 means that the chunk will not overlap at all, so no crossfades are possible between them to alleviate the click at edges.” (fixed in MSST)\nThe GUI setting overrides the one in the yaml.\n\n*chunk\\_size*\n\n“Most of the time using higher chunk\\_size than the one used during training gives a bit better SDR score, until a peak value, and then quality degrades.\n\nFor Roformers trained with 8 sec chunk\\_size [can be found in the yaml], 11 sec is giving best SDR (then it degrades with higher chunk size)” - jarredou\n\nAll the notable chunk\\_sizes are described later below.\n\nSometimes chunk\\_size can influence the ability to pick up e.g. 
some screams in the model (“kim's can do it if you mod the chunk size”).\nLower chunk sizes might sound a bit less muddy.\nUnless they’re set so high that VRAM is exceeded (on AMD/Intel, separation may not fail outright in that case, but can get slow or stuck), various values should give similar separation times, especially for NVIDIA users.\n\n*batch\\_size* (not used in UVR, but in MSST)\n\nLeave it at the default; for faster inference, Gabox, for example, used 6. Values above 2 might increase VRAM usage. 1 is forced in the inference Colab (the clicking issue with that setting is fixed in newer MSST code).\n\n“i.e. instead of running a single song at batch\\_size = 1 you can run 2 at the same time (batch\\_size = 2)” so “you can use more than one song to process”\n\n###### **Most common issues**\n\n- Some models might occasionally disappear from your list (most likely those sideloaded outside the Download Center) and change their name (probably once they were added to the Download Center, e.g. the Melband Roformer karaoke ckpt)\n\n- Since patch #3, Roformer separation times are longer than before, even with smaller overlap. Also, note the inference mode option added in one of the later patches. It’s disabled by default, as it causes silent separation issues on GTX 10XX GPUs and probably older. 
Enabling it on newer GPUs might improve performance.\n\n- If UVR occasionally freezes your PC during separation, you can set UVR’s priority to Idle in Task Manager, and the issue should go away sooner or later.\n\nYou can make that setting persist every time you run UVR with Process Lasso (it autostarts with the OS, so you won't have to see its free-user splash screen every time you want to use it).\n\n- [Read](#_6y2plb943p9v) about the RTX 5000 series issue, or use OpenCL (DirectML) in options (slower)\n\n##### Ensembling impossible - model not visible in vocal splitter in UVR\n\n##### - If user-imported Roformers aren't recognized in \"instrumental/vocals\" in ensemble or in vocal splitter, but are in \"multi-stem ensemble\": “The .yaml associated with the model usually needs to be updated to match UVR's stem naming conventions. For example, if your config shows the instruments as \"other\" and \"vocals\", it will need to be updated to \"Instrumental\" and \"Vocals\" (case-sensitive)” - Anjok\n\nE.g. for Karaoke models, you need to change “Karaoke” to “Vocals”.\n\nAlternatively, you can use Tools>Manual Ensemble on already finished separations.\n\n*Ensemble vs multi-stem ensemble explained (by stephanie)*\n\n“The multi-stem ensemble mode is designed to ensemble every stem in all the models that were selected. I'll explain how it's related:\n\nLet’s say you have two vocal/instrumental models selected, and two 4-stem models selected (vocals, drums, bass, other), then it will process the song through all of the models. As a result, the final ensembled output will be 5 stems: vocals, instrumental, drums, bass, and other.\n\nThe drums, bass, and other stems will just be the ensemble of the 2 models in that selection that shared those outputs. 
However, all models in that particular selection shared a vocal output, so it will be the ensemble of each model's vocal output.\n\nI'm not sure if that's very clear, but that's how it works and why the way it handles each stem is different from the other stem pair modes”\n\n- Roformers might sometimes be slow, get stuck, or give a memory allocation error during separation on AMD/Intel GPUs with less than 16GB of VRAM if you don’t lower the default “chunk\\_size” when importing the model, or in the corresponding yaml in: models\\MDX\\_Net\\_Models\\model\\_data\\mdx\\_c\\_configs, or in Choose Model>Edit Models Config>Change Parameters>Edit Model Param. Start with chunk\\_size = 112455 (the dim\\_t 256 equivalent) and, to get the best possible SDR, increase it depending on your GPU and model size until you start getting errors.\n\nFor older betas 1-9 and 4GB VRAM GPUs, lower dim\\_t at the bottom of the yaml (not at the top) to e.g. 256 or 201, sometimes 301 - some Roformers will require a lower dim\\_t/chunk\\_size; otherwise the higher, the better, up to 1101 or the training chunk value in the config.\n\nSince patch #10, the “new beta version[s] doesn't rely on 'inference.dim\\_t' value anymore (if you were using edited \"dim\\_t\" value).”\n\nNow you have to edit audio.chunk\\_size instead. “In the model yaml config file, at top of it, chunk\\_size is the first parameter (...) 
you can edit model config files directly inside UVR now.”\n\n##### Memory issues - chunk\\_size table for combating “RuntimeError”\n\n*dim\\_t to chunk\\_size conversion for Roformers with hop\\_length = 441 in the model’s yaml*\n\n- useful if you see an insufficient memory error\n\nThe formula is: chunk\\_size = (dim\\_t - 1) \\* hop\\_length\n\nThe durations in brackets are chunk\\_size / 44100, i.e. the chunk length in seconds at a 44.1kHz sample rate.\n\ndim\\_t = 1700 is chunk\\_size = 749259 (17s) - used by inst resurrection\n\ndim\\_t = 1333 is chunk\\_size = 587412 (13.32s)\n\ndim\\_t = 1201 is chunk\\_size = 529200 (12s) - used by some newer models\n\ndim\\_t = 1101 is chunk\\_size = 485100 (11.00s) - that dim\\_t value was giving the highest SDR for models trained with 8s chunks, at least for models released around the time of Roformer beta patch #2; it’s the default for e.g. the duality models\n\ndim\\_t = 801 is chunk\\_size = 352800 (8.00s) - default for most models, max working on Intel/AMD 8GB GPUs with 900MB models; the 6-stem SW model on 4GB NVIDIA GPUs works faster with this or lower settings than with 485100\n\ndim\\_t = 556 is chunk\\_size = 244755 (5.55s)\n\ndim\\_t = 501 is chunk\\_size = 220500 (5s) - also works for the case below, but separation time slightly increased with at least a few browser tabs open; higher values crash no matter what\n\ndim\\_t = 456 is chunk\\_size = 200655 (4.55s) - works with Resurrection inst on 4GB AMD GPU\n\ndim\\_t = 401 is chunk\\_size = 176400 (4s)\n\ndim\\_t = 356 is chunk\\_size = 156555 (3.55s) - max supported for becruily inst and AMD 4GB VRAM (in previous betas, dim\\_t needed to correspond with overlap to avoid stem misalignment, so probably halves caused misalignment before);\n\nit can be more muddy in certain parts of songs vs 256\n\ndim\\_t = 301 is chunk\\_size = 132300 (3s) - max working on NVIDIA 2GB VRAM GPU with deux model (memory management in CUDA is much better than in DirectML)\n\ndim\\_t = 256 is chunk\\_size = 112455 (2.55s) - max working with e.g. becruily and smaller unwa’s 400MB exp. 
models on 4GB AMD GPU\n\ndim\\_t = 201 is chunk\\_size = 88200 (2s) - required value for some more resource-hungry/bigger Roformers on AMD 4GB VRAM (some models only worked with that dim\\_t in earlier beta patches, and here it’s probably the same, or the 256 equivalent), but it's still not low enough for most Roformers when the demudder is used, giving “Could not allocate tensor” even though single model separation previously worked with an even bigger chunk setting. You shouldn’t go lower with that parameter, as even a 2-second chunk might sometimes give audio skips every two seconds, at least on UVR Roformer patch #2. 2 or 2.5 seconds will be rather the bare minimum.” - jarredou (DTN edit)\n\nAll inst/voc Roformers seem to use hop\\_length: 441 (but check it in the yaml), so you always multiply that hop value by (the desired dim\\_t - 1) to get the correct chunk\\_size (e.g. one corresponding to the old dim\\_t values you were using in older beta patches)\n\n- UVR won’t run without at least 3GB of disk space on C:\\, but sometimes even that is not enough to avoid memory issues (e.g. “cannot allocate”) with e.g. GPU Conversion unchecked, esp. on 2-3h files with e.g. the Wind model (issues occurring even with 32GB RAM).\n\n- If you don’t have much space on C:\\, you can set the pagefile on the C: drive to as little as 500MB and use another partition for the pagefile if you run out of space on C:\\ during the process and an error occurred\n\n<https://mcci.com/support/guides/how-to-change-the-windows-pagefile-size/>\n\n- We have some reports about user custom ensemble presets from older versions no longer working (since the 11/17/24 patch).\n\n- Sadly, you need to get rid of them (don’t restore their files manually), or the ensemble will not work and model choice will be greyed out. 
You need to start from scratch.\n\n**Errors troubleshooting**\n\n“got an unexpected keyword argument 'linear\\_transformer\\_depth'”\nIn case of this error with any external Roformer model:\n\n- delete the “linear\\_transformer\\_depth: 0” line from the YAML file\n\n- To fix issues with the BS variant of e.g. anvuew’s de-reverb model in UVR, in addition to the above, also change the following in the yaml file:\n\n“stft\\_hop\\_length: 512 to stft\\_hop\\_length: 441 so it matches the hop\\_length above” (thx lew).\n\nIf that line is not present in your model’s yaml config file, go to the settings, choose MDX in the Advanced menu, and click the \"Clear auto-set cache\" button.\n\nThen go back to the main settings, click \"Reset all settings to default\" and restart the app (thx santilli\\_).\n\nThese issues don’t happen with ZFTurbo’s CML inference code from:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/>\n\n##### 'use\\_amp' “Key error”\n\nWhen using the GH repo above to separate with some models:\n\n- “add:\n\n*use\\_amp: true*\n\nin the training part of [models’ yaml] config file (it's [missing](https://imgur.com/a/tYov1Zj))”\n\n##### 'norm' AttributeError using e.g. 
unwa beta 5e model in UVR\n\na) Ensure you installed the [UVR Roformer patch](#_eopfi619c6zr) (5.6.1) and you're not using the old 5.6 version (5.6.1 should be reported once you open the app).\n\nb) You might have picked the wrong model architecture in the Install model option (i.e. not Roformer; generally it should be v1), or not turned on the “Roformer model” option when importing the model into UVR (an option present in the older beta versions).\n\nYou can go to the bottom of the models list and pick “Edit Model” to change it.\n\nc) Alternatively, the following might help in some cases - Anjok: “Edit the yaml file [of the model] from this”:\n\n```yaml\ntraining:\n  instruments:\n    - vocals\n    - other\n  target_instrument: vocals\n  use_amp: True\n```\n\nto this:\n\n```yaml\ntraining:\n  instruments:\n    - Vocals\n    - Instrumental\n  target_instrument: Vocals\n  use_amp: True\n```\n\n[More](https://discord.com/channels/708579735583588363/1226334240250269797/1263302732715134986) troubleshooting on the “norm” issue\n\nE.g. BS-Roformer\\_LargeV1 is stuck on 5%\n\n- Decrease chunk\\_size to 112455 (new patches)\n(pre-#10 patches) “Go to MDX settings, MDX23C specific, turn off default segment size and use segment size 256, it's probably filling up your VRAM”\nThe setting resets itself. You should be able to set it permanently in the yaml configuration file of the model at the bottom (dim\\_t parameter).\nIt might be required for AMD/Intel 4GB VRAM GPUs (or potentially even 201, although that was with Rofo patch #2).\n\nQ: “is there a way to reset which yaml file to use? I chose the incorrect yaml file for a particular ckpt file, and now I cannot change it”\n\nA: Go to models list>Edit Model Config>Change parameters and choose the yaml from the list.\n\nOr go to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models\\model\\_data, and if it was done just now, the last modified json in this folder will be the one corresponding to your model. 
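If you'd rather not guess which file that is, here is a minimal Python sketch. It lists the hash-named jsons in `model_data` (newest first) and computes the hash UVR appears to use for a checkpoint, so you can look for `<hash>.json` directly.

The hashing rule is our reading of UVR's source, not something documented here: MD5 of the last 10,000 KiB of the file, or of the whole file if it's smaller. Treat it as an assumption and check it against a model whose hash you already know, e.g. d43f93520976f1dab1e7e20f3c540825 for unwa's beta 3 mentioned below.

```python
import hashlib
import os
from pathlib import Path

# Assumption (our reading of UVR's source, not an official spec): the model hash
# is the MD5 of the last 10000 * 1024 bytes of the checkpoint, or of the whole
# file when the file is smaller than that.
TAIL_BYTES = 10000 * 1024

# Shared files in model_data that are not per-model hash jsons; leave them alone.
SHARED_FILES = {'model_data.json', 'model_name_mapper.json'}


def uvr_model_hash(model_path):
    # Hash only the tail of the file; small files end up hashed in full.
    size = os.path.getsize(model_path)
    with open(model_path, 'rb') as f:
        f.seek(max(0, size - TAIL_BYTES))
        return hashlib.md5(f.read()).hexdigest()


def hashed_jsons_newest_first(model_data_dir):
    # The most recently modified json is usually the model you just imported.
    files = [p for p in Path(model_data_dir).glob('*.json') if p.name not in SHARED_FILES]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def find_model_json(model_path, model_data_dir):
    # Return the hash-named json for this checkpoint, or None if UVR hasn't created it.
    candidate = Path(model_data_dir) / (uvr_model_hash(model_path) + '.json')
    return candidate if candidate.is_file() else None
```

Call find_model_json with the path of your .ckpt and of the `model_data` folder. If it returns None, UVR hasn't generated the json for that model yet.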
You can just open that json and edit it to write the name of the config (located in mdx\\_c\\_configs) you want to use with that model. You should be able to find the right hashed json for the model of your choice, but if you really feel lost, you can delete all the hashed jsons; you’ll then need to go through the process of choosing configs for all custom MDX and Roformer models again. Remember not to delete \"model\\_data.json\" and \"model\\_name\\_mapper.json\", as they cover models added to the Download Center or set up to be recognized automatically by UVR, so they're not the files you're looking for. Also, you can decode hashed json names corresponding to specific models [here](#_cb6cxq8g7i0v).\n\nLayers error - for issues with Unwa 400MB model\n\n- “First make sure you're running the latest patch. If you're on the latest patch, it might be trying to associate with an incompatible YAML”, but resetting the parameters as below might be enough:\n\n1. Go into the mdx\\_c\\_configs folder\n\n2. Find and delete BS\\_Inst\\_EXP\\_VRL.yaml\n\n3. Go back into the \"Download Center\"\n\n4. Select \"MDX-Net\" and give it a moment.\n\n5. Close the Download Center and try again.\n\nIf that doesn't work, you might have a previous json model file that's interfering:\n\n1. Select the model in the MDX-Net model menu\n\n2. Then select \"Edit Model Config\"\n\n3. From the popup, click \"Reset Parameters\"” - Anjok\n\n##### Layers errors - general\n\na) You didn’t install the newest patch and still use e.g. beta 2 with some newer model\n\nb) You might have checked Roformer v2 instead of v1 when installing the custom model.\n\nc) The model trainer didn't clean the weights using [this](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/scripts/prepare_weights_for_inference.py) Python script (you can do it yourself)\n\nd) You can just use [MSST](#_2y2nycmmf53) instead\n\n##### TypeError: (...) 
freqs\\_per\\_bands\n\nYou probably set the Mel-Roformer model type where BS-Roformer was required.\n\nGo to MDX-Net, pick the model>Edit model config>Choose parameters.\n\nThere you should change Model type.\n\n##### 'mlp\\_expansion\\_factor'\n\nYou're probably using an outdated codebase for Linux or an old UVR version incompatible with some newer Roformers like the SW.\n\nWe have some workarounds for it, but the one below hasn't been confirmed to work so far:\n\n“The first issue is that the BS Roformer implementation in UVR5 does not accept the mlp\\_expansion\\_factor parameter. If mlp\\_expansion\\_factor is set to 4 in the [yaml] config file, simply remove that line.” - anvuew\n\nAnd in parallel, a second issue exists:\n\n“The size of tensor a (484864) must match the size of tensor b (485100) at non-singleton dimension 2”\n\n“Regarding the second issue, chunk\\_size should be a multiple of stft\\_hop\\_length.” - anvuew\n\nQ: Hop is 512 and chunk is 588800.\n\nThat's a ratio of 1150.\n\nA: Then you need to find out where this 485100 comes from.\n\n[And they didn't. For reference: 485100 = 1100 × 441, the dim\\_t 1101 value from the table above, and it's not a multiple of 512, while 484864 = 947 × 512.]\n\n##### torch.\\_dynamo.polyfills.fx\n\nSome of the newer Roformer models fail with the UVR RTX 5000 patch giving that error, while the non-RTX 5000 patch works.\n\nIt will probably fail anyway, but if you find DirectML in settings, you could try using it instead of CUDA.\n\nAlternatively, you could make a copy of your current UVR installation, install the latest non-RTX-5000 [patches](#_eopfi619c6zr) (preferably a clean installation), and look for the DirectML option in settings there, or just uncheck GPU Conversion and separate using CPU, although it will take much longer than on a decent GPU.\n\nAlternatively, use [MSST](#_2y2nycmmf53) instead.\n\n##### [WinError 2] The system cannot find the file specified\n\n- Sometimes setting 32/64-bit float output can trigger it\n\n- Or you can also reencode your input file, name it input.wav, and choose wav mode.\n\n- Setting FLAC output might also work (it seems 
to happen with mp3 input and output).\n\n(can’t remember if it still exists in the latest patches)\n\n##### System error\n\n- UVR will only process files whose names and paths use English characters - some complicated names/paths give “System error” during separation\n\n(can’t remember if it still exists in the latest patches)\n\nE.g. for RuntimeError: \"Error opening 'F:/Fl studio files/ACAPELLAS\\1y2mate.com - 2 AM Full Video Karan Aujla Roach Killa Rupan Bal Latest Punjabi Songs 2019(Vocals).wav': System error.\" your file path/file name is too long or contains some unsupported characters. You need to shorten/simplify it and/or copy the file to a different location, e.g. D:\\input.wav\n\n##### Python39.dll\n\nIt probably happened after installing some patch on a (too) dirty UVR installation (probably one already patched before, or one older than 22\\_30).\nYou must reinstall UVR using only the latest required patches.\n\n**More troubleshooting**\n\n- If you have:\n\nRuntimeError: \"\"\n\nTraceback Error: \"\n\nwithout any text in these lines on an AMD GPU on every attempt to use GPU Conversion in UVR, for all archs and models (you probably use outdated GPU drivers and/or Windows), go to Ultimate Vocal Remover\\torch\\_directml and replace DirectML.dll there with the one from C:\\Windows\\System32\\AMD\\ANR (make a backup first). Experimentally, you can use this older [1.9.1.0](https://drive.google.com/file/d/1dxmG0cfclGMkFLfdoKAfQaDkx3Rcspku/view?usp=sharing) version of the library. 
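If you want to script that swap, here is a minimal Python sketch. It keeps a one-time backup of UVR's own DirectML.dll before copying the replacement over it, and includes a restore helper.

The install folder and the replacement DLL are passed in as arguments because they depend on your setup, e.g. the AMD ANR folder or the older 1.9.1.0 download above. The .bak file name is our own convention, not something UVR expects. Close UVR before running it.

```python
import shutil
from pathlib import Path


def _dll_paths(uvr_dir):
    # UVR's DirectML library lives in the torch_directml subfolder of the install.
    target = Path(uvr_dir) / 'torch_directml' / 'DirectML.dll'
    backup = target.with_name('DirectML.dll.bak')
    return target, backup


def swap_directml_dll(uvr_dir, replacement_dll):
    target, backup = _dll_paths(uvr_dir)
    source = Path(replacement_dll)
    if not source.is_file():
        raise FileNotFoundError(source)
    # Back up only once, so repeated experiments never overwrite the original copy.
    if target.is_file() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(source, target)
    return backup


def restore_directml_dll(uvr_dir):
    # Put the original library back, e.g. after an Unhandled exception at startup.
    target, backup = _dll_paths(uvr_dir)
    shutil.copy2(backup, target)
```

Call swap_directml_dll with your UVR folder and the path of the DLL you want to try. If UVR then fails to start, restore_directml_dll puts the original back.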
Restart UVR after replacing the file!\nIf you use an incompatible library version, you’ll encounter the “Unhandled exception” startup issue.\nBe aware that the linked older version of the library might cause additional noise for MDX-Net v2 models like HQ\\_X (the issue is gone when you turn off GPU Conversion).\n\n- All MDX-Net v2 models (maybe except the 4-stem variants) have so-called MDX noise, which can be cancelled using Options>Advanced MDX-Net Settings>Denoise Output>Standard (or Model)\n\n- At least the beta #2 Roformer update caused some stability and performance issues with archs other than Roformers for some people, as specific parameters started to take more time than before.\n\nRoll back to stable 5.6 (not 5.6.1) in these cases if necessary. Consider making a copy of the old installation first, as your configuration files might be lost. You can use both installations at the same time (or at least when one of them, e.g. the Roformer patch, is installed or symlinked in the default location).\n\n- (I think I covered that issue above more thoroughly)\n\nRoformer models in at least patch #2 work only in “Multi-Stem” mode in UVR. Using them in Ensemble causes layers errors (you can use manual Ensemble instead).\nIIRC, it’s caused by yaml configs where other + vocals is written instead of Instrumental + Vocals (with capital I and V), and you need to change it. IIRC it doesn’t happen with models downloaded from the Download Center, as Anjok was fixing the issue, but the problem might still exist in the yamls of some custom models from outside the center\n\n- If you suddenly can't separate anymore, try reinstalling the app, and/or make sure you didn’t turn on some power saving option on your laptop. You can also simply try to reopen UVR (a few failed attempts with a DirectML.dll incompatible with your GPU driver/OS will hang UVR on “Loading Model” until you close it manually from Task Manager).\n\n- When using e.g. 
BS-Roformer SW: “RuntimeError: \"The size of tensor a (2) must match the size of tensor b (6) at non-singleton dimension 0\"\n\nTraceback Error: \"”\n> Replace the yaml config manually in:\n\nC:\\Users\\YourUserName\\AppData\\Local\\Programs\\Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models\\model\\_data\\mdx\\_c\\_configs\n\nRestart UVR and start over. Make sure you were asked to replace it. If not, the yaml may have been picked wrongly anyway; then change it in the Edit config menu.\n\n\\_\\_\\_\\_\\_\\_\\_\n\n*You’ll find more UVR troubleshooting in* [*this*](#_ul5en196k909) *section*\n\n\\_\\_\\_\\_\\_\n\n*Problems fixed in newer patches*\n\n- (deprecated since patch #10 - now convert dim\\_t to chunk\\_size) dim\\_t = 1101 seems to be a sweet spot in terms of speed/SDR according to [measurements](https://imgur.com/jTRoTDg) (although made on 1-minute files); use 1120 if UVR refuses to accept 1101 in the GUI (or edit the yaml file)\n\n- (deprecated since patch #10) Some Roformer configs have a wrong dim\\_t at the bottom of the yaml by default (e.g. 256); change it at the bottom of the yaml config for better SDR (not the one at the top), e.g. to 1101 (more explanations on it later).\n\n- (fixed in patch #10) VIP code in Roformer beta patches #2-9 (and probably #1) doesn’t work -\n\nDownload all the VIP models you need before patching older 5.6 to beta Roformer, or use two installations of UVR if you can’t use patch #10 with the fix.\n\n- (fixed in patch #6) People experienced the All stems error with viperx’ 12xx models in newer versions of the UVR Beta Roformer patch (patch #2 was the last confirmed to work with these older models)\n\n- (fixed in patch #10) mlp\\_expansion\\_factor: 1 or (when mlp line is deleted from yaml) mismatch for MelBand Roformer error\nYou're probably using an older Roformer patch incompatible with newer models (e.g. 
#2)\n\nIt also appears when you wrongly set the v2 model type.\n\n*Fixed in beta patches #3 and #4 for all platforms*\n\n- Don't set overlap higher than 11 for 1101 dim\\_t (**at the bottom** of yaml file in the “inference” section, not above) or higher than 8 for 801 - these two are the fastest settings before stem misalignment issues occur. Otherwise, it can occasionally lead to some effects or synths missing from the instrumental stem (although some rules can be broken here with various settings). Also, the problems with clicks are alleviated with these good settings.\n\n- In beta patch #2, the best measured SDR for both Mel and BS-Roformers is with dim\\_t = 1101 in the inference section of the yaml config and overlap set to 2 in the GUI (although 1 wasn’t tested, and is actually lower). But in the latest beta patches, all bigger overlap values are slower, so SDR might be higher with higher values.\n\nBe aware that it will increase separation time. The maximum allowed value before an error is 1801, but 1501 or 1601, depending on the model, will be the reasonable maximum for experiments before some unwanted downsides of too high or too low dim\\_t appear (some stem elements disappearing). In some specific cases, 1333 (or potentially 1301) was giving better results than 1101 or 1501, but it depended on song length - usually it happened on short fragments.\n\n- The instructions for overlap and dim\\_t above apply to other Roformer models as well, not only those in the Download Center. With them, you can achieve faster separation times, as you’re not forced to use the most time-consuming overlap 2 of older patches to avoid stem misalignment issues\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n**Model characteristics**\n\n(the list might be getting outdated, read the models list at the [top](#_2vdz5zlpb27h))\n*Note: E.g. 
unwa’s duality models v1/2 and inst v1/2/v1e have now been added to the UVR Beta Roformer Download Center (so you don’t have to mess with models and configs manually)*\n\n- viperx 1053 model separates drums and bass in one stem, and it's very good at it\n\n(although now it might be better to use Mel-Roformer drums on [x-minus.pro/uvronline](http://x-minus.pro/uvronline))\n\n“Target is drums and bass, and \"other\" is the rest. Despite that, it says vocals”\n\n- Unwa released a new Inst v1e [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) (“The model [yaml] configuration is the same as v1”)\n“The \"e\" stands for emphasis, indicating that this is a model that emphasizes fullness.”\n\n- unwa inst v2 - it gets muddier than v1 at times, but it has less noise\n\n- unwa inst v1 - focused on instrumental stem:\n\n[model](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [phase fixer](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=drive_link)\n\"much less muddy (...) but carries the exact same UVR noise from the [MDX-Net v2] models\"\n\nBut it's a different type of noise, so the aufr33 denoiser won't work on it.\n\n“you can \"remove\" [the] noise with uvr denoise aggr -10 or 0”, although -10 will make it sound more muddy, like the Kim model, and synths and bass are sometimes removed by the denoiser (~becruily). 
Mel-Roformer denoise might be better for it.\nbecruily released a Python [script](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=sharing) fixing the noise issue (execute “pip install librosa” in case of a module not found error) - it sounds similar to the method used for premium users on x-minus.\n\n- unwa beta 4 Mel-Roformer (fine-tune of Kim’s voc/inst model):\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)\n\nBe aware that the yaml config has changed, and you need to download the new beta4 yaml.\n\n“Metrics on my test dataset have improved over beta3, but are probably not accurate due to the small test dataset. (...) The high frequencies of vocals are now extracted more aggressively. However, leakage may have increased.” - unwa\n\n“one of the best at isolating most vocals with very little vocal bleed and still doesn't sound muddy” “gives fuller vocals”. Can be a better choice on its own than some ensembles.\n\n- unwa duality model - focused on both stems, and the instrumental is similarly muddy as in beta 4\n\n- Kim Mel-Band Roformer vocal model\n\nIt’s less muddy than 1296/1297.\n\n([original repo](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model) - CML faster on CUDA than in UVR | [model](https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt?download=true) | [config](https://drive.google.com/file/d/1U1FnACm-ontQSjhneq-WKk1GHEiTW97s/view?usp=sharing) - place the model file in models\\MDX\\_Net\\_Models and the .yaml config in the model\\_data\\mdx\\_c\\_configs subfolder, and “when it will ask you for the unrecognised model when you run it for the first time, you'll get some box that you'll need to tick \"roformer model\" and choose its yaml” (Mac issue explained in the section above).\n\n(simple [Colab](https://colab.research.google.com/drive/1tyP3ZgcD443d4Q3ly7LcS3toJroLO5o1?usp=sharing)/[CML 
inference](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model)/[x-minus](https://x-minus.pro/)/[MVSEP](https://mvsep.com/)/[jarredou Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) too now)\n\n- unwa BS-Roformer finetuned a.k.a. large (further trained viperx 1297 model) [download](https://drive.google.com/file/d/1Q_M9rlEjYlBZbG2qHScvp4Sa0zfdP9TL/view)\n\nMore muddy than Kim above, with slightly fewer vocal residues and a bit more artificial sound.\n\n- Mel-RoFormer Karaoke / Lead vocal isolation model files released by Aufr33 and viperx ([download](https://mega.nz/file/qQA1XTrb#LUNCfUMUwg4m4LZeicQwq_VdKSq9IQN34l0E1bb0fz4))\n\n*Older models in Download Center*\n\n- The older viperx’ 1297 model tends to be a bit better for instrumentals, and 1296 for vocals (both more muddy than Kim and Unwa models, but “still pretty good for voice cleaning” and dealing with noise) - the BS-Large model by Unwa is a fine-tune of that model.\n\n- The 1143 model is the first Mel-Roformer trained by viperx, before Kim introduced changes to the config which fixed the problem of lower SDR vs models trained on BS-Roformer. Use Kim Mel-Roformer instead.\n\nBoth models struggle with saxophone and e.g. some Arabic guitars. It can still depend on the song whether these are better than even the second oldest Roformer on MVSEP (from before the viperx model got a fine-tuned version). They tend to have more problems with recognizing instruments. 
Other than that, they're very good for vocals (although Mel-Roformer by Kim on x-minus tends to be better).\n\nMuddy instrumentals when not ensembled with other archs.\n\nBe aware that the names of these models in UVR refer to vocal SDR measured on viperx's private dataset (not even the older Synthetic dataset) instead of the multisong dataset on MVSEP, hence the numbers are higher than in the multisong chart on MVSEP.\n\n*Infos and fixes for older patch #1/2\n(with matching overlap (reversed) and dim\\_t necessity)*\n\n- To avoid separation errors for 4GB VRAM and AMD/Intel GPUs using Roformers, set segments 32, overlap 2 and dim\\_t 201 with num\\_overlap 2, both at the bottom of the yaml config in \\models\\MDX\\_Net\\_Models\\model\\_data\\mdx\\_c\\_configs\n\n(dim\\_t 301 and overlap 3 also work, although not on all models [e.g. not for beta 3, but for inst v1], and seem to be less muddy, with fewer clicks).\n\ndim\\_t 201 is not an optimal setting and might lead to more occasional quiet residues, clicks or sudden volume changes (as if the chunk was changing every 2 seconds), although there’s no stem misalignment issue with these settings (they work for both Mel and BS Roformers). dim\\_t 301 with lighter models seems to be the bare minimum to avoid the majority of audible artefacts (after patch #3, dim\\_t 256 is allowed - “make sure you check the \"Segment Default\" in MDXNET23 Only Options for it to take effect”).\n\nUsing the settings above on patch #2 with GPU acceleration, it takes 39m 28s for a 3:28 song using the 1296 model on an RX 470 4GB, and 18 minutes for Kim Mel-Roformer on a 3:01 song.\n\nUsing HQ\\_4 is much faster than realtime using default settings, but even longer than accelerated Roformer, when on CPU only using old Core 2 Quad @3.6 DDR2 800MHz.\n\nOn Mac M1 using the patch above, it takes 9 minutes to process a 3-minute song using BS-Roformer (dim\\_t 1101, batch size 2, overlap 8) with “constant throttling”. 
[Click](https://mvsep.com/quality_checker/entry/6900)\n\nAnd below 4 minutes for Kim Mel-Roformer (overlap 1, dim\\_t 801). [Click](https://mvsep.com/quality_checker/entry/6959)\n\n- Settings working for 6GB AMD GPUs: dim\\_t 601 or 701 at the bottom of the yaml file and overlap 6 or 7 in the GUI.\n\n- Like I mentioned, overlap 8 can be good enough too when dim\\_t=801 is set (the fastest setting before SDR gets drastically reduced); in other cases you shouldn’t exceed 6, while 2 should provide the best quality in most cases.\n\n- 1602 (or rather 1601) dim\\_t might lead to less wateriness, but it turns out to come at the cost of a bit more vocal residue.\n\n- “In theory, the max overlap value [for Roformer separations without the mentioned issues in UVR] can be calculated with this formula:\n\n(dim\\_t - 1) / 100 = Max\\_overlap\\_value (rounded down to a whole number)\n\nif dim\\_t = 801:\n\n(801 - 1) / 100 = 8\n\nif dim\\_t = 1101:\n\n(1101 - 1) / 100 = 11 [jarredou wrote 10 here, but it’s actually 11]\n\nAbove that max value, some parts of the input will not be processed.\n\nThe lower the overlap value is, the more overlap is used, so better SDR.\n\n- Some Rofo models still have a wrong config by default, with dim\\_t=256, so the max overlap value for them is 2. That's why I've advised to stick to overlap=2” - jarredou\n\n[That advice dates from before it was known how to set dim\\_t correctly; now overlap can be set even to 8 when dim\\_t=801 is set.]\n\nThe same thing applies for both BS and Mel Roformers in UVR.\n\n“audio.dim\\_t value is not used with roformers in ZFTurbo script, it uses audio.chunk\\_size and then its parameters in the model part of config.”\n\n- Older Roformer beta patches for **Mac M1** don't let you check the Roformer parameter for custom Roformer models; only the config name can be chosen, and no confirm button is available. 
So the error “File \"libv5/tfctdfv3.py\", line 152, in \\_\\_init” appears.\n\n> Place the corresponding json file with your model from [this](https://drive.google.com/drive/folders/14IfdqN3tDjXVe0hQ9i5-1KejIJTO09xX?usp=sharing) repo into models\\MDX\\_Net\\_Models\\model\\_data beforehand to fix the issue.\n\nIn some cases, you may still get the same error anyway, and to get rid of it, you need to manually edit model\\_data.json, adding an entry for the desired model at the end, as if your custom model had been downloaded from the Download Center. Example for unwa’s beta 3:\n\n```json\n},\n\"d43f93520976f1dab1e7e20f3c540825\": {\n    \"config_yaml\": \"config_melbandroformer_big.yaml\",\n    \"is_roformer\": true\n}\n```\n\nAdditionally, you need to add the model at the end of model\\_name\\_mapper.json:\n\n```json\n    \"model melband_roformer_big_beta3.ckpt\": \"config_melbandroformer_big\"\n}\n```\n\nNow copy the hash-named json file (d43f93520976f1dab1e7e20f3c540825.json for beta 3) to the model\\_data folder.\n\nAll three modified files for beta 3 and other models are [here](https://drive.google.com/drive/folders/1-uqL0AOAJyMM8mEAODXxJsdDTKLTXmdW?usp=sharing).\n\nIf you have problems generating the hash on the model's first launch, and your model is not uploaded in the repo above or its json is not generated, then use a Windows installation in a VM, or ask some PC user for the config. Reading [Hash decoding](#_cb6cxq8g7i0v) can potentially be helpful.\nBut your hashed config name might already be generated correctly after you import the model into UVR (although the missing confirmation button might prevent it), in which case it will be enough to just add the line \"is\\_roformer\": true, as in the jsons presented above (i.e. after the “,” at the end of the config\\_yaml line).\n\n- More in-depth - Settings per model SDR vs Time elapsed -||- (incl. 
dim\\_t and overlap evaluation for Roformers) - [click](https://docs.google.com/spreadsheets/d/1XNjAyKwA2RkyOA_agmaV_6Xp2xXOHjJV09t-ho0nngk/edit?gid=1530726921#gid=1530726921) or [here](https://imgur.com/a/pGjPZec) | [conclusion](https://imgur.com/a/KBYHdNK) - made before patch #3\n\n\\_\\_\\_\n\n*Older news follow*\n\n\\_\\_\\_\n\n- The viperx model was also added on MVSEP\n\n- New ensembles with higher SDR were added on MVSEP\n\n- A BS-Roformer model trained by viperx was added on x-minus (it's different from the v2 model on MVSEP and has higher SDR; it's the “1.0” one). Whether it's better than v2 might depend on the song.\n\nIt struggles with saxophone and e.g. some Arabic guitars.\n\n- (x-minus - aufr33) “I have just completed training a new UVR De-noise model. Unlike the previous version, it is less aggressive and does not remove SFX.\n\nIt was trained on a modified dataset. I reduced the noise level and made it more uniform, removed footsteps, crowd, cars and so on from the noise stems. On the contrary, the crowd is now a useful / dry signal. (...) The new model is designed mainly to remove hiss, such as preamp noise.”\n\nFor vocals that have pops or clipping crackles or other audio irregularities, use the old denoise model.\n\n- Dango.ai updated their model, also adding some kind of demudder for the instrumentals, which enhances their results. Results might be better than MDX23C and BS-Roformer v2. Still, it’s pretty pricey ($8 for 10 separations). Five 30-second fragments per IP can be obtained for free, and usually the limit doesn’t reset. “It’s $8 for 10 tracks x 6 minutes, all aggressiveness modes included (but vocal and inst models are separate). The entire multisong dataset for proper SDR check would cost around $133.” - becruily\n\n- Be aware that queues on <https://doubledouble.top/> are much shorter for Deezer than for Qobuz links. 
If there are no 24-bit versions of your music, use Deezer instead.\n[outdated; there are no longer any MQA files on Tidal] Also, avoid Tidal’s 16-bit FLACs in “Max” quality, which are slightly lossy MQA. Use 24-bit MQA from Tidal only when there’s no 24-bit version on Qobuz. On Tidal, most albums released before 2020 are 16-bit MQA instead of 24-bit MQA. They are lossy compared to Deezer and Qobuz, which don’t use MQA, so doubledouble doesn’t have to convert MQA to FLAC as it does for Tidal. MQA is only “slightly” lossy, because it mainly affects frequencies from 18kHz up, and not greatly.\n\n- Members of the neighboring AI Hub server made a fork of the KaraFan Colab updated with the new HQ\\_4 and InstVoc HQ2 models. It has the slow separation fix applied. [Click](https://colab.research.google.com/github/Eddycrack864/KaraFan/blob/master/KaraFan_Improved_Version.ipynb)\n\n- HQ\\_4 and Crowd models were added to a temporary [fork](https://colab.research.google.com/drive/1GwMEjhczFzdS0Ld7eZzMcZgEmz6Jgv6m) of the HV Colab before merging with the main GH repo\n\n- (MVSEP) “We have added longer filenames disabling option to mvsep, you can access it from Profile page\n\n20240312034817-b3f2ef51cb-ballin\\_bs\\_roformer\\_v2\\_vocals\\_[mvsep.com].wav -> ballin\\_bs\\_roformer\\_v2\\_vocals.wav\n\nDue to browser caching, you might want to hard refresh the page if you have downloaded onc”\n\n- The 2- and 5-stem ensembles on MVSEP have been updated with a higher-SDR bag of models. It now contains the new BS-Roformer v2 (with MDX23C, VitLarge23 and, for multistem, the old demucsht\\_ft, demucs\\_ht, demucs\\_6s and demucs\\_mmi models)\n\n- All the Discord direct links to images in this document have expired. I’ve already reuploaded some of the more important ones. Please ping me on Discord if you need access to a specific image, and provide the page and the expired link.\n\n- <https://free-mp3-download.net> has been shut down. 
Check out alternatives [here](#_ataywcoviqx0).\n\nA new Apple Music ALAC/Atmos downloader was added, but its installation is a bit convoluted and a subscription is required. Murglar was added too.\n\n- MDX-Net HQ\\_4 model (SDR 15.86) released for UVR 5 GUI! Go to Models list>Download center>MDX-Net and pick HQ\\_4 for download. It is improved and faster than HQ\\_3, trained to epoch 1149. Only in rare cases is there more vocal bleeding; more often there’s more instrumental bleeding in vocals, but the model was made with instrumentals in mind.\n\nAlong with it, UVR-MDX-NET Crowd HQ 1 has also been added to the Download Center.\n\n- HQ\\_4 model added to the Colab:\n\n<https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb>\n\n- A new BS-Roformer v2 model was released on MVSEP. It’s a more aggressive model than the one above.\n\n- [Fixed](https://colab.research.google.com/drive/1HwCKsVMGotBvkHe1bfR8Q5POZsrRx-iu) KaraFan Colab with the fix for slow non-MDX23 models. You'll no longer get stuck on voc\\_ft using any preset other than 1, but be aware that it takes 8 more minutes to initialize. This is the same fix as suggested before, but without console, as it wasn't defined; the faster ORT nightly fix doesn't work here.\n\nIt turns out an official non-nightly package has been released, and it works with KaraFan correctly (no need to wait 8 minutes any longer):\n\n!python -m pip -q install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/onnxruntime-cuda-12/pypi/simple/\n\n- (x-minus.pro) “Since Boosty is temporarily not accepting PayPal and generally working sucks, I made the decision to go back to Patreon. Please be aware that automatic charges will resume on March 22, 2024. 
If you have Boosty working correctly and do not intend to use Patreon, please cancel your Patreon subscription to avoid being charged.\n\nIf you wish to switch from Boosty to Patreon, please wait for further instructions in March.” Aufr33\n\n- If you suffer from bleeding in the other stem of Ripple’s 4-stem separation, decrease the volume by e.g. 3-4dB. You can also try this: “when u throw the 'other stem' back into ripple 4 track split a second time, it works pretty well [to cancel the bleeding]”. If it's still not enough, put the other stem through Bandlab Splitter.\n\n- If you suffer from vocal residues using the Ensemble 4 models on MVSEP.com, decrease the volume of the input file by 8dB: “now it's silent. No more residue”. Usually 3 or 4dB did the trick for Ripple, but here it’s different. It might depend on the song too.\n\n- Image Line “released an update for FL Studio, and they improved the stem separation and it's better, but it has quite a bit of bleeding still, but it also seems they may have improved the vocal clarity”\n\n- (probably fixed in new HV MDX) Our newly fixed VR and newer HV MDX Colabs started having issues with very slow initialization for some people (even 18+ minutes instead of the usual 3). It’s probably due to very slow downloads of some dependencies. Possible solutions: use another Google account, use a VPN, or make a new Google account (maybe using a Polish VPN). Let us know whether it happens only for a specific dependency or for all of them. You can try commenting out the ORT nightly line in the mounting cell (add # before it), as it triggers more dependencies to be installed, which can be slow in that case. The downside is that there won't be GPU acceleration, and one song will take 6-8 minutes to process instead of ~20 seconds.\n\n- New paid drum separation service:\n\n<https://remuse.online/download> (might not work anymore)\n\nIt uses the free [drumsep](#_jmjab44ryjjo) model (same model hash: 9C18131DA7368E3A76EF4A632CD11551)\n\n- The MDX Colab seems not to work due to NumPy issues. 
I’ve already fixed them in the Similarity Colab and will hopefully reimplement the fixes elsewhere soon. The [VR Colab](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl?usp=sharing) is fixed too.\n\nTechnical details about the introduced changes are described under the [Similarity Extractor](#_3c6n9m7vjxul) section.\n\n- [Music AI](https://music.ai/) surfaced. It’s paid: $25 per month or pay as you go ([pricing chart](https://music.ai/pricing/)). No free trial. It has a good [selection](https://cdn.discordapp.com/attachments/708579735583588366/1206684280625963018/image.png) of models and an interesting [module stacking](https://cdn.discordapp.com/attachments/708579735583588366/1206353306767728752/image.png) feature. To upload files instead of using URLs, “you make the workflow, and you start a job from the main page using that custom workflow” [~ D I O ~].\n\nAllegedly it’s made by the Moises team, but the results seem to be better than those on Moises.\n\n“Bass was a fair bit better than Demucs HT, Drums about the same. Guitars were very good though. Vocal was almost the same as my cleaned up work. (...) I'd say a little clearer than mvsep 4 ensemble. It seems to get the instrument bleed out quite well, (...) An engineer I've worked with demixed to almost the same results, it took me a few hours and achieve it [in] 39 seconds” Sam Hocking\n\n- “I just got an email from Myxt saying they're going to limit stem creation to 1 track per month. For creator plan users (the $8 a month one) and 2 per month for the highest plan.\n\nSo I may assume with that logic, they're gonna take it away for free users?”\n\n- (probably fixed) For all users of jarredou's MDX23 v. 2.3 Colab fork:\n\n“Components of VitLarge arch are hosted on Huggingface... when their maintenance will be finished it will work again. I can't do anything about it in the meantime.”\n\nVersions 2.2 and 2.1, and the MVSEP.com 4-8 model ensembles (premium users), should work fine.\n\n- Ripple now has the fade-in and clicking issues fixed. 
Also, there's less bleeding in the other stem, but Bas Curtiz’ trick of decreasing the input volume by -3dB/-4dB may still be necessary.\n\n“Ripple’s lossless outputs are weird, some stems like the drums are semi full band (kicks go full band, snares not etc) and the “other” stem looks like fake full band”. These fixes also apply to old versions of the app.\n\nAlso, the lossless option fixes the offset issue to some extent, so the output is more similar to the input now, but not identical (the lossless option might require updating the app). Also, no more abrupt endings.\n\nRipple = better than CapCut as of now (and fullband).\n\nPlus, Ripple fixed the clicks/artifacts using a cross-fade technique between the chunks.\n\n- ViperX currently doesn't plan to release his BS-Roformer model\n\n- A new “uvr de-crowd (beta)” model was added to x-minus. It seems to provide better results than the MVSEP model. An MDX arch version of the model is also planned for training.\n\n“At minimum aggressiveness value, a second model is now used, which removes less crowd but preserves other sounds/instruments better.”\n\n- Ripple seems to have a lossless export option now. “First make sure the app is updated then click the folder then click the magnet icon then export and change it to lossless”\n\n- It seems CapCut has now added separation to the Android CapCut app in the unlocked Pro version\n\n<https://play.google.com/store/apps/details?id=com.lemon.lvoverseas> (made by ByteDance)\n\nIt seems there’s no separate Pro variant of this app.\n\nAt least the unlocked version on apklite.me links to the regular version, so it doesn't seem to be a Pro app behind any regional block. But:\n\n\"Indian users - Use VPN for Pro\", as they say. So it's a similar situation to the one we had with [Capcut](#_f0orpif22rll) on PC before. I can't guarantee that the unlocked version on apklite.me is clean. I've never downloaded anything from there.\n\n- Mega, GDrive and direct link support for input files has been added to MVSep. 
If you want to apply an MVSep algorithm to the result of another algorithm, you can use the \"Direct link\" upload and point it to the https link of the separated audio file on MVSep.\n\n- If you have an issue with the Demucs module not being found in e.g. the MDX23 v.2.3 Colab (now fixed there and also in the [VR Colab](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl?usp=sharing)), here's a solution:\n\n“In the installation code, I added `!pip install samplerate==0.1.0` right before the `!pip install -r requirements.txt &> /dev/null` and I managed to get all the dependencies from the requirements.txt installed properly.” (derichtech15)\n\n- If you repost images or files from Discord elsewhere and cut the link after \"ex=\", all newly posted files will expire pretty soon (17.02.24). If you leave the full link with \"ex=\" and the rest, it won't expire so fast, though it may still expire later.\n\nSo far, all the old Discord images shared elsewhere with \"ex=\" cut still work (also in incognito mode without being logged into Discord). It's not certain they will keep working forever, though.\n\nDiscord announced at the end of 2023 that they would update their link-sharing mechanism so that links expire some time after they're shared. The stated reason was to close security vulnerabilities that allowed scams. Or they just want to offload the servers.\n\n- [OpenVINO™](https://github.com/intel/openvino-plugins-ai-audacity) AI Plugins for Audacity [3.4.2](http://www.github.com/audacity/audacity/releases/tag/Audacity-3.4.2) 64-bit were introduced.\n\nFeatures: 4-stem separation, noise suppression, Music Style Remix - uses Stable Diffusion to alter a mono or stereo track using a text prompt, Music Generation - uses Stable Diffusion to generate snippets of music from a text prompt, Whisper Transcription - uses whisper.cpp to generate a label track containing the transcription or translation for a given selection of spoken audio or vocals.\n\nNot bad results. They use Demucs.\n\n- For people with low VRAM GPUs (e.g. 
4GB or less), you can test out the [Replay](https://www.tryreplay.io/) app, which provides the voc\\_ft model and tends to crash less than UVR. Sadly, the choice of models is much smaller, but it has a de-reverb option. [Screenshot](https://cdn.discordapp.com/attachments/708579735583588366/1196043224846962749/image.png)\n\n- Latest MVSep changes:\n\n1) All ensembles now have an option to output intermediate waveforms from the independent algorithms, plus additional max\\_mag and min\\_mag results.\n\n2) Ensemble All-In now includes DrumSep results extracted from the drum stem.\n\n- The resemble-enhance ([GH](https://github.com/resemble-ai/resemble-enhance)) model was added to x-minus in denoise mode. It can work better than the latest denoise model on x-minus. It is intended only for vocals. For music, use the UVR De-noise model on x-minus.\n\n- (fixed in kae, 2.1, 2.2 [and KaraFan IIRC] Colabs) All Colabs using MDX-Net models are currently very slow. GPU acceleration is broken, and separations now only work on CPU, with onnxruntime warnings.\n\nTo work around the issue, go to Tools>Command palette>Use fallback runtime version (while it's still available).\n\nDowngrading CUDA to version 11.8 fixes the issue too, but installing that dependency takes 9 minutes. So it’s faster to use the fallback runtime while it’s still available. 
After that period, just execute this line after the initialisation cell:\n\nconsole('apt-get install cuda-11-8') and GPU acceleration will start working as usual.\n\n>“Better fix [than CUDA 11.8] until final version is released, using that onnxruntime-gpu nightly build for cuda12:\n\n!python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/ort-cuda-12-nightly/pypi/simple/\n\n(no need to install cuda 11.8)” jarredou\n\nIn case of credential issues, you can try this package instead:\n\n!python -m pip -q install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/onnxruntime-cuda-12/pypi/simple/\n\n- The LarsNet model was added to MVSep. It separates drum tracks into 5 stems: kick, snare, cymbals, toms and hi-hat. Source: <https://github.com/polimi-ispl/larsnet>\n\nIt’s worse than Drumsep, as it uses a Spleeter-like architecture, but “at least they have an extra output, so they separate hihats and cymbals.” [Colab](https://github.com/jarredou/larsnet-colab)\n\n“Baseline models don't seem better quality than drumsep, but the provided checkpoints are trained with only 22 epochs, it doesn't seem much. (and STEMGMD dataset was limited by the only 10 drumkits), so it could probably be better with better dataset & training”\n\n“it separates the toms so much better [than Drumsep]”\n\nAs with Drumsep, you should provide drums separated with e.g. a 
Demucs model.\n\n- Captain FLAM from [KaraFan](#_7kniy2i3s0qc) is asking for help due to some recent repercussions.\n\nYou can support him on <https://ko-fi.com/captain_flam>\n\n- Other MDXv2 models in KaraFan count some instruments as vocals. To preserve those instruments, use [these](https://cdn.discordapp.com/attachments/1162265179271200820/1175056481956134943/image.png) modified preset 5 settings (dca100fb8).\n\n- Added more remarks from testing these settings against the sax preset and others.\n\n- drumsep was added to MVSEP!\n\n(for separating drums obtained from e.g. Demucs 4 stem or “Ensemble 8 models”/+)\n\n- A new BandIt Plus model was added to MVSEP\n\n“I trained BandIt for vocals. But it's too far away from MDX23C” -ZFTurbo\n\n“I loved this bandit plus model!! It has great potential.”\n\n- The UVR De-noise model by FoxJoy was added to x-minus. It’s helpful for light noise, e.g. vinyl (de-reverb and de-echo are up already).\n\nA new MDX de-noise model is in the works, and a beta model was also added!\n\n“the instruments in the background are preserved much better than the FoxJoy model”\n\nIt works for hiss, interference, crackle, rustles, soft footsteps and technical noise.\n\n- New hifi-gan-bwe Colab fork made by jarredou:\n\n<https://colab.research.google.com/github/jarredou/hifi-gan-bwe/blob/main/HIFIGAN_BWE.ipynb>\n\n- New AI speech enhancer - <https://www.resemble.ai/introducing-resemble-enhance>\n\n- Reason 12.5 (a DAW) was released with VST3 plugin support\n\n- jazzpear94: “I made a [new model](https://cdn.discordapp.com/attachments/708580573697933382/1180312079190724718/Cinematic_MDX23C.zip) with a modified version of my SFX and Music dataset with the addition of other/ambient sound and speech. It's a multistem model and should even work in UVR GUI as it is MDX23C.\n\nNote: You may want to rename the config to .yaml as UVR doesn't read .yml and I didn't notice till after sending. 
Renaming it fixes that, however”\n\n“You put config in models\\mdx\\_net\\_models\\model\\_data\\mdx\\_c\\_configs. Then when you use it in UVR it'll ask you for parameters, so you locate the newly placed config file.”\n\n“Keep in mind that the cinematic model focus is mainly on sfx vs instruments\n\nvoice stems are supplemental. Usually I remove voices first”\n\n- <https://github.com/karnwatcharasupat/bandit>\n\nBetter SDR for **Cinematic** Audio Source Separation (dialogue, effect, music) than the Demucs 4 DNR model on MVSEP (mean SDR 10.16 -> 11.47)\n\n- \"[Demucs+CC\\_Stereo\\_to\\_5.1](https://cdn2.imagearchive.com/quadraphonicquad/data/attach/77/77907-Demucs-CC-Stereo-to-5.1v0.2b.zip)\" - a script that converts stereo 2.0 to 5.1 surround sound. Full [discussion](https://www.quadraphonicquad.com/forums/threads/demucs-centrecutcl-stereo-to-5-1-script-v-0-2b.32788/) about the script. They use MVSep to get the stems and then run the script on them.\n\n- [Colab](https://colab.research.google.com/drive/17SSjougcnVhX6WewW88QoKKFuFiKNz8t?usp=sharing) by jazzpear96 for using ZFTurbo's MSS training script. “I will add inference later on, but for now you can only do the training process with this!”\n\n- The new djay Pro 5.0 has “very good realtime stems with low CPU”, allegedly “faster and better than Demucs, similar”, although “They are not realtime, they are buffered and cached.” It uses AudioShake. It can be better for instrumentals than UVR at times.\n\n- The new version of AudiosourceRE Demix Pro has lead/backing vocal separation\n\n- A new **crowd** model was added to MVSEP (applause, clapping, whistling, noise). It was later updated (5.57 -> 6.06; Hollywood laughs added, old models still available)\n\n- The VitLarge23 model on MVSEP got updated (9.78 -> 9.90 for instrumentals)\n\n- A MelBand RoFormer model (9.07 for vocals) was added to MVSEP for testing purposes\n\n“The model is really good at removing the hi-hat leftovers. These e.g. 
in the Jarredou colab sometimes when you can hear the hi-hats from the acapella. And Melband roformer can almost remove all the hi-hat leftovers from the acapella.”\n\n“are the stems not inverted result? for me it sounds like there is insane instrument loss in the instrumental stem and vocals loss in the vocal stem, yet there is no vocal bleed in instrumental stem and vice versa” “I also think that the vocals are surprisingly clean considering the instrumentals sound quite suppressed but also clean”\n\n- The Goyo Beta dereverb plugin stopped working on December 2nd (it required an internet connection and silent authorization on every initialization). They transitioned to the paid Supertone Clear. They send a BETA29 coupon by email (with it, it’s $29).\n\n- The new MVSep-MDX23 Colab Fork v2.3 by jarredou was published under a new Colab link [here](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb)\n\nIt now uses the VitLarge23 model (previously used exclusively on MVSEP) instead of HQ3-Instr, and has improved BigShifts and MDXv2 processing.\n\nIt doesn't seem to be better than RipX, which preserves some instruments better and also removes vocals completely.\n\n- Check out the new [Karaoke](#_vg1wnx1dc4g0) recommendations (dca100fb8)\n\n- Dango.ai finally received an English translation of its web interface\n\n- A new SFX model based on Mel-RoFormer was released by jazzpear94. [More info](https://discord.com/channels/708579735583588363/708580573697933382/1174160853877129326)\n\n- A user-friendly [Colab](https://colab.research.google.com/drive/1YBpeGj66FfIHS1WH8uYBcshITOGVYOHY?usp=sharing) made by jarredou and [forked](https://colab.research.google.com/drive/1jrw-cAi-JqZpBi6wyT3YIp3x-XHhDm1W?usp=sharing) by jazzpear94 with a new feature. In case of problems, use a WAV file.\n\n- It seems Ripple got updated: \"it sounds a lot better and less muddied\". It doesn’t seem to give better results for all songs, though. 
It might be a similar case with CapCut too.\n\n- Hit 'n' Mix RipX DAW Pro 7 was released. For GPU acceleration, the minimum requirement is 8GB VRAM and an NVIDIA 10XX card or newer. The official document lists the 1070, 1080, 2070, 2080, 3070, 3080, 3090 and 40XX, so cards with min. 8GB VRAM. Additionally, GPU acceleration requires exactly “Nvidia CUDA Toolkit v.11.0”. Occasionally, when moving from some older versions, the separation quality of harmonies can improve. With GPU acceleration, separation time can drop from as much as 40 minutes on CPU to 2 minutes on a decent GPU.\n\n- UVR BVE v2 beta has been updated on x-minus\n\n“It now performs better on songs with 2 people singing the lead\n\nNo longer separates the second lead along with it”\n\n- dca100fb8 found new [settings](https://media.discordapp.net/attachments/1162265179271200820/1173570485733302312/karafan.PNG) for [KaraFan](#_7kniy2i3s0qc) which give good results for some difficult songs (e.g. Juice WRLD), for both instrumentals and acapellas. They are now added as preset 5.\n\nDebug mode and God mode can stay disabled, as they are by default.\n\n\"It's like an improved version of Max Spec ensemble algorithm [from UVR]\"\n\nProcessing time for a 6:16 track on the medium setting is 22 minutes.\n\n- A new MDX23C model was added exclusively to MVSEP:\n\nvocals SDR 10.17 -> 10.36\n\ninstrum SDR 16.48 -> 16.66\n\nEnsemble 4 was also updated with the new model (10.32 -> 10.44 for vocals)\n\n- For some people (but not everyone) using mitmproxy scripts for CapCut, they “changed their security to reject all incoming packet which was run through mitmproxy. I saw the mitmproxy log said the certificate for TLS not allowed to connect to their site to get their API. And there are some errors on mitmproxy such as events.py or bla bla bla... 
and capcut always warning unstable network, then processing stop to 60% without finish.” ~hendry.setiadi\n\n“At 60% it looks like the progress isn't going up, but give it idk, 1 min tops, and it splits fine.” - Bas\n\n- ZFTurbo published his training code:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\n\"It gives the ability to train 5 types of models: mdx23c, htdemucs, vitlarge23, bs\\_roformer and mel\\_band\\_roformer.\n\nI also put some weights there to not start training from the beginning.\"\n\nIt contains checkpoints for further training, e.g. of the 1648 (1017 for vocals) MDX23C model.\n\nBe aware that the older bs\\_roformer implementation is very slow to train IIRC.\n\nVitLarge23 “is running 2 times faster than MDX models, it's not the best quality available, but it's the fastest inference”\n\n“change the batch size in config tho\n\nI think zfturbo sets the default config suited for a single a6000 (48gb)\n\nand chunksize”\n\n- \"A small update to the backing vocals extractor [on X-Minus]\n\nNow you can more accurately specify the panning of the lead vocal.\" ~Aufr33 [Screen](https://media.discordapp.net/attachments/900904142669754399/1170201735722188810/bve.png)\n\n- IntroC created a mitmproxy [script](https://drive.google.com/drive/folders/12m1qrRNpsTrCxfioG9xzcZUYTV0Gl8Ap) for CapCut that allows fullband output by slowing down the track. [Video](https://www.youtube.com/watch?v=-34Q5rJ68pI)\n\n- Jazzpear created a new VR SFX model. Sometimes it’s better, sometimes worse than [Forte’s](https://drive.google.com/drive/folders/12CnfpIph5Ipd9ocoD6RsbOWzWsCeWAeT?usp=share_link) model. [Download](https://www.dropbox.com/scl/fo/lcpknm3rvehxhryzcd6mb/h?rlkey=zpi8pnpda30d0n71tqckiocod&dl=0)\n\nFor UVR 5.x GUI, use these parameters (IIRC, the same as Forte’s):\n\nUser input stem name: SFX\n\nDo NOT check inverse stem!\n\n1band sr44100 hl 1024\n\n- KaraFan should now work locally on 4GB GTX GPUs (e.g. 
laptop 1060) with presets 2 or 3 and chunk size 500K, although speed may be the slowest. On GitHub, download it via Code > ZIP\n\n- Bas Curtiz' new video on how to install and use CapCut for separation, incl. exporting:\n\n<https://www.youtube.com/watch?v=ppfyl91bJIw>\n\nand saving directly as FLAC, although the underlying source of the FLAC is still AAC in this case:\n<https://www.youtube.com/watch?v=gEQFzj6-5pk>\n\n\"It's a bit of a hassle to set it up, but do realize:\n\n- This is the only way (besides Ripple on iOS) to run ByteDance's model (best based on SDR).\n\n- Only the Chinese version has these VIP features; now u will have it in English\n\n- Exporting is a paid feature (normally); now u get it for free\n\nThe instructions displayed in the video are also in the YouTube description.\"\n\nCapCut normalizes the input, so you can’t use Bas’ Ripple trick of decreasing the volume by 3dB to work around bleeding. The exception is if you trick CapCut, possibly by adding a loud sound to the volume-reduced song, something like presented [here](https://cdn.discordapp.com/attachments/1129475305950687372/1169341419945721916/image.png).\n\n- (fixed) KaraFan Colab will be fixed on the morning of the 27th.\n\n- There’s a workaround for people unable to split using CapCut. The app discriminates based on country (poor/rich) and paywalls the Pro option.\n\nThe [video](https://cdn.discordapp.com/attachments/708595418400817162/1166823721831501834/Testing_CapCut_workaround.mp4) demonstration for the steps below:\n\n0. Go offline.\n\n1. Install the Chinese version from [capcut.cn](https://www.capcut.cn/)\n\n2. Copy these files over your current Chinese installation, and don’t use the English patch.\n\n3. Open CapCut, go online after closing the welcome screen, happy converting!\n\n4. Before you close the app, go offline again (or the separation option will be gone later).\n\nBefore reopening the app, go offline again, open the app, close the welcome screen, go online, separate, go offline, close. 
If you happen to miss that step, you need to start over from the beginning of the instructions.\n\n(replacing the [SettingsSDK](https://cdn.discordapp.com/attachments/708595418400817162/1167169672580440195/SettingsSDK.zip) folder no longer works after the transition from 4.6 to 4.7; it freezes the app)\n\nFYI - the app doesn’t separate files locally.\n\n- Bas Curtiz found out that decreasing the volume of mixtures by 3dB eliminates problems with vocal residues in instrumentals in [Ripple](#_f0orpif22rll). [Video](https://cdn.discordapp.com/attachments/708579735583588366/1165647205600854118/Ripple_vs_-6db_example_2.mp4).\n\nThis is the most balanced value, which still doesn't take too many details out of the song due to volume attenuation.\n\nOther good values, purely SDR-wise, are -20dB > -8dB > -30dB > -6dB > -4dB > no volume decrease.\n\nThe method might also benefit other models and probably works best for the loudest tracks with brickwalled waveforms.\n\n##### - Stable 5.6 OpenCL (DirectML) version of UVR 5 GUI for Windows\n\n##### Supporting AMD and Intel GPU acceleration, but no Roformers yet\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup_directml_old.exe>\n\nMac: <https://github.com/Anjok07/ultimatevocalremovergui/releases/>\n\n(the newer [beta Roformer](#_6y2plb943p9v) [with “roformer” in the installer name] already supports both DirectML and CUDA out of the box; for Mac M1 [click](https://discord.com/channels/708579735583588363/767947630403387393/1224923138958299146)).\n\n- For CUDA (NVIDIA GPUs), use the non-OpenCL installer from here:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe>\n\n(The following is based on the previous OpenCL build)\n\n8GB VRAM is enough for 3:00/3:30 tracks using the MDX23C HQ model, and 12GB VRAM is probably enough for a 5:00 track. That’s more VRAM than CUDA needs.\n\nThe issue should now be mitigated, and fewer memory crashes should occur.\n\nEnsembles might require more 
memory due to memory allocation issues not encountered with CUDA before. Also, VRAM is fully freed only after closing the application.\n\nOn AMD, only the Demucs 2 (and 1?) arch lacks acceleration support. All other archs should work.\n\n- Be aware that **full MPS (GPU) acceleration for Mac M1** was also introduced for all MDX-NET Original models (HQ3, etc.), all MDX23C models and all Demucs v4 models (no GPU acceleration for VR models). So don’t use Windows in a VM to run UVR anymore; instead, separate using the dmg installer (ARM) from the [releases](https://github.com/anjok07/ultimatevocalremovergui/releases) section. GPU acceleration is 3x faster than CPU separation was before.\n\n\\_\\_\\_\\_\n\n- “MDX23C-InstVoc HQ 2 is out as a VIP model [for UVR 5]! It's a slightly fine-tuned version of MDX23C-InstVoc HQ. The SDR is a tiny bit lower, but I found that it leaves less vocal bleeding.” ~Anjok\n\nIt’s not always the case; sometimes it’s even the opposite, but as always, it all depends on the specific song.\n\n- jarredou’s MDX23 [2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) Colab should now allow faster separation, and longer files too (tech [details](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/pull/2#issuecomment-1763052970))\n\n- An All-in ensemble was added for premium MVSEP users. It outputs vocals, lead vocals, backing vocals, drums, bass, piano, guitar and other. 
Basically 8 stems. From the drum stem, you can further separate single percussion instruments using [drumsep](#_jmjab44ryjjo), up to 4 instruments, which gives 10 stems in total.\n\n- <https://www.capcut.cn/> (outdated section: [read](#_f0orpif22rll))\n\nIt’s a new Windows app which contains the Ripple/SAMI-Bytedance inst/vocal model (not 4 stems like in Ripple).\n\n“At the moment the separation is only available in Chinese version which is jianyingpro, download at capcut.cn [probably [here](https://www.capcut.cn/?ts=1697285758505) - it’s where you’re redirected after you click “Alternate download link” on the main page, where download might not work at all]\n\nSeparation doesn't require sign up/login, but exporting does, and requires VIP.\n\nSeparated vocal file is encrypted and located in C:\\Users\\yourusername\\AppData\\Local\\JianyingPro\\User Data\\Cache\\audioWave”\n\nThe unencrypted audio file in AAC format is located at \\JianyingPro Drafts\\yourprojectname\\Resources\\audioAlg (ends with download.aac)\n\nDrag and drop it into Audacity or convert it to WAV (<https://cloudconvert.com/aac-to-wav>)\n\n“To get the full playable audio in mp3 format a trick that you can do is drag and drop the download.aac file into capcut and then go to export and select mp3. It will output the original file without randomisation or skipping parts”\n\n“Trying out Capcut, the quality seems the same as the Ripple app (low bitrate mp3 quality)\n\nat least the voice leftover bug is fixed, lol”\n\nRandom vocal pops from Ripple are fixed here.\n\nIt still has the same clicks every 25 seconds that Ripple had before.\n\nSome people can’t find the settings on [this](https://media.discordapp.net/attachments/875539590373572648/1163477679736102932/image.png?ex=653fb807&is=652d4307&hm=567a1806a464224601faa2e16b43ba2ff856d8a70355e3926d116869cefb1360) screen in order to separate. 
Maybe it’s due to the lack of a Chinese IP or Chinese regional settings in Windows, but from what someone said, logging in wasn’t necessary.\n\n- It looks like the guitar model on MVSEP can pick up piano better than the piano model available there in many cases (isling)\n\n###### - AudioSep has been released\n\n<https://github.com/Audio-AGI/AudioSep>\n\n(separate anything you describe)\n\n<https://replicate.com/cjwbw/audiosep?prediction=j7dsrvtbyxfm3gjax3vfzbf7py>\n\n(use short fragments as input)\n\n<https://colab.research.google.com/github/badayvedat/AudioSep/blob/main/AudioSep_Colab.ipynb> (basic Colab)\n\n<https://huggingface.co/spaces/badayvedat/AudioSep> (it’s down)\n\n\"so far it's ranged from mediocre to absolutely horrible from samples I've tried\"\n\n\"So far[,] it does [a] great job with crowd noise/cheering.\"\n\nIt didn't pick up piano.\n\nThe output is mono 32kHz. For a 30s input, the output can be only 5s.\n\n- UVR started processing more slowly for some people using Nvidia 532 and 535 drivers (at least the Studio ones, on at least W11). [More](https://github.com/vladmandic/automatic/discussions/1285) about the issue. Consider rolling back to 531.79.\n\n“Took 10 seconds to run Karaoke 2 on a full song (~5[]mins), with the latest drivers it took like 20 minutes”. The problem may occur once you reboot your system.\n\n- AMD GPU acceleration has been introduced in the official UVR repo under a new branch on GH. A beta exe patch will be released in the following days. Currently, it supports only MDX-Net (but not MDX23C), Demucs 4 models (not 3) and the VR arch (5.0, but not 5.1).\n\nCurrently, GPU memory isn’t being cleared, so you need a lot of VRAM to use ensembles.\n\n- (x-minus) \"Added additional download buttons when using UVR BVE model.\n\nNow you can download:\n\n- song without backing vocals\n\n- backing vocals\n\n- instrumental without vocals\n\n- all vocals\" Anjok\n\n- The macOS UVR versions should be fixed now; redownload the latest 5.6 patches. 
GPU processing on M1 is fully functional on macOS Monterey 12.3/7 or newer (only VR models will crash with GPU processing). It’s very fast with the latest MDX23C fullband model: 11 minutes vs 1 hour on CPU previously.\n\n- Cyrus' version of the MedleyVox Colab introduces chunking, so you don't need to perform this step manually\n\n<https://colab.research.google.com/drive/1StFd0QVZcv3Kn4V-DXeppMk8Zcbr5u5s?usp=sharing>\n\n“Run the 1st cell, upload song to folder infer\\_file, run 2nd cell, get results from folder results = profit”\n\n“one annoying thing is that it always converts the output to mono 28k”\n\n- Since the UVR 5.6 update, separation times have doubled for some people. Roughly the same applies to RAM usage.\n\nHaving lots of free space on your system disk, or an additional partition assigned to the pagefile, can be vital in fixing some crashes, especially for long tracks. Be aware that CPU processing tends to crash less, but it's much slower in most cases.\n\n\"I realized that with 2-3h long audio files, I was able to use Demucs after I added another 32GB of RAM. In total my system got 64GB and I increased the swap file to 128GB, which is located on an NVMe drive... so just in case the 64GB RAM are not enough, which I experienced with the \"Winds\" model, it's not crashing UVR, instead using the SWAP.\"\n\n- Setting segments to the default 256 instead of 512 makes the new MDX23C fullband model ⅓ faster, at least on 4GB cards. But it's still very slow on e.g. the RTX 3050 mobile variant (20 minutes for a 3:40 song).\n\n- With MDX23C, inverting the vocals against the mixture instead of using the instrumental output can sometimes give better results, and vice versa.\n\n“Differences were more significant with D1581 [than fullband], but secondary vocals stem has \"a bit\" higher score” ([click](https://cdn.discordapp.com/attachments/767947630403387393/1158828208028926022/image.png?ex=651daa5e&is=651c58de&hm=47ee7874240d427aee4ba65ed953678597004d2223a4d21bfaddb5e5136aa5b5&)). 
In general, (non-spectral) inversion with these MDX23C models sometimes gave better results.\n\n- MedleyVox [Colab](https://colab.research.google.com/drive/17G3BPOPBPcwQdXwFiJGo0pKrz-kZ4SdU) preconfigured for use with Cyrus' model\n\nNewer model epochs can be found here:\n\n<https://huggingface.co/Cyru5/MedleyVox/tree/main>\n\nQ: What is isrnet?\n\nA: It's basically just another model that builds on top of what I've built so far that performs better. That's the surface level explanation, at least.\n\n- Settings for v2.2.2 Colab\n\n<https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb>\n\nIf you suffer from vocal residues, try these settings:\n\nBigShifts\\_MDX: 0\n\noverlap\\_MDX: 0.65\n\noverlap\\_MDXv3: 10\n\noverlap demucs: 0.96\n\noutput\\_format: float\n\nvocals\\_instru\\_only: disabled\n\nYou can also adjust the weights.\n\nE.g. use a different weight balance, with less MDXv3 and more VOC-FT.\n\n- As an addition to the [AI-killing tracks](#_37hhz9rnw7s8) section, and in response to the deletion of the \"your poor results\" channel, a [Gsheet](https://docs.google.com/spreadsheets/d/1umHpYbh1NzXIkoLj_7aM2tFwX5SHFMdaxJnhe75j8bA/edit?usp=sharing) was recently created where you can list your problematic tracks. It is open for everyone to contribute.\n\n- Video [tutorial](https://youtu.be/VbM4qp0VP80) by Bas Curtiz on how to install MedleyVox (based on Vinctekan's fixed source). Cyrus trained a model. MedleyVox serves to separate different singers in a track. 
It sometimes does a better job than BVE models in general.\n\nSadly, it has a 24kHz output sample rate, but AudioSR works pretty well for upscaling the results.\n\n<https://github.com/haoheliu/versatile_audio_super_resolution>\n\n<https://replicate.com/nateraw/audio-super-resolution>\n\n<https://colab.research.google.com/drive/1ILUj1JLvrP0PyMxyKTflDJ--o2Nrk8w7?usp=sharing>\n\nBe aware that it may not work with full-length songs; you might need to divide them into smaller 30-second pieces.\n\n- \"Ensemble 4/8 algorithms were updated on MVSep with the new VitLarge23 model. All quality metrics increased:\n\nMultisong Vocals: 10.26 -> 10.32\n\nMultisong Instrumental: 16.52 -> 16.63\n\nSynth Vocals: 12.42 -> 12.67\n\nSynth Instrumental: 12.12 -> 12.38\n\nMDX23 Leaderboard: 11.063 -> 11.098\n\nI added the Ensemble All-In algorithm, which additionally includes piano, guitar and lead/back vocals. Piano and guitar have better metrics compared to the standard models, because they are extracted from the high-quality \"other\" stem. Lead/back vocals also have slightly better metrics.\n\npiano: 7.31 -> 7.69\n\nguitar: 7.77 -> 8.95\" ZFTurbo\n\n- New vocal model added on MVSEP:\n\n\"VitLarge23\", based on a new transformer arch. 
SDR-wise (9.78 vs 10.17) it's not better than MDX23C, but it works \"great\" in an ensemble of two models with weights 2, 1.\n\n- MVSEP-MDX23-Colab fork v2.2.2 is out.\n\nIt now uses the new InstVocHQ model instead of D1581:\n\n<https://github.com/jarredou/MVSEP-MDX23-Colab_v2/>\n\nMemory issues with 5:33 songs were fixed (even 19-minute tracks are supported with 500K chunks).\n\nIt should be slightly faster than the previous version, as the extra processing for the fullband trick is no longer needed with the new model.\n\nQ: Why is \"overlap\\_MDX\" set to 0.0 by default in MVSEP-MDX23-Colab\\_v2?\n\nA: Because it's a \"doublon\" (duplicate) of MDX BigShifts (which is better)\n\n- The stable final version of UVR v5.6.0 has been released along with the MDX23C fullband model (the same as on MVSEP): SDR is 10.17 for vocals & 16.48 for instrumentals.\n\nIt’s called MDX23C-InstVoc HQ.\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/>\n\nBe aware that it takes much more time to process a song with it than with all previous models. Also, it doesn’t require setting volume compensation. It can leave more vocal residues than HQ\\_3 models for some songs. On the other hand, it can give very good results for songs with a “super dense mix like Au5 - Snowblind”, and also for older tracks like Queen - March Of The Black Queen (which always caused issues; this model gave the best result so far, although a lot of BV is still missed).\n\nPerformance:\n\n- A 3:30 track takes up to 24 minutes with HQ\\_3 on an i3-3217U, while the new model takes 737 minutes (precisely 1:34 vs 41:00 for a 15-second song).\n\n- RTX 3060 12 GB - takes around 15 minutes to process a 25-minute file with the new model.\n\n- GTX 1080 Ti - took about 4 minutes to process a roughly 5:30 song\n\n- If you upgraded from the beta, Matchering might not work correctly. To fix the error:\n\nGo to the Align tool.\n\nSelect another option under \"Volume Adjustment\" (it can be anything).\n\nNow, Matchering should work. 
The fix may not apply to Linux installations.\n\n- The KaraFan [original](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) Colab seems to work now (v. 3.1), but processing a 3:37 track with default settings takes 30 minutes on a free T4 (the last files processed are called Final), and it can get you disconnected from the runtime quickly (especially if you miss some of the multiple captcha prompts). V. 3.1 can have more vocal residues than the 1.x version, and even more than the HQ\\_3 model on its own.\n\nYou might want to consider using older versions of KF with the [Kubinka](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio_Separation/KaraFan.ipynb) Colab.\n\n- Version 3.2 has now been released, with fewer vocal residues.\n\nAs mentioned before, after a runtime disconnection error, the output folder is still constantly populated with new files, while the progress bar is no longer refreshed, even after clicking close or closing the tab with Colab open.\n\n- \"Image-Line, the company that made FL Studio 21, took to Instagram announcing a beta build that allows end users to separate stems within the program itself. This is in beta and isn’t the final product\"\n\nPeople say it's Demucs 4, but maybe not the ft model, and/or with low parameters applied, or it's their own model.\n\n\"Nothing spectacular, but not bad.\"\n\n\"- FL Studio bleeds beats, just like Demucs 4 FT\n\n- FL Studio sounds worse than Demucs 4 FT\n\n- Ripple clearly wins\"\n\n- The original KaraFan [Colab](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) with v. 3.0 should work with the large GPU option disabled (now the default).\n\n- You may experience issues with KaraFan 3.0 alpha (e.g. 
lack of 5\\_F-music, with which results were better before), while the [Kubinka Colab](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio_Separation/KaraFan.ipynb), which uses the older version for now, has some problems with GPU acceleration. Maybe the previous KF commit will work, or even the one before it (2.x is used [here](https://discord.com/channels/708579735583588363/708579735583588366/1153747778938359909)).\n\n- New UVR beta patches for Windows/Mac/M1 are at the bottom of the release notes:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/>\n\nCheck the link above for newer versions, but this one currently fixes a long error when using the new BVE model:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/UVR_Patch_9_20_23_20_40_BETA.exe>\n\n- “The new BVE (Background Vocal Extractor) model [in UVR 5 GUI] has been released!\n\nTo use the BVE model, please make sure you use the [UVR\\_Patch\\_9\\_18\\_23\\_18\\_50\\_BETA](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/UVR_Patch_9_18_23_18_50_BETA.exe) patch ([Mac](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/Ultimate_Vocal_Remover_v5.BETA.9_19_23_16_57_MacOS_arm64.dmg)). Remember, it's designed to be used in a chain ensemble, not on its own. It's better to utilize it via \"Vocal Splitter Options\". ~Anjok”\n\nThe Lead vocal placement = stereo 80% option is still only available on X-Minus; UVR GUI doesn't support it yet. It’s meant for situations when your main vocals are confused with backing vocals.\n\n- In the latest UVR GUI beta patch, vocal stems of MDX instrumental models have their polarity flipped. You might want to flip it back in your DAW.\n\n- Investigation of the KaraFan shapes issue: [link](https://discord.com/channels/708579735583588363/708579735583588366/1153747778938359909)\n\n- New piano and guitar models were added on MVSEP. Use the other stem from e.g. 
“Ensemble 8 models” or [MDX23 Colab](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) or htdemucs\\_ft for better results.\n\n- To separate electric and acoustic guitar, you can run a song (e.g. the other stem) through the Demucs guitar model and then process the guitar stem with GSEP (or use the MVSEP model instead of one of these).\n\nGSEP can only separate electric guitar so far, so the acoustic one will stay in the \"other\" stem.\n\n- A new UVR beta patch implements the chain ensemble from x-minus for splitting backing and lead vocals. To use it:\n\n1. Enable \"Help Hints\" (so you can see a description of the options).\n\n2. Go to any option menu.\n\n3. Click the \"\\*Vocal Splitter Options\\*\".\n\n4. From there you will see the new chain ensemble options.\n\n[Patch](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/UVR_Patch_9_15_23_6_35_BETA.exe) (patching from the app may cause startup issues)\n\n- \"New MDX23C model improved on [MVSEP] Leaderboard from 10.858 up to 11.042\"\n\n- \"For those of you who were running into errors related to missing \\*\"msvcp140d.dll\"\\* and \\*\"VCRUNTIME140D.dll\"\\* after installing the latest patch, it's been fixed.\" -Anjok\n\n[UVR\\_Patch\\_9\\_13\\_23\\_17\\_17\\_BETA](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/UVR_Patch_9_13_23_17_17_BETA.exe)\n\n- UVR's latest beta 9 patch causes startup issues for lots of people, even on clean Windows 10 installs. There is no fix for it. Copying libraries manually or installing all possible redistributables doesn't work. In such a case, use the beta 8 patch.\n\n- If you see an error saying you're disconnected from the KaraFan Colab, it can still separate files in the background and consume free \"credits\" until you click Environment>Terminate session. 
It happens even if you close the Colab.\n\nSo you may see your GDrive output folder still being populated with new files, while the progress bar is not refreshed after a runtime disconnection error, or even after closing your Colab tab.\n\n- KaraFan was updated to 1.2 (e.g. model picking was added). Deleting your old KaraFan folder on GDrive may now be necessary to avoid an error in Colab.\n\n- KaraFan, the next version of the MDX23 fork (originally developed by ZFTurbo, enhanced and forked by jarredou), has been created by Captain FLAM (with jarredou’s assistance on tweaks).\n\nOfficial [Colab](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) (video [guide](https://youtu.be/XZAF2rzc2Q4) in case of problems)\n\n[Colab](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio_Separation/KaraFan.ipynb) forked by Kubinka (may show an error after the 1.2 update)\n\nGUI for offline use: <https://github.com/Captain-FLAM/KaraFan/tree/master>\n\nIt gives very clean instrumentals with far fewer consistent vocal residues than MDX23 2.0-2.2 and Ripple/Bytedance.\n\n(might have changed) You can also disable SRS there to get a slightly cleaner result, at the cost of more vocal residues. How detectable they will be without SRS depends on the track, e.g. whether it has heavily compressed modern vocals and lots of passages with a sparse mix (when few instruments play). Disabling SRS adds a substantial amount of information above 17.7kHz.\n\nOne of our users had problems seemingly caused by an empty Colab Notebooks folder, which he needed to delete. It could have been something else they did, though.\n\n- A new epoch of the new BVE model has been added to x-minus\n\n“In some parts the new BVE is better, in some it's worse. 
Still a great model”\n\n> To get better results, you can downmix the result to mono and repeat the separation.\n\n- For people having issues with x-minus payment via Boosty:\n\n<https://boosty.to/uvr/posts/5d88402e-9eb1-4046-a00a-cf8b09e27561>\n\n- For instrumental residues in vocals, AIs designed for voice recorded with a home microphone can sometimes be used (e.g. Goyo [now Supertone Clear], or even Krisp, RTX Voice, AMD Noise Suppression, Elgato Wave Link 3.0 Voice Focus or Adobe Podcast as a last resort). It all depends on the type of vocals and how destructive the AI can get.\n\n- iZotope Ozone 11 has been released. The Advanced Edition costs $1200, and it’s the only edition with Spectral Recovery. Music Rebalance is said to use Demucs instead of Spleeter now.\n\n<https://www.izotope.com/en/products/ozone.html>\n\n- Acon Digital has released [Remix](https://acondigital.com/products/remix), their first plug-in capable of real-time separation into five stems: Vocals, Piano, Bass, Drums, and Other.\n\n“Just listened to the demo, not great but still”\n\n###### - [RemFX](https://huggingface.co/spaces/mattricesound/RemFx) for detection and removal of the following effects: chorus, delay, distortion, dynamic range compression, and reverb. 
[Huggingface](https://huggingface.co/spaces/mattricesound/RemFx) (currently not working) | [Samples](https://csteinmetz1.github.io/RemFX/)\n\nThe [Colab](https://colab.research.google.com/drive/1LoLgL1YHzIQfILEayDmRUZzDZzJpD6rD) is slow at downloading [checkpoints](https://github.com/mhrice/RemFx/blob/main/scripts/download_ckpts.sh) from Zenodo (400KB/s for a 1GB file out of 6); later it stopped working.\n\nOutputs, at least on Huggingface, are mono, and it may not work in every case. The website in general doesn't work well with big files, so keep them short (0-30 seconds).\nSometimes even 30 seconds is too long on Colab, and it throws an OutOfMemoryError.\n\nIt's not better than our dereverb model in UVR.\n\nTo fix the Colab:\n\n“The speechbrain lib API was totally changed in the recent 1.0.0 version; it works if you downgrade it:\n!pip install speechbrain==0.5.16”\nOriginal [repo](https://github.com/mhrice/RemFx) for running locally.\n\n- A beta UVR patch has also been released for x86\\_64 & M1 Macs:\n\n<https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_8_28_23_2_9_BETA_MacOS_x86_64.zip>\n\n“If you have any trouble running the application, and you've already followed the \"MacOS Users: Having Trouble Opening UVR?\" instructions here, try the following:\n\nRight-click the \"Ultimate Vocal Remover\" file and select \"Show Package Contents\".\n\nGo to -> Contents -> MacOS ->\n\nOpen the \"UVR\" binary file.”\n\nIn case of further issues, check this out:\n<https://www.youtube.com/watch?v=HQsazeOd2Iw&feature=youtu.be>\n\nWith some models, e.g. Denoise Lite, it may ask for parameters. Set 4band\\_v3 and 16 channels, then press Yes on the empty window.\n\n“The Mac beta is not stable yet.” - Anjok\n\n- \"The new beta [UVR] patch has been released! I made a lot of changes and fixed a ton of bugs. A public release that includes the newest MDX23 model will be released very soon. 
Please see the change log via the following message - <https://discord.com/channels/708579735583588363/785664354427076648/1145622961039101982>\"\n\nPatch:\n\n<https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_8_28_23_2_9_BETA.exe>\n\n- \"I found a way to bypass the free sample limits of Dango.ai. With VPN and incognito, when the limit appears, change the date on the computer or other device (I set the next day) and close and re-open the incognito tab. Sometimes it can show a network error; in such a case, restart the VPN and re-enter incognito again\" Tachoe Bell\n\n- Bas' guide to changing the region to US for Ripple on iOS:\n\n<https://media.discordapp.net/attachments/708595418400817162/1146727313963237406/Ripple_iOS_iPad_mini_2_-_demo.mp4>\n\n- Another way to use Ripple without an Apple device:\n\nSign up at <https://saucelabs.com/sign-up>\n\nVerify your email, upload this as the IPA: <https://decrypt.day/app/id6447522624/dl/cllm55sbo01nfoj7yjfiyucaa>\n\nThe rotating puzzle captcha for the TikTok account can be taxing due to the low framerate. Some people manage it after two tries, while others run out of credits first or are completely unable to do it.\n\n- Every 8 seconds there is a chunking artifact in Ripple. The Heal feature in Adobe Audition works really well for it:\n\n<https://www.youtube.com/watch?v=Qqd8Wjqtx-8>\n\n- The same explained using RX 10 and its Declick feature:\n\n<https://www.youtube.com/watch?v=pD3D7f3ungk>\n\n- Ripple/SAMI Bytedance's API was found. 
If you're Chinese, you can get through it more easily.\n\nFor the sami-api-bs-4track (the one with 10.8696 SDR vocals), you need to pass Volcengine facial/document recognition, apparently only available to Chinese people:\n\n<https://www.volcengine.com/docs/6489/72011>\n\nWe already evaluated its [SDR](https://mvsep.com/quality_checker/entry/4750), and it even scored a bit better than Ripple itself.\n\nThis is the Ripple audio uploading API:\n\n<https://github.com/bitelchux/TikTokUploder/blob/2a0f0241a91b558a7574e6689f39f9dd9c39e295/uploader.py>\n\nThere's a sample script on the Volcengine SAMI page.\n\n\"The API from Volcengine only returns 1 stem result per request, and it offers vocal+inst only; other stems are not provided. So making a quality checker result on vocal + instrument will cost 2x its API charge\n\nSomething good is that the Volcengine API offers 100 min free for new users\"\n\nThe API costs 0.2 CNY per minute.\n\nIt takes around 30 seconds per song.\n\nSeparating 1 stem of MVSEP's multisong dataset (100 tracks x 1 minute) cost 1.272 USD.\n\n- (outdated) Using Ripple on an M1 remote machine turned out to be successful but very convoluted.\n\n<https://discord.com/channels/708579735583588363/708579735583588366/1143710971798507520>\n\n- It is possible that for a particular song, an older version of mdx23 (mdx23cmodel3.ckpt) gives \"a much better extraction than D1581 and the current 4 model ensemble on MVSEP for preserving the instruments (also organ-like instruments)\"\n\n- It seems Google raised the Colab limit for free users from 1 hour to 5 hours. 
It depends on the session, but in most cases you should now be able to perform tasks taking over 4 hours.\n\n- How to change the region to US in the Apple App Store to make \"[Ripple - Music Creation Tool](https://apps.apple.com/us/app/ripple-music-creation-tool/id6447522624)\" (SAMI-Bytedance) work:\n\n<https://support.apple.com/en-gb/HT201389>\n\n<https://www.bestrandoms.com/random-address-in-us>\n\nOr use [this](https://media.discordapp.net/attachments/708579735583588366/1143659641277010033/20230822_233327.jpg) Walmart address in Texas (the number belongs to an airport).\n\nDo it in the App Store (via the person icon in the top right).\n\nYou don't have to fill in credit card details. When you are rejected, reboot and check region/country; it may already be set to the US.\n\nHowever, for some users it won't let you download anything, forcing your real country.\n\n\"I got an error because the zip code was wrong (I did enter random numbers) and it got stuck even after changing it.\n\nSo I started from the beginning, typed in all the correct info, and voilà\"\n\nIf you have a store credit balance, \"you must spend your balance before you can change stores\".\n\nYou may also need a SIM card (an old one?) to log your old account out, if necessary.\n\n- The long-awaited app made by Bytedance, with one of their SAMI variants from the MDX23 competition (which holds the top of our MVSEP leaderboard), was published on iOS for the US region only\n\n(with a separate possibility to sign up for beta testing, also not for people outside the US, although the app is already in the official store anyway; that was before the official release at the end of June, so it's older news).\n\nIt's a multifunctional app for audio editing, which also contains a separation model.\n\nIt's free, called:\n\n\"Ripple - Music Creation Tool\"\n\n<https://apps.apple.com/us/app/ripple-music-creation-tool/id6447522624>\n\nThe app requires iOS 14.1 (it's only for iOS).\n\nOutput files are 4 stems in 256kbps M4A (320 max).\n\nCurrently, it has the best [SDR](https://mvsep.com/quality_checker/entry/4744) of any public model/AI, and it gives the best results for vocals in general. For instrumentals, it probably doesn’t beat the paid Dango.ai (or KaraFan either).\n\n\"My only thought is trying an iOS Emulator, but every single free one I've tried isn't far-fetched where you can actually download apps, or import files that is\"\n\nSideloading of this mobile iOS app is possible on at least M1 Macs.\n\n\"If you're desperate, you can rent an M1 Mac on Scaleway and run the app through that for $0.11 an hour using this <https://github.com/PlayCover/PlayCover>\"\n\nIPA file:\n\n<https://www.dropbox.com/s/z766tfysix5gt04/com.ripple.ios.appstore_1.9.1_und3fined.ipa?dl=0>\n\n\"been working like a dream for me on an M1 Pro… I've separated 20+ songs in the last hour\"\n\n\"bitrise.com claims to have M1s and has a free trial\"\n\nScaleway method:\n<https://cdn.discordapp.com/attachments/708579735583588366/1146136170342920302/image.png>\n\n“keep in mind that the vm has to be up for 24 hours before you can remove it, so it'll be a couple bucks in total to use it”\n\n\"I used decrypted ipa + sideloadly\n\nseems that it doesn't have internet 
access or something\"\n\nSo far, Ripple hasn't beaten voc\\_ft (although there might be cases when it's better) or Dango. Samples we got months ago are very similar to those from the app; also, the \\*.models files have a SAMI header and MSS in the model files (which use their own encryption), although processing probably relies fully on external servers, as the app doesn't work offline (the model files are also suspiciously small, a few megabytes, although that's typical for mobilenet models). It's probably not the final iteration of their model, as they allegedly told someone they were afraid their model would leak, but it's better than the first iteration judging by SDR, even with lossy input files.\n\nLater they said that it’s a different model from the one they previously evaluated, which at that time was trained on lossy 128kbps files due to some “copyright issues”.\n\nMost importantly, it's good for vocals, also for cleaning vocal inverts, and surprisingly good for e.g. Christmas songs (it handled hip-hop, e.g. Drake, pretty well). It's better for vocals than for instrumentals due to residues in the other stem; bass is “so” good, and drums are also decent. 
Vocals can be used for inversion to get instrumentals, and it may sound clean, but probably not as good as what the 2-stem option or a 3-stem mixdown gives.\n\nResidues in the other stem appear because, as they said, the other stem is taken from the difference of all remaining stems; they didn’t train a model for the other stem, to save on separation time.\n\n\"One thing you will notice is that in the Strings & Other stem there is a good chunk of residue/bleed from the other stems, the drum/vocal/bass stems all have very little to no residue/bleed\" (this doesn't occur in all songs).\n\nIt's fully server-based, so they may be afraid of heavy traffic if they publish the app worldwide, and it's not certain that it will happen.\n\nThanks to Jorashii, Chris, Cyclcrclicly, anvuew and Bas.\n\nPress information:\n\n<https://twitter.com/AppAdsai/status/1675692821603549187/photo/1>\n\n<https://techcrunch.com/2023/06/30/tiktok-parent-bytedance-launches-music-creation-audio-editing-app/>\n\nBeta testing\n\n<https://www.ripple.club/>\n\n- The following models were added on MVSep:\n\nUVR-De-Echo-Aggressive\n\nUVR-De-Echo-Normal\n\nUVR-DeNoise\n\nUVR-DeEcho-DeReverb\n\nThey are all available under the \"Ultimate Vocal Remover HQ (vocals, music)\" option (the MDX FoxJoy MDX Reverb Removal model is available as a separate category).\n\n- If you were looking for a way to pay for Dango using Alipay: they recently introduced the possibility to link foreign cards, and if that option fails (it sometimes does), you can open a 6-month “tourcard”, and open a new one later if necessary, but only Visa, Mastercard, Diners Club and JCB cards are supported for topping up the tourcard\n\n<https://ltl-beijing.com/alipay-for-foreigners/>\n\n- Dango no longer supports Gmail email accounts\n\n- A new piano model was added on MVSEP. SDR-wise it’s better than GSep, although GSep probably also uses some kind of processing to get better separation results; but e.g. 
Dango instrumentals can be inverted to get just vocals, despite the fact that they claim to use some recovery technology.\n\n- [arigato78 method](#_vktvthhthrvh) for main vocals\n\n- Captain Curvy method for instrumentals added in the instrumental models list section (the top link)\n\n- For canceling room reverb, check out:\n\nReverb HQ\n\nthen\n\nDe-echo model (J2)\n\n- Sometimes voc\\_ft can pick up SFX\n\n- Install UVR5 GUI only in the default location picked by the installer. Otherwise, you might get a python39.dll error on startup. If you see that error after installing the beta patch, reinstall the whole app.\n\n- A few of our users finally evaluated the new dango.ai 9.0 models by ear. It turns out the models are not UVR's (or no longer are), and they actually give results pretty close to the original instrumentals, but not-so-good vocals.\n\n\"It's slightly better but still voc\\_ft keeps more reverb/delays\n\nbut again, it's 99% close, Dango has maybe more noise reduction\" (maybe even fewer instrumental residues, which can be a result of noise reduction).\n\n\"A bit cleaner than voc\\_ft in terms of having synths/instruments, but they do sound a bit filtered at times. [In] overall it's close tho\"\n\n\"I discovered Dango's conservative mode keeps instrumentals even fuller, but might introduce some background vocals\n\nstill quite better than what we have.\n\nI'm still surprised how it's so clean, as if not having vocal residues like any other MDX model. Sometimes the Dango sounds like a blend of VR's architecture, but I'm probably wrong, it could be the recovery technology\" - becruily\n\n<https://tuanziai.com/vocal-remover/upload>\n\nYou must use the built-in page translation option in e.g. Google Chrome, because the site is in Chinese.\n\nOn Android, it may not work correctly. 
In case of further issues, use Google Translate or one of the Yandex apps with image-to-text translation.\n\nYou are able to pay for it using Alipay outside China.\n\nDango redirects to the Tuanziai site; it's the same service.\n\n<https://tuanziai.com/encouragement>\n\nHere you might get 30 free points (for 2 samples) and 60 paid points (for 1 full song) \"easily\".\n\nDango.ai scores badly in SDR leaderboards due to the recovery algorithms applied. The situation is probably similar to GSep.\n\n- New BVE model on X-Minus for premium users. One of the best so far, if not the best. It uses voc\\_ft as a preprocessor.\n\n\"BVE sounds good for now but being an (u)vr model the vocals are soft (it doesn’t extract hard sounds like K, T, S etc. very well)\"\n\n\"Pretty good, if still [in] training. Seems to begin a phrase with a bit of confusion between lead and backing, but then kicks in with better separation later in the phrase. Might just be the sample I used, though.\"\n\n- Jarredou published the final [2.2 version](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) of the MDX23 Colab (don't confuse it with the MDX23C single models of the v3 arch). It gives more vocal residues than 2.0/2.1, but better SDR. Now it has the SRS trick, bigshifts, new fine-tuning, separate overlap parameters for MDX, MDXv3 and Demucs models, and it also includes one narrowband MDX23C model, D1581, among other MDX ones, which makes up a new set of models (it's also said to use VOC-FT Fullband SRS instead of UVR-MDX-Instr-HQ3, although HQ3 is still listed during processing). You can also use the faster optional 2-stem-only output (only the demucs\\_ft vocal stem is used here). The float parameter returns 32-bit WAV. Don’t set overlap v3 to more than 10, or you’ll get an error. 
It can be way more frequent with odd values.\n\nChanging weights added: “For residues, I would first try a different weight balance, with less MDXv3 and more VOC-FT, as model D1581, and current MDXv3 models in general tend to have more residues than VOC-FT.”\n\n- A new \"8K FFT full band\" model was published on MVSEP. Currently, it has a better score than the 2.2 Colab above (among commonly available solutions), although it leaves more vocal residues than the current default on MVSEP, at least in some cases, and the “voice sounded more natural [in default] than the new 10 SDR model”. In some problematic songs, however, it can give the best results so far.\n\n\"Sometimes 8K FFT model is false detect the vocals, in the vocal stem synth was treated as vocal. On instrumental stem, mostly are blur result compared with 12K FFT. But 12K FFT seems to be some vocal residue but very less heard (like a whisper) and happened for several songs, not all songs.\"\n\n- \"The karaoke ensemble works best with isolated vocals rather than the full track itself\" Kashi\n\n- Center isolation method further explained in *Tips to enhance separation, step 19*\n\n- VR Kara models freeze on files over ~6 minutes in UVR beta 2 (GTX 1080).\n\n> Divide your song into two parts.\n\n- A new public dataset was published by Moises ([MoisesDB](https://developer.moises.ai/blog/moises-news/introducing-moisesdb-the-ultimate-multitrack-dataset-for-source-separation-beyond-4-stems)). There are some problems with downloading it now: it’s 82.7GB, and the link expires after 600 seconds during the download. That's not enough at 30MB/s, but fine on a 10Gbps connection. The Moises team is working on the issue; it's probably fixed already.\n\n- RipX now uses UVR inside the app for gathering stems. Consider also comparing its stem cleanup feature to the RX 10 debleed in RX Editor.\n\n- “RipX is badass for removing residues and harmonics from vocals. 
The ability to remove harmonics & BGVs using RipX is amazing, but very tedious; so far so good” (Kashi)\n\n- Sometimes using a vocal model like voc\\_ft on the result from an instrumental model might give fewer vocal residues, or sometimes even none (Henry)\n\n- mvsep1.ru now contains the content of mvsep.com, i.e. without the MDX23/C and login features, while mvsep.com has the richer content of mvsep1.ru\n\nThe old leaderboard link has changed and is now:\n\n<https://mvsep1.ru/quality_checker/leaderboard2.php?sort=instrum>\n\n- The old domain is also fixed now, redirecting leaderboard links.\n\nIf your upload in the quality checker has stalled, clear your browser data and start over.\n\n- Dereverb and denoiser models for the VR arch are not compatible with any VR Colab, and manual installation of such a model will fail with errors. It requires modifying nets and layers. [More](https://discord.com/channels/708579735583588363/708595418400817162/1116150594512625686)\n\n- New best ensemble (all Avg/Avg)\n\n(read the entry details on the [chart](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum) for settings; they can have very time-consuming parameters and differ in that aspect)\n\n#1 MDX23C\\_D1581 + Voc FT | #2 MDX23C\\_D1581 + Inst HQ3 + Voc FT | #3 MDX23C\\_D1581 + Inst HQ3 + Voc FT\n\nBe aware that the above can sound noisy/have vocal leaks at times; in that case, consider using HQ\\_3 or kim inst. Also:\n\n- The best ensembles so far in Kashi's testing for general use:\n\nKim Vocal 2 + Kim FT other + Inst Main + 406 + 427 + htdemucs\\_ft avg/avg, or:\n\nVoc FT, inst HQ3, and Kim FT other (kim inst)\n\n“This one's much faster than the first ensemble and sometimes produces better results”\n\nIt all depends on the song. Also, sometimes \"running one model after another in the right order can yield much better results than ensembling them\".\n\n- Disable \"stem combining\" for vocals inverted against the source. 
Might be less muddy, possibly better SDR.\n\nThe option is there for MDX23C because the new arch supports separating multiple stems with one model file.\n\n- Disabling \"match freq cutoff\" in the advanced MDX settings seems to fix the 10kHz cutoff issue in vocals of the HQ3 model.\n\n- New explanations of Demucs parameters added to the Demucs 4 section\n\n(shifts 0, overlap 0.99 [won](https://mvsep.com/quality_checker/entry/3848) in SDR [vs](https://mvsep.com/quality_checker/entry/3772) shifts 1, overlap 0.99 and [even](https://mvsep.com/quality_checker/entry/435) shifts 10, overlap 0.95)\n\n- \"Last update of Neutone VST plugin has now a Demucs model to use in realtime in a DAW\n\n(it's a 'light' version of Demucs\\_mmi)\n\n<https://neutone.space/models/1a36cd599cd0c44ec7ccb63e77fe8efc/>\n\nIt doesn't use the GPU, and it's configured to be fast with very low parameters; also, the model is not the best on its own. It doesn't give decent results, so it's better to stick to other realtime alternatives (see document outline)\n\n- It turns out that with a GPU with lots of VRAM (e.g. 24GB), you can run two instances of UVR, so processing will be faster. You only need to use 4096 segmentation instead of 8192.\n\nThe SDR difference between overlap 0.95 and 0.99 for the voc\\_ft MDX model in (new/beta) UVR is 0.02.\n\nOverlap 0.8 seems to be the sweet spot for ensembles.\n\n12K segmentation performed worse than 4K SDR-wise.\n\n- Recommended balanced values between quality and time for 6GB graphics cards in the latest beta:\n\nVR Architecture:\n\nWindow Size: 320\n\nMDX-Net:\n\nSegment Size: 2752 (1024 if it’s taking too long)\n\nOverlap: 0.7/0.8\n\nDemucs:\n\nSegment: Default\n\nShifts: 2 (def)\n\nOverlap: 0.5\n\n(exp. 0.75, def.
0.25)\n\n\"Overlap can reduce/remove artifacts at audio chunks/segments boundaries, and improve a little bit the results the same way the shift trick works (merging multiple passes with slightly different results, each with good and bad).\n\nBut it can't fix the model flaws or change its characteristics\"\n\n“Best SDR is a hair more SDR and a shitload of more time.\n\nIn case of Voc\\_FT it's more nuanced... there it seems to make a substantial difference SDR-wise.\n\nThe question is: how long do u wanna wait vs. quality (SDR-based quality, tho)”\n\n- A script with a guide for [separating multiple speakers](#_ak53injalbkf) in a recording was added\n\n- If separation gets stuck at 5% in UVR beta, try dividing your audio into smaller pieces (it's a regression in the beta)\n\n- A new separation site appeared, giving seemingly better results than Audioshake:\n\n<https://stemz.mwm.io/>\n\n“Guitar stem seems better than Demucs, piano maybe too. Drums sound like Spleeter. Vocal bleeds in most of the stems, or not vocals are picked up, so they end up in the synths. But that's just from one song test” becruily\n\n- The Drumsep [Colab](https://colab.research.google.com/drive/1wws3Qm3I1HfMr-3gAyW6lYzUHXG_kuyz?usp=sharing#scrollTo=ZHabZkRf4ZNK) now has GPU acceleration and optional settings for much better max quality\n\n- The 1620 MDX23C model was added on x-minus.
Unlike the model in UVR, it's fullband and not released yet (16.2 SDR).\n\n\"Even if the separations have more bleeding than VOC-FT (and it's an issue), the voice sound itself is much fuller, \"in your face\" compared to VOC-FT, that I now find it like blurry sounding compared to MDXv3 models.\n\nI think that's why the new MDXv3 models are scoring better despite having more bleeding (at the moment, like I said before, trainers/finetuners have to get familiar with new arch, and that will probably help with that new bleed issue).\"\n\n- A new MDX23C model was added on MVSEP (better SDR: 16.17)\n\n- UVR beta [patch 2](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_7_11_23_20_51_BETA.exe) fixes the no-audio issue with GPU separation on GTX 1600 series cards using the MDX23C arch. It fixes some other bugs too.\n\n- The narrowband MDX23C vocal model (MDX23C\\_D1581 a.k.a. model\\_2\\_stem\\_061321) trained by the UVR team has been released. Its SDR is said to be better than voc\\_ft (but the latter was evaluated with an older non-beta version). Be aware that CPU processing returns errors for MDX23C models, at least on some configs (“deserialize model on CUDA” error). Fullband models will be released in a few weeks (as usual, on x-minus first, and elsewhere a few weeks later). [Download](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/MDX23C_D1581.ckpt) (install the [beta patch](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_7_11_23_20_51_BETA.exe) first and drop the model into the MDX-Net models folder). For now, the patch is Windows-only, with a Mac patch planned later. For Linux, the source of the patch is probably already out.\n\nMDX23C\\_D1581 parameters are set up with its yaml config file, and its n\\_fft value is 12288, not 7680. It has a cutoff at 14.7kHz (while the VOC-FT cutoff is 17.5kHz)\n\n- \"(Probably all) models are stereo and can't handle mono audio.
You have to create a fake stereo file with the same audio content on the L and R channel if the software doesn't make it by itself.\" Make sure that neither channel is empty when the separation runs; an empty channel can produce quiet vocal bleeding in the opposite channel (this happens in e.g. MDX23 and GSEP, and mono input causes errors in MDX-Net)\n\n- If you get an “Unbound local” error whenever you do anything in UVR after installing the new model, you might be forced to roll back the update\n\n- Clear the Auto-Set Cache in the MDX-Net menu if you set a wrong parameter and end up with an error\n\n- Pitch shift is the same as soprano mode, except that in the GUI beta you can choose by how many semitones to shift the pitch\n\n- Dango.ai released a 9.0 model. So far, we've received a very positive report on it.\n\n- A UVR beta patch was released, potentially increasing SDR with the same models.\n\nIt adds segmentation and overlap settings for MDX models, plus batch mode changes.\n\nThe soprano trick was added; you can set it in semitones.\n\nSupport for the MDX-NET23 arch. For now, it uses only the basic models attached by Kuielab (low SDR, so don't bother for now), but the UVR team has already trained their own model for that arch, which will be released later (in UVR, a few weeks after x-minus and MVSep). And it's performing well already. Wait™. Don't exceed an overlap of 0.93-0.95 for MDX models; processing gets tremendously long without much of a difference, and 0.8 might be a good choice as well. Also, high segment values can tank performance AF.
2560 might still be a high but balanced value.\n\nSadly, it looks like max mag for single models is no longer available; you can use it only in Ensemble Mode for now.\n\nQ: What is the Demucs pre-process model?\n\nA: It lets you process the input with another model that does a better job at removing vocals, and Demucs then separates the result into the other 3 stems\n\n*Beta UVR patch* [link](https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_7_7_23_6_34_BETA.exe)\n\n- \"Post-Process [for VR] has been fixed, the very end bits of vocals don't bleed anymore no matter which threshold value is used\"\n\n- A new BVE model will be ready at the beginning of August (Aufr33).\n\n- ZFTurbo's MDX23C model(s) were added on mvsep.com. He trained them using the new 2023 MDX-Net V3 arch.\n\nOn its own, a single model has slightly worse SDR than the MDX23 2.1 Colab.\n\nMight be good for rock; best when all three models are weighted/ensembled.\n\n- The weighted MDX23C ensemble is available on mvsep.com for premium users (best SDR for a public 2-stem model).\n\nIt might still leave some instrumental residues in the vocals of some tracks (which can be cleared with the MDX-UVR HQ\\_3 model), but it can also be the other way round, the same issue as with the Kim vocal models, where vocals are slightly left in the instrumentals [vs e.g. MDX23 2.1, which is free of the issue]\n\nOn some Modern Talking and CC tracks it can give the best results so far.\n\n- If you have problems with “Error when uploading file” on MVSEP, use a VPN. Similar issues can happen on free X-Minus for users in Turkey.\n\n- The lalal.ai cooperation with MVSEP was fake news. Move along.\n\n- As for Drumsep (separating single percussion instruments from the drums stem), besides the fixed [Colab](https://colab.research.google.com/drive/1wws3Qm3I1HfMr-3gAyW6lYzUHXG_kuyz?usp=sharing), you can also use it in the UVR GUI.
How to do this:\n\nGo to UVR settings and open the application directory.\n\nFind the \"models\" folder and go to \"demucs models\", then \"v3\\_v4\".\n\nCopy and paste both the [.th](https://drive.google.com/file/d/1S79T3XlPFosbhXgVO8h3GeBJSu43Sk-O/view) and [.yaml](https://cdn.discordapp.com/attachments/708579735583588366/1124699600717094932/drumsep.yaml) files there, and it's good to go.\n\nOverlap above 0.6 or 0.7 becomes placebo, at least for a dry track with no effects.\n\n- Drumsep benefits a lot from shifts (you can use even 20).\n\n- For better results, also try -6 semitones in UVR beta, or a 31183Hz sample rate with changed tempo.\n\nGoing 12 semitones down from 44100Hz gives 22050Hz, which should be less usable in most cases; the same goes for having tempo preservation on.\n\n- If you get a long band\\_net [error](https://cdn.discordapp.com/attachments/708595418400817162/1124780489522282686/Captura_de_pantalla_2023-07-01_121518.png) log while using the DeNoise model by Fox Joy in UVR, reinstall the app.\n\n- Every second separation in the MDX Colab may fail due to memory issues, at least with the Karaoke 2 model.\n\n- A new fine-tuned vocal model, \"UVR-MDX-Net-Voc\\_FT\", was added to the UVR5 GUI Download Center and HV Colab (slightly better SDR than Kim Vocal 2). It's narrowband, because it's based on previous models.\n\n- The Audioshake 3-stem model was added to <https://myxt.com/> for free demo accounts. Unfortunately, its WAVs have a 16kHz cutoff, which Audioshake normally doesn't have. No other stem. Results are maybe slightly better than Demucs.\n\nMight be good for vocals.\n\n- SpectraLayers 10 received an AI update: it no longer uses Spleeter but Demucs 4, and it now also offers good kick, snare and cymbal separation. Good opinions so far. Compared to Drumsep, sometimes it's better, sometimes it's not. Versus the MDX23 Colab V2, instrumentals sometimes sound much worse.
“SpectraLayers is great for taking Stems from UVR and then carrying on separating further and editing down. (...) Receives a GPU processing patch soon”\n\n- Some (?) MDX Colabs started throwing errors about an insufficient driver version.\n\n> \"As a temp workaround you can go to \"Tools\" in the main menu, and \"Command Palette\", and search for \"Use fallback runtime version\", and click on it, this will restart the notebook with the previous Ubuntu version in Colab, and things should works as they were before (at least till mid July or earlier [how it was once] where it is currently scheduled to be deleted)\" It will probably be fixed.\n\nX: Some people get an error that the fallback runtime is unavailable.\n\n- A new v2 version of **ZFTurbo's MDX23** [Colab](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/) was released by jarredou (now also with denoiser off memory fix added). It should now have less bleeding in general.\n\nIt includes models swapped for better ones (Kim Vocal 2 and HQ\\_3), volume compensation, fullband vocals, and a higher-frequency bleeding fix.
All of this results in increased SDR.\n\n**Instrum** is the inverted vocals stem (the mixture minus vocals)\n\n**Instrum2** is the sum of drums+bass+other stems (I used to prefer it, but most people rarely see any difference between both, and it also depends on specific fragments, although instrum gets better SDR and is less muddy, so it’s rather better to stick with instrum)\n\nIf your separation ends instantly with the path printed below, you entered the path incorrectly in the cell.\n\nSimply remove the `file - name.flac` at the end and leave only the path to the folder containing the file.\n\nThe Colab processes all files within that path/folder.\n\nSuggestion: go to drive.google.com, create a folder `input`, and drop the tracks you want to process in there.\n\nWhen the process is done, delete them, and add others you want to process.\n\nOverlap large and small are the main settings: higher values give a slightly higher score, but way longer processing.\n\nColab doesn't allow much higher chunk size values, but you can try slightly higher ones and see when it crashes due to memory. A higher chunk size gives better results.\n\n- [Updated](https://cdn.discordapp.com/attachments/708579735583588366/1123687734003904552/inference.py) inference with the voc\\_ft model (the [Colab](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb) v2.1 now has the denoiser on, but not the updated inference, which is essentially what 2.2 currently is).\n\n*- Volume compensation fine-tuning is in lines 359 (voc\\_ft), 388 (for ensembling the vocals stem), and 394 (for HQ\\_3 instrumental stem inversion).*\n\n- chunk\\_size = 500000 will fail with a 5:30 track; decrease it to 300K or lower in such a case.\n\nOverlap 0.8 is a good balance between duration and quality.\n\n- In case of a \"wav not found\" system error, simply retry the separation.\n\nA nice [instruction](#_jmb1yj7x3kj7) on how to use the Colab.\n\nThe v2.1 Colab was first evaluated with lower parameters, hence it received a slightly worse SDR.
Then it was evaluated again and got a better score than v2.\n\nWiP Colabs\n\n- 2.2 Beta [1](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/2d82810d0b6ff5781a5b64cef13ca7387fc95b77) (no voc\\_ft yet)\n\n- 2.2 Beta [1.5](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/tree/7171c4704992c3a878452f52df57c8455ec7cff9)\n\n- 2.2 Beta (1.5.1, [inference](https://drive.google.com/file/d/1QbNuY2acGnuJZD0Fczsul4kqJKRFBEgO/view?usp=sharing) with voc\\_ft, replace in the Colab above; no fine-tuning)\n\n- [v2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/597b5b7f653e4593a0a94938a3923077d66f8767/MVSep-MDX23-Colab.ipynb) beta 2/3 ([working inference](https://drive.google.com/file/d/1igkChOCO9L2zyrUp9roZ5TWff2aOhOQc/view?usp=sharing)) (MDX bigshifts, overlap added, fine-tuning, no 4 stems > experimental, no support for now, 22 minutes for vocals only, mdx: bsf 21, ov 0.15, 500k, 5:30 track)\n\n- [v2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/597b5b7f653e4593a0a94938a3923077d66f8767/MVSep-MDX23-Colab.ipynb) (w/ voc\\_ft [inference](https://drive.google.com/file/d/1bpZKZynmdsYcriF-M8t8yLtVLm7zRz5U/view?usp=sharing)) pre beta 3 w/o MDX v3 yet - comment out both bigshifts in the cell - they won’t work\n\n- current beta [link](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) (WiP, might be unstable at times; e.g.
here, as of 19.07, bigshifts doesn’t work, and you need to look for a working inference in the history or delete the two bigshifts references in the cell; it doesn’t seem that the MDX v3 model is here yet)\n\nIn general -\n\nMDX23 is quite an improvement over htdemucs\\_ft (...).\n\nDrum stem makes htdemucs\\_ft sound like lossy in comparison, absolutely beautiful\n\nBass is significantly more accurate, identifies and retains actual bass guitar frequencies with clarity and accuracy\n\n\"Other\", equally impressive improvement over htdemucs\\_ft, much more clarity in guitars\"\n\nAnd the vocal problems they originally described are probably fixed in the V2 Colab.\n\n- “I just added 2 new denoise models that were made by FoxJoy. They are both very good at removing any residual noise left by MDX-Net models. You can find them both in the \"Download Center\". - Anjok\n\nBe aware that they're narrowband (17.7kHz cutoff). Good results.\n\n*To download models from the Download Center:*\n\nIn UVR5 GUI, click the tools icon > click Download Center tab > Click radio button of VR architecture > click dropdown > select the model > hit Download button > wait for it to download... Profit.\n\n- New MDX-UVR “HQ\\_3” model released in UVR5 GUI! The best SDR for a single instrumental model so far. [Model file](https://cdn.discordapp.com/attachments/767947630403387393/1117709857223606300/UVR-MDX-NET-Inst_HQ_3.onnx) (but visiting the Download Center is enough). I think it's on X-Minus too.\n\n- The HQ\\_3 model was added to the [MDX Colab](https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb) (old)\n\n- HV just made a new version of her own [updated MDX Colab](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb) with all the new models, including HQ\\_3. It lacks e.g.
Demucs 2 for instrumentals of vocal models, but in return it allows using YouTube and Deezer links (the latter for lossless tracks, by providing an ARL), and lets you manually specify more than one file name to process at the same time. Also, for any new models in the future, there's an optional input for model settings to bypass the parameter autoloader. IIRC, the Colab stores its files in a different path, so keep that in mind when uploading tracks for separation to GDrive.\n\n- She added volume compensation in a new revision (applied automatically for each model)\n\nIn previous [MDX Colabs](#_aa2xhwp434) there were also min, avg, max, and chunks, but they're gone in the HV Colab.\n\n- HV also made a [new](https://colab.research.google.com/drive/1VnqwFkpjPLjMwUPmgjoZJQR1S8hd6CBJ) VR Colab which, IIRC, no longer clutters your whole GDrive but downloads only the models you use, and might work without GDrive mounting; however, it lacks VR ensemble.\n\n- New MDX models were added to both variants of MVSep (Kim inst, Vocal 1/2, Main [vocal model], HQ\\_2)\n\n- ZFTurbo’s MDX23 code now requires less GPU memory: “I was able to process file on 8 GB card. Now it's default mode.” However, 6GB VRAM is not enough. Lowering the chunk size (e.g. 500000 instead of 1000000) or splitting the track manually might be necessary in this case. Also, you can now control everything from the options, so you can set chunk\\_size to 200000 and use single ONNX. It may work with 6GB VRAM that way.\n\nOverlap large and small control the overlap of the song during processing. The larger the value, the slower the processing but the better the quality (for both)\n\nIf you get a \"fail to allocate memory\" error, use the --large\\_gpu parameter\n\nSometimes turning off \"use large GPU\" and reducing the chunk size from 1000000 to 500000 helps\n\n- Models/AIs of the 1st and 2nd place winners in the MDX23 music challenge (ByteDance’s and quickpepper947’s) sadly won’t be released to the public (at least won’t be open-sourced).
Maybe in June, ByteDance's model will be released as an app, in worse quality.\n\nJudging by the few snippets we had:\n\n\"the vocal output, yes, better than what can be achieved right now by any other model, it seems.\n\nthe instrumental output... meh. I can hear vocals in it, on a low volume level.\" But be aware that they improved their model a lot over time.\n\n- ZFTurbo's MDX23 4-stem model (3rd place) and its [source code](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model) with a dedicated app were released publicly, with the whole AI and instructions on how to run it locally. It no longer requires an Nvidia GPU with at least 16GB VRAM. It even has a neat GUI (3rd place in leaderboard C, better SDR than demucs ft). You can still use the model online on [mvsep1.ru](https://mvsep1.ru/) (now mvsep.com).\n\nThe command `conda install -c intel icc_rt` solves the LLVM error.\n\nFor the above, you can get fewer vocal residues by manually replacing the Kim Vocal 1 model there with the newer Kim Vocal 2, and Kim Inst with UVR Inst HQ 292 (“full 292 is a lot more aggressive than kim\\_inst”).\n\njarredou has already forked it with better models and settings.\n\nA short technical [summary](https://discord.com/channels/708579735583588363/911050124661227542/1107002345704927362) by ZFTurbo of what is under the hood, and a short [paper](https://arxiv.org/pdf/2305.07489.pdf).\n\nFrom what I see in the code, for instrumentals it uses the inverted vocals output from Demucs ft, hdemucs\\_mmi, Kim Vocal 1, and Kim Inst (ft other).
More explanations in the dedicated [MDX23](#_jmb1yj7x3kj7) section of this doc.\n\n- jarredou made a [Colab](https://colab.research.google.com/github/jarredou/MVSep-MDX23-Colab/blob/main/MVSep-MDX23-Colab.ipynb) version of ZFTurbo's MDX23:\n\n\"(It's working with `chunk\\_size = 500000` as default, no memory error at this value after few tests with Colab free)\n\nOutput files are saved on Colab drive, in the \"results\" folder inside MVSep installation folder, not in \\*your\\* GDrive.\"\n\nOn 19.05 its SDR was tested, and it scored better for instrumentals than the UVR5 ensemble of that time. That's no longer the case, but new versions of the Colab are planned.\n\n- ByteDance-USS was released with a [Colab](https://colab.research.google.com/drive/1lRjlsqeBhO9B3dvW4jSWanjFLd6tuEO9?usp=share_link) by jazzpear. It works better than zero-shot for SFX and “user-friendly wise”, while zero-shot is still better for instruments.\n\n\"<https://www.dropbox.com/sh/fel3hunq4eb83rs/AAA1WoK3d85W4S4N5HObxhQGa?dl=0>\n\nQueries for ByteDance USS taken from the DNR dataset. Just DL and put these on your drive to use them in the Colab as queries.\"\n\nA [QA](#_4svuy3bzvi1t) section was added.\n\n- The [modified](https://colab.research.google.com/drive/1CO3KRvcFc1EuRh7YJea6DtMM6Tj8NHoB?usp=sharing) MDX Colab now has automatic model downloading (no more manual GDrive model installation) and the Karaoke 2 model.\n\n> Separate inputs for 3 model parameters were added, so you don’t need to change models.py every time you switch to another model. Settings for all models are listed in the Colab. From now on, it uses the reworked main.py and models.py (made by jarredou), downloaded automatically. Don’t replace models.py from packages with models from [here](#_aa2xhwp434) now. A denoiser is now optionally available!\n\n- The MDX Colab with newer models is now reworked for the current Python 3.10 runtime, which all Colabs now use.\n\n- Since 28.04, lots of Colabs have had errors like \"onnxruntime module not found\".
Probably only the MDX Colab was affected.\n\n(not needed anymore)\n\n> \"As a temp workaround you can go to \"Tools\" in the main menu, and \"Command Palette\", and search for \"Use fallback runtime version\", and click on it, this will restart the notebook with the previous python version, and things should works as they were before\"\n\n- The [OG](https://colab.research.google.com/drive/189nHyAUfHIfTAXbm15Aj1Onlog2qcCp0?usp=sharing) MDX HV Colab is (also) broken due to torch-related issues (reported to HV). To fix it, add a new code cell below the mounting cell with `!pip install torch==1.13.1` and execute it after mounting\n\n> or use the fixed MDX [Colab](https://colab.research.google.com/drive/1CO3KRvcFc1EuRh7YJea6DtMM6Tj8NHoB?usp=sharing) with newer models and the fix added (now also with the old Karaoke models).\n\n- While using the [OG](https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/main/Vocal%20Remover%205_arch.ipynb) HV VR Colab, people are currently encountering issues related to **librosa**. The issues have already been reported to HV (the author of the Colab).\n\n> Use this [fixed](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl?usp=sharing) VR Colab for now (04.04.23). (The issue itself was fixed by uncommenting the librosa line and setting version 0.9.1, i.e. deleting \"#\" before the lines in the Mount to Drive cell. Fresh installation issues are now fixed too; the previous fix was probably based on too old an HV Colab revision.) The VR Colab is not affected by the May/April runtime issues.\n\n- If you have a fast CPU and only 4GB VRAM, consider using the CPU for ensembles; otherwise you can encounter more vocal residues in instrumentals. 11GB VRAM is good enough, maybe even 8GB.\n\n- New Kim instrumental model, \"ft other\".
Already added to UVR's Download Center with parameters.\n\nManual settings: dim\\_f = 3072, n\\_fft = 7680 <https://drive.google.com/drive/folders/19-jUNQJwols7UyuWO5PWWVUlJQEwpn78>\n\n(Unlike the HQ models, it has a cutoff, but better SDR than even inst3/464; added to the Colab)\n\n- Anjok (UVR5) \"I released an additional HQ model to the Download Center today. \\*\\*UVR-MDX-NET Inst HQ 2\\*\\* (epoch 498) is better at removing long drawn out vocals than UVR-MDX-NET Inst HQ 1.\" It has already been evaluated with slightly better SDR than HQ\\_1 for both vocals and instrumentals (HQ\\_1 was re-evaluated after the introduction of Batch Mode, which slightly decreases SDR for single models only vs previous versions incl. beta, but mitigates sudden vocal pop-ins on cards with <11GB VRAM)\n\n- Anjok (UVR5, non-beta) “So I fixed MDX-Net to always use **Batch Mode,** even when chunks are on. This means setting the chunk and margin size will solely be for audio output quality. Regardless of PC specs, users will be able to set any chunk or margin size they wish. Resource usage for MDX-Net will solely depend on Batch Size.”\n\nEdit: on 11GB cards, using the default batch size instead of enabled chunks for ensembles achieves better SDR, but separation takes longer.\n\n- A public UVR5 patch with batch mode and the final **full band** model (**MDX HQ\\_1**) was released\n\n- The 293/403 and 450/498 (HQ\\_1 and 2) full band MDX-UVR models were added to the [Colab](#_zaimpsi6j19a) (and to UVR), with a PyTorch fix added for the Colab\n\n- Besides x-minus, the **Wind** model (trumpet, sax) was also added to the UVR5 GUI\n\nYou'll find it in UVR5 in Download Center -> VR Models -> select model 17\n\n(10 seconds of audio separated with Wind model, from a 7-min track, takes 29 minutes to isolate on a 3rd gen i7 - might be your last resort if it crashes your 4GB VRAM GPU as some people reported)\n\n- (x-minus/Aufr33) \"1. \\*\\*Batch mode\\*\\* is now enabled.
This greatly speeds up processing without degrading quality.\n\n2. The \\*\\*b.v.\\*\\* models have been renamed to \\*\\*kar\\*\\*.\n\n3. A new \\*\\***Soprano voice**\\*\\* setting has been added for songs with the high-pitched vocals.\n\n\\*This only works with mdx models so far.\\*\"\n\nIt slows down the input file, similarly to the method we described in our tip section below.\n\n- A new MDX23 vocal model was added to the beta MVSEP site.\n\n## *- (no longer necessary)* [*Fork of UVR GUI*](https://github.com/Aloereed/ultimatevocalremovergui-directml) *and* [*How to install*](https://youtu.be/bw13MI-jMZ4) *- support for AMD and Intel GPUs (works only for VR and MDX architectures). Besides W11, W10 is also confirmed working. With a 6700 XT, MDX only reaches the speed of an i5-4460S CPU, while VR is very fast and comparable to CUDA, so CPU processing will likely be slower for VR, but for MDX you might want to stick with the official UVR5 GUI.*\n\n*- Batch mode seems to fix problems with vocal popping when using low chunk values in MDX models, and also enhances separation quality while eliminating lots of out-of-memory issues. It decreases SDR very slightly for single models, and increases SDR in ensembles.*\n\n- (outdated) A new beta MDX model, “Inst\\_full\\_292”, without the 14.7kHz cutoff was released (performs better than Demucs 4 ft). If the model didn’t appear on your list in UVR 5 GUI, make sure you’ve redeemed your code <https://www.buymeacoffee.com/uvr5/vip-model-download-instructions>\n\nOr use the [Colab](#_aa2xhwp434).\n\nNewer epochs are available for [paid](https://boosty.to/uvr) users of <https://x-minus.pro/ai?hp&test-mdx>\n\n- To use Colabs in mobile browsers, you probably no longer need to switch your browser to PC Mode first.\n\n*News section continues in* [older](#_yx8u0ahol7ao) news/update logs\n\n###### General reading advice\n\n*- If you found this document elsewhere (e.g.
as PDF),* [*here*](https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5%20c/) *is always the up-to-date version of the doc*\n\n*- If you have anything to add to this doc, ping me @deton24 on our Discord* [*server*](https://discord.gg/ZPtAU5R6rP) *from the footer, but rather refrain from PMing directly if not necessary. Every time you request writing privileges via GDoc, God kills a cat. Don't click “ask for privileges”!*\n\n*- You can use the (rather outdated)* [*Table of content*](#_sm5m61aib1vx) *section, but it's better to go to Options and show “****Document outline****” to see an up-to-date clickable table of contents. If you don't have Google Docs installed, opened the doc in a mobile browser, and no table of contents option appears, use* [*Table of content*](#_sm5m61aib1vx) *or go to the options of the mobile browser and run the site in PC mode to show the document outline (but it's better to have Google Docs installed on your phone instead, as it’s more convenient to use).*\n\n*- Sometimes you cannot scroll down the list of headings on the phone in PC mode. Then tap the scroll bar on the very left; it might suddenly look buggy, with everything highlighted, but it works nevertheless.*\n\n*- A downloaded .docx will have a similar document outline as in the GDoc (but messier, with all headers used in the GDoc). If you get an error when trying to open the .docx file on Windows, right-click it, go to Properties and check Unblock below Attributes.*\n\n*- Be aware that the document can hang for a while when you try to access a specific section. It doesn't happen often in a PC browser, which is the most stable way of reading the doc online, at least on a decent PC (so not a C2Q, but even a decade-old i7 is usually fine). But it can be stable on an Android phone too (e.g. Snapdragon 700 series instead of the old 400 series). Google’s app support for old 32-bit ROMs in e.g.
Android 9 and older is terrible.*\n\n*- Searching and navigating through the document outline work faster when you download the doc as .pdf or .docx, and in the latter you’ll have the document outline on the left like in GDoc (if not, press CTRL+F>Headings, or check View>Navigation window).*\n\n*- When visiting the online version of the doc, paste a whole phrase into the search instead of typing single letters, to avoid severe stuttering while using the search function online.*\n\n*- Use the search bar in Google Documents, not the one from the browser - the browser’s search won’t find everything unless all the pages were shown before - the doc is huge.*\n\n*- Sometimes if you search for a specific keyword in the mobile app and the result doesn't show up, you need to go to the document outline, open its last section and search again (so the whole document will be loaded first, otherwise you won't get all the search results in some cases). But it might happen mostly if you use the wrong search function.*\n\n*- When you click on a GDoc link with “heading” in the URL redirecting to a specific section, it should show the first page, and redirect after a few seconds.*\n\n*- You won't be able to move to a specific section immediately after opening the GDoc. It loads all the pages for around 30 seconds, so you might sometimes need to click twice before being redirected.*\n\n*- Make sure you've joined our Discord* [*server*](https://discord.gg/ZPtAU5R6rP) *to open some of the Discord links attached below (those without any file extension at the end).*\n\n*- Download links from Discord with file extensions at the end no longer work, but I reuploaded most of the important links already.
If you need to download from a previously shared Discord link anyway:\n1) Join our Discord server via the invitation at the top of the document 2) Delete the file name from the link 3) Open our Discord server in the browser 4) In the address bar, keep the link up to and including the server identifier and delete the rest (so channels/xxxx/) 5) Paste the two identifiers from the dead file link afterwards, separated by slashes, but without the file name (so channels/xxxx/xxxx/xxxx, where the last two are taken from the dead file link) \\*) If you paste a dead link in any channel on the source server, the link will work again*\n\n*- If the app crashes when opening the doc, e.g. on Android, reset the app cache and data. Keep the app updated, or find some old version (e.g. even from the period when your phone was released), or uninstall all updates if GDoc is your stock app*\n\n*- If the doc loads for 4 minutes or indefinitely in the app, update your Google Docs app and reset the app cache/data, e.g. if you started to have crashes after an app update.*\n\n*- You can share a specific section of this document by opening it on PC (or in a mobile browser set to PC mode) and clicking one of the sections in the document outline (or a hyperlink leading to a specific section). This adds a reference to the section to the link in your address bar, which you can copy and paste, so opening that link will redirect someone straight to the section (in some specific cases, some people won’t be redirected, but in fact, you only need to wait a few seconds after the first page of the doc has been shown, and then the proper redirect starts).*\n\n*Some headers not referred to in the outline are also set up so that when you click them, the address bar changes to a link leading to that specific section. Not all headers present in the doc are shown in the outline, to preserve readability.*\n\n*- In the GDoc app, sometimes you need to tap “wait” a few times when the app freezes.
Afterwards, searching will keep working (at least until the next time). The doc is huge, and the GDoc app on at least low-end Androids is cursed (the desktop version on PC is the most stable, as is the app on decent phones). You've been warned.*\n\n*- If you feel overwhelmed by the doc size, you can theoretically load the doc into Google Gemini or Google NotebookLM and ask questions there, but I encourage getting familiar with the document outline and reading the relevant sections yourself - asking an AI chat for the best models leads to hallucinations and a list of outdated separation models from this doc. Also, they all fail miserably at generating model and config links, even from cut fragments of the GDoc. They also cannot edit the document directly (unless you paste the text, but that strips all the formatting and hyperlinks, which are essential to the task); maybe Office 365 with a paid subscription is already capable of editing docs with AI directly.*\n\n*- Descriptions on the list of models are usually shortened compared to the information in the news section added when the model was released. Search for the model name to find more descriptions.*\n\n*- Published audio demos of models pasted from MVSEP go offline after a while, so most links to audio files from there will be dead after a week or more.*\n\n*- Without the GDoc app installed, or when your browser is not in PC mode, if you open links to this document ending with e.g. “#heading=h.hk34hc4d1ah7” or similar, you won't be redirected to the specific section of this document referred to in the heading.
Redirections from such links don't work in the mobile version of the GDoc site.*\n\n*To be redirected to the proper section after a moment from outside links with “heading” in the URL, open these links with the GDoc app installed, in a PC browser, or in a mobile browser with PC mode turned on (in Chrome, that option appears once a page is already open).*\n\n*- Sometimes when you click on an entry in the document outline right after loading the document, it might not react the first time. Most likely the document is still loading, and you need to click it twice or more before you're redirected.*\n\n*- Even on a powerful phone with lots of RAM, the GDoc app can occasionally crash, esp. when browsing the doc before it’s fully loaded, and even deleting the app and reopening it won’t help.*\n\n*- In February 2026, Chrome on Windows started closing after opening the document. Reopening it a few times helped, maybe along with uninstalling the offline GDoc extension and maybe after going to chrome://settings/content/all?sort=data-stored > Google > docs.google.com (I did all of these). The issue recurred even when the extension was only disabled (it was able to re-enable itself).*\n\n*- If you click on any hotlink redirecting to a specific part of this document from the mobile version of GDoc in the browser, you won't be able to display the document outline after switching your browser into PC mode - it will remain in the mobile layout.
This is because redirections in the mobile version append their own linking scheme to the site address - you need to delete that ending or reopen the doc.*\n\n*- Besides me, only jarredou (Discord: rigo2), dca100fb8 (both since 23 May 25) and isling (since 12 Feb 26) currently have writing privileges to this document, although the first two were reluctant to be active editors and were granted the privileges as a last resort in case of my longer absence in the future.*\n\n(I’m trying to keep the following list always updated, with the Last updates/news section at the top)\n\n## \\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nEveryone asks **which service and/or model is the best** for ***instrumentals, vocals or stems***. The answer is: we have listed a few models and services which perform best in most cases, but the truth is that the result also strongly depends on the genre, the specific song, and how aggressive and heavily processed its vocals are, as well as on how much distortion the instruments have, the mixing style, etc. Sometimes one specific album gets the best results with one specific tool/AI/model, but there might be exceptions for specific songs, so feel free to experiment to get the best result possible using the various models, ensembles and services/AIs listed. SDR on MVSEP doesn't always reflect bleeding well. That’s why we also introduced bleedless and fullness metrics for evaluating the models. You’ll read more about it in [this](#_le80353knnv5) section.\n\n“Some people don't realize that if you want something to sound as clean as possible, you'll have to work for it. Making an instrumental/acapella sounding good takes time and effort. It's not something that can be rushed. Think of it like (...) love to a woman. You wouldn't want to just rush through it, would you?
Running your song through different models/algorithms, then manually filtering, EQing, noise/bleed removing the rest is a start. You can't just run a song through one of these models and expect it to immediately sound like this” - rAN\n\nSometimes you might want to combine the results of different models in specific fragments of a song.\nIf the song is too muddy, you might want to use the demudder in [newer](#_6y2plb943p9v) UVR patches and/or some free [AIs](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07) like AudioSR, Apollo or others to further enhance the result.\nIf you’re still not happy, you might want to manually mix the separated song stems and/or master the result using plugins or AI mastering services (more about it [here](#_k34y1vaaneb1)).\n\nSometimes you can create a loop out of fade-ins/outs, intros or outros, so you can avoid AI separation entirely in the key fragments of the song, use the AI output only as a reference to arrange the song as it was, and fill only the missing fragments with separation.\n\nA good starting point is to have [a lossless song](#_nspwy0bkpiec) (the result will be a bit less muddy after separation).\n\nTo get decent instrumentals/vocals from free separation AIs/models, you can use the solutions below, starting with the models at the top (every song might work differently with different models, so find the best one for your song; also, various headphones and speakers reveal bleeding to different degrees, and in songs with a dense mix it will often be unavoidable without the phase fixer):\n\n### **The best models**\n\n### **for specific stems**\n\n*There's no such thing as the best model. It depends on the song, even within a specific genre, mixing, effects, etc.\nYou need to test the best models posted at the top here and see what fits your song best, or cut it into pieces and/or check* [*ensembles*](#_nk4nvhlv1pnt)*.
For constant vocal buzzing, check* [*phase fixer*](#_j14b9cv2s5d9)*/swapper in e.g. UVR>Tools (and use the bleedless inst result of a vocal model as the reference/source). Create an account when using MVSEP, or you’ll have a very long queue.*\n\n- Most models here use the Mel-Roformer or BS-Roformer model type in the compatible [UVR](#_6y2plb943p9v) version - not the v2 model type\n(there's only one V2 model so far); don’t confuse it with e.g. v2 versions/iterations of the models below, where v2 is just part of their names\n- Read about the [*SDR*](#_sc2lgq9t4p19) *and* [fullness](#_le80353knnv5) *metrics.* [*Evaluations*](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum) *are made on the multisong dataset on MVSEP (the table can also be sorted by fullness/bleedless and other metrics; once you open a result, fullness/bleedless metrics are shown too, except for old results)*\n\n**2 stems:**\n\n###### > for instrumentals *(click here for* [*ensembles*](#_nk4nvhlv1pnt)*, or* [*here*](#_n8ac32fhltgg) *for vocal models)*\n\n- Model names starting with MVSEP can be used only on [MVSEP](https://mvsep.com/) (no download links available)\n\n*Good all-rounders from various categories (balanced, fullness, bleedless):*\n\n- Becruily dual Mel-Roformer “[deux](https://huggingface.co/becruily/mel-band-roformer-deux/tree/main)” model (its instrumental stem)\n\nCompatible with the latest UVR Roformer [patch](#_6y2plb943p9v) (including the RTX 5000 one) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [uvronline.app](https://uvronline.app/) | [mvsep.com](http://mvsep.com) ([tips](#_wbc0pja7faof) on both)\n\nInst. fullness 34.25, bleedless 41.36, SDR 17.55\n“the best fullness model and does not even need phase fix. SOTA even (...)
my fave fullness model” - dca100fb8.\n\nOn uvronline, the vocal stem is automatically used as the phase fixer reference, so you don't have to click on the correct phase button (previously it was available only in premium; not sure how it is now). “First, phase inversion is performed, resulting in a muddy instrumental. But the result is only used for phase correction. The second stem, which is available for download/listening, is more full.”\n\nIf you have severe bleeding with some specific songs, proceed with the models listed below.\n\n“more bleedless than resurrection inst (...) on a song with mostly piano, sounds considerably less muddy than resurrection inst… maybe overall slightly more noise... but the noise isn't bothersome (...) deux just seems to make some things super muddy and/or break them up... like an instrument here is just really wobbly with deux, probably because it doesn't have the fullness required.\n\nHyperace sounds better in areas here, but noisy in others.. I guess you really gotta use all 3 models if you want a great result”\n\nResurrection inst also seems to be fuller at times, but it is also noisier, and when a song contains intentional noise, Resurrection inst tends to pick it up - Rainboomdash.\n\n“Love the new model, besides it being on par with v1e in terms of fullness (on some tracks it's even fuller) without the noise tradeoff, it's also noticeably better in terms of bleedlessness as well, it managed to completely remove some faint choir/vocals pretty much all other models weren't able to remove.
Most definitely my favorite model so far.” - Shintaro\n\n“I think V1E+ is still fuller, sounds more open” - rainboomdash\n\nDoesn't work well with VHS recordings - MarsEverythingTech\n\nDoesn’t eat flute - mohammedmehditber\n\nBetter than gaboxflowersv10 at keeping some tiny sounds - sakkuhantano[’](https://discord.com/channels/708579735583588363/708580573697933382/1459865854291345633)\n\n“worse than HyperACE v2 inst to remove scratching and beatbox” - dca\n\n“I do recommend trying a higher chunk\\_size for instrumentals with deux.\n\nI find 661500 works for a lot of songs, 749700 for a good amount of others.\n\nHigher=more fullness, less bleedless (...) 705600 might be a better default setting\n\nHighest SDR and more chance it won't be noisy” - rainboomdash (this agrees with jarredou's measurements of the model at different chunk sizes). For vocals, the default 570K is well balanced. “going beyond 882000 may sharply decrease SDR and other metrics.” - makidanyee\n\nThe stem named “other” on Colab is the muddiest and the most bleedless, resembling vocal model results.\n\nSome people like using it with overlap 8.\n\n- Unwa bs\\_roformer\\_inst\\_hyperacev2 [model](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main/v2_inst) (incompatible with UVR, use [MSST](#_2y2nycmmf53)) | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link) | MVSEP\n\nInst. fullness 38.03, bleedless 37.87, SDR 17.40\n\nYou need to replace bs\\_roformer.py in MSST for it (the .py file is the same for v2\\_voc and v2\\_inst).\n\n“Definitely way better than v1e+. but I have also noticed it's more noisy” - rainboomdash\n\n“Picks up vocals better than the first model, but along with the vocals, it muffles other instruments, such as a synthesizer.” - Halif\nMuch slower than Unwa BS-Resurrection inst.\n\n“The aura\\_mrstft score was further improved, and the SDR also increased.
(...)\n\nIncidentally, this new model outperformed v1e+ on all metrics. (...) holds the highest aura\\_mrstft score on the instrumental side of the Multisong dataset.” - unwa\n\n<https://mvsep.com/quality_checker/entry/9475>\n\n“kept the chants in [one song], resurrection inst isn't really much better, though...\n\nit takes a high vocal fullness model to extract these (like fv7beta1, even fv7beta2 isn't enough), maybe that's why the instrumental models are getting confused... Surprised it's still very much there with resurrection inst…\n\nI'm surprised, hyperacev2 fixed vocal bleed on one song compared to the previous version. Does seem like vocal bleed with hyperace v2 is a bit better. It's not removed, just quieter in areas where it did happen (...) I've found hyperace works extremely well with a lot of acoustic songs, as it handles it well and has low vocal bleed\n\nunlike deux which has a lot of vocal bleed due to the song not being very loud” - rainboomdash\n\nGood at preserving SFX when phase fixed - wancitte\n\nDoesn't remove vocoder - stray\\_kids\\_filters\n\n- Unwa [BS-Roformer Resurrection inst](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection-Inst.ckpt) ([yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml)) | a.k.a. “unwa high fullness inst” on MVSEP | uvronline.app/x-minus.pro | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Kaggle](https://www.kaggle.com/code/zzryndm/music-source-separation-training-inference-webui) | [UVR](#_6y2plb943p9v) (don’t confuse it with the Resurrection vocals variant)\n\nInst. fullness: 34.93, bleedless: 40.14, SDR: 17.25\n\nMVSEP BS 2025.07 works as a reference for phase fix with 3000/5000 settings.\n\nOnly 200MB.
Some people might prefer it over V1e+, although it’s muddier.\n\n“use if the others [below] are noisy”\n\nThe only models working as a phase fixer reference (to alleviate the noise) are BS-Roformer 1296/1297 by viperx and BS Large V1 by unwa, but in general this model might need phase fixing less than other models here - dca\n\n“One of my favorite fullness inst models ATM. Sounds like v1e to me, but cleaner. Especially with guitar/piano where v1e tended to add more phase distortion, I guess that's what you'd call it lol. This model preserves their purity better IMO” - Musicalman\n\n“I like resurrection inst for segments of piano, a lot of other models are too noisy there (...) I also needed to turn the overlap up for piano” (from 2 to 8). FNO was less noisy for it, but “the hit to fullness was extremely apparent” - rainboomdash\n\n“The way it sounds, is indeed the best fullness model, it's like between v1e and v1e+, so not so noisy and full enough, though it creates problems with instruments gone in the instrumental sadly, but apparently it seems Roformer inst models will always have problems with instruments it seems, seems like a rule. (...) Instrument preservation (...) is between v1e and v1e+” - dca100fb8\n\n“it seems to just nip some bits of random instruments like saxophone or guitar whereas v1e+ leaves them intact.” - dennis777\n\n“In some songs leaves vocal residues. It is heard little but felt” - Fabio\n\n“Almost loses some sounds that v1e+ picks up just fine” - neoculture\n\nIt mushes some synths a bit in e.g. a trap/drill tune compared to inst Mel-Roformers like INSTV7/Becruily/FVX/inst3, but the residues/vocal shells are a bit quieter, although clarity is also decreased a bit.
Kind of a trade-off.\n\nAs phase fixer references, BS 2025.07/BS 2024.04/BS 2024.08/SW remove less noise than the viperx models.\n\n- Unwa Mel-Roformer V1e+ [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/blob/main/inst_v1e_plus.ckpt) ([yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/blob/main/config_melbandroformer_inst.yaml)) | [UVR](#_6y2plb943p9v) ([guide](#_eopfi619c6zr)) | [MVSEP](https://mvsep.com/) | [x-minus](https://x-minus.pro/ai)/[uvronline](https://uvronline.app/ai) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) Colab | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | [Kaggle](https://www.kaggle.com/code/zzryndm/music-source-separation-training-inference-webui)\n\nInst. fullness: 37.89, bleedless: 36.53, SDR: 16.65\n\n\\*) [Phase fixer](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) Colab (defunct; e.g. with FT3 as src or becruily voc)/[UVR](#_j14b9cv2s5d9)>Tools, or on x-minus with the becruily vocal model used as the reference model (premium) - for less noise/[script](#_j14b9cv2s5d9).\n\n\\*) introC [script](https://file.garden/Z3gSJFxsb21HAqp6/scripts/v1ep_resonance_remover.zip) to get rid of vocal leakage in this model\n\n\\*) “If you use Gabox Mel [denoise/debleed](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) ([Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet)) on mixture then put the “denoised” inst stem of that into unwa inst v1e+ you get a very clean result with good fullness and very little noise” - 5b.
But it can’t remove vocal residues, just vocal noise.\n\nIt might also sound interesting when used as the target in Phase fixer, with the source set to the Becruily inst model (overlap 50/chunk\\_size 112455 was used; very slow - gustownis).\n\nOr with bigbeta5e as the source to get rid of vocal residues - santilli\\_\n\n*(single model inference descriptions below)*\n\n“strange leakage [robot-like] in the vocal-only section with no instrumentation” - Unwa\n\n“less noise than v1e (probably due to different loss function)”, but it’s also less full, “somewhere between v1e and v1”\n\n“sometimes a detail piece of instrumental sound was lost, while on becruily inst [below] can pick that sound”. Might be too strong for MIDI sounds - kittykatkat\\_uwu.\n\nHas problems with broken lead-ins that don't happen in instv7 and v1e. Some issues with cymbal bleed in vocals - dca.\nBetter than v1+: “has fewer problems with quiet vocals in instrumentals than the V1+”, “issues with harmonica, saxophone, electric guitar and synth seem to have been fixed” - dca100fb8.\n“has this faint pitched noise whenever vocals hit in dead silence, you may need to manually cut it out.” - dynamic64. Also check out BS\\_ResurrectioN further below; it’s like v1e++ (more fullness).\n\n- Unwa [BS-Roformer-HyperACE](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main) inst (a.k.a. v1) | [separate Colab](https://colab.research.google.com/drive/1lqHRm_h122qgpxLyx3xfHsrVei6ASx1t?usp=sharing) | uvronline.app (doesn’t work in UVR)\n\nInst. fullness 36.91, bleedless 38.77, SDR 17.27\n\n“sounding just like v1e+ after phase fix, but straight out of one single model\n\n(...) quite bleedy, but honestly it's a fair price to pay, I guess” - santilli\\_\n\nFor some people, though, it can even be on par with v1e+ bleed-wise, so check it out too (more fullness).\n\nNote: It uses its own inference script.
“You can use this model by replacing the [MSST](#_2y2nycmmf53) repository's models/bs\\_roformer.py with the repository's bs\\_roformer.py.”\n\nTo avoid affecting the functionality of other BS-Roformer models, you can add it as a new model\\_type by editing utils/settings.py and models/bs\\_roformer/\\_\\_init\\_\\_.py as shown [here](https://imgur.com/a/dkGXo2r) (thx anvuew).\n\nIf you get this error while installing the .py file for the HyperACE model in Sucial’s WebUI:\n\nfrom models.bs\\_roformer.attend import Attend\n\nModuleNotFoundError: No module named 'models'\n\nThe fix: “SUC-DriverOld/MSST-WebUI use the name \"modules\" and ZFTurbo/Music-Source-Separation-Training use the name \"models\". And Unwa's bs\\_roformer.py that you replace with, also use \"models\". So you'll have to do some coding and symlink to make it work.” - fjordfish\n\nMetrically, it has less fullness but is more bleedless than v1e+ (v1e+: fullness 37.89, bleedless 36.53, SDR 16.65).\n\nWhen using it locally, consider changing overlap from the default 4 to 2 in the model's yaml. Most people won’t really notice the difference, but it will be faster.\n\n“Currently, this model holds the highest aura\\_mrstft score on the instrumental side of the Multisong dataset. (...)\n\nThis weight is based on the following [weights](https://huggingface.co/anvuew/BS-RoFormer). Thank you, anvuew!” - unwa\n\n“Does seem like HyperACE is picking up more instruments than v1e+\n\ndoes seem like slightly worse vocal bleed overall (still need to test this more, though)... haven't encountered the super tinny vocal bleed like v1e+, at least\n\nstill fails to pick up that brass instrument on one song... Not really any worse than v1e+, though (...) resurrection inst does sound more muddy, but also a lot less noise.. which makes sense...
IDK, a little muddy for my tastes.\n\nI did find one song/spot and resurrection inst was on par with hyperace in picking up the wind instrument, v1e+ lost it for a bit.\n\nI have found in the past that resurrection inst generally picks up more instruments than v1e+ (...) fullness of HyperACE is much closer to v1e+ than resurrection inst (...) it gets pretty staticy compared to v1e+ [on some drums] (...) v1e+ does this to a lot less extent\n\nit's not super common, though… (...) I'm very confident in saying bshyperace picks up more stuff than v1e+.\n\nresurrection inst does pick it up much better than v1e+, but I think it's still too quiet\n\nresurrection inst really does just pick up so much more instruments, despite having a lot less fullness” - rainboomdash\n\n“fullness that is comparable to v1e+, but has significant more vocal crossbleeding in instrumental than BS Roformer Resurrection Inst, but still less than v1e+ and v1e” - dca100fb8\n\n“I’ve found myself using a mix of both HyperACE v1 and deux.\n\nI'm doing this literally on a Joy Division track right now. Convert both. Invert. Spectral down the residuals just enough to make the buzz less audible without going full Deux. I'm glad to have both models.\n\nConvert the original track with each of the two models. Invert one against the other to get all the extra 'vocal residue' that the HyperACE model has compared to the deux model. Then use spectral editing to lower the volume of the vocal residue and mixing it back into the Deux model to create a kind of 'inbetween' result.
Usually I leave the higher frequencies alone (above 5 or 7k or so).” - CC Karaoke\n\n- Gabox [inst\\_gaboxFlowersV10](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gaboxFlowersV10.ckpt) Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/v10.yaml)) | makidanyee's [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link) | uvronline via special link (for [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) accounts)\n\nInst. fullness 37.12, bleedless 36.68, SDR 16.95[’](https://mirror.mvsep.com/quality_checker/entry/9505)\n\nIncompatible with the RTX 5000 UVR patch (users will encounter “ModuleNotFoundError: No module named 'torch.\\_dynamo.polyfills.fx'”).\n\nFullness and SDR are better than Inst\\_FV8b's, though bleedless is slightly lower (36.68 vs 36.90).\n\nThe yaml differs from the previous ones. It's a deux model fine-tune.\nFor less buzzing, use phase fix with the becruily voc model - prodbyluke ([phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm)).\n\nChunk size for the flowers model in the Colab should be 352800 (the yaml value; it might not have been evaluated).\n\n“a huge improvement over what Gabox has accustomed us to! I got amazing results.
It truly achieves a stability that previous models lacked; the clarity is much greater in these.” - billieoconnell.\n\nWithout the phase fixer, it might have louder constant buzzing than even inst\\_fv4, but FlowersV10 fixes some crossbleeding issues present in previous models, as well as issues with some distorted vocals and screams (dtn, dca100fb8, Tobias51).\n\n- Gabox [Inst\\_GaboxFv8](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv8.ckpt) v2 Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (don’t confuse it with Inst\\_FV8 or INSTV8)\n\nInst. fullness: 33.21, bleedless: 40.73, SDR: 16.57\n\nUsually referred to as just Inst\\_GaboxFv8, without v2. After its release, the checkpoint was updated on 11.05.25 (same file name), and its metrics changed (updated above).\n\nIt’s not the one on uvronline.\n“Good result for bleedless instead, fullness went down instead of up a little.”\nIt might be an interesting competitor to Unwa inst v2, which is muddier.\n\nThe old v1 model's metrics were: inst. fullness 35.57, bleedless 38.06, SDR 16.51.\n\n*- Gabox* [*Inst\\_GaboxFv7z*](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv7z.ckpt) Mel Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [uvronline.app](https://uvronline.app/ai)/x-minus.pro\n\nInst.
fullness: 29.96, bleedless: 44.61, SDR: 16.62\n\n(duplicate from the above)\n\nThe Becruily vocal model is used for the phase fixer on x-minus.pro/uvronline (premium feature).\n\n“Focusing on the less amount of noise, keeping fullness”\n\n“the results were similar to INSTV7 but with less noise” but “the drums are totally fine with this model” - neoculture\n\n“it seems to capture some vocals better” - Gabox\n\nIn some songs, “it leaves a lot of reverb or noise from the vocals. unva v1e+ a little better” - GameAgainPL\n\n“[one of the] best bleedless, good fullness, almost noiseless” - Aufr33\n\n- Gabox [Inst\\_FV8](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Inst_Fv8.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) a.k.a. Gabox V8 Mel-Roformer (experimental; don’t confuse it with “InstGaboxFv8”) - previously an exclusive uvronline model via special link ([free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test))\n\nIt gives decent results too. “it's on the higher bleedless side” - Rainboomdash (although less bleedless than fv7b and fv7z).\n\nIt's a previous epoch of the FV8B model, previously only on uvronline.\n\n- MVSep SCNet vocals model: SCNet XL IHF (high instrum fullness by becruily).\n\nInst. fullness 32.31, inst. bleedless 38.15, SDR 17.20\n\n“One of my favorite instrumental models, Roformer-like quality.\n\nFor busy songs it works great, for trap/acoustic etc. Roformer is better due to SCNet bleed” - becruily\n\n“bring[s] such near perfect instrumentals”\n\nCompared to the previous XL models: “It's high fullness version for instrumental prepared by becruily.”\n\nIt can also be an insane vocal model.\n\n*Still good, less frequently used models*\n\n*-* Gabox [INSTV7](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/Inst_GaboxV7.ckpt) a.k.a.
Inst\\_GaboxV7 Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | [MVSEP](https://mvsep.com/) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | “F” stands for fullness, “V” for version.\n\nInst. fullness: 33.22, bleedless: 40.71, SDR: 16.51\n\\*) [Phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm)/UVR’s Phase Swapper (for less noise; e.g. with Unwa's FT3 vocal model as the source).\n\n“I hear less noise compared to v1e, but it has a worse bleedless metric” and it might be less full.\n\nFor some people, it might still have too much noise, like v1e, but less of it.\n“Relatively full/noisy model. Fvx [below] is a sort of middle ground between v3 and v7.”\nMore fullness than V6, but compared to v1e, it sometimes “leaves noises throughout the song, sometimes vocal remnants in the verse of the song, and some instruments are erased.”\nLess muddy than Mel 2024.10 on MVSEP, and V7 doesn’t preserve vocal chops/SFX.\n\nGabox says the V7 on uvronline is not this model but FV7z, even though both appear on the list when using a special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test).\n\n- Becruily’s inst [model](https://huggingface.co/becruily/mel-band-roformer-instrumental/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | on MVSEP a.k.a.
Mel-Roformer “high fullness” | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down)\n\nInst. fullness 33.98, inst. bleedless 40.48, SDR 16.47\nor on x-minus/uvronline (with optional phase correction feature in premium) | [UVR](#_6y2plb943p9v)\n\\*) It's an older model, but it’s great for getting fewer vocal residues when used in the phase fixer [Colab](https://colab.research.google.com/drive/1uDXiZAHYk7dQajOLtaq8QmYXL1VtybM2?usp=sharing) (also in UVR>Tools) with “becruily's vocals as source and inst as target”.\nIf it’s too muddy, consider using the result of a noisier single model like Inst\\_GaboxFv8\\_v1 (or other fullness models) as the reference for Matchering in e.g. UVR.\n\nBy itself, Becruily’s inst is as clean as unwa’s v1 but has less noise, and the remaining noise can also be removed well by:\n\\*) Mel [denoise](#_hyzts95m298o) and/or Roformer [bleed suppressor](https://drive.google.com/file/d/1zWOrzPKd-6x7vjHNqjK_bK2UOYfPzCuu/view) by unwa/97chris. That model “removed some of the faint vocals that even the bleed suppressor didn't manage to filter out before”. It doesn’t require phase fix. Try denoising the mixture first, then use the model.\n\nOn its own, the inst model correctly removes SFX voices.
The instrumental model pulled out more adlibs than the released vocal model variant, which sometimes pulls out nothing.\nCurrently, it's the only model capable of keeping vocal chops.\n“Struggles a lot with low passed vocals”\n\nIt correctly recognizes more instruments as instruments rather than vocals than unwa’s inst v1e/v1/v2, although not as many as Mel 2024.10 & BS 2024.08 on MVSEP.\n\n- If you use a lower [dim\\_t](#_c4nrb8x886ob) like 256 (or maybe also a correspondingly lower [chunk\\_size](#_c4nrb8x886ob)) on weaker GPUs, note that these are the first Mel inst models to give muddy results with it.\n\n- In the phase fixer, you can experiment with “using becruily's vocals as source and inst as target, and changing high frequency weight from 0.8 to 2 makes for impressive results”. You can do it automatically after separation in this [Colab](https://colab.research.google.com/github/lucassantillifuck2fa/Music-Source-Separation-Training/blob/main/Phase_Fixer.ipynb) (santilli\\_’s suggestion).\nUsing Kim Mel FT2 as the source instead might be more problematic, as it tends to be more harmful to instruments, while both are similar in noise removal (dca).\n\n- To demud phase fixer results, you can use Matchering with a well-sounding fragment of a single instrumental model separation with a high fullness metric (e.g. 7N) as the reference, and the becruily inst/voc phase-fixed result as the target (e.g. in UVR>Tools>Matchering). The result will have less bleeding than models with a low bleedless metric, but will still be fuller than phase-fixed results (more [here](https://discord.com/channels/708579735583588363/708580573697933382/1345895513324785788) and [here](https://discord.com/channels/708579735583588363/708580573697933382/1347158763542679573)). Phase fixer can also be used in a standalone Python [script](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=drive_link) or in the [latest UVR](#_6y2plb943p9v).
Matchering can be used in [Colab](https://colab.research.google.com/github/kubinka0505/matchering-cli/blob/master/Documents/Matchering-CLI.ipynb), on [songmastr](https://www.songmastr.com/), or [locally](https://github.com/kubinka0505/matchering-cli) (it’s very lightweight and doesn’t require a GPU).\n\n- Gabox [inst\\_fv4](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_Fv4.ckpt) Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst. fullness 39.40, bleedless 33.49, SDR 16.44\n\nDon’t confuse it with inst\\_fv4noise (or with voc\\_fv4); the regular variant was never released before.\n\n“Seems to be erasing a xylophone instrument. Does sound not too noisy and not muddy, I like it. (...) A little noisy with piano (I split the song up and process with resurrection inst there). (...) Does have some issues that resurrection inst doesn't have, but it doesn't sound muddy! It usually works great. (...) In my opinion, fv4 still has vocal traces, I don't know if in all of its songs and v1e plus doesn't have them, but the noise can bother you even though it's not much. Does have more vocal bleed at times. I think a lot of what I thought was vocal bleed was a synth, it did a pretty good job... There was one segment on a song where it caught vocal residues, though” - rainboomdash\n\nIt has more constant buzzing than Becruily's inst, but it’s fuller (might be a no-go if you’re sensitive to it, e.g. 
using headphones).\n\n- Gabox [Inst\\_FV8b](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Inst_FV8b.ckpt) Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml))\nInst. fullness: 35.05, bleedless: 36.90, SDR 16.59.\n\nMuddier than V1e+ (fullness 37.89), but cleaner. Some people might prefer it over INSTV7.\n\n“Preserves its volume stability to the original sound of the songs, it does not go down or lose strength, which is the most important thing, it manages to capture clear vocal chops, the voice is eliminated to 99 or 100% depending on its condition, it captures the entire instrumental and when making a mix it remains like the original that with other models the volume was lowered,” - Billie O’Connell\n\n###### *Recent bleedless models*\n\n*- Gabox* [*Inst\\_GaboxFv7z*](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv7z.ckpt) Mel Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [uvronline.app](https://uvronline.app/ai)/x-minus.pro\n\nInst. fullness: 29.96, bleedless: 44.61, SDR: 16.62\n\n(duplicate from the above)\n\nBecruily's vocal model is used for the phase fixer on x-minus.pro/uvronline (premium feature).\n\n“Focusing on the less amount of noise, keeping fullness”\n\n“the results were similar to INSTV7 but with less noise” but “the drums are totally fine with this model” - neoculture\n\n“it seems to capture some vocals better” - Gabox\n\nIn some songs, “it leaves a lot of reverb or noise from the vocals. 
unwa v1e+ a little better” - GameAgainPL\n\n“[one of the] best bleedless, good fullness, almost noiseless” - Aufr33\n\nDespite the metrics, it can have more constant buzzing than fv7b.\n\n- Gabox [inst\\_fv7b](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/inst_fv7b.ckpt) Mel Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml))\n\nInst. fullness 27.07, **bleedless** 47.49, SDR 16.71\n\n*(duplicate from the above)*\n\nIts fullness is worse than even most vocal Mel-Roformers (incl. BS-RoFormer SW and the Mel Kim OG model).\n\n“on the fuller side, somewhere around inst v1e+, maybe a tiny bit below. The main thing I notice is it captures more instruments than v1e+, but isn't muddy like [HyperACE] (which also captures more instruments)\n\ncan be a little on the noisy side sometimes... but it at least isn't muddy and sounds natural (...) I'd still ensemble if you want the noise reduced” - rainboomdash ([src](https://discord.com/channels/708579735583588363/708580573697933382/1441680934129631325))\n\nIn some lo-fi beats, it can keep muffling hi-hats by being too muddy.\n\n0) Rifforge by mesk [model](https://huggingface.co/meskvlla33/rifforge/tree/main) final | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nIt can be even more destructive for drums than the two above, and muddier, but it has even less buzzing because it simply picks up noise present in the mixture along with the vocals. It’s slow and big. It is generally dedicated to metal, where it achieves better results, but it can also work for e.g. hip-hop (with the caveats above).\n\n*- Unwa* [*BS-Roformer-Inst-FNO*](https://huggingface.co/pcunwa/BS-Roformer-Inst-FNO) *|* separate [Colab](https://colab.research.google.com/drive/14kuHLSZm4QjqMHVg14l956joSQjzafea?usp=sharing) *#3*\n\nInst. 
fullness: 32.03, bleedless: 42.87, SDR: 17.60\n\nIncompatible with UVR. Install [MSST](#_2y2nycmmf53), then read the model instructions [here](#_hjuoj68tot6v) (requires modifying the bs\\_roformer.py file in MSST, and also models\\_utils.py for PyTorch newer than 2.6).\n\nResults are actually similar to the BS-Resurrection inst model above, with less fullness.\n\nSome people even prefer Gabox BS\\_ResurrectioN instead.\n\n“Very small amount of noise compared to other fullness inst models, while keeping enough fullness IMO. I don't even know if a phase fix is needed. Maybe it's still needed a little bit.” - dca\n\n“seems less full than the Resurrection, which I would expect given the MVSEP [metric] results. (...) I'd say it's roughly comparable to Gabox inst v7”\n\n“I replaced the MLP of the BS-Roformer mask estimator with FNO1d [Fourier Neural Operator], froze everything except the mask estimator, and trained it, which yielded good results. (...) While MLP is a universal function approximator, FNO learns mappings (operators) on function spaces.”\n\n“(The base weight is Resurrection Inst)”\n\n*Lower fullness models*\n\n*(if you find the ones above too muddy; these have more noise, though)*\n\n*0) Gabox* [*inst\\_gabox3*](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox3.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | [Phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm)\n\nInst. 
fullness 37.69, bleedless 35.93, SDR 16.50\nIts fullness is actually worse than v1e+ (37.89), and its bleedless is lower too (v1e+: 36.53).\n\nWhen used with Unwa’s beta 6 as the reference for the phase fixer (thx John UVR), it gives slightly less muddy results than phase-fixed Becruily inst/voc results. However, it also has slightly more vocal residues and a somewhat more inconsistent sound, with fluctuations across the whole separation at times.\n\n0) Gabox [INSTV7N](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/INSTV7N.ckpt) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. fullness 36.83, bleedless 35.47, SDR: 16.65\n\nNoisier than INSTV7; “it's [even] closer to v7 than inst3”\n\n- [Inst\\_GaboxFv8 v1](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/b06c7d6ee7c21a58d9096d6184727da8d1b98e0f/melbandroformers/instrumental/Inst_GaboxFv8.ckpt) Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | fork of makidanyee [Colab](https://colab.research.google.com/drive/1-NhQJPUiqmnx0qOs_nVT9DKPivWMpVho?usp=sharing)\nInst. fullness: 35.57, bleedless: 38.06, SDR: 16.51\nThe original link to the model now points to the v2 variant, but the old link to v1 has been recovered above.\n\nIt has “v1+ metallic noise” - Gabox. Compared to the v2 variant, it also has nasty residues that are louder than the buzzing.\n\nVs V1e+: “A bit cleaner-sounding and has less filtering/watery artifacts. Both models are prone to very strange vocal leakage [“especially in the chorus”].\n\nAnd because Fv8 can be so clean at times, the leakage can be fairly obvious. For now, my vote is for Fv8, but I'll still probably be switching back and forth a lot. Still has ringing” - Musicalman. 
Still, you might prefer it over V1e+.\n\nIt might have some “ugly vocal residues” at times (Phil Collins - In The Air Tonight, 00:46 and 02:56) - dca.\n\n“Sometimes V1e+ has vocal residues which sound like you were speaking through a fan/low quality mp3” - dca\n\n“Seems to pick up some instruments better” - Gabox.\n\n\\_\\_\n\n0) SCNet XL model called “very high fullness” | [MVSEP](http://mvsep.com)\n\nInst. fullness 34.04, bleedless 35.15, SDR 16.60\n\nIt might work better than Roformers for less noisy/loud/busy mixes, or for genres like alt-pop or orchestral tracks with choir. It sometimes gives fuller results than even v1e, but at the cost of more noise. It might struggle with some vocal reverbs or effects.\n“Very hit or miss. When they're good they're really good but when they're bad there's nothing you can do other than use a different model”\nCompared to the high fullness variant, it has more crossbleeding of vocals in instrumentals (as does the basic SCNet XL model). Some songs sound full enough even with the basic SCNet XL (and the HF variant), while others will sound muddy (dca).\n\n“has a lot of noise/bleed, and I haven't found the best way to get rid of it, but it does tend to pick up harmonies and subtle BGV that other models don't.” - dynamic64\n\n0) MVSEP SCNet XL high fullness\n\nInst. fullness 31.95, bleedless 34.06, SDR 17.26\n\n“I have a few examples where it's better than v1e+\n\nSometimes there is too much residue but most of the time it's fine” - dca\n“Really loving the way SCnet high fullness [variant] handles lower frequencies, below 2K [let’s] say. 
Roformers are better with the transients up high, but decay on guitars/keys on the SCnet is more natural”\n\n“seems to also confuse less \"difficult\" instruments for vocals”\n\n“I noticed classic SCNet XL preserves more instruments than the high fullness one, but has more vocal crossbleeding in instrumental compared to high fullness\n\nSo if you want instrument preservation use SCNet XL 1727 but if you want less crossbleeding of vocals in instrumental use SCNet XL high fullness\n\nI ignore the very high fullness one because it has too much vocal residue” - dca\n\n*(regular SCNet XL moved below)*\n\n*\\_\\_\\_\\_\\_\\_\\_*\n\n###### - Gabox BS\\_ResurrectioN [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/BS_ResurrectioN.ckpt) | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml)\n\n“It is a fine-tune of BS Roformer Resurrection Inst but with higher fullness (like v1e for example), it needs [MVSEP’s] BS 2025.07 (as a source/reference) phase fix\n\nI requested it because I found some songs where Resur Inst was producing muddy instrum results (...) 
I requested it not just for me because I saw other people were looking for something like v1e++” - dca\n\n###### **Higher fullness** (but with more noise)\n\n*(sorted by fullness)*\n\n0) Gabox [INSTV6N](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/INSTV6N.ckpt) (N for noise/fullness) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [SESA](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\nInst. **fullness**: 41.68 (more than v1e), bleedless: 32.63 (N “noisier but fuller”)['](https://mvsep.com/quality_checker/entry/7915)\n\nTo get rid of the noise in INSTV6N, run the Gabox [denoise/debleed](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) model ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) on the mixture first, then use INSTV6N. “For some reason it gives cleaner results” (Gabox), but it can’t remove vocal residues.\n\nSome people find it has less noise and more fullness than v1e.\nIt also has more fullness than INSTV6, and more noise, but some people might still prefer v1e.\n\n“v1e sounds like an \"overall\" noise on the song, while v6n kind of mixes into it.\n\nv6n also sounds like two layers, one of noise that's just there. And the other one mixes into the song somehow. 
Using the phase swap barely makes it any better than phase swapping with v1e though” - vernight\nAlso, using the Kim model for the phase swap seems to give less noise than unwa's ft2 bleedless.\n\n“Comparing V6N with v1e and couldn't hear a fullness difference despite the metrics being approx 39 for v1e and 41 for V6N” - dca\n\n“my all-time favorite” - ezequielcasas\n\n0) Gabox [inst\\_Fv4Noise](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_Fv4Noise.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml?download=true) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. fullness 40.40, bleedless 28.57, SDR 15.25\n\nIt can be better than INSTV6 for some people, but overkill for others. Its fullness metric is higher than even v1e's.\n\n“Despite v4's significant amount of noise, it seems to be the only model [till 8 February] that gave me a fuller sounding result compared to v1e that's actually perceivable by my ears.” - Shintaro\n\n“although the fullness metric increases when there is more noise, it doesn't always mean it's a better instrumental — an example of this is the fv4noise metrics” - Gabox\n\n0) Unwa [Inst V1e](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) (don’t confuse it with the newer +/plus variant above) | yaml from v1\n\nInst. 
fullness 38.87, bleedless 35.59, SDR 16.37\n\n[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR instructions](#_6y2plb943p9v) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down) | MVSEP\n\nOne of the first Mel Kim model fine-tunes trained with an instrumental (other) target. It has a high fullness metric, but it is noisy at times and on some songs. To alleviate that, it can be used with the automated [phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) or UVR>Tools. Some notes on the reference: Kim Mel removes more noise than 2024.10, and Unwa v1/2 on their own are muddier. Optionally, use VOCALS-MelBand-Roformer by Becruily or unwa's kim ft. You can also use FT2 as the reference, but it “cuts instruments”, so FT3 can be a better alternative. Optionally, in the Phase Fixer you can set 420 for low and 4200 for high (or 500 for both) with the Mel-Kim model as the source. You can also use the [bleed suppressor](https://drive.google.com/file/d/1zWOrzPKd-6x7vjHNqjK_bK2UOYfPzCuu/view) (by unwa/97chris) to alleviate the noise further (e.g. the phase fixer on its own works better with the v1 model to alleviate the residues). Besides the UVR default (500/5000) and the Colab default (500/9000), you could potentially “even try like 200/1000 or even below for 2nd value.” “I would say that the more noisy the input is, the lower you have to set the frequency for the phase fixer.”\n\nV1e might catch more instruments and vocals than INSTV6N. 
An even fuller model with more noise is Gabox's inst\\_Fv4Noise above.\n\n“The \"e\" stands for emphasis, indicating that this is a model that emphasizes fullness.”\n“However, compared to v1, while the [fullness](#_le80353knnv5) score has increased, there is a possibility that noise has also increased.” “lighter compared to v2.” Like other unwa models, it can struggle with flute, sax and trumpet (unlike Mel 2024.10 and BS 2024.08 on MVSEP; you can max ensemble all three as a fix [dca100fb8]). Also, unwa's big beta5e can sometimes retrieve instruments missing from v1e when the two above fail. There may be residues of dual-layer vocals in Suno songs.\n\n0) [inst\\_gaboxFv3](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gaboxFv3.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml?download=true) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) - F for fullness\n\ninst. fullness 38.71, inst. bleedless 35.62 (“F” stands for fullness) | Inst SDR 16.43\nSimilar to v1e in fullness, but with less bleeding.\n\nVs v1e, “it's slightly better with some instruments”. It might pick up an entire sax in the vocal stem.\n\n“It doesn't have that weird fullness noise that fullness models produce, but still gives pretty full results and the phase swapper (with big beta 6 as reference) gets rid of that weird buzzing sound” - John UVR\n\n0) Gabox experimental “[fullness.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Fullness.ckpt)” inst Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)).\n\nInst. 
fullness: 37.66, bleedless: 35.53, SDR: 15.91\n\n“this isn't called fullness.ckpt for nothing.” - Musicalman\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Sorted by the biggest fullness metric on the list:*\n\nINSTV6N (41.68) > inst\\_Fv4Noise (40.40) > Inst V1e (38.87) > Inst Fv3 (38.71); INSTV7N, listed earlier, scores 36.83.\nV1e+ (37.89) might already be muddy in some cases.\n\nFor models sorted by the bleedless metric, see [here](#_6ypgpf4ku4d0)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Lower bleedless models/balanced*\n\nThese still have less noise, even when used without the phase fixer.\n\n0) Unwa [BS-Roformer Resurrection inst](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst.ckpt) ([yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml)) | a.k.a. “unwa high fullness inst” on MVSEP | uvronline.app/x-minus.pro | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [UVR](#_6y2plb943p9v) (don’t confuse it with the Resurrection vocals variant)\n\nInst. fullness: 34.93, bleedless: 40.14, SDR: 17.25\n\n(duplicated from above because it fits here metrically and by category; more info above)\n\n0) Unwa [Mel-Roformer inst v1](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/resolve/main/melband_roformer_inst_v1.ckpt) ([yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/resolve/main/melband_roformer_inst_v1.ckpt)) | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d) | UVR [installation](#_6y2plb943p9v) | MVSEP | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down)\ninst. 
fullness 35.69, bleedless 37.59\n\\*) Recommended denoising for v1/v2/v1e:\n1) the ensemble noise/phase fix option on x-minus premium\n1b) Becruily [phase fixer](https://discord.com/channels/708579735583588363/767947630403387393/1304002277375213700) | [Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) (also in UVR since beta patch #7)\n2) Mel-Roformer [de-noise](#_hyzts95m298o) non-agg. (might be a better solution)\n3) UVR-Denoise at medium aggression (default for free users)\n4) minimum aggression for premium/[link](https://uvronline.app/ai?discordtest) (damages some instruments less)\n5) UVR-Denoise-Lite [agg. 4, no TTA] in UVR - a more aggressive method\n6) UVR-Denoise [agg. 30/25, hi-end proc., 320 w.s., p.pr.] - even muddier, but preserves trumpets better\n\nCompared to v1e, v1 might miss more instruments but has less noise.\n\n0) [inst\\_gabox2](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox2.ckpt) ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\ninst. fullness 36.03, bleedless: 38.02\n\n0) Becruily inst [model](https://huggingface.co/becruily/mel-band-roformer-instrumental/tree/main) (again, because it fits here metrically)\n\nInst. [fullness](#_le80353knnv5) 33.98, bleedless 40.48, SDR 16.47\n[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | on MVSEP a.k.a. 
“high fullness” (the same model) | x-minus (w/ optional phase correction feature in premium) | [UVR](#_6y2plb943p9v)\nFor fewer vocal residues, use the phase fixer [Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) (also in UVR>Tools) with “becruily's vocals as source and inst as target”.\n\n0) [Inst\\_GaboxFv8](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv8.ckpt) v2 model ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst. fullness: 33.21, bleedless: 40.73, SDR: 16.57\n\n(again, just for metrics)\n\n0) Gabox [instV7plus](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/instv7plus.ckpt) bleedless model (experimental)\n\ninst. fullness: 29.83, bleedless: 39.36, SDR 16.51\n\\_\n\n0) MVSEP SCNet XL (don’t confuse it with the undertrained weights on ZFTurbo’s GitHub)\n\ninst. fullness 28.74, bleedless 39.42, SDR 17.27\n\n“I've come across a lot of songs where high fullness [SCNet variant above] gives that annoying static noise. I'm starting to like basic SCNet XL more to the high fullness [model]. And also, less vocal residues.” - dca. There is crossbleeding of vocals in some songs. You can find dca’s list for that model further on in [this](#_37hhz9rnw7s8) section.\n\n0) MVSEP SCNet XL IHF\n\ninst. fullness 28.87, bleedless 40.37, SDR 17.41\n\nSome songs that struggle with the previous models might yield better results with it.\n\n0) MVSEP SCNet Large\n\ninst. 
fullness 27.10, bleedless 41.47, SDR 17.05\n\n###### *Higher bleedless (not so full)*\n\n###### *(as above but more models and reversed order)*\n\n0) Gabox B/bleedless v3 ([inst\\_gaboxBv3](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gaboxBv3.ckpt)) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. fullness: 32.13, bleedless: 41.69, SDR 16.60\n\n“can be muddy sometimes”, but still fuller than the older one below.\nIt sometimes buzzes more than FV8B, though it's fuller. This can flip within the same song, where the opposite holds. It's also buzzier than FV7Z and FV7B.\n\n0) Unwa Mel-Roformer inst v2 (similar to v1 but with fewer vocal residues (not always); a muddier, bigger, heavier model)\n\nInst. fullness 31.85, bleedless 41.73 (less bleeding than Gabox instfv5/6)\n\n[Model files](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) (or OG [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training)) | [UVR](#_6y2plb943p9v) (Download Center)\n\nIt might miss the flute. “Sounds very similar to v1 but has less noise, pretty good” “the aforementioned noise from the V1 is less noticeable to none at all, depending on the track”. “V2 is more muddy than V1 (on some songs), but less muddy than the Kim model. (...) 
[As for V1,] sometimes it's better at high frequencies” - Aufr33\nIt might miss some samples or adlibs while cleaning inverts. SDR got a bit higher (16.845 vs 16.595).\n\n“Significantly less noise than v1e, sounds full enough, despite the fullness inst score, and that it recognizes more instruments than v1 and v1e, added to the fact it has higher SDR so also slightly less vocal crossbleeding in instrumental.” - dca100fb8\n\n0) Unwa [BS-Roformer-Inst-FNO](https://huggingface.co/pcunwa/BS-Roformer-Inst-FNO)\n\nInst. fullness: 32.03, bleedless: 42.87, SDR: 17.60\n(again, because it fits metrically; more info moved near the top, to the recent bleedless section)\n\n0) Gabox [Inst\\_GaboxFv7z](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/Inst_GaboxFv7z.ckpt) Mel Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | uvronline/x-minus.pro\n\nBecruily's vocal model is used for the phase fixer on x-minus.pro/uvronline (premium).\n\nFullness: 29.96, bleedless: 44.61, SDR: 16.62\n\n(again, because it fits metrically; more info in the recent bleedless section)\n\n0) Gabox [inst\\_fv7b](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/inst_fv7b.ckpt) Mel Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml))\n\nInst. fullness 27.07, **bleedless** 47.49, SDR 16.71\n\n*(duplicate from the above)*\n\nIts fullness is worse than even most vocal Mel-Roformers (incl. BS-RoFormer SW and the Mel Kim OG model).\n\n“on the fuller side, somewhere around inst v1e+, maybe a tiny bit below. 
The main thing I notice is it captures more instruments than v1e+, but isn't muddy like [HyperACE] (which also captures more instruments)\n\ncan be a little on the noisy side sometimes... but it at least isn't muddy and sounds natural (...) I'd still ensemble if you want the noise reduced” - rainboomdash ([src](https://discord.com/channels/708579735583588363/708580573697933382/1441680934129631325))\n\nIn some lo-fi beats, it can keep muffling hi-hats by being too muddy.\n\n0) Rifforge by mesk [model](https://huggingface.co/meskvlla33/rifforge/tree/main) final | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (more info below)\n\nIt can be more destructive for drums than the two above, and muddier, but it has even less buzzing because it simply picks up noise present in the mixture as vocals. It’s slow and big. It is generally dedicated to metal, where it achieves better results, but it can also work for hip-hop.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Last resort: muddier but cleaner single vocal models with higher bleedless, tested for instrumentals (sorted by bleedless)* [*here*](#_6f1v88my7hfk) *|* [*descriptions*](#_3mrz4632uifx)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### *Special purpose models*\n\n- Mesk’s Rifforge Mel-Roformer [model](https://huggingface.co/meskvlla33/rifforge/tree/main) final | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) - focused on inst/voc separation for metal music\n\n“The model can have some quirks (just like most models) but it's all around clean for me to release.” “It kinda also fucks up in like cleans [not distorted/non-metal vocals]”\n\n“A pretty insane instrumental (...) I can still hear the punch (...) 
drums punch is preserved!!!! result is 95% since it kept some deep madness vocals at the end and there is some subtle vocal reverb left but not noticed if you didn't focus” - mohammedmehditber\nTraining details:\n“This is a dimension 512 depth 24 model (so fairly large file size at 1.9 GB!), with an SDR of 14.2436.\n\nIt's finetuned from an older Melband Roformer checkpoint with an SDR of 13.7.”\n\n- “I think I found the (IMO) the best process for metal:\n\n1. inferencing using the BS 07.2025 model on MVSEP\n\n2. inferencing using my rifforge model\n\n3. ensembling both with a min\\_fft ensemble\n\nIt keeps the \"fullness\" of the rifforge model being an instrumental focused model but then also removes more stuff than my base model thanks to 07.2025” - mesk\n\n- Max Spec ensemble of Deux and inst\\_gaboxFlowersV10 - seems to yield good results for a metal album - mohammedmehditber\n\n- Gabox [inst\\_gaboxFlowersV10](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gaboxFlowersV10.ckpt) Mel-Roformer ([yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/v10.yaml)) | makidanyee's [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link) | uvronline via special link (for [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) accounts)\n\nInst. fullness 37.12, bleedless 36.68, SDR 16.95[’](https://mirror.mvsep.com/quality_checker/entry/9505) (fullness and SDR are better than Inst\\_FV8b's, while bleedless is slightly lower).\n\nThe yaml differs from the previous ones. 
Deux model fine-tune.\nFor less buzzing, use the phase fix with the becruily voc model - prodbyluke ([phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm)).\n\nIncompatible with the RTX 5000 UVR patch (users will encounter “ModuleNotFoundError: No module named 'torch.\\_dynamo.polyfills.fx'”).\n\nThe chunk size for the flowers model in the Colab should be 352800 (the yaml value, which might not be evaluated).\n\nWithout the phase fixer, it might have louder constant buzzing than even inst\\_fv4. However, FlowersV10 fixes some crossbleeding issues present in previous models, as well as some issues with distorted vocals and screams (dtn, dca100fb8, Tobias51).\n\n“a huge improvement over what Gabox has accustomed us to! I got amazing results. It truly achieves a stability that previous models lacked; the clarity is much greater in these.” - billieoconnell.\n\n- (old) mesk’s “Rifforge” metal Mel-Roformer 14.01 fine-tune instrumental model (focused more on bleedless).\n\nInst. fullness: 28.49, bleedless: 42.38, SDR 16.67\n\n“training is still in progress, that's why it's a beta test of the model; It should work fine for a lot of things, but it HAS quirks on some tracks + to me there's some vocal stuff still audible on some tracks, I'm mostly trying to get feedback on how I could improve it” [known issues](https://discord.com/channels/708579735583588363/708580573697933382/1405719212592464003).\n\n<https://drive.proton.me/urls/5XM3PR1M7G#F3UhCU8RDGhX>\n\nBe aware that MVSEP’s BS-Roformer 2025.07 can often be better for metal than these mesk models, for both vocals and instrumentals. It was also trained on mesk’s metal dataset.\n\n- The custom model import Colab currently has some issues with it. 
Using that old version will probably work (at least locally).\n\n\"My old MSST repo I'm using, but I removed all the training stuff\n\n<https://drive.proton.me/urls/P530GFQR4W#VCAsF0E1TPje>\n\npip install -r requirements.txt (u gotta have Python and PyTorch installed as well) for the script to work.\n\nYou just gotta put all the tracks you want to test on in the \\*\\*\"tracks\"\\*\\* folder then double-click on \\*\\*\"inference.bat\"\\*\\* to run the inference script\n\nit's like if you were to type in the command in cmd but it's simpler, and I'm lazy\" - mesk\n\n- Older Mesk metal Mel-Roformer preview instrumental model\n\nInst. fullness: 28.81, bleedless: 42.16, SDR 16.66\nRetrained from Mel Kim on a metal dataset of a few thousand songs.\n\n<https://huggingface.co/meskvlla33/metal_roformer_preview/tree/main> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(previous metrics were measured on a private dataset)\n\n“Should work fine for all genres of metal, but doesn't work on:\n\n- hard compressed screams\n\n- some background vocals\n\n- weird tracks (think Meshuggah's \"The Ayahuasca Experience\")”\n\nP.S: “Use the [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) or training repo [(MSST)](#_2y2nycmmf53) if you want to [separate] with it. 
UVR will be abysmally slow (because of chunk\\_size [introduced since [UVR Roformer beta](#_6y2plb943p9v) #3])”\n\n- [Neo\\_InstVFX](https://huggingface.co/natanworkspace/melband_roformer/blob/main/Neo_InstVFX.ckpt) Mel-Roformer by neoculture | [yaml](https://huggingface.co/natanworkspace/melband_roformer/blob/main/config_neo_inst.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. fullness 39.88, bleedless: 32.56, SDR: 14.35\nFocused on preserving vocal chops.\n\n“Great model (at least for K-pop it achieved the clarity and quality that no other model managed to have) it should be noted that it has a bit of noise even in its latest update, its stability is impressive, how it captures vocal chops, in blank spaces it does not leave a vocal record, sometimes the voice on certain occasions tries to eliminate them confusing them with noise, but in general it was a model that impressed me. It captures the instruments very clearly” - billieoconnell.\n\n“NOISY AF, this is probably the dumbest idea ever had for an instrumental model. Don’t use it as your main one, some vocals will leak because I added tracks with vocal chops to the dataset. 
Just use this model for songs that have vocal chops” - neoculture\nA new fine-tune is in the works.\n\n- Unwa [BS-Roformer-Inst-EXP-Value-Residual](https://huggingface.co/pcunwa/BS-Roformer-Inst-EXP-Value-Residual) (uses the Mel v2 model type in UVR; if it hasn’t been made compatible with MSST already, replace bs\\_roformer.py with the one from this [repo](https://github.com/lucidrains/BS-RoFormer/tree/main/bs_roformer) and, in that bs\\_roformer.py file, change\n\nfrom bs\\_roformer.attend import attend\n\n⇩\n\nfrom models.bs\\_roformer.attend import attend\n\nGenerally not a very good model, but sometimes capable: it “successfully removed [vocals] and kept the digital choir atmosphere as well” compared to deux, inst\\_gaboxFlowersV10 and HyperACE, but it’s considerably slower than deux - mohammedmehditber)\n\n*Older/other models (still less muddy than vocal models)*\n\n0) [INSTV6](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/INSTV6.ckpt) by Gabox | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | x-minus | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst. fullness 37.62, bleedless 35.07, SDR 16.43\n\nv1e still gives better fullness, but its noise is a problem.\n\nOpinions are divided on whether v5 or v6 is better.\n\n“Seems like a mix between becruily and unwa's models”\n“Slightly better than v5 (...) 
less muddy and also removes the vocals without adding that low EQ effect when the vocals would come in, so I feel it's better” - zzz\nOld viperx’ 12xx models have fewer problems with sax.\n\n- [Inst\\_GaboxFVX](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/Inst_GaboxFVX.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. fullness 38.25, bleedless 35.35, SDR 16.49\n\n“instv7+3” - fuller than instv3\n\n- Gabox [instv10](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/experimental) (experimental) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml)\n\nLess noise and fewer vocal residues than V7, but muddier\n\n\\_\\_\n\n0) Gabox Mel-Roformer instrumental model “[inst\\_gabox.ckpt](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/intrumental_gabox.ckpt)” (Kim/Unwa/Becruily fine-tuned)\n\nGabox’ [models](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers) repo | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\ninst fullness 37.07 (better than unwa inst v1 and v2), bleedless 37.40 (better than v1e by 1.8, slightly worse than unwa’s v1)\n\n“It’s like the v1 model with phase fixer, but it gets more instruments,\n\nlike, it prevents some instruments from getting into the vocals”, “sometimes both models don't get choirs”.\n\n\\_\\_\\_\\_\\_\n\nList of [*fast inference models*](#_19elh4q7egl6) (small model size; potentially workable on non-ancient CPUs without GPU acceleration\nor on slow GPUs)\n\n\\_\\_\\_\\_\\_\n\n*Older fullness 
models*\n\n0) Gabox F/fullness v1 | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\ninst fullness 37.26 | bleedless: 37.19\n\n0) Gabox F/fullness v2 | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\ninst fullness 37.46 | bleedless: 37.09\n\n\\*) Gabox [inst\\_Fv4](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_Fv4) (F - fullness/v4) | (don’t confuse with vocal fv4) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml?download=true) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\ninst fullness 39.40 | bleedless 33.49\n\nDuplicate from the above\n\nOthers\n\n0) [intrumental\\_gabox](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/intrumental_gabox.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\n\\_\\_\n\n0) Gabox B/bleedless v1 instrumental [model](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gaboxBv1.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nInst. 
fullness 35.03, bleedless 39.10, SDR 16.49\n\n0) Gabox B/bleedless v2 instrumental [model](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gaboxBv2.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\nInst. fullness 35.09, bleedless 38.38, SDR 16.49\n\n([*Gabox models repo*](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers))\n\n0) Cut your song into fragments and combine the best moments of e.g. v1e/v1/v2 into one (optionally also using Mel-Roformer Bas Curtiz FT on MVSEP if necessary, as it gives even less vocal bleeding, but more muddiness)\n\n0) Suggested models for the phase fixer to alleviate vocal residues (from the above)\n\na) Becruily voc with Becruily inst (muddy but very few residues, if any)\n\nb) FT3 with V1e+\n\nc) Unwa Beta 6 with inst\\_gabox3 (although it might be less consistent than the top ones)\n\nd) Unwa Revive model is also good with any instrumental model\n\ne) Unwa Bigbeta5 used to be decent too.\n\nf) Or any of the vocal models above with e.g. V1e (it's pretty full, so this might not be enough for it anyway)\n\nHow to use the phase fixer in UVR?\n\nSeparate with a vocal model, then with an instrumental model. Go to Audio Tools>Phase swapper, and use the vocal model result as the reference and the instrumental as the target.\n\n\\_\\_\n\n###### **>Ensembles**\n\n(for instrumentals; also check out [DAW ensemble](#_oxd1weuo5i4j) with the models below)\n\nIf you find some phase fixer results (e.g. ensembled ones) unsatisfactory, use the Phase Fixer [Colab](https://colab.research.google.com/github/lucassantillifuck2fa/Music-Source-Separation-Training/blob/main/Phase_Fixer.ipynb) - it's tweaked for better results than UVR and standalone scripts. 
Phase fixing [instructions](#_j14b9cv2s5d9).\n\n0) Mel deux + HyperACE v2 instrumental results (Max FFT/Spec), then\nphase fix the ensembled inst results with the Mel becruily vocal model's inst stems as reference (or Mel Kim, if you don't mind it taking out some instruments), with 100 for Low Cutoff and 100 for High Cutoff,\nthen ensemble the outputs with BS 2025.07 from MVSep (Max FFT again; optional step)\n\n—->The best Ensemble right now imo - dca100fb8 (unless stated otherwise)\n\nSome people complain that it can be noisy.\n\nNote: for deux, use the stem named \\_instrumental in the Colab, not the muddier one named \\_other.\n\n“I just added BS 2025.07 because it finds more missing instruments without adding bleed\n\nAnd I chose not to phase fix the final result (which includes BS 2025.07) because otherwise \"becruily vocal\" would remove the BS 2025.07 difficult instruments\n\nSo doing it before is good”\n\n0) Mel deux + BS HyperACE v2 instrum (Max FFT) with phase fix using becruily vocals as reference and 100/100 for the low cut and high cut values\n\n—->Former best\n\n0) BS-Roformer HyperACE Inst v2 + 2025.07 (Max FFT) using 100/100 as values for phase fixing and 2025.07 as reference\n\n—->Former best\n\n0) BS HyperACE + BS 2025.07 (Max FFT with BS 2025.07 as phase fix reference and 3000/5000 for the values)\n\n—->My [former] favorite ensemble right now. 
Though, I notice it sometimes produces vocal bleed, and it can be noisy in some parts of a song while the noise might be totally absent in other parts.\n\n0) BS-Roformer Resurrection Inst (phase fixed with BS 2025.07 using Low Cutoff 3000 and High Cutoff 5000) + BS 2025.07 (Max Spec)\n\n—->Former best\n\n0) Unwa Mel inst v1e+ with FNO with Becruily inst (Max Spec)\n\n—-> The best result back then\n\n- sakkuhantano\n\n0) Unwa Mel inst V1e + MVSEP BS 2025.07 (Max Spec) using BS 2025.07 as phase fix reference with 100/100 (it's better) as Low Cutoff and High Cutoff values\n\n—->It's a very aggressive value because V1e is noisy, and it works quite well.\n\nIt was the best at the time, because “BS Roformer Resur Inst [ensemble right below] is muddy compared to v1e, and I think fullness is the way. After phase fix the noise is barely noticeable”\n\n0) Unwa BS Roformer Resurrection Inst (BS 2025.07 as a reference for phase fix) + MVSEP BS Roformer 2025.07 (Max Spec)\n\n—->The least vocal crossbleeding (step-by-step process explained [here](#_8rocw7cwj55))\n\nAlternatively, you can use the becruily vocal model instead of 2025.07 for the ensemble -\n“Becruily vocal correctly recognizes instruments far better than the instrumental one”\n\n(Note: BS 2025.07, BS 2024.04, BS 2024.08 and SW were worse for the Resurrection model as a phase fix source; viperx BS-Roformer 1297 was better, but not as good for instrument preservation as BS 2025.07)\n\n0) unwa v1e + Mel becruily vocal (Max Spec) + phase fix (using becruily vocal again as a source)\n\n—->The best instrument preservation (with more possible crossbleed)\n\n0) Mel Gabox Fv7z + BS 2025.07 (Max Spec)\n\n—->The least amount of noise (with more possible crossbleed)\n\n0) Mel Gabox Inst V8 + BS 2025.07 (Max Spec) + phase fix (becruily vocal as reference)\n\n—->A good balance between presence of noise and level of fullness\n\n(occasional vocal crossbleeding)\n\n0) Mel Becruily Instrumental (with phase fix, becruily vocal as reference) 
+ SCNet XL IHF (Max FFT)\n\n—->Why SCNet? Because it's better than Mel Roformer at the low frequencies, so why not ensemble both archs. (...) SCNet is already noisy from the start, so the fullness models are obviously even noisier\n\n0a) Unwa v1e+ + BS-Roformer 12xx by viperx (Max Spec) - musicalman\n\n0a) FNO inst by unwa + BS-Roformer 12xx by viperx (might be optional) + v1e+ (or becruily inst Mel-Roformer)\n\n“beware of the song where there's a vocal at the beginning of the song, using v1e+ will leave vocal residue. So decide to change into becruily inst as well.” - Sakkuhantano\n\n0\\*) Mesk’s metal min\\_fft ensemble of the BS 07.2025 model on MVSEP + rifforge [model](https://huggingface.co/meskvlla33/rifforge/tree/main)\n“it keeps the \"fullness\" of the rifforge model being an instrumental focused model but then also removes more stuff than my base model thanks to 07.2025”\n\n0\\*) Chained separation method by fabio06844 for “very clean and full” instrumentals.\n\n1) Go to MVSep and separate your song with the latest Karaoke BS-Roformer by MVSep Team\n\n2) Run DEBLEED-MelBand-Roformer (by unwa/97chris) on the resulting instrumental stem\n\n([model](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/bleed_suppressor_v1.ckpt) | [yaml](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/config_bleed_suppressor_v1.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb))\n\n(even though “the MVSep Team Karaoke uses the MVSep BS model to extract/remove vocals, then applies [the] karaoke model to that”, just using the BS 2025.07 model instead was reported to be insufficient, leaving slightly more residues).\n\n0b) v1e phase swapped from Becruily vocals + BS 2025.07 (Max Spec)\n(Phase fixer 
[Colab](https://colab.research.google.com/github/lucassantillifuck2fa/Music-Source-Separation-Training/blob/main/Phase_Fixer.ipynb) or UVR’s phase swapper + MVSEP separation > UVR Manual Ensemble)\n\n(“brings: max fullness without a lot of noise since phase fix, rarely missing instruments, no robotic voice problem, rare vocal crossbleeding in instrumental”) - older favourite dca100fb8’s ensemble\n\n0b) v1e + Becruily vocal (Max Spec) (“If you had to keep one ensemble right now. v1e+ unfortunately is muddier than v1e and has that robotic issue sometimes”) - dca100fb8\n\n~~0b) v1e + Becruily inst + Becruily vocal (Max Spec)~~ (Becruily inst turned out to be “useless” in this bag of models) - -||-\n\n0b) Unwa v1e (with phase fix) + BS Large V1 (Max Spec) (doesn't need [Becruily vocal](https://huggingface.co/becruily/mel-band-roformer-vocals/tree/main) here as the third) - -||-\n\n0) v1e + INSTV7 (Max Spec) - neoculture\n\n0) Use MVSEP’s SCNet XL high fullness below 1000 Hz and unwa’s v1e above 1000 Hz, and join the two in e.g. iZotope RX - “You can use vertical select in RX with feathering set to 1.00” (heuhew) - or use a linear phase EQ, e.g. the free “lkjb QRange” (make sure the frequencies don't overlap at the crossover point in the output spectrogram)\n\n0) v1e+ + becruily inst\n\n—> occasionally fixes some instruments missing in v1e+\n\n- Sakku\n\n0b) INSTV7 + Inst\\_FV8 (to check)\n\n0b) unwa’s instv1e+, instv1+, instv2 and inst gabox, instv8 and instv7 - max FFT\n\n“Then, I upscaled using Apollo. Afterward, I applied [Mel-Roformer] de-noise to remove background noise as needed and performed mastering” (Sir Joseph)\n\n0b) Max Spec manual ensemble of: v1e+ + MVSEP BS-Roformer 2025.07 model\n\n- A simplified version of Senn’s method below (IntroC)\n\n“I think (...) [it] would be good enough. The way v1e+ keeps the noise is usually fine, and the mvsep 2025.07 model should bring back the lost masked frequencies for v1e+. 
Otherwise, just adjust the weight of v1e+ for maxspec to reduce the noise”\n\n0b) Max Spec manual ensemble of: v1e+ + MVSEP BS-Roformer 2025.07 model + Becruily inst\n\n- Senn’s method simplified (IntroC/Sakku)\n\n“sometimes becruily can catch a tiny instr while v1e+ can't” - Sakku\n\n\\*) Senn’s OG method:\n\nSeparate with the highest-SDR BS-Roformer model on MVSEP and with the best fullness Mel-Roformer model (unwa instrumental v1e plus), then mix both results with one of them phase-inverted. Use Soothe2 to filter out resonances so that mostly only noise remains, filter it further and mix it back in, then use a few plugins to apply very minor spectral flattening.\n\nIn other words:\n\n“Pass the music through BS RoFormer's best SDR and the best Fullness, which is Unwa Instrumental v1e plus\n\nin iZotope RX, Invert the phase of any of the tracks, and copy n paste to the other track, the result should be some ghastly sounding reverb of the vocals\n\nusing iZotope RX's Deconstruct, you wanna filter out the tonal signal of the voice as to remove the more obvious \"sinusoidal\" signals. 
It has to be fairly subtle as to not damage the noisy residuals\n\nNow with Soothe2, you wanna filter out any of the more aggressive noisy components, I use this setting, but it might not work 100% for everyone https://imgur.com/2A5yn3c (You can replace this with any plugin that acts similar to Soothe2, but Soothe2 is the best compromise)\n\nIf needed, you can use Deconstruct as well but reducing the noisy aspects just to wipe out that aggressive noisy artifact\n\nMess with the gain, and then add it back to the BS RoFormer track (don't forget to invert the phase again), Ideally it should be fairly subtle\n\nFor post-processing:\n\nI use MSpectralDynamics to add a slight spectral flattening to the track, and Unchirp to denoise very slightly the higher frequencies to remove that digital hissy artifact and also tighten more of the sound\n\nA very subtle Gullfoss can also brighten the track slightly as well to compensate\n\nhere is the result” (thx senn)\n\nOlder ensembles (from before becruily inst/voc Mel models release)\n\nUVR>Audio Tools>Manual ensemble (for models from outside UVR)\n\n0) Unwa’s v1e inst + phase fixer/swapper (from Mel-Kim or sometimes Mel 2024.10 on MVSEP for less noise) + BS 2024.08 on MVSEP (Max Spec)\n(fullness with less noise + retrieved missing wind instruments from v1e) - dca100fb8\nBecruily vocal is even better at recognizing instruments compared to Mel 2024.10 or BS 2024.08 (vocal Roformer models have fewer problems with recognizing instruments than inst Roformers)\n\n0) Other dca’s Max Spec ensembles of v1e with other Roformer models\n\nII) v1e + phase fixer/swapper + Mel 1143 (because of fullness with less noise + retrieved missing wind instruments from v1e)\n\nIII) v1e + phase fixer/swapper + BS 1296 (because of fullness with less noise + retrieved missing wind instruments from v1e)\n\nIV) v1e + phase fixer/swapper + BS 1297 (because of fullness with less noise + retrieved missing wind instruments from v1e)\nV) v1e + phase fixer/swapper 
+ BS Large V1 (because of fullness with less noise + retrieved missing wind instruments from v1e)\nRecommended (or with v2/v1 instead of v1e), especially when using the phase fixer, because Kim and its fine-tunes used for the tool bleed instruments into the vocal track.\n\nVI) (extra) for slow CPU/GPU: Voc FT + HQ5 (Max Spec)\n\nVII) HQ\\_5 with UVR’s Phase Rotate (which can now replace the above)\n\nVIII) v1e + BS 2025.06 (Max Spec) - manual ensemble - the latter on MVSEP (because it keeps instruments in the instrumental correctly [though less so than becruily vocal] and has less vocal crossbleeding in the instrumental than becruily vocal)\n\n0) Unwa’s Inst V2 and Inst Gabox (Avg) (1120 segments [6GB GPU]/4 overlaps) - cypha\\_sarin\n\n0) Max ensemble of: instv1, instv2 and inst v1e - erdzo125\n(better fullness than inst v1e itself, but more noise)\n\n0b) Models ensembled - available only for premium users on [mvsep.com](https://mvsep.com/)\n\nAn “instrumental high fullness” variant has also been added for the inst/voc ensemble.\n\nFor example, some available ensembles with lower inst/voc SDR might be less muddy than 11.50 (e.g. 10.44), but the 11.50 one has the fewest vocal residues according to the **bleedless** metric, though it can also sound very filtered. Newer ensembles have been added since then (track [the leaderboard](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum) and click on entries to also see bleedless/fullness metrics).\n(ensembles on MVSEP currently provide the best SDR scores for 2- and 4-stem separation, higher than the free v.2.4/2.5 Colabs below; 2025.06.28 currently has the highest SDR, and surpassed ByteDance’s private model)\n\nThere are shorter queues for single model separation for registered users with at least one point.\n\n*Queues might be shorter between 10:00 PM and 1:00 AM UTC.*\n\n*The ensemble option fixes some issues with general muddiness of older vocal Roformer models (but 11.50 is muddier than v. 
2.4 Colab).*\n\n0) [KaraFan](#_7kniy2i3s0qc) (e.g. preset 5; a fork of the original ZFTurbo's MDX23 fork with new features by Captain FLAM, with jarredou's help on some tweaks), [offline version](https://github.com/Captain-FLAM/KaraFan/tree/master), org. [Colab](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) and [Kubinka](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio_Separation/KaraFan.ipynb) Colab (older version, fewer vocal residues than v.3.1, although v.3.2-4.2/+ were released with fewer residues).\n\nUsed to be one of the best free solutions for instrumentals (before some newer Roformers like unwa’s inst [v1](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) were released), with small amounts of vocal residues (sometimes more than the solution below) and clear outputs, but, unlike the solution below, no 4-stem separation:\n\n0a) *MDX23* by ZFTurbo (weighted UVR/ZF/VPR models) -\n\nfree modified Colab fork v. [2.1](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb) - [2.4](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb), [2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb) with fixes and enhancements by jarredou\n\n(one of the best SDR scores among publicly available 2-4 stem separators;\n[v2.2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) Colab with fullband MDX23C model might have more residues in instrumentals vs v. 
2.1, but better SDR, [2.7b](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/2.7/MVSep_MDX23_Colab_2_7_Version_Updated.ipynb) (with SCNet XL, not SDR evaluated - weight set by ear), [2.3](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb), 2.4 also with 12xx BS-Roformer, v. 2.5 also with Kim’s Mel-Roformer (the default settings are already good and balanced, and weights can be further adjusted; [read](#_jmb1yj7x3kj7) for more settings.\nThe caveat: it hasn’t been updated with the newer Roformer instrumental models at the top).\n\n0a) dango.ai ([tuanziai.com/en-US](http://tuanziai.com)) - $8 per 10 songs; one of the best (if not the best) instrumental separators so far, at least until unwa’s inst models reached Dango’s level of fullness (you might now even find Dango too muddy), though at the cost of more noise. Still, Dango “can handle complex vocals/songs well so it's more reliable, and no vocal bleed in background of instrumental”.\nDango 10’s Conservative mode gives more fullness to instrumentals at the cost of whispering artefacts (experimental for the time being, along with the Aggressive mode), and it doesn’t fix vocal popping using Smart Mode (default). Dango 11 is now available. It has more crossbleeding than Unwa’s BS-Roformer inst. 
What changed for the better is less noise and better instrument detection.\n\nEnsembles (from before the becruily models' release; less noisy single models are listed further below)\n\n0b) Avg Spec ensemble of unwa inst v1 and v2\n\n0b) Min Spec manual ensemble of the vocal stems from these models, then inversion against the original song (fuller, more noise)\n(UVR>Audio Tools>Manual Ensemble)\n\n0b) Max ensemble of: unwa’s v1e + Mel 2024.10 + BS 2024.08 - dca100fb8\n(older bag of models; 2024.08 on MVSEP; fixes the flute and trumpet issues)\n\n0b) Separate the song twice - first with v1e, then with Unwa’s BS-Roformer Large - and do a Manual Max Spec Ensemble via UVR - dca100fb8\n(old; BS-Roformer is here to retrieve the instruments missing from the v1e result, though BS 2024.08 & Mel 2024.10 on MVSEP already work better for this task)\n*(Ensembles from before the unwa inst. models' release - you might also try replacing all the 12xx BS-Roformers below with newer unwa/becruily/Gabox models)*\n\n0b) Models ensembled - available only for premium users on [x-minus.pro](https://x-minus.pro/)\n\n- max\\_mag ensemble (with viperx 1297 Roformer)\n\n- demudder (on Mel-Roformer)\n\n- Mel-Roformer + MDX23C\n\nUVR 5 ensembles (although beta 4 and inst [v1](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) on their own might already be better)\n\n*For Roformers, min. RTX 3050 8GB or faster AMD/Intel ARC/Apple M1-3 recommended*\n\n*(OpenCL is not as fast as CUDA in UVR; 6GB VRAM on CUDA* [*should*](https://twitter.com/izutorishima/status/1816204774246879725) *be enough too, min. 
2K+ CUDA cores recommended)*\n\n0b) 1296 + 1143 (BS-Roformer in [beta](#_6y2plb943p9v) UVR) + MDX-Net HQ\\_4 (dopfunk)\n\n[you can potentially try Mel Kim instead of 1143 above]\n\n0b) Manual ensemble (in UVR’s Audio Tools) of:\n\nBS-Roformer 1296 + file copy of the result + MDX23C HQ (jarredou; [src](https://discord.com/channels/708579735583588363/708579735583588366/1228818808261840976))\n\nor just 1296 + 1297 + MDX23C HQ for slower separation and similar results\n\n0b) Manual ensemble of:\n\n- BS-Roformer 1296 + drums stem from demucs\\_ft or\n\n- Bs-Roformer 1143 result passed through demucs\\_ft for drums to ensemble with 1296 (max/max)\n\n0b) MDXv2 HQ\\_4 + BS-Roformer 1296 + BS-Roformer 1297 + Melband RoFormer 1143 (Max Spec) “Godsend ensemble for demuddiness” (dca100fb8)\n\n0b) Manual ensemble of [HQ\\_5](https://uvronline.app/ai?discordtest&test-mdx) ([paid users](https://uvronline.app/ai?hp&test-mdx)) and Kim's Mel-[Roformer](#_6y2plb943p9v) (max\\_spec)\n\n0b) (for metal) “1 – pass through Kim's Vocal Melband Roformer (link in Single models below)\n\n2 - Multi-stem Ensemble (Average algo):\n\n1\\_HP-UVR\n\nMGM\\_HIGHEND\\_v4\n\nMGM\\_LOWEND\\_A\\_v4\n\n(VR advanced settings: 320 window size, 5 aggression setting, batch size default, TTA enabled | Post Process and High-End Process CHECKED OFF)”\n\n3 – Manual Ensemble both your Melband output and the Multi-stem instrumental output (with Average algorithm)\n\n“best settings/models for metal” (~mesk)\n\n0b) (older version of the above) 1\\_HP\\_UVR + UVR\\_MDX-NET-Inst HQ 4 + UVR\\_MDX-NET-Inst\\_Main 438 ([VIP](#_ma01ud7qwboo) model)\n\n(Min Spec / Average, WS 512, TTA Enabled, Post-Process and High-End Process off)\n\n0b) 9\\_HP2-UVR and Kim Mel-Roformer (newer one for metal; mesk)\n\nbut not in Multi-stem Ensemble, as it requires 3 or more models\n\nVR: 320 window size, 1 aggression setting, Default batch size, TTA enabled (post process and high-end process aren't enabled)\n\n0b) Mateus Contini's 
[method](#_79cxg1a64b11) e.g. #2 or #4\n\n0b) MDX-Net Kim Vocals 1\n\nMDX-Net MDX23C-InstVoc D1581\n\nMDX-Net MDX23C-InstVocHQ 2\n\nMDX-NET UVR-MDX-NET-Voc\\_FT\n\nDemucs v4 | htdemucs\\_ft\n\navg/avg>\"Vocal Splitter Options\" and choose \"VR Arc: 5\\_HP-Karaoke-UVR\"\n\n0b) 9\\_HP2-UVR + BS-Roformer 1297\n\n0b) BS-Roformer ver. 2024.08 + MelBand Roformer (Bas Curtiz edition) + MDX-Net HQ4 + SCNet Large, Max Spec Ensemble (dca100fb8)\n\n0b) BS-Roformer ver. 2024.08 + MelBand Roformer (Bas Curtiz edition) (Max Spec Ensemble) --> result. Result + MDX-Net HQ4 + SCNet Large (Average Ensemble) - -||-\n\n0b) 1297 (or 1296) + MDX23C HQ2 (CZ-84)\n\n[or potentially unwa’s BS-Roformer instead of 12xx]\n\n*See also* [*DAW ensemble*](#_oxd1weuo5i4j) *(older ensembles later below)*\n\n(more about) unwa’s instrumental Mel-Roformer v1 model | MVSEP | x-minus.pro\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d) | UVR [instructions](#_6y2plb943p9v)\n\n\"much less muddy (..) but carries the exact same UVR noise from the [MDX-Net v2] models\"\n\nBut it's a different type of noise, so the aufr33 denoiser won't work on it.\n\n“you can \"remove\" [the] noise with uvr denoise aggr -10 or 0”, although -10 makes it sound muddier, like the Kim model, and the denoiser sometimes removes synths and bass (~becruily)\n\n“If there is any voice left [or also background noise], use the Mel-Roformer de-noise with minimal aggression.”\n\nThis inst model “doesn't eliminate vocoder voices well from an instrumental”.\n\nFor the noise in this model, using Mel-Roformer de-noise might be a better alternative to the ensemble trick on x-minus:\n\n“removes more noise from the song keeping overall instrument quality more than the new button [on x-minus]” - koseidon72. But the more aggressive variant of the Mel model sometimes deletes parts of the mix, like snares. 
UVR-Denoise-Lite doesn’t seem to damage instruments as much as the non-lite UVR-Denoise in UVR, but still more than Mel denoise (recommended aggression: 4; with a 272 rather than 512 window size it’s less muddy; TTA can emphasize the noise more; somewhere above aggression 10 it gets too muddy). UVR-Denoise on x-minus is even less aggressive (it’s the medium-aggression model for free users, who can’t pick the aggression), but it might occasionally catch the ends of some instruments, like bass. The premium minimum-aggression model is somewhat muddier, but doesn’t damage instruments.\n\nFor muddier Roformers, consider using Aufr’s [demudder](#_sv6j1ndk4oq5) (used for premium users on x-minus with the Kim Mel model), although it might increase vocal residues, or the UVR demudder (explained there later below).\n\n*\\_\\_\\_\\_*\n\n###### *Muddier but cleaner single models* (Roformer vocal models with fewer residues in instrumentals than instrumental models, without the need for a phase fixer)\n\n0c) MVSep BS-Roformer (2025.07.20)\n\nInst. fullness 27.83, bleedless 49.12, inst SDR 18.20\n\nProbably a retrain of the SW model on a bigger dataset.\n\n0c) BS-RoFormer SW 6 stem (MVSEP/Colab/undef13 splifft) / vocals only\n\nInst. fullness 27.45, bleedless 47.41, inst SDR 17.67\n\n(use inversion from vocals and not mixed stems for better instrumental metrics)\n\nKnown for doing well on some songs that previously gave bad results.\n\n0c) 10.2024 Mel-Roformer vocal model on MVSEP\nInst. 
fullness 27.84, bleedless 47.37, inst SDR 17.59\n\nThe cleanest, but muddy compared to models trained for instrumentals\n\nCapable of detecting sax and trumpet, but still muddier than the instrumental models above.\nBas Curtiz vocal model fine-tuned by ZFTurbo.\n\n0c) Gabox [voc\\_fv4](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv4.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nGood for anime and RVC purposes (codename)\n\nAnd also for instrumentals, if you need fewer vocal residues than typical instrumental Roformers give (even fewer than Mel Kim, FT2 Bleedless, or Beta 6X - makidanyee).\n\n0c) Beta 6X\n\nSome people might even favor it over Resurrection inst - cyclorana\n\n“the best universal vocal/instrumental model I’ve ever seen TBH” - NateTheGrate\n\n0c) Unwa’s Kim Mel-Band Roformer FT2 | [download](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst.
fullness: 28.36, bleedless: 45.58\n\nA decent all-rounder too, sometimes with less bleeding in instrumentals than 5e, although with slightly worse transients.\n\n0c) Unwa’s beta 5e model, originally intended for vocals | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR instr](#_6y2plb943p9v)\nModel [files](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main) | yaml: big\\_beta5e.yaml or [fixed](https://drive.google.com/file/d/1YRv1j0zMs9hk3-On2z6uwfZbsQ7l1LFP/view?usp=sharing) for AttributeError in UVR\nInst. fullness: 27.63 (higher than Mel-Kim) | bleedless 45.90 (higher than Kim FT by unwa, lower than Mel-Kim) | Inst. SDR 16.89\n\nMainly for vocals, but can still be a decent all-rounder, free of the noise present in unwa’s inst v1e/v1/v2 models, with fewer residues than Kim FT by unwa, and more consistent than the Kim Mel model, which can muffle the instrumental a bit in sudden moments.\nThe third highest bleedless instrumental [metric](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?gid=1468543363#gid=1468543363) after the Mel-Kim model (and after unwa ft2 bleedless in [vocals](#_n8ac32fhltgg)).\n\nIt seems to fix some issues with trumpets in the vocal stem (maxi74x1).\nIt handles reverb tails much better (jarredou/Rage123).\n\nNoisier/grainier than beta 4 (somewhat like Apollo lew's vocal enhancer), but less muddy.\n“The noise is terrible when that model is used for very intense songs” - unwa\n\nThe phase fixer for the v1 inst model doesn’t help with the noise here (becruily).\n“It's a miracle LMAO, slow instrumentation like violin, piano, not too many drums...\nit's perfect...
but unfortunately it can't process Pop or Rock correctly” - gilliaan\n“the vocal stem of beta5e may have fullness and noise level like duality v1, but it may also suffer kind of robotic phase distortion, yet may also remove some kind of bleed present in other melrofo's.” - Alisa/makidanyee\n“particularly helpful when you invert an instrumental and then process the track with it.” - gilliaan\n\n0c) Unwa Kim Mel-Band Roformer Bleedless FT2 | [download](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet)\n\n0c) Bas Curtiz' edition Mel-Roformer vocal model on MVSEP\n\n(it was also trained on the ZFTurbo dataset)\n\n“Music sounds fuller than original Kim's one & the finetuned version from ZFTurbo [iirc below]. Even [though] the SDR is smaller than BS Roformer finetuned last version, but almost [every] song has the best result in instrumental.” - Henri\n\nIt can struggle with trumpets more than the other Mel-Roformer on MVSEP [whether 08.2024 or Mel-Kim, can’t remember].\n\n0c) BS-Roformer 2024.08.07 vocal model on MVSEP\n\nInst. fullness 26.56 (lower than Mel-Kim), Inst. bleedless 47.48 (the only single model with a higher value of this metric than Mel-Kim)\n\nInst SDR 17.62\n\nvs the 2024.04 model: +0.1 SDR, and “it seems to be much better at taking the vocals when there are a lot of vocal harmonies”; also good for Dolby channels.\n\nCapable of detecting flute correctly\n\n0c) Mel-Roformer vocal model by KimberleyJSN - [model](https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt?download=true) | [config](https://drive.google.com/file/d/1U1FnACm-ontQSjhneq-WKk1GHEiTW97s/view?usp=sharing) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nInst. fullness 27.44 (worse than beta 5e and duality, but better than current BS-Roformers)\n\nInst.
bleedless 46.56 (the best [metric](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?gid=1468543363#gid=1468543363) from public models)\n\nInst SDR 17.32\n\nIt became the base for many Mel-Roformer fine-tunes here.\n\n(works in UVR [beta Roformer](#_6y2plb943p9v)/[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)/[CML inference](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model)/[x-minus](https://x-minus.pro/)/[MDX23 2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb) (when the weight is set only for the Mel model)/simple model [Colab](https://colab.research.google.com/drive/1tyP3ZgcD443d4Q3ly7LcS3toJroLO5o1?usp=sharing) (might have problems with mp3 files))\n\nIt’s less muddy than the older viperx Roformer model, but can have more vocal residues, e.g. in silent parts of instrumentals; it can also be more problematic with wind instruments, putting them in vocals, and it might leave more instrumental residues in vocals. Its SDR is higher than the viperx model (UVR/MVSEP) but lower than the fine-tuned 2024.04 model on MVSEP.\n\n0c) Unwa Revive 2 BS-Roformer (“my first impression is it may have less low end noise than fv4 but not the best in the overall quality and amount of residues in vocal” - makidanyee)\n\n0c) BS-Roformer Large vocal model by unwa (viperx 1297 model fine-tune) [download](https://drive.google.com/file/d/1Q_M9rlEjYlBZbG2qHScvp4Sa0zfdP9TL/view)\nOlder BS model. It picks up more instruments than the 12xx models. Muddier than Kim’s Roformer, with slightly fewer vocal residues and a slightly more artificial sound. It also tends to be muddier than viperx 1297, sometimes muffling the instrumental, but with slightly fewer vocal residues and a slightly more artificial/less musical sound.
Sometimes it has more vocal residues than beta 5e.\n\nCompared to BS-Roformers, Mel-Roformers can strike a good balance between muddiness and clarity for some instrumentals.\n\nUnlike the ZFTurbo (MVSEP) and viperx models, Kim’s model was trained on Aufr33’s and Anjok’s dataset.\n\nUVR manual model installation (a Model install option was added in newer [patches](#_6y2plb943p9v)):\nPlace the model file in Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models and the config in the model\\_data\\mdx\\_c\\_configs subfolder; “when it will ask you for the unrecognised model when you run it for the first time, you'll get some box that you'll need to tick \"Roformer model\" and choose its yaml”. Some models here are available in the Download Center too.\n\nOther unwa fine-tunes (originally vocal models)\n\n0c) Mel-Roformer Kim | FT (by unwa) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n<https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main>\n\nInst. fullness 29.18 (lower only than unwa’s inst models)\n\nInst. bleedless 45.36 (lower than Beta 5e)\n\nInst. SDR 17.32\n\nHas more vocal residues than Beta 5e\n\n- Aname Mel-Roformer ~~duality~~ [model](https://huggingface.co/Aname-Tommy/Mel-Band-Roformer_Duality).\nIt focuses more on the bleedless than the fullness metric, unlike unwa’s duality v2 model, but has a higher SDR.\n\nInst.
fullness 24.36, bleedless 46.52, SDR: 17.15\n\n- unwa’s Mel-Roformer inst-voc model called “duality v1/2” (trained with focus on both the instrumental and vocal stems; two independent stems, not inversions of each other, inside one weight file).\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-InstVoc-Duality> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MVSEP](https://mvsep.com/)\nV1: Inst fullness 28.03, bleedless 44.16, SDR 16.69.\n\nV2: Inst SDR 16.67\n\nOutperformed in both metrics by unwa’s Kim FT.\n\nVocals sound similar to the beta 4 model; instrumentals are free of the noise present in the inst v1/v1e models, yet they don't sound as muddy as previous Roformers.\n\nCompared to beta 4, BS-Roformer Large, or models of other archs, it has fewer problems with reverb residues, and compared to v1e, fewer problems with vocal residues in e.g. Suno AI songs.\n\"[other](https://mvsep.com/quality_checker/entry/7321)\" is the model's output, \"[Instrumental](https://mvsep.com/quality_checker/entry/7322)\" is the vocals inverted against the input audio.\n\nThe latter has lower SDR and more holes in the spectrum, so in MSST-GUI, leave the “extract instrumental” checkbox disabled for duality models (the Colab now has it too, as the “extract\\_instrumental” option), and probably for the inst vx models.\nYou can use it in Bas Curtiz’ [GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) for the ZFTurbo script or with the original ZFTurbo repo code.\n\n- unwa’s Mel-Roformer fine-tuned beta 3 (based on Kim’s model)\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)\nInst SDR: 17.30\n\nSince beta 3, there are no ringing issues in higher frequencies like in previous betas.\nSometimes better for instrumentals than beta 4 - but tends to be too muddy at
times, but with fewer vocal residues than beta 5.\n\n- unwa’s Mel-Roformer beta 4 (Kim’s model fine-tuned)\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main> | [Colab](https://colab.research.google.com/drive/1e9dUbxVE6WioVyHnqiTjCNcEYabY9t5d)\n\nOutperformed in both metrics by beta 5e.\n\nBe aware that the yaml config is different in this model.\n\n“Metrics on my test dataset have improved over beta3, but are probably not accurate due to the small test dataset. (...) The high frequencies of vocals are now extracted more aggressively. However, leakage may have increased.” - unwa\n\n“one of the best at isolating most vocals with very little vocal bleed and still doesn't sound muddy”. It can be a better choice on its own than some ensembles.\n\n0c) SCNet XL (vocals, instrum)\n\nInst SDR: 17.2785\n\nVocals have an SDR similar to the viperx 1297 model, and the instrumental scores a tiny bit worse than the Mel-Kim model.\n\n0c) Older SCNet Large vocal model on MVSEP\n\n“just like the new BS-Roformer ft model, but with more bleed. [BS] catches vocals with more harmonies/bgv” - isling. “it's like improved HQ4” - dca100fb8\nIssues with horizontal lines on the spectrogram.\n\n0d) Aname Mel [model](https://huggingface.co/Aname-Tommy/Mel_Band_Roformer_Full_Scratch) trained from scratch a.k.a. Full Scratch\n\nInst. fullness: 25.10, bleedless: 37.13\n\n###### *Models for older archs*\n\n0c) MDX23C 1666 model exclusively on mvsep.com\n\n(vocal Roformers are much muddier than MDX23C/MDX-Net in general, but can be cleaner)\n\n0c) MDX23C 1648 model in UVR 5 GUI (a.k.a.
MDX23C-InstVoc HQ / 8K FFT) and mvsep.com, also on x-minus.pro/uvronline.app\n\nBoth sometimes have more bleeding than MDX-Net HQ\\_3, but also less muddiness.\n\nPossible horizontal lines/resonances in the output - fix the DC offset and/or use overlap “starting from 7 and going multiples up - 14 and so on.” - Artim Lusis\n\n0c) MDX23C-InstVoc HQ 2 - [VIP](https://buymeacoffee.com/uvr5/vip-model-download-instructions) [model](https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/MDX23C-8KFFT-InstVoc_HQ_2.ckpt) for UVR 5. It's a slightly fine-tuned version of MDX23C-InstVoc HQ. “The SDR is a tiny bit lower, but I found that it leaves less vocal bleeding.” ~Anjok\n\nIt’s not always the case; sometimes it can even be the opposite, but as always, it depends on the song.\n\n0d) MDX-Net HQ\\_4/3/2 (UVR/MVSEP/x-minus/[Colab](https://colab.research.google.com/drive/1GwMEjhczFzdS0Ld7eZzMcZgEmz6Jgv6m)/[alt](https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb)) - small amounts of vocal residues at times, while not muffling the sound as much as the old BS-Roformer v2 (2024.02) on MVSEP, although it can still be muddy at times (esp. vs MDX23C HQ models). HQ\\_4 tends to be the least muddy of all HQ\\_X models (although not always), is faster than HQ\\_3 and below, and tends to have fewer vocal residues than MDX23C.\n\nThe final MDX-Net HQ\\_5 seems to be muddier for instrumentals, although slightly less noisy, and better for vocals than HQ\\_4.\n\n0d) MDX HQ\\_5 final model in UVR (available in its Download Center and [Colab](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb))\nVersus HQ\\_4: fewer vocal residues, but also muddier at times, with a slightly lower 21.5kHz cutoff.\nSometimes even muddier than narrowband inst 3, to the point that it can occasionally spoil some hi-hats.\nVersus unwa’s v1e: “HQ5 has less bleed but is prone to dips in certain situations. (...)
Unwa has more stability, but the faint bleed is more audible. So I'd say it's situational. Use both. (...) Splice the two into one track depending on which part works better in whichever part of the song is what I'd do.” - CC Karaoke\n\n[Model](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-MDX-NET-Inst_HQ_5.onnx) | config: \"compensate\": 1.010, \"mdx\\_dim\\_f\\_set\": 2560, \"mdx\\_dim\\_t\\_set\": 8, \"mdx\\_n\\_fft\\_scale\\_set\": 5120\n\n0d) MDX HQ5 beta model on uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down)\n\nGo to \"music and vocals\" and there you will see it\n\nIt's not a final model yet; it has been in training since April.\n\nIt seems to be muddier than HQ\\_4 (and than Kim’s and MVSEP’s Mel-Roformers); it has less vocal bleeding than before, but more than Kim Mel-Roformer.\n\n\"Almost perfectly placed all the guitar in the vocal stem\" - this might get fixed in the final version of the model.\n\n0e) Other single MDX23C full band models on [mvsep.com](https://mvsep.com/) (queues for free unregistered users can be long)\n\n(SDR is better when three or more of these models are ensembled on MVSEP; alternatively, use UVR 5 GUI’s “manual ensemble” of single models (worse SDR) or, at best, weight them manually, e.g. in a DAW. The MVSEP “ensemble” option is a specific method, and not all fullband MDX23C models on MVSEP, including the 04.24 BS-Roformer model, are available in UVR)\n\n- BS-Roformer model ver. 2024.04.04 on MVSEP (further trained from viperx’ checkpoint on a different dataset). SDR vocals: 11.24, instrumental: 17.55 (vs 17.17 in the base viperx model). Bad on sax. Less muddy than the three below.\n\nStill, all might share the same advantages and problems (filtered results, muddiness, but the fewest residues)\n\n- Mel-Roformer model ver. 2024.08.15 on MVSEP (probably fine-tuned from
Kim’s model)\n\n- BS-Roformer 12xx models by viperx in UVR [beta](#_6y2plb943p9v)/MVSEP and x-minus (struggles with saxophone too, but less so (also vs Gabox inst v6); also struggles with some Arabic guitars; bad on vocoders)\n\n“does NOT pick up on large screams that much (example being Shed by Meshuggah in my tests), well at least [vs] [kim’s] x-minus mel-rofo”\n\nThe 1297 variant is used on x-minus. It tends to be better for instrumentals than the 1296 model.\n\n- Older BS-Roformer v2 model on MVSEP (2024.02) (a bit lower SDR)\n\nAll vocal Roformer models may sound clean, but filtered at the same time - a bit artificial [it tends to be characteristic of the arch], but great for instrumentals with heavily compressed vocals and no bass and drums - the least amount of residues and noise - very aggressive.\n\n- old MelBand Roformer model on MVSEP (don’t confuse it with Kim’s one on x-minus - they’re different)\n\n- [GSEP](#_yy2jex1n5sq) (now paid) -\n\nInst fullness: 28.83, bleedless: 31.18, SDR: 12.59\n\nAlso check out the 4-6 stem separation option and mix down the instrumental manually, as it can contain less noise/fewer residues than the 2-stem option in light mixes without bass and drums (although more than the first vocal fine-tunes, like MVSEP’s BS-Roformer v2 back then). The regular 2-stem option can be good for e.g. hip-hop, while 4+ stems can be a bit too filtered for instrumentals with a busy mix. GSEP tends to preserve flute or similar instruments better than some Roformers and the HQ\\_X models above (for these use cases, also check out the kim inst and inst 3 models in UVR) and is not so aggressive in taking out vocal chops and loops from hip-hop beats. It can sometimes be good or even the best for instrumentals of more lo-fi, pre-2000s hip-hop, e.g. where vocals are not so bright but still compressed/heavily processed/loud, or when the instrumental sounds specific to that era.
For newer stuff from ~2014 onward, it produces vocal bleeding in instrumentals much sooner than the above models. \"gsep loves to show off with loud synths and orchestra elements, every other mdx v2/demucs model fail with those types of things\".\n\n*Older ensembles (among others from the* [*leaderboard*](https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum)*)*\n\nQ: How do I ensemble BS-Roformer 1296 with Kim Mel-Roformer using the UVR GUI?\n\nI choose max/max vocal/instrumental, but the list only has 1296, and no Kim Mel-Roformer like in the MDX-Net option [might have been fixed already]\n\nA: “You have to set the stem pair to multi-stem ensemble, it can generate both vocal and instrumental from both models at the same time. Be sure to set the algorithm to max/max. Once that's done, find the ensemble folder and put the two instrumental files/two vocal files onto the input, provided that you have to go to audio tools first. Then set the algorithm to average and click on the start processing button” - imogen\n\n0f) [#4626](https://mvsep.com/quality_checker/entry/4626):\n\nMDX23C\\_D1581 + Voc FT\n\n0g) [#4595](https://mvsep.com/quality_checker/entry/4595):\n\nMDX23C\\_D1581 + HQ\\_3 (or HQ\\_4 now)\n\n0h) Kim Vocal 2 + Kim Inst (a.k.a. Kim FT/other) + Inst Main + 406 + 427 + htdemucs\\_ft (avg/avg)\n\n0i) Voc FT, inst HQ3, and Kim Inst\n\n0j) Kim Inst + Kim Vocal 1 + Kim Vocal 2 + HQ 3 + voc\\_ft + htdemucs ft (avg/avg).\n\n0k) MDX23C InstVoc HQ + MDX23C InstVoc HQ 2 + MDX23C InstVoc D1581 + UVR-MDX-NET-Inst HQ 3 (or HQ 4)\n\n“A lot of that guitar/bass/drum/etc reverb ends up being preserved with Max Spec [in this ensemble].
The drawback is possible vocal bleed.” ~Anjok\n\n0l) MDX23C InstVoc HQ + MDX23C InstVoc HQ 2 + UVR-MDX-Net Inst Main (496) + UVR-MDX-Net HQ 1\n\n\"This ensemble with Avg/Avg seems good to keep the instruments which are counted as vocals by other MDXv2/Demucs/VR models in the instrumental (like saxophone, harmonica) [but not flute in every case]\" ~dca100fb8\n\n0m) MDX23C InstVoc HQ + HQ4\n\n0n) [Ripple](https://apps.apple.com/us/app/ripple-music-creation-tool/id6447522624) (no longer works) / Capcut.cn (uses SAMI-ByteDance a.k.a. BS-Roformer arch) - Ripple is for iOS 14.1 with the US region set only - despite its high SDR, it's better for vocals than for instrumentals, which are not so good due to noise in the other stem (this can be alleviated by lowering the input volume by 3dB).\n\n0n) Capcut (for Windows) allows separation only in the Chinese version above (and returns stems in worse quality). See [more](#_f0orpif22rll) for a workaround. Sadly, it already normalizes the input, so the -3dB trick won’t work in Capcut. Also, it has worse quality than Ripple\n\nThe best single MDX-UVR non-Roformer models for instrumentals explained in more detail\n\n([UVR 5 GUI](https://github.com/Anjok07/ultimatevocalremovergui)/Colabs/MVSEP/x-minus):\n\n0. full band MDX-Net **HQ\\_4** - faster, and an improvement over HQ\\_3 (it was trained up to epoch 1149). In rare cases there’s more vocal bleeding than in HQ\\_3 (sometimes “at points where only the vocal part starts without music then you can hear vocal residue, when the music starts then the voice disappears altogether”). Also, it can leave some vocal residues in fadeouts. It more often has instrumental bleeding in vocals, but the model is made mainly for instrumentals (like HQ\\_3 in general)\n\n0b) full band MDX-Net HQ\\_5 - similarly fast, might be less noisy but muddier, and better for vocals; “it seems it's the best workaround when there is vocal bleed caused by Roformers”\n\n1.
full band MDX-Net **HQ\\_3** - like above, might sometimes simply be the best; pretty aggressive for an instrumental model, but still leaving small amounts of vocal residues at times - though not like BS-Roformer v2/viperx, so results are not as filtered as in those.\n\nHQ\\_3 filters flute out into vocals. It can still be useful today for specific use cases: “the only model that kept some gated FX vocals I wanted to keep”.\n\nWhat’s best depends on the song - e.g. the one below might give better clarity:\n\n2. full band **MDX23C-InstVoc HQ** (since UVR 5.60; 22kHz/fullband as well) - tends to have more vocal residues in instrumentals, but can give the best results for a lot of songs.\n\nAlso added to the MDX23 [2.2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) Colab, possibly usable solo when the weights include only that model, but UVR's implementation might be more correct for that single model. Also available in [KaraFan](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb), where it can be used as a solo model.\n\n2b. MDX23C-InstVoc HQ 2 - worse SDR, sometimes fewer vocal residues\n\nOlder MDX models\n\n2c. narrowband **MDX23C\\_D1581** (model\\_2\\_stem\\_061321, 14.7kHz) - better SDR than HQ\\_3 and voc\\_ft (single model file [download](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/MDX23C_D1581.ckpt) [just for archiving purposes])\n\n\"really good, but (...) it filters some string and electric guitar sounds into the vocals output\"; it also has more vocal residues than HQ\\_3.\n\n\\*. narrowband **Kim inst** (a.k.a. “ft other”, 17.7kHz) - fewer vocal residues than both above in some cases, and sometimes even fewer than HQ\\_3\n\n\\*.
narrowband **inst 3** - similar results, a bit muddier, but also a bit more balanced in some cases\n\n- Gabox “[small](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/small_inst.ckpt)” inst Mel Roformer model for faster inference than most Roformers | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-small/resolve/main/config_melbandroformer_small.yaml)\nBe aware that it can have some faint but audible constant residues.\n\n\\*. narrowband inst 1 (418) - might preserve hi-hats a bit better than inst 3.\n\n3. narrowband voc\\_ft - can sometimes give better results with more clarity than even HQ\\_3 and kim inst for instrumentals, but it can produce more vocal residues, as it’s a vocal model and that’s how these models behave in the MDX-Net v2 arch (you can use it e.g. as the Matchering reference for a cleaner but muddier model's result)\n\n\\*. less often - inst main (496) [less aggressive than inst 3, but gives more vocal residues]\n\n\\*.
or eventually also try out HQ\\_1 (epoch 450)/HQ\\_2 (epoch 498), or the earlier 403 and 338 epochs, or even 292 (also used from time to time) when the VIP code is used.\n\n[*Recommended MDX and Demucs parameters*](#_6q2m0obwin9u) *in UVR*\n\n- Ensemble only the models whose single-model results have no bleeding for the specific song\n\n- [DAW ensemble](#_oxd1weuo5i4j) of various separation models - import the results of the best models into a DAW session and set custom weights by changing their volume proportions\n\n- Captain Curvy method:\n\n\"I just usually get the instrumentals [with MDX23C] to phase invert with the original song, and later [I] clean up [the result] with voc ft\"\n\n[*How to check whether a model in UVR5 GUI is vocal or instrumental?*](#_p1fyricuv1j8)\n\n(although in MDX23C there is no clear boundary in that regard)\n\n###### **>** ***for vocals***\n\n###### *(*[*ensembles*](#_i7k483hodhhu)*, or click* [*here*](#_vg1wnx1dc4g0) *for Karaoke,* [*here*](#_2vdz5zlpb27h) *for instrumentals)*\n\nMVSEP models without download links can be used only on MVSEP\n\n*(removing/isolating vocals from AI music can give muddy results and easily capture other unrelated instruments; also, Roformers sometimes stress plosives that weren’t in the original vocals - cristouk)*\n\n*There’s no single best model. It depends on the song.\nThe most commonly used/best models as of the doc’s date (categorized, with links below):*\n\nBS-Roformer 2025.07, Becruily deux, voc\\_fv7, vocfv7beta2, HyperACE voc v2, BS\\_RoFormer\\_mag.\n\nvoc\\_fv4, Big Beta 5e, Resurrection voc, Big Beta 7, Big Beta 6X, Revive 3e, Anvuew BS-Roformer vocals.\nOlder: Mel FT2 Bleedless, voc\\_fv6, voc\\_fv5, Becruily voc, Revive 2, FT3 Preview.\n\n*Bleedless models #1 (MVSEP exclusive; models for download later below)*\n\n- BS-Roformer 2025.07 only on MVSEP - free with longer queue\n\nVocals bleedless: 38.25, fullness: 17.23, SDR: 11.89\nThe biggest bleedless metric for a single model so far.
Compared to previous models, it picks up backing vocals and vocal chops well where 6X struggles, and fixes crossbleeding and reverb issues in some songs where previous models struggled.\nSometimes you might still get better results with Beta 6X or voc\\_fv4 (depending on the song). “Very similar to SCNet, very high fullness without the crazy noise” - dynamic64, “handles speech very well. Most models get confused by stuff like birds chirping (they put it in the vocal stem), but this model keeps them out of the vocal stem way more than most. I love it!”\n\nWorks best for orchestral choirs out of a long [list](https://discord.com/channels/708579735583588363/1421129357677559970/1421447212881154068) of other models (.elgiano).\n\nIt can often be better for metal, for both vocals and instrumentals, than mesk’s models (and is sometimes the best).\n\nThe first iteration of the model (2025.06: 37.83/17.30/11.82) received two small updates and was replaced by 2025.07.\n\nIt’s also good as a source model for phase fixer/swapper.\n\n- Mel-Roformer Bas Curtiz edition (w/ Marekkon5) (also trained on the ZFTurbo dataset) on MVSEP (older version of the 2024.10 model)\n\nVocals bleedless: 39.20, fullness: 16.24, SDR 11.18.\n\n*Bleedless models #2 (MVSEP exclusive)*\n\n- Mel-Roformer 2024.10 (Bas Curtiz model fine-tuned by ZFTurbo) on MVSEP\nVocals bleedless: 37.80, fullness: 17.07, SDR 11.28\nSmall amounts of bleeding from instrumentals (inst. bleedless 39.20); might struggle with flute occasionally; good enough for [creating RVC datasets](https://rentry.co/RVC-dataset-RX11).\n\n- BS-Roformer 2024.08 (viperx model fine-tuned v2 by ZFTurbo) on MVSEP\nVocals *bleedless*: 37.61, fullness: 15.89, SDR: 11.32\nGood for inverts, Dolby, lots of harmonies, BGVs.
Good or even the best vocal fullness for some genres (~Isling); a decent all-rounder, but might be muddier than the Mel models here, although it gives fewer vocal residues than all the Mel Kim fine-tunes here; it can also be used for RVC. “I've found it very useful for extremely quiet vocals that Mel couldn't extract” - Dry Paint Dealer. It’s MVSEP’s second fine-tune of the viperx model.\nIirc, it’s used as a preprocessor model for the \"Extract from vocals part\" feature on MVSEP.\n\n- MVSep Ensemble 11.93 (vocals, instrum) (2025.06.28) - only for premium users\n\nVocals bleedless: 36.30, fullness: 17.73, SDR: 11.93\n\nSurpassed sami-bytedance-v.1.1 on the multisong dataset SDR-wise.\n\n*Community bleedless models*\n\n- BS-Roformer SW 6 stem [(](https://github.com/undef13/splifft/releases)MVSEP, Colab[)](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Colab_Inference_BSRofo_SW_fp16.ipynb) [/](https://mega.nz/folder/R3BzBSgD#sz2XIOk3y0-LS4hIQOQKYQ) Vocals only\n\nVocals bleedless: 36.06, fullness: 16.95, SDR 11.36\n\nGood for some deep voices.\n\n- Unwa Kim Mel-Band Roformer Bleedless FT2 | [download](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | [Kaggle](https://www.kaggle.com/code/zzryndm/music-source-separation-training-inference-webui) | [UVR instruction](#_6y2plb943p9v)\n\nVocals bleedless 39.30 (better than Mel-Kim), fullness 15.77, SDR 11.05\n(voc. fullness is worse than Mel Kim - 16.26,\ninst.
bleedless is still lower than the base Mel-Kim model: 46.30 vs 46.56)\n\n“I usually use big beta 6x, big beta 5e if that fails and FT2 bleedless if I want very low noise or instruments are quiet (it gets muddy quick)” - Rainboom Dash\n\n- Anvuew BS-Roformer 12.45 vocal model | [download](https://huggingface.co/anvuew/BS-RoFormer)\n\nDoesn't work with UVR’s RTX 5000 patch - use [MSST](#_2y2nycmmf53) instead.\n\nCan be muddy. Not as balanced as Beta6X or vocfv7beta1, but “it properly doesn't capture the instrument [here](https://drive.google.com/file/d/1KdZEEMTezU4iQhv9-_zLpwPL9A84G8_m/view?usp=sharing). Even FT2 bleedless gets tricked by this part, but this does just fine.” - rainboomdash\n\n- Unwa’s BS-Roformer [Resurrection](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection.ckpt) (voc. variant) | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection-Config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVocals bleedless: 39.99, fullness: 15.14, SDR: 11.34\n\nShares some similarities with the SW model, including its small size (might be a retrain). The default chunk\\_size is pretty big, so if you run out of memory, decrease it to e.g. 523776.\n\nMake a backup of your UVR folder before installing it.
We had a report that this model might break your UVR installation, probably permanently, for some reason.\n\n- Unwa’s [Revive 2](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/bs_roformer_revive2.ckpt) BS-Roformer fine-tune of the viperx 1297 model | [config](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nVocals **bleedless**: 40.07, fullness: 15.13, SDR: 10.97\n\n“has a bleedless score that surpasses the FT2 Bleedless”\n\n“can keep the string well”\n\nIt has depth 12 and dim 512, so inference is much slower than with some newer Mel-Roformers.\n\n*Worse bleedless than above, higher fullness*\n\n- Gabox [voc\\_fv7](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv7.ckpt) Mel-Roformer | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/v7.yaml) | makidanyee’s [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nvoc. bleedless: 33.85, fullness: 17.99, SDR: 11.16[’](https://mvsep.com/quality_checker/entry/9588)\n\nIncompatible with the UVR RTX 5000 patch.\n\nResults sometimes sound less noisy than BigBeta5e (but it depends on the song) and voc HyperACE (which might sound less muddy, but noisier), and it catches harmonies much better than BigBeta6X:\n\nvoc\\_fv7 is \"a little noisy, quiet, but at least it's capturing [the harmonies] (...)
As with all models, it varies drastically song to song on how full it is compared to other models\n\nfv7 was also cutting the reverb really aggressively on one song compared to big beta 6x\" - rainboomdash\n\n- Unwa BS-Roformer HyperAce v2 vocal [model](https://huggingface.co/pcunwa/BS-Roformer-HyperACE/tree/main/v2_voc) | separate [Colab](https://colab.research.google.com/drive/1bd8qmLaE6WSix7M-TNs9Oj948SpHqjJc?usp=sharing) #2 | MVSEP\n\nvoc. bleedless 34.08, fullness 19.10, SDR 11.40 (there was no v1 of the voc variant)\n\n“Compared to the Resurrection vocal model, this new model achieves higher scores across the board except for the bleedless score.”\n\n“I feel like most songs hyperace gets a good bleedless result but ill probably stick with voc\\_fv7 beta3 or revive e3” - 5b\n\n“it seems to be like... trying really hard to pull certain stuff out that is causing noise, I think\n\nit doesn't sound that bleedless to me, due to that reason (..) there's some noise during some quieter parts but I'll def be adding this to the list of models I use. I was testing against fv7beta2, during louder parts there seemed to be less noise, but during quieter parts there seemed to be more noise (...)\n\nI'll prob use fv7beta2 for the most part still, but I'll try adding a hyperace vocal model to the mix of models I use. lol, soon I'll be using 10 models throughout one song” - Rainboomdash\n\nNote: it uses its own inference script (bs\\_roformer.py), which differs from the previous one, and it’s incompatible with UVR. 
“You can use this model by replacing the MSST repository's models/bs\\_roformer.py with the repository's bs\\_roformer.py.”\n\nTo avoid affecting other BS-Roformer models with that file (so older BS-Roformers still work), you can instead add it as a new model\\_type by editing utils/settings.py and models/bs\\_roformer/\\_\\_init\\_\\_.py as shown [here](https://imgur.com/a/dkGXo2r) (thx anvuew).\n\nIf you get this error while installing the HyperACE model's .py file in Sucial’s WebUI:\n\nfrom models.bs\\_roformer.attend import Attend\n\nModuleNotFoundError: No module named 'models'\n\nThe fix: “SUC-DriverOld/MSST-WebUI use the name \"modules\" and ZFTurbo/Music-Source-Separation-Training use the name \"models\". And Unwa's bs\\_roformer.py that you replace with, also use \"models\"” - fjordfish\nIn order to make it work in Sucial MSST Web UI you have to edit line (...) in the bsroformer.py file that is included with (...) model and change the word \"models\" to \"modules\". - rage313\\_\n\n- unwa Big Beta 7 Mel-Roformer vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big) | [Mkd Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\nBleedless 38.77, fullness 16.20, SDR 11.20[.](https://mvsep.com/quality_checker/entry/9633)\n\n“SDR is lower than the BS model, but personally I prefer this one. (...) 
Although not reflected in the metrics, noise has been reduced in sections without vocals.” - unwa\n\nMore bleedless model than 6X:\n\n- Unwa Mel-Roformer Big Beta 6X vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6x.ckpt) | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6x.yaml) | [Colab](https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | AI Hub [Colab](https://colab.research.google.com/github/Eddycrack864/UVR5-UI/blob/main/UVR_UI.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) | uvronline\nvoc. bleedless: 35.16, *fullness*: 17.77, SDR: 11.12\n\n“it is probably the highest SDR or log wmse score in my model to date.”\n\n“There's some noise audible; it doesn't sound as clean when you compare to a more bleedless model (...) but it's certainly not fullness... (...) I think calling it bleedless wouldn't be crazy... makes more sense than \"middle of the road\"” - rainboomdash\n“Significantly better” than 5e for some people, although slower. Some instrument leakage into vocals might occur, plus “The biggest problem with the model is the remaining background noise. If it were cleaner, it would already be an almost perfect result.” - musictrack\n\n“6X has a lot less noise on vocals, but it's pretty muddy. I would prefer something in between [5e and 6X]. I tried to apply the phase [fixer/swapper] to the vocals and the noise was reduced, but only slightly.” - Aufr33\n\nSome people might prefer fv5 instead [at least on some songs] ~5b\n\n“6x is picking up BV just fine, where voc fv4 is failing” - Rainboom Dash\n\nTraining details:\n\n“dim 512, depth 12. It is the largest Mel-Band Roformer model I have ever uploaded.” - “the same as Bas Curtiz Edition” model. It has a higher SDR than the smaller, depth-6 Big Beta 6 model. 
“I've added dozens of samples and songs that use a lot of them to the dataset”\n\n*Fullness models*\n\n- Becruily dual Mel-Roformer model “[deux](https://huggingface.co/becruily/mel-band-roformer-deux/tree/main)” for vocal:\nvoc. bleedless: 28.30, fullness: 23.25, SDR: 11.37[’](https://mvsep.com/quality_checker/entry/9482)\n\nCompatible with UVR Roformer [patch](#_6y2plb943p9v) (including the RTX 5000 one) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | uvronline | MVSEP\n“Unfortunately it won't null to the mixture perfectly, this applies to any multi stem model” - becruily\n“vocal model picks up some things super well, like certain backing vocals or there was yelling in the background that it picked up that fv7beta didn't (...) generally slightly less full than fv7beta3 from the songs I tested, but has less instrumental bleed and slightly less noise.” - rainboomdash\n\n“deux vocal stem has more backing vocals than gabox fv7 beta 1-3 and other models I've ever tried, but it may be rather noisy on silent parts or fadeouts” - makidanyee\n“sounds great! 
I don't hear the dips in the vocals from some songs that are overly compressed” - Rage313\n\nIt doesn't work with speech and dialogue from movies - dca\n\n“I exported it using FP16 just for the smaller size [only 432 MB], the quality is the same.” - becruily\n\n- anvuew [BS\\_RoFormer\\_mag](https://huggingface.co/anvuew/BS_RoFormer_mag) | [mk Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link)\n\n“specialized for magnitude spectrum accuracy”.\nBleedless 32.17, fullness 22.15, SDR 11.09['](https://mvsep.com/quality_checker/entry/9684)\n\n“managed to pull out harmonies I knew were missing from my test track as well as not extracting some vocal sampled perc that had been in other models previously” - cristouk\n\n- Unwa [bs\\_roformer\\_revive3e](https://huggingface.co/pcunwa/BS-Roformer-Revive/blob/main/bs_roformer_revive3e.ckpt) | [config](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\nvoc. bleedless: 30.51, fullness: 21.43, SDR: 10.98[‘](https://mvsep.com/quality_checker/entry/8337)\n\n“A vocal model specialized in fullness.\n\nRevive 3e is the opposite of version 2 — it pushes fullness to the extreme.\n\nAlso, the training dataset was provided by Aufr33. Many thanks for that.” - Unwa\n\n“seems to sound better than beta5e, it sounds fuller, but this also means it sounds noisier” - gilliaan. 
For some people, it’s even the best.\n\n*Even more fullness, less bleedless*\n\n- Gabox experimental Mel-Roformer voc\\_fv6 [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv6.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. bleedless: 26.61, ***fullness***: 24.93, SDR: 10.64\n“Definitely not bleedless” - rainboomdash, “Sounds like b5e with vocal enhancer. Needs more training, some instruments are confused as vocals” - Gabox. “fv6 = fv4 but with better background vocal capture” - neoculture\n\n“very indecisive about whether to put vocal chops in the vocal stem or instrumental stem.\n\nsometimes it plays in vocals and fades out into instrumental stem and sometimes it just splits it in half kinda and plays in both at the same time lol” - Isling\n\n“I think is the fullest vocal model I've heard, aside from maybe the scnet high fullness ones lol/ Oh and revive 3e and b5e are full too but yeah.” - Musicalman\n\n- SCNet XL very high fullness on MVSEP\n\nvoc. bleedless: 25.30, fullness: 23.50, SDR: 10.40\n\n- SCNet XL IHF (high instrumental fullness, by becruily)\n\nvoc. bleedless: 25.48, fullness: 22.70, SDR: 10.87\n\n(It was made mainly for instrumentals, but “It can also be an insane vocal model too”)\n\n*\\_\\_\\_\\_*\n\n- MVSEP SCNet XL IHF\n\nvoc. bleedless 28.31, fullness 17.98, SDR: 11.11\n\n“It has a better SDR than previous versions. Very close to Roformers now.” Also, its vocal bleedless is the best among all SCNet variants on MVSEP. Metrics. IHF - “Improved high frequencies”.\n\n“Certainly sounds better than classic SCNet XL (...) 
less crossbleeding of vocals in instrumental (...), and handles complex vocals better” - dca\n\n*Middle of the road #1 (lower fullness)*\n\n“Beta 1 and 3 are higher fullness, beta 2 is more middle-ground (still a fullness model)\n\nI \\*think\\* 3 might be more consistent with not having some songs be overly noisy” - rainboomdash\n\n- Gabox vocfv7 beta 2 Mel-Roformer [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta2.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. bleedless: 31.55, fullness: 20.44, SDR: 10.87\n\n“fullness went down a little bit” vs beta 1 (...) Definitely an improvement over fv4 (...) still quite a bit fuller than big beta 6x, but has less noise than even fv4 (...) at least when the instruments are loud, fv7beta2 is usually quite a bit less noisy than fv4, while still maintaining a decent amount of fullness... it is a bit less, but not too much (...) both are pretty noisy with fv4 (...) sometimes the noise can be pretty significant with fv7beta1, and fv7beta2 may have the fullness you desire. (...) “I'm really liking the balance of fullness and noise for most songs. fv4 and fv6/fv7beta1 are usually pretty noisy... this is less noisy, but still has a good amount of fullness.” still gonna have an issue with backing vocals compared to fv7beta1 sometimes… (...) 
“Fv7beta2 has still been significantly better with BV than fv4, despite quite a bit less noise” but “significant issues on one song, while fv6/fv7beta1 didn't” - rainboomdash\n\n- Gabox [vocfv7beta3](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta3.ckpt) Mel-Roformer | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. bleedless 30.83, fullness 21.82, SDR 10.80\n\n“beta 1 and 2... eh, pretty close to same instrumental bleed,\n\nbut beta 3 def a step up from the two songs I compared (...)\n\nmost songs so far, fv7beta3 is fuller than fv7beta1,\n\ndef less robotic sounding at times (when a voice gets quiet/hard to capture, and it just fails).\n\nJust had another song where fv7beta1 was fuller than fv7beta3, but it was also a lot noisier\n\nlarge majority of the songs I tested, fv7beta3 was fuller... I think fv7beta3 is usually a bit noisier than fv7beta1? But also sounds fuller in those cases, I'd say it's generally worth it\n\ninstrumental bleed, usually worse with fv7beta3 versus fv7beta1, but it depends\n\nfv7beta2 is always less full/less noise, but only slightly less instrumental bleed than fv7beta1” - rainboomdash\n\n- Gabox Mel-Roformer voc\\_fv7 beta 1 (a.k.a. vocfv7beta1) [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/vocfv7beta1.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. 
bleedless: 30.81, fullness: 21.21, SDR: 10.96\n\n“one step below the extreme fullness models (...) fv6 on average is more full” - rainboomdash. \"Just a better fv4 it seems, better bleedless\" (fullness: 21.33, bleedless: 29.07, SDR 10.58)\n\nvs voc\\_fv4 \"It is noisier... Kinda closer to beta 5e?” “It's slightly less noise and fullness than beta 5e but picking up the backing vocals REALLY well, significantly better than beta 5e”\n\nBut it's pulling the backing vocals out even better than 5e” “the backing vocals are so good!\n\n“it does have significant synth bleed, too... it at least wasn't coming through at full volume\n\nwhen I say fullness, I specifically mean how muddy it sounds” - rainboomdash\n\n\\_\n\n- Gabox Mel-Roformer [voc\\_fv4](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_fv4.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. bleedless 29.07, fullness 21.33, SDR 10.58\n\n“Very clean, non-muddy vocals. Loving this model so far” (mrmason347)\n\nGood for anime and **RVC** purposes, currently the best public model for it (codename)\n\n“The important thing for an RVC dataset is to get lead vocals so fv4 is good for that\n\nThe newer karaoke models are also helpful” - Ryan\n\nSome might prefer voc\\_gabox2 instead, occasionally - chroniclaugh.\n\nThe opposite of Beta6x which has “lower noise but [is] less full/muddier (...) 
noise/muddiness seems between 6x and 5e, but even 6x is picking up BV just fine, where voc fv4 is failing”\n\nSome people might want to test it with overlap set as high as 32, and then:\n\n“It's close to perfect, the only thing is it kinda struggled with picking up the adlibs and the delay, but the lead vocal is almost perfect I think. (...) on another song (...) 5e is just too noisy and 6x is muddy, fv4 is best of both worlds (...) has segments with constant significant vocal bleed (for the most part, it's not audible at all) (...) I was trying to get an acapella and every model failed except this one. It's not perfect, but I guess some songs are just too hard for the AI.” - Rainboom Dash\n\nAlso good for instrumentals if you need fewer vocal residues than typical instrumental Roformers leave (even fewer than Mel Kim, FT2 Bleedless, or Beta 6X) - makidanyee.\n\n“even beta 6x is a lot better at pulling that background vocal out than voc fv4...\n\nand that's a less full model. hmm, fv6 is noisier and also not picking up the backing vocals as full as the last mel band roformer” - rainboomdash\n\n- Unwa Mel big beta 5e vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | MVSEP | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR](#_6y2plb943p9v)\nyaml: big\\_beta5e.yaml or [fixed](https://drive.google.com/file/d/1YRv1j0zMs9hk3-On2z6uwfZbsQ7l1LFP/view?usp=sharing) yaml for AttributeError in UVR\nvoc. 
bleedless: 32.07, fullness: 20.77 (the biggest for now), vocals SDR: 10.66\n\n“feel so full AF, but it has noticeable noise similar to lew's vocal enhancer”\nYou can alleviate some of this noise and residue by using the phase fixer/swapper with the becruily vocals model as the reference (imogen).\nIt seems to fix some issues with trumpets in the vocal stem - maxi74x1.\n“It's noisy and, IDK, grainy? When the accompaniment gets too loud. (...) Definitely not muddy though, which is a welcome change IMHO. I think I prefer beta 4 overall” - Musicalman; “ending of the words also have a robotic noise” - John UVR\n“Perhaps a phase problem is occurring” - unwa. Phase swapper doesn’t fix the issue (though it works for unwa’s inst models).\nIf you try big beta 5e on a song that has lots of vocal chops, the vocal chops will be phasing in and out and sound muddy (Isling).\n\n“Excellent for ASMR, for separating Whispers and noise, the quality is super good\n\nThat's good when your mic/pc makes a lot of noise. All the denoise models are a bit too harsh for ASMR (giliaan)”\n\nWorse for RVC than the Beta 4 model below (codename, NotEddy)\n\nIt can handle reverb tails better than 6X.\n\n- Mel-Roformer vocal by becruily [model](https://huggingface.co/becruily/mel-band-roformer-vocals/tree/main) | [config](https://drive.google.com/file/d/1V__MDSd9h47tgk3CCZUhbfcZz9A_oJee/view?usp=sharing) for ensemble in UVR | MVSEP | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\nvoc. 
bleedless: 31.26, fullness: 20.72 (on par with 5e), SDR: 10.55 | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nLower bleedless than 5e; “pulling almost studio quality metal screams effortlessly, wOw ive NEVER heard that scream so cleanly”\n\n(On older UVR beta patches) If you use a lower dim\\_t, like 256, at the bottom of the config for a slower GPU, these are the first models to give muddy results.\nConsider setting chunk\\_size to 485100 in the yaml for the highest SDR.\nCurrently used on x-minus/uvronline as the reference model for the phase fixer.\n\n- Gabox Mel-Roformer [voc\\_fv5](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_fv5.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. bleedless: 29.50, fullness: 20.67, SDR: 10.56\n\n“fv5 sounds a bit fuller than fv4, but the vocal chops end up in the vocal stem. In my opinion, fv4 is better for removing vocal chops from the vocal stem” - neoculture. [Examples](https://discord.com/channels/708579735583588363/708580573697933382/1369232029291511881)\n\n*Other/older models*\n\n- Gabox Mel-Roformer [voc\\_gabox2](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/vocals/voc_gabox2.ckpt) model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/vocals/voc_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nvoc. 
bleedless: 33.13, fullness: 18.98, SDR: 10.98\n\n- Gabox Mel-Roformer Vocal F (fullness) v3 [model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/vocals) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. bleedless: 32.15, fullness: 19.97\n\n- Gabox Mel-Roformer Vocal F (fullness) v2 [model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/vocals) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. bleedless: 33.40, fullness: 19.31\n\n- Aname Mel [FullnessVocalModel](https://huggingface.co/Aname-Tommy/MelBandRoformers/blob/main/FullnessVocalModel.ckpt) ([yaml](https://huggingface.co/Aname-Tommy/MelBandRoformers/blob/main/config.yaml)) model | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. bleedless: 32.98 (less than beta 4), fullness: 18.83 (less than big beta 5e/voc\\_fv4/becruily, more than beta 4)\n\n- Gabox Mel-Roformer voc\\_gabox (Kim/Unwa/Becruily FT) [model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers) | [Colab](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. 
bleedless: 34.66 (better than 5e, beta 4 and becruily voc), fullness 18.10 (on par with beta 4, worse than 5e and becruily)\n\n- Mel-Roformer unwa’s beta 4 (Kim’s model fine-tuned) [download](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main) | [Colab](https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)\n\nvoc. bleedless: 33.76, fullness: 18.09\n“Clarity and fullness” - even compared to newer models above.\n\nBeta 1/2 were muddier than Kim’s Roformer, with potentially slightly fewer residues but a somewhat more artificial sound. Ringing issues in higher frequencies were fixed in beta 3 and later. It’s good for RVC (and was codename’s favourite public model for RVC before voc\\_fv4 was released). Fuller vocals than Bas Curtiz FT on MVSEP (but can bleed more synths) ~becruily\nUnwa’s vocal models are capable of handling sidechain in songs - John UVR\n\n*Bleedless models #3*\n\n- BS-Roformer Revive unwa’s vocal [model](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/bs_roformer_revive.ckpt) experimental | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Revive/resolve/main/config.yaml)\n\nvoc. bleedless: 38.80, fullness: 15.48, SDR: 11.03\n\nA fine-tune of the viperx 1297 model. “Less instrument bleed in vocal track compared to BS 1296/1297” but it still has many [issues](https://discord.com/channels/708579735583588363/1226334240250269797/1371215438352224307), “has fewer problems with instruments bleeding it seems compared to Mel. (...) 
1297 had very few instrument bleeding in vocal, and that Revive model is even better at this.\n\nWorks great as a phase fixer reference to remove Mel Roformer inst models noise” (dca)\n\n- SYHFT V5 Beta - only on x-minus/uvronline (still available only with [this](https://uvronline.app/ai?hp&test) link for premium users, and for [free](https://uvronline.app/ai?test))\n\nVocal bleedless: 37.27, fullness: 16.18, SDR: 10.82\n\n*Other models #2*\n\n- Unwa’s Kim Mel-Band Roformer FT2 | [model](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\nvoc. bleedless: 37.06, fullness: 16.61 (fullness is worse than the previous FT's, but both metrics are better than Kim’s)\nIt tends to muddy instrumental outputs at times, similarly to the OG Kim model, which didn’t happen with the previous FT below. [Metrics](https://mvsep.com/quality_checker/entry/7714)\n\n- Unwa Kim Mel-Band Roformer FT3 Preview | [model](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/blob/main/kimmel_unwa_ft3_prev.ckpt) | [yaml](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/resolve/main/config_kimmel_unwa_ft.yaml) | [Colab](https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down)\n\nvoc. 
bleedless: 36.11, fullness: 16.80, SDR: 11.05\n\n“primarily aimed at reducing leakage of wind instruments to vocals.”\n\nFor now, FT2 has less leakage for some songs (maybe until the next FT is released)\n\n- Unwa’s Mel Big Beta 6 vocal [model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/resolve/main/big_beta6.ckpt) | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/blob/main/big_beta6.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI) / [2](https://huggingface.co/spaces/qtzmusic/UVR5_UI) | AI Hub [Colab](https://colab.research.google.com/github/Eddycrack864/UVR5-UI/blob/main/UVR_UI.ipynb)\nSimilar to the FT series. “Although it belongs to the Big series, the characteristics of the model are similar to those of the FT series. (...) this model is based on FT2 bleedless with the dim increased to 512”.\n\nMuddier than Big Beta 5[e], but might be better than FT2 at times.\n“If you liked the output of the Big Beta 5e model, you may not like 6 as much; it does not have the output noise problem of 5e, but instead sacrifices Fullness. (...) Simply put, it is a more conservative model” (unwa)\n\nFor anime and RVC “isn't as audibly and spectrally full as fv4 + can at times have flat-line artifact at the very top, but then, fv4 can sometimes have \"crunchy\" noise present at some places, so an ensemble of those 2 is probs a good idea (or might be fv4 flash more on less aggressive scenes).” - codename\n\n- Unwa’s Kim Mel-Band Roformer FT vocal [model](https://huggingface.co/pcunwa/Kim-Mel-Band-Roformer-FT/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\nEnhanced both voc. 
bleedless 36.95 (vs 36.75) and fullness 16.40 (vs 16.26) [metrics](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?usp=sharing) for vocals vs the original Mel Kim model. [SDR](https://mvsep.com/quality_checker/entry/7585)-wise it’s a tad lower (10.97 vs 11.02).\n\n*Older models list continues later below*\n\n**Tips for separating vocals**\n\n- Separate with the becruily Mel vocal model and its instrumental variant, take the vocals from the vocal model and the instrumental from the instrumental model, import both stems into the DAW of your choice (Audacity works) and export a mixdown of both stems (it should sound like the original file), then separate that mixdown with the vocal model (mrmason347/Havoc)\n\n- “In my testing, I've found that SCNet very high fullness (on MVSEP) put through Mel-Roformer denoise (average) and UVR denoise (minimum) has the best acapella result” - dynamic\n\n*Depending on the model, some Roformers might be muddy*. In that case, consider using ensembles or Lew's Apollo vocal enhancer v2, although it might be noisy. 
It might work best on BS+Mel ensembles (max spec, though avg might work better in some cases). Available versions: v2 ([model](https://huggingface.co/jarredou/lew_apollo_vocal_enhancer/resolve/main/apollo_model_v2.ckpt) | [config](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/download/uni/config_apollo_uni.yaml) | [Colab](https://github.com/jarredou/Apollo-Colab-Inference/) | [Inference](https://github.com/ZFTurbo/Music-Source-Separation-Training/)) ([this](https://colab.research.google.com/drive/1lHnu9-rVvNp5VtU7MFjWx92501Qwfdyf) Colab now probably works instead), v1 ([model](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/download/1.0/apollo_model.ckpt) | [config](http://config_apollo_vocal.yaml)). It can also be used in the latest UVR Roformer [beta](#_6y2plb943p9v).\n\n- Sometimes applying EQ that properly emphasizes the vocals can also help separation\n\n- You can also experiment with the demudder added in beta patch #14 linked above. Normally the demudder works only on instrumentals, but if you swap the stems in the config editor (so the vocal stem is treated as instrumental and vice versa), the demudder will work on vocals. If your model has an “other” stem instead of “instrumental” or “vocal”, you’ll need to rename it. 
The demudder requires a stem labelled instrumental to work.\n\n###### **Ensembles**\n\n*(for vocals)*\n\n- Mel deux + BS 2025.07 (Max FFT) (“the best Ensemble for vocals for now”) (dca100fb8)\n\n- deux + big beta 7 (max spec) (“another good ensemble”) (neoculture)\n\n- BS Roformer HyperACE Voc v2 + 2025.07 (Max FFT) (formerly) (dca)\n\n- BS Revive 3e + BS 2025.07 (Max FFT) (former “best vocal ensemble”) (-||-)\n\n- Mel Becruily Vocal + MVSEP’s BS 2025.07 (Max FFT) (-||-) (-||-)\n\n- unwa’s bigbeta5 + becruily vocal - Max spec (midol)\n\n- voc\\_gaboxFv2 + becruily's vocal (heauxdontlast - gilian)\n\n- Unwa Big Beta 4 + Big Beta 5e - Average Spec (“really good to reduce the noise while keeping the fullness”) (heauxdontlast)\n\n- unwa beta6 + voc\\_fv4 (good for anime and **RVC**)\n\n- unwa beta6x + voc\\_fv4 (“some songs I can use big beta 6x, and it's enough, others I need to ensemble it with voc\\_fv4”) (Rainboom Dash)\n\n- unwa beta6x + voc\\_fv6 (“would make a good ensemble, but the amount of noise is horrific\n\nand I heard that [phase swapper](#_j14b9cv2s5d9) would fix it”)\n\n- BSRoformer-Viperx1297, BSRoformer-LargeV1 by Unwa, unwa\\_ft2\\_bleedless, mel\\_band\\_roformer\\_vocals\\_becruily, Gabox voc\\_fv4 - Average/Average Spec (good for cleaning inverts) (AG89)\n\n- Models ensembled (inst, voc) available for premium users on [mvsep.com](https://mvsep.com/)\n\n(SDR 10.44-11.93 and “High Vocal Fullness” variants)\n\n###### **RVC models choice with** [**AI Hub**](https://docs.aihub.gg/rvc/resources/dataset-isolation/#best-models-for-local-eddy-uvr5-ui) **advice** (subject to change; read their current docs too)\n\nTo run these models locally, download them from the links above; otherwise, see [here](#_wbc0pja7faof) for the list of all cloud sites and Colabs.\n\n“If you need to remove multiple noises, follow this pipeline for the best results:\n\n*Remove instrumental -> Remove reverb [probably on vocals] -> Extract main vocals -> Remove noise*”\n\nAlternatively, 
Isling’s approach “gives insanely clean results”:\n\n*Vocals>De-reverb>Karaoke*\n\n*“That’s how I get completely raw lead vocals if I don’t have the multitracks to a song:*\n\n*Big Beta 6X (Unwa) -> Karaoke (Frazer & Becruily) -> De-Reverb V2 (Anvuew) -> De-Reverb Room (Anvuew) -> Less Aggressive Denoise (Aufr33)” - natethegratevhs*\n\n*Note: The room model outputs mono natively; you might need to process each channel separately if you don't use MVSEP.*\n\n*“I’m using deux to extract vocals. For lead vocals, I ensemble frazer becruily karaoke and anvuew karaoke [max spec], but I target the backing vocals instead of the lead. Once I get the backing vocals from the ensemble result, I open Audacity and invert the ensemble backing vocal track against the deux vocal track (that’s how I get a cleaner lead vocal, at least for my tracks). After I’ve collected the lead vocals, i run anvuew dereverb mono to remove the reverb” - neoculture*\n\n*Recommended all-vocals models for RVC:*\n\n- MelBand Roformer | Vocals FV4 (a.k.a. 
voc\\_fv4) by Gabox, also\n\n- Gabox vocfv7beta1 “seems to give better results than fv4”, also\n\n- Mel 2024.10 is mentioned in their MVSEP section, but BS-Roformer 2025.07 now has better metrics across the board,\n\n- unwa beta6/x + voc\\_fv4 ensemble is also good for RVC,\n\n- unwa beta 4 was better than big beta v5e (NotEddy/codename); also check voc\\_gabox2\n\n- “deux vocal stem has more backing vocals than gabox fv7 beta 1-3 and other models I've ever tried, but it may be rather noisy on silent parts or fadeouts” - makidanyee\n\n*Instrumentals*\n\n- MelBand Roformer | INSTV7 by Gabox\n(unwa instrumental v1e+ OR Mel 2024.10 are also mentioned in their MVSEP section, and Gabox Fv7z is mentioned in their x-minus section)\n\n*De-reverb*\n\n- MelBand Roformer | De-Reverb by anvuew\n(it’s probably the v2 variant [also mentioned there], or also Sucial V2 (MelRoformer) mentioned in their MVSEP section [“if I'm unhappy with the results I go for Sucial” - isling] - it probably follows the model naming scheme of UVR UI on [HF](https://huggingface.co/spaces/TheStinger/UVR5_UI); the new mono-dereverb model is also used occasionally)\n\n*Backing Vocals*\n\n- Mel-Roformer-Karaoke-Aufr33-Viperx (surpassed by Becruily and Frazer Karaoke, but the first can be more consistent; anvuew's Karaoke model has fuller lead vocals; also the older Model fuzed gabox & aufr33/viperx (SDR: 9.85) is mentioned in their MVSEP section)\n\n*De-noise*\n\n- Mel-Roformer-Denoise-Aufr33-Aggr (they also mention “Mel denoiser v2” in their UVR section)\n\n*Restoration*\n\n- For lossy mp3/mixtures: Apollo Universal by Lew (sometimes AudioSR can be better)\n\n- For voice: AP-BWE or ClearerVoice-Studio's Clear Voice (“my favorite is the 2nd one” - codename0)\n\n###### **Fast inference models for general use**\n\n*(Above an hour on an i3-7100U; rather light and small - the lightest Roformers, while most used to be 870 MB)*\n\n*For vocals*\n\n- [Unwa 
Resurrection](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection.ckpt) BS-Roformer ([yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/resolve/main/BS-Roformer-Resurrection-Config.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb), 195 MB)\n\n- BS-Roformer SW vocals only (mask\\_estimators.0 on the regular 6-stem model, 195 MB) ([link](https://mega.nz/folder/R3BzBSgD#sz2XIOk3y0-LS4hIQOQKYQ))\n\nModels like BS\\_RoFormer\\_mag or Anvuew BS-Roformer 12.45 have a similar size, but not all small models are as fast as the ones above, so feel free to test (e.g. hyperacev2 is much slower than Resurrection).\n\nOlder models\n\n- [Aname Mel-Roformer small](https://huggingface.co/Aname-Tommy/Mel_Band_Roformer_small) (203 MB)\n\n- [Unwa Mel-Roformer small](https://huggingface.co/pcunwa/Mel-Band-Roformer-small) (203 MB)\n\nOlder arch (faster; 25-60 minutes+ on weak i3u/C2Q respectively)\n\n- voc\\_ft (probably the fastest, but uses the since-surpassed MDX-Net v2 arch; also it’s narrowband)\n\n- Kim Vocal 2 (or alternatively 1; -||-, older model)\n\n*For instrumentals*\n\n- Unwa [BS-Roformer Resurrection inst](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst.ckpt) ([yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml)) | a.k.a. 
“unwa high fullness inst” on MVSEP | uvronline [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [UVR](#_6y2plb943p9v) (don’t confuse it with the Resurrection vocals variant; 204 MB)\n\n- Gabox BS\\_ResurrectioN ([model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/BS_ResurrectioN.ckpt) | [yaml](https://huggingface.co/pcunwa/BS-Roformer-Resurrection/blob/main/BS-Roformer-Resurrection-Inst-Config.yaml), 204 MB)\n\n*For both\n(dual-stem model; it doesn’t rely on inversion, so you might save time vs. using two separate models)*\n\n- Becruily Mel Deux (decent vocals and instrumentals, although sometimes bleedy)\n\nOlder models\n\n- Unwa BS-Roformer-Inst-FNO (works only in MSST after modifying the .py file as described in the model card; similar to the decently performing Resurrection inst model, 332 MB)\n\n- Gabox Mel-Roformer [small\\_inst](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/small_inst.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/inst_gabox.yaml) (experimental, 203 MB)\n\n- Unwa BS-Roformer-Inst-EXP-Value-Residual (uses Mel v2 model type in UVR; if it wasn’t made compatible with MSST already, replace bs\\_roformer.py with the one from this [repo](https://github.com/lucidrains/BS-RoFormer/tree/main/bs_roformer) and, in the bs\\_roformer.py file, change\n\n`from bs_roformer.attend import attend`\n\n⇩\n\n`from models.bs_roformer.attend import attend`\n\nGenerally not a very good model, but sometimes capable: “successfully removed [vocals] and kept the digital choir atmosphere as well” vs deux, inst\\_gaboxFlowersV10 and HyperACE, but it’s considerably slower than deux - mohammedmehditber)\n\n*MDX-Net (faster, usually lower quality, CPU-friendly)*\n\n- 
MDX-Net HQ\\_3, 4, 5 (the last is the fastest, 56 MB)\n\n- MDX-Net inst3, Kim inst (older, narrowband, but can be useful too in some cases, 63 MB)\n\n*4 stems*\n\n- Faster FP16 version of BS-Roformer 6 stems called splifft (by undef13; a tad lower SDR; only 334 MB vs 700 MB for the OG weights; CPU/NVIDIA compatible, and potentially AMD ROCm; only the bigger variant works in UVR; the OG: “Conversion done after 2 hours for a 2 minute 49 second file” on a 2-core/4-thread i3-7100U) - on CPU it might be slower than the OG, as the CPU might not support FP16 natively and may even fall back to emulation. Turing GPUs with Tensor Cores (e.g. RTX cards or the T4) and newer probably have FP16 acceleration, while non-RTX 16XX cards sometimes don't.\n\nFaster, lower quality:\n\n- [KUIELab-MDXNET23C](https://drive.google.com/file/d/1M24__8Qnd648ceXOH5PLVWenVeh6maGo/view) (4 stems) - its first scores were probably from an ensemble of its five models, and in that configuration it had better SDR than demucs\\_ft on its own, and its drums had better SDR than “SCNet-large\\_starrytong” (so the score of any single one of these MDX23C models is probably lower than demucs\\_ft's).\n> Lighter “model1” drums sound surprisingly better than htdemucs non\\_ft v4 on a previously separated instrumental. It handles trap really well and preserves hi-hats correctly, but at the cost of other-stem bleeding. 
The v4 model can be used to clean it up a bit further.\n\n- htdemucs v4 non-ft (UVR default) - it can clean up other-stem bleeding of the above\n\n- htdemucs\\_mmi (v3) - probably faster, but worse quality\n\n- kuielab\\_b - lightning-fast, but quality is mediocre (still rather better than Spleeter)\n\n\\_\\_\\_\n\n*Older vocal models for general use (moved for archiving purposes)*\n\n- unwa’s Mel-Roformer inst-voc model called “duality v1/2” (focused on both instrumental and vocal stems during training, but you can now test the newer single-stem V1e+ for this purpose too).\n\n<https://huggingface.co/pcunwa/Mel-Band-Roformer-InstVoc-Duality> | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MVSEP](https://mvsep.com/)\n\nVocals sound similar to the beta 4 model, but with more noise;\ninstrumentals are free of the noise present in inst v1 and later inst models, but as a downside, they’re muddier.\nv2 has slightly better SDR and fewer residues.\n\nBecause duality is a two-stem target model, \"[other](https://mvsep.com/quality_checker/entry/7321)\" is the model's direct output, while \"[Instrumental](https://mvsep.com/quality_checker/entry/7322)\" is the vocals inverted against the input audio.\n\nThe latter has lower SDR and more holes in the spectrum.\n\nSo, when using MSST-GUI, leave the checkbox “extract instrumental” disabled for duality models.\nYou can use it in Bas Curtiz’ [GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) for ZFTurbo's script (already added), with ZF’s original repo, or in the Colab.\n\n- Aname duality Mel [model](https://huggingface.co/Aname-Tommy/Mel-Band-Roformer_Duality)\n\n- Aname Full Scratch Mel-Band Roformer [model](https://huggingface.co/Aname-Tommy/Mel_Band_Roformer_Full_Scratch)\n\nbleedless: 30.75, fullness: 13.24, SDR: 8.01\n\n- SYHFT (a.k.a. 
SYH99999/yukunelatyh) MelBandRoformer V3 | [model](https://huggingface.co/SYH99999/MelBandRoformerSYHFTV3Epsilon)\n\nVs. previous SYH models: “this version is more consistent with separation. It's not what I'd call a clean model; It sometimes lets background noise bleed into the vocal stem. But only somewhat, and depending on how you look at it, it can be a good thing since it makes the vocals sound less muddy.” - Musicalman\n\n- MelBandRoformerBigSYHFTV1Fast | [model](https://huggingface.co/SYH99999/MelBandRoformerBigSYHFTV1Fast) - higher vocal fullness metric, but more bleeding (although less than the duality models and even Kim’s, purely [metric-wise](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?gid=1468543363#gid=1468543363)). “same parameters size with Kim's. Other models are 2x scale parameter size to compare my model”\n\n- **Mel-Roformer model by Kim** | [model](https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt?download=true) | [config](https://drive.google.com/file/d/15TF3sAWCxWIaKaYRduyqTBBm7VPPcbNv/view?usp=sharing)\n\nVocals bleedless: 36.75, fullness: 16.26, SDR: 11.07\n\n([Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)/[Huggingface](https://huggingface.co/spaces/TheStinger/UVR5_UI)/[2](https://huggingface.co/spaces/qtzmusic/UVR5_UI)/MVSEP/uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down)/UVR [beta Roformer](#_6y2plb943p9v) (available in Download Center)/[MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md)/[simple Colab](https://colab.research.google.com/drive/1tyP3ZgcD443d4Q3ly7LcS3toJroLO5o1?usp=sharing)/[CML inference](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model))\n\nThe usual base for lots of Mel fine-tunes on 
that list.\n\nIt might sometimes leave instrumental residues in vocals, but can be less muddy than BS-Roformers - the same goes for any fine-tunes of this model vs BS 2024.08, so effectively all the Mel models above.\n\n“godsend for voice modulated in synth/electronic songs”; vs 1296, it can be more problematic with wind instruments, putting them in vocals.\n\n- unwa’s instrumental Mel-Roformer v1e+\n\n- unwa’s instrumental Mel-Roformer v2 model (similar to v1, but with less noise; a muddier, bigger, heavier model)\n\n[Model files](https://huggingface.co/pcunwa/Mel-Band-Roformer-Inst/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) (it's now included in ZFTurbo's [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training) as the \"gui-wx.py\" file)\n\nMight miss some samples or adlibs while cleaning inverts. SDR got a bit higher (16.845 vs 16.595). “Sounds very similar to v1 but has less noise, pretty good” “the aforementioned noise from the V1 is less noticeable to none at all, depending on the track”. “V2 is more muddy than V1 (on some songs), but less muddy than the Kim model. (...) [As for V1,] sometimes it's better at high frequencies” - Aufr33\n\n- older BS-Roformer 2024.02 on MVSEP (generally, BS-Roformer models “can be slappy with choir-like vocals and background vocals” but “hot on pre-2000 rock”)\n\nThese older Roformers “kinda does poorly on large screams” in metal music, but not always. Sometimes even HQ\\_4 can catch them better than, e.g., viperx models.\n\n- Mel-Roformer fine-tuned 17.48 model on MVSEP (works e.g. 
for live shows with a crowd)\n\n(it’s different from the one on x-minus)\n\n- Gabox BS-Roformer instrumental, which doesn’t struggle with choirs as much as most Mel-Roformers do, although it may not help in all cases ([link](https://huggingface.co/GaboxR67/BSRoformerVocTest/tree/main))\n\n- “ver. 2024.04” SDR 17.55 on MVSEP - fine-tuned viperx model v1 (picks up adlibs better, occasionally picks up some SFX; sometimes one, sometimes the other is “slightly worse at pulling out difficult vocals”)\n\n- unwa’s BS-Roformer Large vocal model (viperx 1297 model fine-tuned) [download](https://drive.google.com/file/d/1Q_M9rlEjYlBZbG2qHScvp4Sa0zfdP9TL/view) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nMuddier than Kim’s Roformer, potentially with slightly fewer residues and a slightly more artificial sound. Better than the viperx model - “captures more nuances, subtle elements and details” ~A5\nIt can be better than the above models for some older music like The Beatles.\n\n- BS-Roformer viperx 1297 model (UVR beta/MVSEP a.k.a. 
SDR 17.17 for the “1296” variant iirc/called just “BS-Roformer” on uvronline via special link for: [free](https://uvronline.app/ai?discordtest)/[premium](https://uvronline.app/ai?hp&test) (scroll down))\n\n- Mel-Roformer viperx 1143 model (UVR>Download More Models)\n\n(don't confuse it with 1053, which separates drums and bass into one stem).\n\nThe first Mel-Roformer vocal model, trained by viperx before the Kim model, which introduced config changes that fixed the problem of lower SDR vs models trained on BS-Roformer.\n\nMost people back then preferred the Kim Mel-Roformer instead, but viperx’ Mel “does background voices correctly not unlike Kim's (it does not recognise background 'breee's)” “Iirc Viperx Mel Rofo doesn't struggle with instruments counted as vocals”.\n\nAlso, both the Mel and BS variants of the viperx model struggle with saxophone and, e.g., some Arabic guitars. It can still depend on the song whether these are better than even the second-oldest Roformer on MVSEP (from before the viperx model got a fine-tuned version). 
Besides problems with recognizing instruments, they're very good for vocals (although Mel-Roformer by Kim on x-minus tends to be better).\n\nThey give muddy instrumentals when not ensembled with other archs (but we didn’t have dedicated instrumental-stem target models back then), maybe the Mel variant less so.\n\nBe aware that the names of these models in UVR refer to vocal SDR measured on viperx's private dataset (not even the older Synthetic dataset) instead of the multisong dataset on MVSEP, hence the numbers are higher than in the multisong chart on MVSEP.\n\n*Older ensembles for vocals*\n\n- Models ensembled option on x-minus.pro (available only for premium users)\n\n> Mel-Roformer + MDX23C (can be picked after you've uploaded/processed a track [at least with the Mel-Roformer model chosen]).\n\n> Mel-Roformer + demudder\n\n“I recommend mel-roformer + demudder to remove vocals from songs that contain only backing vocals that are so faint that our ears can barely hear them.”\n\n- MDX23 by ZFTurbo (v. [2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb) jarredou Colab fork)\n\n- Ensembles on MVSEP.com (for premium users)\n\n- Ensembles in UVR 5:\n\na) 1296 + 1143 (BS-Roformer in [beta](#_6y2plb943p9v) UVR) + Inst HQ4 (dopfunk)\n\n(there might be instrumental residues from HQ4 in some cases)\n\nb) 1296 + 1297 + MDX23C HQ\n\nc) Manual ensemble in UVR of BS-Roformer 1296 + a copy of its result + MDX23C HQ (jarredou) - faster, with similar quality to the one above\n\nMore ensembles below\n\n- [KaraFan](#_7kniy2i3s0qc) (preset 4, but may give worse results than Mel-Roformer)\n\n\\_\\_\\_\n\n*Older single models* for vocals (available in [UVR 5](https://github.com/Anjok07/ultimatevocalremovergui) | inference [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | 
[MDX-Net](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb) | [MVSEP](https://mvsep.com))\n\n- **UVR-MDX-Net-Voc\\_FT** (narrowband, further trained, fine-tuned version of the Kim vocal model; Roformers might be better now)\n\n>If you still have instrumental bleeding, process the result with Kim vocal 2\n\n>Alternatively, use MDX23C narrowband (D1581) then Voc-FT, \"great combination\" (or MDX23C-InstVoc HQ instead of D1581)\n\n(so separate with the D1581 or InstVoc model first, then use the separated result as input, and separate it further with voc\\_ft)\n\n- Kim Vocal 1 (can bleed less than 2, but more than voc\\_ft; it might depend on the song)\n\n- Kim Vocal 2\n\n>MDX-Net HQ\\_3/4/**5** (HQ\\_4 can sometimes be decent on vocals too, even less muddy than voc\\_ft, though noisier; HQ\\_3 generally had more vocal residues than Kim Vocal 2, and HQ\\_5 has stronger and fuller vocals than HQ\\_4)\n\n>**MDX23C-InstVoc HQ** (can have some instrument residues at times, but it’s fullband, so it has better clarity vs voc\\_ft and Kim Vocal 1/2):\n\n“This new model is [vs the narrowband vocal models], by far, the best in removing the most non-vocal information from an audio and recovering formants from buried passages... 
But in some cases, it also removes some airy parts from specific words, and some non-verbal sounds (breathing, moaning).”\n\n- newer MDX23C epochs available on MVSEP, like 16.66.\n\nMDX23C models are the go-to models for live-recorded vocals\n\n(also available in MDX23 Colab v2.3/2.4 when the weight is set only for the InstVoc model)\n\nOlder UVR ensembles (from before the Roformer models' release)\n\n>Voc FT + MDX23C\\_D1581 (avg/avg)\n\n>292, 496, 406, 427, Kim Vocal 1, Kim Inst + Demucs ft ([#1449](https://mvsep.com/quality_checker/entry/1449))\n\n>Kim Inst, Kim Vocal 1 (or/and voc\\_ft), Kim Vocal 2, UVR-MDX-NET Inst HQ 2 (or 3/4), UVR-MDX-NET\\_Main\\_427, htdemucs\\_ft (avg/avg IRC)\n\n>Kim Vocal 1+2, MDX23C-InstVoc HQ, UVR-MDX-NET-Voc\\_FT\n\n(jarredou)\n\n> [More ensembles](#_xya7mtyl0m39)\n\n>You can also check some ensembles for [instrumentals](#_2vdz5zlpb27h)\n\nOr an ensemble of your choice using only the best vocal models (up to 4-5 max for the best SDR - [more](#_tb9spo3rgthx))\n\nIf your separation still bleeds, consider processing it further with models from the [Debleeding](#_tv0x7idkh1ua) section further below.\n\n\\_\\_\\_\n\n###### Other services (multipurpose)\n\n- [Ripple](#_f0orpif22rll) (no longer works; since the BS-Roformer models' release it might be obsolete anyway; it was very good at recognizing what is vocals and what's not and tended not to bleed instrumental into the vocal stem; a very good, if not the best, solution for vocals)\n\n- music.ai (paid; presumably in-house BS-Roformer models)\n\n“almost the same as my cleaned up work (...) It seems to get the instrument bleed out quite well”\n\n“Beware, I've experienced some very weird phase issues with music.ai. I use it for bass, but vocals are too filtered/denoised IMO, and you can't choose to not filter it all so heavily. ” - Sam Hocking\n\n- <https://myxt.com/> (paid; uses Audioshake)\n\n- moises.ai (paid; uses in-house BS-Roformer models; sometimes better results than those on MVSEP)\n\n- ZFTurbo’s VitLarge23 e.g. 
on MVSEP or 2.3/2.4 Colab (it's based on a new transformer arch; SDR-wise it's not better than MDX23C (9.78 vs 10.17), but works \"great\" in an ensemble of two models with weights 2, 1). It's been added to the 4-model ensemble on MVSEP (although the set of current models is subject to change at any time)\n\n- ZFTurbo’s Bandit Plus (MVSEP)\n\nOther decent single UVR models\n\n- Main ([427](https://drive.google.com/drive/folders/1sxDUXVO9cBagfv1owuWCzLmDf-rJumeh?usp=sharing)) or 406, 340, MDXNET\\_2\\_9682 (all available in UVR5; some appear in the Download Center after entering the [VIP](https://www.buymeacoffee.com/uvr5/vip-model-download-instructions) code)\n\n- or also the instrumental models Kim Inst and HQ\\_3 (vocals obtained via automatically applied inversion)\n\nOther models\n\n- ZFTurbo's [Demucs v4 vocals 2023](https://mvsep.com/) (on MVSEP, unavailable in Colab, good when everything else fails)\n\n- MDX23 Colab fork [2.1](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb) / [2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) (this might be slow) / [2.3](https://github.com/jarredou/MVSEP-MDX23-Colab_v2/) / [2.4](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb) / [2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb) (it's generally better than UVR ensembles SDR-wise, but it's not available in UVR5) (MDX23 Colab is also good for instrumentals and 4 stems, very clean, sometimes more vocal residues in specific places vs single MDX-UVR inst3/Kim inst/HQ models, but it sounds better overall, especially the Colab modification/fork with fixes made by jarredou)\n\n- HQ\\_3 (inverted result giving vocals from instrumental in 2nd stem) - more instrumental residues than e.g. 
Kim Vocal 2, but no 17.7 kHz cutoff)\n\n- Narrowband MDX23C\\_D1581 “Leaves too much instrumental bleeding / non-vocal sounds behind the vocals. Formants are less refined than on any of the top vocal models (Voc FT, Kim 1, Kim 2 and MDX23C-InstVoc HQ).”\n\n- Kavas' methods for HQ vocals:\n\nEnsemble (Max/Max) - Low pass filter (brickwall) at 2k:\n\n- MDX23C\n\n- Voc FT\n\nVoc FT - High Pass Filter (brickwall) at 2k\n\n(“Sometimes it leaves some synth bleeding in the mids”; in that case, try min/min)\n\nOr:\n\nMultiband EQ split at 2 kHz with low- and high-pass brickwall filters, using:\n\n- MDX23C-InstVoc from 0 to 2 kHz and:\n\n- Voc\\_FT from 2 kHz onwards\n\n(InstVoc gives fuller mids, but leaves transients from hats in the high end, whereas Voc FT lacks the mids, but gets rid of most transients. Combine the best of both for optimal results.)\n\n- Any [top](https://mvsep.com/quality_checker/leaderboard2.php?&sort=instrum&page=0) ensemble or AI appearing on the MVSEP leaderboard (but it depends - sometimes it can be better for instrumentals, sometimes for vocals)\n\nEnsembles are resource-consuming; there's no cutoff if one model is fullband and the other is narrowband. 
Random ensembles can result in more vocal or instrumental residues, as mentioned above.\n\nModels not exclusive to MVSEP are all available in [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui). Optionally, you can separate MDX models in [Colab](#_aa2xhwp434) and perform a manual ensemble in UVR5 (no GPU or fast CPU required for this task), use a manual ensemble [Colab](#_surlvvp6mr8f) (may not work anymore), or do it in a DAW by importing all the stems together and decreasing their volume (you might want to turn on a limiter on the sum).\n\n###### **Speech separation**\n\n“There isn't one specifically trained for anime, try your luck with the current available models”\n\n###### *The list by Musicalman* (mostly from before the Gabox models' release; check [vocals](#_n8ac32fhltgg) too)\n\n“Any vocal model in the past few years should work for speech separation. My favorites at the moment are:\n\n- MDX23c Inst-Voc HQ\n\n- other similar MDX models for least aggressive, but bleedy, only really useful for denoising\n\n- Unwa's Mel-Roformer big beta 4 or beta 5e vocal models - for less bleed. Atm, 5e is my go-to as it sounds less filtered.\n\n~ I've heard people praise BS-Roformers a lot, haven't really tested those much, though.\n\n- Becruily's vocal model can also be better at SFX separation, but can overestimate reverb in the vocal stem sometimes.\n\n- Mel-Roformer Karaoke by viperx and aufr33 - for more aggressive separation (removes a bit more SFX)\n\n- And the most aggressive are Bandit models and the DNR v3 models on MVSEP, though they tend to be a bit too aggressive for my taste, so I only use them selectively.\n\nThis is just my own opinions though, subject to change at a moment's notice lol”\n\n[you’ll find them in the [SFX](#_owqo9q2d774z) section]\n\n- [clearvoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice) - it's a set of speech enhancement/separation models. My favorite model of the set is MossFormer2\\_SE\\_48K. 
Its dialog extraction seems to be similar to Bandit v2, though clearervoice sounds fuller to me, and separation is usually a bit better. Might be especially good in an ensemble with Bandit or vocal sep models e.g. unwa, gabox etc.\n\n- BS-Roformer 2025.06 on MVSEP - “handle speech very well. Most models get confused by stuff like birds chirping (they put it in the vocal stem), but this model keeps them out of the vocal stem way more than most. I love it!”\n\nSee also:\n\n- [various speakers isolation](#_ea9fj444mg3m)\n\n- [harmonies](#_9h585vwqgcvg)\n\n- [two singers isolation](#_7bakw3ajb3ii)\n\n- [karaoke](#_vg1wnx1dc4g0)\n\n###### *Can’t find a model?*\n\n- Models used in results like [#946](https://mvsep.com/quality_checker/entry/946) (e.g. 406, 427, 438) or in other ensembles mentioned above are still publicly available in UVR, but you need to enter the [download/vip code](https://www.buymeacoffee.com/uvr5/vip-model-download-instructions) in UVR to make them show up\n\nYou cannot use the VIP code on older beta UVR Roformer patches ([updates](#_6y2plb943p9v)), so to use any other VIP model (e.g. D1581) alongside Roformers, you need to install the stable 5.6 from the official GH repo, download the model, and afterwards update the installation with the old Roformer patch if you need that version\n\n- Be aware that MDX23C Inst Voc HQ2 is not accessible in the beta Roformer patch even when the VIP code is entered. You need to [download](https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/MDX23C-8KFFT-InstVoc_HQ_2.ckpt) the model file manually and paste it into the models\\MDX\\_Net\\_Models folder.\n\n(Config is detected automatically, as it uses the existing model\\_2\\_stem\\_full\\_band\\_8k config - the same as for Inst Voc HQ)\n\n- The UVR Denoise non-lite model disappeared from the Download Center. 
Here it is: <https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-DeNoise.pth>\n\n- You cannot use in UVR some models from x-minus/uvronline.app or MVSEP, e.g. those used for the Ensemble of 4 and 8 models, as they aren't available in UVR or for download. You can only perform a manual ensemble in UVR of single-model results processed on MVSEP or x-minus, but it will not give the same result as the ensemble on MVSEP, as MVSEP uses code more similar to the MDX23 Colab code (don’t confuse it with MDX23C arch models), so sometimes a weighted ensemble instead of, e.g., avg spec.\n\n- E.g. for 16.10.23, “MVSep Ensemble of 4” consisted of the previous 1648 epoch (maybe later updated to 16.66), VitLarge, and Demucs 2023 Vocals, and besides the first, none of these models work in UVR, even if downloaded manually (plus the VitLarge arch is not supported in UVR at all). Currently, there are various ensembles to choose from on MVSEP.\n\nThe 4- and 8-model ensembles on MVSEP are available only to premium users, as many resources and models are used to output these results.\n\n- 1648 on MVSEP is the MDX23C HQ1 model (a.k.a. 
8K FFT)\n\n- SYHFT V4 and V5 beta by SYH99999 were never publicly released.\n\nV5 Beta is only on x-minus.pro/uvronline and got deleted from the main models view, but might still be accessible via the following links:\n\n<https://uvronline.app/ai?hp&test> (premium)\n\n<https://uvronline.app/ai?test> (free)\n\n###### UVR models repository\n\nUVR5 single models' repository backup as separate links (excluding VIP models, which are offline after decrypting):\n\n<https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models>\n\nAll publicly available MVSEP models (including checkpoints just for further training):\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/releases>\n\n(refer to the list of models in this document for descriptions of the best models)\n\nAlternative model link lists:\n\n<https://bascurtiz.x10.mx/models-checkpoint-config-urls.html> (some can be offline)\n\n<https://github.com/SiftedSand/MusicSepGUI/blob/main/models.json>\n\n<https://huggingface.co/spaces/TheStinger/UVR5_UI/blob/main/assets/models.json>\n\n<https://huggingface.co/Politrees/UVR_resources/tree/main/models>\n\nSome of the older UVR5 GUI models described in this guide can be downloaded via expansion packs:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.3.0/v5_model_expansion_pack.zip>\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.3.0/models.zip>\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v4.0.1/models.zip>\n\nSome of the models used by KaraFan:\n\n<https://github.com/Eddycrack864/KaraFan/releases/tag/karafan_models>\n\nMDX23C HQ 2:\n\n<https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/MDX23C-8KFFT-InstVoc_HQ_2.ckpt>\n\n427:\n\n<https://drive.google.com/drive/folders/16sEox9Z_rGTngFUtJceQ63O5S9hhjjDk?usp=drive_link> (just in case)\n\nCopy it to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models and rename the model to: 
UVR-MDX-NET\\_Main\\_427\n\n*Jarredou’s models mirror*\n\n<https://huggingface.co/jarredou> (6+ models)\n\nSome direct links\n\nVOCALS-InstVocHQ\n\nConfig: <https://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/config_vocals_mdx23c.yaml>\n\nCheckpoint: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v1.0.0/model_vocals_mdx23c_sdr_10.17.ckpt>\n\nVOCALS-MelBand-Roformer (by KimberleyJSN)\n\nConfig: <https://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/KimberleyJensen/config_vocals_mel_band_roformer_kj.yaml>\n\nCheckpoint: <https://huggingface.co/KimberleyJSN/melbandroformer/resolve/main/MelBandRoformer.ckpt>\n\nVOCALS-BS-Roformer\\_1297 (by viperx)\n\nConfig: <https://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/viperx/model_bs_roformer_ep_317_sdr_12.9755.yaml>\n\nCheckpoint: <https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/model_bs_roformer_ep_317_sdr_12.9755.ckpt>\n\nVOCALS-BS-Roformer\\_1296 (by viperx)\n\nConfig: <https://raw.githubusercontent.com/TRvlvr/application_data/main/mdx_model_data/mdx_c_configs/model_bs_roformer_ep_368_sdr_12.9628.yaml>\n\nCheckpoint: <https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/model_bs_roformer_ep_368_sdr_12.9628.ckpt>\n\nVOCALS-BS-RoformerLargev1 (by unwa)\n\nConfig: <https://huggingface.co/jarredou/unwa_bs_roformer/raw/main/config_bsrofoL.yaml>\n\nCheckpoint: <https://huggingface.co/jarredou/unwa_bs_roformer/resolve/main/BS-Roformer_LargeV1.ckpt>\n\nKARAOKE-MelBand-Roformer (by aufr33 & viperx)\n\nConfig: <https://huggingface.co/jarredou/aufr33-viperx-karaoke-melroformer-model/resolve/main/config_mel_band_roformer_karaoke.yaml>\n\nCheckpoint: <https://huggingface.co/jarredou/aufr33-viperx-karaoke-melroformer-model/resolve/main/mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt>\n\nOTHER-BS-Roformer\\_1053 (by viperx)\n\nConfig: 
<https://raw.githubusercontent.com/TRvlvr/application_data/main/mdx_model_data/mdx_c_configs/model_bs_roformer_ep_937_sdr_10.5309.yaml>\n\nCheckpoint: <https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/model_bs_roformer_ep_937_sdr_10.5309.ckpt>\n\nCROWD-REMOVAL-MelBand-Roformer (by aufr33)\n\nConfig: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/model_mel_band_roformer_crowd.yaml>\n\nCheckpoint: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt>\n\nVOCALS-VitLarge23 (by ZFTurbo)\n\nConfig: <https://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/refs/heads/main/configs/config_vocals_segm_models.yaml>\n\nCheckpoint: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v1.0.0/model_vocals_segm_models_sdr_9.77.ckpt>\n\nCINEMATIC-BandIt\\_Plus (by kwatcharasupat)\n\nConfig: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.3/config_dnr_bandit_bsrnn_multi_mus64.yaml>\n\nCheckpoint: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.3/model_bandit_plus_dnr_sdr_11.47.chpt>\n\nDRUMSEP-MDX23C\\_DrumSep\\_6stem (by aufr33 & jarredou)\n\nConfig: <https://github.com/jarredou/models/releases/download/aufr33-jarredou_MDX23C_DrumSep_model_v0.1/aufr33-jarredou_DrumSep_model_mdx23c_ep_141_sdr_10.8059.yaml>\n\nCheckpoint: <https://github.com/jarredou/models/releases/download/aufr33-jarredou_MDX23C_DrumSep_model_v0.1/aufr33-jarredou_DrumSep_model_mdx23c_ep_141_sdr_10.8059.ckpt>\n\n4STEMS-SCNet\\_MUSDB18 (by starrytong)\n\nConfig: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.6/config_musdb18_scnet.yaml>\n\nCheckpoint: <https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.6/scnet_checkpoint_musdb18.ckpt>\n\nDE-REVERB-MDX23C (by aufr33 & 
jarredou)\n\nConfig: <https://huggingface.co/jarredou/aufr33_jarredou_MDXv3_DeReverb/resolve/main/config_dereverb_mdx23c.yaml>\n\nCheckpoint: <https://huggingface.co/jarredou/aufr33_jarredou_MDXv3_DeReverb/resolve/main/dereverb_mdx23c_sdr_6.9096.ckpt>\n\nDENOISE-MelBand-Roformer-1 (by aufr33)\n\nConfig: <https://huggingface.co/jarredou/aufr33_MelBand_Denoise/resolve/main/model_mel_band_roformer_denoise.yaml>\n\nCheckpoint: <https://huggingface.co/jarredou/aufr33_MelBand_Denoise/resolve/main/denoise_mel_band_roformer_aufr33_sdr_27.9959.ckpt>\n\nDENOISE-MelBand-Roformer-2 (by aufr33)\n\nConfig: <https://huggingface.co/jarredou/aufr33_MelBand_Denoise/resolve/main/model_mel_band_roformer_denoise.yaml>\n\nCheckpoint: <https://huggingface.co/jarredou/aufr33_MelBand_Denoise/resolve/main/denoise_mel_band_roformer_aufr33_aggr_sdr_27.9768.ckpt>\n\nFor a more recent list, see [this](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) Colab; its cells contain all the links.\n\n\\_\\_\\_\\_\n\n*Why not to use more than 4-5 models in a UVR ensemble* - [click](#_tb9spo3rgthx)\n\n\\_\\_\\_\\_\n\n*Other models*\n\n- [GSEP AI](https://studio.gaudiolab.io/gsep) - a new model called “Vocal Remover”, and the old instrumental, vocal, and 4-6 stem models (an additional denoiser is applied for 4/6 stems), including piano and guitar (free). For 2 stems, it gives very good instrumentals for songs with very loud and harsh vocals and somewhat lo-fi hip-hop beats, as it can remove vocals very aggressively, sometimes even more than HQ\\_3. 
The new model might be good at removing SFX (the instrumental stem comes from the old model).\n\nIn specific cases (can have more vocal residues in instrumentals vs HQ\\_3 at times - less in jarredou's Colab):\n\n- original [MDX23](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model) by ZFTurbo (only this OG version of MDX23 still works in the offline app; min. 8GB Nvidia card required [6GB with specific parameters]) - sounds very clean, and not as muddy as inst MDX models; in this respect it is comparable with, or even better than, VR arch (because of far fewer vocal residues).\n\n- [Demucs](#_m9ndauawzs5f)\\_ft model (mix all 3 non-vocal stems in e.g. Audacity for an instrumental) / sometimes the 6s model gives better results, or, in very specific cases when vocals are easy to filter out, even the old 4-stem mdx\\_extra model (but SDR-wise, full band MDX 292 is already better than even the ft model). The 6s model is worth checking with shifts 20.\n\nIt might still be usable in some specific cases, despite the fact that MDX23 uses demucs\\_ft combined with other models.\n\n- [VR models](#_wjd2zth0azhs) settings + VR-only [ensemble settings](#_rv7wwzcmuq3s) (generally deprecated, but sometimes more clarity vs MDX v1, though frequently more vocal residues. Some people still use it, e.g. for some rock, where it can still give better results than other models, and also for fun dubs; for those, if you have two language tracks of the same movie, you can test out [Similarity Extractor](#_3c6n9m7vjxul) instead, though Audacity center extraction works better than that linked Colab)\n\n- Alternatively, you can consider using the narrowband Kim other ft model with fullband model settings parameters in [this](https://colab.research.google.com/drive/1CO3KRvcFc1EuRh7YJea6DtMM6Tj8NHoB?usp=sharing) or the new HV Colab instead. It is useful in some specific parts of songs, like the chorus, where this method leaves no persistent vocal residues (clearer results than even Max-Spec), or where e.g. 
MDX23 still doesn't give you enough clarity in such places, so you may merge fragments of results from different models manually.\n\nPaid\n\n- [Audioshake](https://indie.audioshake.ai/) (non-copyrighted music only; can be more aggressive than the above and pick up some lo-fi vocals where others fail [a bit in the manner of HQ models])\n\nHow to bypass the non-copyright music restriction ([1](https://media.discordapp.net/attachments/708579735583588366/1120828059247980645/image.png), [2](https://cdn.discordapp.com/attachments/708579735583588366/1120828171873423380/image.png)).\n\n\"They also reserve themselves the right to keep your money and not let you download the song you split if they discover that you are using a commercially released song and that you don't have the rights to it.\" But generally we haven't had such a case with slowed-down songs (otherwise they might not pass anyway).\n\n4 stems might be better at times than the Demucs ft model.\n\n- [Dango.AI](https://dango.ai/) (a.k.a. tuanziai.com) - free 30-second samples; can be the most aggressive for instrumentals vs e.g. inst 3 (tested on Childish Gambino - Algorithm). Since then, the models/arch were updated, and instrumentals in 9.0 seem to be **the cleanest** or the closest to original instrumentals as of 12.08.23, at least in some cases (despite low SDR).\n\n> If you care only about a specific snippet in a song: since the 30-second samples to separate are taken randomly from the whole song, you can copy the same fragment over and over to make a full-length track of it, and it will eventually pick up the whole snippet for separation.\n\nUploading a snippet of 30 seconds or shorter will not result in the whole fragment being processed from beginning to end.\n\n>Sometimes using other devices or a virtual machine in addition to incognito/VPN/new email might even be necessary to reset free credits. 
It's pretty persistent.\n\n<https://tuanziai.com/encouragement>\n\nHere you might get 30 free points (for 2 samples) and 60 paid points (for 1 full song) \"easily\".\n\n>>>\n\nFor 2 or 4 stems, everything else not mentioned above is worse for separation tasks:\n\nLalal, RipX (although it now uses some UVR models?), Demix, RX Editor 8-11, Spleeter and its online derivatives.\n\n###### **Debleeding/cleaning vocals/instrumentals/inverts**\n\n*-* [*phase fixer*](#_j14b9cv2s5d9) *- for constant vocal shells/buzzing from Roformer instrumental models*\n\n- MVSEP BS-Roformer 2025.07 (e.g. “after making the deux hyperace ensemble ensemble, at least in my cases, works good to remove any small residues that are left in quieter/airy parts of songs” - arxynr)\n\n- [VR](#_wjd2zth0azhs)’s HP2-4BAND-3090\\_4band\\_arch-500m\\_1 in HV Colab a.k.a. 9\\_HP2-UVR in UVR, aggressiveness 1.0 on MVSEP/Colab, 100 in UVR (works for vocal residues from inst Roformers, “can take some vocals residues that even those bleed suppressor models can't (...) still works somehow for cleaning that even BS-Roformer 2025.07 couldn't”, helps with some vocal reverb, some choir vocals, old anime songs and regular nu metal songs - mohammedmehditber)\n\n- MVSEP Choir (same as with 9\\_HP2, but esp. for choir residues - mohammedmehditber)\n\n- UVR Denoise Standard\n\n- UVR Denoise model with 0.1-0.2 aggressiveness (for debleeding vocals from instruments)\n\n- Roformers: Mel 2024.10 on MVSEP a.k.a. Bas Curtiz FT (for debleeding vocals)\n- Or the earlier Kim Mel-Roformer model (for vocals; if another model was prev. 
used)\n\n- Unwa inst v1 (to clean up vocals from the Mel 2024.10 model)\n\n- unwa’s inst v1/e/2 (for OG instrumentals with bleeding [better than Dango for it])\n- unwa big beta 5 (“my go-to clean-up artifacts model after phase inverting master + official instrumental”)\n\n- Unwa BS Revive 3e (although 2 has a higher bleedless metric)\n\n- voc\\_fv6 (for vocal inverts - ezequielcasas)\n\n- DEBLEED-MelBand-Roformer (by unwa/97chris) [model](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/bleed_suppressor_v1.ckpt) | [yaml](https://huggingface.co/jarredou/bleed_suppressor_melband_rofo_by_unwa_97chris/resolve/main/config_bleed_suppressor_v1.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(it can work with e.g. inst v1 and its noise, with v1e, or even with the MVSEP Karaoke BS-Roformer instrumental stem for a “very clean and full” result. 
The debleed model can sometimes also remove bleed that remains after using the phase fixer)\n\n- anvuew’s Mel v2 [de-reverb](#_5zlfuhnreff5) or unwa’s BS Large, MDX-Net HQ\\_5\n(“great for cleaning acapellas from bits of instrumentals”)\n- SYHFT V5 beta on x-minus.pro (probably still available with [this](https://uvronline.app/ai?hp&test) link for premium, and for [free](https://uvronline.app/ai?test))\n\n- Ensemble of BSRoformer-Viperx1297, Unwa BSRoformer-LargeV1, unwa\\_ft2\\_bleedless, mel\\_band\\_roformer\\_vocals\\_becruily, Gabox voc\\_fv4 on Average/Average\n(good for cleaning inverts - AG89)\n\n- Ensemble of big beta6x, roformer revive2, unwa ft2 bleedless\n(for cleaning instrumental inverts - AG89)\n\n- Or just use other models with the highest bleedless metric ([instrumental](#_2vdz5zlpb27h) | [vocals](#_n8ac32fhltgg))\n\n- [SFX](#_owqo9q2d774z) models are more aggressive than vocal models (not tested for this purpose yet)\n\n- yxlllc’s harmonic noise separation VR [model](https://github.com/yxlllc/vocal-remover/releases/tag/hnsep_240512) (can be used in UVR: rename “model” to a model name of your choice and change the pt extension to pth, then use the Install model option and set the config to: VR 5.1, 32/128, 1band\\_sr44100\\_hl512; “very good at further remove the noise from a dereverbed vocal yet it is mono. (...) it did have two channels on export but both of them have audible information except one has only some noise that can be easily removed then mix the other channel up to stereo” - mohammedmehditber. 
Maybe the attached CLI code handles mono models better)\n- RX10 De-bleed feature for instrumentals ([video](https://youtu.be/nwyJJMiYGUI))\n\n(older methods)\n\n- Gabox Mel [denoise/debleed](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | SESA [Colab v3](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) - for noise from fullness models (tested on v5n); it can't remove vocal residues\n- Mel [denoise](#_hyzts95m298o) - that model “removed some of the faint vocals that even the bleed suppressor didn't manage to filter out before”. Try denoising the mixture first, then use the model.\n\n- (*for saxophone bleeding*) ~“1. Take the original song in FLAC or WAV 2. Use MVSEP Saxophone 3. Take the other stem from it - there should be everything else and most of the sax should be gone (for me, there was a small part left) 4. Use Unwa Big Beta 5 on it (so Other-> uvronline/xminus/Colab Unwa Big Beta 5) - then vocals should be very clean no sax bleeding” - cali\\_tay98\n\n- (“In case there's any wind instruments that could potentially bleed into the vocals”)\nMVSep Wind\n\n- (*when “models don't pick up the noise*), gently bring back a bit of the original music/instrumental on the inverted track and use AI again.\n\nBy gently, I mean no more than 6 dB” - becruily\n\n- (*“If your result have \"vocal chops\"*) left in the instrumental separation and no models could remove them completely, then it's likely MDX HQ\\_5 or VitLarge23 v2 will fix it” - dca\n\n- Acon Digital DeBleed:Drums “it's just an advanced gate. 
It doesn't remove bleed when it's overlapping the wanted audio (we can still hear hihat/cymbals on snare with the plugin enabled in their [demo](https://www.youtube.com/watch?v=hCFPa8_QquE))” - jarredou.\n\n- [Audio-Bleeding-Removal](https://github.com/its-rajesh/Audio-Bleeding-Removal) - by its-rajesh\n\n- Try out some L/R inverting, or try separating multiple times to get rid of vocal pop-ins (a fix for ~\"ah ha hah ah\" vocal residues)\n\n*Older de-bleeding models*\n\n- [Ripple](#_f0orpif22rll) (defunct) “AWESOME to use after inverting songs with the official instrumental”\n\nInstrumentals can also be further cleaned with Ripple, and then with [Bandlab Splitter](https://www.bandlab.com/splitter)\n\n(Roformer models may now replace Ripple models in that regard)\n\n- [Top](#_xya7mtyl0m39) ensemble in UVR5 (starting from point 0d)\n\n- [GSEP](#_yy2jex1n5sq) - very minor difference between the two for cleaning vocals (GSEP may be better by a pinch).\n\nYou can try separating e.g. the vocal result twice using different settings (e.g. voc\\_ft>kim vocal 2)\n\n- MVSEP 11.50 Ensemble (at least the least amount of bleeding in inst. separations)\n\n- MDX23 jarredou's fork Colab (maybe [this](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb) version first)\n\n- Use the voc\\_ft model on the result you got (i.e. separate twice if you already used that model)\n\n*Cleaning inverts means cleaning up residues, e.g. those left by the instrumental after imperfect phase cancellation (when the audio is lossy, or maybe not even from the same mixing session).*\n\n*Aligning*\n\n\"[Utagoe](#_8vosdwb10mjo) bruteforces alignment every few ms or so to make sure it's aligned in the case that you're trying to get the instrumental of a song that was on [e.g.] 
vinyl.\"\n\n[The previous] UVR's align tool is handy just for digital recordings… [so those] which don't suffer from that [issue] at all.\"\n\nUtagoe will not fix fluctuating speed issues, only the constant ones.\n\nAnjok already \"cracked\" how that specific Utagoe feature works, and introduced it to UVR.\n\n“Updated \"Align Tool\" [is] to align inputs with timing variations, like Utagoe.”\n\n“Some users had good results with Auto Align Post 2 plugin to resync tracks before inverting them.”\n\nFor problematic inverts, you can also try out azimuth correction in e.g. iZotope RX.\n\n**Declicking vocals**\n\n- BS-Rofomer vocal model (iirc; “Can also fix hard clicks in vocals. It is even better than RX in this, but still there is a tiny wave fade residue in some cases”)\n\n- *Kim vocal* first and then separate with instrumental model (e.g. *HQ\\_3 or 4*). You might want to perform additional separation steps to clean up the vocal from instrumental residues first, and invert it manually to get cleaner instrumental to separate with instrumental model to get rid of vocal residues\n\n***Removing metronome*** *(e.g. from a mixture or vocals)*\n\n- Use a good instrumental model so you will be left with metronome + vocals in one stem, then use a drums model - “Then the drum trick worked better but still not very good, a regular extraction worked better this time though!” - brianghost\n\n**Removing bleeding of hi-hats in vocals**\n\n- Kim Mel-Roformer model\n\n- Use MelBand RoFormer v2 on MVSEP (e.g. after using MDX23C Inst HQ)\n\n**Bleeding in other stems**\n\n- RipX Stem cleanup feature (possibly)\n\n- SpectraLayers 10 (eliminates lots of bleeding and noise from MDX23 Colab ensembles)\n\n\"You debleed the layer to debleed from using the debleed source. Results vary. 
Usually it's better to debleed using Unmix and then moving the bleed to where it belongs\" - Sam Hocking\n\n[Video](https://www.youtube.com/watch?v=wZ4hlC4CM2M)\n\n**Bleeding of claps in vocals**\n\n- Reverse the polarity and/or remove the DC offset of the input file\n\n- [KaraFan](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) (for general drum artefacts; it doesn’t work well for inverts; try out modded preset 5 [here](https://i.imgur.com/coxj6Zs.png))\n\n- Remove drums with e.g. demucs\\_ft first, then separate the drumless mixture obtained by inversion\n\n- [Settings](https://i.imgur.com/BmvcUEt.png) for VR Colab\n\n- Kim Vocal 2 (but it has a cutoff and creates a lot of noise in the output)\n\n- Denoise model with 0.1-0.2 aggressiveness\n\n- [Sam Hocking method](#_euyv55qdbx07)\n\n**Bleeding of guitars/winds/synths in vocals**\n\n- BVE (Karaoke) models\n\n**Fixing overlapped/misrecognized stems**\n\n- SpectraLayers 9+ [Cast & Mold](https://download.steinberg.net/downloads_software/SpectraLayers_9/help/Pro/_imprinting.html)\n\n**Fixing low-end rumble**\n\n- Spectral Editing:\n\na) RX Editor’s brush ([video](https://drive.google.com/file/d/118uALJXl_qBLtL3nkftdOxOJydEkDYZQ/view?usp=sharing) by Bas)\n\nb) Audacity ([image](https://imgur.com/a/PI3VJqa)) - “you can, just barely”\n\nPotential alternatives for spectral painting:\n\nFree: [ISSE](https://isse.sourceforge.net/download.html), [Ampter](https://github.com/echometerain/Ampter), [Filter-Artist](https://github.com/HarmoniaLeo/Filter-Artist), [AudioPaint](http://www.nicolasfournel.com/?page_id=125)\n\nPaid: RipX, SpectraLayers, Melodyne, probably Revoice Pro 5\n\n**Cleaning the white noise/sizzle from vocals**\n\n(from e.g. Roformer models)\n\n- MDX23C model (e.g. the latest on MVSEP or HQ in UVR)\n\n\\_\\_\\_\\_\\_\\_\\_\n\n[Debleeding guide by Bas Curtiz](https://docs.google.com/spreadsheets/d/1XIbyHwzTrbs6LbShEO-MeC36Z2scu-7qjLb-NiVt09I/edit?usp=sharing) (other methods, e.g. 
Audacity)\n\n[*Denoising*](#_hyzts95m298o) *and* [*dereverberation*](#_5zlfuhnreff5)[*/apps*](#_70231k4ydkfw) *are covered further below.*\n\n*See also “*[*Vinyl noise/white noise*](#_hyzts95m298o)*” at the end of the list.*\n\n\\_\\_\\_\\_\\_\\_\\_\n\n###### How to check whether a model in UVR5 GUI is vocal or instrumental?\n\n* Read the models list above carefully - the models are categorized\n* If you want to experiment with other models:\n\nThe moment you see \"Instrumental\" on top (and \"Vocal\" below) in the list where GPU conversion is mentioned, you know it's an instrumental model.\n\nWhen the order is flipped, with \"Vocal\" on top, you know it's a vocal model.\n\nThe same applies to MDX and VR archs.\n\n* “Be aware that MDX23C/MDXv3 models can be multisource - it depends on the training, so it can be only vocals, or only instrumental, or vocals+instrumental, or vocals+drums+bass+other (like baseline models are), or whatever else.\n* You can tell by looking at the model's config file, for example InstVocHQ:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/blob/master/models/MDX_Net_Models/model_data/mdx_c_configs/model_2_stem_full_band_8k.yaml>\n\nJudging by the instruments line in that config, the D1581 and InstVocHQ models are instrumental+vocal.\n\nConfig for the rest of the models:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/blob/master/models/MDX_Net_Models/model_data/model_data.json>\n\n([decoded hashes](#_ntgu6se9g0u5))\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### Keeping only **backing vocals** in a song (lead vocal extractor) a.k.a.:\n\n###### **>Karaoke**\n\nYou might want to use a good [vocal](#_n8ac32fhltgg) model as a preprocessor for the models below (if MVSEP/x-minus don’t do it already), but sometimes it can degrade the quality, so experiment both ways.\n\nOptionally, you may also 
[de-reverb](#_5zlfuhnreff5) vocals with a good model/plugin before proceeding, but only if the specific model/method doesn’t filter out a lot of BGVs in your vocals.\n\nAlso, as a preprocessor for an all-vocals model, experiment with different panning settings beforehand using e.g. the A1StereoControl plugin (this might work as a counterpart of the panning trick used on uvronline.app).\n\nCheck here for [backing vocal extraction](#_kkeba46q17rq).\n\n- Mel-Roformer [small\\_karaoke\\_gaboxauf](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/small_karaoke_gaboxaufr.ckpt) by Gabox and Aufr33 | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-small/blob/main/config_melbandroformer_small.yaml) | [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY) | [Metrics](https://mvsep.com/quality_checker/entry/9661)\n\nCompatible with UVR, but not with the RTX 5000 patch (in that case, consider [MSST](#_2y2nycmmf53) for faster separation than with UVR’s non-RTX 5000 patch).\n\n“it sounds much more fuller than other karaoke models I use [anvuew bs roformer, bs karaoke gabox, and becruily karaoke] and clears out bg vocal much more accurately” - baptizedinfear\n\n“Great! 
got to hear vocals I could not hear before.” - makesomenoiseyuh\n\n“it's not great with duets btw” - Gabox\n\nLowering chunk\\_size to 352800 makes it muddier but less noisy; “maybe in between would be nice” - rainboomdash\n\n- Anvuew’s Karaoke BS-Roformer [model](https://huggingface.co/anvuew/karaoke_bs_roformer) | [metrics](https://mvsep.com/quality_checker/entry/9180) | incompatible with the UVR RTX 5000 patch | [MSST](#_2y2nycmmf53) | MVSEP | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(Lead vocals/backing with instrumental)\n\n“extracts lead vocals a bit better than karaoke becruily frazer, and in some parts, the lead vocals from karaoke anvuew still sound brighter compared to karaoke becruily frazer, which sounds a bit more compressed. oh, and for some reason, the becruily frazer model doesn’t detect vocals with radio effects, while anvuew’s model handles them just fine” - neoculture\n\n“lead vocals leak into instrumental (...) Mel Becruily and Frazer’s BS don’t have this problem”\n\nIn that case, maybe “isolate the acapella first in almost all cases of using a karaoke model”, or use the model below instead.\n\n###### - BS-Roformer Karaoke [model](https://huggingface.co/becruily/bs-roformer-karaoke/tree/main) by becruily & frazer | [metrics](https://mvsep.com/quality_checker/entry/9013) | MVSEP | uvronline\n\n(Lead vocals/backing with instrumental)\n\nMake sure you don’t have the option “Vocals only” checked in UVR.\n\n8GB VRAM users of AMD and Intel ARC GPUs need to use a chunk\\_size of 160000, or the separation will be very slow.\n\n“After dozens of tests I can tell this (...) is the best (better harmony detection, better differentiation between LVs and BVs, sounds fuller, less background Roformer bleed, better uncommon panning handling etc) (...) 
I noticed \"lead vocal panning\" works really well” - dca\n\n“It also can detect the double vocals” - black\\_as\\_night\n\nIt works the best for some previously difficult songs. Aufr33 and viperx model seems more consistent, but the new BS is still the best in overall - Musicalman\n\n“My OG Mel also catches some of the FX/drums, I guess quite a difficult one due to how it's mixed” - Becruily\n\n“It does do better on mono than previous, sometimes confuses which voice should be the lead, but all models do that on mono in the exact use-case I normally test” - Dry Paint Dealer Undr\n\n“[In] no way inferior to the ViperX (Play da Segunda)” - fabio5284\n\n“The new karaoke model doesn't actually differentiate between lvs & bvs and there's some lead vocal bleeding in the instrumental stem” - scdxtherevolution\n\nVS the newer BS-Roformer MVSEP team model: “sound isn't as clear, but it does an infinitely better job at telling lead/bgv apart”\n\nBecruily:\n\n“I want to remind something regarding my (and the frazer) models\n\nthey're made to separate true lead vocals, meaning either all of the main singer's vocals, or if it's multiple singers - theirs too\n\nthis means if the main singer has stuff like adlibs on top of the main vocals, these are considered lead vocals too - they go together\n\nif there are multiple singers singing on top of each other, including harmonise each other, and if there are additional background vocals behind those - all the singers will be separated as one main lead vocal, leaving only the true background vocals”\n\n“Noticed it a couple days ago. I've had fantastic results with it so far. Much MUCH better at holding the 'S' & 'T' sounds than the Rofo oke (for backing vox). Generally seems to provide fuller results... but also the typical 'ghost' residue from the main vox can end up in the backing vox sometimes, but it's usually not enough to be an issue. I won't go so far as so say that it's replacing the other backing vox models for me entirely... 
but it feels like the best of both worlds that Rofo and UVR2 provide.” - CC Karaoke\n\n*Tips for the model*\n\n- “I had success by setting the BS Rofo Karaoke model to 100% Right, and then taking the 'Other' result and reprocessing it at 100% Left to get the backing vocals cleanly out.\n\nCurious note on something I've never tried or had to do before but it has worked wonderfully;\n\nI'm isolating the backing vocals on Radiohead's Let Down.” - CC Karaoke\n\n- If you use e.g. vocfv7beta1 as a preprocessor for the model, quieter backing vocals may be picked up better - Rainboom Dash\n\n###### - MVSEP’s BS Roformer by MVSep Team (SDR: 10.41)\n\nUnder the option \"MVSep MelBand Karaoke (lead/back vocals)\", [metrics](https://mvsep.com/quality_checker/entry/9068). Might be a fine-tune. Use the option “extract vocals first”.\n\n(“In contrast with other Karaoke models, it returns 3 stems: \"lead\", \"back\" and \"instrumental\".”)\n\n“If I had to compare it to any of the models, it is similar to the frazer and becruily models. Sometimes it does not detect the lead vocals especially if there's some heavy hard panning, but when it does, there is almost no bleed, and it works very well with heavy harmonies in mono from what I tested.” - smilewasfound\n\n“becruily & frazer is better a little when the main voice is stereo” - daylightgay\n\nLess clarity than the models above - wancitte\n\n“On tracks I tested, harmony preservation was better in becruily & frazer (...) the new model isn't worse, I ended up finding examples like Chan Chan by Buena Vista Social Club or The Way I Are by Timbaland where it is better than the previous kar model. The thing is, with the Kar models, it's just track per track. Difficult to find a model for batch processing as it's really different from one track to another” - dca100fb8\n\nAs for MVSep Team: “It’s the only model that combines the lead vocal doubles with the lead vocals stem. 
It’s far more useful for dissecting harmonies on songs with vocal doubles like Backstreet Boys” - heuheu\n\n“I also found the new model to not keep some BGVs, mainly mono/low octave ones, despite higher SDR” - becruily\n\n“I think I've found a solution for people who don't like the new model.\n\nIf you put an audio file through the karaoke model and then put the lead vocal result through that, it usually picks up doubles.\n\nWhich you can then put in your BGV stem if you'd like” - dynamic64\n\n- MVSep Choir (works e.g. for choirs buried under the main vocal in pop music)\n\n- Ensemble of the frazer & becruily karaoke and anvuew karaoke models (max spec)\n\n- Ensemble of Mel v1e + BS Karaoke MVSep Team with the “extract vocals first” option, Max Spec (using BS 2025.07 as reference and 200/200 for the values)\n\n“It's very aggressive values cuz v1e is noisy, and it works quite well”, the best ensemble for now (dca100fb8)\n\n- Ensemble of BS Roformer Karaoke by anvuew + BS Resurrection Inst (aka \"unwa high instrum fullness\" on mvsep) + phase fix (using BS 2025.07 as reference); older, with more cross-bleeding (dca100fb8)\n\n- \"Ensemble of 3 models \"MVSep + Gabox + frazer/becruily\" gives 10.6 SDR on the leaderboard. 
I didn't upload it yet, but I had local testing.” - ZFTurbo\n\n- Ensemble of ([metrics](https://mvsep.com/quality_checker/entry/9190)):\n\nBS-Roformer Karaoke Frazer & Becruily + BS-Roformer Karaoke Anvuew (avg/avg)\n\n- MVSEP fusion model of Gabox Karaoke and Aufr33/viperx\n(tends to confuse BV/LV more than single models)\n\n- Gonzaluigi Karaoke fusion models ([standard](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/mel_band_karaoke_fusion_standard.ckpt) | [aggressive](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/mel_band_karaoke_fusion_aggressive.ckpt) | [yaml](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/melband_karaokefusion_gonza.yaml)) - also confuse BV/LV more\n\n- MVSep MelBand Karaoke (lead/back vocals) SCNet XL IHF by becruily (SDR: 9.53)\n\nWorse SDR than the top-performing Roformers, but it works best in busy mixes and when Mel-Roformer models fail; generally a bleedier arch. To fix the bleed in the back-instrum stem, use “Extract vocals first”, but “I noticed a pattern that if you hear the lead vocals in the back-instrum track already (SCNet bleed), don't try to use Extract vocals first because there will be even more lead vocal bleed” - dca (iirc it uses the highest-SDR BS-Roformer vocal model as a preprocessor).\n\n“Separates lead vocals better than Mel-Roformer karaoke becruily. 
It's not perfectly clean, sometimes a bit of the backing vocals slips through, but for now, SCNet karaoke model still the most reliable for lead vocals separation (imo)” - neoculture\n\n- (x-minus/uvronline) Lead and Backing vocal separator (in “Extract Backing vocals>Mel-Roformer Lead/Back”)\n\n~~It uses big beta 5e model as preprocessor for becruily Mel Karaoke model~~\n\n“In fact, the big beta 5e model is run after becruily Mel Karaoke” - Aufr33 (so you don’t need the additional step to use this separator). It also offers a lead vocal panning option, like BVE v2 (it's there “to \"tell\" the AI where the main vocals are located (how they are mixed)”. “Doesn't even need Lead vocal panning a lot of the time, [the] ability to recognize what is LV and what is BV [is] impressive” - dca).\n\nIf you don’t have premium: “The new separator is available in the free version, however, due to its resource intensity, only the first minute of the song will be processed.”\n\nThe difference from using the single becruily Kar model is that here “you get the third track, backing vocals.”\n\n- Becruily: “Probably too resource-intensive, but you could try adding demudders to each step\n\n1) Karaoke model + demudding\n\n2) Separate vocals of BGV + demudding\n\nBut not sure how much noise this will bring\n\n(Or even a 50:50 ensemble with BVE OG)”\n\n- Mel-Roformer Karaoke by becruily [model file](https://huggingface.co/becruily/mel-band-roformer-karaoke/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | MVSEP | x-minus\n\n(back/main vox/instrumental - 3 stems)\n\nUse the voc\\_fv4 vocal model before running it (less bleeding than voc\\_fv6), or:\n\n“first extract the vocals with a fullness model [it was Mel becruily vocal back then] and combine the results with a fullness instrumental model.” - becruily\n\n“It's a dual model, 
trained for both vocals and instrumental. It sounds fuller + understands better what is lead and background vocal,\nand to me, it is better than any other karaoke model.\n\nImportant note: This is not a duet or male/female model. If 2 singers are singing simultaneously + background vocals, it will count both singers as lead vocals. The model strictly keeps only actual background vocals. The same goes for \"adlibs\" such as high notes or other overlapping lead vocals.\n\nThe model is not foolproof. Some songs might not sound that much improved compared to others. It's very hard to find a dataset for this kind of task.\n\nCompared to Aufr33’s Mel-Roformer Karaoke model below, it can achieve, e.g., cleaner pronunciation in some songs ([examples](https://discord.com/channels/708579735583588363/708580573697933382/1361694470403260556)).\n\n“Better than Mel Kar, UVR BVE v2, lalal.ai, Dango...”\n\n“For sure better than [the] older Karaoke (aufr's model) for harmonies. Though I can say Dango can remain useful in certain situations” - dca\n\nOn x-minus, a lead vocal panning setting was added for the Mel-RoFormer Kar by becruily model. It’s meant “to \"tell\" the AI where the main vocals are located (how they are mixed).”\n\n“doesn't even need Lead vocal panning a lot of the time, [the] ability to recognize what is LV and what is BV [is] impressive” - dca\n\n“Sometimes struggles when the backing vocals are the same notes as the lead vocals” - isling. 
It seems x-minus panning can’t solve such issues either.\n\n“Had a similar issue as you however with the Chase Atlantic vocal, MDX Kar V2 with stereo 80% then Chain or Max mag extracts the leftovers works very very well not perfect (at least works for most CA song) but It's enough for me to do an edit” - cali\\_tay98\n\n“It seems demudder shouldn't be used when Lead vocal panning is set to something different than center, I noticed it brings back the lead vocals in the inst w/BVs as it was before changing LV panning” - dca\n\n- Mel-Roformer Karaoke (by aufr33 & viperx) on [x-minus.pro](http://x-minus.pro/) / [uvronline.app](http://uvronline.app) / [mvsep](https://mvsep.com/)\n\n[model file](https://mega.nz/file/qQA1XTrb#LUNCfUMUwg4m4LZeicQwq_VdKSq9IQN34l0E1bb0fz4) (UVR [instruction](#_6y2plb943p9v)) - the online version above might work better (its preprocessor model is unknown; maybe even the old voc\\_ft, but not necessarily).\n\nThis Mel model may extract more than *BVE V2*. If \"extract directly from mixture\" on MVSEP doesn’t detect the BVs (x-minus behavior for this model), chances are \"extract from vocals part\" on MVSEP (which uses BS-Roformer 2024.08 for it) will detect more BVs (although with possible cross-bleed between lead/back in inst+bv).\n\n- If you ensemble the model above with Unwa v1e (Max Spec), it removes all the muddiness of Mel Kar (dca100fb8)\n\n\"You can do it via:\n\nChoose Process Method --> Audio Tools --> Manual Ensemble --> Max Spec\n\nQ: How do you select both inputs?\n\nA: Via Select Input, but if you want to batch process Manual Ensemble it's not possible yet\" (dca100fb8)\n\nEnsemble algorithms like ~“min\\_spec in the direct ensemble are only available if the selected type of stems in the yaml corresponds with the target in UVR.\n\nExample:\n\nIf the target in the yaml is:\n\n-vocals\n\n-other\n\nthen you can't use it in the Vocal/Instrumental selection because it's not written that way in the yaml.” (mesk)\n\n- Gabox Mel 
[KaraokeGabox](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/kar_gabox.ckpt) model (uses Aufr’s [config](https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/config_mel_band_roformer_karaoke.yaml)) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet) - a fine-tune of the becruily model\n\n“The lead vocals are good and clean!\nWhile the backing tracks are lossy for this model, [it still] provide[s] great convenien[ce] for those who need LdV”\n\n“The model doesn't keep the backing vocals below the main vocals, sometimes the backing vocals will be lost even though there are backing vocals there.”\n\n- BVE v2 model on x-minus.pro/uvronline.app for premium users | [model](https://mega.nz/file/XZpXyJAD#if8wRRDxHZ0T-HiH8ZRLhXUloNIm87kpGKrBRMlHoq8) (uses the “4band\\_v4\\_ms\\_fullband” stock [config](https://drive.google.com/file/d/1aGjDkPhqPLLlOKXeIfWDw09LElHMJfos/view?usp=sharing)) by Aufr33\n\nPlace the model file in Ultimate Vocal Remover\\models\\VR\\_Models and the config file in lib\\_v5\\vr\\_network\\modelparams (if it isn’t there already). Then pick “4band\\_v4\\_ms\\_fullband.json” and BV when asked to recognize the model (it has the same checksum as the one in the modelparams folder if it’s there already). It seems to work with “VR 5.1 model” checked (and probably without it too).\n\n“Note that this model should be used with a rebalanced mix.\n\nThe recommended music level is no more than 25% or -12 dB.\n\nIf you use this model in your project, please credit me.”\n(Its v1 version is also available in UVR’s “Download More Models”, but without the stereo width feature, which can fix some issues when BVs are confused with other vocals.)\n\nOn x-minus, “When you select stereo, it applies a stereo narrower before AI processing.”\n\nIt used to be one of the best models for this purpose. 
At a certain point, x-minus already used voc\\_ft as a preprocessor for all vocals (not sure if that has changed).\n\n\"BVE sounds good for now but being an (u)vr model the vocals are soft (it doesn’t extract hard sounds like K, T, S etc. very well)\"\n\n\"Seems to begin a phrase with a bit of confusion between lead and backing, but then kicks in with better separation later in the phrase.\"\n\n“If something struggles to separate on bve v2 I change the lv panning option to either 50 or 80% [stereo or center], and it separates it amazingly.\n\nIt even allows me to separate backing backing vocals from backing vocals”\n\n- For LV bleed in UVR BVE v2’s Song without LV, download the BV track and add it to the inst v1e model result: no vocal bleed/residues (introC).\n\n“I tried it ensembled with Gabox's model [kar\\_gabox] and they are amazing together. Yes you have to make the primary stem of aufr33's model \"Instrumental\" if you're ensembling” - AG89\n\n- Newer Gabox experimental [Karaoke model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/karaoke) (June 2025). It has a single target stem, so keep extract\\_instrumental enabled to get the remaining stem.\n\n“really hard to tell the difference between this and becruily's karaoke model”, except that the latter has more target stems.\n\n- Chain ensemble mode for B.V. 
models (available on [x-minus.pro](http://x-minus.pro) for premium users, and already added in the 9/15 UVR 5.5 beta [patch](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/UVR_Patch_9_15_23_6_35_BETA.exe)):\n\nIt is possible to recreate this approach with non-BVE v2 models in UVR by processing the output of one Karaoke model with another (possibly a VR model as the latter) using the Settings>Additional Settings>Vocal Split Mode option (it separates all vocals with the main model, then uses the result as input for the next model).\n\nSo you might experiment with voc\\_ft, Kim Vocal 2, or 1296 as the main vocal model in the main UVR window and HP5, HP6, or a BVE model in Vocal Split Mode, so you won’t have to do the process manually in two steps (separating the result with another model once the first separation is done). Vocal Split Mode was designed mainly for BVE models, though, so in case of any problems with HP5/6 or Karaoke models, you can also test Settings>Choose Advanced Menu>[model arch]>Secondary model instead.\n\nDon't forget to read the [vocals](#_n8ac32fhltgg) section to find the best separation method for your song, and use it before separating with Karaoke/BVE models.\n\nRecommended ensemble settings for Karaoke in UVR 5 GUI (instrumentals with backing vocals):\n\n- 5\\_HP-Karaoke-UVR, 6\\_HP-Karaoke-UVR, UVR-MDX-NET Karaoke 2 (Max Spec)\n\n(in e.g. “min/max”, the latter is for the instrumental)\n\n- Alternatively, use Manual Ensemble in UVR with Max Spec on x-minus’ [UVR BVE v2](#_4bbgbbg4mfqq) result and the UVR ensemble result from above.\n\nOr a single model:\n\n- HP\\_KAROKEE-MSB2-3BAND-3090 (a.k.a. VR's 6HP-Karaoke-UVR)\n\n- UVR BVE v2 on x-minus (and download \"Song without L.V.\". 
Better solution, newer, different model)\n\n- 5HP can sometimes be better than 6HP\n\n([UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui) / [x-minus.pro](https://x-minus.pro/) / [Colab](https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/main/Vocal%20Remover%205_arch.ipynb#scrollTo=CT8TuXWLBrXF)) - you might want to use Kim Vocal 2, voc\\_ft, 1296, or MDX23C first for better results.\n\n- UVR-BVE-4B\\_SN-44100-1\n\nQ: What are the differences between Mel-Roformer Karaoke and the last model?\n\nA: If the vocals don't contain harmonies, this model (Mel) is better. In other cases, it is better to use the MDX+UVR Chain ensemble for now.\n\n- Gabox denoise/debleed Mel-Roformer | [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/instrumental/denoisedebleed.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet)\n\n“better results than kar v2”\n\n- Gabox kar v2 | [model](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Karaoke_GaboxV2.ckpt) | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/karaokegabox_1750911344.yaml)\n\n- De-echo VR model in UVR5 GUI set to maximum aggression\n\n- [MedleyVox](#_s4sjh68fo1sw) with our trained model (more coherent results than current BV models)\n\nOr an ensemble in UVR:\n\n\"The karaoke ensemble works best with isolated vocals rather than the full track itself\"\n\n- VR Arc: 6HP-Karaoke-UVR\n\n- MDX-Net: UVR-MDX-NET Karaoke 2\n\n- Demucs: v4 | htdemucs\\_ft\n\nOr:\n\n- VR Arc: 5HP-Karaoke-UVR\n\n- VR Arc: 6HP-Karaoke-UVR\n\n- MDX-Net: UVR-MDX-NET Karaoke 2\n\n(Max Spec, aggression 0, high-end process)\n\nOr:\n\n- VR Arc: 5\\_HP-Karaoke\n\n- MDX-Net: UVR-MDX Karaoke 1\n\n- MDX-Net: UVR-MDX Karaoke 2\n\n(you 
might want to turn off high-end process and post-process)\n\nOr:\n\n- VR Arc: 5HP-Karaoke-UVR\n\n- VR Arc: 6HP-Karaoke-UVR\n\n- MDX-Net: UVR-MDX-NET Karaoke 1\n\n- MDX-Net: UVR-MDX-NET Karaoke 2\n\n(Min/Min Spec, Window Size 512, Aggression 100, TTA On)\n\nIf your main vocals are confused with backing vocals, use X-Minus and set \"Lead vocal placement\" to center (not in UVR5 at the moment).\n\nOr use [Mateus Contini's](#_79cxg1a64b11) method.\n\n[How to extract backing vocals X-Minus Guide](https://x-minus.pro/page/bv-isolation) (can be executed in UVR5 as well)\n\nVinctekan Q&A\n\nQ: Which BVE aggression setting (for a VR model, e.g. uvr-bve-4b-sn-44100-1) is good for backing vocal removal?\n\nA: “I recommend starting exactly from 0 and working from there to either - or +.\n\n0 is the baseline for BVs that are almost perfectly centered.\n\nIf it's off to the left or right a little bit, I would start from 50”\n\nQ: How do I tell which side BVs are panned to, or if they are Stereo 50% or 80%, without extracting them?\n\nA: “It's more about listening to the track. The way I used to do it is to invert the left channel with the right channel. In most cases this should only leave the reverb of the vocals in place, but if there are backing vocals panned either left or right, they should be a bit louder than the reverb. Audacity's [Vocal Reduction and Isolation>Analyze] feature usually can give a rough estimate as to how related the two channels are, but that does not tell where the backing vocal actually is. I would only recommend doing the above with a vocal output, though.”\n\nQ: Does anyone know how to tell which side BVs (backing vocals) are panned to, similar to [this](https://cdn.discordapp.com/attachments/900904142669754399/1170201735722188810/bve.png)? Like, is there a way to tell using RipX? Or another tool. 
In my case I think mine might be Stereo 20-30 percent or lower\n\nA: “Your ears [probably the least effective]\n\nIf you have Audacity, select your entire track, choose [Vocal Reduction and Isolation], and select [Analyze], but it won't tell you which direction the panning is in.\n\nOr use it to isolate the sides, and just take a look at the output levels of each channel.\n\nSpectraLayers' [[Unmix>Multichannel Content](https://www.youtube.com/watch?v=Jeg2RzOvS7I)] tab can measure the output of frequencies in the spectrogram and can tell you when certain elements are not equal in loudness, which you can restore.”\n\n- Dango.ai also has a good BVE model (expensive); at least sometimes it gives better results than UVR BVE v2 for getting songs without lead vocals. It’s “meant for separating melody from harmony, not separating singer from singer, so you'll hear both [singers in one] stem” if two singers are present.\n\nLater, a new Advanced repair tool was added to “fix any backing-vocals errors”.\n\n- AudiosourceRE Demix Pro has a BVE/lead vocals model\n\n- lalal.ai has a new, decent lead and backing vocals model\n\nIf BVE or the Mel Karaoke model CAN do it, they'll do it better, but if they CAN'T, lalal will do it better. \"I have seen lalal work better on mono audio than bve model.\" ~Isling\n\ndca100fb8: “For Instrumentals with Backing Vocals:\n\nIf Mel Kar doesn't work, it's likely Dango Backing Vocal Keeper will not work either, although it’s not always the case, and still worth trying out - separating left and right channel with Dango Backing Vocal Keeper once fixed the issue.\n\nFor back vocals/lead vocals model\n\nIf neither Mel Kar nor UVR BVE v2 works, it's likely lalal.ai Lead & Back Vocal Splitter will work instead. 
Its Deep Extraction seems to provide better results than Clear Cut”\n\n##### - Advanced chain processing chart ([image](https://lh3.googleusercontent.com/pw/AP1GczNwiJ8BSJdjCWA4Z0D7CHevRfEbOqbf4uBFZSYFEWIxoSm3tUfJVUrlTMJOolR9wD_IxNZgzf7Efe7Nh58-nAuzrZmoAOwXE-FgDFWztWCmMZcJ0ResQZCjP4PMG26BpWSSMhQO4CEiM4XGplqKggoY_mVTqXaT14EBXpweZ95Dy8SJmI69Wf3usD4pyl0E2zJyMxyWZ5MYKs3uz6eenqpF98BYowhl0Qvq55xLZEqfeUsnbhouJPctM792NzghD81lLh1gxNU5sLpyS_c9y79ZOAvPSnXpHp1vFPoqrPbYhYQ1E70HtxuPb7UOSyptwB4pnQVfuJ_JRZzLWq_GDbdWa2uBHznGLFOLnwrtwlH6Kewd2hU8sE9oJzOhPCBhWoY52bDsLqjSFct7AVfFBWUpNAUQpitAMr4pEAYIjc6EaSMyYgR_Cf8Y05htRoNmOqxn08kynl7xlXtQ-5duX1VcZj6cPZ-QSbaH6W-CyBrbPjfCsDymb4V5Yl53W1oVY5ZRQWzfkfM3_KrmP1RQVimzvAq36Vwv9IwjqL_PR9AcS6HHYukQ88ZwtUzuSo7BnzuuslRFVPMM0NxzbwkPP-MWrNzlPGdMCm8VsLnmIcEcq8CRw9CRFvWzII8LsgLDUcpBeo00BBZxRQ8P0mMtQ3OjQ0opjqb4v7OJqoteW-rTPDaIuvcfgu6udWDDSUJgYuHBHOYB3n42ASD0lShDA3yORjfgPNvkQpDIJ5ZOqlbMOZz2dlOjhzq69glN8dmEx2Kn1z2h9NZo7nhrrt8QWaHnHpGT_OxroAwxHscd6lodzSQKtS86zExbCpW3PrmslkQCeXnKL-LMd9Mmr_xN5tO_zfeEuaVhCBTwzJmbxEqg5yxpgQ1klsIzxeCiqIRzJ96i1A5Tv_BjjTGeHlMjyBbOl_iCT7TSbtP9krhYs0BymO_02BE0q1r95rCOlFHyyj1Fbnh38Zt9hBHEPYOYqsTYHvwhJIjmb2M89xOr2KA9qyybrSa5vbZx4X1y91cSxQ03%3Dw1345-h941-s-no-gm?authuser=0))\n\nIt’s a method that uses old models; e.g., Kim Vocal 2 can potentially be replaced by unwa’s BS/Mel-Roformer models in [beta UVR](#_6y2plb943p9v) (or another good method for [vocals](#_n8ac32fhltgg)) or by ensembles mentioned in this document. Check the best current methods for getting all vocals in one stem to find what works best for your song before splitting them into other stems using this diagram.\n\nhtdemucs v4 above can be replaced by htdemucs\\_ft, as it's the fine-tuned version of the model (or by [MDX23 Colab](#_jmb1yj7x3kj7)). 
Even better, you can use some of the methods for [4 stems](#_sjf0vefmplt) in this GDoc (like drums on x-minus).\n\nDe-echo and reverb models can potentially be replaced by some better paid plugins, like:\n\nDeVerberate by Acon Digital, Accentize DeRoom Pro (more in the [de-reverb](#_5zlfuhnreff5) section).\n\nUVR Denoise can potentially be replaced by the less aggressive Aufr33 model on x-minus.pro (used when aggressiveness is set to minimum), and there’s also a newer Mel-Roformer model (read the [de-reverb](#_5zlfuhnreff5) section).\n\nAs for [Karaoke](#_h110k6ouf88c) models, there's e.g. a Mel-Roformer model on x-minus.pro for premium users, on MVSEP, or in the jarredou inference [Colab](#_wbc0pja7faof).\n\n\"If the vocals don't contain harmonies, this model (Mel) is better. In other cases, it is better to use the MDX+UVR Chain ensemble for now.\" This approach can be recreated to some extent without BVE v2 models by processing the output of the main vocal model with one of the Karaoke/BVE models in UVR (possibly a VR model as the latter) using Settings>Additional Settings>Vocal Splitter Options: it separates with one model, then uses the result as input for the next model (see the Karaoke section).\n\nMedleyVox (not available in UVR) can be useful as a last resort when everything else fails, once you have all vocals in one stem, since its output is very narrowband. 
But you can use AudioSR on it afterwards.\n\n###### >Keeping only **lead vocals** in a song\n\nDepending on the song, the same model might work better for lead vocals or for backing vocals.\n\nExtracting vocals with a good model first can sometimes degrade the quality, so experiment both ways, but “if they are really quiet in the mix, sometimes they first need to be extracted to come out clear, but lead vocals usually aren't that way… and you can easily confuse the model by extracting them first” - rainboomdash\n\n- Anvuew’s Karaoke BS-Roformer [model](https://huggingface.co/anvuew/karaoke_bs_roformer) | [metrics](https://mvsep.com/quality_checker/entry/9180) | incompatible with UVR RTX 5000 patch | [MSST](#_2y2nycmmf53) | MVSEP | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(stems: lead vocals / backing vocals with instrumental)\n“this is the best for extracting lead vocals, currently” - rainboomdash\n\n“extracts lead vocals a bit better than karaoke becruily frazer, and in some parts, the lead vocals from karaoke anvuew still sound brighter compared to karaoke becruily frazer, which sounds a bit more compressed. oh, and for some reason, the becruily frazer model doesn’t detect vocals with radio effects, while anvuew’s model handles them just fine” - neoculture\n\n“lead vocals leak into instrumental (...) 
Mel Becruily and Frazer’s BS don’t have this problem”\n\nIn that case, maybe “isolate the acapella first in almost all cases of using a karaoke model” or use the model below instead.\n\n- Ensemble of becruily frazer karaoke + anvuew karaoke (Max Spec) - to get the backing vocals, use instrumental only in UVR - neoculture\n\nBe aware that if one model in the ensemble does a bad job, it will spoil the result.\n\n- (x-minus/uvronline) Lead and Backing vocal separator (in “Extract Backing vocals>Mel-Roformer Lead/Back”)\n\n~~It uses big beta 5e model as preprocessor for becruily Mel Karaoke model~~\n\n“In fact, the big beta 5e model is run after becruily Mel Karaoke” Aufr33 (so you don’t need the additional step to use this separator). It also offers a lead vocal panning option like BVE v2 (it’s meant “to \"tell\" the AI where the main vocals are located (how they are mixed).” “Doesn't even need Lead vocal panning a lot of the time, [the] ability to recognize what is LV and what is BV [is] impressive” - dca).\n\nWithout premium: “The new separator is available in the free version, however, due to its resource intensity, only the first minute of the song will be processed.”\n\nThe difference from using the single becruily Kar model is that here “you get the third track, backing vocals.”\n\n- Mel-Roformer [small\\_karaoke\\_gaboxaufr](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/small_karaoke_gaboxaufr.ckpt) by Gabox and Aufr33 | [yaml](https://huggingface.co/pcunwa/Mel-Band-Roformer-small/blob/main/config_melbandroformer_small.yaml) | [Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY)\n\nCompatible with UVR, but not with the RTX 5000 patch (in that case, consider [MSST](#_2y2nycmmf53) for faster separation than UVR’s non-RTX 5000 version). 
[Metrics](https://mvsep.com/quality_checker/entry/9661).\n\n“it sounds much more fuller than other karaoke models I use [anvuew bs roformer, bs karaoke gabox, and becruily karaoke] and clears out bg vocal much more accurately” - baptizedinfear\n\n“great! got to hear vocals I could not hear before.” - makesomenoiseyuh\n\n“it's not great with duets btw” - Gabox\n\nLowering chunk\\_size to 352800 (8 seconds at 44.1 kHz) makes it muddier but less noisy; “maybe in between would be nice” - rainboomdash\n\n- Mel-Roformer Karaoke by becruily [model file](https://huggingface.co/becruily/mel-band-roformer-karaoke/tree/main) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | MVSEP | x-minus\n\n“It's a dual model, trained for both vocals and instrumental. It sounds fuller + understands better what is lead and background vocal”\n\n- Gabox Mel [KaraokeGabox](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/experimental/kar_gabox.ckpt) model (uses Aufr’s [config](https://github.com/deton24/Colab-for-new-MDX_UVR_models/releases/download/v1.0.0/config_mel_band_roformer_karaoke.yaml)) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet)\n\n“The lead vocals are good and clean!\nWhile the backing tracks are lossy for this model, [it still] provide[s] great convenien[ce] for those who need LdV”\n\n“The model doesn't keep the backing vocals below the main vocals, sometimes the backing vocals will be lost even though there are backing vocals there.”\n\n- Newer Gabox experimental [Karaoke model](https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/karaoke) (June 2025). 
It has a single target stem, so keep extract\\_instrumental enabled to get the remaining stem.\n\n“really hard to tell the difference between this and becruily's karaoke model”, except that the latter has more target stems.\n\n- BVE v2 model on x-minus.pro/uvronline.app for premium users | [model](https://mega.nz/file/XZpXyJAD#if8wRRDxHZ0T-HiH8ZRLhXUloNIm87kpGKrBRMlHoq8) (“4band\\_v4\\_ms\\_fullband” stock config) by Aufr33\n\nPlace the model file in Ultimate Vocal Remover\\models\\VR\\_Models and the [config](https://drive.google.com/file/d/1aGjDkPhqPLLlOKXeIfWDw09LElHMJfos/view?usp=sharing) file in lib\\_v5\\vr\\_network\\modelparams. Then pick “4band\\_v4\\_ms\\_fullband.json” and BV when asked to recognize the model (it has the same checksum as the one in the modelparams folder if it’s there already). Also, I think it's not a VR 5.1 model.\n\n“Note that this model should be used with a rebalanced mix.\n\nThe recommended music level is no more than 25% or -12 dB.\n\nIf you use this model in your project, please credit me.”\n(Its v1 version is also available in UVR’s “Download More Models”, but without the stereo width feature, which can fix some issues when BVs are confused with other vocals.)\n\nOn x-minus, “When you select stereo, it applies a stereo narrower before AI processing.”\n\n(Sometimes it might work for lead vocals, sometimes for backing vocals.)\n\nIt used to be one of the best models for this. On x-minus, it used voc\\_ft as a preprocessor for all vocals.\n\n\"Seems to begin a phrase with a bit of confusion between lead and backing, but then kicks in with better separation later in the phrase.\"\n\n“If something struggles to separate on bve v2 I change the lv panning option to either 50 or 80% [stereo or center], and it separates it amazingly.\n\nIt even allows me to separate backing backing vocals from backing vocals”\n\n\"BVE sounds good for now but being an (U)VR model the vocals are soft (it doesn’t extract hard sounds like K, T, S etc. 
very well)\"\n\n- For LV bleed in UVR BVE v2’s Song without LV, download the BV track and add it to the inst v1e model result: no vocal bleed/residues (introC).\n\n- Mel-Roformer Karaoke (by aufr33 & viperx) on [x-minus.pro](http://x-minus.pro/) / [uvronline.app](http://uvronline.app) / [mvsep](https://mvsep.com/)\n\n[model file](https://mega.nz/file/qQA1XTrb#LUNCfUMUwg4m4LZeicQwq_VdKSq9IQN34l0E1bb0fz4) (UVR [instruction](#_6y2plb943p9v)) - the online version above might work better.\n\nIt may extract more than *BVE V2*. If \"extract directly from mixture\" on MVSEP doesn’t detect the BVs (x-minus behavior for this model), chances are \"extract from vocals part\" on MVSEP (which uses BS-Roformer 2024.08 for it) will detect more BVs (although with possible cross-bleed between lead/back in inst+bv).\n\n- anvuew [dereverb](#_5zlfuhnreff5) Mel-Roformer model v2 (“on some songs I tried it worked better than karaoke models”); not for duets; it debleeds too; “cleaner backing vocals [than the below], can sometimes mistake main vocals/delay for backing vocals more often than [the below]”\n\n- karokee\\_4band\\_v2\\_sn a.k.a. UVR-MDX NET Karaoke 2 ([MVSEP](https://mvsep.com/) [MDX B (Karaoke)] / [Colab](https://colab.research.google.com/drive/1CO3KRvcFc1EuRh7YJea6DtMM6Tj8NHoB) / [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui) / [x-minus.pro](https://x-minus.pro/)) - “best for keeping lead vocal detail” on its own, “cleaner main vocals, has significantly less bleeding than the MVSep counterpart”; it removes backing vocals from a track, but with min\\_mag\\_k it can return results similar to:\n\n- Demix Pro (paid): “keeps more backing vocals [than Karaoke 2] (and somehow the lead vocals are also better most of the time, with fuller sound)”\n\n“Demix is better for keeping background vocals yes, but for the lead ones they tend to sound weaker (the spectrum isn’t as full and has more holes than karaoke 2, but this isn’t always a bad thing because the lead vocals themselves are cleaner), the mdx karaoke 2 might produce fuller lead vocals, but you will most certainly have some background vocals left too”\n\n- Fusion model of Gabox Karaoke and Aufr33/viperx model on MVSEP\n(tends to confuse BV/LV more than single models)\n\n- Gonzaluigi Karaoke fusion models ([standard](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/mel_band_karaoke_fusion_standard.ckpt) | [aggressive](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/mel_band_karaoke_fusion_aggressive.ckpt) | 
[yaml](https://huggingface.co/Gonzaluigi/Mel-Band-Karaoke-Fusion/resolve/main/melband_karaokefusion_gonza.yaml)) - also confuse BV/LV more\n\n- MDX B Karaoke on mvsep.com (exclusive) - good, but as an alternative you could use MDX Karaoke 2 in UVR 5 (they are different)\n\n“I personally wouldn't recommend 5/6\\_hp karaoke, except for using 5\\_hp karaoke as a last resort, you could also use the x minus bve model in uvr which sometimes is good with lead vocals”\n\n- UVR-BVE-4B\\_SN-44100-1\n\n- [Center extraction](#_3c6n9m7vjxul) model\n\n- Melodyne [guide](https://github.com/junh1024/junh1024-Documents/blob/master/Audio/Melodyne%20Quickstart.md#introduction)\n\n- RipX\n\n(doesn’t work for everyone)\n\n- MDX-UVR Inst HQ\\_3 - new, best at removing background vocals from a song (e.g. from Kim Vocal 2)\n\nOr process with consecutive models:\n\n- Vocals (good vocal stem from e.g. voc\\_ft, 1296, or MDX23C single models, or ensembles of MDX23C / MDX23 2.2 / UVR top/near top SDR / Ensemble of only vocal models: Kim 1, 2, voc\\_ft, MDX23C\\_D1581, optionally with demucs\\_ft)\n\n>The vocal result separated with->\n\nKaraoke model -> Lead\\_Voc & Backing\\_Voc\n\n[Tutorial](https://youtu.be/l43CRVgrv4E)\n\n(+ *experimentally split stereo channels and separate them on their own, then join channels back*)\n\n- [arigato78 method](#_vktvthhthrvh)\n\n“Karaoke 2 really won't pick up any chorus lead vocals EXCEPT for ad-libs\n\n6-HP will pick up the melody, although it's usually muffled as hell”\n\n“Q: is mdx karaoke 2 still the best for lead and back vocals' separation?\n\nA: I'm finding it's the best for \"fullness\" but 6-HP picks up chorus melody while K2 only usually picks up ad-libs\n\nI personally like mixing K2, 6-HP (and sometimes 5-HP if 6-HP sounds very thin) together\n\nalso, let's say a verse has back vocals that are just the melody behind the lead vocal (instead of harmonies) for a doubling effect, sometimes K2 will still pick up both the lead and double.”\n\nAG89 
avg ensemble:\n\nUVR-5-1\\_4band\\_v4\\_ms\\_fullband\\_BVE\\_V2,\n\nKaraoke\\_GaboxV2,\n\nmel-band\\_karaoke\\_fusion\\_standard\n\n###### **Harmonies**\n\nE.g., for separating more layers from the results above.\n\nOn MVSEP (or on your own), “'Use As Is' [so the mixture - your input file will be used] and 'Extract Vocals First' [so using vocal model] might be the difference between splitting a vocalist's 'double vocal' and not.” - cypha\\_sarin.\n\nAlso, as a preprocessor for all vocals, experiment with panning using e.g. the A1StereoControl VST2 32-bit plugin beforehand (it might work as a counterpart of the panning trick on UVRonline) - dementia2009.\n\n- becruily & frazer BS-Roformer Karaoke [model](https://huggingface.co/becruily/bs-roformer-karaoke/tree/main) (in most cases, not using a vocal model first is better for doubles, but sometimes “it's necessary if the vocal was very, very quiet, otherwise it's extremely muddy” - rainboomdash; or check [more](#_vg1wnx1dc4g0) Karaoke models)\n\n- Stereo width feature for UVR BVE v2 set to 80% (it might already use voc\\_ft as a preprocessor) - on [x-minus.pro/uvronline.app](http://x-minus.pro/uvronline.app)\n\n- MVSep SATB Choir (soprano, alto, tenor, bass; much better metrics than the old SATB below, and better BS-Roformer model architecture; “[using] a karaoke model first and then using the satb model on the backing vocals, it tends to have less confusion and more opportunities for manual cleanup” - dynamic64).\n\n- MVSep Choir (works, e.g., for choirs buried under the main vocal in pop music)\n\n- Older [SATB](https://drive.google.com/drive/folders/1BpPgtlDk0yqrlArmrq9vnYErixb8I8zJ) choir models by Dry Paint Dealer Undr (soprano, alto, tenor, and bass - vocal harmonies) “I think it’s a great model for anyone looking to study harmonies more closely. Better than the Medley Vox model. 
So far, it’s the best result I’ve found” - Roberto89.\n\nNevertheless, it can have bleeding.\n\n“scnet\\_masked model was the best in the end” - Dry Paint Dealer Undr\n\n“I've tried the SCNet one, it's really noisy, and it has a lot of bleed, it kinda works. I can see the potential of this kind of model ngl.” - smilewasfound\n\n“You can’t install [the VR ones] into UVR since that only supports VR v5 [and 5.1] not [VR v6](https://github.com/tsurumeso/vocal-remover/releases)”\n\nFor the VR6 bass and soprano models, the -X flag is needed.\n\nDemucs and SCNet\\_masked are not compatible with UVR for now either (use [MSST](#_2y2nycmmf53) instead), so only the regular SCNet variant can be used in UVR.\n\n- [Medley Vox](https://github.com/jeonchangbin49/medleyvox) (free, 24kHz SR model trained by Cyrus, [Colab](https://colab.research.google.com/drive/10x8mkZmpqiu-oKAd8oBv_GSnZNKfa8r2?usp=sharing), local installation [tutorial](https://youtu.be/VbM4qp0VP80), more [info](#_s4sjh68fo1sw), use e.g. [AudioSR](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07) afterwards; “help[ed] me separate a few layers harmonies on one song, though I had to keep mixing certain ones back together and run them through the same model to get a cleaner result”)\n\n- Sucial Mel-Roformer dereverb/echo model #3 called “fused”: [model](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt) | [yaml](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/config_dereverb_echo_mbr_v2.yaml)\n\n- [Melodyne](https://www.celemony.com/en/melodyne/what-is-melodyne) (paid, 30-day trial) - “the best way to ensure it’s the correct voice”, or\n\n- Hit 'n' Mix [RipX](#_1bm9wmdv6hpf) DAW Pro 7 (paid/trial)\n\nIn Melodyne it is \"harder to do but can be cleaner since you can more easily deal with the incorrect harmonics than RipX sometimes chooses\"\n\n\"every time I’d run a 
song through RipX I was only able to separate 4-5\" harmonies\n\n(or also)\n\n- (probably) Revoice Pro 5\n\n- Mel-Roformer Karaoke (by becruily) [model file](https://huggingface.co/becruily/mel-band-roformer-karaoke/tree/main) (better than Aufr33’s model)\n\n- Dango.ai (paid, might still be useful in certain situations)\n\n- Choral Quartets F0 Extractor - “midi outputs, but it works”\n\nFor research:\n\n<https://c4dm.eecs.qmul.ac.uk/ChoralSep/>\n\n<https://c4dm.eecs.qmul.ac.uk/EnsembleSet/> (similar results to MedleyVox)\n\n###### **For two singers in a duet from one song**\n\n(use on already separated [vocals](#_n8ac32fhltgg))\n\n- MVSEP SATB (in case of cross-bleeding, “when i put them in the DAW and inverted it gave really satisfactory results”; worked for two main male vocals where the models below failed - lekt0rs)\n\n- becruily & frazer BS Karaoke (can sometimes separate even 3 singers if the vocals aren't completely glued into one, especially if it's male and female - maxerv19,\n\n\"It's getting better than MedleyVox\" - ryanz48)\n\n- Becruily Mel Karaoke\n\n- [**MedleyVox**](#_s4sjh68fo1sw) (trained by Cyrus, 24kHz SR - use e.g. [AudioSR](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07) afterwards, MVSEP, [Colab](https://colab.research.google.com/drive/10x8mkZmpqiu-oKAd8oBv_GSnZNKfa8r2?usp=sharing), local installation [tutorial](https://youtu.be/VbM4qp0VP80) [use vocals 238 model], more [info](#_s4sjh68fo1sw))\n\n“best at it, but not guaranteed to work always. 
5% chances of perfectly separating duets on MedleyVox, or else it always false detects and switches back and forth” “also does a pretty good job on other solo instruments”\n\n- MVSEP Multispeaker model (Experimental section at the bottom)\nWorks well for rap overlapped with singing within an already separated vocal stem.\n\n“Seems very picky with audio, most of the songs/files I tried didn't work\n\nMedleyVox works on the other side (regardless that it's of lower quality)” - becruily\n\n- MVSEP Male/Female:\n\na) Mel-Roformer Male/Female separation model 13.03 SDR\n\nb) SCNet XL Male/Female separation model on MVSEP (same model base)\n\nSDR on the same dataset: 11.8346 vs 6.5259 (Sucial)\n\nThe old Sucial model might still do a better job at times, so feel free to experiment.\n\n- Aufr33 BS-Roformer Male/Female beta [model](https://mega.nz/file/XZwV2QwB#5nvWpmvtoBMTJkpor-lMUZCbBZWDH-3i52ELJS_JmcU) | [config](https://huggingface.co/Sucial/Chorus_Male_Female_BS_Roformer/blob/main/config_chorus_male_female_bs_roformer.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | x-minus (uses Kim-Mel-Band-Roformer-FT2 as a preprocessor) | MVSEP\n\n(based on BS-RoFormer Chorus Male Female by Sucial) SDR 8.18\n\n- Male/female BS-Roformer [model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2525052333) by Sucial | [config](https://drive.google.com/file/d/15dxMvEanC8h_djEuHoQXHKMk0ERN02_y/view?usp=sharing) for [UVR](#_6y2plb943p9v) | tensor match error [fix](https://github.com/nomadkaraoke/python-audio-separator/releases/download/model-configs/deverb_bs_roformer_8_384dim_10depth_config.yaml)\nIf they sing at intervals [one by one, not together], they cannot be separated. 
| MVSEP\n\n- Mel-Roformer Karaoke on x-minus.pro (model files in [Karaoke](#_vg1wnx1dc4g0))\n\n- MDX-UVR Karaoke models\n\n- VR's 5\\_HP or alternatively 6\\_HP in UVR\n\n- BVE v2 on x-minus (already uses voc\\_ft as a preprocessor for separating vocals)\n\nIf that's still not enough, continue and/or look for a Dolby Atmos [rip](#_ueeiwv6i39ca) and retry. It works for several backing vocals when lead vocal panning is set to center, “then running the bgv through bve v2 again, but this time set lead vocal panning to 80% but be aware the lead vocal quality will not be that good with this model” - Isling\n\n- Dry Paint Dealer Undr’s Melband Roformer and Demucs Lead and Rhythm guitar [models](https://drive.google.com/drive/folders/1JH2tjhgDcJgvdi-hrQT82RsaV2XhE8hn?usp=sharing)\n\n- [duet-svs-diffusion](https://github.com/iamycy/duet-svs-diffusion) (“mono/16kHz, 24kHz sample rate, and quality is lower than MedleyVox models”)\n\n- RipX (paid)\n\n- Melodyne (paid, “with polyphonic mode, with a lot of manual finetuning in the detection tab, and this can only work if the voices are not on same pitch”).\n\n- SpectraLayers 11 (but it’s mainly intended for spoken voice, not singing)\n\n*Spectral painting:*\n\n- [ISSE](https://isse.sourceforge.net/) (free, you can figure out which voice is whose just by frequencies alone; use on e.g. 
separated vocals too)\n\n- RX Editor’s brush ([video](https://drive.google.com/file/d/118uALJXl_qBLtL3nkftdOxOJydEkDYZQ/view?usp=sharing) by Bas)\n\n- Audacity ([image](https://imgur.com/a/PI3VJqa)), less effective\n\n- [Ampter](https://github.com/echometerain/Ampter)\n\n- [Filter-Artist](https://github.com/HarmoniaLeo/Filter-Artist)\n\n- [AudioPaint](http://www.nicolasfournel.com/?page_id=125)\n\n*Notes*\n\nIf the artists sing the same notes, Karaoke models will most likely not work.\n\nIf BVs are heard in the center, don't use the MDX karaoke model but the VR karaoke model instead.\n\nUse the chain algorithm with mdx (kar) v2 on x-minus, which will use uvr (kar) v2 to solve the issue. It will be available after you process the song with MDX. (Aufr33/dca)\n\n“The MDX models seem to have a cleaner separation between lead and backing/background vocals, but they often don't do any actual separation, meanwhile the VR models are less clean, but they seem to be better at detecting lead and background”\n\n“MDX models basically require the lead to be completely center and the BV to be stereo\n\nwhereas VR ones don't really care as much about stereo placement”\n\nYou could also ask [playdasegunda](https://www.youtube.com/%40playdasegunda)/play da primeira/viperx for separation, as he has a decent private method/models for double vocals, better than the becruily model (although the latter can still come close). However, the newer frazer & becruily model is “no way inferior to the ViperX (Play da Segunda)”. Rumour has it that the viperx model on Play da Segunda was trained on a 40GB dataset (which would be small), and that the model can be bought for $500 by contacting him via email. It's actually a set of models used for the final inference.\nFor the record, the open-sourced model by the duo cost $600 in compute, probably used a bigger dataset, and achieved slightly lower SDR, but it’s a single model.\n\nFor research:\n\n“These archs are [...] 
really promising for multiple speakers separation, and should be working for multiple singers separation if trained on singing voice:\n\n<https://github.com/dmlguq456/SepReformer> (current SOTA)\n\n<https://github.com/JusperLee/TDANet>\n\n<https://github.com/alibabasglab/MossFormer2>”\n\n> Separating two main vocals\n\nE.g. one panned about 30% left and the other right.\n\n- “use bve v2 and click the “lead vocal panning” button” on x-minus premium\n\n**For vocals with vocoder**\n\n- voc\\_ft\n\nAlternatively, you can use:\n\n- 5HP Karaoke (e.g. with the aggression setting raised) or\n\n- Karaoke 2 model (UVR5 or Colabs). Try separating the result obtained with voc\\_ft as well.\n\n- BS-Roformer model ver. 2024.04 on MVSEP (better on vocoder than the viperx model).\n\n\"If you have a track with 3 different vocal layers at different parts, it's better to only isolate the parts with 'two voices at once' so to speak\"\n\n###### **Various speakers' isolation (from e.g. podcast or movie)**\n\n- MVSEP Male/Female SCNet model\n\n- MVSEP Male/Female MelRoformer model\n\n- Aufr33 BS-Roformer Male/Female beta [model](https://mega.nz/file/XZwV2QwB#5nvWpmvtoBMTJkpor-lMUZCbBZWDH-3i52ELJS_JmcU) | [config](https://huggingface.co/Sucial/Chorus_Male_Female_BS_Roformer/blob/main/config_chorus_male_female_bs_roformer.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (based on the model below)\n\n- Male/female BS-Roformer [model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2525052333) by Sucial | [config](https://drive.google.com/file/d/15dxMvEanC8h_djEuHoQXHKMk0ERN02_y/view?usp=sharing) for UVR | tensor match error [fix](https://github.com/nomadkaraoke/python-audio-separator/releases/download/model-configs/deverb_bs_roformer_8_384dim_10depth_config.yaml) (if they sing at intervals [one by one] they 
cannot be separated)\n\n- Multispeaker model on MVSEP\n\n- BS-Roformer becruily & frazer Karaoke model\n\n\\_\\_\\_\\_\n\n- [Guide and script for WhisperX](#_ak53injalbkf)\n\n- <https://github.com/alexlnkp/Easy-Audio-Diarisation>\n\n- [Spectralayers](https://www.youtube.com/watch?v=gVpr9ewLMZg&t=90s) 11’s unmix multiple voices option\n\n(for further research) - some of these tools might prove useful:\n\n<https://github.com/dmlguq456/SepReformer> (SOTA for 2 speakers)\n\n<https://paperswithcode.com/task/speaker-separation/latest>\n\n<https://arxiv.org/abs/2301.13341>\n\n<https://paperswithcode.com/task/multi-speaker-source-separation/latest>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### **> 4-6 stems** (drums, bass, other, vocals + opt. **guitar**, **piano**):\n\n- Currently, when used on AI-generated music, hi-hats usually get left behind.\n\n- You might want to first use the *already well-sounding instrumental* obtained with a 2-stem model from the section above, and then separate it using the following models.\n- Furthermore, you can slow your song down to 0.75x speed; this can result in more elements in the other stem and better snaps and human claps with 4-stem separation.\n\nRead the [Tips to enhance separation](#_929g1wjjaxz7) #4 for more.\n\n[-](https://huggingface.co/jarredou/BS-ROFO-SW-Fixed/tree/main) Logic Pro [(](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Colab_Inference_BSRofo_SW_fp16.ipynb)May 2025 update[)](https://github.com/undef13/splifft/releases) / BS-Roformer SW 6 stem [|](https://drive.google.com/drive/folders/1ee9HBdwygactWLi_7hdZiFgFNv45Y22m) MVSEP [|](https://drive.google.com/file/d/1mHbBZGcjXHwVfV5hxfyLY2d6ZhaZofkB/view?usp=sharing) uvronline\n\nSDR bass 14.57, drums 14.05, piano 7.79, guitar 9.00, other 8.66, vocals 
11.27\n\nCurrently, the best SDR of any single model for all stems except vocals, although drums have lower fullness than MVSep SCNet XL drums (14.26 vs 21.21). Excellent guitar and piano.\n\n“guitar model sounds better than demucs, mvsep, and moises” - Sausum\n\n“it's not a fullness emphasis or anything, but it's shockingly good at understanding different types of instruments and keeping them consistent sounding” - becruily\n\nThe vocals don’t have the highest metrics, but are good for deep voices.\nThe drums lack some fullness, but “I got better drums/bass separation with that model than with any others when input is some live/rehearsal recordings with shitty sound” - jarredou\n\nCompared with Mel-Roformer drums on uvronline:\n\n“separates far far better when it’s programmed instruments compared to actual recorded ones” - isling\n\n“Roformer SW is putting finger snaps and foot taps as vocals and in the vocal stems.” - GodzFire\n\n“gets not just drums but anything percussive/non-melodic. Which I personally don't mind, but yeah it does cause problems with [drumsep](#_m55fp5i7rdpm) models because they're only expecting standard drums.”\n\nBass can occasionally be worse than demucs\\_ft, as Demucs “considers not only spectrograms but also waveforms”.\n\n“can't differ an electric bass with pedal effect from an electric guitar” - qraiqu\n\n\"better than Lalal.ai by a long shot too\" - nowarrantywarren\n\nAs for MVSEP, it does “just as good a job on vocals as the paid version's ensemble preset.”\n\nIn UVR, separating a 4-minute file with overlap 11 takes two hours on an RX 6700 XT. Keep overlap at 2 (the fastest); even then it takes 2 hours on an i3-7100U alone (GPU Conversion disabled).\nThe highest SDR is achieved with chunk\\_size 882000. On 4GB NVIDIA GPUs, it works faster with 352800 or any value lower than 485100. 
Some people prefer using it with overlap 8.\n\n- xlancelab BS-Roformer ([inference](https://huggingface.co/spaces/chenxie95/xlance-msr/tree/main) | [model](https://huggingface.co/chenxie95/xlance-msr-ckpt/tree/main) | [paper](https://www.arxiv.org/abs/2601.04343)) - trained on top of the SW model, with additional percussion, synth and orchestra stems\n\n- MDX23 v.2.5 by ZFTurbo, fork by jarredou ([Colab](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb); 4 stems when it's enabled)\n\nMultisong dataset SDR bass: 12.58, drums: 11.97, other: 7.28, vocals: 11.10 (v2.4)\n\nIt’s a weighted ensemble of various 4-stem Demucs models, fed with the output of a weighted ensemble of 2-stem models as 4-stem input, so the metrics for raw 4-stem output (without first obtaining the instrumental from the ensemble) will be a bit lower: more so for the other stem (even by 1.38+), and by 0.24+ for bass and 0.02+ for drums ([read more](#_jmb1yj7x3kj7)).\n~\"compared to this, demucs\\_ft drums sound compressed\"\n\n- Ensemble 2/4/8 stems ([MVSEP](https://mvsep.com/), paid) - similar or better results with newer single-stem models combined, various ensembles to choose from, freedom to experiment.\n\n[*Read*](#_6f1v88my7hfk) *the metrics of all the ensembles, sorted by instrumental bleedless.*\n\n##### Chained separation order\n\nWith the single-stem models below, feel free to experiment with different orders of sequential separation to enhance the results:\n\n#1\n\n1) Instrumental model first 2) then drums or bass 3) piano or guitar 4) strings or horns\nNote: If your song has “weird percussion that don't get picked up by drum models and if it's piano-heavy, I would go for piano first, but it sometimes leaves piano behind” - Isling\n\n#2\n1) Instrumental model first 2) drums 3) piano 4) strings or horns 5) bass 6) guitars\nWith this order, someone once “ended up with a decent 5th-stage “other” stem that seemed to contain some unknown synth sounds + possible orchestra hits” vs 
when bass came right after drums, which “gave me a messy “other” with stray piano & strings content” - SilSinn9821\n\n#3\n\n“After each remove, you also remove some useful parts for next instruments. So I'd propose to first use the models with the best quality. Anyway, in the end others can be very muddy” - ZFTurbo\n\n#4\n\n“I think this is my new default [order]:\n\nBass Ensemble (SCNet XL + BS Roformer + HTDemucs4), Drums MVSep SCNet XL, Piano MVSep SCNet Large, Organ MelRoformer, Saxophone MelRoformer, MVSep Wind Ensemble (SCNet + Mel), MVSep Guitar Ensemble (BSRoformer+ MelRoformer), MVSep Strings” - dynamic64\n\nExample: “Personally, I like putting the instrumental through MVSep bass model (above), then putting the other stem through MVSep drums (specifically SCNet XL), then putting the other stem of that through the 6 stem model [a.k.a. SW]\n\nThe 6 stem model is best for piano and guitar, and separating out the drums and bass beforehand helps that model not have to work as hard.\n\nAnd it also allows you to manually add anything that the first 2 models missed, because it's a 6 stem model, and re-separates any missed drum and bass.\n\nCreates close to studio results. As close to studio as I've ever heard”\n\n#5\n\nWhen ensembling various models for the same stem, if the top single model's SDR isn't significantly lower than the ensemble's, it’s sometimes better to use that model alone instead of the ensemble [esp. if it doesn’t have any cross-bleeding] - dynamic64.\n\nJust be aware that this might vary from song to song.\n\n#6\\*\n\n“Drums model often bleeds the bass and 808s into the drum track, [using bass model] prevents this issue from happening”\n\n###### **Drums**\n\n- Ensemble: MVSEP Drums Mel + SCNet XL (“Usually works the best” - dynamic64), but you might want to save the intermediate files and split them into different fragments to get the best of both. Consider using the instrumental model result from e.g. 
becruily model as input.\n\n- Drums only/other SCNet XL model by ZFTurbo on MVSEP, SDR 13.72\n\n“Very hit or miss. When they're good they're really good, but when they're bad there's nothing you can do other than use a different model” - dynamic64\n- BS-Roformer SW (e.g. drums-only variant on MVSEP)\n- [MDX23](#_jmb1yj7x3kj7) (no drums-only variant, so 4 stems; sometimes better than the SW, esp. for [drumsep](#_m55fp5i7rdpm))\n\n- [Demucs\\_ft](#_m9ndauawzs5f) (no drums-only variant, more “compressed” sound than MDX23)\n\n- Drums only/other Mel-Roformer model by viperx on x-minus.pro (at times it might only work with [this](https://uvronline.app/ai?discordtest) link), 12.54 SDR\n\n“compared to demucs\\_ft it was too muddy and had too much bleed at the same time\n\nboth MVSep and xminus” - isling\n\n“I got some bad results (that could ruin the ensemble mode). On these tracks, uvronline [x-minus] melband drums model was giving better results” - jarredou\nPreviously the best\n\n- Drums only/other SCNet Large (“x-minus’ Mel band drums model is better” - drypaintdealerundr)\n\n- Drums only/other Mel-Roformer model by ZFTurbo on mvsep.com, 12.76 SDR\n\nOlder ZFTurbo model\n\n- 1053 BS-Roformer drums/bass model by viperx in UVR Roformer beta or [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb).\n\nA very good model for drums and bass in one stem; use an instrumental as input to avoid vocal residues.\nMore [metrics](https://imgur.com/a/hnOvdxx) for drum models.\n\n- MVSep Percussion\n\n- xlancelab Percussion ([inference](https://huggingface.co/spaces/chenxie95/xlance-msr/tree/main) | [model](https://huggingface.co/chenxie95/xlance-msr-ckpt/tree/main))\n\n- MVSep Tambourine\n\n- MVSep Timpani\n\n- MVSep Congas\n\n###### **Bass**\n\n- MVSEP Bass SCNet XL (the best, 13.81 SDR, “It passes Food Mart - Tomodachi Life test. 
That's the first model to”)\n\n- Ensemble of SCNet XL, BS and HTDemucs4 models (SDR 14.07); SCNet can sometimes be worse than Demucs, which “considers not only spectrograms but also waveforms” - Unwa\nAfter separation, you might then want to apply Mel-RoFormer De-noise to remove the high noise, and finish with Apollo Universal by Lew ([model](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/tag/uni) | [Colab](https://colab.research.google.com/github/jarredou/Apollo-Colab-Inference/blob/main/Apollo_Audio_Restoration_Colab.ipynb)) to get more clarity (Tobias51).\n\n- MVSEP Bass BS Roformer 12.49 (also integrated into the 2-stem and multi/All-in ensembles; its other stem is worse, more “empty”, than with the model below; it has problems giving even results when the bass contains a low-pass filter with high resonance; it picks up more “actual bass guitar than x-minus” model - drypaintdealerundr)\n\n- x-minus.pro BS (much better at treble-heavy bass tones than demucs, better other stem than the model above, “catches higher end and synth basses. It makes it sound cleaner”, although it might sound “weird and muddy” compared to demucs\\_ft at times, and often when it does not capture synth bass, the mvsep bass will do - isling)\n\n- MVSEP Bass BS Roformer SW (might be worse than the above; at least the ensemble with the SW model is not better than the 14.07 one - isling)\n\n- <https://twoshot.app/model/548> (paid)\n\n- MVSep **Double Bass** model\n\n“The BS-Roformer SW bass model should probably be used first to extract the double bass. 
Creates a better sound” - dynamic64\n\n- MVSep **Synth** (also, it can sometimes pick up bass in places where regular models can’t)\n\n- xlancelab Synth ([Inference](https://huggingface.co/spaces/chenxie95/xlance-msr/tree/main) | [models](https://huggingface.co/chenxie95/xlance-msr-ckpt/tree/main))\n\n**4 stems** in one model\n\n- Demucs\\_ft (4 stem) - the best single Demucs model ([Colab](https://colab.research.google.com/drive/117SWWC0k9N2MBj7biagHjkRZpmd_ozu1) / [MVSEP](https://mvsep.com/) / [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui))\n\nMultisong dataset SDR 9.48: bass: 12.24, drums: 11.41, other: 5.84, vocals: 8.43\n(shifts=1, overlap=0.95)\nBetter drums and vocals than the Demucs 6-stem model, though 6s gives decent **acoustic guitar** results. Good bass stem, as Demucs “considers not only spectrograms but also waveforms”.\n\nFor 4 stems, alternatively check MDX\\_extra. Generally, the Demucs 6-stem model is worse than the MDX-B (a.k.a. Leaderboard B) 4-stem model released with the [MDX-Net arch](https://github.com/kuielab/mdx-net) from the MDX21 competition (kuielab\\_b\\_x.onnx in this [Colab](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb)), which is also faster than Demucs 6s.\nFor Demucs, use overlap 0.1 if your input is an instrumental rather than a mixture with vocals (at least it works with the ft model), and shifts 10 or higher. 
For the normal use case (a mixture as input, not an instrumental), such a low overlap will give more vocal residues; overlap 0.75 is the maximum reasonable value speed-wise (0.95 as a last resort), with shifts 10-20 max.\n\n- [SCNet-large\\_starrytong](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.9) model (4 stems) ([Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) or [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md))\n\nMultisong dataset SDR 9.29: bass: 11.28, drums: 11.24, other: 5.58, vocals: 9.06 (overlap: 4)\n\nIt’s 3x faster than Demucs (Nvidia GPU or CPU-only) and sounds better for some people, except for bass. [SDR](https://mvsep.com/quality_checker/entry/7250)-wise, vocals are better than in demucs\\_ft (which is low vs single vocal/inst models anyway). Better SDR than starrytong’s MUSDB18 and Mamba models.\n\n- Ableton 12.3’s high-quality separation option (4 stems) - slow, CPU-only, probably uses BS-Roformer. Better bleedless and better SDR than ZFTurbo’s undertrained public SCNet XL model below. It might take 20 minutes to separate 2 minutes of audio on slower CPUs (iirc mobile Sandy/Ivy Bridge). 
The default separation mode has very low metrics.\n\n- [SCNet XL IHF](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.15) model (4 stems) by ZFTurbo\n\nMultisong dataset SDR 9.93: bass: 11.94, drums: 11.58, other: 6.49, vocals: 9.69\n\n- [SCNet XL](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.13) model (4 stems) by ZFTurbo ([Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [MSST](https://github.com/ZFTurbo/Music-Source-Separation-Training/) | [UVR beta patch](#_6y2plb943p9v))\nMultisong dataset SDR 9.72: bass: 11.87, drums: 11.49, other: 6.19, vocals: 9.32\n\nBetter metrics than the starrytong model, but a “downgrade to the Large model since it produces a f\\*\\*\\* ton of buzzing” due to undertraining.\n\nOnly bass has better SDR in Demucs\\_ft (12.24), although drums might still sound better in demucs\\_ft.\n\nThe 2-stem model on MVSEP is further trained, iirc.\n\n- [BS-Roformer](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.12) model (4 stems) by ZFTurbo | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md)\nMultisong dataset SDR 9.38: bass: 11.08, drums: 11.29, other: 5.96, vocals: 9.19\nTrained on MUSDB18HQ\n\n- Mel-Roformer [models](https://huggingface.co/Aname-Tommy/melbandroformer4stems/tree/main) (4 stems) by Aname\n\na) Large (4GB): multisong dataset SDR drums: 9.72, bass: 9.40, other: 5.11\n\nb) XL (7GB):\n\nDespite lower average SDR on the musdb18 dataset (8.54 vs 9), it seems to outperform the demucs\\_ft model (only the other stem has better SDR in demucs\\_ft - all other metrics are better in SCNet/XL/BS-Roformer).\n\nXL is “heavy and slow, without giving a quality boost compared to existing public 4 stems models trained on 
musdb18 by ZFTurbo and starrytong (BsRofo and SCNet large/XL [above])” - jarredou (except perhaps for the buzzing in the public XL model).\n\n“Drums are sounding really good in particular, tested a couple songs with the large model after using unwa's v1e+ for instrumental” “drums are absolutely the standout”.\n\n“The bass stem is definitely the weakest one from this new model. Very, very muddy and inconsistent.” - santilli\\_\n\n“Large [variant] works in like 99% use case” “Large split sounds amazing so far tho”\n\nXL “result would take so much longer, but the large results sounded better IMO” - 5B\n\nThe XL model won’t work with default settings on Colab and is very slow on e.g. an RTX 3060: “on 4070 Super it took like 6 mins on XL 4 stems compared to 30 seconds on Large 4 stems”\n\n- [SCNet Tran](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.14) a.k.a. small model (4 stems) by ZFTurbo\n\nMultisong dataset SDR bass: 10.99, drums: 10.87, other: 5.63, vocals: 8.42\n\nOutperformed by the above models, at least SDR-wise. Cannot be used in UVR.\n\n- [KUIELab-MDXNET23C](https://drive.google.com/file/d/1M24__8Qnd648ceXOH5PLVWenVeh6maGo/view) (4 stems) - its first scores were probably from an ensemble of its five models; in that configuration it had better SDR than demucs\\_ft alone, and its drums had better SDR than “SCNet-large\\_starrytong” above (so the single-model score of any of these MDX23C models is probably lower than demucs\\_ft’s).\n- The lighter “model1” drums sound surprisingly better than htdemucs non\\_ft v4 on a previously separated instrumental. It handles trap really well and preserves hi-hats correctly, but at the cost of bleeding in the other stem. The v4 model can be used to clean it a bit further, but at least with GPU Conversion on AMD (and the older directml.dll for some GPUs), it adds more noise/artefacts, so use CPU in that case (tested with Roformer as the preprocessor for the instrumental). 
It’s relatively fast, but not as fast as mdx\\_extra (which sounds rather lo-fi in that case).\n- The bigger “model2” is heavier and doesn’t work on (at least) AMD 4GB VRAM GPUs with Beta Roformer patch #2 (before the introduction of the new overlap code).\n\nTo run model1/2 in UVR, “You must change the model names in \"mdx\\_C\" from \"ckpt\" [name] to model1.ckpt, model2.ckpt, & model3.ckpt. [so simply add the name to the extension]” and then copy the ckpts to models\\MDX-Net without yamls. There are actually 3 mdx23c models there (and 2 demucs), but model3 seems to be only for vocals (and with low SDR), so the two most important of the three are described above.\nOG KUIELab’s [repo](https://github.com/kuielab/sdx23/tree/mdx_C#reproduction).\n\n- [model\\_mdx23c\\_ep\\_168\\_sdr\\_7.0207](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v1.0.1) (4 stems)\nMultisong dataset SDR bass: 8.40, drums: 7.73, other: 4.57, vocals: 7.36\nAlso trained on MUSDB18HQ, but by ZFTurbo; it’s different from the above, with a similar size to “model2”.\n\n- Aname 4 stem BS-Roformer [model](https://mega.nz/file/GwY1TQoB#UmeMGO2BBtgrUXkmXXQQVXqwR_hwxaAmkycDr-fitWg) | [yaml](https://mega.nz/file/Pwp32DrY#Kyl5sK3j6l5kXCe7Br52gN2CXn9c8N6lzOYT1g0FOS0)\nMultisong dataset SDR bass: 9.79, drums: 10.21, other: 5.27, vocals: 9.13\n\nIt has better SDR than the 7.0207 above (as in the SDR metrics link below), but worse than demucs\\_ft and the 4-stem BS-Roformer ZFTurbo model above.\n\n- BS-RoFormer 4 stems model by yukunelatyh / SYH99999, added to x-minus\n\n<https://uvronline.app/ai?discordtest>\nMultisong dataset SDR bass: 8.68, drums: 10.37, other: 5.05, vocals: 8.57\nSome people like it more than Demucs, but “it's like demucs v4 but worse, I think.\n\nThe vocals have a ton of bleed, the bass is disappointing tbh.\n\nThe other stem has a ton of bgv and adlib bleed in it” - Isling\nIts [SDR metrics](https://imgur.com/a/O2qDgTQ) for all stems are worse than those of the 4-stem BS-Roformer by ZFTurbo and 
demucs\\_ft.\n\n- A v2 of it was later added to the site, with lower metrics for all stems.\n\nSmaller public 4 stem models and all metrics:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/pretrained_models.md#multi-stem-models>\n\n- [GSEP AI](#_yy2jex1n5sq) (2-4-6 stems; sonically, it used to have the best other stem vs Demucs; piano in Demucs is also worse, and GSEP picks up e-piano more frequently; the GSEP **electric guitar** model doesn't include acoustic, it's electric only). In general, it used to have a very good **piano** model before many alternatives existed.\n\n- [Ripple](#_f0orpif22rll) (defunct; used to be available for iOS in the US only). At one point it had the best SDR of any all-in-one 4-stem model (except for the other stem), but its other stem was bad and bleedy. Back then, it could give the best bass stem and/or kick in drums, but not the best drums overall vs demucs\\_ft (“you need something to get the rest of the drums out of the ‘other’ stem and at that point might as well use a proper drum model”); good vocals. You can minimize residues in Ripple by providing an already well-separated instrumental from the section above, and/or by lowering the volume of the input file by 3/6dB.\n\n- [Bandlab Splitter](https://www.bandlab.com/splitter) (4-6 stem - guitar, piano, web and iOS/Android app) - previously it could be used e.g. for cleaning stems from other services; 48kHz output; stems can be misaligned; the quality got worse after one of the updates (probably not worth using anymore)\n\n- [Audioshake](https://www.audioshake.ai/) (paid, only non-copyrighted music, or slowed-down songs [see workaround in \"paid\" above]) - sometimes better results than the Demucs ft model.\n\n- Spectralayers 10 - mainly for bass and drum separation -\n\n“I think I've got some really comparable samples out of jarredou's MDX23 Colab fork”, but for vocals and instrumentals it’s mediocre [in Spectralayers 10].\n\n- music.ai - “Bass was a fair bit better than Demucs HT, Drums about the same. 
Guitars were very good though. Vocal was almost the same as my cleaned up work. (...) I'd say a little clearer than MVSEP 4 ensemble. It seems to get the instrument bleed out quite well, (...) An engineer I've worked with demixed to almost the same results, it took me a few hours and achieve it [in] 39 seconds” - Sam Hocking\n\n- dango.ai (<https://tuanziai.com/en-US>) - also offers separation into 4 or more stems (expensive)\n\n- (old) MDX23 1.0 by ZFTurbo, 4 stems (Colab, desktop app, as above; much cleaner than demucs\\_ft and less aggressive, but 1.0 leaves more low-volume vocal residues in completely quiet parts of instrumentals than e.g. HQ\\_3; with instrumentals as input it should sound similar to the current v2.4 fork, as the 4-stem separation code hasn’t changed much since then)\n\n- MVSEP also has single **piano,** **guitar and bass** models (in many cases, the guitar model can pick up piano better than the piano model;\n\n\"works great for songs with grand piano, but only grand piano, since that’s what it was trained on.\n\nSame with guitar, which catches more piano than piano model does, ironically\").\n\n- x-minus/uvronline.app also has single acoustic and guitar models by viperx\n\n- viperx’s piano model is also on x-minus/uvronline.app.\n\n(More on piano and guitar models below)\n\nTo enhance 4-stem results, you can use a good instrumental obtained from another source as input for the above (before instrumental Roformers, this could be e.g. [KaraFan](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) with its different presets ensembled in the UVR5 app via Audio Tools>Manual Ensemble)\n\nFor the best results with piano or guitar models, use the other stem from a 4-stem separation (e.g. “Ensemble 8 models”, the MDX23 Colab or htdemucs\\_ft) as input.\n\nMoises.ai - although drums might already be better with e.g. “MVSep Drums” (probably also vs Mel-Roformer on MVSEP or x-minus, not sure), [Moises] can give “better results (...) 
if the input material is for example cassette-tape sourced or post-FM).\n\nFL Studio - probably nothing better than the solutions above\n\nOlder 4 stem models in UVR (for some specific songs, e.g. while trying to fix bleeding across stems):\n\nhtdemucs\n\nhtdemucs\\_mmi\n\nmdx\\_extra\n\nkuielab\\_b\n\n###### **>Sep. parts of drums a.k.a. Drumsep**\n\n(kick/hi-hat/snare/toms/…)\n\nWorks on the drums stem from e.g. Demucs\\_ft, MDX23 Colab or MVSEP Drums (the SW model sometimes causes issues for Drumsep).\n\nDrum model descriptions [here](#_m8brt73sq15r).\n\nConsider feeding the drums model with the result of a good [instrumental](#_2vdz5zlpb27h) model, and then use drumsep.\n\n- MVSEP 8 stem ensemble of all 4 drumsep models below (i.e. excluding the old Demucs model by Imagoy) [metrics](https://i.imgur.com/sR5pNP3.png)\n\n- MVSEP’s SCNet 4 stem (kick, snare, toms, cymbals) - best SDR for kick, and similar to the 6 stem model below for toms (-0.01 SDR difference)\n\n- MVSEP’s SCNet 5 stem (cymbals, hi-hat, kick, snare, toms)\n\n- MVSEP’s SCNet 6 stem model (ride, crash, hi-hat, kick, snare, toms) - worse snare SDR\n\n- [jarredou/Aufr33 MDX23C 6 stem model](https://github.com/jarredou/models/releases/tag/aufr33-jarredou_MDX23C_DrumSep_model_v0.1) (kick/hi-hat/snare/toms/ride and crash stems) - worse overall SDR, but it's a public model usable in UVR or the inference Colab, and works better for debleeding (unavailable on MVSEP)\n\n- [jarredou MDX23C 5 stem model](https://github.com/jarredou/models/releases/tag/DrumSep)\n\n(kick, snare, toms, hi-hat, cymbals)\n\nnewer model, better SDR (though not vs the MVSEP ones); available on MVSEP and in Colabs\n\n*More comparisons and metrics of these models* [*here*](#_m55fp5i7rdpm)\n\n*Why not the SW as input?*\n\nIt “gets not just drums but anything percussive/non-melodic. (...) it does cause problems with drumsep models because they're only expecting standard drums.”\n\n“I gotta use acoustic guitar first if I'm using SW drums on Korn or anything with slap bass.\n\n(...) 
bagpipes before bowed strings is another, but probably also guitar before bagpipes.. thats iffy”\n\n*Outdated*\n\n- SpectraLayers 11 (Unmix Drums)>OG [drumsep](#_jmjab44ryjjo) by Imagoy/>[FactorSynth](#_cz4j2d3uf48s) (depending on how far you want to unmix the drums) >Regroover>UnMixingStation (the last three are paid)>\n\nVirtual DJ (Stems 2.0; barely picks those instruments up, or doesn’t at all).\n\n- [LarsNet](#_f067glwjzyi4) (unlike OG drumsep, it can also separate hi-hats and cymbals; toms might be better)\n\n- RipX (paid)\n\n- SpectraLayers 10 (paid; sometimes worse, sometimes better than OG Drumsep; not sure whether it was added in an update or in the main version) \"drumsep works a f\\* ton better when separating on this one song I've tested with the pitch shifted down 2\"\n\n- FADR.com (paid subscription)\n\n- Moises.ai (Pro only)\n\nCompared to OG drumsep, Regroover allows more separations, especially when used multiple times, so it lets you remove parts of kicks, parts of snares, noises, etc., with deeper control. Plus, it nulls easily. But drumsep sounds better on its own, especially with higher parameters, e.g. shifts 20 and overlap 0.75-0.98. OG drumsep can now be replaced by the public MDX23C model.\n\n###### **Strings**\n\n- Dango.ai (e.g. 
violin, erhu; paid, free 30 second fragments) - \"impressive results\", at least for violin\n\n- Music.ai (paid, “Dango.ai and Music.ai [previously] the best strings models [Dango sounds fuller meanwhile Music.ai has more accurate recognition of strings but sounds a bit too filtered]” - from before the BS Strings release)\n\n- MVSep Bowed Strings (“doesn't disappoint”, SDR 5.41; formerly “MVSep Strings”)\n\n- MVSep Plucked Strings\n\n- MVSep Synth (synth strings stems were included in training)\n\n- MVSep SATB Choir (works with strings too)\n\n- Moises.ai (paid, not bad)\n\n- Audioshake\n\n- MVSEP Violin (sometimes does better than the strings model for strings, and also has a higher SDR on the strings dataset)\n\n- MVSEP Strings MDX23C (it’s “weak”, SDR 3.84)\n\n- x-minus.pro/Uvronline.app Mel-Roformer model by viperx (SDR 2.87) (sometimes accessible only via this [link](https://uvronline.app/ai?discordtest) or [this](https://uvronline.app/ai?hp&test) link)\n\n- [Demix Pro](#_4yn6zawn80la) (paid, free trial)\n\n- [RipX DeepRemix](#_1bm9wmdv6hpf) (once said to be the best bass model, but it doesn’t score that well SDR-wise; it's probably Demucs 3 (demucs\\_extra), worse than Demucs\\_ft and probably also than MDX23 above; it may have been updated since) (paid)\n\n- Sometimes the Wind model in the UVR5 GUI picks up strings\n\n- MVSEP Harp\n\n- MVSep Mandolin\n\n- MVSep Banjo\n\n- MVSep Sitar\n\n- MVSep Ukulele\n\n- MVSep Dobro\n\n**Violin**\n\n- MVSep Violin BS Roformer\n\n“it can even separate violin quartets from cellos, so cool.” - smilewasfound\n\n“Very neat model. (...) 
Sometimes the model does seem to pick up more than just violins imo, but yeah for separating high strings in particular it is really cool.” - Musicalman\n\n- Dango.ai “impressive”\n\n- MVSEP Viola\n\n- MVSEP Cello\n\n###### **Electric guitar** *“For better results you might try first removing vocals.”*\n\n[Audioshake](#_tc4az79fufkn)>[RipX](#_1bm9wmdv6hpf)/>[Demix Pro](#_4yn6zawn80la)>[lalal.ai](https://www.lalal.ai/) (e.g. lead guitars; the model has improved over time)\n\n(they’re paid ones)/\n\nLogic Pro>GSEP>Demucs 6s (free)<[Moises.ai](https://moises.ai/) (paid “holy shit better [vs demucs, but] still pretty bad”)\n\n[Dango.ai](https://dango.ai/) (paid)\n\n[Music.ai](http://music.ai) (paid, free trial)\n\nLogic Pro (paid, May 2025 update) / BS-Roformer 6 stems a.k.a. MVSEP Guitar SW\n(“really on point. So far it separated super well, also didn’t confuse organs for guitars and certain piano sounds as well.” - Tobias51\n\n“guitar model sounds better than Demcus, MVsep, and Moises” - Sausum\n\n“guitar in particular was amazing. All other models I tried had trouble with it” - Musicalman)\n\n[HF](https://huggingface.co/jarredou/BS-ROFO-SW-Fixed/tree/main) | [splifft](https://github.com/undef13/splifft/releases) | [Drive folder](https://drive.google.com/drive/folders/1ee9HBdwygactWLi_7hdZiFgFNv45Y22m) | [Drive file](https://drive.google.com/file/d/1mHbBZGcjXHwVfV5hxfyLY2d6ZhaZofkB/view?usp=sharing) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Colab_Inference_BSRofo_SW_fp16.ipynb)\n\nMVSep Electric Guitar (“really neat. One thing I noticed is that it seems to be better than other models at picking up midi/synth lead guitars (...) 
also gets tripped up a bit more by weird FX and synth sounds being partially flagged as guitar” - Musicalman)\n\n[Becruily Melband guitar](https://huggingface.co/becruily/mel-band-roformer-guitar/resolve/main/becruily_guitar.ckpt) | [yaml](https://huggingface.co/becruily/mel-band-roformer-guitar/resolve/main/config_guitar_becruily.yaml) (“Not SOTA, but much more efficient and comparable to existing guitar models, and for some songs it might work better because it picks up more guitars [though it can also pick some other instruments].”)\n\n[Mvsep.com](https://mvsep.com) (Mel-Roformer model and the previous MDX23C one; Mel is “pretty good but suffers some dropouts where MDX23C doesn't”)\n\nuvronline.app (viperx' Mel-Roformer model; not flawless either)\n\n[uvronline.app](https://uvronline.app/ai?discordtest&test-mdx) (HQ\\_5 beta/[paid users](https://uvronline.app/ai?hp&test-mdx) - places guitars in the vocal stem pretty well)\n\n“Rebalance volume of chans before processing” if you get better separation results when processing the L and R channels separately.\n\nConsider using Apollo Universal by Lew ([model](https://github.com/deton24/Lew-s-vocal-enhancer-for-Apollo-by-JusperLee/releases/tag/uni) | [UVR](#_6y2plb943p9v) | Colab in [HTMYOR](#_k34y1vaaneb1)) to get more clarity after separation.\n\n###### **Acoustic guitar**\n\n- [dango.ai](http://dango.ai) (paid, probably the best for now, better than the SW model in at least some songs)\n\n- MVSep Acoustic Guitar (strong competitor, outperforms moises “like crazy”)\n\n- uvronline.app (viperx' model for premium users - does a good job too)\n\n- BS-Roformer SW\n\n- Demucs 6s - sometimes, when it picks it up\n\n- [GSEP](#_yy2jex1n5sq) - “when the guitar model works at all (it usually grabs the electric), the remaining 'other' stem often is a great way to hear acoustic guitar layers that are otherwise hidden.”\n\n- [lalal.ai](http://lalal.ai) (both paid)>[moises.ai](http://moises.ai) (it picks up acoustic and electric guitar 
together)\n\n- Audioshake (both electric and acoustic)\n\n- moises.ai\n\n###### **Separating electric and acoustic guitar**\n\n- Use a model from one of the two categories above, e.g.\n\n- MVSep Acoustic Guitar (“it's separating acoustic from electric very well, even in fuzzy, lo-fi recordings” - Input Output)\n\n- “To separate electric and acoustic guitar, you can run a song [e.g. other stem] through the Demucs guitar model and then process the guitar stem with GSEP [or MVSEP model instead of one of these].\n\nGSEP only can separate electric guitar so far, so the acoustic one will stay in the \"other\" stem.”\n\n- “[medley-vox](#_s4sjh68fo1sw) main vs rest model has worked for me to separate two guitars before”\n\n- moises.ai “it's not perfect, it's good when the solo guitar for example is loud then it can be isolated but when it comes in a balanced lead and rhythm guitar, it can't isolate it”\n\n- MDX23C [phantom center](#_3c6n9m7vjxul) model\n\n- [moises.ai](http://moises.ai) (it has electric, acoustic, rhythmic, solo)\n\n###### **Lead and rhythm guitar**\n\n- moises.ai (paid)\n\n- MVSep Lead/Rhythm Guitar (1-stage and 2-stage variants)\n\n- MVSEP guitar models\n“I can isolate both guitars with the different models that MVSEP has, especially in rock tracks where the lead guitar is in the center channel and the rhythm guitar is on the right - left side of a stereo track, good results are not always obtained, especially when the lead guitar has long delay effects, tons of reverb or when these effects go from one channel to another, but it also depends on how the song was mixed.” - edreamer 7\n\n- MDX23C [phantom centre](#_3c6n9m7vjxul) extraction model by wesleyr36\n“First, isolate guitar, then (...) use phantom centre extraction by wesleyr36 model. 
Here I can find the rhytmn and the lead guitars, as I told before, results can vary” - edreamer 7\n\n- Dry Paint Dealer Undr’s Melband Roformer and Demucs Lead and Rhythm guitar [models](https://drive.google.com/drive/folders/1JH2tjhgDcJgvdi-hrQT82RsaV2XhE8hn?usp=sharing).\n\n“my own very mediocre model for it that I never shared. it does work but has issues that I imagine any better executed model won't.”\n\n- lalal.ai (“it sucks” - isling, Oct 25)\n\n###### **Wind instruments and wind noises** *(trumpet/saxophone/brass/woodwinds/flute/trombone/horn/clarinet/oboe/harmonica/bagpipes/bassoon/tuba/kazoo/piccolo/flugelhorn/ocarina/shakuhachi/melodica/reeds/didgeridoo/musette/gaida/farts)*\n\n- MVSep Wind BS Roformer (2025.09) 9.77 SDR (+2.64 SDR)\n\n- MVSep Wind BS Roformer (2025.08) (more robust and cleaner than the Mel one, and detects instruments better, +2.5 SDR)\n\n- Wind BS-Roformer on x-minus.pro by viperx (big step forward vs the old UVR model)\n\n- MVSEP Wind SCNet\n\n- MVSEP Wind Mel-Roformer\n\n- MVSEP Trumpet (“so clean”)\n\n“after testing [Wind 9.77] on a song where trumpet and sax play in unison, doing the trumpet model is cleaner than doing the sax model” - dynamic64\n\n- MVSep Trombone\n\n- MVSep Oboe\n\n- MVSep Clarinet\n\n- MVSep Harmonica (“Hit or miss” - musicbybrooks)\n\n- MVSep French Horn\n\n- MVSep Tuba\n\n- MVSep Bassoon\n\n- MVSep Accordion\n\n- MVSep Brass\n\n- MVSep Woodwind\n\n- MVSep Bagpipes\n\n- MVSep Braam\n\n*Older*\n\n- \"Wind\" model on UVR5 (Download Center -> VR Models -> select model 17)\n\n(You might have to run it on an instrumental separated first, e.g. with HQ\\_4 or Kim Inst)\n\n- Audioshake\n\n- Music.ai\n\n- Adobe Podcast\n\n- karaoke 4band\\_v2\\_sn on e.g. 
MVSEP (worse than the Wind model in UVR)\n\n- Someone probably had some success with one of the de-crowd models for wind noises\n\n- Many instrumental/vocal models confuse wind instruments with vocals\n\nMVSEP Saxophone\n\n- SCNet XL (SDR saxophone: 6.15, other: 18.87)\n\n- MelBand Roformer (SDR saxophone: 6.97, other: 19.70)\n\n- Ensemble Mel + SCNet (SDR saxophone: 7.13, other: 19.77)\n\n###### **Piano**\n\n*Consider using Unwa BS-Roformer Resurrection inst a.k.a. “unwa high fullness inst” on MVSEP as a preprocessor - rainboomdash*\n\n- Logic Pro (paid; May 2025 update, SDR 7.79) / BS-Roformer 6 stems ([Colab](https://colab.research.google.com/drive/1iVuweQTlL6NlGR9GC_CHEW8NNCNtUWQu?usp=sharing) | [splifft](https://github.com/undef13/splifft/releases)) / MVSEP Piano SW\n\n(“1000 times more efficient than the Lalal.ai piano model”)\n\n- Lalal.ai (paid; no other stem with piano stem attached)\n\n- Demix Pro (paid; “I often combine the two” - Mixman, but that was before the 6 stem model above)\n\n- MVSep Piano Ensemble (Mel-Roformer + SCNet Large piano models; SDR 6.21)\n(Mel is viperx' iirc; “a [tiny] bit more bleed during the choruses and whatnot” vs x-minus, “works well maybe 7 times out of 10”; SCNet, “has a less watery sound, but more bleed” vs Mel)\n\n- x-minus.pro (for paid users; cheap subscription; “more consistent than MVSep piano and demucs\\_6s”; it recognizes piano well, and it sounds best for the other stem of a piano separation, but e.g. 
on Carpenters - Yesterday Once More “while not terrible, the dropouts, underwater 'gurgles', and general lack of piano punch/presence remains noticeable” - Chris\\_tang1, while MVSEP Piano Ensemble (SCNet + Mel, SDR 6.21) was much better in that case; results might vary by song)\n\n- Music.ai (paid)\n\n- Dango.ai (paid)\n\n- GSEP (formerly best, paid)\n\n- Moises (separate models for piano and keys)\n\n- MVSep Piano MDX23C 2024 & 2023\n\n- htdemucs\\_6s (not too good)\n\n- MVSEP Digital Piano (much better than the SW model for e-piano, and sometimes also for real or synthesizer piano when it picks it up)\n\n- MVSep Keys\n\n- MVSep Harpsichord\n\n- MVSep Celesta\n\n- MVSep SATB Choir (it can separate piano layers; for a fuller sound, consider passing a good instrumental model result through e.g. BS-Roformer SW and then running SATB on its piano stem, instead of using the “Extract vocals” option)\n\n###### **Synths**\n\n- MVSep Synth (it can also pick up some bass that some bass models failed to pick up)\n\n- MVSep Organ\n\n- Piano or guitar models might work (if the song doesn't already have piano or guitar), depending on the song\n\n- [Zero Shot](#_g37f4a6hnxm0)\n\n- lalal.ai (hit or miss)\n\n- Some voc/inst models might treat synths as vocals (then you could separate them using a better inst/voc model)\n\n###### **Organs**\n\n- MVSEP Organ (surprisingly good; its SDR has since doubled vs the first version, eliminating some bleed issues or e.g. 
Hammond organs not being picked up in some places)\n\n###### **Idiophones**\n\n- MVSep Marimba\n\n- MVSep Glockenspiel\n\n- MVSep Triangle\n\n- MVSep Bells (tubular bells or chimes - for sleigh bells, use a drums model)\n\n- MVSep Wind Chimes\n\n- MVSep Xylophone\n\n###### **Crowd**\n\n*Current models struggle with female screams and singing*\n\n- MVSEP (De-)Crowd MDX23C+Mel-Roformer Ensemble (6.07+6.06)\n\n- MVSEP Crowd BS Roformer (SDR 7.21) (newer model, some people have issues with it: \"The 6.27 one removed all kinds of crowd noises, sound effects and general noise, this one only removes random bits of music!\")\n\n- UVR-MDX-NET Crowd HQ 1 (UVR/x-minus.pro/[model](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-MDX-NET_Crowd_HQ_1.onnx)/conf in UVR) (sometimes more effective than MVSEP’s Mel; good e.g. for live shows)\n\n- MVSEP Mel-Roformer Crowd by ZFTurbo (SDR 6.07) (MVSEP/[files](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v.1.0.4)/UVR)\n\nTo use it in UVR, go to the UVR\\models folder and paste [that](https://drive.google.com/drive/folders/1eO6rDhxh77eC-l0IHF16mQNwrrWOX31h) folder there.\n\nThen change the \"dim\\_t\" value to 801 at the very bottom of the “model\\_mel\\_band\\_roformer\\_crowd.yaml” file in the “mdx\\_c\\_configs” subfolder. 
Don’t use overlap above 4.\n\n- Mel-Roformer De-Crowd by Aufr33/viperx (x-minus.pro/UVR/[DL](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt)/[conf](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/model_mel_band_roformer_crowd.yaml))\nFor UVR, change the model name to the one from the attached yaml, copy the ckpt to models\\MDX\\_Net\\_Models and the yaml to the model\\_data subfolder, then set overlap to 2; or use the ZFTurbo inference [script](https://github.com/ZFTurbo/Music-Source-Separation-Training). At times more effective than the MDX model below.\n\n- MDX23C De-crowd v1/v2 (MVSEP)\n\n- Older MVSEP model (applause, clapping, whistling, noise)\n\n- Aufr33’s Mel-Roformer Denoise average variant ([link](https://mega.nz/file/vM4mHTYQ#f_uCxxS_olfTR4iAsOc-XS6sfUecfbF-ZKXrk3IjbnY) | [yaml](https://drive.google.com/file/d/1uwInhwgjOMIdOMTgj_oNR_dmaq7E-b3g/view?usp=sharing) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)) can also be used for crowd removal\n\n- [AudioSep](#_tvbntqdvkn9n)\n\n- [USS Bytedance](#_4svuy3bzvi1t)\n\n- [Zero Shot Audio Source Separation](#_g37f4a6hnxm0)\n\n- GSEP (sometimes; e.g. its drums stem can remove applause)\n\n- [Chant model](https://cdn.discordapp.com/attachments/708580573697933382/859156390537855066/chant_model.pth) (by HV, VR arch; works e.g. for applause; may leave some echo to be separated with other models or tools below). For Colab usage, copy that model to models/v5, then use the 1 band 44100 param, turn off auto-detect arch and set it to \"default\". 
In UVR, pick one of the 44100 1 band parameters, possibly 512.\n\n“For really difficult live songs (where the crowd is overwhelmingly loud to the point where you can't hear the band properly) sometimes filtering vocals with mel roformer on xminus THEN running the vocals stem through the mdx decrowd model sometimes helps”\n\n###### **SFX**\n\n*- “You do need to first get an instrumental with a different model, because this isn't really trained to remove vocals. Just SFX” or speech.*\n\n*- SFX models can be more aggressive than regular vocal models for* [*speech*](#_o6au7k9vcmk6)*.*\n\n*Sometimes, some of the regular vocal models may turn out to be better suited for your task, so try those out for speech too.*\n\n- HyperACE inst v2 ([Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link&authuser=3)) - good for cleaning vocals or voice of SFX, which it puts in the other stem; better than DNR models at it. A go-to for fun dubs: “SOTA for DNR (...) almost zero muddiness between the sound effects”, a bit better at it than v1 - wancitte\n\n- BS-Roformer SW drums (6 stems model, Colab, or just drums on MVSep) - “really good to remove some SFX and foley, way better than DnR v3” - erosunica\n\n- MVSep DnR v3:\n\na) SCNet\n\nb) MelBand\n\n(better metrics than Bandit v2)\n\n- Bandit v2 (MVSEP | OG [weights](https://zenodo.org/records/12701995) | [yaml](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config_dnr_bandit_v2_mus64.yaml)) multilingual model (multi),\n\nor single-language ones for EN/GER/FR/SPA/CH/FAR.\nAll [models](https://huggingface.co/jarredou/banditv2_state_dicts_only/tree/main) converted ([yaml](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config_dnr_bandit_v2_mus64.yaml)) for ZFTurbo inference | 
[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) |\nin [UVR](#_6y2plb943p9v)>MDX-Net>Download More Models>Bandit v2/Plus\n(v2 is better than Bandit Plus on speech, but not always on SFX; experiment).\n\n“Multilingual model is most of the time giving better results than French model for French content, so I would start with it” - jarredou\n\n“Co-developed by Netflix and Georgia Institute of Technology. The [paper](https://arxiv.org/abs/2309.02539) is titled \"A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation\"”\n\nOlder models\n\n- Bandit Plus (may work well for TV shows and movies; MVSEP, jarredou [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | [UVR](#_6y2plb943p9v)\n“trained on mono audio, so it's dual mono (...) when content is heavily panned left/right, it's where issues start to say \"hello !\"”, but other than that, it might still handle stereo well enough, or even better than others, depending on the song)\n\n- “My suggestions from those who want best results, i suggest using either:\n\nresreuction inst or instfv8 to extract Music and effects audio track” - killdubo\n\n- Jasper (a.k.a. 
jazzpear94) MDX23C foreground and background SFX - an auxiliary/helper [model](https://drive.google.com/file/d/1ftKK4G9aOPknAm5wvGsJpZpFJ5L4rjBe/view?usp=sharing) ([yaml](https://drive.google.com/file/d/1xo37CIqjmpgSBM5ZENsVYysQs_sILPiv/view?usp=sharing)) for DnR v3 and vocal models when working on [speech](#_o6au7k9vcmk6) (it wasn't trained on voice and music; [more info](https://discord.com/channels/708579735583588363/708580573697933382/1467461947187527814))\n\n- Jasper Mel-Roformer BGM (background music from a movie) [model](https://drive.google.com/file/d/10eTjyjmUwdjTgfEBS4Cwks-XxBLEkEEu/view?usp=sharing) ([yaml](https://drive.google.com/file/d/1WDgbQPTykaEVNrG4qRy1FlwE_YCiOE6V/view?usp=sharing)) (stems: voice with SFX / BGM without singing voices); you might want to use a vocal model as a preprocessor here too.\n\nUVR compatibility isn't guaranteed for either of these; consider using [MSST](#_2y2nycmmf53) in case of any issues\n\n- jazzpear94 Mel-RoFormer model (able to separate specific SFX groups - Ambiance, Foley, Explosions, Toon, Footsteps, Fighting and General - for all in one stem | MVSEP, newer fixed [Colab](https://colab.research.google.com/drive/1msFRvDn6ZbBsAC5XzIst4ABIxVL-V8T0?usp=sharing), [files](https://drive.google.com/file/d/1la6Piir7j-GzRFaPEt_W7x_fAwv9gYYF/view), [instruction](https://i.imgur.com/ZoaT0hn.png), prob. 
broken Colabs: [1](https://colab.research.google.com/drive/1Mncjg3JLOgE1Kj6oJAaJ0nIzShLDEHr2?usp=sharing), [2](https://colab.research.google.com/drive/1YBpeGj66FfIHS1WH8uYBcshITOGVYOHY?usp=sharing), [3](https://colab.research.google.com/drive/1jrw-cAi-JqZpBi6wyT3YIp3x-XHhDm1W?usp=sharing#scrollTo=I9-pu3zHFtFk), [4](https://colab.research.google.com/drive/1efoJFKeRNOulk6F4rKXkjg63RBUm0AnJ))\n\n- joowon bandit model: <https://github.com/karnwatcharasupat/bandit>\n\n(better SDR for Cinematic Audio Source Separation (dialogue, effect, music) than DNR Demucs 4 below (SDR 11.47 vs 10.16) - [Colab](https://colab.research.google.com/drive/1efoJFKeRNOulk6F4rKXkjg63RBUm0AnJ?usp=sharing) / MVSEP)\n\n- GAudio (a.k.a. GSEP) announced their SFX (DnR) model in their API:\n“DME Separation (Dialogue, Music, Effects)”\nSo far it’s not available for everyone on their regular site:\n\n<https://studio.gaudiolab.io/>\n\nBut the link on their Discord redirects to the site with a form to write an inquiry:\n\n<https://www.gaudiolab.com/developers>\n\nShortly after visiting one or both of the links and logging in on the first, you might get an email saying that $20 of free credits to access their API have been added to your account\n\n- [USS-ByteDance](#_4svuy3bzvi1t) (if you can provide at least one proper sample)\n\n- [Zero Shot](#_g37f4a6hnxm0) (currently worse for SFX vs Bytedance)\n\n- [Audiosep](#_p3wngakyrk0n)\n\n- custom stem separation on Dango (paid, 10 seconds for free)\n\n- DNR Demucs 4 model (repo: [CDX23](https://github.com/ZFTurbo/MVSEP-CDX23-Cinematic-Sound-Demixing), MVSEP, [Colab](https://colab.research.google.com/github/jarredou/MVSEP-CDX23-Cinematic-Sound-Demixing-Colab-Inference/blob/main/MVSEP_CDX23_Cinematic_Sound_Demixing_Colab.ipynb)) - it used to output fake stereo (at least the training dataset was mono).\n\n\"I noticed [it] doesn't do well and doesn't detect water sounds, and fire sounds\"\n\n*Can be used in UVR* (the three stems will be labelled wrong, SFX will be 
bass).\n\n>For UVR, download the [model files](https://github.com/ZFTurbo/MVSEP-CDX23-Cinematic-Sound-Demixing/releases/tag/v.1.0.0), put them in Ultimate Vocal Remover\\models\\Demucs\\_Models\\v3\\_v4\\_repo, delete \"97d170e1-\" from all three file names, and copy this [yaml](https://drive.google.com/file/d/15NKIgCAHGMgH53Bi94PkDRsjQtqyywen/view?usp=sharing) alongside the model files (it won’t work on AMD GPUs with 4GB VRAM).\n\n>The Colab might occasionally run on CPU, and then it can be so slow that a 15 min audio track takes 2.5h (try changing Google account or retrying); once it uses the GPU, a similar length might take 2 min.\n\n- jazzpear’s MDX23C model ([files](https://discord.com/channels/708579735583588363/708580573697933382/1180312079920529448)) - rename the config to .yaml, as the UVR GUI doesn't read .yml. Put the config in UVR’s models\\mdx\\_net\\_models\\model\\_data\\mdx\\_c\\_configs. When you then use the model in UVR, it'll ask you for params; point it to the newly placed config file.\n\n- Aufr33 Mel-Roformer denoise average “27.9768” model - dedicated to footsteps, crunches, rustling, and the sound of cars and helicopters\n\nIf it’s not available for paid users of [uvronline.app](https://uvronline.app), use [this](https://uvronline.app/ai?discordtest) link | MVSEP | model files:\n\n[Less aggressive](https://mega.nz/file/rIRQGJ4D#9SHaPIXt8GRoi2SL29WUILW0g9dk26I5njyFPZuPJQ8) & [More aggressive](https://mega.nz/file/vM4mHTYQ#f_uCxxS_olfTR4iAsOc-XS6sfUecfbF-ZKXrk3IjbnY) | [yaml file](https://drive.google.com/file/d/1uwInhwgjOMIdOMTgj_oNR_dmaq7E-b3g/view?usp=sharing) | UVR Roformer [patch](#_6y2plb943p9v) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb#scrollTo=GS-QezQ-RG64)\n\n- myxt.com (uses Audioshake)\n\n- [AudioSep](#_tvbntqdvkn9n) (you can try it to get e.g. 
bird SFX, and then use it as a source to debleed, or try inverting its phase to cancel it out)\n\n- Moises.AI (rumoured to be better than Bandit v2, but expensive; “Dialogue, Soundtrack, Effects”)\n\n- Older DNR model on MVSEP from ’22\n\nI think the most commonly used SFX models discussed on the server before the DnR v3 ones were the DNR Demucs 4 model and Bandit v2, but I haven't seen any consensus in the community on which model is the best, so it might simply depend on the song.\n\n- voc\\_ft - sometimes it can be better than the Demucs DNR model (although still not perfect)\n\n- [jazzpear94 model](https://www.dropbox.com/scl/fo/lcpknm3rvehxhryzcd6mb/h?rlkey=zpi8pnpda30d0n71tqckiocod&dl=0) (VR-arch) - put the .pth file in Ultimate Vocal Remover\\models\\VR\\_Models. When you first use it in UVR, set the config: 1band sr44100 hl 1024, stem name: SFX, do NOT check inverse stem in UVR5, 5.1 disabled. Or put that [file](https://drive.google.com/file/d/15YlbTv9CypsmSDC7l0s-HPOGJTVztFkL/view?usp=sharing) in the model\\_data subfolder.\n\n- ([dl](https://drive.google.com/drive/folders/12CnfpIph5Ipd9ocoD6RsbOWzWsCeWAeT?usp=share_link)) [source](https://discord.com/channels/708579735583588363/708580573697933382/1055221556223164487) by Forte (VR) (instrumental/1band\\_44100\\_hl1024 is probably the proper config). Might work in Colab: “I tried it with the SFX models, and I just uploaded them in the models folder and then placed the model name, and it processed them”, and may even work in UVR.\n\n- Or [GSEP](#_yy2jex1n5sq) (sometimes) esp. 
the new “Vocal Remover” model\n\n###### **Any other** stem/instrument/sample if not listed above\n\n- Zero Shot Audio Source Separation\n\n- Bytedance-USS (might be worse for instruments, but better for SFX)\n\n- Dango.ai custom stem separation (paid, free 10 second preview)\n\n- [Audiosep](https://github.com/Audio-AGI/AudioSep/tree/main) (separates anything you describe; the Colab has an unpickling issue)\n\n- *Spectral removers (software or VST):*\n\nQuick Quack MashTactic (VST), Peel (VST, reportedly a worse alternative to MT), Bitwig (DAW), RipX (app), iZotope Iris (VST/app), SpectraLayers (app, “Problem with RX [Editor's spectral editing] is it doesn't support working in layers non-destructively.”), R-Mix (old 32 bit 2010 Sonar plugin), free [ISSE](https://isse.sourceforge.net/download.html) (app, [showcase](https://youtu.be/Rd3prIkO5bg)), [FactorSynth](#_cz4j2d3uf48s), Zplane Copycat \"but MashTactic also has a dynamics parameter that is really useful (you can isolate attack from longer sounds, or the opposite, coupled with the stereo placement and EQ isolation)\"\n\nRipX is “not as good as UVR5 for actual separation, but RipX is very good if you need to edit what's already separated more musically. SpectraLayers is a nicer spectral editor, RipX spectral editor is not as usable”\n\nConsecutive multi-AI separation for instruments not listed above\n\n- Extract all other instruments \"one by one\" using other models in a chain (e.g. remove vocals with voc\\_ft, or nowadays e.g. a Kim Mel inst derivative; use what's left to remove drums/bass with htdemucs\\_ft/MDX23/MVSEP ensemble; use what's left to remove guitars/piano with GSEP/demucs\\_6s, or nowadays any better dedicated model; then use what's left to remove e.g. 
wind instruments (if present) with the UVR wind model, or any other applicable dedicated model, even SFX, until you're left with the instrument of your choice, or as few instruments as possible, for potentially easier work with a spectral editor listed above)\n\n- [Drumsep](#_m55fp5i7rdpm) - “Using DrumSep on melodic stems can help separate instruments easier if you plan on sampling/editing them, but they are separated based on range rather than actual instrument. Low instruments will often be on the kick/tom stems, mid instruments will be on snare and/or tom, and higher instruments will be on the cymbals.”\n\n###### **De-reverb**\n\n- Mel-Roformer de-reverb by anvuew v2 (a.k.a. 19.1729 SDR) | [DL](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt) | [config](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb). Probably the best de-reverb for now (and RX11's dialogue isolate for de-echo).\n\n“anvuew's models can remove reverb effect only from vocals”, the model “captures early reflections a little”. “Old FoxJoy's model works with full track.”\n\n“sort of reminds of RX11 dialogue dereverb results but doesn't destroy singing voices”\n“perfect for rvc”\nBoth the BS and Mel variants “will also remove harmonies or vocal effects that are not in the center channel.”\n“it works sort of like [that](#_3c6n9m7vjxul) phantom center model, removing sides basically”\nSometimes the \"noreverb\" stem may come out empty (e.g. on MVSEP, though a similar issue was already fixed there once).\n\n“reminds me of the equivalent dereverb mdx model (...) 
cleaner in some ways, though slightly more filtered and aggressive.”\n\n“it's EXTREMELY aggressive, like very aggressive, it seems kinda muddy at a lot of parts, almost NO reverb bleed, it also caught so many effects and removed them (good thing) which is actually insane!! I also noticed that when it gets breathy or like it has falsetto, it seems to remove a lot of it, it's very weird at the breathy-ish parts of it lol, will be using this mainly if there are heavy vocal effects I want removing” - isling\nIn fact, it was “fine-tuned from kim's mel” - anvuew.\n\nTo make it work with UVR, delete “linear\\_transformer\\_depth: 0” from the YAML file, copy the model to MDX\\_Net\\_Models and the config to model\\_data\\mdx\\_c\\_configs.\n\n- Dango Reverb Remover - [click](https://tuanziai.com/en-US/de-reverb) (“it's very similar to [RX11] dialogue isolate good/real-time set to 5. Yeah, it's like listening to the same inference files” - John; probably also works in mono; you can get 30 seconds for free), but for other purposes, even FoxyJoy’s older models can give better results\n\n- Anvuew BS-Roformer dereverb 22.5050 | [DL](https://huggingface.co/anvuew/dereverb_bs_roformer/tree/main) | uvronline\n\nNew 2026 model\n\n“very interesting model, sounds like the mel-roformer one but even more aggressive, it's good” it works on “reverb effect on vocals” at least - isling\n\n“sounds cleaner than the `mono sdr 20.4029` one” - rainboomdash\n\n“the dataset is the same as mel, all reverb is generated by VST plugins and includes waves IR1 presets. 
so it depends on whether you consider IR1 to be room reverb” - anvuew\n\n###### - anvuew BS-Roformer Dereverb Room [model](https://huggingface.co/anvuew/dereverb_room) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | MVSEP (doesn’t work in UVR, use [MSST](#_2y2nycmmf53); if you get stereo errors using MSST on stereo files, update MSST [git clone and git pull commands] or apply the fix below)\n\n###### “specifically for mono vocal room reverb”, as most vocals are recorded in mono.\n\nInference isn't that long compared to other Roformers.\n\n“Really liking the fullness in the noreverb stem. Virtually all dereverb roformers I've tried sound muddy, but this one is just the opposite. (...) Other noises may interfere, and in my experience, makes the model underestimate the reverb. [The previous anvuew’s mono model] is way different [from] this one in every way. So, like I say, worth a shot.” - Musicalman\n\nTo fix the stereo error with that model, do the following. It might also work with your current MSST version instead of the linked repo, but the code will be at a different line.\n\n“Edit inference.py from my [repo](https://github.com/jarredou/Music-Source-Separation-Training/tree/colab-inference) line 59” - jarredou:\n\nReplace:\n\n```python\n# Convert mono to stereo if needed\nif len(mix.shape) == 1:\n    mix = np.stack([mix, mix], axis=0)\n```\n\nwith:\n\n```python\n# If mono audio we must adjust it depending on model\nif len(mix.shape) == 1:\n    mix = np.expand_dims(mix, axis=0)\n    if 'num_channels' in config.audio:\n        if config.audio['num_channels'] == 2:\n            print('Convert mono track to stereo...')\n            mix = np.concatenate([mix, mix], axis=0)\n```\n\n- anvuew [dereverb\\_mel\\_band\\_roformer\\_mono\\_anvuew\\_sdr\\_20.4029](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt) model | 
[yaml](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew.yaml) | x-minus\n“supports mono, but ability to remove bleed and BV is decreased”; “separates reverb better than v2”\n\n- Sucial Mel-Roformer dereverb/echo model #3 called “fused”: [model](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt) | [yaml](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/config_dereverb_echo_mbr_v2.yaml)\n“more effective in removing large reverb”\n\n“Specifically targeting large reverb removal. After training, I combined these two models with my v2 model through a blending process, to better handle all scenarios. At this stage, I am still unsure whether my new models outperform the anvuew's v2 model overall [besides large reverbs].” [More](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer)\n\n- anvuew v2 “less aggressive” variant - slightly lower SDR (18.81) | [DL](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt?download=true) | [config](https://huggingface.co/anvuew/dereverb_mel_band_roformer/blob/main/dereverb_mel_band_roformer_anvuew.yaml)\n\n- Gabox Lead Vocal Mel-Roformer de-reverb | [DL](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/experimental/Lead_VocalDereverb.ckpt) | [config](https://huggingface.co/GaboxR67/MelBandRoformers/blob/main/melbandroformers/karaoke/karaokegabox_1750911344.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n“just use it on the mixture” - Gabox\n\n“I’ve had great results on vocals with heavy delay” - 5b\n\n*Older VR* [*models*](https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models)\n\n(added to UVR 5 (GUI), works in 
NotEddy’s [Colab](#_wbc0pja7faof))\n\n- UVR-DeEcho-DeReverb (213 MB) - “it removes reverb not echo”, but you could try it if everything above fails\n“use an aggression of 3.0 -5.0 and nothing more than that.”, e.g. 4 (0.4 in some CLI code Colabs).\n“Results have a frequency ceiling around 17540 Hz and a very high pitched noise above 22000 Hz, you might want to upscale your results with HQNizer or Apollo Model” - [more](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07). Alternatively, use [point #4](#_929g1wjjaxz7) to slow the audio down before separation and restore its original speed afterwards.\n\n\\_\\_\n\n(Below are the old ones, which might only work with tsurumeso’s vocal-remover 5.0.2 with its default arch settings [maybe 1band\\_sr44100\\_hl1024 or [512](https://github.com/Anjok07/ultimatevocalremovergui/blob/master/lib_v5/vr_network/modelparams/1band_sr44100_hl512_nf1024.json)?] and his [nets and layers](https://github.com/tsurumeso/vocal-remover/tree/develop/lib).)\n\n- VR dereverb - only works on tracks with stereo reverb (j48qny.pth, 56.5 MB) ([dl](https://files.catbox.moe/7cnm62.pth)) ([source](https://discord.com/channels/708579735583588363/708580573697933382/1006301791354376233))\n\n- VR reverb and echo removal model (j48qny.pth, 56.5 MB, works with mono/stereo) ([dl](https://files.catbox.moe/j48qny.pth))\n\nMDX models (less aggressive than those at the top)\n\n“I use it when there's not so much reverb, but if it's more intense I will choose VR-Arch DeEcho”\n\n> FoxyJoy's dereverb V2 - works only with stereo (available in UVR's Download Center and Colab, or alternatively via [this](https://www.mediafire.com/file/spgazgyafo1pmrv/Reverb_HQ_By_FoxJoy.onnx/file) DL link; it can spoil singing in acapellas or sometimes remove delay too). “I do think [that] MDX is noticeably more accurate [vs VR DeEcho-DeReverb]”\n\n\"(the model is also on X-minus) Note that this model works differently from the UVR GUI. 
I use the rate change (but unlike Soprano mode only by 1 semitone). This extends the frequency response and shifts the MDX noise to a higher frequency range.\" This is not simply 11/12 of the speed (×0.917): one semitone corresponds to a factor of 2^(1/12) ≈ 1.059, which matches the 106% stretch described below:\n\n(Anjok)\n\n\"The input audio is stretched to 106%, and lowered by 1 semitone using resampling. After AI processing, the speed and pitch of the result are restored.\"\n\nThe slowing-down method is explained further in \"tips to enhance separation\".\n\n\"De-echo is superior to de-reverb in every way in my experience\"\n\n“VR DeEcho DeReverb model removes both echo and reverb and can also remove mono reverb while MDX reverb model can only remove stereo reverb”\n\n\"You have to switch [main stem/pair] to other/no other instead of vocal/inst\" in order to ensemble de-echo and de-reverb models.\n\n*Newer*\n\n- UVR Dereverb model by Aufr33 & jarredou for uvronline.app premium | [Model files](https://mega.nz/file/CFRBHLRK#uhRexQFJVo8_Owr8x9sEEohDcCNZbl3UgeX5eyD7IFA) | [settings](https://imgur.com/a/dZAJwef) (PS: Dry, Bal: 0, VR 5.1, Out: 32/128, Param: 4band\\_v4\\_ms\\_fulband)\n\nCopy the model file to Ultimate Vocal Remover\\models\\VR\\_Models; the JSON config is probably already present in lib\\_v5\\vr\\_network\\modelparams (with the same checksum).\n\n- MDX23C UVR Dereverb model for uvronline.app premium | [Model files](https://a19p.uvronline.app/public/dereverb_mdx23c_sdr_6.9096.ckpt) | [config](https://drive.google.com/file/d/1dQHfce4VKYSmWZ3IgIj4SZq_uTicD4At/view?usp=sharing)\n\n(by Aufr33 & jarredou)\n\nSeems to pick up room reverb. 
The previous FoxyJoy model sometimes cut “way too much” compared to this model.\n\nCopy the model file to Ultimate Vocal Remover\\models\\MDX\\_Net\\_Models and the YAML config to the \\model\\_data\\mdx\\_c\\_configs subfolder.\n\nBas Curtiz’ conclusion on both:\n\n- MDX23C “seems to be cleaner, takes the reverb away, also between the words, whereas (U)VR leaves a little reverb”\n\n- VR “seems to sound more natural, maybe therefore actually.”\n\n- MDX23C tends to 'pinch' some stuff away to the background, which sounds unnatural.\n\n“This is just based on my experience with 3 songs/comparisons, but both points are a pattern.\n\nOverall, they're both great when u compare them against the original reverbed/untouched vocals.” Showcase [video](https://discord.com/channels/708579735583588363/708580573697933382/1279223128848863338).\n\n- You can find older anvuew Mel versions [here](https://huggingface.co/anvuew/dereverb_mel_band_roformer/tree/main/archive%20only)\n\n- BS-Roformer anvuew variant (a.k.a. 8/256/8) | [DL](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2229279531) - slightly higher SDR than Mel v1 (8/26/6), first posted in the ZFTurbo repo.\n“not good is all people say”, but it might depend on the use case or song.\n\nMel-Roformer might turn out to be better more often. “works weirdly and leaves some echo for some reason”, “very usable for single singing voices and speech, cus it's very precise in eliminating echo and reverb, but if you have a choir singing or vocals with backing vocals in it, then it'll probably ruin it a bit. 
For such vocals it's better to use aufr33/jarredou model or dereverb deecho” - John UVR\n\nTo fix issues with the BS variant of the model in UVR, “change stft\\_hop\\_length: 512 to stft\\_hop\\_length: 441 so it matches the hop\\_length above” in the YAML file (thx lew), and also delete the linear\\_transformer line in the config, as described above.\n\n- 8 384 dim 10 depth BS variant | [dl](https://huggingface.co/anvuew/deverb_bs_roformer/blob/main/deverb_bs_roformer_8_384dim_10depth.ckpt) | [config](https://huggingface.co/anvuew/deverb_bs_roformer/blob/main/deverb_bs_roformer_8_384dim_10depth.yaml)\n\n- #2 Sucial Mel-Roformer dereverb/echo model ([model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2493431454) | MVSEP).\nFine-tuned with more training data.\n\n- #1 Sucial Mel-Roformer dereverb/echo model ([model](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2534381169) | MVSEP).\nIt’s good, but doesn’t seem to be better than anvuew's Mel v2 model above ([here](#_kcswx79hi856)).\nStill, it might depend on the use case.\n\n- older V1 de-reverb HQ MDX model by FoxyJoy ([dl](https://pixeldrain.com/u/UWn7d2iH)) ([source](https://discord.com/channels/708579735583588363/872995262224818187/1062492523689418793)) (also decent results, but most likely worse).\n\n(“It uses the default older architecture with the fft size of 6144”\n\n“After separation, UVR cuts off the frequencies at 15 kHz, so I found that to fix that is to invert the \"Vocals\" and mix that with the original audio file.”\n\nDemonstration: [Original](https://krakenfiles.com/view/CVqVzcS3PO/file.html) | [Dereverbed](https://krakenfiles.com/view/qKFcriiLLy/file.html) | [Detected reverb](https://krakenfiles.com/view/XRwAcmYvMj/file.html))\n\n- To enhance the result if necessary, you can chain more models to dereverb vocals, e.g.:\n\nDemucs + karaoke model + De-reverb HQ (by FoxyJoy)\n\n\"works wonders on some of this stuff\".\n\n“Originally I inverted 
with instrumentals then I ran through deecho dereverb at 10 aggression then demucs\\_ft then kim vocal 2 then uvr 6\\_ at 10 aggression and finally deecho normal” (isling)\n\n- For room reverb, check out Reverb HQ, then the De-echo models (J2).\n\n“from my experience, De-Reverb HQ specifically only really works when the sound is panned in the center of the stereo field perfectly with no phase differences or effects or anything that could cause the sound to be out of phase in certain frequencies.\n\nIf the sound doesn't fit that criteria, it only accurately produces the output of whatever’s in the mid”\n\n“I noticed that in some cases the DeEcho normal worked better than the aggressive, which was weird. That's why I ran through both, so to remove as much as possible.”\n\n- For removing reverb bleed left over in the left and right channels of a 5.1 mix from TV shows/movies, check out Melband Roformer on MVSEP.\n\n- <https://twoshot.app/model/36> (AI, paid)\n\n###### Free apps/VSTs for de-reverb/de-echo/denoise\n\n- Accusonus ERA (was good, but was discontinued when Facebook bought the company; it can be found on archive.org from when they gave it away without DRM)\n\n- [Voicefixer](https://github.com/haoheliu/voicefixer_main) (CLI, only for voice, [online](https://huggingface.co/spaces/akhaliq/VoiceFixer))\n\n- [RemFX](https://github.com/mhrice/RemFx) (de: chorus, delay, distortion, dynamic range compression, and reverb or custom)\n\n- [Reverser](https://apmastering.com/plugins/reverser) (specifically for atanh distortion when level matched, but occasionally worked for other types too)\n\n- [Noise Suppression for Voice](https://github.com/werman/noise-suppression-for-voice/releases) (a.k.a. RNNoise, worse, various plugin types, available in OBS; now also RRNoise 0.2/1.10 available)\n\n- [Krisp](https://krisp.ai/) app (paid, 60 free minutes per day), better (same for RTX Voice); free on Discord\n\n- [Adobe Podcast](https://podcast.adobe.com/) (online, a.k.a. 
Adobe Podcast Enhance Speech, only for narration, changes the tone of voice, so you might want to use only its frequencies above 16 kHz)\n\n- [AI-Coustics](https://ai-coustics.com/) (speech enhancement, 30 minutes/5 files for free per month)\n\n- [CrystalSound.AI](https://crystalsound.ai/) (app)\n\n- [Noise Blocker](https://closedlooplabs.com/) (paid, 60 minutes free per day)\n\n- [Steelseries GG](https://steelseries.com/gg) (app, classic noise gate with EQ and an optional paid AI module; voice activation in a noisy environment may not always work correctly)\n\n- RTX Voice (in NVIDIA Broadcast app, currently for any GTX or RTX GPU)\n\n- AMD Noise Suppression (for RX 6000 series cards, or for older ones using unofficial Amernime Drivers)\n\n- Elgato Wave Link 3.0 - Voice Focus feature (now free for everyone; the standalone VST3/AU version costs $50)\n\n- [AI SWB Noise Suppression](https://meeamitech.com/ai-swb-noise-suppression/) (free; currently the Mac/Windows driver is given away only on request by email)\n\n- Audio Magic Eraser shipped with new Google Pixel phones (separate [options](https://media.discordapp.net/attachments/708579735583588366/1159137836939882578/image.png?ex=651ecabc&is=651d793c&hm=02fab4a1c15a2090ac6cbd5e35aaae46b15854b228795b77b4b7df3463ca060b&) for cancellation of: noise, wind, crowd, speech, music)\n\n*The best paid de-reverb plugins for vocal tracks/stems/separations:*\n\n- RX 11 Dialogue Isolate (RX Editor/VST, paid) - some people like it more than DeVerberate 3. The RX Advanced variant additionally offers “multi-band processing and a high-quality mode as an offline process”. A good companion for de-echo, along with the anvuew v2 model for dereverb\n\n- DeVerberate 3 by Acon Digital (in one comparison, someone said it might be even better than RX10). \"I find it's useful to take the reverb only track and unreverbed track and mix them to a nice level\" “Acon is probably best if you can tweak to each stem separated. 
RX is imo too rough.” [Comparison](https://discord.com/channels/708579735583588363/708579735583588366/1171587840837160980)\n\n- Accentize DeRoom Pro (\"great\" but expensive, available in DxRevive Pro, now 1.1.0)\n\n- prime:vocal (multitool that also includes dereverb and other vocal enhancers)\n\n- DxRevive Pro 1.1.0 - complete dialogue restoration tool: noise removal, reverb suppression, restoration of absent frequencies, elimination of codec artifacts\n\n- Izotope RX <?8-10 Dialogue De-Reverb (RX Editor/VST) for voice and mixtures\n\n(more possible free [solutions](https://discord.com/channels/708579735583588363/814405660325969942/1085297801749074051)). Good results not only for room reflections, but also for regular reverb in vocals. It picks up reverb where even FoxyJoy's model fails (“De-reverb” and “Dialogue de-reverb” options). It’s destructive for mixing raw vocals, but can just work.\n\n- Clear by Supertone (previously known as Supertone Voice Clarity and the defunct free GOYO.AI): “equally good compared to RX10 imho. Smoother imho. It's only good on vocals though”. A simple 3-knob plugin, “the cleverest / least-manual to get good results and is AI-based.” Also destructive for mixing raw vocals, but can just work.\n\n- Waves Clarity Vx DeReverb (paid; models updated in the 12/17/2023 build) - simpler than RX, but it cannot perform de-echoing, so you need UVR De-echo (17.7 kHz cutoff) or RX Dialogue Isolate for that. Same caveat as above for mixing raw vocals. 
Maybe you could even mix the two plugins using less aggressive settings in both.\n\nOthers:\n\n- SPL De-Verb Plus\n\n- Audio Damage Deverb\n\n- Zynaptiq UnVeil\n\n- Zynaptiq Intensity\n\n- Thimeo Stereo Tool (one of its modules; not as aggressive/capable as Silk Vocal; can be used on the mix bus)\n\n- Acon Dialogue:Extract 2 (voice dereverb, denoise)\n\n- Cedar StageVox (voice dereverb and denoiser capable of working in real time for tracking)\n\n- Cedar VoicEX 2 (voice dereverb and denoiser)\n\n- Waves Silk Vocal (voice; has an optional Live version too)\n\n- Noiseworks VoiceAssist (doesn't give very dry vocals, but it also has a high-frequency recovery option for use after de-reverb; not always free of artefacts, but very decent)\n\nIf you want to use some of these DAW plugins on your microphone in real time, you can use Equalizer APO.\n\nGo to \"Recording devices\" -> \"Recording\" -> \"Properties\" of the target mic -> \"Advanced\".\n\nTo enable a plugin in Equalizer APO, select \"Plugins\" -> \"VST Plugin\" and specify the plugin DLL. You need an x64 VST2 plugin for the x64 version of the app. VST3 is unsupported, but you can still load VST3 plugins through a separate chainer/wrapper/adapter that exposes them as VST2.\n\nAlternatively, to run a plugin on a microphone in a simple standalone app and send the result to any output device, download [savihost3x64](https://www.hermannseib.com/english/savihost.htm), rename the downloaded exe to the name of the plugin you want to use, place it next to the plugin, and run it. Then go to the settings and set the input and output devices (the output can be a virtual audio card, though that may not be necessary). Unlike Equalizer APO (iirc), it supports VST3 plugins too. Of course, you can also use DAWs for the same purpose (Reaper, Cakewalk etc. 
- but not Audacity, iirc)\n\nMore free tools:\n\n<https://github.com/FORARTfe/HyMPS/blob/main/Audio/AI-Enhancing.md#dereverbers->\n\n**De-echo**\n\n- UVR-De-Echo-Aggressive (121 MB)\n\n- UVR-De-Echo-Normal (121 MB)\n\n- UVR-DeEcho-DeReverb (213 MB)\n\n(now added to UVR and MVSEP; they won't be in Colab for now, but the first two are on [HuggingFace](https://huggingface.co/spaces/r3gm/Ultimate-Vocal-Remover-WebUI))\n\n- [delay\\_v2\\_nf2048\\_hl512.pth](https://pixeldrain.com/u/1Wck1P78) (by FoxyJoy, all VR arch, [source](https://discord.com/channels/708579735583588363/872995262224818187/1062484995358347335), can't remember if it was one of the above), decent results.\n\n“works in UVR 5 too. Just need to select the 1band\\_sr44100\\_hl512.json when the GUI asks for the parameters”\n\n“You [also] can use this command to run it: `python inference.py -P models\\delay_v2_nf2048_hl512.pth --n_fft 2048 --hop_length 512 --input audio.wav --tta --gpu 0`”\n\nThey’re also on X-Minus now:\n\n“The \"minimum\" and \"average\" aggressiveness settings use the Normal version of the model. 
The Aggressive one is used only at the \"maximum\" aggressiveness.”\n\n“What's crazy is maximum aggressiveness sometimes does better at removing bgvox than actual karaoke models”\n\n###### **De-noising (vinyl noise/white noise/general)**\n\n*Faster/not so efficient*\n\n- Denoise standard in UVR, mainly for MDX v2 noise (like in HQ\\_1-5; iirc it uses HV's [code](https://discord.com/channels/708579735583588363/887455924845944873/1021652469320781834)). How it works: it separates “twice, with the second try inverted, after separation reinverted, to amplify the result, but remove the noise introduced by MDX, and then deamplified by 6dB, so it still the same volume, just without MDX noise.” In other words, the wanted signal follows the input polarity and adds up in phase, while noise that doesn't follow the polarity cancels out, and the 6 dB reduction restores the original level.\n\n- Denoise model in UVR (it uses VR’s UVR-DeNoise-Lite, 20 kHz cutoff)\n\n(Options>Choose Advanced Menu>Advanced MDX-Net Options>Denoise output)\n\nA model dedicated to filtering the noise present in silent or quiet parts of almost all MDX-Net v2 model outputs, but potentially useful for other applications too\n\n- Min Spec ensemble of the *denoise model* and *denoise disabled* results from Advanced MDX-Net Options (Audio Tools>Manual ensemble>Min Spec)\n\nIt filters more MDX noise in quieter parts than both the standard denoise and the dedicated denoise model option in UVR, because Min Spec keeps the quieter of the two results in every spectrogram bin.\n\n*Optimal, all-rounder, slower*\n\n- Gabox [denoise/debleed](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/denoisedebleed.ckpt) model | [yaml](https://huggingface.co/GaboxR67/MelBandRoformers/resolve/main/melbandroformers/instrumental/inst_gabox.yaml) | [Colab](https://colab.research.google.com/drive/1U28JyleuFEW6cNxQO_CRe0B2FbNoiEet).\n\nFor noise from fullness models (tested on v5n), try denoising the mixture first, then use the fullness model (the denoise model can't remove the vocal residues).\n\n“It can preserve slightly more high frequency content in speech [than Aufr33 Mel model]” - Musicalman. “quite a bit slower”. 
It has an impressive “ability to clean up noisy vinyl or cassettes” (padybu); “from my testing it's been better than aufr33's [Mel]” (pipedream, \\_0nshuk)\n\n- Mel-Roformer Denoise by Aufr33 | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | MVSEP | links below\n\na) minimum aggressiveness model called “27.9959”, a.k.a. “1”\n\n- good for white noise/static noise\n\nb) average aggressiveness model “27.9768”, a.k.a. model “2” or “aggressive”\n- works for footsteps, crunches, rustling, the sound of cars and helicopters, VHS rips\n\nSome people like to use overlap 10 with these models.\n\nminimum - removes fewer effects such as thunder rolls, scratching or sweeping surfaces. It’s not as good at removing louder MDX noise when using an AMD GPU instead of the CPU with the older system DirectML.dll in UVR\n\naverage a.k.a. aggressive - usually removes more noise than minimum, and occasionally also slight room reverb/echo in vocals\n\n“The Mel-RoFormer denoise model is amazing at removing 78 RPM record crackle”\n\nIt’s much better at higher frequencies than the VR model below (it doesn’t damage them as badly; it's less destructive, but also less aggressive).\n\n~“Incredibly useful as mixing tools, it can pull all kinds of hum out of raw vocals, guitars, room mic's, bass, etc. 
before mixing with zero artifacts left over”.\n\nIf it’s not available for paid users of [uvronline.app](https://uvronline.app), use [this](https://uvronline.app/ai?discordtest) link | model files:\n\n[Less aggressive](https://mega.nz/file/rIRQGJ4D#9SHaPIXt8GRoi2SL29WUILW0g9dk26I5njyFPZuPJQ8) & [More aggressive](https://mega.nz/file/vM4mHTYQ#f_uCxxS_olfTR4iAsOc-XS6sfUecfbF-ZKXrk3IjbnY) | [yaml file](https://drive.google.com/file/d/1uwInhwgjOMIdOMTgj_oNR_dmaq7E-b3g/view?usp=sharing) | works with UVR Roformer [patch](#_6y2plb943p9v) | [MSST](#_2y2nycmmf53)\n\nFor UVR, use the Install model option in MDX-Net, or copy the ckpt files to the models\\MDX-Net folder and the YAML to the model\\_data\\mdx\\_c\\_configs subfolder. Choose the new model, press Yes to set the parameters, enable the Roformer option, and pick the config file matching the copied YAML name. If you get a “use\\_amp” error (e.g. in MSST), add “use\\_amp: true” to the YAML, below the optimizer and other\\_fix lines.\n\n“From most aggressive to least:\n\nVR Denoise\n\nVR Denoise Lite\n\n[Aufr’s] Mel-Rofo Denoise Aggr(essive)\n\n[Aufr’s] Mel-Rofo Denoise”\n\n- Bas Curtiz\n\nThat said, RX11 Spectral Denoise can be more aggressive than all of them at certain settings, and RX12 might be even better in certain cases.\n\n- Apollo Lew Uni model - tends to smooth out even some consistent noise, e.g. in higher frequencies, making the spectrum more even there\n\n- UVR De-Noise by aufr33 “minimum aggressiveness” on [*x-minus.pro/uvronline.app*](http://x-minus.pro/uvronline.app) (for premium or using [this](https://uvronline.app/ai?discordtest&test-mdx) link)\n\n(less aggressive than the denoise model in UVR;\n“The (...) model is designed mainly to remove hiss, such as preamp noise. For vocals that have pops or clipping crackles or other audio irregularities, use the old denoise model”. 
It grabs “sound effects in old recordings” (radio drama) and might make “soft voices sound weak”).\n\n- UVR De-Noise by aufr33 “medium aggressiveness” on x-minus (same as the default for free users) - it seems to be even less aggressive than UVR-DeNoise-Lite in UVR\n\n- Mel-Roformer De-Crowd by Aufr33/viperx (x-minus.pro/UVR/[DL](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt)/[yaml](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/download/v.1.0.4/model_mel_band_roformer_crowd.yaml))\n\n(used “to remove background noise when denoise models were failing [not sure if it was rain or wind]”; it can also remove vinyl noises, and at times it is more effective than the MDX model below)\n\nFor UVR, rename the model to the name from the attached YAML, copy the ckpt to models\\MDX\\_Net\\_Models and the YAML to the model\\_data subfolder, then set overlap to 2, or use the ZFTurbo inference [script](https://github.com/ZFTurbo/Music-Source-Separation-Training).\n\n- yxlllc’s harmonic noise separation VR [model](https://github.com/yxlllc/vocal-remover/releases/tag/hnsep_240512) (can be used in UVR: rename “model” to any model name and change the .pt extension to .pth, then use the Install model option and set the config to: VR 5.1, 32/128, 1band\\_sr44100\\_hl512; “very good at further remove the noise from a dereverbed vocal yet it is mono. (...) it did have two channels on export but both of them have audible information except one has only some noise that can be easily removed then mix the other channel up to stereo” - mohammedmehditber. 
Maybe the attached CLI code handles the mono model better)\n\n- Rifforge by mesk [model](https://huggingface.co/meskvlla33/rifforge/tree/main) final | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (might also serve as a denoiser, because it can pick up the original mixture noise and filter it out along with the vocals)\n\n*Vocal models as denoisers*\n\n- Unwa BigBeta 5e and 6 (5e “good when your mic/pc makes a lot of noise. All the denoise models are a bit too harsh for ASMR” - giliaan; both “for denoising a conversation, was better than: UVR Denoise, MDX23cInstVocHQ, HQ5, KimVocal2, VocFT, Apollo, MelRof-aufr33-Denoise, GaboxDenoise, BanditV2cinematic, ViperX-BSrof-1297” - mixamillion) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) | MVSEP | [MSST-GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) | [UVR instruction](#_6y2plb943p9v) | [Model](https://huggingface.co/pcunwa/Mel-Band-Roformer-big/tree/main) | yaml: big\\_beta5e.yaml | [fixed](https://drive.google.com/file/d/1YRv1j0zMs9hk3-On2z6uwfZbsQ7l1LFP/view?usp=sharing) yaml for AttributeError in UVR\n\n- BS-Roformer viperx 1296 / MVSEP BS-Roformer 04.24 / Gabox BS\\_ResurrectioN (denoising and derumbling work the most efficiently on vocals here too) | x-minus/[Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)/UVR/MVSEP\n\n- Kim Mel-Roformer (works well for denoising and debleeding vocals)\n\n- 
Vocal models like Voc\\_FT or Unwa’s ft2 bleedless (“it can sometimes isolate the vocals without the noise. And has a better result than a normal denoiser model” - Kashi)\n\n- Mel-Roformer Karaoke (by aufr33 & viperx) (to remove noise from a dialogue, mostly rustling in the background)\n\n[x-minus.pro](http://x-minus.pro/) / [uvronline.app](http://uvronline.app) / [mvsep](https://mvsep.com/)\n\n[model file](https://mega.nz/file/qQA1XTrb#LUNCfUMUwg4m4LZeicQwq_VdKSq9IQN34l0E1bb0fz4) (UVR [instruction](#_6y2plb943p9v))\n\n- Mel-Roformer Duality model (excellent for pops and clicks, to get clean vocals out of a 45 RPM vinyl mixture - bratmix)\n\n*Other tools*\n\n- [resemble-enhance](https://github.com/resemble-ai/resemble-enhance) (available on x-minus, but only as a denoiser for voice/vocals, and on [HuggingFace](https://huggingface.co/spaces/ResembleAI/resemble-enhance) and its [site](https://www.resemble.ai/enhance/); works well for wind/outside noise)\n\n- [tape.it/denoiser](https://tape.it/denoiser) (“great tool for removing tape hiss. 
Seems to be free without limitation at this point in time, though it seems to have issues with very large files [20 mins etc]”)\n\n- [crowdunmix.org/try-rokuon/](https://crowdunmix.org/try-rokuon/)\n\n- [github.com/eloimoliner/denoising-historical-recordings](https://github.com/eloimoliner/denoising-historical-recordings) (mono, old 78 RPM vinyls, fixed [Colab](https://colab.research.google.com/drive/1KjlQYq5DFH0BhHSIYGaKg-syLafw6yf2); sometimes deletes SFX, but not as much as UVR De-Noise by aufr33 does in old recordings)\n\n- [audo.ai](https://audo.ai/)\n\n- [github.com/sp-uhh/avgen](https://github.com/sp-uhh/avgen)\n\n- [github.com/Rikorose/DeepFilterNet](https://github.com/Rikorose/DeepFilterNet) | [Huggingface](https://huggingface.co/spaces/hshr/DeepFilterNet2) (for speech)\n\n- [studio.gaudiolab.io](https://studio.gaudiolab.io) (new Noise Reduction feature)\n\n- possibly [USS-Bytedance](#_4svuy3bzvi1t) (when a similar sample is provided)\n\n- [Various AI tools](https://github.com/FORARTfe/HyMPS/blob/main/Audio/AI-Enhancing.md#denoisers-) - list by FORARTfe/HyMPS | [#2](https://github.com/FORARTfe/HyMPS/blob/main/Audio/Treatments.md#noise-reducing-)\n\n- [Free apps](https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?pli=1#heading=h.70231k4ydkfw)\n\n*Older models*\n\n- [UVR-DeNoise](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-DeNoise.pth) (trained by FoxJoy) - DeNoise-Lite above is less aggressive\n\nYou can use negative values in UVR for that model: -20/-25 for cleaning vocals, -10/-15 when some vocals are gone - Gabox\n\n“It's decent, but it needs a little work compared to” RX 10 Spectral De-noise.\n\n- voc\\_ft - works as a good denoiser for old vocal recordings\n\n- GSEP 4-6 stem (\"noise reduction is too damn good. 
It's on by default, but it's the best I've heard; every other noise reduction algorithm makes the overall sound mushier\"; it’s also good when GSEP gives overly noisy instrumentals with the 2-stem option, and it can even cancel some louder vocal residues completely)\n\n- UVR-MDX-NET Crowd HQ 1 (UVR/x-minus)\n\n- [This](https://imgur.com/a/B0kOY8I) VR ensemble in [Colab](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl) (for creaking sounds; process your separation output several times until you get there)\n\n##### *Plugins* *(different types of noise)*\n\nFree\n\n- Guide for classic denoiser tools in a DAW, e.g. for debleeding (Bas Curtiz): <https://docs.google.com/spreadsheets/d/1XIbyHwzTrbs6LbShEO-MeC36Z2scu-7qjLb-NiVt09I/edit?usp=sharing>\n\n- [Bertom Denoiser Classic](https://bertomaudio.com/denoiser-classic.html) (or paid [Pro](https://bertomaudio.com/denoiser-pro.html))\n\n- [Accusonus ERA 6](https://archive.org/details/era-bundle-v-6.2.00-voice-changer-v-1.3.10) (released for free after the FB acquisition) - a bundle that also includes a de-esser, voice auto-EQ, voice leveller (better than soothe2 for de-essing for some people), deplosive, declipper and more\n\n- Airwindows DeNoise - multiband noise gate with various controllable bands\n\nPaid\n\n- Izotope RX 10 Spectral De-noise (“I think RX 10's Spectral De-noise is better at removing the noise MDX [model] makes”)\n\n“Actually, the new UVR De-noise model is really good when you combine it with RX 10's Spectral De Noise”. RX 10 Spectral De-noise is better than Lab 4 and current models, and also more tweakable, but takes more time to set up (RX 11 is now also available and should be a step forward)\n\n- Acon Restoration Suite 2’s DeNoise “is decent if you can build a good noise profile with the Learn option, I like to have a few in series set to do -3dB of NR.” - theophilus3711\n\n- SOUND FORGE Audio Cleaning Lab 4 (formerly Magix Audio & Music Lab Premium 22 [2016/2017] or MAGIX Video Sound Cleaning Lab - basically the same stock plugin across 
all of these versions)\n\n- Unchirp VST (for musical noise and lossy compression artefacts)\n\n- Izotope Dialogue Dereverb (it is also a denoiser)\n\n- Izotope Dialogue Isolate in RX11\n\n- Waves Clarity Vx / Pro (designed mainly for vocals)\n\n- Brusfri by Klevgrand\n\n- prime:vocal (a multitool that also includes dereverb and other vocal enhancers)\n\n- DxRevive Pro (mainly for dialogue: denoiser, declipper, dereverb, enhancer, codec artefact removal)\n\n- Acon Dialogue:Extract 2 (dereverb, denoise)\n\n- Cedar StageVox (dereverb and denoiser capable of working in real time for tracking)\n\n- Waves Silk Vocal (also has an optional Live version)\n\nSee also [Debleeding/cleaning](#_tv0x7idkh1ua), e.g. inverts\n\n###### **Bird sounds**\n\n- [Google's bird\\_mixit](https://github.com/google-research/sound-separation/tree/master/models/bird_mixit) (code & checkpoint for their bird sound separation algorithm; [more](https://blog.research.google/2022/01/separating-birdsong-in-wild-for.html?m=1))\n\n[De-reverb](#_kcswx79hi856) models, e.g.:\n\n- UVR-DeEcho-DeReverb (doesn't work for all songs)\n\n[Vocal](#_n8ac32fhltgg) models, e.g.:\n\n- MVSEP BS-Roformer 2025.07 (if the birds end up in the vocal stem, as they do with most vocal models iirc, this may do the trick)\n\n[SFX](#_owqo9q2d774z) models\n\nZero-shot solutions:\n\n(you can try them to get e.g.
bird SFX and then use them as a source for debleeding, or try inverting the phase to cancel them out)\n\n- [AudioSep](#_tvbntqdvkn9n)\n\n- [USS-ByteDance](#_4svuy3bzvi1t) (if you can provide at least one proper sample)\n\n- [Zero Shot](#_g37f4a6hnxm0) (currently worse for SFX than ByteDance)\n\n- custom stem separation on Dango (paid, 10 seconds for free)\n\nTechnically, if bird noises end up in the vocals, then:\n\n- RTX Voice,\n\n- AMD Noise Suppression or even\n\n- Krisp and\n\n- Adobe Podcast\n\nmight get rid of them, but at least the last one changes the tone of the voice, and the others may work well only with speech rather than sung vocals.\n\nSpectral editing\n\n###### **De-clipping/de-limiting/de-compression of dynamics** (for loud or brickwalled songs with overused compression/clipping/limiting/distortion; recovery of transients/peaks)\n\nFree declipper plugins:\n\n- ReLife 1.42 by Terry West (works best on stereo tracks split into mono; newer versions are paid)\n\n- [ERA 6](https://archive.org/details/era-bundle-v-6.2.00-voice-changer-v-1.3.10) declipper (released for free in a bundle after Accusonus was bought by Meta)\n\n- Airwindows AQuickVoiceClip - mainly for streamers yelling into the microphone: “It’s not a ‘un-clipper’ but it tames the distortion a bit.”\n\nAI tools (not plugins):\n\n- [RemFX](#_713q0eyar6o3) (contains a model to get rid of distortion and compression; “mostly for singular sounds, it won't work for whole mixes like songs”)\n\n- [Rukai](https://crowdunmix.org/rukai/) (for speech and instrumentals)\n\n- [Amis](https://crowdunmix.org/try-amis/) (mainly for speech)\n\n- [stet-stet’s DDD](https://github.com/stet-stet/DDD) (speech, req.
decent CML knowledge to set up)\n\n- [jeonchangbin49’s De-limiter](https://github.com/jeonchangbin49/De-limiter) ([HF](https://huggingface.co/spaces/jeonchangbin49/De-limiter) | [Colab](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio/Other/De-Limiter.ipynb)) (“if you have any squished tracks that Apollo doesn't handle well, try passing it through that AI de-limiter first” - macularguide\n\nThe “parallel mix” parameter “define[s] how the normalized input and the de-limited inference will blend together. Where 0 is 100% normalized and 1 means 100% de-limited” - santilli\\_\nIn Colab, don’t provide the file name in the input field; leave just the directory as it was, and it will scan its contents)\n\n- Hendrix-ZT2 [pyaudiorestoration](https://github.com/HENDRIX-ZT2/pyaudiorestoration)’s Spectral Expander - for single-band compression or aggressive tape AGC ([more](https://github.com/HENDRIX-ZT2/pyaudiorestoration/wiki/Spectral-Expander))\n\n- [Neutone FX](https://neutone.ai/fx)>Clipper, actually an AI plugin ([instruction](https://qosmo.notion.site/Getting-started-607e99436b1243b5a6d273a76ee4811a))\n\nMore GH repos - [HyMPS](https://github.com/FORARTfe/HyMPS/blob/main/Audio/AI-Enhancing.md#declippers-) list | [#2](https://github.com/FORARTfe/HyMPS/blob/main/Audio/Treatments.md#declipping-)\n\nPaid: Ozone 12’s Delimiter, Acon DeClip, ProAudioDeclipper, Declipper in Thimeo Stereo Tool (a.k.a.
Perfect Declipper - standalone; both free for Winamp), iZotope RX De-clip (in RX Editor or as a plugin), Pure D compressor by Flux Audio, HMD Uncompressor, FX Factory De-Clipper, DxRevive Pro (mainly for dialogue; also denoiser, dereverb, enhancer, codec artefact removal), Declipper in Magix/Sound Forge Cleaning Lab, Adobe Audition’s Declipper, and sometimes even the FabFilter Pro-MB multiband compressor can be useful\n\nSee [comparison](https://divideconcept.github.io/Restoration-Comparison/)\n\n**Clippers** (the opposite, but useful in a mastering chain, sometimes in tandem with the above):\n\nFree\n\n- KClip Zero\n\n- FreeClip (sometimes you can use both in the same session for interesting results)\n\n- GClip\n\n- Limiter6 by vladg (Clipper module)\n\n- Initial Clipper\n\n- Airwindows Hypersoft - “a more extreme form of soft-clipper”\n\n- Airwindows OneCornerClip - compared to OG ADClip, it retains the character of the sound\n\n- Airwindows ADClip8 - “loudenator/biggenator”\n\n- Airwindows ClipOnly - “2-buss safety clipper at -0.2dB with powerful anti-glare processing.”\n\n- Hornet Magnus Lite - clipper and limiter modules\n\n- Razor Clip\n\nPaid: Orange Clip 3 (multiband mode), Gold Clip (widely praised lately), Gold Clip Track, Soundtheory Kraftur, KClip 3, SIR Standard Clip (popular, though KClip 3 may give better results), Izotope Trash 2, DMG Tracklimit, TR5 Classic Clipper (great for a kick), KNOCK (hard & soft clipper), Boz Little Clipper 2, Flatline (clipper), Newfangled/Eventide Saturate (spectral clipper), JST Clip, Brainworx Clipper, Elysia Alpha Mastering Compressor (soft clip module), soft clipper in Cubase, Music Hack Fuel, Music Hack Fuel Clipper (saturation, dynamics, limiting/soft clipper)\n\n**De-expliciter** (removes explicit lyrics from songs)\n\n<https://github.com/tejasramdas/CleanBeats> (more recent fork)\n\n###### **De-breath**\n\n- Sucial de-breath VR v1/2
[models](https://huggingface.co/Sucial/De-Breathe-Models/tree/main)\n\n- Aspiration Mel [models](https://huggingface.co/Sucial/Aspiration_Mel_Band_Roformer) by Sucial | [config](https://drive.google.com/file/d/1EJ2hKlGdVstLvxmZX-jlp64Htd8MGGba/view?usp=sharing) | MVSEP (one variant) (“grabs a lot more than just breaths and other sounds too, de breath gets ONLY breaths” - isling)\n\n- yxlllc’s harmonic noise separation VR [model](https://github.com/yxlllc/vocal-remover/releases/tag/hnsep_240512) (can be used in UVR: rename “model” to any model name and change the .pt extension to .pth, then use the Install model option and set the config to: VR 5.1, 32/128, 1band\\_sr44100\\_hl512;\n\n“it's really useful while making covers and when an OG song is airy/whispery. I'm not the sharpest tool in the shed so i resort to using websites like this” - wancitte\n\n“very good at further removing the noise from a dereverbed vocal yet it is mono. (...) it did have two channels on export but both of them have audible information except one has only some noise that can be easily removed then mix the other channel up to stereo” - mohammedmehditber. Maybe the attached CLI code handles the mono model better)\n\n- Accusonus ERA Bundle (given away for free after the FB acquisition) [download](https://archive.org/details/accusonus-era-bundle-v-6.2.00)\n\n- Dead Duck (free breath removal/gate plugin)\n\n- Noiseworks VoiceAssist (paid plugin supporting ARA, with its own in-session audio editor that highlights all the breaths and lets you turn them down automatically or with a specific threshold for each)\n\n- Izotope RX11’s breath control (paid; VST/Audio Editor)\n(“The “remove breaths” preset they have on it Usually works about 95% of the time for me” - 5b)\n\n- DNR v3 (Sometimes (...) (without vocal help), the grunts and breathing will be in the SFX, and the dialogue in the speech, while both will be in the music) - fal\\_2067\n\n- Mel-Roformer de-reverb by anvuew v2 (a.k.a.
19.1729 SDR) | [DL](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt) | [config](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(“when it gets breathy or like it has falsetto, it seems to remove a lot of it, it's very weird at the breathy-ish parts of it lol, will be using this mainly if there are heavy vocal effects I want removing” - isling)\n\n- MDX23C-InstVoc HQ (“in some cases, it also removes some airy parts from specific words, and some non-verbal sounds (breathing, moaning).”)\n\n*\\_\\_\\_\\_\\_*\n\n*Manipulate various* [*MDX settings*](#_6q2m0obwin9u) *and* [*VR Settings*](#_atxff7m4vp8n) *to get better results*\n\n*\\_\\_\\_\\_*\n\n*Final resort - specific* [*tips to enhance separation*](#_929g1wjjaxz7) *if separation still fails in certain fragments or tracks*\n\n*\\_\\_\\_\\_*\n\n*Get VIP models in UVR5 GUI (optional donation), in case you can't find some of the models listed above or in the top ensembles chart:*\n\n[*https://www.buymeacoffee.com/uvr5/vip-model-download-instructions*](https://www.buymeacoffee.com/uvr5/vip-model-download-instructions)\n\n*(dead links)*\n\n*List of VR models in UVR5 when the VIP code is entered (without the two denoise models by FoxyJoy yet):*\n\n[*https://cdn.discordapp.com/attachments/708595418400817162/1104424304927592568/VR-Arch.png*](https://cdn.discordapp.com/attachments/708595418400817162/1104424304927592568/VR-Arch.png)\n\n*List of MDX models when the VIP code is entered (without HQ\\_3, voc\\_ft and MDX23C yet):*\n\n[*https://cdn.discordapp.com/attachments/708595418400817162/1103830880839008296/AO5jKyQ.png*](https://cdn.discordapp.com/attachments/708595418400817162/1103830880839008296/AO5jKyQ.png)\n\n*A more up-to-date list can be found in that
UI:* [*https://huggingface.co/spaces/TheStinger/UVR5\\_UI*](https://huggingface.co/spaces/TheStinger/UVR5_UI)\n\n*(some models might not come from the Download Center/VIP code)*\n\n*Model repository backup of all UVR5 models as separate links*\n\n[*https://github.com/TRvlvr/model\\_repo/releases/tag/all\\_public\\_uvr\\_models*](https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models)\n\n*Some models might not be available in the repository above, e.g. the 427 model, which is available only after entering the VIP code.*\n\n*(just in case, here's the link for 427:*\n\n[*https://drive.google.com/drive/folders/16sEox9Z\\_rGTngFUtJceQ63O5S9hhjjDk?usp=drive\\_link*](https://drive.google.com/drive/folders/16sEox9Z_rGTngFUtJceQ63O5S9hhjjDk?usp=drive_link)\n\n*Copy it to the UVR folder\\models~MDX folder and rename the model to:\nUVR-MDX-NET\\_Main\\_427)*\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nQ: “Hello, we are now getting very good results in turning music that includes human voice into only instrumental. Sometimes there are vocal leaks that we can call just crumbs or whispers, but this is not that important. But now we have another important problem. People who do not want to listen to vocals, that is, who only want to listen to the music that remains when the vocals are deleted, encounter a problem. Sometimes there are big gaps in the songs. Because not every song is arranged in a way that continuous instrumental music is heard, and when the vocal part is deleted, a perception of silence or emptiness can occur. It is as if the music does not have continuity, and everything is cut off in some parts of the song. The reason for this is that when the vocal is deleted, the vocal melody is also destroyed. Although it seems like a good idea at first, when we listen to music that is only instrumental with the vocals deleted, that song loses a lot of its identity. As a result, I want to learn how we can preserve the vocal melody after deleting the vocal.
What I mean is, can we divide the song into instrumental and vocal and then turn the melody of the vocal part into an instrument such as piano, bass guitar, flute, etc. Then, I want to combine this vocal melody with the instrumental result.” - sweetlittlebrowncat\n\nA: You could try some older, less aggressive models than Roformers, even GSEP.\n\nThey can sometimes leave some of the vocal melody (in fact, some quiet harmonies), so the song doesn't sound so \"dead\" after separation. You could also split the vocals further into separate stems and look for something useful to mix quietly with the instrumental.\n\nCheck the Vocal models, then separate further with BV/Karaoke models, or alternatively check GSEP, MDX-Net and maybe even VR models. Open the outline of this document to find all the relevant sections.\n\nAlso, you can use:\n\n\"<https://audimee.com/>\n\nsplit instrumental from vocal\n\nuse vocal as input\n\nconvert it into piano, bass, flute, whatever they offer\n\nmerge\n\nprofit\" - Bas Curtiz\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### Mixing/mastering\n\nIf you have already done your best separating your track, tried [ensembles](#_nk4nvhlv1pnt) or [manual weighting](#_oxd1weuo5i4j), and read [tips to enhance separation](#_929g1wjjaxz7), but the result still lacks the original track's clarity, you can use:\n\n- Demudder added in the beta [Roformer patch](#_6y2plb943p9v) #14 in UVR (if it doesn't increase vocal residues too much; it won't work even with a small chunk\\_size on AMD/Intel GPUs with 4GB VRAM when using Roformers)\n\n- [AI Mastering services](#_ki1wmwa90cgp) (mainly for instrumentals)\n\n- To improve vocal clarity, you could even “train a RVC model out of clean [artists] audio clips and then inference this audio with the model you made. It takes some time, but the results are worth it” John UVR ([examples](https://discord.com/channels/708579735583588363/900904142669754399/1299077849487118346)).
The workflow is explained below.\n\n- Aufr33’s expander template for Reaper 7.05 ([DL](https://drive.google.com/file/d/1lFGJiGIGcvKuz1gQtlppA0sMPhCC1P9T/view?usp=sharing)) fixing ducking in instrumentals (explained below)\n\n- Read [Make your own remaster](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/) (more below)\n\nInherited flaws of instrumentals from AI separation models\n\nMany audio engineers recommend leaving space for vocals in production. As a result, many instrumentals may have a hole in the midrange, no matter how good the model is.\nSometimes the vocals in the mixture even create a psychoacoustic impression that the instrumental sounds fuller, similar to how adding artificial noise can increase the fullness metric of models. On headphones, or with one of your smartphone speakers held to your ear, you can identify the constant buzzing of models more easily, while on nearfield monitors it may just give the impression that the model result is fuller, and the noise won’t be noticeable. Also, current models have trouble attributing the noise of the original mixture to the instrumental background when vocals are present. Buzzing pop-ins are sometimes remnants of constant noise in the mixture, getting louder or quieter in places where vocals appear and disappear.\nSome model results will be muddy or too noisy, even with the best model or ensemble. You need a result that isn't destructive to any of the instruments, so that a specialised single-instrument model can pick them up for further mixing. If you already have a good instrumental result:\n\n######\n\n*Mixing a track from scratch using various AIs/models*\n\nNow if you're not afraid of mixing, and e.g.
if you already have a clean instrumental or a whole track to remaster, you can use the following:\n\n- a very quiet mixture (the original file, i.e. the instrumental mixed with vocals, if you remaster the whole OG song)\n\n- stems from demucs\\_ft or BS-Roformer SW (both the [MDX23](#_jmb1yj7x3kj7) Colab and the Ensemble of various models on MVSEP can be even better than Demucs, and better than SW especially for bass; check out the [4 stems](#_sjf0vefmplt) section), also mixed with:\n\n- [drumsep](#_2u19k7ty9b00) MDX23C free model result (but you can also test stems from the old drumsep and LarsNet [although they have worse SDR], or the newer MVSEP drumsep models)\n\n- GSEP result for piano or guitars (MVSEP models can be handy too; the SW model is now much better for those stems. Previously, the only decent downloadable guitar model was the one released by becruily, and demucs\\_6s is mediocre)\n\n- for bass, both GSEP and Demucs ft/MDX23 aligned and mixed together (or simply from the MVSEP ensemble or MDX23 Colab), or see [bass models](#_sjf0vefmplt) for a more recent list, incl. SW\n\n- the \"other\" stem could be paired like above too (but drums remained only from e.g.
Demucs\\_ft - they were cleaner than GSEP and good enough)\n\n- Actually, in one of those tracks, guitars weren't recognized in the guitar stem but ended up in the other stem, so I mixed it all together (it wasn't a busy mix)\n\n- If it's not an instrumental, mixing the results of more than one vocal model might do the job; check various [vocal ensembles](#_i7k483hodhhu) (that's essentially what MDX23 and the ensembles on MVSEP do, the latter with private models, but it's not exactly the same: you can add different effects to each of those tracks for a fuller sound and change their volumes manually).\n\nAll of the above gave me the opportunity to get a very clean mix of the instruments using various plugins while setting correct volume proportions, compared to mastering just the instrumental separation result or plain 3 stems from Demucs.\n\nFor example, demucs\\_ft or another standalone or built-in drums model provides much higher drum quality than the old Demucs drumsep, so in that case you won’t use the drumsep stems on their own, but rather to overdub specific parts of the drum kit (e.g. snares; that’s the most useful part of using drumsep, as it’s normally easy to bury the snare in a busy mix when hi-hats come in too strongly in a heavily processed instrumental or drums stem, so now you won’t have to push the drums stems from demucs\\_ft or MDX23 so drastically).\n\n*Sam Hocking’s method for enhancing separated instrumentals from a mixture (song containing instrumental and vocals):*\n\n“I think looking at spectrally significant things like snares can work.
We can already do it manually by isolating the transient audio/snare pattern as MIDI and then triggering a sample from the track itself to reinforce, but it's time-consuming and requires a lot of sound engineering to make it sound invisible.”\n\nYou can probably use the Cableguys Snapback plugin for that, or maybe UVI Drum Replacer.\n\nSam’s method works best in songs built from samples rather than live recordings (where the same sounds repeat across the whole beat). [More](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.ww5380bbnab3) such plugins.\n\nPS. In late 2025, we received information about Apple Music rejecting Atmos mixes of some legacy music made with separation models (ensembles, and then probably some 4-6 stem models), even though the mixes sounded good and were accepted by labels and artists. We also know that these separation methods worked fine in the past, at least for some other engineers. We suspect that Apple might use automated tools that catch artefacts typical of separation models on spectrograms of channels extracted from the whole Atmos mix.\n\n“I don't think there's a need of really advanced and expensive method to detect source separated stems, most of the time, just looking at the background noise is enough to tell, original stem vs separated one [[click](https://imgur.com/a/znLIBph)]\n\n+ kind of \"aliasing\" artifacts and/or dither residues popping here and there...\n\nThere are lots of patterns than can make separated audio stems identifiable, I don't think it's hard to develop a model to spot them with quite good accuracy (even if not really audible to human ears)” - jarredou\n\nTo sum up:\n\n######\n\nTips to [enhance separation](#_929g1wjjaxz7)\n\n[Demudder](#_bviye361m0v) in UVR/x-minus\n(increases vocal residues)\n\nAI mastering [services](#_ki1wmwa90cgp)\n\n[Blending with RVC model](#_p4mh61gmvsx2)\n\n###### AI audio upscalers
[list](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit#heading=h.i7mm2bj53u07)\n\n[Make your own remaster](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/):\n\n**More clarity/better quality/general audio restoration of separated stem(s)**\n\nGet complete freedom over the result by using (among others) spectral restoration plugins to demud your separation results. You can then process the result further with e.g. an AI upscaler, or do it in the reverse order.\n\nAmong plugins, you can start with [Thimeo Stereo Tool](https://www.thimeo.com/stereo-tool/download/), which has a fantastic re/mastering chain suitable for spectral restoration, useful for instrumentals that sound too filtered from vocals and lack clarity. Also use [Unchirp](https://www.zynaptiq.com/unchirp/), which is a great complement to Thimeo Stereo Tool, although it focuses more on the already existing spectrum.\n\nYou can also play with the free Airwindows [Energy](https://www.airwindows.com/energy-vst/)/[Energy2](https://www.airwindows.com/energy2/) and [Air](https://www.mediafire.com/folder/kua5r9x27mwrk/Plugins_Backup)/[Air2](https://www.airwindows.com/air2/) (~~or Air3,~~ ~~MIA Thin)~~ plugins for restoration, and furthermore some compressors or other plugins and effects mentioned in the link above.\n\nIf you're not afraid of learning a new DAW, [Sound Forge Cleaning Lab 4](https://www.magix.com/us/music-editing/sound-forge/sound-forge-audio-cleaning-lab/) has great and easy built-in restoration plugins too (Brilliance, Sound Clone>Brighten Internet Sources) with a complete mastering chain to push what you already got with Unchirp and Stereo Tool even further.\n\nIzotope RX Editor and its Spectral Recovery may turn out not to be enough, but the rest of the RX plugins, also available as VSTs, can come in handy, although Cleaning Lab has plenty of substitutes for filtering various kinds of noise.
Working in real time with all the plugins open and chained simultaneously is more comfortable than the RX Editor workflow. But you can use some plugins from RX Editor as separate VSTs in other DAWs, including Lab 4. Ozone Advanced might turn out useful too.\n\nOnce you finish with the plugins above, you can try out some of the mastering services, not the other way round (although to get the best results, you might want to meet some basic requirements of AI mastering services first, e.g. in terms of volume).\n\nQ: AI vocal remover did not \"normalize\" (I don't think it's the right word) the track on the moment where the vocal was removed, so it's noticeable, especially on instrument-heavy moments.\n\nI make things better by creating a backup echo track by combining stereo tracks with inverted ones and adding this to the main track with -5db, but it's still not good enough. Are there any technics that separate track with not noticeable effects or maybe there is some good restoration algorithm that I can use\n\nA: Where vocals are removed by AI, those moments stand out from the purely instrumental parts of the song.\n\nSometimes you can rearrange your track so that it uses the instrumental parts of the song where there are no vocals, instead of the AI-separated fragments. Sometimes that's not possible because some fragments will be missing (then you can only use the filtered moments in places), and even then you will need to take care of the sonic coherence of the final result, as you said.\n\nAt times, even fade-outs at the ends of tracks contain decent amounts of instrumental material, which you can normalize and then use when rearranging the track. E.g.
you normalize every snare or kick and everything else later in the fade-out, up to the very end, so it will sound completely clean.\n\nGenerally, it's all time-consuming and not always possible, and you really have to be creative with a normal mastering chain to fit the filtered fragments to the regular, unfiltered fragments of the track.\n\nYou can also try layering, e.g. a specific snare found in good quality elsewhere in the track. This may work more easily for quantized tracks, where the drum pattern is consistent throughout. You can also use 4-stem Demucs ft or MDX23 and layer drums from a fragment where there are no vocals yet, so the drums are still crisp there.\n\n**Ducking effect eliminator**\n\nYou can also check Aufr33’s Reaper 7.05 project aimed at alleviating this issue (“the music volume is reduced where there are vocals”). Instruction:\n“Just place two stems: vocals and music. Adjust the Expander if necessary.”\n\n“It's just an expander side-chained to vocals. You can replicate this in any other DAW.”\n\n[Src](https://discord.com/channels/708579735583588363/773763762887852072/1382544710857527358) | [mirror](https://drive.google.com/file/d/1lFGJiGIGcvKuz1gQtlppA0sMPhCC1P9T/view?usp=sharing)\n\n- A nice [chart](https://media.discordapp.net/attachments/708579735583588366/1123408156664533002/image.png?width=1206&height=687) (now moved to “Advanced chain processing chart” at the bottom of the Karaoke section; use search) describing the process of creating an AI cover (replace Kim vocal with voc\\_ft there, or with MDX23 vocals/UVR top ensemble/Roformers).\n\n###### **Blending with RVC model** *(by Gabox & dubpluris a.k.a. Mark | Avalaunch - text)*\n\n“*My use case*:\n\nRestoring older, lower-quality vocal recordings (e.g., camcorder recordings from the 90s) using RVC models trained on clean studio vocals from the same artist.\n\n*Practical Workflow for Using RVC in Vocal Restoration*\n\nThe idea is not to replace old performances entirely, but to enhance them.
A few key points came out of the discussion:\n\n- Blending, not replacing: Using only the RVC output will usually sound artificial. The better approach is to run the old vocal stems through the trained RVC model and then blend the AI-generated stem with the original. This preserves natural performance qualities while adding clarity.\n\n- Input quality matters: Even “decent but rough” camcorder audio can work. Extremely degraded sources, however, will still produce artifacts (“bad input = bad output”).\n\n*Complementary tools:*\n\nFlashSR – an audio super-resolution method that restores high frequencies and improves fidelity before running RVC. (<https://mvsep.com/en/demo?algorithm_id=60>)\n[AudioSR may give better results, but it’s much slower;\n“Imo much better candidates are: [AP-BWE](https://github.com/yxlu-0102/AP-BWE) ([Colab](https://colab.research.google.com/drive/1g4r0Ejd-AeVpNSKmFxa8ZMQkK_8yhuAq?usp=sharing) | [new repo](https://github.com/pokepress/aero?tab=readme-ov-file) [old]) and [Clearer-Voice-Studio's Clear Voice](https://github.com/modelscope/ClearerVoice-Studio) (my favorite is the 2nd one - codename0; a more simplified version by codename0 - [DL](https://drive.google.com/file/d/13Mwk_K8K4198Bd8aQ_Ajm8vBEprAjTU_/view))”]\n\nMatchering – matches the EQ/tonal balance of rough recordings to studio references, either standalone or integrated into UVR5. Using a clean studio version of the artist as the reference and the old performance as the target is recommended.\n\n(<https://sergree.github.io/matchering/>) - Available on UVR\n\n[You can also try out <https://masterknecht.klangknecht.com/>]\n\n*General workflow:*\n\n1. (Optional) Pre-process low-quality audio with FlashSR.\n\n2. Train RVC on clean studio stems.\n\n3. Run inference on the old stems with the trained model (i.e., feed the cleaned original vocal through the trained RVC model to get a converted stem).\n\n4.
Blend, align and mix the original and RVC stems (RVC as enhancement, not replacement) until it feels natural.\n\n5. Use Matchering or other mastering techniques to polish.\n\nFor a comprehensive remastering workflow, see [How to make your own remaster](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/).\n\nThe overall takeaway: RVC can be used for restoration, but it works best as part of a chain of tools (super-resolution, EQ matching, mastering), with the human performance always kept at the center through blending rather than full replacement.”\n\n###### (old) **More descriptions of models**\n\n**and AIs, with troubleshooting and tips** (most models here are dated, as this section lacks Roformers)\n\n(The instructions here were moved to [Reading advice](#_jx9um5zd7fnp))\n\n*Descriptions of older models*\n\n- Inst fullband (fb) HQ\\_3/4/5 (x-minus, MVSEP, Colabs)\n\nCompared to HQ\\_3, HQ\\_4 has some problems with fade-outs, where it can occasionally leave some vocal residues\n\nHQ\\_3 generally has problems with strings.
mdx\\_extra from Demucs 3/4 gave better results with strings here, and sometimes the 6s model can be a good compensation in an ensemble for these lost instruments, but HQ\\_3 gives some extra detail compared to those.\n\nHQ\\_3/4 can be muddy at times, but without many vocal residues (close to GSEP at times, but more than BS-Roformer v2).\n\nFor more clarity, use the MDX23C HQ model (HQ\\_2 can have fewer vocal residues at times).\n\nOther potentially problematic instruments are wind instruments (flute, trumpet etc.)\n\n- use Kim inst or inst 3 then\n\nHQ\\_3 has worse SDR than:\n\n- voc\\_ft, but given that HQ\\_3 is an instrumental model, it can leave fewer vocal residues at times.\n\n<https://mvsep.com/quality_checker/leaderboard2.php?id=4029>\n\n<https://mvsep.com/quality_checker/leaderboard2.php?id=3710>\n\nThese are SDR results from the same patch, so the voc\\_ft vs HQ\\_3 comparison is valid.\n\n- MDX23C\\_D1581 (narrowband) - usually worse results than voc\\_ft, and probably worse SDR if both models were evaluated on the same patch\n\nCan be a bit better for instrumentals\n\n“The new model is very promising; although having noise, it seems to pick vocals more accurately and the instrumentals don't have that much of the filtering effect (where entire frequencies are being muted).”\n\nOthers, however, say it’s worse than demucs\\_ft\n\n- [GSEP AI](#_yy2jex1n5sq) - an online closed-source service (cannot be installed on your computer or your own site). mp3 only, 20kHz cutoff.\n\nDecent results in some cases; click the link above to read more about GSEP in its dedicated section below. This [SDR leaderboard](https://mvsep.com/quality_checker/leaderboard.php) greatly underestimates it, probably due to some kind of post-processing used in GSEP (probably noise gate and/or slight reverb or chunking). As a last resort, you can use the 4-6 stem option and mix down without the vocal stem in e.g. Audacity or another DAW.
The 4-6 stem option has additional noise cancellation vs the 2-stem one.\n\nGSEP works well on some tracks with a busy mix, or on acoustic songs where everything else simply fails, or where you’re otherwise forced to use the RX10 De-bleed feature.\n\n- GSEP is also better than MDX-UVR instrumental models at least on tracks with **flute**, and possibly duduk/clarinet or oriental tracks, and possibly tracks with only piano, as it has a decent dedicated piano model.\n\n- To address the issue with flute using MDX-UVR, use the following ensemble: Kim\\_Inst, HQ1, HQ2, INST 3, Max Spec/Max Spec (Anjok).\n\n- Sometimes kim inst and inst3 models are less vulnerable to the issue (not in all cases).\n\n- Also, the Main 406 vocal model keeps most of these trumpets/saxes or other similar instruments\n\n- Passing through a Karaoke model may help a bit with this issue (Mateus Contini's [method](#_79cxg1a64b11)).\n\n- inst HQ\\_1 (450)/HQ\\_2 (498)/HQ\\_3 MDX-UVR fullband models in the Download Center of UVR5 - great high-quality models to use in most cases. The latter has slightly better SDR and possibly slightly fewer vocal residues. Not as few as inst3 or kim ft other in specific cases, but a good starting point.\n\nWhat you need to know about MDX-UVR models is that they're divided into instrumental and vocal models, and that instrumental models will always leave some instrumental residues in vocals, and vice versa: vocal models are more likely to leave some vocal residues in instrumentals. Still, for some specific songs, breaking that rule will benefit you. Usually, an instrumental model should give a better instrumental if you’re fighting vocal residues.\n\nAlso, MDX-UVR models can sometimes pick up sound MIDI effects, which won’t be recovered.\n\n- kim inst (a.k.a. ft other) - has a cutoff, gives cleaner results and better SDR than inst3/464, but tends to be noisier than inst3 at times.
Use:\n\n- inst3/464 - to get muddier but less noisy results, although it all depends on the song, and sometimes the HQ\\_1/2/3 models leave fewer vocal residues (or more detectable ones).\n\n- MDX23 by ZFTurbo v1 - third place in the newest MDX challenge. 4 stems. Already much better SDR than the Demucs ft (4 stem) model. More vocal residues than e.g. HQ\\_2 or Kim inst, but very clean results, if not the cleanest of all at the time. In his fork, jarredou fixed many of those issues and further improved SDR, so it’s comparable with the Ensemble on MVSEP, which has also been enhanced since the first version of the code was released in 2023 and also has newer models and various improvements.\n\n- [Demucs 4](#_m9ndauawzs5f) (especially the ft 4 stem model; UVR5, Colab, MVSEP, 6s available) - Demucs models don't have GSEP's aggressive noise cancellation and missing instruments issue. Worth checking in some cases (but it tends to have more vocal bleeding than GSEP, MDX-UVR inst3/464 and HQ\\_3 (not always, though), and the 6 stem model has more bleeding than the 4 stem one, but not as much as the old mdx\\_extra 4 stem model).\n\n- [Models ensemble](https://mvsep.com/quality_checker/leaderboard2.php?&sort=instrum) in UVR5 GUI (**one of the best results** so far for both instrumentals and vocals SDR-wise). A decent Nvidia GPU is required, or brace for 4 hours of processing per ensemble of one song on a 2-core/4-thread Sandy Bridge CPU. How to set up an ensemble: [video](https://media.discordapp.net/attachments/767947630403387393/1070004461231149077/UVR5_-_How_to_setup_a_ensemble.mp4).\n\nGeneral video [guide](https://youtu.be/jQE3oHXfc7g) about UVR5.\n\n\"UVR-MDX still struggles with acoustic songs (with a lot of pianos, guitars, soft drums etc.)\", so in such cases use e.g. 
GSEP instead.\n\nDescription of vocal models by Erosunica\n\n\"That's my list of useful MDX-NET models (vocal primary), best to worst:\n\n- MDX23C-8KFFT-InstVoc\\_HQ (Attenuates some non-verbal vocalizations: short low-level and/or high-frequency sounds)\n\n- Kim Vocal 2\n\n- UVR-MDX-NET-Voc\\_FT\n\n- Kim Vocal 1\n\n- Main (Attenuates some low level non-verbal vocalizations)\n\n- Main\\_340 (Attenuates some non-verbal vocalizations)\n\n- Main\\_406 (Attenuates some non-verbal vocalizations)\n\n- Kim Inst (Attenuates some non-verbal vocalizations)\n\n- Inst\\_HQ\\_3 (Attenuates some non-verbal vocalizations)\n\n- MDXNET\\_2\\_9682 (Attenuates some non-verbal vocalizations)\"\n\nIt’s also worth checking HQ\\_4.\n\n##### “UVR BVE v2 model [currently on x-minus] is actually full band. There is, however, a small nuance. This model uses MDX VocFT preprocessing, which is not full band. MDX VocFT model is rebalancing the song. The music is slightly mixed with the vocals (25% music + 100% vocals). This mix is then processed by the BVE model. A small amount of music can help the model better understand the context (it's important for harmony separation). We train the model on a rebalanced dataset. It contains 25% of music.” aufr33\n\n\\_\\_\\_\\_\\_\n\nAll the tips have been moved to the [Tips to enhance separation](#_929g1wjjaxz7) section\n\n\\_\\_\\_\\_\\_\n\n[Screenshot and video showcase](https://discord.com/channels/708579735583588363/767947630403387393/1216488566138470442)\n\n##### MDX settings & ens. explanations in UVR5 (and also Demucs/VR/MDX v2/23C inferencing parameters)\n\n*In one of the pre-5.6 UVR updates, the following min/avg/max features for single models were replaced by a better automated alternative, but you might still get cleaner results from e.g. 
voc\\_ft with max\\_mag on X-Minus or in* [*this*](https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb) *Colab, which still uses it (or downgrade your UVR version).*\n\n*Now it’s only applicable to Ensemble and Manual Ensemble in Audio Tools.\nManual Ensemble is very fast and can be used even on an old dual-core CPU, as it works on already separated files with simple code rather than a model.*\n\n###### *Ensemble algorithm explanations*\n\nEnsemble - a way to use multiple models to potentially get better results.\n\nThese rules can be broken, but:\n\n*Max Spec* is generally for vocals\n(takes the maximum result for each stem, e.g. for vocals you'll get the heaviest weighted vocal from each model, and the same goes for the instrumental, giving slightly cleaner results, but more artefacts)\n\n*Min Spec* for instrumentals in most cases\n(it keeps what the models have in common)\n\n*Avg Spec* is something in between\n(gets the average of vocals/instrumentals)\n\nE.g. following the above, we get the following setting:\n\n*“Max Spec / Min Spec”*\n\nLeft side = about the Vocal stem/output\n\nRight side = about the Instrumental stem/output\n\n\"Max takes the highest values between each separation to create the new one (fuller sounding, more bleed).\n\nMin takes the lowest values between each separation to create the new one (filtered sounding, less bleed).\n\nAvg is the average of each separation.\"\n\n*More*\n\n*For ensemble, avg/avg got the highest SDR, followed by max/max, min/max and min/min, in that order.*\n\nFor a single MDX model, min spec was the safest for instrumental models and gave the most consistent results, with fewer vocal residues than the others.\n\nMax spec - the cleanest, but can leave some artifacts (if your file doesn't have them, Max Spec for your instrumental might be a good solution).\n\nAvg - the best of both worlds, and the only option that allowed testing SDR, e.g. 
at least for ensembles, and maybe it still is, if that wasn't patched\n\n*“Max Spec/Min Spec” option*\n\nAt least for a single instrumental model, it's the safest approach for instrumentals and universal for vocals. E.g. Min Mag/Spec in the kae Colab, which uses the old codebase for MDX models, gives me the only acceptable results with hip-hop. I usually separate using a single model, but I cannot guarantee that Min Spec in UVR and Manual Ensemble will work exactly like Min Mag in the Colab for a single model. The explanation remains the same, though. The best option might even depend on the song.\n\nTL;DR\n\n*For vocals bleeding in instrumentals*\n\nYou can use Spectral Inversion to alleviate bleeding in instrumentals.\n\nMax Spec/Min Spec is also useful in such a scenario.\n\n*You want less bleed of Vocal in Instrumental stem?*\n\nUse Max-Min\n\n*For bleeding instruments in vocals*\n\nEnabling Phase Inversion helps to get rid of kick transients, which might still be audible in vocals in some cases.\n\nSet Ensemble Algorithm: Min/Avg when you still hear bleeding.\n\nIf it's still the same, try Min/Max instead of Avg/Avg when doing an ensemble with Vocals/Instrumental output.\n\nAlso, if the result is still not satisfactory, you can drop the ensemble and simply use one clean model from the list.\n\n*Further explanations*\n\nWhy not always go for Min-Max when you want the best acapella?\n\nWhy not always go for Max-Min when you want the best Instrumental?\n\nSo far, I hear Max-Min on Instrumental sounds more 'muddy/muffled' compared to Avg-Avg.\n\nI bet this will be the same for acapella, but it's less noticeable (I don't hear it).\n\nHence, I think the best approach would be always going with Avg-Avg.\n\nThen, after reviewing the outcome, tweak it towards your desired result\n\nand process again with either Min-Max or Max-Min.”\n\nMin = less bleeding of the other side/stem (into this side/stem), but the sound could get 
muddy/muffled\n\nMax = fuller sound, but potentially more bleeding\n\nAvg = average, so a bit of all models combined\n\nAverage/Average is currently the best for ensembles (the best SDR compared with Min/Max, Max/Min, Max/Max).\n\n“Ensemble is not the same as chopping/cutting off and stitching, it blends/removes frequencies. If song 1 has high vocals in the chorus, and song 2 has deep vocals in the chorus, max will mash them together, so the final song will have both high and deep vocals\n\nwhile min will remove both vocals”\n\n\"If I ensembled with max, it would add a lot of noise and hiss, if I ensemble with min it would make the overall sound muted gsep.\"\n\n*Technical explanation of min/avg/max*\n\nMax - keeps the frequencies that are the same and adds the different ones\n\n“Max spec tends to give more artifacts as it's always selecting the loudest spectrogram frequency bins in each STFT frame. So if one of the inputs has artifacts when it should be silent, and even if all other inputs are silent at the same instant, max spec will select the artifacts, as it's the loudest part of the spectrogram here.” jarredou\n\nMin - keeps the frequencies that are the same and removes any different ones\n\n\"If the phases of the frequencies are not similar enough, min spec and max spec algorithms for ensembles will create noisy artifacts (IDK how to explain them, it just kinda sounds washy), so it's often safer to go with average\"\n\n*by Vinctekan*\n\n\"Min = Detects the common frequencies between outputs, and deletes the different ones, keeps the same ones.\n\nMax = Detects the common frequencies between outputs, and adds the difference to them.\n\nNow you would think that Max-Spec would be perfect since it should combine all of the strengths of every model, therefore it's probably the best option\n\nThat would be the case if it wasn't for the fact that the algorithms that are used are not perfect, and I posted multiple tests to confirm this.\n\nHowever, 
it still gives probably the cleanest results, however, there are a few issues with said Max\\_Spec:\n\n1. A lot of instrumental content is going to be left within the output\n\n2. If you are looking to measure quality by SDR, don't expect it to be better than avg/avg\n\nThe average algorithm basically combines all the outputs and averages them, like the average function in Excel.\n\nThe reason why it works best is that it does not destroy the sound of any of the present outputs, unlike Max\\_Spec and Min\\_Spec\n\nThe 2 algorithms still have potential for testing, though.\"\n\n*More on how the ensemble in UVR works*\n\n“[E.g.] HQ 1 would be better if the ensemble algorithm worked how I thought it did.\n\nIt was explained to me that [ensemble algorithm] tries to find common frequencies across all the outputs and combines them into the result, which to me doesn't actually seem to happen when HQ1 manages to bring vocals to the mix in an 8 model ensemble, how is it not like \"okay A those are vocals, and B you're the only model bringing those frequencies to me trying to imply that they are not vocals\" and discard them. I mean I am running max/max, but I swear all avg/avg and min/min do is lower the volumes [see [ensemble in DAW](#_oxd1weuo5i4j)], It's hard to know without days of testing”\n\n“If u try avg/avg it will get quite muddy on instr result than max/max. But some song if you put kim vocal 1 will get vocal residue on the result”.\n\n###### *4-5 max ensemble models rule*\n\n###### Q: Why shouldn’t I use more than 4-5 models for a UVR ensemble (in most cases)?\n\nA: It's easiest to see when you separate the same song using several models. 
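\n\nThe min/avg/max algorithms explained above can be sketched in a few lines of Python. This is a minimal illustration, not UVR's actual code. It assumes every input is a complex STFT of the same shape (a list of frames, each a list of complex bins), and that max/min simply keep whichever model's bin is loudest/quietest, as in jarredou's explanation. The `ensemble` function is hypothetical.\n\n```python\n# Hypothetical sketch of min/avg/max spec ensembling (not UVR's actual code).\n# Each spec is a list of STFT frames; each frame is a list of complex bins.\n# Assumption: all specs come from the same STFT settings, so shapes match.\n\n\ndef ensemble(specs, mode):\n    result = []\n    for frames in zip(*specs):  # the same frame from every model\n        out_frame = []\n        for bins in zip(*frames):  # the same bin from every model\n            if mode == 'max':\n                # loudest bin wins: fuller sound, but keeps any model's artifacts\n                out_frame.append(max(bins, key=abs))\n            elif mode == 'min':\n                # quietest bin wins: less bleed, but a filtered/muddier sound\n                out_frame.append(min(bins, key=abs))\n            elif mode == 'avg':\n                # plain average of all models for this bin\n                out_frame.append(sum(bins) / len(bins))\n            else:\n                raise ValueError('mode must be min, avg or max')\n        result.append(out_frame)\n    return result\n\n\n# One frame, two bins; model_b has an artifact (0.3) in the first bin\n# where model_a is silent.\nmodel_a = [[0j, 0.8 + 0j]]\nmodel_b = [[0.3 + 0j, 0.5 + 0j]]\nprint(ensemble([model_a, model_b], 'max'))  # [[(0.3+0j), (0.8+0j)]]\nprint(ensemble([model_a, model_b], 'min'))  # [[0j, (0.5+0j)]]\n```\n\nMax keeps the 0.3 artifact even though the other model is silent there, which is the weakness jarredou describes. Min drops the artifact, but it also keeps only the quieter 0.5 in the second bin.\n\n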
Get the best 4-5 of the currently most recommended models, plus make some more separations with some random ones. Then try to reproduce UVR's avg spec by importing all of these results into your DAW.\n\nTo do that, every result has to be turned down so that their sum becomes an average. With N equally loud results, lower each one by 20 × log10(N) dB (a gain of 1/N): about 6.02 dB each for a pair, 9.54 dB for three, 12.04 dB for four and 13.98 dB for five. This way you'll get pretty much the same result as avg spec in UVR.\n\nYou can also apply a limiter on the master. In the second variant, adjust the volume of each stem to taste instead of keeping them equal. This shows that once you import more than 4-5 results, the outcome gets worse unless you turn down the weaker ones. With control over the volume of single results, you'll end up lowering the volume of the bad ones (or deleting them completely). Avg spec in UVR gives you no such control; it behaves like the first variant, where all results have the same volume. So the only way to keep the final result from deteriorating is to remove the weaker results from the bag entirely. With all volumes equal, the more results you add to the 4-5 best models, the worse the final result gets: you cannot compensate for bad results by turning them down, so the good models become relatively quieter when they are in the minority.\n\nThe 4-5 model maximum comes from long-running SDR tests on the MVSEP multisong leaderboard. 
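\n\nThe DAW experiment above and the SDR numbers used throughout this guide can be checked with a little arithmetic. The sketch below assumes that the plain energy-ratio SDR from the Music Demixing Challenge (10 × log10 of reference energy over error energy) is close to what MVSEP reports. `avg_gain_db` and `sdr` are hypothetical helper names.\n\n```python\nimport math\n\n\ndef avg_gain_db(n_results):\n    # Lowering each of n equally loud results by this many dB scales it\n    # by 1/n, so summing them in a DAW gives their plain average.\n    return 20 * math.log10(n_results)\n\n\ndef sdr(reference, estimate):\n    # Simple energy-ratio SDR in dB (assumed close to MDX-style leaderboards).\n    signal = sum(r * r for r in reference)\n    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))\n    return 10 * math.log10(signal / error)\n\n\nfor n in range(2, 6):\n    print(n, round(avg_gain_db(n), 2))  # 6.02, 9.54, 12.04, 13.98 dB\n\n# Energy ratios behind +10 dB and +1 dB of SDR\nprint(10 ** (10 / 10), round(10 ** (1 / 10), 2))  # 10.0 1.26\n\nreference = [1.0, -0.5, 0.25, -1.0]\ngood = [0.9 * r for r in reference]  # 10% amplitude error\nweak = [0.0] * len(reference)  # a result that misses the stem entirely\naveraged = [(g + w) / 2 for g, w in zip(good, weak)]\nprint(round(sdr(reference, good), 2))  # 20.0\nprint(round(sdr(reference, averaged), 2))  # 5.19\n```\n\nIn this toy case, averaging a good result with one that misses the stem halves its level and drops SDR from 20 dB to about 5.2 dB. With equal weights there is no way to turn the weak result down, which is the reasoning behind the 4-5 model rule.\n\n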
When various ensembles were tested in UVR, most of these combinations didn't consist of more than 4-5 models, because above that, SDR usually dropped, mostly for the reasons mentioned above.\n\nEven with clever methods that use only certain frequencies of specific models, like the MDX23 code by ZFTurbo, jarredou and Captain FLAM (not to be confused with the MDX23C arch) and its derivatives, which minimize the \"diminishing returns\" of using too many models, I think they never used more than 4-5 models in their bags, despite an impressive amount of testing; jarredou even focused on SDR while developing his fork (as did the original ZFTurbo code).\n\n\\_\\_\\_\\_\\_\n\n*For the vocal popping in instrumentals issue, read about* [*chunks*](#_4t9vx74g45zt) *or, if you haven't updated to 5.6 or later for a long time, update UVR to get a better option that is used automatically (called batch mode). The issue might still occur on GPUs with less than 11GB VRAM (and earlier patches don’t have Roformer support).*\n\n\\_\\_\\_\\_\\_\\_\\_\n\n###### ***MDX v2 parameters*** (e.g. 
HQ\\_1-5, Kim inst, Inst 1-3, NET, Crowd) (self.n\\_fft / dim\\_f / dim\\_t inference parameters further below)\n\nSegments 512 had better SDR than many higher values on various occasions (while 256 has lower SDR and almost the same separation time).\n\nSegments 1024 and [0.5](https://mvsep.com/quality_checker/entry/5129) overlap are the last options before processing time increases very much.\n\nDon't exceed an overlap of 0.93 for MDX models; beyond that, processing gets tremendously long with little difference.\n\nOverlap 0.7-0.8 might be a good choice as well.\n\nHigh segment values can also hurt performance badly:\n\nsegments 2560 and 2752 (for 6GB VRAM) might still be high but balanced values, although not fully justified SDR-wise, as 512 or 640 can be better than higher values for many songs.\n\nIn UVR and Not Eddy’s Colabs you can change segment size from 512 to 32 in order to possibly get better results with some older models like e.g. 438 (it tremendously increases separation time).\n\nOverlap: 0.93-0.95 (0.7-0.8 seems to be the best compromise for ensembles, with the highest measured SDR for 0.99)\n\nThe best measured SDR on the MVSEP leaderboard currently comes from the following settings (but it was measured on 1-minute songs, so it can potentially be different for your song):\n\nSegment Size: 4096\n\nOverlap: 0.99\n\nwith 512/0.95 worse by a hair (0.001 SDR) and 0.9 overlap for as long, but still not tremendously long processing time (1h30m31s vs 0h46m22s for the multisong dataset on a GTX 1080 Ti).\n\nAlso, segments 12K performed worse than 4K SDR-wise (counterintuitively, given the common claim that higher means better, but maybe there are diminishing returns at some point, so values that are too high may cause an SDR drop in some cases)\n\nIt seemed to be correlated with the overlap setting.\n\nFor overlap 0.75, segments 512 were better than 1024,\n\nbut for overlap 0.5, 1024 was better. Still, the best SDR out of these four results came from the 0.75/512 setting, although it’s a bit slower than 1024. For 0.99 
overlap, 4096 segments were better than 512.\n\nThe SDR difference between overlap 0.95 and 0.99 for voc\\_ft in UVR is 0.02.\n\nSegment size 4096 with overlap 0.99 ([here](https://mvsep.com/quality_checker/entry/4533)) vs 512/0.95 ([here](https://mvsep.com/quality_checker/entry/5143)) showed only a 0.001 SDR difference for voc\\_ft and vocals, in favour of the first result.\n\nThe difference between segment size 512 with overlap 0.25 ([here](https://mvsep.com/quality_checker/entry/5122)) vs 0.95 ([here](https://mvsep.com/quality_checker/entry/5143)) is 0.1231 SDR in favour of the latter.\n\nThe difference between the default segment size 256 with overlap 0.25 ([here](https://mvsep.com/quality_checker/entry/5116)) vs 512/0.95 ([here](https://mvsep.com/quality_checker/entry/5126)) is 0.1948 SDR for vocals, and 0.1969 with the denoiser on (standard, not model), and 0.95 takes three times as long.\n\n1024/0.25 takes only slightly longer than the default 256/0.25 (7 vs 6 mins) and gives 0.0865 better SDR.\n\nFor overlap [0.75](https://mvsep.com/quality_checker/entry/5124), segments 512 were better than 1024 (at least on 1-minute audio).\n\nSDR is measured in dB, i.e. on a logarithmic scale: a 10 dB difference means a 10x ratio of signal energy to error energy, and 1 dB corresponds to roughly 1.26x.\n\nBe aware that increasing only the overlap, e.g. to 0.5 from the default 0.25, while segments stay at the default 256 will muddy the result a bit (this might be more noticeable with the denoise model enabled), while increasing segments (at least up to 480/512) is supposed to add more clarity.\n\nAt least on the second beta Roformer patch, the maximum supported segment size on 4GB AMD/Intel GPUs is 480 (at least for 4:58, and HQ 1-3 can sometimes only work with a lower 448; higher overlap and segment size values crash).\n\n256/0.5 also works, at least with HQ 4 (but crashes with 480 segments)\n\n480/0.38 works too, but you can settle on e.g. 
0.31 if it’s too muddy.\n\nTry not to keep too many apps open during separation, as drawing their interfaces also eats up VRAM on the GPU.\n\nMDX-Net v2 max balanced settings:\n\nSegment Size: 2752 (1024 if it’s taking too long, as it’s the last value before processing time increases sharply; at least SDR-wise, 512 is better in every case than the default 256 unless overlap is increased, and it still gets good SDR results)\n\nOverlap: 0.7-0.8\n\nDenoising\n\nThe Denoise option used to increase SDR for MDX-Net v2, but instrumentals get a bit muddier ([result](https://mvsep.com/quality_checker/entry/5141)).\n\nThe Denoise model has slightly lower SDR ([result](https://mvsep.com/quality_checker/entry/5142)).\n\nFor MDX23C models this somehow changed, and using the standard denoiser doesn’t change SDR.\n\nSpectral Inversion\n\nOn a bigger dataset like the Multisong Leaderboard, it decreases SDR, but sometimes it helps you avoid e.g. instrumental residues. It can be helpful when you hear instruments in silent parts of vocals.\n\nExplanation:\n\n\"When you turn on spectral inversion, the SDR algorithm is forced to invert the spectrum of the signal. This can cause the SDR to lose signal strength, because the inverse of a spectrum is not always a valid signal. The amount of signal loss depends on the quality of the signal and the algorithm used for spectral inversion.\n\nIn some cases, spectral inversion can actually improve the signal strength of the SDR. This is because the inverse of a spectrum can sometimes be a more accurate representation of the original signal than the original signal itself. However, this is not always the case, and it is important to experiment with different settings to find the best results.\n\nHere are some tips for improving the signal strength of the SDR when using spectral inversion:\n\n\\* Use a high-quality input. The better the quality of the signal, the less likely it is that the SDR will lose signal strength when the spectrum is inverted. 
(...)\"\n\nThe quote goes on to recommend picking a good inversion algorithm and experimenting with different ones, but UVR seems to offer only one anyway.\n\nQ: I noticed <https://mvsep.com/quality_checker/leaderboard2.php?id=2967>\n\nhas Spectral Inversion off for MDX but on for Demucs. The Spectral Inversion toggle seems to apply to both models, so should it be on or off?\n\nA: Good catch.\n\nOnce you turn it on for one or the other, both will be affected indeed.\n\nI've enabled it (so for both, actually) [for this result].\n\n\\_\\_\\_\\_\\_\n\n***MDX v3 parameters*** *(e.g. MDX23C-InstVoc HQ and 2 and MDX23C\\_D1581)*\n\n(biggest measured SDR)\n\nSegment Size: 512\n\nOverlap: 16\n\n(default)\n\nSegment Size: 256\n\nOverlap: 8\n\n“512/16 is slightly better for big cost of time” compared to the default 256/8.\n\n- On a GPU with lots of VRAM (e.g. 24GB), you can run two instances of UVR, so the processing will be faster. You only need to use 4096 segmentation instead of 8192.\n\nIt might not be fully correct to evaluate segment and overlap settings SDR-wise based on measurements on the multisong dataset. Every file in it is shorter than an average full track, which might lead to a different number of segments and different overlaps than with normal tracks, so the results won’t fully reflect normal use cases (if e.g. the number of segments depends on the input file). 
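\n\nTo picture why both the overlap value and the input length change how many segments get processed, here is a minimal chunking sketch. It assumes the MSST-style convention that an integer overlap N advances each `chunk_size` window by `chunk_size // N` samples (so overlap 1 means no overlap at all). It ignores the padding and crossfading that real inference code applies, and `chunk_starts` is a hypothetical helper.\n\n```python\nSAMPLE_RATE = 44100\n\n\ndef chunk_starts(n_samples, chunk_size, overlap):\n    # Hypothetical MSST-style stepping: each window is chunk_size samples\n    # long and starts chunk_size // overlap samples after the previous one,\n    # so most samples are covered by about as many windows as the overlap value.\n    step = chunk_size // overlap\n    return list(range(0, n_samples, step))\n\n\nchunk_size = 352800  # 8 s at 44.1 kHz\nfor seconds in (60, 240):\n    for overlap in (1, 4, 16):\n        n_windows = len(chunk_starts(seconds * SAMPLE_RATE, chunk_size, overlap))\n        print(f'{seconds} s, overlap {overlap}: {n_windows} windows')\n```\n\nFor a 60 s file this gives 8, 30 and 120 windows for overlap 1, 4 and 16, and 30, 120 and 480 for a 240 s file. Run time therefore grows roughly in proportion to the overlap value, which fits overlap 16 being the slowest Roformer setting mentioned in this guide.\n\n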
Potentially, the problem could be solved by increasing overlap and segments for a full length song to achieve the same SDR as with its fragment from the multisong dataset.\n\n*Recommended balanced values for various archs*\n\n(balanced between quality and time for 6GB graphics cards):\n\n\\_\\_\\_\\_\\_\n\n**VR**\n\nWindow Size: 320 (best measured SDR)\n\nFaster value for slow PCs: 512\n\nSlower, might give more artefacts: 272\n\nWorse: 768, 1024\n\nRead more in [VR settings](#_atxff7m4vp8n)\n\n\\_\\_\\_\\_\\_\n\n**Demucs**\n\nSegment: Default\n\nShifts: 2 (def)\n\nOverlap: 0.5\n\n(experimental: 0.75, default: 0.25)\n\nThe best SDR for the least time for Demucs (more of a compromise, as it takes much longer than the default settings, of course - “best SDR is a hair more SDR and a sh\\*load of more time”):\n\nSegments: Default\n\nShifts: 0\n\nOverlap: 0.99 (the maximum can be 0.999 or even more, but processing gets tremendously long)\n\nBest results for instrumentals as input (tested in Colab):\n\nSegments: Default\n\nShifts: 10 (20 is the maximum possible)\n\nOverlap: 0.1\n\n\"Overlap can reduce/remove artifacts at audio chunk/segment boundaries, and improve the results a little bit, the same way the shift trick works (merging multiple passes with slightly different results, each with good and bad).\n\nBut it can't fix the model flaws or change its characteristics\"\n\nIn case of Voc\\_FT it's more nuanced... there it seems to make a substantial difference SDR-wise.\n\nThe question is: how long do you wanna wait vs. quality (SDR-based quality, tho)”\n\nIn UVR and Not Eddy’s Colabs you can change segment size from 512 to 32 in order to possibly get better results with some older models like e.g. D1581 (but it tremendously increases separation time).\n\n*For lack of spectrum above 14.7kHz*\n\nE.g. 
in an ensemble like this:\n\n5\\_HP-Karaoke-UVR, 6\\_HP-Karaoke-UVR, UVR-MDX-NET Karaoke, UVR-MDX-NET Karaoke 2\n\nSet Max Spec/Max Spec instead of Min Spec/Min Spec, and also enable hi-end process (both are needed for a fuller spectrum).\n\nKaraoke models are not full band: even the VR ones are 17.7kHz, and the MDX ones are 14.7kHz, IIRC. Setting Max Spec with hi-end process will give around 21kHz output in this case.\n\nThe cutoff with min spec in narrowband models is a feature introduced at some point in the UVR5 GUI, even for single MDX models, and it doesn't exist in the CLI version. It filters out some noise in e.g. an instrumental obtained by inversion. The cutoff then matches the model's training frequency (in CLI MDX, inverting a vocal model's output against the mixture gives a full band instrumental). Similar filtering/cutoff is also done in an ensemble with min spec.\n\n*More settings explanation*\n\nLeaving both shifts and overlap at default, instead of using shifts 10, costs only 0.01 SDR in an ensemble, but processing is much faster (1.7x for each shift). 
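\n\nThe shift trick mentioned in the Demucs settings can be sketched like this: the model runs on several randomly delayed copies of the input, each output is moved back by the same delay, and the results are averaged. This is a simplified illustration of the idea, not Demucs' actual implementation (which also handles padding). `separate_with_shifts` and the toy `halve` model are hypothetical. Each extra shift is one more full model pass, which is why processing time grows with the number of shifts.\n\n```python\nimport random\n\n\ndef separate_with_shifts(mix, separate, shifts, max_shift, seed=0):\n    # Hypothetical sketch of the shift trick, not Demucs' actual code.\n    # separate() maps a list of samples to an equally long list (one stem).\n    if shifts == 0:\n        return separate(mix)  # single pass, no shifting\n    rng = random.Random(seed)\n    total = [0.0] * len(mix)\n    for _ in range(shifts):\n        offset = rng.randint(0, max_shift)\n        delayed = [0.0] * offset + list(mix)  # delay the input by offset samples\n        out = separate(delayed)[offset:]  # undo the delay on the output\n        for i, value in enumerate(out):\n            total[i] += value\n    return [value / shifts for value in total]\n\n\ndef halve(samples):\n    # toy 'model' that just halves the signal, so every pass agrees\n    return [0.5 * s for s in samples]\n\n\nprint(separate_with_shifts([1.0, 2.0, 4.0], halve, shifts=3, max_shift=2))\n# [0.5, 1.0, 2.0]\n```\n\nWith a real model each pass differs slightly, and averaging the passes smooths out errors in a similar way to overlap, as the Demucs quote above says.\n\n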
Also, 0.75 overlap increases SDR, at least for a single model, even when shifts is set to 1.\n\nIt takes around 1 hour 36 minutes on a GTX 1080 Ti for 100 1-minute files.\n\n“And 18 hours on i5-2410M @2.8 for 5:04 track.\n\nRating 1 Ensemble on a 7-min song to compare.\n\nTime elapsed:\n\n1080Ti = 5m45s = 345s = 100%\n\n4070Ti = 4m49s = 289s = 83.8%\n\n4070Ti = ~16% faster\n\n1080Ti = ~€250 (2nd hand)\n\n4070Ti = €909 (new)\n\nConclusion: for every 1% gain in performance, u pay €41 extra (€659 extra in total).” Bas\n\n*More min/max explanations moved to* [*MDX/Ensemble settings*](#_6q2m0obwin9u)\n\nCompensation values for MDX v2\n(no longer necessary since MDX23C)\n\n“Volume compensation compensates the audio of the primary stems to allow for a better secondary stem.”\n\nFor Kim's latest ft other instrumental model, 1.03 or auto seems to do the best job.\n\nFor Kim vocal 1 and NET-X (and probably other vocal models), 1.035 was the best, while 1.05 was once calculated to be the best for the inst 3/464 model, but the values might differ slightly within the same branch. The compensation value in UVR5 only changes the secondary stem, so changing it for inst models, at least in the UVR GUI, doesn't change the instrumental SDR metric.\n\nself.n\\_fft / dim\\_f / dim\\_t parameters\n\nThese parameters correspond directly to how the models were trained. 
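\n\nOne thing these parameters fix is the highest frequency a model can output. Assume, as in UVR's MDX-Net inference code, that the network only processes the lowest `dim_f` bins of an `n_fft`-point STFT of 44.1 kHz audio. The cutoff is then `dim_f * 44100 / n_fft`, and the hypothetical `cutoff_hz` helper below applies this to the values listed in this section.\n\n```python\ndef cutoff_hz(n_fft, dim_f, sample_rate=44100):\n    # STFT bins are sample_rate / n_fft Hz apart, so keeping only the lowest\n    # dim_f bins means nothing above dim_f * sample_rate / n_fft is output.\n    return dim_f * sample_rate / n_fft\n\n\nmodels = {\n    'fullband models': (6144, 3072),\n    'kim vocal 1/2, kim ft other, inst 1-3, 406, 427': (7680, 3072),\n    '496, Karaoke, NET-X': (6144, 2048),\n    'Karaoke 2': (5120, 2048),\n}\nfor name, (n_fft, dim_f) in models.items():\n    print(f'{name}: {cutoff_hz(n_fft, dim_f) / 1000:.2f} kHz')\n```\n\nThis gives 22.05 kHz for the fullband models, 17.64 kHz for the kim/inst models and Karaoke 2, and 14.70 kHz for 496, Karaoke and NET-X. The last value matches the 14.7kHz MDX karaoke cutoff mentioned above.\n\n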
In most cases they shouldn't be changed, and automatic parameter detection should be enabled.\n\n- Fullband models:\n\nself.n\\_fft = 6144 dim\\_f = 3072 dim\\_t = 8\n\n- kim vocal 1/2, kim ft other (inst), inst 1-3 (415-464), 406, 427:\n\nself.n\\_fft = 7680 dim\\_f = 3072 dim\\_t = 8\n\n- 496, Karaoke, 9.X (NET-X)\n\nself.n\\_fft = 6144 dim\\_f = 2048 dim\\_t = 8 (and 9 kuielab\\_a\\_vocals only)\n\n- Karaoke 2\n\nself.n\\_fft = 5120 dim\\_f = 2048 dim\\_t = 8\n\n- De-reverb by FoxyJoy\n\nself.n\\_fft = 7680 dim\\_f = 3072 dim\\_t = 9\n\n\\_\\_\\_\n\n**Roformers** (located in the MDX-Net menu; only in UVR Roformer beta [patches](#_6y2plb943p9v))\n\n*chunk\\_size*\n\n“most of the time using higher chunk\\_size than the one used during training gives a bit better SDR score, until a peak value, and then quality degrades.\n\nFor Roformers trained with 8 sec chunk\\_size, 11 sec is giving best SDR (then it degrades with higher chunk size)\n\nFor MDX23C, when trained with ~6 sec chunks, iirc, peak SDR value was around 24 sec chunks (I think it was same for vit\\_large, you could make chunks 4 times longer)\n\nHow much chunk\\_size can be extended during inference seems to be arch-dependent.” - jarredou\n\nBe aware that increasing chunk\\_size consumes much more VRAM. For 4GB VRAM AMD/Intel GPUs, the maximum supported is chunk\\_size = 112455 (2.55s), sometimes chunk\\_size = 132300 (3s). 
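\n\nThe `chunk_size` values above convert to seconds by dividing by the 44.1 kHz sample rate. jarredou's `dim_t` conversion, quoted in the next paragraphs, is `chunk_size = (dim_t - 1) * hop_length`. The quote doesn't state `hop_length`; 441 samples is an assumption inferred from the listed pairs (e.g. 800 × 441 = 352800), and the helper names below are hypothetical.\n\n```python\nHOP_LENGTH = 441  # assumption, inferred from the dim_t / chunk_size pairs\nSAMPLE_RATE = 44100\n\n\ndef dim_t_to_chunk_size(dim_t, hop_length=HOP_LENGTH):\n    # chunk_size = (dim_t - 1) * hop_length\n    return (dim_t - 1) * hop_length\n\n\ndef chunk_size_to_dim_t(chunk_size, hop_length=HOP_LENGTH):\n    # Inverse of the formula (chunk_size should be a multiple of hop_length).\n    return chunk_size // hop_length + 1\n\n\nfor dim_t in (256, 801, 1101, 1333):\n    chunk_size = dim_t_to_chunk_size(dim_t)\n    print(dim_t, chunk_size, f'{chunk_size / SAMPLE_RATE:.2f} s')\n```\n\nThe printed values reproduce jarredou's table (112455, 352800, 485100 and 587412 samples, i.e. 2.55, 8.00, 11.00 and 13.32 s). `chunk_size_to_dim_t(132300)` returns 301, the `dim_t` of the 3 s chunk mentioned for some 4GB GPUs.\n\n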
CUDA has a garbage collector, which might make VRAM usage more efficient.\n\n“Conversion between dim\\_t and chunk\\_size [dim\\_t was used in the old Roformer beta 2 UVR patch]\n\ndim\\_t = 801 is chunk\\_size = 352800 (8.00s) - maximum value working on AMD/Intel 8GB GPUs and 900MB, at least Mel models\n\ndim\\_t = 1101 is chunk\\_size = 485100 (11.00s)\n\ndim\\_t = 256 is chunk\\_size = 112455 (2.55s) - maximum value for AMD/Intel 4GB GPUs\n\ndim\\_t = 1333 is chunk\\_size = 587412 (13.32s)\n\nThe formula is: chunk\\_size = (dim\\_t - 1) \\* hop\\_length” - jarredou\n\nUnless you turn off Segment default in Options>Advanced MDX-Net>Multi Network Options, chunk\\_size is read from the model's yaml.\n\n*Inference mode*\n\nCan be found in the Multi Network Options menu mentioned above. Turning it off will fix the issue of silent separations on older GTX GPUs (iirc GTX 900 and older), but it might make separation slower on other GPUs, at least Nvidia ones.\n\nIt was implemented in one of the latest beta Roformer patches, so if you noticed any slowdowns since updating UVR, try enabling it (now it’s disabled by default).\n\n*batch\\_size*\n\njarredou's inference Colab forces 1 (clicks with that setting were later fixed in MSST), and values above 2 might increase VRAM usage. 
In newer patches, Anjok started to use MSST inference code for Roformers and MDX23C, so UVR might have inherited this behaviour.\n\nA technical explanation of how it works is near the [end](#_j14b9cv2s5d9) of this document (scroll down a bit).\n\n###### *Overlap*\n\n4 is a balanced value in terms of speed/SDR according to [measurements](https://imgur.com/a/KyCtncG) (since beta patch #3 or the later one mentioned above, overlap 16 is now the [slowest](https://imgur.com/a/JtxzRZD), not overlap 2 anymore, and overlap 4 now has higher SDR than overlap 2).\nSome people still prefer overlap 8, while for others it’s already overkill.\nThere’s very little SDR improvement for overlap 32; for 50 there’s even a decrease to the level of overlap 4, and 999 gave worse results than overlap 16.\n\nFor overlap 8 compared to 2: “I noticed a bit more consistency on 8 compared to 2 (less cut parts in the spectrogram).”\nInstrumentals with overlap higher than 2 can get gradually muddier.\n\nThe calculations above were based on evaluations conducted on the multisong dataset on MVSEP. Search for e.g. overlap 32 and overlap 16 at the link below to compare the results:\n\n<https://mvsep.com/quality_checker/multisong_leaderboard?algo_name_filter=kim>\n\n“overlap=1 means that the chunk will not overlap at all, so no crossfades are possible between them to alleviate the click at edges.”\nThe setting in the GUI overrides the one in the model’s yaml.\n\nRefer to the UVR Roformer beta [patch](#_6y2plb943p9v) section for more detailed information\n\n##### Tips to enhance separation results\n\nIf you cannot achieve good separation, you can try the following experiments:\n\n1. 
*De-bass*\n\nTurn down all the bass to stabilize the voice frequencies of your input song (example EQ curves: [1](https://cdn.discordapp.com/attachments/1054893075056562236/1061276815391469639/image.png) and [2](https://cdn.discordapp.com/attachments/1054893075056562236/1061276881103630366/image.png)).\n\nMale setting: cut everything below 100Hz + cut everything above 8kHz.\n\nFemale setting: cut everything below 350Hz + cut everything above 17kHz.\n\nThis works because jitter is reduced a lot.\n\n2. *De-reverb*\n\nYou can also try de-reverb on your input song, e.g. in RX Advanced 8-10. One or both combined may in some cases help you get rid of some synth leftovers in vocals. Alternatively (not tested for this purpose), you can also try [this](https://discord.com/channels/708579735583588363/708580573697933382/1006301791354376233) or [this](https://discord.com/channels/708579735583588363/872995262224818187/1062492523689418793) ([dl](https://pixeldrain.com/u/UWn7d2iH) is in UVR's Download Center) de-reverb model (decent results). Currently, the VR dereverb/de-echo model in UVR5 GUI seems to give the best results out of the available models (but RX or others described in the models list section at the top can be more aggressive and effective, with more customizable settings).\n\n3. *Unmix drums*\n\n*(mainly tested on instrumentals)*\n\nSeparate an input song using a 4 stem model, then mix the resulting tracks together without drums, and separate that mix with a strong ensemble or a single vocal or instrumental model (doesn't always give better results).\n\nAlternatively, unmix bass as well. There’s a great bass+drums BS-Roformer model released for UVR (currently in beta).\n\n4. 
*Pitch it down/up*\n\n*(soprano/tenor voice trick + ensemble of both)*\n\n- You can use <https://github.com/JoeAllTrades/SpectraDownshift> for it\n(it’s based on scipy and lossless, so it’s fully reversible and nulls), or:\n\n- the option already implemented in newer versions of UVR under “Shift Conversion Pitch” in Settings>Choose Advanced Menu>Advanced [Arch] Options>\n\nIt offers positive and negative values when you scroll up and down\n(lossy, even more than soxr).\n\nA negative value slows down the track before separation, so e.g. a model with a cutoff will have its lost band partly compensated after speeding up again.\n\nIf you slow down the input file, it may allow you to separate more elements in the “other” stem of 4-6 stem separations in Demucs or GSEP (when it’s done manually).\n\nIt also helps when you need an improvement with instruments like snaps, human claps, etc. The soprano feature on x-minus works similarly (or even the same); it’s also good for high-pitched vocals.\n\nBe aware that low, deep male vocals might not get separated with this method (then use the tenor voice trick instead, i.e. pitch it up instead of down).\n\nIt also works best for hard-panned songs (e.g. from 1970 and earlier, like The Beatles). It works great for drums too. In evaluation on the multisong dataset on MVSEP, it decreases SDR by around 1 dB.\n\n\"Basically lossless speed conversion a.k.a. soprano voice trick done manually:\n\nDo it in Audacity by changing sample rate of a track, and track only (track > rate), it won't resample, so there won't be any loss of quality, just remember to calculate your numbers\n\n44100 > 33075 > 58800\n\n48000 > 36000 > 64000\n\n(both would result in x0.75 speed)\n\netc.\" (by BubbleG)\n\nQ: Won't the result be sped up?\n\nA: “No. 
Because when you first slow it down, after processing with said model it gets converted to 44100 again (only the sample rate, not the actual speed), so speeding it up brings the speed back to normal” becruily\n\nQ: I don't quite get what I'm supposed to do though, just slow down the file to 0.75x and then export in 58800?\n\nA: “change the sample rate to 33075 Hz,\n\nthen export at whatever sample rate\n\nprocess then,\n\nchange the sample rate of the processed file to 58800 Hz\n\nkey word being change, not resample\n\nlike [this](https://imgur.com/a/nC6BR2Z), click other and then pick the correct samplerate” Dry Paint Dealer\n\n4b\\*. If you have a mix of soprano and baritone voices, you can possibly do:\n\n\"1. Soprano mode (slow down sample rate), then bring back to normal\n\nafter that\n\n2. Tenor mode (speed up sample rate), then bring back to normal\n\nand finally combine the two with max algorithm\"\n\nMaking an ensemble of such results can also increase the quality of separation.\n\n5. *Use a 2-stem model result as input for better 4-6 stem separation*\n\nYou may get better results in Demucs/GSEP/MDX23C Colab by using a good instrumental previously separated in UVR5 or elsewhere as input (e.g. MDX HQ3 fullband, or Kim inst narrowband in case of vocal residues, or BS-Roformer 1296).\n\n6. *Debleed*\n\nIf you've done your best but still get some bleeding here and there in instrumentals, check RX 10 Editor with its new De-bleed feature. [Showcase](https://youtu.be/nwyJJMiYGUI)\n\n[More](#_tv0x7idkh1ua) methods for debleeding stems.\n\n7. *Vocal model>karaoke model*\n\nYou might want to process the vocal result of a vocal model with MDX B Karaoke afterwards to get different vocals (old model).\n\n8. The same goes for an unsatisfactory result from an instrumental model: you can use the MDX-UVR Karaoke 2 model to clean up the result, or the top ensemble or GSEP, as for cleaning inverts (old models).\n\n9. 
*Mixdown of 4 stems with vocal volume decreased for final separation*\n\nAn old trick of mine, used in the Spleeter days to minimize vocal residues.\n\nSeparate the mixture into 4 stems, then remix the stems so that the vocals are still there but quieter (lower their volume) and the drums are louder. Then send this new mixture to one good isolation model/ensemble. As a result, the drums will be less muddy after separation, and any vocal residues will be less persistent.\n\nBut that was back when there wasn't even Demucs (4) ft or the MDX-UVR instrumental models, with which such issues are much less prevalent.\n\n10. If you use UVR5 GUI with a 4GB GPU, you may hear more vocal residues with GPU processing than with e.g. an 11GB GPU (tested on NVIDIA). In this case, use CPU processing instead.\n\n11. *Fake stereo trick*\n\nAufr33: “process the left channel, then the right channel, then combine the two. [Hence] the backing vocals in the verses are removed” (the result may still be poor, but it's better). “I'm having to process as L / R mono files otherwise I get about 3-5% bleed into each channel from the other channel, but processing individually, totally fixes that” -A5\n\nExample in Audacity: import your file, click the down arrow next to the track's name, click [Split Stereo Track](https://cdn.discordapp.com/attachments/708595418400817162/1169618081178464336/image.png), then go to Tracks>[Add](https://cdn.discordapp.com/attachments/708595418400817162/1169618081400754216/image.png) New>Stereo Track.\n\nSelect the whole of one of the mono channels you split before, copy it, and paste it into the new stereo track.\n\nThe same mono signal will then be present in both channels of the stereo track.\n\nDo this for L and R separately. Then separate both results with a model of your choice. Finally, import both output files and join their channels back into one stereo track using the method above. Don’t mix up the L and R channels when joining them.\n\n12. 
*Turn on Spectral Inversion in UVR*\n\nIt can help when you hear instruments in silent parts of vocals; sometimes a denoiser can also help with that (although both can make your results slightly muddier).\n\n13. *Chain separation*\n\nFor vocal residues in an instrumental, you can experiment with separating it first with e.g. the Kim vocal (or inst 3) model and then with an instrumental model. You might also want to clean instrumental residues out of the vocal first, invert it against the mixture manually to get a cleaner instrumental, and then separate that with an instrumental model to get rid of the vocal residues. [Tutorial](https://youtu.be/FBMOWcDDxIs)\n\n14. To avoid manually cleaning instrumental residues out of silent parts of the vocal stem, you can use a noise gate, even in Audacity. [Video](https://youtu.be/9vrVJov7OWo)\n\nIn some cases, using a noise reduction tool with a captured noise profile might be necessary. [Video](https://youtu.be/XiuNjkGl4iY)\n\n15. *Choice of good models for ensemble*\n\nUse only instrumental models in your ensemble if you get vocal residues (and possibly vice versa: use only vocal models in an ensemble for vocals to get fewer instrumental residues). This was mainly relevant when there was still a strong division between vocal and instrumental models (before the MDX23C release). Now it comes down to picking only models that don’t bleed: listen carefully to each model's result separately, and pick the best 2-5 results to make an ensemble.\n\n16. *For vocals with vocoder*\n\nYou can use 5HP Karaoke (e.g. with the aggression setting raised) or the Karaoke 2 model (UVR5 or Colabs). Try separating the result further as well (outdated models).\n\n\"If you have a track with 3 different vocal layers at different parts, it's better to only isolate the parts with 'two voices at once' so to speak\"\n\nBe aware that the BS-Roformer model ver. 2024.04 on MVSEP handles vocoder better than viperx’s model.\n\n17. 
*Find some leaked or official instrumental for inversion*\n\nTo get better vocals.\n\nIf you're really struggling to get some of the vocals:\n\n\"I used an instrumental that I don't remember where I found it (I'm assuming most likely somewhere on YouTube) and inverted it and then used MDX (KAR v2) on x-minus and then RX 10 after.\n\nI just tried the one-off Bandcamp and funnily enough it didn't work with an invert as good as the remake that I used from YouTube, but I don't remember which remake it was I downloaded because it was a while ago\"\n\n18. *Fix for* *~\"ah ha hah ah\" vocal residues*\n\nTry some L/R inverting, and try separating multiple times to get rid of vocal pop-ins like this.\n\n19. *Center channel extraction method*\n\n*by BubbleG using Adobe Audition*:\n\n\"The idea is that you shift the track just enough where for example if you have a hip hop track, and the same instrumental tracks the drums will overlap again in rhythm, but they will be shifted in time so basically Center Extract will extract similar sounds. You can use that similarity to further invert/clean tracks... It works on tracks where samples are not necessarily the same, too…”\n\n*Step-by-step guide by Vinctekan* ([video](https://cdn.discordapp.com/attachments/708579735583588366/1136773448291602614/bandicam_2023-08-03_23-28-25-726.mp4))\n\n1. Take your desired audio file.\n\n2. Open it in Audacity.\n\n3. Split Stereo to Mono.\n\n4. Click the left speaker channel (now mono), and duplicate it with Ctrl+D.\n\n\\*: If the original and the duplicate are not next to each other, move them so that they are.\n\n5. Select the original left speaker channel and its duplicate, and click \"Make Stereo Track\".\n\n6. Solo it.\n\n7. Export it from Audacity, preferably at 44100 Hz, since UVR doesn't output higher sample rates. 
Format and bit depth don't really matter; I always prefer WAV.\n\n8. Do the same thing for the right speaker channel.\n\n9. Open UVR.\n\n10. Navigate to Audio Tools>Manual Ensemble.\n\n11. Make sure to choose Min Spec (since that algorithm is supposed to isolate the frequencies common to the 2 outputs).\n\n12. Select the 2 exported fake stereo files of the left and right speaker channels.\n\n13. Hit Process.\n\n\\_\\_\\_\n\n20. *Q&A for the above*\n\nQ: For the right channel, are you doing the same with the duplicate and moving the file next to the original, or just duplicating and making that stereo?\n\nA: Those 2 steps go hand in hand. The reason I mentioned it is that if you try to make a Stereo Track from those 2 (the left/right speaker channel and its mono duplicate) when there is a track between them, it doesn't work, even if you select those 2 with Ctrl held down.\n\nTake that 1 channel (left/right), Ctrl+C, Ctrl+V, and now you have 2 copies of the exact same audio. Hold Ctrl, select the 2, and click \"Make Stereo Track\". Finally, export.\n\n21. *Passing the result through several models one by one*\n\n\"I usually do ensemble to make an instrumental first, then demucs 4\\_ft… sometimes I do it once, then take that rendered file and pass it back through the algo a few more times, depends until it strips out artifacts.\"\n\nIt can also be beneficial when MDX23 or the Demucs ft model leaves more vocal residues than current MDX models or their ensembles.\n\n22. If you still have instrumental bleeding in vocals after using voc\\_ft, process the result further with Kim vocal 2.\n\n23. *Rearrange cleaner parts*\n\nIf the drums get muddy when a verse starts, their pattern is consistent (e.g. in some hip-hop), and you have cleaner drums in fragments before the verse starts, you can rearrange the drums manually: separate with a 4-stem model and paste those cleaner fragments throughout the track. 
Sometimes fade-outs or intros have clean loops without vocals, which can be rearranged without even needing separation. Listen carefully to the track; such moments can even occur briefly in the middle of the song.\n\n##### 24. *arigato78 method for lead vocal acapella*\n\n1) Try to make the best acapella (using the mvsep.com site or UVR GUI). I recommend the MDXB Voc FT model for this, with the overlap set to at least 0.80 (I used 0.95 for this example). The overlap for this model at mvsep.com is set to 0.80. Speaking of the \"segment size\" parameter in UVR GUI, changing it from 320 to 1024 doesn't make much of a difference. It acts randomly, but we're working on a beta version of UVR GUI, remember that. (...)\n\nI noticed all the \"vocal-alike\" instruments still remaining on the acapella track, but wait...\n\n2) The second part is to process the acapella through the MDX Karaoke model (I did it using mvsep.com). I prefer the file with \"vocalsaggr\" in the name. It has more details than the file with \"vocals\" in it. The same goes for the background vocals in this case: I prefer the \"instrumentalaggr\" one.\n\nOne important thing: almost all of the residual instrumental sounds were moved by the MDX Karaoke model into the backing vocals stem, leaving the lead vocal at almost studio quality (\"studio\"). Still, it may be helpful for all you guys trying to make good acapellas. I was just playing with all the models and parameters, and I came across this by accident. Please let me know what you think about it. I'm gonna try this on some tracks with flutes, etc. And I realize that this method is not perfect: we get nice lead vocals, but the backing vocals are left with all those sh\\*tty residues.\n\nThe track is \"Reward\" by Polish singer Basia Trzetrzelewska, from her 1989 album \"London, Warsaw, New York\".\n\n\\_\\_\n\n25. 
*Uneven quality of separated vocals*\n\nYou can downmix your separated vocal result to mono and repeat the separation (this works e.g. with the BVE model on x-minus).\n\n26. *Experimental vocal debleed with AI for voice*\n\nFor instrumental residues in vocals, AI tools made for cleaning up voice recorded with a home microphone can sometimes be used (e.g. Goyo [now the paid Supertone Clear], or even Krisp, RTX Voice, AMD Noise Suppression, or Adobe Podcast as a last resort). It all depends on the type of vocals and how destructive the AI can get.\n\n27. *Minimize vocal residues for very loud songs*\n\nFor very loud tracks (between -2.5 and -4 iLUFS), try decreasing the volume of your track before separation. E.g. for Ripple, -3dB is a good choice for loud tracks. If the track you’re trying to separate is already quiet and around -3dB, this step is not necessary.\n\n27b. You could try attenuating the mixture before separation (by 3 or 6 dB), but I can't remember whether current MSST normalizes the input beforehand anyway. UVR may not.\n\n28. *Brief summary of (old) models*\n\nMDX-Net HQ\\_3 or 4 is a more aggressive model for instrumentals, usually with fewer residues than MDX23C HQ models, and sometimes even fewer than KaraFan or jarredou’s MDX23 Colab v2.3. HQ\\_3 can give muddier results than the competition, though.\n\nThe most aggressive are the BS-Roformer models; they are cleaner, but can sound filtered and even muddier at times. It’s good to ensemble them with e.g. an MDX23C model.\n\nvoc\\_ft is pretty universal for vocals (with residues in instrumentals, but not less muddy results), while people also liked Ripple/CapCut, although they give more artefacts (now use the released BS-Roformer models for vocals instead). Consider using the MDX23C HQ model(s) as well, but they tend to have more instrumental residues.\n\n29. 
*Cleaning up bleeding between mics in multitracks*\n\n*(by SeniorPositive)*\n\n\"Demucs bleed \"pro\" tip that I figured out now, and I didn't see mentioned, that I will probably try to use every time I hear some bleed between. (...) I was cleaning multitrack from bleed between microphones in conga track, and used demucs for separation drums/rest pair, and [the] other [stem] had some of those bongos still, very very low, but it existed, and I heard it just enough.\n\n- So I took rest signal, boosted it +20db (NOT NORMALISE! Other value but make note how much of it you boosted, go few dbs less to 0db threshold). If you do not boost it to sensible levels, the algorithm will skip it.\n\n- Do separation once again (this time I've done it using spectralayers one, but it's also demucs)\n\n- lower result -20dB add this result to first separation result\n\n[The] result [is -] better separation, fewer data in other/bleed and with proper proportions.\n\nIt looks like AI is not yet perfect with low volume information and, as seen in ripple Bas Curtiz discovery, too hot content also.\"\n\n[Showcase](https://discord.com/channels/708579735583588363/708579735583588366/1171218543258378370)\n\n30. *For clap leftovers in vocal stem*\n\nUse the methods suggested in [debleeding](#_tv0x7idkh1ua).\n\n31. *(paraphrase of point 17)*\n\nIf you manage to find an official instrumental or vocal but it doesn’t invert perfectly, use the traditional phase inversion method anyway and then feed the result to the UVR models. This way, the models will have less noisy data to work with. But sometimes the official instrumental and the vocal version of a track have slightly different phasing, which makes isolating vocals via phase inversion difficult, or sometimes even impossible. ~Ryan\\_TTC\n\nSometimes only specific fragments of the song will align, and later in the track it will stop aligning and require manual alignment. 
You may try Utagoe, or possibly the Aligning option in UVR's Audio Tools, as it offers similar functionality.\n\n*Why don’t official stems invert?*\n\n“Very rarely will the vocal or instrumental fully invert out of the master. This is because of master bus processing and non-linear nature of that processing. I.e. part of the masters sound is the processing reacting to the vocal and instrumental passing through the same chain.\n\nSidechaining and many limiters are also looking ahead to the signal. Also, some processing is non-linear so even if you set it up identically re. settings, each bounce will be slightly different in nature. Stuff like saturation/distortion. Some reverbs, limiters and transient shapers etc are not outputting the same signal / samples every time you bounce, so instrumental bounce is not the same as the master bounce in terms of phase inversion.” - Sam Hocking\n\n32a. *Muddiness in instrumentals of some BS-Roformer models*\n\nInvert the mixture (the original song, i.e. instrumental mixed with vocals; lossless if possible) and mix it with the vocal result of the separation. This might increase vocal residues outside busy parts of the mix.\n\nInverting the vocals instead of the mixture will result in fewer residues, but more artificial results in busy parts of the mix.\n\nA similar trick might even increase SDR for MDX23C models, iirc.\n\nHow to perform an inversion is explained somewhere in this [doc](https://docs.google.com/spreadsheets/d/1XIbyHwzTrbs6LbShEO-MeC36Z2scu-7qjLb-NiVt09I/edit) by Bas Curtiz.\n\nIt might be unnecessary in UVR, which might already use this trick for BS-Roformer models, but for the 2024.02 model on MVSEP it was beneficial.\n\nThe trick is not necessary for the 2024.04 BS-Roformer model (it sounds worse after inverting).\n\nFurthermore, to reduce some of the muddiness of this model, you can use the premium ensemble feature. The default output without intermediates should be enough (min\\_fft is very muddy, and max\\_fft very noisy). 
Strangely, the Roformer result from the intermediates might sound very slightly better (maybe it was something random). The ensemble is roughly mimicked in jarredou’s MDX23 v2.4 [Colab](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb), and to some extent it can be mimicked in UVR with a 1296+1297+MDX23 HQ ensemble (or, for faster processing, with a copy of the 1296 result via Manual Ensemble instead).\n\nx-minus now also has a drums ensemble feature for Roformer models.\n\n32b. Fixing *muddiness for MDX-Net* (using the HQ\\_3 model as an example): *inverting trick*\n\nWhen the mixture is inverted and mixed with the separated vocals, the result is less muddy in louder parts, but in quiet parts with a less busy mix it is worse than the instrumental stem, as it has more vocal residues.\n\nWhen the vocals were inverted instead of the mixture, the result was muddier, and there were still more residues than in the original instrumental stem, just a bit fewer. I can't tell how it compares SDR-wise.\n\nSo you can combine various fragments for the best results.\n\n33. *Descriptions of models, pt. 2*\n\n*Muddiness of instrumentals in specific archs*\n\nBesides changing min/avg/max spec for MDX ensembling (or in Colab for single models), aggression for VR models, or shifts and overlap for Demucs models, keep in mind that some models or AIs usually sound less muddy than others. E.g. VR tends to be less muddy than the MDX-Net v2 arch, but VR tends to have more vocal residues. Consider using HQ2/3/4/inst3/Kim inst for fewer residues than in the VR arch or BS-Roformer.\n\nFor less muddiness than in MDX-Net, consider using MDX23 Colab 2.0/2.1 or 2.2 (more residues) or KaraFan (e.g. preset 5).\n\n*34. Muddiness of 4/+ stem results after mixdown*\n\nUVR5 supports even 64-bit output for Demucs; alternatively, you can use the Colab or CML version for 32-bit float. mvsep.com also supports 32-bit output for the MDX23 model when you choose WAV. 
It has better SDR than Demucs anyway, but sometimes more vocal residues.\n\nThen, on MVSEP, besides the 4 stems, you also get an instrumental: a ready-made 32-bit mixdown of the three non-vocal stems. It's not bad, but you can go to extremes: download e.g. Cakewalk and the 3 stems separately, and then in Cakewalk:\n\n1) Don't use the splash screen project creation tool; close it\n\n2) Go to New\n\n3) Pick 44100 and 64 bit\n\n4) Make sure that double 64-bit precision is enabled in the options\n\n5) Import the 3 MDX23 stems (without vocals)\n\n6) Go to File>Export\n\n7) Pick WAV 64\n\nOutput files of a 64-bit mixdown are huge, but that way you get the least muddiness possible, provided the MDX23 model doesn't give you unacceptably more vocal residues than the MDX-UVR inst models or the top ensemble.\n\nBe aware that 32-bit float outputs can sound muddier than 16-bit ones. This is probably because most sound cards/DACs don’t have native 32-bit float output support in their drivers, so an additional conversion must be done on the fly during playback, probably even if some drivers allow 32-bit output in Sound settings in Control Panel for the same device (while other driver versions might not).\n\nSpectrum-wise, instrumentals downloaded from MVSEP and manual mixdowns are nearly identical. The only difference I saw, in one case, was in an instrumental intro where the site's instrumental had more high end (maybe noise); otherwise, the spectra look identical at first glance without zooming in. Still, when I performed the mixdown to anything lower than 64 bit, I didn't get clarity comparable to the site's instrumental. Maybe I'd need to change some settings, e.g. set the project bit depth to the same 32 bits as the stems and then perform the mixdown to 64 bit. I haven't tested it yet.\n\n###### *35. Debleeding of drums in vocals by Sam Hocking*\n\nFor drums, I usually try and do some kind of sidechained denoise using the demixed Drum stem itself as the signal to invert with. 
If you 'shape' the sidechained input using spectral tools/filters/transient tools etc., you can often null more of the drum out of the vocal. My favourite tool for this is Bitwig Spectral Split, but there are several FFT spectral VSTs out there. The key is that the tool has smoothing to extend the transients in time a bit, so they null more.\n\nIt's difficult to hear on a video, but here's a vocal stem with a lot of residue I've exaggerated in a passage without singing. I turn on a sidechained bass, drums and other stem to phase-invert them out of the vocal a bit via the spectral transient split in Bitwig. I then take a spectral noiseprint in Acon Digital of what's left, and that works as a mild denoiser, but only after the inversion has done its thing. Don't take the noise print until you're happy that everything else is inverting out as much as you can get it, and it's not noticeable.\n\n*36. Manual MDX23 stems mixdown issues*\n\nAfter importing the three stems from MDX23 (or another arch) into the same session, their sum can be so loud that it clips on the master fader. In many cases, I’d suggest ignoring it, as after mixdown it will be fine in most cases and better than using a limiter. But how much clipping even the instrumental from a single model will have also depends on the song's loudness:\n\n37. Q: *Why do separated instrumentals sometimes clip?*\n\nA: “Mixture doesn't clip, but instrumental is clipping.\n\nThis is because where the instrumental is clipping in positive values, the vocals are in negative values, and so vocals are lowering instrumental peak value when mixed together.\n\nIf you separate a song peaking at 0 with high loudness, the instrumental will probably clip because of this (and the more loudness, the more chances this clipping can happen, as waveform is brickwalled toward boundaries values). 
It's the laws of physics, as that's because of these laws that audio phase/polarity inversion works.\n\nThat's why Demucs is using the \"clamp\" thing, or can also lower the volume of the separated stem to avoid that clipping.\n\n- Most of the time, lowering your input by 3dB solves that issue\n\n- Saving your audio to float32 can be a solution, as \"clipped\" audio data is not lost in this case” (jarredou)\n\nSo, theoretically, with 32-bit float output the volume can be decreased after separation without losing anything, which fixes the clipping.\n\n- If you don't plan to use 32-bit float and turn the volume down manually later, UVR has a normalization option that turns down the output volume when necessary to avoid clipping.\n\n38. *Audio separated with the MDX-Net arch has noise where the mixture is silent*\n\nUse the denoise standard (or denoise model) option in Options>Choose Advanced Menu>Advanced MDX-Net Options>Denoise output.\n\n39. *MDX23C/BS-Roformer models ringing issue*\n\n“It was reported that maybe DC offset can amplify it. Fixing it with RX before separation was said to alleviate the issue” See the [screenshot](https://i.imgur.com/JKZfjUu.png) for how to do it.\n\n“Don't forget to use \"mix\" pasting mode” - jarredou\n\nThis serves to alleviate the issue of horizontal lines at specific frequencies across the whole track, most likely caused by artifacts of the band-splitting neural network. The problem is presented above.\n\nQ: Mine is 0.047% for the DC offset, so I would just do 0.047 or 0.04?\n\nA: “0.047% is kind of normal value, it's even a great one. No need to fix that.\n\nI don't know at what value it could become problematic for source separation models.\n\nOn some raw instrument recordings, I have seen 20%~30% DC offset sometimes, which can become a real issue for mixing then, as it's reducing headroom” - jarredou\n\n40. 
*Ensemble of pitch-shifted results (continuation of point 4)*\n\nFollow point 4, and “change sample rate before each separation and restore it after for each, then ensemble them all”.\n\n“on drums it was really working great, where sometimes you have sudden muffled snare because other [stem] masked it, the SRS ensemble [iirc used in MDX23 2.x Colab and KaraFan] was helping a lot with that, making separation more coherent across the track.”\n\n41. *A5 method for clean separations*\n\nFirst consider the fake stereo trick from point 11. Then separate with BS-Roformer 1296, clean the residues in the vocals manually, and put the vocals back into the mixture (i.e. perform a mixdown to get a mixture again). Finally, separate this mixture with demucs\\_ft (old models).\n\n42. *Using surround versions of songs*\n\nSometimes vocals are easier to separate from the center channel of a surround version of the song. You might also get different instrumental separations from such versions, with the possibility of adjusting the volume of specific channels before the mixdown to a 2.0/stereo file. The mixdown might be necessary anyway, because otherwise you might run into errors when trying to separate a 5.1 file or one with more channels.\n\n*Use Dolby Atmos/360 Reality Audio/5.1 version of the song*\n\nMultichannel mixes can give better results for separation. For more on Atmos, [read this](#_ueeiwv6i39ca).\n\nBe aware that the center channel may contain not only vocals, but also some effects.\n\nConsider separating every channel individually, or one pair of channels at a time (rear, front, center, and sides separately), or only the center channel on its own and the rest of the channels separately from it.\n\nVisit [this](#_nspwy0bkpiec) section for more information.\n\n43. 
*Matchering as a substitute for ensemble (UVR>Audio Tools)*\n\nIf the result of some separation is too noisy but preserves the mood and clarity of the instrumental much better than a cleaner but muddier result, you can use the noisy result as the reference for the muddier target file. E.g. use the voc\\_ft result as the reference for GSEP 2-stem instrumental output.\n\n44. *Retry separation 2–3 times*\n\nAt least with MDX23C models, someone found that every separation made in UVR differed in terms of muddiness and residues, and got a satisfactory result on the second or third attempt at separating the same song. Consider turning on Test mode in UVR, which adds a few-digit number to the output file name, so the results won’t be overwritten and you’ll be able to listen to and compare them.\n\n45. *Ensemble the instrumental result with drums using max/max*\n\nThis can help fix the muddiness of vocal BS-Roformer models, but the drums can sound too loud in the end. Consider decreasing their volume before the ensemble if necessary.\n\nDrums can be obtained from e.g. demucs\\_ft (with the mixture, or the result of some less muddy model, as input) or from MDX23 Colab/MVSEP (which already uses its own model-ensemble output as input for the 4 stems).\n\n46. *Use EQ on your song before separation (e.g. for too weak “s” sounds in separated vocals)*\n\nIt’s an old method from the times when models didn’t give good quality yet, and it might no longer be necessary. You can use EQ on the mixture to emphasize the vocals in the mix, so the separation might turn out better.\n\n47. Bas Curtiz's [video](https://www.youtube.com/watch?v=UGvkpilXx3Q) tutorial with tips and tricks, and his [document](https://docs.google.com/spreadsheets/d/1XIbyHwzTrbs6LbShEO-MeC36Z2scu-7qjLb-NiVt09I/edit)\n\n48. [Aufr33’s demudder](#_sv6j1ndk4oq5) (more for Roformers than HQ 4)\n\n49. 
*Volume compensation finetuning for MDX-Net models*\n\nIt can slightly enhance the result, helping to fight muddiness a bit.\n\nIt’s no longer beneficial for MDX23C and Roformer models.\n\nThe best volume compensation value generally differs for every song. E.g. for the HQ\\_3 model, sometimes 1.035 can be the best, but sometimes 1.022. By default, it affects only vocals, but when you switch the primary stem in the model settings so that vocals are labelled as instrumentals and vice versa (which is how the MDX kae Colab works), it can also be used to fine-tune instrumental stems.\n\n50. *Picking correct models for ensemble (by dca100fb8)*\n\n“I'm seeing a certain pattern, if the Mel-Roformer model from x minus leaves faint vocals in the background during silent parts of the song, then it means MDX23 & Demucs 4 htdemucs\\_ft models should not be used for ensemble because vocals can be heard in the background too using these models, while MDXv2 models will not leave those vocals. So it's either UVR Mel/BS-Roformer 1296 + 1297 + MDXv2 or MDX23 + Demucs + Mel-Roformer X Minus + BS-Roformer 1296 + 1297.\n\nI excluded VitLarge because it always leaves faint vocals”\n\n51. *Ensemble only the extra higher frequencies*\n\nE.g. from the HQ 3 model, ensembled with the narrowband inst 3 model: [guide](#_h952n842ljfj)\n\n*52. Use a BV/Karaoke model first, to potentially get a cleaner instrumental with a dedicated model afterwards*\n\n*53. Set vocals to center with stereo plugin (guide by Musicalman)*\n\nTrick [working] with the [now outdated] BS-Roformer karaoke model, though it may work on other karaoke models too (I suspect you might have some mileage with MDX for instance). Anyway, the trick has to do with separating one voice from other sounds. If the voice you want to separate is panned centrally, you're already in luck; the model should expertly separate it. If not, you can rotate the stereo field so that the voice is as close to the center as possible (I use the Reaper JS stereo field manipulator plugin for this). 
Process the rotated sound with the karaoke model and the voice you're looking for will magically be separated, even from other voices! If you need the original stereo image back, simply perform the opposite rotation.\n\n54. *Method for cleaner vocals (by YAZKEN\\*)*\n\n“Basically you do 2 vocal extractions, invert the polarity of one of them and render it, after that you invert the rendered audio and choose one of the extractions you’ve made and listen what is cleaner”\n\n55. *Spectral editing in Audacity explained (by CC Karaoke)*\n\nI typically use Audacity... But [...] RX11 has some nice shiny toys, so maybe try that. I do things the hard way.\n\nHere's a great basic example when using the (Roformer) Karaoke model. It can sometimes really struggle with the hard consonants.\n\nSo for best results, you'll often need to isolate those in the main vocal stem by muting out all the surrounding sound, and then mix them back in with the backing vocal stem. Of course, by doing this the hard consonants will often be too loud, so you can reduce their volume and play it back until you get a level that blends properly.\n\n<https://imgur.com/a/dyc9olh>\n\nA slightly more complex example, where the vocal lines overlap. I tried drawing green over one of the lines to show the difference. There might be a couple of mistakes lol, as I haven't checked, but you get the idea. The previous sound is a carrying note, whereas the next line is a 'HA' kinda hard-hit punching word, so it has a different shape to it. This kind of more obvious difference is easier than, say... reverb…\n\n<https://imgur.com/a/BFGqN7P>\n\njarredou’s hint: That's a case where I would go with SpectraLayers, as the 2 vocals are not on the same pitch, so you can separate them manually (with the harmonic selection tool). 
At least for that small part shown here.\n\nIn SpectraLayers, you can change FFT resolution, higher value will give you more defined freq \"picture\", and it can help when 2 parts are really close in pitch, like here.\n\nThe downside is that with high FFT values, you lose time resolution. So to use SpectraLayers manual selection efficiently, you often need to switch that FFT resolution value depending on the elements you are targeting, like you would zoom/dezoom in photoshop while editing a picture.\n\n56. *Sequential stem separation (by dynamic64/isling)*\n\nWith single stem models, feel free to experiment with sequential stem separation -\n\nInstrumental model first, then drums or bass, piano or guitar, strings or horns. It depends on the song whether better results will give e.g. drums or bass when separated first, the same to piano vs guitar and strings vs horns first.\n\n57. *Advanced chain processing chart* ([image](https://lh3.googleusercontent.com/pw/AP1GczNwiJ8BSJdjCWA4Z0D7CHevRfEbOqbf4uBFZSYFEWIxoSm3tUfJVUrlTMJOolR9wD_IxNZgzf7Efe7Nh58-nAuzrZmoAOwXE-FgDFWztWCmMZcJ0ResQZCjP4PMG26BpWSSMhQO4CEiM4XGplqKggoY_mVTqXaT14EBXpweZ95Dy8SJmI69Wf3usD4pyl0E2zJyMxyWZ5MYKs3uz6eenqpF98BYowhl0Qvq55xLZEqfeUsnbhouJPctM792NzghD81lLh1gxNU5sLpyS_c9y79ZOAvPSnXpHp1vFPoqrPbYhYQ1E70HtxuPb7UOSyptwB4pnQVfuJ_JRZzLWq_GDbdWa2uBHznGLFOLnwrtwlH6Kewd2hU8sE9oJzOhPCBhWoY52bDsLqjSFct7AVfFBWUpNAUQpitAMr4pEAYIjc6EaSMyYgR_Cf8Y05htRoNmOqxn08kynl7xlXtQ-5duX1VcZj6cPZ-QSbaH6W-CyBrbPjfCsDymb4V5Yl53W1oVY5ZRQWzfkfM3_KrmP1RQVimzvAq36Vwv9IwjqL_PR9AcS6HHYukQ88ZwtUzuSo7BnzuuslRFVPMM0NxzbwkPP-MWrNzlPGdMCm8VsLnmIcEcq8CRw9CRFvWzII8LsgLDUcpBeo00BBZxRQ8P0mMtQ3OjQ0opjqb4v7OJqoteW-rTPDaIuvcfgu6udWDDSUJgYuHBHOYB3n42ASD0lShDA3yORjfgPNvkQpDIJ5ZOqlbMOZz2dlOjhzq69glN8dmEx2Kn1z2h9NZo7nhrrt8QWaHnHpGT_OxroAwxHscd6lodzSQKtS86zExbCpW3PrmslkQCeXnKL-LMd9Mmr_xN5tO_zfeEuaVhCBTwzJmbxEqg5yxpgQ1klsIzxeCiqIRzJ96i1A5Tv_BjjTGeHlMjyBbOl_iCT7TSbtP9krhYs0BymO_02BE0q1r95rCOlFHyyj1Fbnh38Zt9hBHEPYOYqsTYHvwhJIjmb2M89xOr2KA9qyybrSa5vbZx4X1y91c
SxQ03%3Dw1345-h941-s-no-gm?authuser=0))\n\nIt’s a method utilizing old models, and e.g. Kim Vocals 2 can be potentially replaced by unwa’s BS/Mel-Roformer models in [beta UVR](#_6y2plb943p9v) (or other good method for [vocals](#_n8ac32fhltgg)) or ensembles mentioned in this document. Check the best current methods for vocals in one stem to find what works the best for your song to get all vocals before splitting to other stems using this diagram.\n\nhtdemucs v4 above can be replaced by htdemucs\\_ft, as it's the fine-tuned version of the model (or [MDX23 Colab](#_jmb1yj7x3kj7)). Even better, you can use some of the methods for [4 stems](#_sjf0vefmplt) in this GDoc (like drums on x-minus).\n\nDe-echo and reverb models can be potentially replaced by some better paid plugins like:\n\nDeVerberate by Acon Digital, Accentize DeRoom Pro (more in the [de-reverb](#_5zlfuhnreff5) section).\n\nUVR Denoise can be potentially replaced by less aggressive Aufr33 model on x-minus.pro (used when aggressiveness is set to minimum), and there’s also newer Mel-Roformer (read [de-reverb](#_5zlfuhnreff5) section).\n\nAs for [Karaoke](#_h110k6ouf88c) models, there's e.g. a Mel-Roformer model on x-minus.pro for premium users or MVSEP/jarredeou inference [Colab](#_wbc0pja7faof).\n\n\"If the vocals don't contain harmonies, this model (Mel) is better. In other cases, it is better to use the MDX+UVR Chain ensemble for now.\". It is possible to recreate to some extent this approach while not using BVE v2 models, by processing the output of main vocal model by one of Karaoke/BVE models in UVR (possibly VR model as the latter) using Settings>Additional Settings>Vocal Splitter Options, so it separates using one model, then it uses the result as input for the next model (see the Karaoke section).\n\nMedleyVox (not available in UVR) will be useful in the end in cases when everything else fails after you obtain all vocals in one stem, as it's very narrowband. 
But you can use AudioSR on it afterwards.\n\n58. See [here](#_tv0x7idkh1ua) for more on *cleaning/debleeding*\n\n59. *Reverse polarity and/or remove DC offset* of the input file\n\n60. Find *fragments* of instrumentals in your song and *overlap them inverted across the whole song* before separation (heauxdontlast)\n\n60. *Method for better quality of instrumental leaks on YT by theamogusguy*“I did something really odd. (...) since you can only rip max 128kbps I did something really odd to get a higher quality instrumental:\n\nI inverted the 128kbps AAC YouTube rip into the original to get the acapella\n\nI took the subtracted acapella and ran it through AI (Mel-Roformer 2024.10) to reduce the compression artifacts\n\nI then inverted the isolated acapella and mixed it with the lossless to get an... unusual lossless instrumental file?\nAlso, the OPUS stream goes up to 20kHz, but I feel like the sample rate difference is going to cause issues, so I ended up ripping AAC (OPUS is 48khz while most music is 44.1kHz)”\n\n61. *Join the best fragments from various models*E.g. unwa inst models might be noisy at times, so you might want to use specific fragments of v1e/v1/v2 fitting across the song, or e.g. beta 4 vocal model in certain fragments where it’s not enough, though it is more muddy, but less noisy than unwa’s inst models. In some cases, if it’s still not enough, you might want to use BS-Roformer models like unwa’s Large or e.g. 24.10 on MVSEP. Just find which model on the list in this document has the least amounts of residues and experiment with the rest starting from models listed at the top.\n\n62. *Lowpassing lossless file to 20kHz*\n\nSometimes it’s a bit useful in getting rid of some constant faint noise/residues from vocals in instrumentals. It might muffle some unwanted parts of instrumental, but some more difficult fragments with more residues than usual might sound better that way. 
Tested on 16-bit FLAC compressed to MP3 320kbps, but it should work better when lowpassing with an EQ instead of compressing. Other cutoff values you might want to try are 19kHz (MP3 VBR V0 cutoff)/17.7kHz (cutoff of some narrowband models)/16kHz (cutoff of MP3 and AAC 128kbps)/14.7kHz (D1581 model cutoff).\n\nA possible explanation of why it sometimes works: e.g. more old-school hip-hop beats might have fewer high tones, or even none above 16kHz, so most of the information in this range might come from the vocals in the mixture. You can recognize such a case especially if the vocals lose much more clarity than the beat once you compress the mixture to e.g. MP3 VBR V0 (19kHz cutoff) or lower.\n\n63. *Refrain from excessively stacking models (e.g. for RVC)*\n\n“Inst Voc, Kim Vocals, Denoise, ensemble mode, and so forth can introduce noises to your dataset as it rips away frequencies from your audio. This harms the model fidelity and quality.” [more](https://rentry.co/RVC-dataset-RX11/#preparing-the-dataset-through-musicsfx-removal)\n\n64. *Get cleaner vocals with a vocal and instrumental model mixdown (e.g. of Mel becruily models) by [Havoc](https://discord.com/channels/708579735583588363/767947630403387393/1324043432309817437)/mrmason347*\n\nSeparate with the becruily Mel Vocal model and its instrumental variant, then take the vocals from the vocal model and the instrumental from the instrumental model. Import both stems into the DAW of your choice (it can be Audacity), so you get a file sounding like the original, then export a mixdown of both stems and separate it with the vocal model.\n\n65. *Less vocal bleed with dim\\_t 256 or the corresponding [chunk\\_size](#_c4nrb8x886ob) (cypha\\_sarin)*\n\nA small difference was observed on a 6GB NVIDIA GPU with the Gabox instv5 model: “one little vocal glitching sound from the song that only gets picked up when the segment size is lower [256]”\n\n66. If you set 24-bit output in UVR>Options>Additional settings (or alternatively 64-bit) for e.g. the demudder, the results might be slightly less muddy.\n\n67. *Clean loop of the instrumental used for Matchering and full separation*\n\nYou can use a well-sounding fragment of a single instrumental model separation with a high fullness metric as the reference for Matchering in UVR, with the muddy [phase-fixed](#_j14b9cv2s5d9) result set as the target. The output will have less bleeding than models with a low bleedless metric, but will still be fuller than phase-fixed results.\n\n68. *chunk\\_size 112455 and overlap 50*\n\nTo get the best SDR with Roformers, use chunks of no less than 11s, which is usually the training chunk length (rarely higher). Still, at times people get better results with 2.55s chunks (112455 samples at 44.1kHz, hence chunk\\_size 112455 since UVR Roformer patch #3). But be aware that with e.g. the becruily Karaoke model, a low 2.55s chunk will lead to crossbleeding. The dim\\_t to chunk\\_size conversion is given later [here](#_6y2plb943p9v). Some people even go to extremes and use e.g. overlap 50, claiming that it was better with 112455 and the becruily inst model (thx gustownis).\n\n69. If you want smoother vocals from e.g. Beta 5e, use negative values of Shift Pitch Conversion in UVR Advanced MDX-Net settings (explained more thoroughly above).\n\n“tried it on a regular model (bigBeta5e) - the spectrogram looks a little more cut off at the high end than without the pitch adjust and overall the vocal sounds a little rounder and not quite as harsh (so the transients are not so nuclear)” - cristouk\n\n70. *Fixing missing sound after separation with multistem models*\n\nWith certain models with at least 4 stems, you might find that the mixdown of those stems, inverted against the original mixture, doesn't cancel it out completely. The remaining difference is an additional 5th stem: your own “other”. It might be useful if some instruments got missed, or simply for remastering purposes where not losing any bits of audio is critical for your work.\n\n71. *Start separation in a different place of the song*\n\nCut it manually. The result might resemble changing the chunks setting a bit.\n\n72. *Use an instrumental model result as a pre-processor for a vocal model*\n\nIt’s one of the suggested RVC workflows.\n\n73. *Don't use 96kHz and higher audio files in UVR for separation*\n\nFor some reason, it yields bad results and clipping. 48kHz files are all right.\n\nTo downsample to 44.1kHz, you could use this ffmpeg build: <https://github.com/rorgoroth/mingw-cmake-env/releases/tag/latest>\n\n`ffmpeg -i \"C:\\input96or48.wav\" -af asf2sf=dblp,ardftsrc=44100:quality=61656210:bandwidth=0.9941249 -c:a pcm_f32le C:\\output44.wav`\n\nIt will give you 32-bit float after downsampling, so you could turn down the volume before separation (probably you could even add some volume attenuation in the command itself). It was #1 on the resamplers [chart](https://src.hydrogenaudio.org/) on hydrogenaudio. It requires 24GB of RAM to process faster. Be aware that it doesn't show progress. With limited RAM, it can take 10 minutes or more for a 5-minute song.\n\n\\_\\_\\_\n\nGet VIP models (optional donation)\n\n<https://www.buymeacoffee.com/uvr5/vip-model-download-instructions>\n\nIf some models mentioned in this document are still missing in the UVR5 GUI, get them from the Download Center (or [here](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.3.0/v5_model_expansion_pack.zip), expansion pack), and click refresh in the model list if they don't show up.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n##### SDR leaderboard\n\nTested on multisong dataset\n[https://mvsep.com/quality\\_checker/leaderboard2.php?&sort=instrum](https://mvsep.com/quality_checker/leaderboard2.php?&sort=instrum&page=0)\n\n(some models/AIs/methods are not public or are only on MVSEP; all others you will find in UVR's Download Center (some only after using the VIP code), or somewhere in this doc if it’s 
public)\n\n*Older “synth” dataset leaderboard, with more of the older models; a bit less reliable and no longer updated with the results of new models*\n\n<https://mvsep.com/quality_checker/leaderboard.php?sort=insrum>\n\nThe highest SDR doesn't automatically mean that the result will be the best for your song and your use case (inst/voc/stem). Read the [list](#_rz0d5zk9ms4w) of all the best models and methods at the top, and experiment.\nApart from the bleedless/fullness metrics, models with higher SDR than others might pick up instruments better (e.g. fewer wind instruments recognized as voice).\n\nAlso, “The way I see high SDR is it indicates the lower frequencies will be more accurate to the original stem, and be more free of distortion or noise. And I also see it sometimes indicates better quality of fundamental frequencies (closer to the original gain/phase, more consistent separation), but I don’t know much beyond that lol” - stephanie\n\n- Different ensemble configurations can give the best results for different songs.\n\n- \"Since the SDR [on MVSEP’s] synth dataset is flawed from the get-go due to the dataset being used isn't really music, but sample-based, don't get your hopes up too much.\"\n\nBut it generally reflects the differences between models to a greater extent, e.g. those used in the Demixing Challenge 2021, so it's not totally bad, and the multisong dataset might be even better (still not perfect) - just be aware that different settings can give you better results for your particular song than the combination of models that is best on average on the SDR chart.\n- Bas Curtiz conducted some tests with commercial music as the evaluation dataset, and it turned out that only models already close in SDR switched ranks, while most models kept their positions. So in [conclusion](https://discord.com/channels/708579735583588363/708579735583588366/1304892082057646090), the multisong dataset can still be considered reliable (although the bleedless and fullness metrics are more suitable for our tasks now - more below).\n\n*About SDR evaluation on MVSEP and how important a factor it is for the final result*\n\nIt still depends on the specific song which bag of models/ensemble, or which specific models, will come out best in a specific scenario. Choosing by SDR, even on the multisong dataset, can be misleading. For example, the metric doesn’t really reflect the differences between e.g. HQ\\_3 and the MDX23C fullband model when it comes to the bleeding in instrumentals that occurs in lots of contemporary songs. However, the bleeding issue doesn’t always occur, and then HQ\\_3 results can be more muffled; in that case, the SDR metric, by which MDX23C models score better, is closer to human listening. So SDR can be misleading, because it can vary very much from song to song.\n“The thing is that SDR evaluates at the same time how \"full\" the stem separation is and how much bleed there is in the separated stem. You can't know, only based on SDR score, which of \"fullness\" or \"bleedless\" is impacting the score the more” - jarredou\n\nAlso, according to some SDR evaluations conducted by Bas Curtiz, it turned out that permanent bleeding doesn’t have more impact on SDR than occasional bursts of bleeding here and there.\n\nStill, in some scenarios, relying on the SDR metric of the multisong dataset on MVSEP can be a safe approach, giving you some reassurance that the result in a strict test scenario will be at least decent in some respects, although you can (or even should, when some instruments are missing) still experiment to get a better result, even if it isn't reflected in SDR.\nTo sum up, SDR evaluation only averages over a specific dataset of songs, and SDR alone can't predict how a certain model will behave on a specific song; plus, the algorithm is limited compared to human ears. For example, if you could measure SDR for a specific song against its official, perfectly inverting instrumental, the current best ensemble combination by SDR might not give the best result on it. Choosing by SDR just means there’s a higher chance of hitting a good result in a certain spectrum of sonic changes - it’s a good starting point to experiment further.\n\nBased on 9.7 NET 1 models, the MVSEP synth dataset usually gives ~0.7 higher scores than the [Demixing Challenge 2021 leaderboard](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/leaderboards?challenge_leaderboard_extra_id=869&challenge_round_id=886&post_challenge=true). Also, it favours the Bas Curtiz FT model more than the multisong dataset does, due to some characteristic features ZFTurbo pointed out.\n\n“A calculation by a computer isn't a human ear”.\n\n- Another way to evaluate a model/ensemble, at least by ear, is to test it on a set of [AI killing tracks](#_37hhz9rnw7s8), which tend to have specific issues after separation with most if not all models, and see how much better or worse it got. Childish Gambino – Algorhythm is a good starting point to chase differences in vocal bleeding in instrumentals among various models, due to the specific effects applied to the vocals.\n\n*How does SDR even work in Python*\n\nSDR is the ratio, in dB, of the reference signal's energy to the energy of the error (reference minus estimate). A NumPy version, assuming arrays shaped (sources, channels, samples):\n\n```python\nimport numpy as np\n\n\ndef sdr(reference, estimate):\n    delta = 1e-7  # avoid numerical errors (division by zero, log of zero)\n    # energy of the reference, summed over channels and samples\n    num = np.sum(np.square(reference), axis=(1, 2))\n    # energy of the error between reference and estimate\n    den = np.sum(np.square(reference - estimate), axis=(1, 2))\n    num += delta\n    den += delta\n    # one SDR value in dB per source\n    return 10 * np.log10(num / den)\n```\n\nQ: Is there a way to compare SDR between an official instrumental and the filtered instrumental?\n\nA: Bas has shared an [.exe](https://drive.google.com/file/d/1i4xeKfNSUJumvcKjexu5lTjqKVJv9ZBI/view?usp=sharing) script to do that easily ([uvr-general](https://discord.com/channels/708579735583588363/767947630403387393/1235225091944480870))\n\nWhat you could do, but only if you have the original vocal or instrumental, is to check the SDR with this:\n\nusage:\n\n`sdrcalc.exe \"c:\\your-input-folder\" \"c:\\your-output-folder\"`\n\nMake sure the files have exactly the same extension and filename.\n\n“Here is an idea for multisong leaderboard V2, with the songs edited to have the loudness of real music. 
In this paper, they show that the SDR values of lots of models decrease when evaluated on real music <https://arxiv.org/pdf/2208.14355>”\n\nAfter the community had used SDR extensively on the synth and later the multisong dataset, jarredou invented a new method of automated model evaluation:\n\n## [**Bleedless**](https://mvsep.com/quality_checker/multisong_leaderboard?algo_name_filter=&sort=instrum&ranking_metrics=bleedless) **and** [**fullness**](https://mvsep.com/quality_checker/multisong_leaderboard?algo_name_filter=&sort=instrum&ranking_metrics=fullness) **leaderboard**\n\nPython evaluation [script](https://drive.google.com/file/d/1n8CWiFYtr0_T1qR8WM8pOjYcA401epez/view?usp=sharing) by jarredou (prob. [mirror](https://discord.com/channels/708579735583588363/911050124661227542/1302786918244814930)), [Torch version](https://drive.google.com/file/d/1MqSUnWgY_w-Io0GNXUyyA3_19Y9RVgLy/view?usp=sharing) (with Bas Curtiz), used on [Quality Checker](https://mvsep.com/pl/quality_checker)\n\nA Librosa version was added to ZFTurbo's training repo.\n\n*A more detailed and reliable method of evaluation on the multisong dataset than SDR.*\n\n(old) Bas Curtiz’ evaluation chart with some Roformers tested with that method:\n<https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?usp=sharing> ([shortened version](https://imgur.com/a/aJ0nNdc) - it’s outdated; all metrics have been rewritten in the model sections [above](#_2vdz5zlpb27h))\n\nFor some newer models not on the list, you could search [here](https://mvsep.com/quality_checker/leaderboard2.php?sort=instrum) for the model name; bleedless/fullness metrics for new models are now provided in the evaluation description when you click on the result. But plenty of model evaluations have names not corresponding to the final model names; they were shared along with the models on our Discord and later pasted into this document above.\nAlso, sorting by a specific metric was added on MVSEP in June 2025, so you can track down the exact evaluation by the metrics provided in this document that way, or by searching Discord. But that can be difficult, as some older models’ evaluations were not indexed by the metrics bot (the bot didn't show metrics for them when they were first posted), and not all models were evaluated at release.\n\n*Explanation of the metrics*\n\n*Spectrogram difference showcase* [*diagram*](https://imgur.com/a/EnD4Ljc)\n\n“Blue is what is missing from a separated stem (compared to a clean source).\n\nRed is bleed in a separated stem.\n\nWhite is perfect\n\n(dB scale on the right seems wrong, I haven't checked, but it's not really important to see what is going on).\n\nSame formula [can] be used for a metric, which would theoretically measure bleedness and fullness of the evaluated models\n\nI think that for a metric, it's better to then separate negative values of diff array on one side, and keep positive values on other side, and average/scale each of them separately, so we get 2 scores, 1 for bleedness and 1 for fullness.\n\nIt has to be experimented further (and with better stft, it's only working on single chunk currently)\n\n(Not sure that so high n\\_fft/mel\\_bins values are really needed, it was just nicer on the plot with that)”\n\n“bleedless/fullness metrics are stft magnitude-only based and as they are discarding the phase data, they have some kind of blind spots.” - jarredou\n\nRandom noise added to results can increase the fullness metric:\n\n<https://mvsep.com/quality_checker/entry/7709>\n\n<https://mvsep.com/quality_checker/entry/7708>\n\n“l1 freq, the simplest way to explain it - it’s a mix between fullness and bleedless but without the noise issue (in a sense it’s the real fullness/bleedless metric) (...)\nthere’s no universal metric still sadly, we have to rely on a combination of them (and our ears)”\n\n“-l1\\_freq = bleedless (higher is cleaner)\n\n-aura\\_mrstft = fullness (higher is fuller)\n\nthey maybe don’t have the issues fullness and bleedless have but I haven’t played to check that” - becruily\n\n“aura\\_mrstft - is more perceptually relevant than SDR for musical content imo” - gilliaan\n\n*Read for* [*discussion*](https://discord.com/channels/708579735583588363/708579735583588366/1299477079963992064)\n\n“[The] problem with bleedless/fullness metric is that you can easily increase them by multiplying stem on constant.\n\nMultiply predictions by 0.97 - it increases fullness and reduces bleedless\n\nMultiply predictions by 1.03 - it greatly increases bleedless and reduces fullness” - [ZFTurbo](https://discord.com/channels/708579735583588363/911050124661227542/1344980978556473437) *(metrics/discussion)*\n\nQ: I don't understand what fullness models do differently that results in this bleeding that is typical of fullness models\n\nA: I would say that fullness models are leaving a larger ring of fire\\*/shell of the untargeted stem ([link](https://discord.com/channels/708579735583588363/767947630403387393/1331748563885097041))\n\n(I guess that with a different n\\_fft resolution or number of bands, that bleed noise would also sound different with roformers iirc)\n\n- jarredou\n\nOther metrics:\n\nLog WMSE - good “at least for drums or anything rich in low frequency content” - jarredou\n\n\"It is a relatively new time-domain metric over SDR and SI-SDR that is not overly sensitive to low frequencies like SDR and can accurately evaluate silent intervals.\n\nIn addition, time-domain metrics can be evaluated for both amplitude and phase.\" - Unwa\n\nThe bleedless/fullness metrics ignore phase, so the phase fixer probably won't affect them.\n\nMore by [unwa](https://discord.com/channels/708579735583588363/708595418400817162/1414796129039814688):\n\n“<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/utils/metrics.py>\n\nFirst, metrics can be categorized as either time-domain or time-frequency domain.\n\nSDR, SiSDR, and log\\_wmse are time-domain metrics.\n\nl1\\_freq, aura\\_stft, and aura\\_mrstft are all STFT-based metrics and belong to the time-frequency domain.\n\nTime-domain metrics assess how accurately the waveform matches the original sound, while time-frequency metrics compare how closely the spectrogram resembles the original sound's spectrogram.\n\nTime-domain metrics are biased toward low frequencies and are less affected by high frequencies.\n\nTime-frequency domain metrics can evaluate both low and high frequencies equally when based on a linear-scale spectrogram.\n\nWhen based on Mel spectrograms, they exhibit a bias closer to human hearing.\n\nHowever, the weakness of time-frequency metrics is their lack of phase consideration, since all metrics here are based on amplitude or power spectrograms.\n\nThe log\\_wmse metric is unique in that it allows weighting tailored to human hearing.\n\nAlthough it is a time-domain metric, it is special in that it also takes the frequency domain into account.\n\nThe difference between aura\\_stft and aura\\_mrstft is that stft compares spectrograms of a single resolution, while mrstft compares mel spectrograms of multiple resolutions.”\n\n*An interesting read on evaluating specific instrument stems:*\n\n<https://arxiv.org/abs/2507.06917v2>\n\n*\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_*\n\n*Top metrics of publicly available Roformers for instrumentals*\n\n*(as of 18.06.25)*\n\n*Instrumental models sorted by instrumental* ***fullness*** *metric:*\n\nINSTV6N (41.68)>inst\\_Fv4Noise (40.40)/INSTV7N (no metrics)/Inst V1e (38.87)>Inst Fv3 (38.71).\n\nV1e+ (37.89) might already be muddy in some cases.\n\n##### *Instrumental models sorted by instrumental* ***bleedless*** *metric:*\n\n*Gabox inst\\_fv7b*\n\n*Fullness: 27.07 (worse than most vocal Mel-Roformers later below)*\n\n*Bleedless: 47.49*\n\n*Inst\\_GaboxFv7z*\n\n*Fullness: 29.38*\n\n*Bleedless: 44.95*\n\n*Unwa BS-Roformer-Inst-FNO*\n\n*Fullness: 
32.03*\n\n*Bleedless: 42.87*\n\n*Unwa v2*\n\n*Fullness: 31.85*\n\n*Bleedless: 41.73*\n\n*Inst\\_gaboxBv3*\n\n*Fullness: 32.13*\n\n*Bleedless: 41.69*\n\n*Inst\\_GaboxFv8 (replaced its v2 variant)*\n\n*Fullness: 33.22*\n\n*Bleedless: 40.71*\n\n*Becruily inst*\n\n*Fullness: 33.98*\n\n*Bleedless: 40.49*\n\n*Gabox instv7plus*\n\n*Fullness: 29.83*\n\n*Bleedless: 39.36*\n\n*Unwa HyperACE*\n\n*Fullness: 36.91*\n\n*Bleedless: 38.77*\n\n*Unwa v1*\n\n*Fullness: 35.69*\n\n*Bleedless: 37.59*\n\n*Gabox fv3*\n\n*Fullness: 38.71*\n\n*Bleedless: 35.62*\n\n*Unwa v1e*\n\n*Fullness: 38.87*\n\n*Bleedless: 35.59*\n\n*Gabox fv5*\n\n*Fullness: 39.40*\n\n*Bleedless: 33.49*\n\n###### Vocal models/ensembles sorted by instrumental **bleedless** metric (more muddy; Gabox and Unwa’s Revive models not evaluated yet):\n\n[*Descriptions*](#_3mrz4632uifx) *of the public models*\n\n*MVSep BS-Roformer (2025.07.20) - it replaced the 2 previous versions on the site*\n\n*Inst. Fullness 27.83*\n\n*Inst. Bleedless 49.12*\n\n*MVSep Ensemble 11.50 (2024.12.20)*\n\n*Inst. Fullness 27.17*\n\n*Inst. Bleedless 47.94*\n\n*MVSep Ensemble (4 stem) 11.93 (2025.06.30)*\n\n*Inst. Fullness 28.70*\n\n*Inst. Bleedless 47.68*\n\n*MVSep MelBand Roformer (2024.10)*\n\n*Inst. Fullness 27.73*\n\n*Inst. Bleedless 47.48*\n\n*BS-RoFormer SW 6 stem (MVSEP/Colab/undef13 splifft)*\n\n*Inst. Fullness 27.45*\n\n*Inst. Bleedless 47.41*\n\n*(use inversion from vocals and not mixed stems for better instrumental metrics)*\n\n*MDX23 Colab fork v2.5 by jarredou*\n\n*Inst. Fullness 28.02*\n\n*Inst. Bleedless 47.24*\n\n*(more noticeable bleeding/noise than MVSep Ensemble above)*\n\n*Beta 6X*\n\n*xx*\n\n*xx*\n\n*voc\\_fv4*\n\n*xx*\n\n*xx*\n\n*(Good if you need fewer vocal residues than typical instrumental Roformers, even fewer than Mel Kim, FT2 Bleedless, or Beta 6X - makidanyee)*\n\n*MelBand Roformer Kim*\n\n*Inst. Fullness 27.44*\n\n*Inst. Bleedless 46.56*\n\n*Kim | FT2 Bleedless (by Unwa)*\n\n*Inst. Fullness 27.78*\n\n*Inst. Bleedless 46.31*\n\n*Beta 5e (by unwa)*\n\n*Inst. Fullness 27.63 (bigger metric than Kim)*\n\n*Inst. Bleedless 45.90*\n\n*Kim | FT 2 (by unwa)*\n\n*Inst. Fullness 28.36*\n\n*Inst. Bleedless 45.58*\n\n*Kim | FT (by unwa)*\n\n*Inst. Fullness 29.18*\n\n*Inst. Bleedless 45.36*\n\n*MVSEP BS Roformer (2025.06)*\n\n*Inst. fullness: 17.30*\n\n*Inst. bleedless: 37.83*\n\n*(can still be a good choice in case of some crossbleeding, vocal chops, or residues of reverb or BGV)*\n\n*MVSEP Ensemble 11.93 (also contains 2025.06)*\n\n*Inst. fullness: 17.73*\n\n*Inst. bleedless: 36.30*\n\n*\\_\\_\\_*\n\n*Outperformed vocal models for instrumental bleedless (metrics still for the instrumental stem, i.e. after inversion, unless it's a Duality model)*\n\n*SYHFT V3 (by SYH99999)*\n\n*Fullness 28.07*\n\n*Bleedless 45.15*\n\n*Duality v1 (by unwa)*\n\n*Fullness 29.08*\n\n*Bleedless 43.26*\n\n*Duality v2 (by unwa)*\n\n*Fullness 28.03*\n\n*Bleedless 44.16*\n\n*Mel Becruily vocal*\n\n*Fullness 28.25*\n\n*Bleedless 40.95*\n\n*SYHFT V2.5 (by SYH99999)*\n\n*Fullness 28.60*\n\n*Bleedless 40.34*\n\n*Big SYHFT V1 (by SYH99999)*\n\n*Fullness 28.48*\n\n*Bleedless 44.81*\n\n*Unwa beta 4*\n\n*Fullness 26.29*\n\n*Bleedless 44.71*\n\n*SYHFT V4 and V5 were never publicly released*\n\n*\\_\\_\\_*\n\n*(bleedless + fullness)/2 = avg*\n\n*experimental avg metric for vocals (favours bleedless metric)*\n\n*Bas' Edition - 27.72*\n\n*FT2 bleedless - 27.54 | 2.49*\n\n*24.10 - 27.44 | 2.21*\n\n*FT2 - 26.84 | 2.23*\n\n*FT - 26.58*\n\n*5e - 26.42 | 1.54*\n\n*voc\\_gabox - 26.38*\n\n*voc\\_fv2 - 26.36*\n\n*voc\\_fv3 - 26.06*\n\n*Becruily - 25.99*\n\n*beta 4 - 25.93*\n\n*FullnessVocalModel - 25.91*\n\n*voc\\_fv4 - 25.02*\n\n###### Other ensembles for UVR5\n\nThe best newer ensembles are on the list at the [top](#_2vdz5zlpb27h) of the doc. 
Older configurations follow after the listed hidden results below.\n\nFor reference, read MVSEP’s [SDR evaluation chart](https://mvsep.com/quality_checker/leaderboard2.php?&sort=instrum&page=0) (UVR ensembles will appear later in the chart).\n\nBe aware that some of the results on the chart above at the top are not from UVR5 or use different methods and code to achieve better results and might be not public/still WiP, e.g. the following:\n\n*Hidden leaderboard results (all SDR results provided for instrumentals,\nDiscord links below are dead, but at least some can be found by the search on Discord and by verifying the opened link address which initial URL hasn’t changed):*\n\n- Bas’ unreleased fullband vocal model epoch 299 + voc\\_ft - SDR [16.32)](https://cdn.discordapp.com/attachments/708579735583588366/1127034811656196137/image.png)\n\n- [this](https://media.discordapp.net/attachments/708579735583588366/1126326776952520764/image.png?width=1440&height=63) older viperx’ unreleased custom weights code (newer one is up already), besides, “instrumental vX” entries are his ones (it rather utilizes public models with his own non-public weighted inference, and he gatekeeps it for more than since MDX23 results were published).\n\nBTW. ebright is probably the 2nd place in MDX23, at least the result appeared in similar time like ByteDance. 2nd place decided not to publish their work.\n\n- [32-bit](https://cdn.discordapp.com/attachments/911050124661227542/1136370992369905775/image.png) higher SDR result of original multisong dataset uploaded as output (opposed to the previous 16-bit currently on top). “Multisong dataset | Original stems | bass/drums/other joined” is not a model!\n\n- Bytedance v.0.2 - inst. 
SDR [17.26](https://media.discordapp.net/attachments/911050124661227542/1126353744880210041/image.png?width=1440&height=362), now it’s outperformed by v.0.3 and is [17.28](https://web.archive.org/web/20230806134030/https%3A//mvsep.com/quality_checker/multisong_leaderboard?sort=instrum), now called 1.0),\n\n-\"MSS\" - is probably ByteDance 2.0, not [multi source stable diffusion](https://github.com/gladia-research-group/multi-source-diffusion-models), as BD's test files which were published were starting with MSS name before, but the first doesn't necessarily contradict the latter, although they said to use novel arch - SDR [18.13](https://cdn.discordapp.com/attachments/708579735583588366/1134276341697630329/MSS.png), and probably another one by ByteDance - SDR [18.75](https://cdn.discordapp.com/attachments/767947630403387393/1136620057783455844/image.png), let's call it 2.1, but seeing inconsistent vocal result vs previous one here, we have some suspicions that the result was manipulated at least for vocals (or stems were given from different model).\n\n- Ripple app/SAMI-Bytedance on the chart is 16.59, also input files weren't lossless.\n\n- BS-Roformer results by viperx posted in [Training](#_bg6u0y2kn4ui)\n\nBTW. 
model\\_mel\\_band\\_roformer\\_ep\\_617\\_sdr\\_11.5882 is a Bas Curtiz model trained purely on the multisong dataset as an experiment, and it won’t give good results outside the multisong dataset.\n\nmel\\_band\\_roformer\\_ep\\_125\\_sdr\\_11.2069 is a Bas Curtiz fine-tune trained from a ZFTurbo checkpoint, and it was shared with him on the condition that it remain non-public/MVSEP exclusive.\n\n\\_\\_\\_\\_\n\nSome of these models are visible in the download center after using the [VIP code](https://www.buymeacoffee.com/uvr5/vip-model-download-instructions).\n\n*Older best ensembles for UVR by SDR*:\n\n(some newer/better ones than these are located at the top of the doc)\n\nFor 28.07.23\n\nKim Vocal 2 + MDX23C\\_D1581 + Inst HQ3 + Voc FT | Avg/Avg\n\nFor 28.07.23 (#4563)\n\nKim Vocal 1 + Kim Vocal 2 + MDX23C\\_D1581 + Inst HQ3 + Voc FT + htdemucs\\_ft | Avg/Avg\n\nFor 27.07.23 (#4561)\n\nKim Vocal 1 + Kim Vocal 2 + Kim Inst + MDX23C\\_D1581 + Inst HQ3 + Voc FT + htdemucs\\_ft | Avg/Avg (beta UVR)\n\nFor 24.06.23 (#3842)\n\nKim Vocal 1 + 2 + Kim Inst + HQ3 + Voc FT + htdemucs\\_ft | Avg/Avg | Chunks: [ON](https://media.discordapp.net/attachments/708579735583588366/1123703874847506532/image.png?width=343&height=651)\n\n(but for ensembles, as opposed to single models, it can score better with chunks disabled)\n\n(Consider also using the MDX23C\\_D1581 vocal model above, if ensembling with this arch works correctly; if not, perform a manual ensemble. Not sure here.)\n\n*As for the very big ensemble* from the older synth leaderboard (2023-04-30):\nMDX-Net: 292, 496, 406, 427, Kim Vocal 1, Kim Inst + Demucs ft\n\nOptionally, add the later released models voc\\_ft and Kim Vocal 2.\n\nIt doesn't score that well SDR-wise on the newer synth dataset, since it uses older models which already have better counterparts. 
The synth dataset hasn't been used for evaluations for a long time.\n\nFor 13.06.23 (#3322)\n\nInst HQ2 + 427 + Inst Main + Kim Inst + Kim Vocal 1 + 2 + Demucs FT | Avg/Avg | Chunks Batch | Spectral inversion OFF\n\nMost probably you can safely replace Inst HQ2 with HQ3 or HQ4 (better SDR) and get a slightly better SDR in the ensemble (it just hasn’t been tested in an ensemble yet).\n\nBut be aware that “The moment you introduce Instrumental models, there will be a bit of residue in the vocal output.\n\nHowever, the SDR scores higher.\n\nI'd say go with Vocal models only, if you care about your vocal output.”\n\nThe same applies vice versa for instrumentals.\n\n*- Older ensemble configurations or custom settings with lower SDR*\n\n*(but they might be useful for specific songs or genres where further info is given)*\n\nFrom public models, the best SDR as of 14.04.23:\n\nEnsemble | Kim vocal 1 + Inst HQ 2 + Main 427 + htdemucs\\_ft | Avg/Avg | Chunks Batch | Denoise Output ON | Spectral Inversion OFF | WAV\n\nFor instrumentals\n\nAnd\n\nEnsemble | Kim vocal 1 + Inst 3 + Inst HQ 2 + Inst Main + htdemucs\\_ft | Avg/Avg | Chunks Batch | Denoise Output ON | Spectral Inversion OFF | WAV\n\nFor vocals\n\n*As of 01.01.23, the best SDR for vocals/instrumentals is achieved by:*\n\n- UVR-MDX-NET INST MAIN + UVR-MDX-NET Inst 3 + kim vocal model fine tuned (old) + Demucs: v4 | htdemucs\\_ft - Shifts: 2 - Ensemble Algorithm: Avg/Avg, chunk margin: 44100 (better SDR than 22050), denoise output on (better SDR than off), spectral inversion off (better SDR than on)\n\n- MDX-Net: Kim vocal model fine-tuned (old) + UVR-MDX-NET\\_Main\\_427 + Demucs: v4 | htdemucs\\_ft - Ensemble Algorithm: Avg/Avg, Volume Compensation: Auto\n\n(it sets `1.035`, the best value for the Kim (old) model vs other options)\n\nShifts: 10 - Overlap: 0.25\n\n- ensemble settings a bit worse SDR-wise than both of the above:\n\nUVR-MDX-NET Inst 3 (464) and “UVR-MDX-NET\\_Main\\_438” vocal model (main) and htdemucs\\_ft - Ensemble Algorithm: Average/Average\n\n- Also 
good combo (for instrumentals, and for vocals in half of the cases):\n\nMDX-Net: UVR-MDX-NET Inst Main\n\nVR Arc: 7\\_HP2\\_UVR\n\nDemucs: v4 | htdemucs\\_ft\n\nMax Spec/Max Spec\n\n- UVR-MDX-NET Inst 3 as the main model and 7\\_HP2-UVR as a secondary model with the scale set to 75%\n\n(Anjok 21.12.22: \"Personally, I found that using [it] produces the cleanest instrumental.\")\n\n“It means the final track will be 25% hp2 model and 75% inst 3 (similar to the ensemble feature, but you have more control over how strong you want the secondary model to be)”\n\n- MDX-NET inst3 model (464) with secondary model 9\\_HP2\\_UVR at 71% (hendrysetiadi: seems to get the best results with e.g. disco songs).\n\n- Inst Main + 427 + Net 1 (CyPha-SaRin: was a pretty good combo. One big model, one medium, one small, pretty decent results across the board. If a song is going to have problematic parts, it's going to have them regardless of what combo you pick, it seems.)\n\n- kim vocal 1 + instr 3 + full 403 + inst HQ 1 + full 292 + instr main with MAX/MAX (hendrysetiadi: i think that's the best combination of ensemble that i found)\n\n- For Rock/Metal - The MDX-Net/VR Architecture ensemble with Noise Reduction set between 5-10 (depending on the track) and Aggression set to 10.\n\n- For Pop - The MDX-Net/VR Architecture ensemble with Noise Reduction set between 0-4 and Aggression set to 10. (Anjok, 13.05.22)\n\n- Here is another ensemble that I have tried myself: \"VR Arc: 1\\_HP-UVR x MDX-Net: Kim Vocal 1 x MDX-Net: UVR-MDX-NET: Inst HQ 1 x MDX-Net: UVR-MDX-NET: Inst HQ 2\", all with the average/average ensemble (Mikey/K-Pop Filters)\n\n- Inst HQ 1 & Main 427 are best for Indian music\n\n- VR: 7\\_HP2-UVR, MDX: Kim vocal 1, Inst 3, Inst Main, Main, htdemucs\\_ft\n\nMax/Max, main pair: vocals/instrumental\n\n\"Instrumentals sound so good using these settings also. I can’t believe this is possible. What an amazing software. 
Thank you to whoever made this.\" StepsFan\n\n- I got an ensemble that works well for loud and crazy tracks (this instance it's dariacore lol) - by knock:\n\nModels: Inst HQ 3, Main, Voc FT\n\nEnsemble Algorithm: Avg/Avg\n\nMDX-Net settings:\n\nVol Comp: Auto\n\nSegment Size: 4096 (you can go up to 6144 if you want to wait longer, 4096 has seemed to be perfect for me)\n\nOverlap: Default (which I believe is 0.5)\n\nShift Conversion Pitch: -6 (semitones)\n\nMatch Freq Cut-off: Off\n\nDenoise Output: Yes\n\nSpectral Inversion: No\n\n###### Mateus Contini's methods\n\n###### #1 (old)\n\n- “TIP! For busy songs: I was testing some ensembles trying to get Instrumental Stems with less volume variation (muddy), preserving guitar solos, pads the most and I had great results doing the following, for anyone interested:\n\nEnsemble (Demucs + 5\\_HP-Karaoke with Max for Instrumental stem) - The result will be the Instruments + Backing Vocals and this preserves most of the guitar solos, pads and things that MDX struggles [with].\n\nInstrumental Stem Output > Demucs to remove the Backing Vocals from the track - This pass will remove the rest of the Vocals. In some cases [there] will be some minor leftovers that you can clean later with other methods.\n\nI find the results better than Demucs alone/ MDX models or other ensembles for what I'm looking for. I'm not evaluating noise, but fuller instrumental Stems, trying to preserve most of it and also the cost (time) to do it.\n\nSince I'm not interested, for this case, in doing manual work song by song and just use these stems to sing over it, I find the results great.” - Mateus Contini\n\nQ: Do you mean that you process Demucs 2 times? 
Once for the ensemble with VR, and then the result was processed using Demucs again?\n\nA: You can add other models to the ensemble, like Demucs, VR\\_5-Karaoke and HQ3 for an extra, before processing again with Demucs.\n\nAlso, this method is very good for leaving good backing vocals in the instrumentals (only the ensemble result). I find extracting BV from the Vocal Stem to be less effective, giving you less material (compared to joining the BV with the instrumentals later)\n\n###### #2 (newer)\n\nWell, I tried to improve the results of the method I posted, so here it is, for **anyone interested in getting fuller Instrumentals**, with a bit of bleed in some songs, yielding great results overall.\n\nI'm doing this in the UVR GUI. The idea behind it is to scoop the vocals out little by little, so the instrumentals are preserved the most. The process requires 3 extractions. Here are the ensembles:\n\n1. First pass ensemble: 5\\_HP-Karaoke-UVR + Inst HQ3 + htdemucs - Min/Max\n\n- If the song doesn't have BV, this will already give you good Instrumental Stem results. If you have Vocals bleeding into the Instr, continue to pass 2, but sometimes jumping straight to pass 3 will produce better results.\n\n- If the song has BV, this keeps a fuller **Instrumental Stem with BV**. If you want to keep the BV, but there is some Main Vocal bleeding through the Instr, continue to pass 2.\n\n2. Second pass ensemble: Kim Vocal 2 + Inst HQ3 + MDX Karaoke 2 - Min/Max\n\n- This pass will try to preserve the BV in the Instrumental Stem while removing Main Vocal bleed. You can stop here if you want the **Instrumental Stem with BV**\n\n3. Third pass ensemble: Kim Inst + Inst HQ3 + htdemucs - Min/Max\n\n- This pass will try to remove BV and other Main Vocal bleed from the Instrumental Stem while keeping the Instrumental fuller.\n\nThe idea behind it is to have less volume variation where the vocals are extracted, leaving the Instrumental Stem less muddy. 
Since the vocals are extracted little by little using Min/Max, the models will not be so aggressive. This is a great starting point if you want to improve further in a DAW or just sing over it. The con is that sometimes the track will have tiny bleeds. If you try this method, please post the results here.\n\n###### #3\n\n- Try this ensemble: 9\\_HP (10 aggression) + HQ3 (chunks on) + demucs\\_ft, Min/Max\n\n- it preserves most of the instruments.\n\n###### #4 (new)\n\nAnother ensemble suggestion for good instrumentals with minimized vocal bleeding and a bit of noise in some cases:\n\nEnsemble: 9\\_HP + HQ3 + Demucs\\_6s (secondary model 50%: full\\_292) - Algorithm [min/max]\n\nConfigs:\n\n9\\_HP Window[512], Agress[10], TTA[on], Post[off], High-End [off]\n\nHQ3 Chunks[on] [auto], Denoise[on], Spectral[off]\n\nDemucs\\_6s Chunks[on] [auto], Split[off], Combine[off], Spectral[off], Mixer[off], Secondary Model - Vocals/Instr [MDX-Inst\\_full\\_292] [50%]\n\nWhy Demucs\\_6s and not \\_ft - I compared them on some songs and 6s has less vocal bleed in the instrumental track.\n\nDescription:\n\nThe idea is to take the good bits of the models, using only one from each group (VR, MDX and Demucs). The secondary model on Demucs is there to minimize some vocal bleeding with sustained notes that was happening in some songs.\n\nComparing the results from multiple models, I find that enabling Chunks on MDX and Demucs removes some bleeding vocals from the Instrumental track and gives better results overall. This ensemble completes in about 5 min per song on my machine (GTX 1070 8GB, 16GB RAM, Ryzen 1600x). 
[chunks have been replaced by a newer method in newer UVR GUI versions]\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n- “The best combo is the HQ instrument models ensemble average/average including HQ3/Main/Main Inst/Kim1/2/Kim Inst/demucs3 (mdx\\_extra)/htdemucs\\_ft/htdemucs\\_6s” (MohammedMehdiTBER)\n\n\"Wow, I tried out the ensemble with all those models you said, and it actually sounds pretty good. There's a definitely more vocal bleed but in a saturated/detailed distortion type of way. I can't tell which one I like better, the ensemble sounds more full and has more detailed frequencies, but the vocal bleed is a lot more obvious. The HQ\\_3 by itself has almost no vocal bleed but sounds more thin and watery.\"\n\n- Kim instr + mdx net instr3 + HQ2 + HQ3 + voc ft max/max\n\nThe result is so amazing… Now I can hear more detail in the instrumental result, where before I couldn't hear some of the music parts. (Henry)\n\n- \"I am very much enjoying making an ensemble of HQ3 and MDX23C\\_D1581, then inverting the vocals into the instrumental and running that through hq3 with 0.5 overlap\" (Rosé)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n**Ensembles for specific genres**\n\nEvaluation based on public models available as of 23.04.23 and the multisong dataset on MVSEP. 
The list might be outdated, as it doesn’t take all the current models into account.\n\nSDR sorted by genre\n\nBy Bas Curtiz\n\n\"If we remove **Kim vocal 2**, so that only those available right now are taken into account:\n\n- Ensemble Rating 1 scores highest on average overall\n\n[Probably this one:\n\n[Kim vocal 2 + Kim FT other + Inst Main + 406 + 427 + htdemucs\\_ft | Avg/Avg](https://mvsep.com/quality_checker/entry/974)\n\nAt least it was the best as of that date.\n\nBut now we have ensembles which score better.]\n\n- Kim vocal 1 is best for Rock\n\n- Kim vocal 1 & Ensemble Rating 1 are best for RnB/Latin/Soul/Funk\n\n- MDX'23 Best Model is best for Pop\n\n- Main 427 & MDX'23 Best Model are best for Other\n\n- Main 427 & MDX'23 Best Model are best for Blues/Country\n\n- Main 427 & Ensemble Rating 1 are best for Jazz\n\n- Main 427 & Ensemble Rating 1 are best for Acoustic genres\n\n- Ensemble Rating 1 is best for Beats\n\n- Ensemble Rating 1 is best for Hip Hop\n\n- Ensemble Rating 1 is best for House\n\nSheet where **Kim vocal 2** is removed:\n\n<https://docs.google.com/spreadsheets/d/1ceXA7XKmECwnsQvs7a0S81XZOUokIXUN8ndsUDcYRcc/edit?usp=sharing>\"\n\n**Further descriptions of single MDX-UVR models**\n\nThese are e.g. used in the ensembles above; if a model has a cutoff, ensembling it with models/AIs without a cutoff, like Demucs 2-4, will fill the gap above the cutoff. 
But it's still a good alternative for people without decent Nvidia GPUs or who are forced to use Colab.\n\n*UVR-MDX models naming scheme*\n\nAll models called \"main\" are vocal models.\n\nAll models called \"inst\" and \"inst main\" are instrumental models.\n\nNET-X [9.X/9.XXX in Colab] are vocal models.\n\nKim vocal 1/2 (self-explanatory)\n\nInst main is 496\n\nKim other ft is Kim inst\n\nThe model labelled just ‘main’ is a vocal model, and was reported to have the same checksum as 427 and 423, but that doesn't seem to be true, as 427 and main have different SDR (427 has better SDR than main, so apparently main is 423 [CRC32: E3C998A6]).\n\n- MDX HQ\\_1/2 models - excellent, vivid snares, no cutoff (22kHz), high quality, rarely worse results than the narrowband inst 1-3 models; HQ\\_2 might have slightly quieter snares, but can have fewer problems with removing some vocals from instrumentals\n\n- MDX-UVR Inst 3 model (464) - 17.7kHz cutoff (the same cutoff as Inst 1, Inst 2 and inst main, but maybe not applicable to vocals after inversion in Colab); it was the third-best single model in our [SDR chart](https://mvsep.com/quality_checker/leaderboard.php) at the time, available in the Colab [update](#_zaimpsi6j19a) and [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui) with the [VIP models package](https://www.buymeacoffee.com/uvr5/vip-model-download-instructions) - now available for free.\n\n- The fourth-best single model for instrumentals back then was inst main (496, MDX 2.1), followed by inst 1 and inst 2.\n\n- There was some confusion about the MDX 2.1 model (iirc on x-minus) being vocal 438 (or even 411), but it’s currently inst main.\n\n- Full band MDX-Net models without cutoff (better SDR than Demucs 4 ft)\n\nAs for SDR, the epochs rank as follows: 292<403<386<(inst 1)<[338](https://mega.nz/file/1tdgFTaJ#AiiXwWWjmAuHb-Cebj5LpWezkXcJKWJsp8LHzFuNvho)<382<309<337<450 (first final, HQ\\_1)<498 (HQ\\_2)<(inst3)<(Kim inst)<HQ\\_3<HQ\\_4\n\nEpochs 292, 403 and 450 and newer are also in 
[Colab](https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb) (and in UVR5; the older ones when the VIP code is redeemed)\n\n- (currently the best for vocals, though not a single model but a custom ensemble) MDX23 in [MVSEP beta](https://discord.com/channels/708579735583588363/911050124661227542/1086720481962496000),\n\nand in UVR5 - the Kim vocal model:\n\nIt's an MDX-UVR vocal model further trained from their last epoch (probably UVR-MDX-NET Main). It uses a higher n\\_fft scale, which takes more resources.\n\nIt doesn't always give results for instrumentals as good as its SDR may suggest, and more people share that opinion (both Colab and UVR users, so it’s not due to the missing cutoff in Colab).\n\nIn UVR5, generally, for the best vocal result use vocal models, and for the best instrumental result use instrumental models or, eventually, the 4-stem Demucs 4 ft.\n\n\"[Kim\\_Vocal\\_1] is an older model (November) than the one Kim uploaded on 2022-12-04 to\" <https://mvsep.com/quality_checker/leaderboard.php?sort=insrum&ensemble=0>\n\n(the steps below are no longer necessary; the model has been added to the GUI and these are the same models)\n\nYou can download her (so-called “old”) model from here (it still gets better results for vocals than inst 3 and main): <https://drive.google.com/drive/folders/1exdP1CkpYHUuKsaz-gApS-0O1EtB0S82?usp=sharing>\n\nWhen you copy/paste the model into `C:\\Users\\YOURUSERNAME\\AppData\\Local\\Programs\\Ultimate Vocal Remover\\models\\MDX_Net_Models`, it asks you to configure it; hit Yes.\n\nThen change `n_fft` to 7680.\n\nFor instrumentals, it gets worse results, frequently with more bleeding, and UVR manually applies a cutoff above the training frequency to instrumentals after inversion, to avoid some noise and possibly bleeding. 
The Colab version of the Kim model doesn’t have that cutoff, so instrumentals obtained by inversion go up to 22kHz (UVR applies the cutoff to prevent some noise).\n\n*- (generally outperformed by models above)* [*MDX-UVR 9.7 vocal model*](#_pv80l0nr97r5) a.k.a. UVR-MDX-NET 1 (older model; the instrumental is obtained by inversion) - available in [Google Colab](https://colab.research.google.com/drive/189nHyAUfHIfTAXbm15Aj1Onlog2qcCp0?usp=sharing)/[mvsep](https://mvsep.com) (24-bit for instrumentals there)/[UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui).\n\nCompared to the 9.682 NET 2 model, it might give better results for vocals, whereas 9.682 NET 2 might give better results for instrumentals, but it all still depends on the song. Generally, the 9.7 model got better SDR both in the Sony Demixing Challenge and on MVSEP. Generally, the 438 vocal or 464 inst\\_3 models should give better results for instrumentals. The 427 vocal model tends to give worse results for instrumentals than even this older 9.7/NET1 model.\n\nMore about MDX-UVR models -\n\nIf they don't have more vocal bleeding than GSEP, they’re better at filtering out vocal leftovers which GSEP sometimes tends to leave (scratches, additional vocal-like sounds, and so-called “cuts” [multiple short lo-fi vocal parts], which GSEP doesn’t catch but MDX-UVR does, probably due to a bigger dataset). But using single instrumental MDX-UVR models instead of an ensemble will result in a cutoff at the training frequency (e.g. 17.7kHz or lower).\n\nAlso, MDX-UVR, like GSEP, may not have the weird constant \"fuzz\" which VR models tend to leave as vocal leftovers (but in other cases, the 9.7 model can leave very audible vocal residues, so test everything on this list until you get the best result).\n\nThe 9.7 model (or newer models now) is also good for cleaning inverts (e.g. 
when you have a lossy a cappella and the regular song).\n\nIf you have tested all the alternatives and you stick with MDX-UVR 9.7 for some song, and it doesn't have (too much) bleeding, you can fine-tune the results by trying the two 9.6 models to check whether either is better for you than 9.7 in this specific case (they're available at least in the HV Colab and UVR5 GUI).\n\nThe newer MDX-UVR 423 vocal model usually leaves more audible leftovers than the 9.7 model.\n\nTo experiment further with MDX-UVR results if you’re stuck with Colab, you can enable the Demucs 2 model in Colab to \"ensemble\" it with the MDX-UVR model (although metrics say it slightly decreases SDR, I like what it does in the high end; at some point it was suspected that the SDR decrease may come from enabling chunking).\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n- [Demucs 4](#_m9ndauawzs5f) (htdemucs\\_ft) - no cutoff; it’s 4-stem, but you can mix down the stems without vocals in Audacity to get an instrumental. Sometimes it may give you a louder snare than GSEP, but usually more muffled shakers. Also, it will give you more vocal residues than GSEP and MDX-UVR 464 (Inst 3). 6-stem models give more vocal residues than the 4-stem model (ft is the best one and also outperformed the mdx\\_extra model [which is better than mdx\\_extra\\_q - quantized]), but in some cases it might be worth checking the old mdx\\_extra model as well (but\n\n- (outperformed in many cases, at least when used as single models)\n\n[*VR-architecture models*](#_rdfatusyntt1) (Colab, CLI or UVR5 GUI) sometimes provide cleaner and less muddy results for instrumentals than single narrowband MDX models or even GSEP, but only if they do not output too much vocal bleeding (which frequently happens with VR models, especially for heavily processed vocals in contemporary music); bleeding also depends on the specific model:\n\n- E.g. 
*500m\\_1 (9\\_HP2-UVR) and MSB2 (7\\_HP2-UVR)* models are the most aggressive at filtering vocals among VR models, but other, less aggressive VR models may provide better-sounding, less spoiled instrumentals (only if it is not paid for with worse vocal bleeding [BTW. I haven’t heard the newest 2022 VR model yet (available at least in UVR5 GUI, maybe for Patreons, not sure)]).\n\nYou’ll find all parameters and settings for specific models in the “[VR architecture models settings](#_atxff7m4vp8n)” section.\n\n- [*VR models-only ensemble settings*](#_rv7wwzcmuq3s) - if your track doesn’t have too many problems with bleeding using the VR models above, then to fine-tune the VR results, get rid of some mud and e.g. get better-sounding drums in the mix, I generally recommend a VR-architecture models ensemble with the settings described in the linked section.\n\nI'd say it's pretty universal, though it's the most time/resource-consuming method.\n\nAlso, [these](https://media.discordapp.net/attachments/1054893075056562236/1060720863361650728/image.png) ensemble settings from the UVR HV Colab seem to do a decent job of extracting vocals in some cases where the above solutions failed (e.g. clap leftovers).\n\nAlso check demucs\\_6s with 9 HP UVR and GSEP in Min Spec mode.\n\nAlso, UVR5 GUI has rewritten MDX, so it can use their Demucs-UVR models from Demucs 3 (I think mvsep doesn't provide ensembling for any MDX models):\n\n- (generally outperformed by MDX-UVR 4xx models) *Demucs-UVR models* - models 1 and 2 besides \"bag\" are worth trying on their own (mainly 1) if the results from the above methods still have too much bleeding; in some specific cases they give better results than e.g. bare MDX-UVR 9.7, VR models or even GSEP (available on [MVSEP](https://mvsep.com) and [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui)). They're better-trained 2-stem Demucs 3 models by the UVR team. 
No cutoff - 22kHz.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n- As for extracting -\n\n#### **Karaoke / Backing Vocals**\n\n(a more up-to-date, but less descriptive list is at the [top](#_rz0d5zk9ms4w))\n\nCheck the MDX-UVR Karaoke 2 model (available on MVSEP and UVR5 GUI)\n\nTL;DR - \"Usually MDX B Karaoke has really good lead vocals and UVR Karaoke has really good backing vox\"\n\n\"There are 3 good karaoke models (the ones I'm referring to are on mvsep.com [they seem to be no longer available there]). \"MDX B (Karaoke)\" seems to be the best at getting lead vocals from karaoke while \"karokee\\_4band\\_v2\\_sn\" (UVR) and \"HP\\_KAROKEE-MSB2-3BAND-3090\" (UVR) seem to be best for backing vocals. I recommend using a mix of the 3 to get as many layers as possible, and then use Melodyne to extract layers as best as possible. Then combine the filter results and Melodyne and you should have smthn that sounds pretty good\"\n\nThe karokee\\_4band\\_v2\\_sn model might not be compatible with Colab (check mvsep or UVR5 GUI).\n\n- Demix Pro may do a better job with B.V. than the models on x-minus, even the new x-minus model (available since 01.02.23), but it might be worth trying on some songs (the problem is probably tied to the MDX architecture itself).\n\n\"MDX in its pure form is too aggressive and removes a lot of backing vocals. However, if we apply min\\_mag\\_k processing, the results become closer to Demix Pro\"\n\n- Medley Vox\n\n(installation [tutorial](https://youtu.be/VbM4qp0VP80))\n\nFor separating different voices, including harmonies or backing vocals, check out this vocal separator; the demos sound quite good and the Cyrus model gives pretty similar results.\n\nIt's for already separated or original a cappellas. Sometimes it gives better results than BVE models. The output sample rate is 24kHz, but it can easily be upscaled well with AudioSR.\n\nOrg. 
repository\n\n<https://github.com/jeonchangbin49/medleyvox>\n\nOld info (dead link):\n\n<https://media.discordapp.net/attachments/900904142669754399/1050444866464784384/Screenshot_81.jpg>\n\n*How to get vocal stems using specific models:*\n\nSong -> vocal model -> Voc & Inst\n\nVoc -> Karaoke model -> Lead\\_Voc & Backing\\_Voc\n\nLead\\_Voc + Inst = Lead\\_Inst\n\n- How to get backing vocals using x-minus\n\n<https://x-minus.pro/page/bv-isolation?locale=en_US>\n\n“Method two is terrible and I do not recommend it” - Aufr33\n\n- If you have an x-minus subscription, you can use chain mode for Karaoke, as it currently gives the best results\n\nHow does it probably work under the hood?\n\n\"On sitting down and reading <https://discord.com/channels/708579735583588363/900904142669754399/1071599186350440540>\n\nIt's a multistep process where it mixes a little bit from MDX's split vocals and instruments.\n\nThen [it] passes that mixture through the UVR v2 karaoke/backing vocals model.\n\nThen, with those results, it inverts the separated lead vocal and adds it to the instrumental result\"\n\n- As for ***4 stem*** **separation**, check GSEP or [Demucs 4](#_m9ndauawzs5f) (now check the better MDX23 Colab by jarredou)\n\n(The other stem is usually best in GSEP and bass in Demucs 4; the rest also depends on the song. As for drums, if you process them further in a DAW using plugins, Demucs 4 is usually better, as it's lossless and supports up to 32-bit float output.)\n\nDemucs 4 also has an experimental **6 stem** feature: guitar (can give good results) and piano (it's bad, worse than GSEP).\n\n- As for free ***electric guitar* and *piano stems,*** currently GSEP and MVSEP models are the best, but the paid Audioshake provides better results than GSEP. Also, in GSEP \"when the guitar model works (and it grabs the electric), the remaining 'other' stem often is a great way to hear acoustic guitar layers that are otherwise hidden.\" 
LALAL.AI also has a piano model, which is “drastically” better than Demucs.\n\n- Among solutions for separating drum sections, there are the paid [FactorSynth](#_cz4j2d3uf48s) and UnmixingStation, or the free [Drumsep](#_2u19k7ty9b00) (but rather use the MDX23C model).\n\n- For separating specific sounds, check [Zero Shot Audio](#_g37f4a6hnxm0).\n\n\\_\\_\\_\\_\\_\\_\n\nCutoff examination with spectrograms for various models and AIs available in UVR5 GUI, along with the measured processing time for each model on CPU or GPU (1700x/1080 Ti), by Bas Curtiz (the cutoff examination doesn't apply to the MDX Colab, which has no cutoff, unlike UVR [where it's applied to prevent noise]):\n<https://docs.google.com/spreadsheets/d/1R_pOURv8z9GmVkCt-x1wwApgAnplM9SHiPO_ViHWl1Q/edit#gid=23473506>\n\nSpreadsheet of songs that use vocals as a melody, with snippets of how they separate on various models/AIs\n\n<http://vocalisolationtesting.x10.mx/>\n\n\\_\\_\\_\n\nIn the sections below you’ll find more details, links, Colabs, a list of all tools/AIs, and more information about specific models as alternatives for further experimentation (mostly MDX-UVR instrumental and vocal models available in UVR5 GUI, <https://x-minus.pro/ai> and MVSEP). 
I also provide some technicalities/troubleshooting wherever necessary.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### **Table of contents**\n\n*(click an entry to jump to its section; this table is outdated, so check the document outline instead if you can)*\n\n[Last updates and news](#_k3vca4e9ena8)\n\n[General reading advice](#_2vw7f9wat3nv)\n\n[**Instrumental, vocal, stems separation & mastering guide**](#_sosams1g0zrm)\n\n[The best models](#_rz0d5zk9ms4w)\n\n[for specific stems](#_8o01ot6sjxel)\n\n[for instrumentals](#_2vdz5zlpb27h)\n\n[for vocals](#_n8ac32fhltgg)\n\n[How to check whether a model in UVR5 GUI is vocal or instrumental?](#_p1fyricuv1j8)\n\nfor [karaoke](#_vg1wnx1dc4g0)\n\nfor [4-6 stems (drums, bass, others, vocals + opt. guitar, piano):](#_sjf0vefmplt)\n\n[SFX](#_owqo9q2d774z)\n\n[De-reverb](#_5zlfuhnreff5)\n\n[Vinyl noise/white noise (or simply noise)](#_hyzts95m298o)\n\n[Mixing and mastering](#_86cdyl2tgclm)\n\n[Audio upscalers list](#_kmvf6iw5hfvm)\n\n[More descriptions of models](#_gdihug899mot)\n\n[MDX settings in UVR5 explained](#_6q2m0obwin9u)\n\n[Tips to enhance separation](#_929g1wjjaxz7)\n\n[Other ensembles in UVR5 - list](#_xya7mtyl0m39)\n\n[50 models sorted by SDR](#_n0f4tib5eipp)\n\n[Separating speakers in recording](#_ak53injalbkf)\n\nGeneral section of [UVR5 GUI (MDX-Net, VR, Demucs 2-4, MDX23)](#_czix2y8eiuna)\n\n[GUI FAQ & troubleshooting](#_ul5en196k909)\n\n[Chunks may alter separation results](#_4t9vx74g45zt)\n\n[Q: Why shouldn’t I use more than 4-5 models for a UVR ensemble (in most cases)?](#_tb9spo3rgthx)\n\n[(older) UVR & x-minus.pro updates](#_yx8u0ahol7ao)\n\n[MVSEP models from UVR5 GUI](#_16gdep9n4hi3)\n\n[Manual ensemble Colab for various AI/models](#_surlvvp6mr8f)\n\n[Joining frequencies from two models](#_h952n842ljfj)\n\n[DAW ensemble](#_oxd1weuo5i4j)\n\n[Manual ensemble in UVR5 GUI of single models from e.g. Colabs](#_wbhpqttnrw7b)\n\n[**UVR’s VR architecture models**](#_wjd2zth0azhs) [(settings and recommendations)](#_7j2ewdqsy5qw)\n\n[VR Colab by HV](#_rdfatusyntt1)\n\n[VR settings](#_atxff7m4vp8n)\n\n[VR models settings and list](#_1wojovpsoqy)\n\n[VR ensemble settings](#_rv7wwzcmuq3s)\n\n[VR Colab troubleshooting](#_nj23a76dbn89)\n\n[**First vocal models trained by UVR for MDX-Net arch:**](#_87ny11r7l9)\n\n[(the old) Google Colab by HV](#_zaimpsi6j19a)\n\n[Upd. 
by KoD & DtN & Crusty Crab & jarredou, HV (12.06.23)](#_aa2xhwp434)\n\nOther archs general section\n\n[**Demucs 3**](#_99i5nkp6p5v0)\n\n[**Demucs 4 (+ Colab) (4, 6 stem)**](#_m9ndauawzs5f)\n\n[**Gsep (2, 4, 5, 6 stem, karaoke)**](#_yy2jex1n5sq)\n\n[**Dango.ai**](#_xdux18tet3x9)\n\n[**MDX23 by ZFTurbo /w jarredou fork (2, 4 stems)**](#_jmb1yj7x3kj7)\n\n[**KaraFan by Captain FLAM (2 stems)**](#_7kniy2i3s0qc)\n\n[**Ripple/SAMI-Bytedance/Volcengine/Capcut (Jianying)/BS-RoFormer (2-4 stem)**](#_f0orpif22rll)\n\n[**Single percussion instruments separation (from drums stem)**](#_m55fp5i7rdpm)\n\n[**drumsep** (free)](#_jmjab44ryjjo)\n\n[FactorSynth](#_cz4j2d3uf48s)\n\n[Regroover](#_buopqcmi0inj)\n\n[UnMixingStation](#_n8a91zv9lor0)\n\n[VirtualDJ 2023/Stems 2.0 (kick, hi-hat)](#_ftdzxgety1hd)\n\n[RipX DeepAudio (-||-) (6 stems [piano, guitar])](#_1bm9wmdv6hpf)\n\n[Spectralayers 10](#_404qq7uhcrx5)\n\n[**USS-Bytedance (any; esp. SFX)**](#_4svuy3bzvi1t)\n\n[**Zero Shot (any sample; esp. 
instruments)**](#_g37f4a6hnxm0)\n\n[**Medley Vox (different voices)**](#_s4sjh68fo1sw)\n\n[About other services:](#_t7yszids7p1p)\n\n[Spleeter](#_eoh4mhmvzmrt)\n\n[Izotope RX-8/9/10](#_lmhpaip88xjn)\n\n[moises.ai (3 EU/month)](#_cbtg72bxj2sf)\n\n[phonicmind](#_colcze2vgkha)\n\n[melody.ml](#_wttisf5ujyf1)\n\n[ByteDance](#_76t56x1587z5)\n\n[Real-time separation](#_z0rg2bewgfed)\n\n[Serato](#_920bb96xiyga)\n\n[Stems 2.0](#_ko20o19wn5vp)\n\n[Acon Digital Remix](#_mj9frri60cqr)\n\nMisc\n\n[FL Studio (Demucs)](#_oe4d2kewoelf)\n\n[Fadr.com from SongtoStems.com](#_4r74hvaiyeik)\n\n[Apple Music Sing](#_kbxqqeby51dw)\n\n[Music to MIDI transcribers/converters](#_zd7m35sh6zri)\n\n[Piano2Notes](#_6xxrbp51to9n)\n\n[**Audioshake**](#_tc4az79fufkn)\n\n[Lalal.ai](#_51dyuze5xz9o)\n\n[DeMIX Pro V3](#_4yn6zawn80la)\n\n[Hit'n'Mix RipX DeepAudio](#_bf9sv6h9xjaz)\n\n[Moises.ai](#_x7qk80tje220)\n\n[How to remove artefacts from an inverted acapella? 
(can be outdated)](#_krccq343z6z9)\n\n[**Sources of FLACs for the best quality for separation process**](#_nspwy0bkpiec)\n\n[Dolby Atmos ripping](#_ueeiwv6i39ca)\n\n[**AI mastering services**](#_ki1wmwa90cgp)\n\n[How to get the best quality on YouTube for your audio uploads](#_tu3sw6pao8fp)\n\n[How to get the best quality from YouTube and Soundcloud - squeeze out the most from the music taken from YT for separation](#_6543hhocnmmy)\n\n[Custom UVR models](#_nmqmya8t76oc)\n\n[Repository of other Colab notebooks](#_40ggyvro35uu)\n\n[Google Colab troubleshooting (old)](#_lc0zj8wttng0)\n\n[**Repository of stems/multitracks from music - for creating your own dataset**](#_k3cm3bvgsf4j)\n\n[List of cloud services with a lot of space](#_5haztbxg91rt)\n\n[**AI killing tracks - difficult songs to get instrumentals**](#_37hhz9rnw7s8)\n\n[**Training models guides**](#_bg6u0y2kn4ui)\n\n[Volume compensation for MDX models](#_yhu13dizwjvi)\n\n[UVR hashes decoded by Bas Curtiz](#_ntgu6se9g0u5)\n\n[Local SDR testing script](#_gzsz53bzhmzn)\n\n[Best ensemble finder for a song script](#_dus2zjzbt7dg)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Models master list*\n\n### 50 models sorted by SDR\n\n(public ones only, i.e. available for download and offline use)\n\n(07.10.2024)\n\nThese are basically the top single models for now\n\n(roughly after these, additional vocal residues kick in, especially for non-vocal models)\n\nBased on the Multisong dataset evaluation on the MVSEP chart, with similar or the same parameters and inference settings where applicable.\n\nKim’s Mel Roformer\n\nmodel\\_bs\\_roformer\\_ep\\_317\\_sdr\\_12.9755\n\nmodel\\_bs\\_roformer\\_ep\\_368\\_sdr\\_12.9628 (viperx/UVR beta)\n\nBS-Roformer\\_LargeV1 (unwa’s 
ft)\n\nUnwa's Mel-Roformer Beta 3 (although, as of now, its SDR wasn’t tested with the same parameters as the models above, so its placement assumes that unwa used the same parameters in the synth dataset measurement)\n\nUnwa's Mel-Roformer Beta 4\n\nUnwa's Mel-Roformer Beta 5e\n\n0) InstVoc MDX23C HQ (fullband a.k.a. 1648, 8K FFT)\n\n0b) InstVoc MDX23C HQ 2 (fullband)\n\n1) voc\\_ft\n\n1b) UVR-MDX-NET HQ\\_4 (inst)\n\n2) MDX23C\\_D1581 (a.k.a. narrowband)\n\n3) Kim Vocal 2\n\n4) Kim Vocal 1\n\n5) UVR-MDX-NET\\_Main\\_427 (voc)\n\n6) UVR-MDX-NET\\_Main\\_406 (voc)\n\n7) UVR-MDX-NET HQ\\_5 (inst)\n\n7) UVR-MDX-NET HQ\\_3 (inst)\n\n8) UVR-MDX-NET\\_Main\\_438 (voc)\n\n9) UVR-MDX-NET\\_Main\\_390 (voc)\n\n10) Kim inst (a.k.a. other)\n\n11) UVR-MDX-NET\\_Main\\_340 (voc)\n\n12) Inst 3 (a.k.a. 464)\n\n13) UVR-MDX-NET HQ\\_2 (inst)\n\n(for vocal models, this is where those with more vocal residues in instrumentals start - they can still be handy for specific songs)\n\n+4 pos.\n\n9) Inst Main (496)\n\n10) Inst 2\n\n11) UVR-MDX-NET HQ1\n\n12) UVR-MDX-NET HQ 337 >382>338 epoch\n\n13) Inst 1\n\n14) HQ 386>403>292 epoch\n\n15) UVR-MDX-NET2>NET3>NET1>9482 (NET3 a.k.a. 9.7)\n\n16) htdemucs\\_ft (4 stem) (S 10/O 0.95)\n\n17) hdemucs\\_mmi (4 stem)\n\n18) htdemucs\\_6s (6 stem)\n\n19) UVR-MDX-NET\\_Inst\\_82\\_beta\n\n20) Demucs3 Model B (4 stem)\n\n21) UVR-MDX-NET\\_Inst\\_187\\_beta\n\n(dango.ai, Audioshake, Bandlab not evaluated)\n\nSomewhere here, trash begins (excluding GSEP)\n\n22) Moises.ai (probably before its transition to newer Roformer models)\n\n23) DeMIX Pro 4.1.0\n\n24) Myxt (AudioShake 128kbps)\n\n25) UVR-MDX-NET\\_Inst\\_90\\_beta\n\n26) RipX DeepRemix 6.0.3\n\n27) kuielab\\_b (4 stem) (MDX Model B from 2021 MDX Challenge)\n\n28) kuielab\\_a (4 stem)\n\n29) LALAL.AI\n\n30) GSEP (6 stem) (although it sometimes gives much better results than its SDR suggests)\n\nVR arch\n\n31) 7\\_HP2-UVR (a.k.a. 
HP2-MAIN-MSB2-3BAND-3090\\_arch-500m)\n\n32) 3\\_HP-Vocal-UVR\n\n33) 2\\_HP-UVR (HP-4BAND-V2\\_arch-124m)\n\n34) 9\\_HP2-UVR (HP2-4BAND-3090\\_4band\\_arch-500m\\_1)\n\n35) 1\\_HP-UVR (HP\\_4BAND\\_3090\\_arch-124m)\n\n36) 8\\_HP2-UVR (HP2-4BAND-3090\\_4band\\_arch-500m\\_2)\n\n37) 14\\_SP-UVR-4B-44100-2 (4 band beta 2)\n\n38) 4\\_HP-Vocal-UVR\n\n39) 13\\_SP-UVR-4B-44100-1 (4 band beta 1)\n\n39) 15\\_SP-UVR-MID-44100-1\n\n40) 16\\_SP-UVR-MID-44100-2\n\n41) 14\\_HP-Vocal-UVR\n\n42) VR | MGM\\_LOWEND\\_A\\_v4\n\n43) 12\\_SP-UVR-3B-44100\n\n44) Demucs 2 (4 stem)\n\n(6 other old VR models follow)\n\n50) Spleeter 4 stems\n\n51) Spleeter 2 stems\n\n52) GSEP after mixdown from 4 stems separation\n\n*Only instrumental models listed (outdated)*\n\n*(4 stem and MDX23C models appear in all categories):*\n\n*Tier 1*\n\nMDX-Net models (trained by UVR team)\n\n0) MDX23C HQ 1648 fullband\n\n1) MDX23C HQ 2 fullband\n\n1b) UVR-MDX-NET HQ\\_4 (inst)\n\n2) MDX23C\\_D1581 narrowband\n\n7) HQ3\n\n10) Kim inst (other)\n\n12) Inst 3\n\n13) HQ2\n\n*Tier 2*\n\n+4 pos.\n\n9) Inst Main (496)\n\n10) Inst 2\n\n11) HQ1\n\n12) HQ 337 >382>338 epoch\n\n13) Inst 1\n\n14) HQ 386>403>292 epoch\n\nDemucs 4\n\n16) htdemucs\\_ft (S 10/O 0.95)\n\n17) hdemucs\\_mmi\n\n18) htdemucs\\_6s\n\n20) Demucs 3 Model B (mdx\\_extra)\n\nTier 3\n\n(dango.ai, Audioshake, and later maybe Bandlab might fall somewhere between places 9-20)\n\n22) Moises.ai\n\n23) DeMIX Pro 4.1.0\n\n24) Myxt (AudioShake 128kbps)\n\n26) RipX DeepRemix 6.0.3\n\n27) MDX-Net Model B from 2021 MDX Challenge (kuielab\\_b)\n\n28) kuielab\\_a\n\n29) LALAL.AI\n\n30) GSEP (although it sometimes gives much better results than its SDR suggests)\n\nTier 4\n\nVR arch\n\n31) 7\\_HP2-UVR (a.k.a. 
HP2-MAIN-MSB2-3BAND-3090\\_arch-500m)\n\n33) 2\\_HP-UVR (HP-4BAND-V2\\_arch-124m)\n\n34) 9\\_HP2-UVR (HP2-4BAND-3090\\_4band\\_arch-500m\\_1)\n\n35) 1\\_HP-UVR (HP\\_4BAND\\_3090\\_arch-124m)\n\n36) 8\\_HP2-UVR (HP2-4BAND-3090\\_4band\\_arch-500m\\_2)\n\nTier 5\n\n37) 14\\_SP-UVR-4B-44100-2 (4 band beta 2)\n\n38) 13\\_SP-UVR-4B-44100-1 (4 band beta 1)\n\nTier 6\n\n39) 15\\_SP-UVR-MID-44100-1\n\n40) 16\\_SP-UVR-MID-44100-2\n\n42) VR | MGM\\_LOWEND\\_A\\_v4\n\n43) 12\\_SP-UVR-3B-44100\n\n44) Demucs 2\n\n(6 other old VR models follow)\n\nTier 7\n\n50) Spleeter 4 stems\n\n51) Spleeter 2 stems\n\n52) GSEP after mixdown from 4 stems separation\n\nSeparate SDR rankings for vocals and instrumentals matter, I think, only for ensembles. In all other cases, if one model has a higher instrumental SDR than another, it will also have a higher vocal SDR than that model. Only for ensembles were the differences so small that we ended up with different top ensembles for vocals and for instrumentals.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Hall of fame*\n\n*Great thanks to Anjok, Aufr33 (creators of UVR), KimberleyJSN a.k.a. Kim (model contributor and MDX/Roformers support), viperx (our former heavy user, supporter and now private models creator), tsurumeso (the creator of VR arch base code), BoskanDilan (creator of the old UVR GUI), IELab a.k.a. Kuielab & Woosung Choi (MDX-Net arch creators), ZFTurbo (creator of MVSEP, MDX23, and many models), GAudio (GSEP creators), Alexandre Défossez a.k.a. 
Adefossez (Demucs creator), Bytedance with asriver (Roformer arch and Ripple app), lucidrains (for recreating the BS and Mel Roformer from released papers), jarredou (MDX23 fork, drumsep model, tons of support and Colab work), Bas Curtiz (model trainer, insane amount of testing and UVR5 settings guidance, tutorials with SDR evaluation, models creator), Captain FLAM (KaraFan creator), unwa (for his Roformer fine-tunes and training advice), Gabox (-||-), becruily (model trainer, tons of advice), FoxyJoy (de-reverb, de-echo, denoise models), Not Eddy (UVR UI, Colabs, HF, KF fork), Sir Joseph (WebUI Colabs), mesk (metal models and training guide) - thanks to all of these people for the best freely available AI separation technologies and models so far.*\n\n*Special thanks to users of our Discord:*\n\n*HV (MDX and VR Colabs creator and UVR contributor), txmutt (Demucs Colab), CyberWaifu (lots of testing, some older Colabs), KoD/Mixmasher (first HV MDX Colab fork), dca100fb1 (a.k.a. dca100fb8) (VR ppr bug, finding tons of UVR bugs, model testing and feedback), mesk (training guide and Roformers fine-tuning),* Isling (lots of testing and suggestions), *CyPha-SaRin (lots of models/UVR testing), BubbleG, ᗩรρเєг, Joe, santilli, RC, Matteoki (a.k.a. Albacore Tuna, our “upscaling” guru), Syrkov, ryrycd, Mikeyyyyy/K-Kop Filters,* Mr. 
Crusty ᶜʳᵃᵇ (our mod; compensation values finding, MDX Colab mods and testing), knock (ZF’s MDX23 fine-tuning), A5 (lots of feedback on existing models), Infisrael (MDX23 guide and model testing), Pashahlis/ai\\_characters (WhisperX guide and script), Sam Hocking (our most valuable pro sound engineer and industry insider), Kubinka (for his Colabs and coding help), Vinctekan (one of our most valuable sound engineers and a tools creator), CC Karaoke, “am able to use uvr with gpu”/vernight (both lots of testing and advice), hendry.setiadi, raiboomdash (lots of model tests with vast descriptions), wancitte, essid64, makidanyee.\n\n- *Thanks to all of these people - for knowledge, help, testing, and to everyone whose advice, quotes and stuff appear in this doc. This guide wouldn't have been created without you. If I forgot someone, forgive me.*\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nYou can support the UVR team via these links:\n\n<https://www.buymeacoffee.com/uvr5/vip-model-download-instructions>\n\nand\n\n<https://boosty.to/uvr>\n\n(or subscribe to <https://x-minus.pro/ai> to process some VIP models there online)\n\nIf you see duplicated models on the list in UVR5, click refresh.\n\n*X-minus FAQ*\n\nQ: How come Level 1 will be eliminated? Is it possible to keep it? I use this site very little, and paying $2.79 per month is too much; anyway, 360 minutes of audio per week is a lot. I do 5-6 per week. It is a waste of minutes.\n\nA: If you renew your subscription several months in advance, you can use Level 1 even after its removal. In addition, once your Level 1 subscription expires, you can use it for another month for free (after it is removed in February).\n\n##### Similarity/Phantom Center/Mid channel Extractor\n\n“It extracts the phantom centre. I.e. what you hear as being in front of you in stereo audio” “very useful for older mixes. 
Like 60s songs with hard panning”\n\n[Metrics by gilliaan](https://docs.google.com/spreadsheets/d/1uWOC4XtIYHila7OeuX4N13RFBT4ZN9Dh4TfFQbS9R0M/edit?usp=sharing)\n\nThe two most important metrics for that kind of model are “bleedless and aura\\_mrstft.\n\nWhy is bleedless the most important?\n\nIf the side bleeds into the mid, that's a task failure. This is the primary quality indicator.\n\naura\\_mrstft is more perceptually relevant than SDR for musical content imo, so that one too”\n\n- 2048 MDX23C [models](https://drive.google.com/drive/folders/1KHEnvsrvvIDlO-pBT-Hw8UKKebpJCyTe?usp=sharing) by wesleyr36/drypaintdealerundr | [info](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2417116936)\n\nThe 71.99 model variant currently has the highest measured bleedless metric.\n\nCan be used on MVSEP (in the Experimental section) and x-minus.pro (the Extract backing vocals option), or with ZFTurbo’s command-line inference [code](https://github.com/ZFTurbo/Music-Source-Separation-Training/) (it doesn’t work in the OG MDX23C inference code or in UVR).\n\n“This model is similar to the Center Channel Extractor effect in Adobe Audition or Center Extract in iZotope RX [and Audacity/Bertom], but works better.\n\nAlthough it does not isolate vocals, it can be useful.” Aufr33\n\n“The main thing I trained it for was to be used in a similarity extractor, since the original also used an AI model\n\nThe steps for that being:\n\n1. Take the L channel from Audio\\_1 and the L channel from Audio\\_2 and merge them into a stereo file.\n\n2. Run that through the model\n\n3. Repeat for R channels\n\n4. 
Merge the L and R channels back together, and you have the similarity, assuming the audio files were perfectly aligned.”\n\nIt was trained over 6 days on a Quadro RTX 4000.\n\n“Some bits were better on the model, others on [the] Audacity's [Vocal and Center Isolation feature]”\n\n- gilliaan dual phantom center model [gilliaan\\_MonoStereo\\_Dual\\_Beta2](https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality/blob/main/gilliaan_MonoStereo_Dual_Beta2.ckpt) | [yaml](https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality). It has better metrics than Beta 1. [More info](https://discord.com/channels/708579735583588363/708580573697933382/1482589903253409982).\n\nDescription below.\n\n- gilliaan\\_MonoStereo\\_Dual\\_Beta1 Mel-Roformer\n\n<https://huggingface.co/gilliaaan/Mel-Band-Roformer-MonoStereo-Duality>\n\nIt has much higher SDR and aura\\_mrstft than the 71.99 model above, with slightly lower bleedless.\n\n“Phantom center dual extractor (...) trained to isolate only the correlated center content, so hard-panned signals stay in the side output and don't leak into the center. (...) [And for]\n\nHard-panned correlated signal: The same signal in mid and in side, but the side version is hard-panned 100% to one channel (L or R). This represents mono recordings doubled to one side. Very common in 60s music and some modern productions with somewhat panned frequencies.\n\nWhen mid.flac and side.flac contain the same signal but the side version is hard-panned to one channel only, the model outputs mid fullness of ~97 (everything dumped into mid) and side SDR near 0. It treats the correlation as a signal to route everything to center, ignoring the spatial asymmetry entirely. (...).\n\nBleedless/Fullness are on the same level.\n\nOn my validation set it scores higher SDR than what's currently out there.\n\n- Dry Paint Dealer Undr (a.k.a. 
wesleyr36)\n\nHTDemucs Similarity/Phantom Centre Extraction model:\n\n[GDrive](https://drive.google.com/drive/folders/10PRuNxAc_VOcdZLHxawAfEdPCO6bYli3?usp=sharing) / [HF](https://huggingface.co/jarredou/HTDemucs_Similarity_Extractor_by_wesleyr36/tree/main) (mirror) / [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) (it tends to be more “correct” in center extraction than the MDX23C model above)\n\nThat Demucs model won’t work in UVR; it gives a bag\\_num error even with the yaml prepared in the same way as for Imagoy Drumsep and after renaming the ckpt to th (probably because it needs the ZFTurbo inference code it was trained with).\n\n- SCNet Similarity/Phantom Centre Extraction model by Dry Paint Dealer Undr\n\n<https://drive.google.com/drive/folders/1CM0uKDf60vhYyYOCg2G1Ft4aAiK1sLwZ?usp=sharing>\n\n- Melband Roformer Similarity/Phantom Centre Extraction [models](https://drive.google.com/drive/folders/1uJP5OQuChCQVY4CVB1Ju3nxBskE-dYzy?usp=sharing) (beta) + LoRA\nby wesleyr36/drypaintdealerundr\n\n“results are relatively clean but sound a bit filtered at times, comes with 2 LoRA checkpoints for frazer's [LoRA repo](https://github.com/fmac2000/Music-Source-Separation-Training-Models/tree/lora)”\n\n- SCNet difference/sides models by Dry Paint Dealer Undr [DL](https://drive.google.com/drive/folders/1ZSUw6ZuhJusv7HE5eMa-MORKA0XbSEht?usp=sharing)\n\n- Two BS-Roformer Phantom Center/Similarity extraction models by drypaintdealerundr - previously unreleased, from training sessions in September or October 2025. 
[DL](https://drive.google.com/drive/folders/1jSQ3FdbQOjC6PnIWmqMByPmlK482iAKT?usp=sharing)\n\n*VR6 models don’t work in UVR:*\n\n- iter41\\_l1\\_loss [model](https://drive.google.com/file/d/12xKwmRpig-dGdHJoPLM2oV9LniWWC0MR/view?usp=sharing) for [VR v6.0.0b4](https://github.com/tsurumeso/vocal-remover/releases) - similarity/phantom centre extractor by wesleyr36/drypaintdealerundr.\n\n“I think this arch makes for a much better similarity extractor\n\n(make sure to use the complex flag, --complex or -X, when running this model)”\n\nCompared to the MDX23C models above, the VR6 ones were trained on a limited dataset, but they can still perform better.\n\n- 4096 [model](https://drive.google.com/drive/folders/1RMkUa6iMvJ0gW0LIMq_OofGiZDA5snKK?usp=sharing) for [VR v6.0.0b4](https://github.com/tsurumeso/vocal-remover/releases) by wesleyr36/drypaintdealerundr (it should perform better than the MDX23C 2048 model above)\n\n“must be run not only with -X or --complex but also --n\\_fft 4096 --hop\\_length 2048 or -f 4096 -H 2048”\n\nVs. the 2048 model - “pros: less bleed\n\ncons: less complete results as a similarity extractor\n\nit seems to benefit from running the centre channel results back through the model for more complete results, just like the original similarity extractor, although with the trade-off of more bleed\n\nyou end up with overall more bleed than the other model but with even more complete results”\n\nUsage for [VR v6.0.0b4](https://github.com/tsurumeso/vocal-remover/releases):\n\npython inference.py -i path/to/an/audio/file --gpu 0 -P path/to/model.pth -X -f 4096 -H 2048 -o folder/you/wish/to/save/to\n\nYou can just drag a file/folder into the terminal/CMD to get its path if that's more convenient.\n\nThe command is for an Nvidia GPU, but CPU inference should be possible too.\n\n- Mel-Roformer de-reverb by anvuew (a.k.a. 
v2/19.1729 SDR) | [DL](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt) | [config](https://huggingface.co/anvuew/dereverb_mel_band_roformer/resolve/main/dereverb_mel_band_roformer_anvuew.yaml) | [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\n(it can also serve as a phantom center model, removing sides)\n\n- Older VR model, via HV's Colab (Colab fixed 16.02.24)\n\n<https://colab.research.google.com/drive/1WP5IjduTcc-RRsvfaFFIhnZadRhw-8ig?usp=sharing>\n\nDon’t forget to run the dependencies cell after mounting.\n\nIf you want to use the repo locally, just use this [fix](https://drive.google.com/drive/folders/1PUYK2QDT8moe6EOugDu0kROV11XlZjnx?usp=sharing)\n\n\"If you have two language tracks, it'll remove the vocals, but not its adlibs\"\n\n\"It works like invert but instead of mixing the inverts together, it removes the difference and leaves the ones that sound the same\"\n\nIt uses a model specifically trained on 100 pairs.\n\n- Sadly, “It's like a downgrade of Audacity Vocal and Center Isolation feature” - it’s muddier\n\nAudacity can be used in a browser at:\n\n<https://wavacity.com/>\n\nIn versions prior to 3.5.0, the option is located in:\n\nEffect>Special>Vocal Reduction and isolation\n\non 2.x: Effect>Vocal Reduction and isolation (at the very bottom)\n\n3.5.0 or later: downloadable as a Nyquist plugin from [here](https://plugins.audacityteam.org/nyquist-plugins/effect-plugins/filters-and-eq#vocal-reduction-and-isolation)\n\n“Adobe Audition works similarly, but you can actually tweak a lot of settings. But the difference is pretty much non-existent. Or any better for that matter. Similar way. 
Even with Audacity, Adobe Audition, and PEEL [3d Audio Visualize], we are still not quite there yet.\n\nCurrently, Audacity, and maybe Waves Stereo Center plugin have the best capabilities, but they still aren't perfect.” Vinctekan\n\nSadly, it turns out that all three solutions can sound worse than current models for the use case of getting rid of dubbing in movies.\n\nIt can also be used with window size 768 on CPU. The lowest window size supported on GPU is probably 272 (352 was set, and 320 is possible too), but it probably won't change much here.\n\nOne way to get lead vocals with the Audacity method (in 2021) was to obtain e.g. main vocals from a vocal or BVE model, and then process that stem with these settings:\n\nAudacity>Effect>Vocal reduction and isolation>\n\nAction: Isolate Center\n\nStrength: 1.1 or 1.6\n\nThen click OK. That effect must be applied to the vocal part. If you use center isolation, the low/high cut settings will be ignored.\n\n“Q: wouldn’t it be possible to extract anything that is panned to a specific point yk like extract anything that is panned 100% exactly, or anything that is panned 80% or 50% etc, would that not be possible?\n\nA: Mashtactic has been doing that for almost 20 years now (coupled with dynamics and EQ filtering)\n\n<https://www.youtube.com/watch?v=0lDAY0va4VE>\n\n- zplane has released a clone recently, but it doesn't have the transient/sustain filtering iirc” (isling/jarredou)\n\n- You can also use AudioSourceRE RePAN for center extraction (IncT)\n\n- Or the less complex, free/paid Bertom Phantom Center (sometimes it’s better, sometimes worse than the MDX23C model).\n“Bertom Audio claims to not use basic mid/side processing to extract the center so probably uses decorrelation on the sides instead of mid/side processing which by its nature negatively correlates them. 
The net result of Bertom is the sides are decorrelated from the center and so are maintained more strongly in the stereo field.” Sam Hocking\n\n- [AOM Stereo Imager D](https://aom-factory.jp/products/stereo-imager-d/) “turn off auto-gain & turn down center”\n\n- [Reaper guide](https://github.com/junh1024/junh1024-Documents/blob/master/Audio/How%20to%20extract%20Backing%20Vocals.md#fft-imaging-extract-out-of-phase)\n\n*Hints on using similarity models*\n\nQ: “Hi there! I have a question regarding audio separation in movies. I have an old movie with a stereo track in English, and a mono (!) track in French. I couldn't for the life of me find anything better than mono for my native language, which is a shame (even a source with supposedly stereo French is actually mono, you hear it right away).\n\nSo I'm willing to try and use the English track to reinject some stereo in the background, to widen the music and sfx at least (the voices being centered don't bother me). i.e. I could separate the voices from the rest in both languages, then mix the French voices with the English sfx and music. Whatever artifacts remaining could blend enough that it wouldn't be noticeable... Maybe...\n\nHow would you guys go about it? I've used UVR in the past but only on music, not movies – and it was months ago. Also my GPU is old (Nvidia GTX 970 with 4 GB VRAM) so this might be a limitation. Thanks for any advice you can give me!”\n\nA: “If the French rip is a downmix between stereo channels, then you don't have to separate dialogue in the French rip. Only separate the vocals in the English version, encode it to mid/side, replace the mid channel with the French rip, decode back to stereo, and you're done. English vocal separation will get rid of the stereo bits from the side channel, so you won't be hearing both languages at once. Obv you have to align the sources too, which can be tricky if there are any changes in timing between the versions. 
If the French rip is only a single channel from a stereo source, then use your approach of isolating both languages.\n\nInstead of using your own GPU, you can send your audio file to [Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) and use something like Mel-Roformer v1e with default settings.” (introC)\n\nQ: What difference do you mean between \"a downmix between stereo channels\" and \"a single channel from source stereo\"? It seems the same thing to me, but I may be missing something obvious.\n\nA: A single channel is either left or right; a downmix is an average of left and right.\n\nQ: Then I guess what I have is a downmix?\n\nA: Not sure, you need to check. I can check if you want; send the English and mono audio. But basically, align the tracks, downmix the English track and check for a null when mixing it with the French mono track (with one of them polarity-inverted). If it doesn't null, then it's not a downmix (or the audio tracks have other differences).\n\n\\_\\_\\_\\_\\_\n\nHistory of fixing the broken [OG](https://colab.research.google.com/drive/12siscZBrM9SwxITGipAvSobVtQ8t2Pus?usp=sharing) VR Colab by HV\n\nIt was fixed by adding these lines to it:\n\n!apt-get install python3.8\n\n!apt-get install python3.8-distutils\n\n!apt-get install python3.8 pip\n\n!python3.8 -m pip install librosa==0.9.1\n\n!python3.8 -m pip install numpy==1.19.5\n\n!python3.8 -m pip install numba==0.55.0\n\n!python3.8 -m pip install tqdm\n\n!python3.8 -m pip install torch==1.13.1\n\nand by changing the inference line in the Colab to use python3.8\n\n(not necessary)\n\n! pip install soundfile==0.11.0\n\npython3.8-distutils was necessary to fix a numpy wheel error, while the regular 3.8 package installed before it was necessary for Colab to recognize !python3.8 commands. Because that 3.8 installation was bare, pip had to be installed separately for it. 
Then the rest of the necessary packages are installed for 3.8 - the old librosa fix, numpy for 3.8, and the broken dependencies numba and tqdm. The last torch version working in HV Colabs was 1.13.1; 1.4 didn't work, though it's compatible with 3.8. Maybe a CUDA or, more generally, an upgraded Ubuntu problem; I can't tell. Installing torch was necessary anyway, because it wasn't installed for 3.8.\n\nAdditionally, to fix the regular VR Colab, this line was necessary:\n\n!python3.8 -m pip install opencv-python\n\nAnd for some reason, I needed to install these both with normal pip, like below, and with python 3.8, so basically twice; otherwise, it gave a “module not found” error\n\n! pip install pathvalidate\n\n! pip install yt\\_dlp\n\nAll this hassle with Python 3.8 is necessary because Colab got a newer numpy version, and newer versions no longer support functions used in HV Colabs, as they were deprecated.\n\n#### Separating people in recording\n\n*Guide and script for* [*WhisperX*](https://github.com/m-bain/whisperX) *by Pashahlis/ai\\_characters*\n\n“A script on the AI hub discord for automatically separating specific voices from an audio file, like separating a main character's voice from an anime episode.\n\nI massively updated this script now, and I am also posting it here now, since this discord is literally about that kinda stuff.\n\nScript to automatically isolate specific voices from audio files\n\n(e.g. isolating the main character's voice from an anime episode where many different characters are speaking).\n\nAfter literal hours of work directing ChatGPT, fixing errors, etc, there is now a heavily updated and upgraded script available:\n\nI encountered some transcription errors (musical notes, missing speaker or start and end times) that would result in the entire script failing to work. So the updated script now skips such audio. 
That is not a problem, however, as for a 22-min file it skipped only 16s of audio and the errored audio is just music or silence anyway.\n\nIt now also automatically merges all your audio files into one if you provide multiple, so that the speaker diarization remains consistent. This increases diarization time by quite a lot, but is necessary. The merged file will be temporarily saved as a .flac file, as .wav files have a maximum file size of 4 GB. The resulting speaker files at the end of the script are created as .wav again, though, as it is unlikely they will reach 4 GB in size.\n\nI also added helpful messages that tell you which stage the script is currently at and, at the start, which audio files it is processing, along with the total length of audio being processed.\n\nI also made sure that it saves the speaker files in the original stereo or mono and 16 bit or 32 bit format.\n\nAt the end of the script execution, it also lists all the speakers that were identified, in order, with the audio length for each speaker. It also lists the total amount of audio length that had to be skipped due to processing errors, as well as the total time it took to execute the script.\n\nLast but not least, I ran this script on a vast.ai rented Ubuntu-based VM with a 4090 GPU and it worked. I did this to test Linux as well as because I was processing over 4h of audio, so I wanted this to be fast. 
Keep in mind that if you are running this script on your home PC with a bad GPU and are processing a lot of audio, it can take quite a while to complete.\n\nScript is attached.\n\n<https://drive.google.com/file/d/13iY2knyABBU-MOaMN5_zNAoDHFMZY6SD/view?usp=sharing>\n\nExample console output and example speaker output:\n\n<https://discord.com/channels/708579735583588363/708579735583588366/1132503652033114192>\n\nUsage instructions:\n\nInstall whisperx and its additional dependencies such as FFmpeg as per the instructions on the GitHub page <https://github.com/m-bain/whisperX>\n\nAdditionally, install pydub (and any other dependencies you might be missing if the script gives an error message indicating you are missing a dependency)\n\nInstall ffmpeg-python; if you're running this in a conda environment, make sure to use the following command instead of pip install, otherwise it won't work: conda install -c conda-forge ffmpeg-python\n\nEdit the script to include your Hugging Face token and the path to the folder containing the audio files you want to process\n\nRun the script simply with python your\\_filename\\_here.py\n\nResults are quite good for what it is, but you'll definitely need to do some additional editing in Audacity and Ultimate Vocal Remover or whatever afterwards to cut out music, noise, and other speakers that were wrongfully included. It definitely works best with speakers that appear a lot in the audio file, like main characters. It does a very good job at separating those.\n\nI won't provide tech support beyond this, as I am no programmer and did this all by just directing ChatGPT.”\n\nOr check [alternatives](#_ea9fj444mg3m)\n\n### UVR5 GUI (MDX, VR, Demucs 2-4 and UVR team models)\n\nThe GUI provides more functionality and models/AIs than the Colabs, incl. 
custom model import:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases>\n\nOfficial app Win 11 installation tutorial:\n\n<https://youtu.be/u8faZW7mzYs>\n\nMacOS build:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6> (or [beta](#_6y2plb943p9v))\n\nMacOS Catalina tutorial (outdated at this point):\n\n<https://www.youtube.com/watch?v=u8faZW7mzYs>\n\n(better not run the Windows build in a W10 VM, or processing may take around 3 hours)\n\nWindows 7 users:\n\n\"To use the newest python 3.8+ with Windows 7, install VxKex API extensions and in case of problems select Windows 10 compatibility in EXE installer properties.\"\n\n*Here you can find a searchable PDF guide by the devs for UVR5 GUI describing functions and parameters (can be outdated):*\n\n<https://drive.google.com/file/d/1RtMRj8FpSpMHlK1XxaBrKoWQlmMfCSmy/view?usp=drive_link>\n\nVideo guide:\n\n<https://youtu.be/jQE3oHXfc7g>\n\nOnline guide:\n\n<https://multimedia.easeus.com/ai-article/how-to-use-ultimate-vocal-remover.html>\n\n(Their instructions for installing and using the stable version seem fine, although they recommend cloning the repo on Macs, while UVR is already available as binaries for M1 and Intel CPUs (without the Roformer beta patch at the moment).)\n\nSome of the UVR5 GUI models described in this guide can be downloaded via the expansion pack:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.3.0/v5_model_expansion_pack.zip>\n\nVIP models\n\n<https://www.buymeacoffee.com/uvr5/vip-model-download-instructions>\n\n(some older) settings for the GUI:\n\n<https://photos.app.goo.gl/EUNMxm1XwnjMHKmW6>\n\n(though it's mostly outdated).\n\n(no longer necessary, as UVR now has a separate DirectML branch and executable:)\n\nOptional fork of UVR GUI for AMD and Intel cards, currently supporting only VR Architecture and MDX using DirectML (Demucs currently not supported). 
If you have an Nvidia card, use the official app above instead, since CUDA is supposed to be faster.\n\n“A four minute and 20 second audio takes about 30 seconds (including saving) using 1\\_HP-UVR on an Intel Arc A770 16GB. It takes up approximately 6GB of VRAM.”\n\nIf you only use MDX models, in most cases it won't be faster than CPU processing (an i5 4460 performs similarly to an RX 6700 XT here), so better stick to the official app.\n\nCompared to Roformer beta #8, it’s still much faster, at least for VR models, but you might get some issues with MDX-Net models.\n\n<https://github.com/Aloereed/ultimatevocalremovergui-directml>\n\n*Python command line fork of UVR 5 with current models support:*\n\n<https://github.com/nomadkaraoke/python-audio-separator>\n\n(moved from: <https://github.com/karaokenerds/python-audio-separator>)\n\nIt was based on some outdated UVR code, but has probably been updated since then to support more models.\n\n#### GUI FAQ & troubleshooting for UVR\n\nStart by reading the information about the [Roformer patch](#_6y2plb943p9v) and its common issues section\n\n- “If you enable the \"enable help hints\" setting” “you can hover over parameters with the mouse, [and] you'll get [settings] info hints (...) if [it’s] not activated by default)”\n\n- See the section [above](#_czix2y8eiuna) for UVR installation and usage guides\n\n- It's not guaranteed to run on Windows versions older than 10, so use it at your own risk.\n\n“3.8.10 is the last [Python] official installer that works on Win7, however I was able to find an unofficial [Python] installer from GitHub for 3.10.13 on Win7 and that seemed to do the trick! No more error on load of UVR”\n\n- You may encounter “Encoding failed. ffmpeg/avlib returned error code: 3221225477” while using Manual ensemble with output set to mp3 on Windows 7\n“Think I've found my problem. 
I used a full build of FFMPEG instead of the essentials one.”\n\n- “If anyone needs the solution to running it on [macOS] Mojave+, go to the Releases page on GitHub, scroll down to 5.5, and under assets grab UVR 5.5 x86\\_64\\_9\\_29.dmg. Confirmed working now on my Mojave machine. Thanks to @ambresakura on GH”\n\n- Installing the GUI outside the default location on the C:\\ drive, especially with older versions, may result in e.g. startup issues (although these seem to be fixed in 5.6 and the Roformer patches). If you lack space on the C: drive, create your folders using [Symlink Creator](https://github.com/amd989/Symlinker) to redirect the content to another disk, while keeping the C:\\ location from the point of view of the Windows file system.\n\nAlternatively, copying only the Ultimate Vocal Remover\\gui\\_data folder to the C:\\ drive while keeping the GUI installation on another drive might work as well. That said, there seem to be no issues with the latest stable 5.6 opened from a non-default location, or with the standalone Roformer patch installed in a different location.\n\n- There’s no way to bypass the 3 GB free disk space requirement on the C: drive, even when using the memory-light AudioTools. 
[Here](https://github.com/Anjok07/ultimatevocalremovergui/issues/285#issuecomment-1244606111) someone set UVR to the L: drive, and it read the free space from that drive letter. I’m not sure whether they simply installed UVR in that location, whether this is still possible with the latest UVR versions (even when UVR was previously uninstalled), or whether it’s enough to copy UVR elsewhere and/or change the installation path in the registry.\n\n- Be aware that your system may occasionally become unresponsive on slow 2- and 4-core configurations with GPU Conversion disabled while separation is in progress (although you can set all the priorities to Idle in Process Lasso, and they will be saved for future use).\n\n- \"The provided directory is not writable or read only\" error\n\nRun UVR as admin. Changing the permissions of the output/input folders to Everyone might help too (alternatively, use the “Context Menu” method described [here](https://www.makeuseof.com/windows-10-11-own-folder/))\n\nQ: Why are the Vocal Dereverb options greyed out? I can't select anything other than \"main vocals\"\n\nA: This option removes reverb from a vocal stem.\n\nYou must have the \"UVR-DeEcho-DeReverb\" VR Arch model installed to use this option.\n\n- Matchering doesn’t work correctly with Opus files (an error occurs)\n\n- Matchering doesn’t work correctly with mp3 files on Mac (at least x86; an error occurs)\n\n- The Matchering input audio file length limit before an error occurs is 14:44 or 15 minutes\n\n- Matchering and Manual Ensemble use only the CPU and are fast\n\n- “Download speed of models via Download center was really slow for no apparent reason, like some users have already reported.\n\nI've minimized [the] UVR window while the download was ongoing, and the download speed improved instantly.\n\nI've restored the window, download speed was again instantly slowed down. 
Minimized the UVR window again, download speed back to normal again…”\n\n- Official UVR requirements from the GH page:\n\nNvidia GTX 1060 6GB is the minimum requirement for GPU conversions.\n\nNvidia GPUs with at least 8GB of VRAM are recommended.\n\nIntel Pentium might be unsupported, but AVX or SSE4.2 instructions are not required, so even a newer C2Q like the Q9650 with SSE4.1 will suffice.\n\n- 2GB VRAM GPUs had some issues even on CPU; this may already be fixed\n\n- The official minimum RAM requirement is 8GB, although it works correctly on 6GB RAM too. With 4GB RAM you can run out of memory on longer tracks (probably fixed in many cases in v5.5 and newer; you’re able to separate Roformers on GPU with 4GB VRAM using 2.5 s chunks).\n\n- Among newer Nvidia GPUs, something like the RTX 3050 (8GB) is a good, cheap choice for even the heaviest processing and is (theoretically) equivalent to Colab's Tesla T4 in CUDA computing power (but it's not really enough for training, of course, and in Colab it's about 3 times slower). But watch out for the smaller 4GB laptop variants, as they can be more problematic.\nBut if you separate a lot using Roformers, definitely consider something better (look for as many CUDA cores as possible)\n\n*(troubleshooting continues further below)*\n\n**CPU/GPU performance in UVR**\n\n- The higher the total number of CUDA cores on an Nvidia GPU, the faster the separation in UVR\n\n- AMD and Intel Arc GPUs using OpenCL are slower in this separation task than Nvidia GPUs using CUDA, so it’s safe to say that an Nvidia GPU from the same performance segment will most likely be faster for separation.\n\n- Min. 4GB VRAM GPUs tested (with some yaml tweaking for Roformers described below). On AMD, 16GB VRAM is recommended (so no modifications are required).\nMin. 
NVIDIA Maxwell/900 series GPUs (compute capability 5) are the minimum requirement (at least the NVIDIA GT 700 series and older are unsupported, returning CUDNN\\_STATUS\\_NOT\\_INITIALIZED).\nFor AMD, at least the RX 4GB models were tested (not sure about R9 200 4GB GPUs, either on newer modded Radeon-ID drivers and/or with a downgraded DirectML.dll from your drivers copied to the UVR\\torch\\_directml folder)\n\nIntel was confirmed to work with Arc GPUs and Xe integrated graphics (e.g. Tiger Lake 2021) with MDX-Net HQ (v2) models.\n\nThe RTX 5090 is not yet supported in the official UVR packages; you can use OpenCL (DirectML) in the options instead (slower).\n\n2GB cards will probably cause issues, and 4GB VRAM too, at least with certain Roformer models and settings (unless the dim\\_t equivalent of chunk\\_size is set to 301/201, but 201 might have a bit of audible audio skipping at times; see [here](#_6y2plb943p9v) further below for the dim\\_t to chunk\\_size conversion)\n\n- If you don't meet these requirements, you’re forced to use CPU processing, which is very slow: even a Ryzen 5950X is slower than a 1050 Ti in this application, and a 1700X is twice as slow as even a 940M.\n\n- Since the last updates, you can also use AMD/Intel GPUs with a separate installer with OpenCL support (the min. requirement is most likely GCN or Polaris architectures and up, i.e. HD 7XXX and RX 4XX, but even 4GB variants may crash on certain settings).\n\n- Old drivers for GCN GPUs might fail when using the GPU Conversion option. Consider using Radeon.ID drivers or copying DirectML.dll from your Windows installation into Ultimate Vocal Remover\\torch\\_directml\n\n- You can also use a Mac M1 for GPU acceleration (MPS GPU support in a separate installer), and Radeon acceleration on Intel Macs is supported as well\n\n- The GTX 1660 Ti (6GB) is slow for separation using Roformer models. A better choice is the RTX 3050: despite similar performance in games, the 3050 has the same number of CUDA cores as the Tesla T4 on Colab, but less VRAM. 
Avoid the laptop variant of the 3050, which has only 4GB of VRAM instead of the 8GB of the desktop variant.\n\n- If you want a fast second-hand GPU with more VRAM, consider a 1080 Ti, a 2080 Ti or even a 3080 Ti (16GB). These are pretty fast for separation.\n\nThe 1080 Ti is much faster in this task than the 3060 12GB.\n\n<https://media.discordapp.net/attachments/767947630403387393/1133164474749169864/image.png> (dead)\n\nQ: My AMD/Intel GPU has sudden spikes of usage, or only 30% is being utilized. Is that a CPU bottleneck?\n\nA: Nope. It's just how inefficient DirectML is. It's normal and happens for everyone, even on some ancient 4-core CPUs.\n\nWe’ve tested various major versions of the DirectML.dll library from 1.9 (except 1.12 and 1.14) up to 1.15.4, and the one shipped with UVR (1.10.1.0) was the fastest (1.5.1.0 didn’t work). [Link](https://drive.google.com/drive/u/0/folders/18O65sau8TmtQaeOA36SGuIxya7ieq9vs)\n\n###### **Separation times chart by Bas Curtiz** (various CPUs and GPUs, model cut-off examination)\n\n<https://docs.google.com/spreadsheets/d/1R_pOURv8z9GmVkCt-x1wwApgAnplM9SHiPO_ViHWl1Q/edit?gid=460807774#gid=460807774>\n\n(the results were probably made with the old UVR Roformer patch, where a lower overlap meant longer processing time, i.e. the opposite of patches newer than #2)\n\nIn addition to the above:\n\n*for CPU-only:*\n\n- MDX-Net HQ\\_3 in UVR on CPU takes 2 minutes with a Ryzen 5 3600 (for an unknown regular song between 2-4 minutes long)\n\n- HQ\\_4 takes ~13 minutes on a C2Q @3.6 DDR2 on CPU, 6GB RAM with default settings (for an unknown regular song between 2-4 minutes long)\n\n- HQ\\_3 for a 4:19 track takes 20 minutes and 22 seconds ([default](#_6q2m0obwin9u) overlap and 256 segments, iirc default too)\n\n- HQ\\_4 is much faster on the same CPU\n\n- On an AMD A6-9225 dual-core CPU (2/2) with 4GB RAM, a three-model ensemble (MDX, MDX, Demucs 4) took almost 17 hours.\n\n- On an i3 3xxx it took around 8 hours (song length unknown).\n\n- The main burden in this ensemble on such a configuration is Demucs\n\n- MDX23C and Demucs ht/ft take ~5-17 hours to process on a C2Q with CPU only, without GPU acceleration. Probably the same for Roformers.\n\n- MDX-Net HQ\\_3/4 models and VR models with a 512 window size are fine on the same configuration\n\n*for GPU Conversion:*\n\n- It takes 39m 28s for a 3:28 song using the 1296 model on an RX 470 4GB with a C2Q @3.6 DDR2 and the beta 2 patch (GPU OC doesn't really matter here, so the same applies to the RX 570, which is basically the same chip with OC/a different BIOS), 18 minutes for Kim Mel-Roformer and a 3:01 song, and 45 minutes for unwa v2 and a 3:52 song. It’s quite possible that there's a huge CPU bottleneck in that case, as the CPU still plays a crucial part in separation, even with GPU separation.\n\n- iGPU Intel Iris Xe (Tiger Lake, 11th Gen i7) on an LG Gram notebook from 2021 (newer UVR patch)\n\nI used Unwa's duality v2, chunk size: 2s\n“I separated a 6-minute-long song, it was a FLAC file, it took 38 minutes.\n\nWith CPU processing it took around an hour and twenty minutes iirc\n\nKeep in mind, I had stuff like VSCode, LibreWolf, Firefox etc. in the background hogging memory and CPU as well, during both those instances” 12/16GB RAM recommended: it uses RAM as VRAM, and you will fill almost the whole 16GB of RAM fast when running apps in the background\n\n- Using HQ\\_4 is much faster than real-time using default settings, but even longer than accelerated Roformer, when on CPU only using old Core 2 Quad @3.6 DDR2 800MHz.\n\n- On Mac M1 using the dedicated Roformer patch:\n- it takes 9 minutes to process a 3-minute song using the BS-Roformer 12xx viper model (dim\\_t 1101, batch size 2, overlap 8) with “constant throttling”.\n\n- below 4 minutes for Kim Mel-Roformer (overlap 1, dim\\_t 801)\n\nand 11:12 for MDX23C-InstVoc HQ for 
04:11 track with default settings\n\n- On a 6800 XT it takes one hour for a 5-minute song using overlap 5 and unwa’s inst v2 Mel-Roformer model (newer UVR patch)\n\n- An RTX 3060 Ti allows 3x real-time using BS-Roformer SW 6 stems:\n3 minutes of audio takes a minute to process (newer UVR patch)\n\n*Some separation times above may have changed a bit in UVR beta Roformer patches newer than #2*\n\n\\_\\_\\_\\_\\_\\_\\_\n\n*(FAQ continues)*\n\n- Vocal chops using MDX models are more likely to appear on 4GB VRAM cards (use CPU processing with e.g. 12GB of RAM to get rid of the issue). The MDX HQ\\_1 (or later) model can cause errors on some 4GB VRAM laptop GPUs, at least with wrong parameters (you might want to use CPU processing instead; then min. 8GB RAM is recommended). We’re talking about the newer Batch mode (you cannot choose Chunk mode anymore in newer versions).\n\n- (no longer needed) 4GB VRAM cards should be supported out of the box with chunks set to auto (6GB may be required for longer tracks with the auto setting, or for batch processing with chunks higher than 10).\n\n(probably fixed) 4GB GPUs will sometimes force you to reopen UVR after each separation to free up VRAM, or else separation might fail (setting chunks to 10 or lower in old versions of UVR might alleviate the issue).\n\n- Instead of the old CML “mirroring”, UVR5 GUI now has a “hi-end process” for VR models, which is actual mirroring (no mirroring2; not sure about a possible automatic bypass from CML while using an ensemble of VR models), but don’t confuse it with the old “hi-end process” from the CML version, which was dedicated to 16kHz models.\n\nQ: If you run a single model with the default configuration, it completes successfully. The problem is when ensembling 2 models: it does not have enough resources to complete the process, unless using a manual ensemble. It also throws an error if the chunk size was changed, even with a single model. 
It seems there is not enough VRAM for processing the song.\n\nA: I had the same issue the other day running an ensemble of 4 models.\n\nIt turned out, as the error message showed, that the chunk size was too big...\n\nI must have changed it to `Full` by accident; when I set it back to `Auto`, it was able to process.\n\nYou can find this setting under Settings > Advanced MDX Options.\n\n- (probably fixed) For 4GB VRAM GPUs and VR 3090 models (e.g. 1,2,9\\_HP-UVR), you may need to split e.g. a 2:34 song into two parts (I recommend LosslessCut) or use the chunks option if you encounter a CUDA out-of-memory error. Splitting with LosslessCut does the chunking for you, so it won’t be necessary to set chunks in UVR in case of problems (though not in all cases on 4GB VRAM).\n\n- (fixed in the latest Roformer patch iirc) \"When choosing to save either vocals or instrumentals only, the app saves the exact opposite (if I want to save vocals only, it will save instrumental, and vice versa)\"\n\n- A value of 10 for aggressiveness in VR models is equivalent to 0.1 (10/100=0.1) in Colab\n\n- Hidden feature of the GUI:\n\n\"All the old v5 beta models that weren't part of the main package are compatible as well. Only thing is, you need to append the name of the model parameter to the end of the model name\"\n\nAlso, V4 models are compatible using this method.\n\n- The GUI also has all the 4-6 stem models from Demucs 4 implemented. For 4 stems, simply pick the \\_ft model, since it's the best for 4 stems. The Demucs-UVR 2-stem model trained on Demucs 3 gets worse results than the newer Demucs 4 ft model.\n\n- You might consider using Nvidia Studio Drivers for UVR5. Compared to the regular Game Ready drivers, they can be more stable, but are updated less often. 
You can check which type of driver you have in GeForce Experience (if you don’t know which ones you have, they’re probably Game Ready)\n\nQ: Is there a way I can remove models I already have downloaded?\n\nI want to remove all the HP models, but I don't want to delete them from the directory, I want to be able to get them back if I need them\n\nA: Check the current Download Center; if all the models you want are there, you can safely remove them and redownload them from there later. To remove them, either:\n\n1. Delete the models from the directory, or\n\n2. Move the models to a separate folder outside the directory\n\n- At least since the introduction of Batch mode (now the default and only option in 5.6/the Roformer patch), the app's stability on lower-VRAM GPUs has improved, but you can see more vocal residues when processing on a 4GB GPU than on CPU, while an 11GB GPU doesn't really have that problem.\n\nMaybe something has changed since batch mode was introduced, but in the older UVR versions and CML code, some vocal pop-ups could only be fixed with chunks set to 50-60 (11 and 16GB VRAM cards only).\n\nSome low values were still causing vocal pop-ups in chunks mode (at least before the patch).\n\nI’m not sure whether the way chunks are handled has changed since the inference code from the MSST repo was integrated for the MDX-Net menu models.\n\n- Chart showing separation times for various MDX models and different chunk settings on a desktop GTX 960 4GB - [click](https://cdn.discordapp.com/attachments/767947630403387393/1061182654621429811/image.png) (dead)\n\n- More in-depth: settings per model, SDR vs. time elapsed -||- (incl. dim\\_t and overlap evaluation for Roformers) - [click](https://docs.google.com/spreadsheets/d/1XNjAyKwA2RkyOA_agmaV_6Xp2xXOHjJV09t-ho0nngk/edit?gid=1530726921#gid=1530726921) | [conclusion](https://imgur.com/a/KBYHdNK)\n\n- Frequency cutoff + time elapsed per model, linked above (GPU vs. 
CPU) - chart by Bas Curtiz - [click](https://docs.google.com/spreadsheets/d/1R_pOURv8z9GmVkCt-x1wwApgAnplM9SHiPO_ViHWl1Q/edit#gid=23473506)\n\n- (no chunks in 5.6 anymore) On 4GB VRAM cards, you can encounter crashes with the newest instrumental and Kim vocal models while using batch processing. Lowering chunks to 10 (or preferably even lower, as it sometimes still crashes) should help\n\n\\_\\_\\_\n\n- If your disk space is not freed after separation, check in PowerShell whether Memory Compression and Page Combining are enabled by typing: Get-MMAgent. If they are not, type: Enable-MMAgent -mc ([video tutorial](https://drive.google.com/file/d/1LGVnJgivysKkXOkc2IfZAoF58f9MMJXt/view?usp=sharing))\n\nMore typical ways to get more space on C:\\\n\n- If something is suddenly eating your disk space on the system disk, check C:\\Users\\User\\AppData\\Local\\CrashDumps, because UVR can create crash dumps of even a few gigabytes. Consider turning on compression in that folder's properties.\n\nAlso, you can simply search for \\*.dmp and delete all the existing crash dumps on the C: drive.\n\n- You should have around 20GB of free space on the C: drive after UVR installation on 12GB RAM configurations to separate with top ensemble settings (it uses a lot of pagefile), and at least 10GB with 24GB RAM for long songs on 4GB VRAM cards. You can also enable the pagefile on another drive if you run out of space on the system drive (preferably an SSD as well).\n\n- Go to Safe Mode and remove your GPU drivers with DDU; it will delete all the remaining remnants of older driver versions\n\n- Delete all restore points except the newest\n\n- Clear the cache taking up the most space in your browser if you don't want to clear browser data entirely (e.g. the old-fashioned cookie cleaning).\n\nE.g. 
in Chrome you can do it here:\n\nchrome://settings/content/all?sort=data-stored\n\n- Consider using CompactGUI, which applies a better compression algorithm for system compression than the built-in context menu option. Some programs compress excellently, e.g. Office.\n\n- Use the old-fashioned disk cleanup feature from the disk's context menu in Computer, and click the next button to see more entries (cleaning up the temp folders in AppData and the Windows folder will do a similar trick)\n\n- Consider using TreeSize Free to find the biggest files and folders on your partition\n\n- Sometimes Windows Update leaves lots of unused files after updates are installed; they can be cleaned up too.\n\nThis command helped me free some disk space in the past, iirc precisely the WU cache in C:\\Windows\\SoftwareDistribution:\n\nstart %systemroot%\\system32\\rundll32.exe advapi32.dll,ProcessIdleTasks (it can take up to 15 minutes; leave the PC for some time and watch some processes suddenly use the CPU or disk and then stop; then it's done, and the space may be freed after a restart)\n\n- You can shrink the pagefile on the C: drive to min. 500MB and use another partition for the pagefile\n\n<https://mcci.com/support/guides/how-to-change-the-windows-pagefile-size/>\n\n\\_\\_\n\n- Q: When ensembling with Settings Test Mode enabled, UVR keeps all the different outputs from before ensembling in a folder. 
If you're not careful, these can quickly stack up.\n\n[Is it] Possible to have a feature where UVR automatically deletes those after ensembling?\n\nA: Disabling '\\*Save all outputs\\*' in \\*Ensemble Customization Options\\* > \\*Advanced Option Menu\\* does what you're asking for.\n\n- GPU performance per dollar in training and inference (running a model): [click](https://timdettmers.com/wp-content/uploads/2023/01/GPUs_Ada_performance_per_dollar6.png)\n\n- How to check whether a model is instrumental or vocal?\n\nQ: Are VR Arch models also grouped into instrumental/vocal models, or is it just MDX-Net models?\n\nA: The moment you see Instrumental on top (and Vocal below) in the list where GPU conversion is mentioned, you know it's an instrumental model.\n\nWhen the order is flipped, with Vocal on top, you know it's a vocal model.\n\nThe same applies to MDX and VR archs.\n\nQ: [How to] have UVR automatically delete the ensemble result folder after processing a song?\n\nA: Go to settings, ensemble options, and uncheck \"Save all outputs\".\n\n- You can perform the Manual ensemble on your own already separated files (e.g. from Colab) in UVR5 under \"Audio Tools\". Just ensure that the files are aligned (begin in the same place). Sometimes using lossy files can mess with the offset and file alignment.\n\n- Furthermore, you can use Matchering in Audio Tools, e.g. to make a muddy result without residues sound more like a clearer separation that contains residues you want to get rid of. 
Just use the file without residues as the target and the clearer separation as the reference.\n\n- If you have crashes on “saving stem”, uninstall odrive\n\n- Q: Is there an option to add the model's name to the output files? (This existed in a previous version of UVR, but now it's gone.) It was really useful when you needed to test multiple models on the same song\n\nA: It's still there under Additional Settings: \"Model Test Mode\"\n\n- Q: I want to separate audio from a video (the input is still empty when I choose a file)\n\nA: Go to General Process Settings > Accept Any Input\n\n- Q: First time trying the ensemble mode, and I used the VR Models: \"De-Echo-Aggressive, De-Echo-Normal, DeEcho-DeReverb, DeNoise\". Now the outputs confuse me. In the folder called \"Ensembled-Outputs\" there are many files, one from each of the models. Outside that directory are 2 wav files, one says Echo, the other No Echo. Isn't the ensemble mode basically a wav file that goes through each model, saving a final wav file after it has gone through all the listed models?\n\nA: The two files outside the ensemble folder are the final ensembled files.\n\nThe folder contains all the separate outputs from each model (you've enabled that in settings)\n\nQ: Those files are final after they went through all the models, right? Not just the DeEcho model.\n\nA: Yes\n\nQ: I'm just suspicious of the naming. It makes sense that the files outside the directory are the final version, but are they the result of all the models or just 1 model?\n\nA: The naming just reflects whatever stem is assigned to the models; in your case, all the models output an echo and a no echo file,\n\nso the final ensemble files will have that in the name\n\n- Q: What is this \"[band](https://cdn.discordapp.com/attachments/900904142669754399/1193600744377552987/image.png)\" that I keep seeing in the spectrogram of tracks that I've isolated with x-minus?\n\nA: MDX noise: a noise the architecture produces no matter what. 
In UVR you can use Denoise Output (Standard/Model) in Options > Advanced MDX-Net; it does exactly what is described below:\n\nYou can either use the UVR De-noise model or isolate the track twice: once normally and once with the input phase-inverted.\n\nThen take the instrumental from the normal run and the instrumental from the inverted run, re-invert the latter, and merge (sum) it with the normal instrumental. This works because the music from the inverted run lines up in phase again after re-inversion, while the noise doesn't follow the input's polarity, so after re-inversion it is opposite in phase and cancels out.\n\nThe merged result will be without noise, but 6 dB louder, so lower the gain accordingly (by 6 dB, i.e. halve it), and you'll get the same result, just without the noise. Obviously, repeat that for the vocals.\n\n- Q: voc\\_ft doesn’t have any spectrum above 17.7kHz. How can I restore it and get e.g. 48kHz or 96kHz output like the input file?\n\nA: Turn off “Match Freq Cut-off”, but be aware that it copies the remaining frequencies from the original, possibly leading to more noise.\n\n“if you want true 96 kHz you need to manually lower the rate to 44100 Hz or less since the models themselves are 44100 Hz”\n\n- VR models using a 512 window size can crash on 4GB cards, while 272 will be fine, although it will take more time\n\nQ: “I have tried everything and also googled a lot, but UVR with MDX-Net is producing this type of noise in every sample I have tried, that was not in the recording before. Anybody have an idea what can cause it?”\n\nA: “It’s just part of the architecture. Either run it through a denoise model or run it through twice with the second time the sound being phase-inverted”\n\n“Enabling Denoise Output should do the trick. I use the Denoise Model option, seems to work quite well, to my ears, at least”\n\nQ: “Is there any way to fix the UVR BVE model saying \"vocals\" on the bgv and \"instrumental\" on the lead vocal file? 
It's unbelievably annoying”\n\nA: In the model options, change the primary stem to the opposite of whatever it is currently set to ([screenshot](https://cdn.discordapp.com/attachments/875539590373572648/1204228165506039898/image.png))\n\nQ: Matchering gives errors with long files.\n\nA: The input length limit for both target and reference audio is around 14:44. A file slightly longer than that (probably a bit over 15 minutes) caused an error, so the limit may be 15 minutes.\n\nIf you check the error log, it will specify whether the reference or the target file is too long, but the limit is the same for both.\n\nQ: “Is there any way to batch-process multiple different models on the same file?\n\nA: Yeah, ensemble, turn individual outputs on [in options], you'll have the same song over and over, each with a different model name attached, all saved before the final min/max/avg mix”\n\nIf you drag and drop many files into the input field at the same time and save intermediate files in options, you don't have to manually start separation for every song.\n\nThe feature to save intermediate files is probably enabled by default in Options > Choose Advanced > Ensemble > Save all outputs\n\nIDK if \"Model test mode\" in Options > Additional is necessary for it (Settings Test Mode can be additionally enabled too, just in case something gets overwritten by accident if you change settings).\n\nSo you can simply use Ensemble to pick all the models you want to batch process and, using drag and drop, separate all the songs you want with all the models picked in the Ensemble. The intermediate files (i.e. the outputs of each model) will probably be saved intact, no matter which ensemble algorithm you use. 
You can make a manual ensemble later from the existing intermediate files with any algorithm.\n\n- \"Invalid buffer size: 17.34 GB\" (Mac ARM) when using demucs\\_ft\n\nTry uninstalling UVR and doing a clean installation.\n\nAlso consider using the latest Roformer [patch](#_6y2plb943p9v).\n\n(the 4-5 max ensemble explanation was moved to MDX settings)\n\n#### - Chunks may alter separation results\n\n(update: chunks have been replaced with batch mode, even on 4GB cards; the feature was introduced in one of the beta patches and is available in v5.6. In this version you cannot fall back to chunks experimentally if batch mode gives you vocal pop-ups compared to 11GB GPUs, which is a pretty common issue in 5.6. The old text for the pre-5.6 UVR code with chunks available follows.)\n\nE.g. a bigger chunk value is less likely to cause instruments to disappear.\n\nChunks 1 is not the same as chunks full (disabled). Also, chunks may briefly distort some vocals in the middle where a split is made. Chunks “auto” is calculated individually for your VRAM and RAM configuration (and song length), so the result will differ among users for the same song. The maximum chunks value differs between MDX models (e.g. NET 1 allows bigger values than newer Inst models with a higher training frequency). You can test the maximum chunk size supported by your computer specs until you encounter a crash (e.g. for a 5:11 song and inst main 496: chunks 15 (20 for a 30 s song) on a 4GB desktop card, 38 on a 6GB laptop card (50 for the NET 1 model), and around 50 on 11GB). The sweet spot for a 3:39 track is chunks 55 (works at least on 16GB VRAM); more than that gives worse results. Also, on some GPUs/configurations you may notice variations in very short vocal bleeding not (fully) associated with chunks, which don’t happen on e.g. x-minus or other configurations (1660 Ti vs 1080 Ti and 960); we don’t know what causes it. In this case, you can only alleviate the issue by changing chunks. 
Be aware that low maximum chunks on 4GB cards, besides more sudden vocal residues and cuts in the result, may also cause specific artefacts, e.g. beeping that doesn't occur on an 11GB card (the issue happens with the Kim vocal model).\n\n- FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\[user]\\AppData\\Local\\Programs\\Ultimate Vocal Remover\\models\\Demucs\\_Models\\model\\_data\\model\\_name\\_mapper.json'\n\n> “I wrongfully removed this lol. fixed by downloading demucs again before closing the current open window. glad I discovered it before that” - mohammedmehditber\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n##### (older) UVR & x-minus.pro updates/news (2021-2023)\n\nQ: What is the segment size/overlap for VOC FT processing for UVR BVE models on x-minus, aufr33?\n\nA: --segments\\_mdx 384\n\n--overlap\\_mdx 0.1\n\nuvr bve v1\n\n-0.2, -0.05 and 0.15\n\nAverage aggressiveness is 0.0 (for v2)\n\n- Anjok (UVR5): “I made a few more fixes to batch mode for MDX-Net before I release it publicly to GitHub later this week. This install also includes a new full band model that will be included in this week's public patch. Please let me know if you run into any bugs or issues.”\n\nLink (not needed anymore)\n\n(the model is called UVR-MDX-NET-Inst\\_HQ\\_1; it’s epoch 450, with better SDR than the 337 and 403 models, only sometimes worse than the narrowband inst3 [464])\n\n- Anjok: \"I decided to make a public beta for everyone here who wants to try the new patch with \\*\\*batch mode for MDX-Net\\*\\* before I release it publicly to GitHub next week. This install also includes a \\*\\*new full band beta model\\*\\*! 
[full\\_403] Please let me know if you run into any bugs or issues.” Patch download link\n\nIf you don't have the new model on the list, make sure you have \"Download more models\" on your list.\n\n- The beta patches are currently only for Windows (but the fb 403/450 models alone can be used in the older UVR version and work correctly; the patch itself is an exe installer that contains the model and doesn't check for an existing UVR installation)\n\nUpdate 14.02.23\n\n\"I found a bug in the MDX-NET.\n\nIf the input song contains a DC offset,\n\nthere will be a lot of noise in the output!\n\nIt has already been fixed on the XM.\n\nIt will also be fixed soon in the next UVR GUI update.\" [Examples](https://discord.com/channels/708579735583588363/900904142669754399/1075014031326330991)\n\nUpdate 11/12.02.23\n\n\"I will soon add a new setting to fine tune the Karaoke / B.V. model. This will help remove \\*\\*even wide stereo lead vocals\\*\\*.\n\n\"You can now specify the placement of the lead vocal. The percentages are approximate vocal wideness.\"\n\n[Here](https://discord.com/channels/708579735583588363/900904142669754399/1073823800161996931) is the current result. As you can hear, the lead vocals are hardly removed [in the old setting].\"\n\n\"this is super cool, if you invert the 2 results you can actually get the stereo width vocals isolated\n\n1 step closer to more than just 1 track bgvox separation\"\n\n\"Ooo that's very interesting, stereo lead vocals always get confused for background ones\"\n\nUpdate 4.02.23\n\nNew chain ensemble mode for B.V. models available on x-minus\n\n\"the chain is the best bg vox filtering I've ever heard\"\n\n\"It mixes MDX lead vocal and a little bit of instruments. The resulting mix is then processed by the UVR (b.v.) v2 model and the cleaned lead vocal is inverted with the input mix (song).\n\nUnlike min\\_mag and other methods, when using chain, processing is sequential. One model processes the result of another model. 
That's why I called it a \"chain\".\" Aufr33\n\nUpdate 31.01.23\n\n\"\\*\\*The new MDX Karokee model is ready and will be added to [x-minus.com] tomorrow!\\*\\*\" aufr33\n\nA new Demucs 4 (probably instrumental) model is in training. Edit: training was stopped due to technical issues, and full band MDX models were trained instead.\n\n\"Throwing a Demix Pro karaoke model for comparison... I think the bgv parts still sound better for this song, but demix has more noise on the lead parts\n\nDemix keeps more backing [background] (and somehow the lead vocals are also better most of the time, with fuller sound)\"\n\n\"MDX in its pure form is too aggressive and removes a lot of backing vocals. However, if we apply min\\_mag\\_k processing, the results become closer to Demix Pro.”\n\n“In the future, we will create a [b.v.] model for Demucs V4. The MDX-NET is not really well suited for such a purpose.\"\n\n**Update 24.12.22**\n\nA wind instruments model (saxophone, trumpet, etc.) was added to x-minus for premium users (since March also in UVR5).\n\n\"I tested. Maximum aggressiveness extracts the most amount of instrument, while minimum the least. The model is not bad at all, but has hiccups often (maybe it needs a much larger dataset)\"\n\nMaximum aggressiveness \"gives you more wind\".\n\n**Update 20/19.12.22**\n\nA rewrite of UVR5 GUI, version 5.5.0, was released, with lots of changes and faster processing.\n\nThe MDX 2.1 model was added as inst main (inst main 496) in UVR5 GUI.\n\n- There was some confusion about the MDX 2.1 model being vocal 438, but it’s inst main.\n\nA native macOS build is available on GitHub.\n\nVIP models are now available for free with a donation option.\n\nMore changes:\n\n\"Pre-process mode for Demucs is actually very useful. 
Basically, you can choose a solid mdx-net or VR model to do the heavy lifting in removing vocals and Demucs can get the rest with far less vocal bleed\"\n\n\"Secondary Models are a massive expansion of the old \"Demucs Model\" check button MDX-Net used to have. 
You'll want to play around with those to find what works for the tracks your processing.\"\n\nSpectral Inversion was also added, but it seems to decrease SDR slightly.\n\nAn additional cutoff was introduced for MDX models - “Just a heads up, for mdx-net, the secondary stem frequencies have the same cut-off as the primary stems now\n\nThere were complaints about lingering vocals (or instrumentals depending on the model) in the upper frequencies that was audible and very bothersome”\n\n**Update 04.12.2022**\n\n\"\\*\\*A new MDX model has been added!\\*\\*\n\nThis model uses non-standard FFT settings optimized for high temporal resolution: 2048 / 5210\n\n<https://x-minus.pro/ai?hp&test>\n\n[results are very promising]\n\nEdit (19.12): the final main model sometimes leaves more vocal leftovers.\n\n**Update 16.11.2022**\n\n\"Due to anti-Russian sanctions, I will no longer be able to receive your donations from December 9th. All available withdrawal methods are no longer available to me. I will try to solve this issue, and probably move to another country such as Kazakhstan or Uzbekistan, but it will take some time, and servers must be paid for monthly.\n\nAs a temporary solution, I will use Boosty. 
I ask everyone who is subscribed to Patreon to cancel your subscription and subscribe to Boosty: <https://boosty.to/uvr>\n\n\\*\\*Just a reminder that I'm switching from Patreon to Boosty.\\*\\*\n\nIf you want to renew your subscription but don't want to mess with Boosty, I've found an alternative for \\*European\\* users!\n\n<https://www.profee.com>\"\n\nIf you have any questions, DM aufr33 on Discord.\n\n**Update September 2022**\n\nA new VR model was added to UVR5 GUI for Patreon supporters.\n\n**Update 31.10.22**\n\nNew instrumental models were released for Patreon supporters. They are optimised for a better high end (lower FFT parameter), were trained with a smaller cutoff, and may give better results for hip-hop (and possibly other genres).\n\n~~https://www.patreon.com/uvr~~\n\nUVR-MDX-NET-Inst\\_1 is Epoch 415\n\nUVR-MDX-NET-Inst\\_2 is Epoch 418\n\nUVR-MDX-NET-Inst\\_3 is Epoch 464\n\nThe last one is the best model so far (at least of these three), although:\n\n“I like it 50/50. In some cases it does a really good job, but on others it's worse than 418.”\n\n“New models are great! I'm having a little issue on higher frequencies hanging in the vocals, but I found I can remove that by processing again”\n\n\"Anyone else still uses inst 464? I've been testing it and my conclusion is that it's a great model alongside 418\n\nthe pros of it are that it sounds fuller and doesn't have a lot of vocal residues, but it falls short when picking up some vocals, there might be occasions where it misses some bits, or you can hear some very low or very high-pitched vocals (though this is mostly fixed by using other models)\"\n\n\"I've only tested one track so far, with 468 (My usual first test; Rush - The Pass). First off, it's the cleanest vocal removal of the track yet. First model to really deal with the reverb/echo and faint residuals ... 
but also the first model to trap a ton of instrumentation in the vocal stem.\n\nFascinatingly again, the UVR Karaokee model was able to almost perfectly remove the trapped instrumentation from the vocal line, creating a much more perfect result. I don't know if the new models were trained with this in mind, but the Karaokee model has proven to be extremely effective at this. The two almost work as a necessary pair.\"\n\n(The UVR Karaoke model should be available on MVSEP and maybe also on x-minus, and of course in UVR5 GUI. It's free and public.)\n\n**September update**\n\nMDX vocal models were added for premium users only (more models available in the GUI, to be redeemed with a code). They're available online exclusively for our Discord server via this link:\n\n<https://x-minus.pro/ai?hp&test-mdx>\n\n(probably not needed or working anymore, as training is finished and the final models from this training period are already released, but I'll leave it just in case).\n\n*Edit: be aware that the models below are outdated, and the newer ones above should already outperform them.*\n\n*(outdated, as some old models got deleted from x-minus)*\n\nmdx v2 (inst) = 418 epoch (inst model)\n\nmdx v2 (voc) = 340 epoch (voc model)\n\nDescription of the new **MDX** VIP vocal models (instrumentals obtained by inversion) and instrumental models (vocal models 9.7 (NET 1) and 423 are available on MVSEP under the MDX-B option):\n\nVocal models:\n\n- beta 340 is better for vocals, while -\n\n- 390 has better quality for instrumentals, though it has more vocal residues.\n\n- \"423 is really nice for extracting vocals, but is not good for instrumentals\n\n- 427 is not good for me.\"\n\n- “In the last 438 vocals are really nice, also backing vocals. Unfortunately, we can hear more music noises, but voices are amazing” (it's good for cleaning artifacts from inverts). (no longer available, at least on x-minus).\n\n- “Beta 390 is better than 340. 
Instruments are cleaner but have more vocal disturbances.\n\n- I've tried a combination of MDX 390 - UVR min\\_mag\\_k. Not really bad at all”.\n\n- \"406 keeps most of these trumpets/saxes or other similar instruments, and ensembling with max\\_mag means it combines it with UVR instrumental which already keeps such instruments, so you get best of both worlds\".\n\nInstrumental models:\n\n- 430 or 418 are worth checking.\n\n**Update 17.11.2021** - older public (but still decent) UVR 9.6 and 9.7 vocal models for MDX are described in the \"[MDX-Net with UVR team](#_pv80l0nr97r5)\" section.\n\n*Upcoming UVR5 updates (outdated)*\n\nSince training of the September MDX models is complete, some older beta models might not be available anymore.\n\nAs of mid-September, a new VR model was in training, but it was cancelled due to not \"conclusive\" results. A new VR model was released later.\n\n\"these models will be next:\n\n1. Saxophone model for UVR.\n\n2. \"Karokee\" model for MDX v2.\"\n\nAlso, a completely rewritten UVR5 GUI version.\n\nAmong many new features: a new denoiser for MDX models and new Demucs 4 models (SDR 9).\n\n##### ***Online sites and Colabs for separation - the best quality freebies you can currently get***\n\n*Refrain from using lossy audio files for separation (e.g. downloaded from YouTube) for the best results.*\n\n*See* [*here*](#_ataywcoviqx0) *for ripping lossless music from Tidal, Qobuz, Deezer, Apple Music or Amazon.*\n\nIf you don't have a computer or a decent CPU/GPU, if separation is too slow on your machine using UVR 5 GUI, or if it doesn’t work correctly for you, you can use these online sites to separate for free:\n\n[mvsep.com](http://mvsep.com) (lots of the best UVR 5 GUI models incl. 
various Roformers, some exclusive models not available in UVR, and ensembles of these for paid users)\n\nThe site is made by ZFTurbo, the original author of the MDX23 code/Colab and a model creator.\n\n~~If you register an account on MVSep, you can output in .flac and .wav 32-bit float.~~\n\nSince 28.07.25, 32-bit float WAV is used only if the gain level falls outside the 1.0 range. Otherwise, 16-bit PCM is used.\n\nAlso, FLAC now uses 16-bit instead of 24-bit.\n\nIf you have trouble nulling due to the new changes in the free version, consider decreasing the volume of your mixtures by e.g. 5dB so you won’t be affected, although it might slightly affect separation results.\n\nIf your credits are higher than 0, you have a shorter queue (also users using the mobile app have “a bit higher priority”), 100 MB max file size/10 minutes (up to 10 concurrent separations in premium and 1GB/100MB in premium). You can disable using credits for non-ensembles in settings, at the cost of a longer queue again. Queues currently seem shorter around mornings of GMT+2/CEST (9 a.m.), or even early afternoon or late at night, but it varies. Sometimes the queue randomly gets very long, but if you don’t mind waiting, you can just queue your jobs and download the results the next day. Weekends usually have longer queues.\n\nIn February 2026 MVSEP introduced a limit of 50 separations per day for free users.\nThere were cases where certain free users were clogging the queue substantially.\n\n“There won't be reset [time], it will just count all separations in the last 24 hours.”\n\nSelecting “Extract from vocals part” uses the best BS-Roformer models as a preprocessor for the chosen model (currently BS-Roformer ver. 2024.08 - subject to change).\nFor Mel Band Kar, dim\\_t 801 and overlap 2 are used, and for Mel Becruily inst/voc, Mel 2024.10 and Mel Rofo Decrowd: 1101 and 2.\n\nIf downloading from the site is too slow try out e.g. 
Free Download Manager (Win) or ADM (Android) and/or a VPN. If you have premium, you can also use your credits to pack the separation results into a ZIP once the separation is done.\n\nFor some people it also helps to close all the MVSEP tabs once the download has started.\n\nAs a last resort, use mirror.mvsep.com\n\nBatch processing with [API](https://github.com/ZFTurbo/MVSep-API-Examples) and [GUI](https://github.com/ZFTurbo/MVSep-API-Examples/tree/main/python_example4_gui) - [click](https://discord.com/channels/708579735583588363/911050124661227542/1335760269598523433) ([Mac](https://github.com/septcoco/macvsep/)). You can use MVSEP download links as a remote URL to further separate the result (e.g. MVSEP Drums>drumsep for more stems).\n\nQ: Is there a way to turn off the normalization when using FLAC?\n\nIt's annoying when you have to combine the outputs later\n\nA: “No, if you turn off normalization, FLAC will cut all above 1.0\n\nAnd if it was normalized, it means you had these values.”\n\nFLAC doesn’t support 32-bit float (only 32-bit integer), so normalization is still needed.\n\nSo if your stems don’t invert correctly, just use WAV.\n\nQ: How is multichannel audio handled by MVSEP?\nA: A [librosa script](https://discord.com/channels/708579735583588363/911050124661227542/1143136052547178516) performs stereo downmixing (for 5.1 or 7.1 inputs).\n\nQ: I convert a song using v1e+ and use phase fix, then do another conversion using for example Gabox V7 and use phase fix, if I go back to upload the same song using v1e+ it gives the stems instantly but if I use phase fix it will process again, in the past it would remember\n\nA: This may be a temporary issue. 
Sometimes that server may be unavailable, and then processing will start on another server.\n\nQ: What’s “include results from independent models”?\n\nA: “When you use an ensemble, you will also get results from each model of the ensemble and not only the ensemble final result.”\n\nQ: What does the “Disown Expired Separations” option mean?\n\nA: “we do not delete expired separation data (they are needed for analytics), but just remove your ownership from expired separations\n\nWe could have written delete expired separations, but wanted to be more clear about your data”\n\nQ: “So I understand, all the uploads are kept, regardless of 'disowning' or not. So what is the distinction between disowning and not disowning? Is there one?”\n\nA: No uploads are kept, just settings. If you disown, you won't see your expired separations.\n\nQ: I will need a refresher in terms. Separations are created from (audio) uploads. Separations are also not kept? Only the settings used, i.e. kuielab\\_a\\_drums, aufr33-jarredou\\_DrumSep\\_model\\_mdx23c\\_ep\\_141\\_sdr\\_10.8059, and whatever segment, aggression, vocal only, etc are selected at the point of hitting 'do it'.. in a manner of speaking..\n\nA: A separation is when you choose settings and upload a file. We just save the settings and delete the file.\n\nQ: How can I use the same file over and over to test different models, without re-uploading it each time?\n\nA: “You can use remote upload for this. Just use link on file from previous separation. So you will not need to upload anything. <https://mvsep.com/remote>”\n\n[x-minus.pro](http://x-minus.pro) / [uvronline.app](https://uvronline.app) (-||-, 10-minute daily limit for free, very fast, mp3 192kbps output for free (lossless for premium), some exclusive models for paid users, Roformers will be back for free around 31 December 2024)\n\nThe site is made by Aufr33, one of the UVR creators and a model creator, with dedicated\n\nOverlap 2 is used for Roformers. 
At the standard subscription level and above, the song limit for Roformers is 20 minutes. For the Mel Karaoke model, dim\\_t 256 and overlap 2 are used.\n\n“Mel-RoFormer by Kim & unwa ft3 and some other models are hidden. As before, you can find them here: <https://uvronline.app/ai?hp&test>” - Aufr33 ([link](https://uvronline.app/ai?discordtest) for free users)\n\nThe model used for the phase fixer/swapper/correction on the site is Mel-Roformer Becruily Vocal.\n\n\\_\\_\\_\n\nAlternatively, you can use Google Colab notebooks for free (with time limits). These are virtual runtime environments configured to use specific architectures and models or ensembles\n\n(see dedicated sections in the document outline for more information on specific Colabs).\n\nIf downloading from the site is too slow, go to settings and turn on “Use CDN…” or try out e.g. Free Download Manager and/or a VPN.\n\n[Makidanyee Colab](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY?usp=drive_link) - an extended fork of jarredou's inference Colab. 
It has a few cool features like adding model names and parameters, queuing separations with multiple models, manual ensembles of existing separations, newer models (including FNO and HyperACE support), zip/unzip of separations and more.\n\n[Phase fixer Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) by santilli\\_/Michael (fixed) - swaps the phase from a vocal model result into an instrumental model result to alleviate noise/vocal buzzing. It also separates on its own ([OG](https://colab.research.google.com/github/lucassantillifuck2fa/Music-Source-Separation-Training/blob/main/Phase_Fixer.ipynb), [older outdated](https://colab.research.google.com/drive/13qM3HaQB6nh-OzCEH5RBzTVzTb4829oY?usp=sharing), [newer outdated](https://colab.research.google.com/drive/1uDXiZAHYk7dQajOLtaq8QmYXL1VtybM2))\n\n[Music Source Separation Colab by jarredou](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb)\n\nSingle-model inference - new MDX23C Drumsep model, most if not all Roformers (now also with 1053, and unwa/Gabo models), plus VitLarge, Bandit model support for SFX and MelBand Decrowd, experimental SCNet, and both MDX23C and Mel-Roformer dereverb.\nBased on the ZFTurbo inference repo, with optimized dependencies. To avoid issues with the constantly changing repo, it uses a frozen commit from the pre-refactoring period as a base.\n\nNo BigShifts here, and the overlap issues with Roformers known from UVR are fixed.\n\nDetailed instructions on how to use the Colab:\n<https://rentry.org/msst-colab>\n\nUse drumsep on drums already separated from a good-sounding instrumental.\n\n4-stem models work best on an instrumental already well separated with a single model.\n\nQ: What is the TTA option?\n\nA: “It means \"test time augmentation\", with ZFTurbo's script, it will do 3 passes on 
the audio file instead of 1. 1 pass will be with the original audio. 1 will be with inverted stereo (L becomes R, R becomes L). 1 will be with the phase inverted, and then the results are averaged for the final output. It gives a little better SDR score, but hard to tell if it's really audible in most cases”\n\n“overlap: This helps improve separation quality slightly. Higher overlap might give better results at the cost of slower processing speeds.” I'll go for 8. For instrumentals, overlap higher than\n\n“chunk\\_size: Just leave it default unless the model uses higher chunk\\_size in yaml (the Colab overrides the parameter).”\n\nSometimes the Colab fails to detect files uploaded to GDrive after the drive was mounted. In that case, open the file manager in the Colab and browse to your input folder, so your file appears and processing can start. Sometimes adding a new code cell with:\ndrive.mount(\"/content/drive\", force\\_remount=True)\n\nwill be necessary (it forces remounting).\n\n*Colab instructions for newbies*\n\n0. If you plan to use your GDrive for input files, go there now, create a folder called “input” and upload your files there. Also create an \"output\" folder in the root GDrive directory (not sure if the Colab already creates both). That way you reduce the risk of a timeout once the Colab is initialized (esp. for people with slower connections), and you also avoid an occasional bug where files uploaded to GDrive appear in the Colab with some delay.\n\nThe Colab is case-sensitive, e.g. call your folder \"input\", not \"Input\", to match what is written in the Colab.\n\n1. Now open the Colab link in your browser.\n\n2. Click the “play” button on the \"GDrive connection\" cell. Grant all the privileges (otherwise there will be an error).\n\nDon't use any account other than the one you're logged into in the top-right corner (otherwise it will error out).\n\n3. 
Click the “play” button on the \"Install\" cell, and wait patiently until it's finished (a green checkmark should appear beside it afterwards). Be aware that occasionally it can take longer than usual.\n\n4. Now pick your model in the \"Separation\" cell.\n\n5. Click the “play” button on the \"Separation\" cell.\n\nDon't provide any filenames in the input\\_folder path there. It will batch process all the files inside the input folder.\n\nDefault settings are already balanced in terms of SDR and not too resource-intensive (increasing overlap might muddy instrumentals a bit, and iirc 8 might keep a bit more information on the spectrogram in vocals).\n\nTTA increases SDR a bit. I'd leave it turned on, although it processes each file 3 times.\n\nChunk\\_size should be left at default, as it's the value used by most models, but iirc beta 6 uses higher chunks. Refer to the model's yaml, as the Colab will override the yaml setting.\n\n6. When it's done, the stems will be saved in the output folder.\n\n7. Before closing, go to Environment and delete the environment manually, so you won't exceed your free Colab credits (so you’ll be able to use it e.g. 
next day).\n\nYou should be able to use the Colab for 3.5h+ per day (I think 4h in at least not one single separation job started).\n\nIf your GPU gets disconnected, change the Google account in the top-right corner of the Colab and use that same account to mount GDrive.\n\n[Colab of 6 stem undef13 splifft](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Colab_Inference_BSRofo_SW_fp16.ipynb) - SW BS-Roformer model, but in FP16 (almost identical metrics to the above, but faster)\n\n[Custom Model Import](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29_CustomModel.ipynb) version of the inference Colab by jarredou.\nYou can use it if we don’t add a new model to the main Colab in time, or to test your own models.\n\nJust make sure you haven't downloaded “the webpage presenting the model instead of the model itself.”\n\nE.g. for yamls from GH, use:\nhttps://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/config\\_vocals\\_mdx23c.yaml\n\nInstead of:\nhttps://github.com/ZFTurbo/Music-Source-Separation-Training/main/configs/config\\_vocals\\_mdx23c.yaml\nAnd for HF, follow the pattern presented in the Colab example (i.e. with resolve in the address).\n“If you don't delete the failed yaml/ckpts downloads you've made before [e.g. wrong link pasted], the Colab will continue to use them.” So delete the files manually in the file manager, or restart the environment if you still get errors.\n\n[UVR5 UI HuggingFace](https://huggingface.co/spaces/TheStinger/UVR5_UI) ([mirror](https://huggingface.co/spaces/qtzmusic/UVR5_UI)) maintained by NotEddy and hosted by their friend - running on Zero GPU (A100 cluster), it has most models from the inference Colab. Might be faster.\n\n(HF has a quota of ~12 min of usage every 2 hours, and it doesn’t have TTA). 
Some [advice](https://discord.com/channels/708579735583588363/708579735583588366/1263877979696664669) to make it work on PC.\n\n[SESA Colab by yusuf v3](https://colab.research.google.com/drive/1CH2JWd6YculmKSug9zpzuxvM6mSYysdB?usp=sharing) - WebUI for the same ZFTurbo inference code (available models might differ)\n\n[Multi-arch Colab by Not Eddy](https://colab.research.google.com/github/Eddycrack864/UVR5-NO-UI/blob/main/UVR5_NO_UI.ipynb)\n\nArchs: MDX-Net, MDX23C, Roformers (incl. 1053), Demucs, and all VR models (incl. e.g. de-echo, not supported in VR HV Colab) with YouTube support and batch separation.\n\nIf you encounter increased separation time (like 5 hours) using some high parameters for MDX-Net models (e.g. 512 segment size and 0.95 overlap), use another Google account. You may have reached the free daily limit.\n\nPlus, be aware that the Colab uses the broken Roformer overlap from the OG beta UVR core, so the same fix for the issue applies:\n\nDon't set overlap higher than 10 for 1101 segments, or higher than 8 for 801. Best SDR is with dim\\_t=1101 and overlap 2.\n\n[Not Eddy’s multi-arch Colab](https://colab.research.google.com/github/Eddycrack864/UVR5-UI/blob/main/UVR_UI.ipynb) in the form of a UI (like e.g. KaraFan)\n\nIn case of “FileNotFoundError: [Errno 2]” or ERROR - mdxc\\_separator, try a location other than “input” or another Google account (both help for both errors).\n\nOr use the Colab below for Roformers instead:\n\n[MVSEP MDX23 jarredou fork Colab v.2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb) (2-4 stems)\n\nIt has an adjustable ensemble of BS-Roformer Viperx, Kim Mel-Roformer, UVR MDX-Net HQ\\_4, MDX23C HQ 1, VitLarge and voc\\_ft, and optional 4-stem output using an ensemble of various 4-stem Demucs models. 
Original 1.0 code made by ZFTurbo (MVSEP).\n\nConsider using an instrumental already well separated with the above Colabs as input.\n\nYou can adjust the weights there to get more of a specific model in the final result.\n\nDefault settings can be a good start.\n\nSometimes you might want to disable VitLarge.\n\nAlso, some people like to increase BigShifts to 20 or even 30 with all other settings at default (some songs might be less muddy that way). However, the default 3 is already a balanced value, and going beyond 5 or 7 may not give a noticeable difference while severely increasing separation time over default settings. [Read](#_jmb1yj7x3kj7) for more.\n\n[KaraFan by Captain FLAM](https://colab.research.google.com/github/Eddycrack864/KaraFan/blob/master/KaraFan_Improved_Version.ipynb)\n\nIt currently allows using all notable UVR instrumental and vocal models except BS-Roformer, also in ensembles (with suggested ensemble presets - start with P5 for instrumentals and P4 for vocals). It applies further tweaks and tricks aimed at the best-sounding instrumentals and vocals, focusing on the overall sound rather than on SDR alone. 
It usually leaves more vocal residues than instrumental Roformers.\n\n[MDX-Net Colab by HV](https://colab.research.google.com/drive/1GwMEjhczFzdS0Ld7eZzMcZgEmz6Jgv6m)\n\nAll older notable UVR-MDX models are in this fork, including HQ\\_5 (don't confuse it with the MDX23C arch) - very fast once the Colab initializes, but more vocal residues than instrumental Roformers.\n\n[MDX-Net alternative kae Colab](https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb) (a fork of an earlier HV Colab version, not sure if it still works)\n\nCompared to the above, it has the old min/avg/max mag mixing algorithms and an optional Demucs 2 ensemble for vocal models only.\n\n[VR HV Colab](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl) (even older archs with even more residues, no de-echo model - it’s included in Not Eddy’s HF/Colabs above)\n\n[Demucs 4 - for 4 stems](https://colab.research.google.com/drive/117SWWC0k9N2MBj7biagHjkRZpmd_ozu1) (lower SDR than MDX23 Colab)\n\nYou might want to use an instrumental already well separated with the methods above as input here.\n\n[Batch separation for Demucs](https://colab.research.google.com/drive/1KTkiBI21-07JTYcTdhlj_muSh_p7dP1d?usp=sharing) by jarredou (less friendly GUI, but should be usable too)\n\nOlder [Drumsep](https://colab.research.google.com/drive/1wws3Qm3I1HfMr-3gAyW6lYzUHXG_kuyz) by Imagoy (newer model above in Music Source Separation Colab) - kick, snare, hi-hat, toms (based on Demucs v3) - also use it on drums already separated from a good-sounding instrumental\n\n[LarsNet](https://colab.research.google.com/github/jarredou/larsnet-colab/blob/main/LarsNet_Colab.ipynb) - kick, snare, hi-hats, toms and also cymbals separation (can be worse than the older Imagoy model based on Demucs at times, but has more stems)\n\n[UVR on hugging\\_space](https://huggingface.co/spaces/r3gm/Ultimate-Vocal-Remover-WebUI) (incorporates VR de-echo not available in HV Colab,\n\nit’s slower than Multi-arch Colab 
above)\n\nBandit Plus and jazzpear's Mel-Roformer for SFX separation - Colab by joowon\n\n<https://colab.research.google.com/drive/1efoJFKeRNOulk6F4rKXkjg63RBUm0AnJ>\n\n[ByteDance-USS](#_4svuy3bzvi1t) (SFX separation based on an audio sample, March 2024 update)\n\n<https://colab.research.google.com/drive/1f2qUITs5RR6Fr3MKfQeYaaj9ciTz93B2>\n\nColab by jazzpear94\n\n[MedleyVox Colab](https://colab.research.google.com/drive/10x8mkZmpqiu-oKAd8oBv_GSnZNKfa8r2?usp=sharing) by Cyrus (can be used on MVSEP too)\n\nwith chunking introduced\n\nUse already separated vocals as input (e.g. by [these](#_n8ac32fhltgg) models).\n\nColabs for upscalers (AudioSR, FlashSR, Apollo and more) - [here](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07)\n\n*Other Colabs*\n\nMVSEP-MDX23 v2\n\n<https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/main/MVSep-MDX23-Colab.ipynb>\n\nMVSEP-MDX23 v2.1\n\n<https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb>\n\nMVSEP-MDX23 v2.2\n\n<https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb>\n\nMVSEP-MDX23 v2.3\n\n<https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb>\n\nNot sure if all of these older versions have the following fix for slow separations:\n\n!python -m pip -q install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/onnxruntime-cuda-12/pypi/simple/\n\njazzpear's soon-to-be 17-stem separation Colab (probably doesn’t work anymore)\n\n<https://colab.research.google.com/drive/1jrw-cAi-JqZpBi6wyT3YIp3x-XHhDm1W>\n\nSimilarity Extractor\n\n<https://colab.research.google.com/drive/1WP5IjduTcc-RRsvfaFFIhnZadRhw-8ig>\n\nBut Audacity's center extraction, which can also be used online, works better:\n\n[wavacity.com](http://wavacity.com)\n\nThere was also an MDX23C model 
released for the same purpose:\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2417116936>\n\n*Useful repositories*\n\nPython command-line fork of UVR 5 with support for current models\n\n<https://github.com/karaokenerds/python-audio-separator>\n\n(it used to have the same broken Roformer overlaps as UVR)\n\nOG repo on which jarredou’s single-model separation Colab is based\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\n(can be used locally for both inference [separation] and training)\n\nIt also has other GUIs:\n\n<https://github.com/AliceNavigator/Music-Source-Separation-Training-GUI>\n\nIf the GUI only uses the CPU, this might be fixed by changing line 149 to\n\ndevice = 'cuda'\n\n<https://github.com/AliceNavigator/Music-Source-Separation-Training-GUI/blob/66ada053a623a20865cac7b9d26a02615204d178/inference.py#L148> ~frazer\n\nWebUI:\n\n<https://github.com/SUC-DriverOld/MSST-WebUI>\n\nOther GUI for UVR\n\n<https://github.com/TheStingerX/Ilaria-UVR>\n\nGood paid sites:\n\n- [dango.ai](http://dango.ai) (expensive, among the best results for instrumentals)\n\n- [moises.ai](http://moises.ai) (probably in-house BS-Roformer models)\n\n- [studio.gaudiolab.io](http://studio.gaudiolab.io) (a.k.a. GSEP, still good for specific cases)\n\n- [Music AI](https://music.ai/) - better results than those on Moises (same team). $25 per month or pay as you go, pricing chart, no free trial, good [selection](https://cdn.discordapp.com/attachments/708579735583588366/1206684280625963018/image.png) of models and an interesting [module stacking](https://cdn.discordapp.com/attachments/708579735583588366/1206353306767728752/image.png) feature. To upload files instead of using URLs, “you make the workflow, and you start a job from the main page using that custom workflow” by [~ D I O ~].\n\n“Bass was a fair bit better than Demucs HT, Drums about the same. Guitars were very good though. Vocal was almost the same as my cleaned up work. (...) 
I'd say a little clearer than mvsep 4 ensemble. It seems to get the instrument bleed out quite well, (...) An engineer I've worked with demixed to almost the same results, it took me a few hours and achieve it [in] 39 seconds” by Sam Hocking\n\n- [Audioshake](#_tc4az79fufkn)\n\n- [Myxt](https://myxt.com/) - 3-stem model. Unfortunately, it has/had WAVs with a 16kHz cutoff, which Audioshake normally doesn't have. No other stems. Results maybe slightly better than Demucs. Might be good for vocals.\n\n*Thanks to Mr. 𝐂𝐑𝐔𝐒𝐓𝐘 ᶜʳᵃᵇ for gathering lots of the links.*\n\nMore online site descriptions\n\n<https://mvsep.com/> (FLAC, WAV 24 bit/32 bit for MDX instrumentals and Demucs, Roformers, 100MB per file limit, MP3 320kbps available, 512 window size for VR models (all UVR 5 GUI models including WiP piano [it's better than Spleeter, worse than GSEP]), w/o HQ\\_4, big choice of various architectures and models)\n\nGood instrumental models: MDX23C 16.66, MDX B>HQ\\_3, BS-Roformer 17.55\n\nGood vocal models: MDX B>voc\\_ft, MDX23C 16.66 (more bleeding, better quality)\n\nEnsemble for paid users (instrumentals have fewer residues than the 2.4 Jarredou Colab, but are muddier)\n\n(old) In **Demucs 3-UVR** instrumental models, model 1 is less aggressive, model 2 is more destructive (sometimes it's the opposite, though), and the “bag” leaks even more,\n\nalso, regular 4 stem model B - mdx\\_extra from Demucs 3 and also HT Demucs 4 (better ft model). For UVR-MDX models choose MDX model B, and the new field will appear. 
Biggest queue in the evenings till around 10 PM CEST, close to none around 15:00 (working days).\n\n<https://x-minus.pro/ai> (10 minutes daily limit for free users - the limit can be exceeded by about 1 minute on the last song)\n\nCurrently, more models are available for free (like MDX and Roformer models), but some more resource-hungry methods like drums max mag are behind a paywall.\n\nGood methods:\n\nModels ensembled - available only for premium users:\n\n- demudder (used on Mel-Roformer)\n\n- Mel-Roformer + MDX23C\n\n- drums ensemble max\\_mag with Roformer\n\n(old) Previously, for free users only one UVR model without parameters was available for the \"lo-fi\" option (unreleased model, mp3, 17kHz cutoff), plus Demucs 3 (2-stem) (or 6 stems?) for registered users (site by one of the authors) and Demucs 4 (4-stem) for premium users (and its better htdemucs\\_ft model for songs shorter than 5 minutes [better equivalent of the previous demucs\\_extra model, which wasn't quantized], and 7-8 minutes in the future) (not sure if it also got replaced by the 6s model for premium users as well).\n\nBesides WAV, paid users get an exclusive unreleased VR model when aggressiveness is set to minimum.\n\nAs the site is developing dynamically, some info above can be outdated.\n\n##### MSST / MSST-GUI by ZFTurbo\n\nRepository of MVSEP creator and model trainer: <https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\nIt can be used for both training and inference (separation using models).\n\nMSST-[GUI](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md) by Bas Curtiz (the default list of models can get outdated, but you can provide file paths manually there too); other GUIs are linked at the bottom. 
The GUI is screen-reader compatible, unlike UVR 5.6.1.\n\nUnlike UVR, MSST doesn't support the MDX-Net v2 and VR archs.\n\nDon’t install Python from the MS Store, or you might get a “1920” error about privileges.\n\nPython newer than 3.12 might not work correctly.\n\nWorks on CPU or NVIDIA GPUs - by default it uses the GPU if it’s properly configured (in case of GPU acceleration issues, some commands for fixing it are listed below).\n\nIn some specific cases MSST can separate up to 3 times faster than UVR on some GPU archs with e.g. 8GB GPUs (but on some 4GB ones it might be even slower).\nIt was also tested on Linux with ROCm 6 and 7 on AMD GPUs (instructions below; it might work with WSL on Windows, which uses Ubuntu; iirc only one RX series is supported natively on Windows).\nAt least BS and Mel Roformers work on Mac ARM (M1-X) - maybe with GPU acceleration too (the code has some MPS references at least - otherwise check [this](https://github.com/axeldelafosse/BS-RoFormer/) too, for Mel [read](https://github.com/axeldelafosse/BS-RoFormer/?tab=readme-ov-file#usage-1) later). No DirectML support for now (unless someone has found a way since).\n\n[Sucial’s WebUI](https://huggingface.co/Sucial/MSST-WebUI/tree/main/1.7.0) (if you have a hard time setting up your local Python environment with the above, you could try out the portable installation of this fork, but I cannot guarantee compatibility with all the latest models - the 1.7.0 code dates from April 2025)\n\n\\*. For errors while installing the py file for e.g. the HyperACE model in Sucial’s WebUI:\n\n*“from models.bs\\_roformer.attend import Attend*\n\n*ModuleNotFoundError: No module named 'models'\"*\n\n>“SUC-DriverOld/MSST-WebUI uses the name \"*modules*\" and ZFTurbo/Music-Source-Separation-Training use the name \"*models*\". And Unwa's bs\\_roformer.py that you replace with, also use \"*models*\".” - fjordfish\n\n“In order to make it work in Sucial MSST Web UI you have to edit line [e.g.] 
#8 in the bsroformer.py file that is included with [e.g.] SiameseRoformer model and change the word \"models\" to \"modules\". ” - rage313\\_\n\n1. If you get a dequantization error, e.g. with the crowd model on a 1-hour mp3 file, use this repo instead:\n<https://github.com/jarredou/Music-Source-Separation-Training/tree/colab-inference>\n\n“It’s the one used in Colabs” - jarredou. Sometimes MSST updates break things, while that branch uses an older, tested commit.\n\n2. “For some reason when I use unwa models it just like gives back a really quiet resampled version of whatever I put in.”\n\nA: “Check that you are using an up-to-date version of the repo, IIRC, an edit made some months ago to Roformers code was creating weird issues similar to this for some people and was removed later”\n\nQ: “Works now”\n\nHint: Installing only the requirements needed for inference might be faster, as in the Installation cell of [this](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_%28Colab_Inference%29.ipynb) Colab.\n\n3. For a state\\_dict error with an existing MSST installation, update MSST to the latest repo version with:\n\n!rm -rf /content/Music-Source-Separation-Training\n\n!git clone https://github.com/ZFTurbo/Music-Source-Separation-Training\n\n“and you must reinstall the main branch's requirement.txt. (before it, edit requirements.txt to remove wxpython)” - Essid\n\nwxpython is used for the GUI.\n\n- Officially [supported](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) AMD consumer GPUs for ROCm on Linux are:\nRX 7900 XTX, RX 7900 XT, RX 7900 GRE and AMD Radeon VII on Ubuntu 24.04 LTS using PyTorch 2.6 for ROCm 6.3.3, but the RX 9070, RX 9070 XT and RX 6700 XT should also be possible to get working, and probably the RX 5700 XT with an older ROCm version.\n\n- For RX 7900 XTX “No special editing of the code was necessary. 
All we had to do was install a ROCm-compatible version of the OS, install the AMD driver, create a venv, and install ROCm-compatible PyTorch, Torchaudio, and other dependencies on it.” - unwa\n\n- ROCm 7, officially supporting the Instinct MI350 (CDNA 4), was released, providing up to 3.8x performance gains over 6.0 ([more](https://www.techpowerup.com/341074/amd-launches-rocm-7-0-up-to-3-8x-performance-uplift-over-rocm-6-0)). You can try your luck with it on other GPUs with e.g.:\n\n*pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0*\n\n- Since then, ROCm 6.4.4, which allows using PyTorch natively on Linux and Windows on RX 7000 and 9000, was also released ([more](https://www.techpowerup.com/341329/amd-enables-pytorch-on-radeon-rx-7000-9000-gpus-with-windows-and-linux-preview)), but one of our users had build errors with MSST ([fix](https://discord.com/channels/708579735583588363/708595418400817162/1449395523449520209)), and a separate [instruction](https://discord.com/channels/708579735583588363/708595418400817162/1449671635350192280) was written for MSST-WebUI. Alternatively, you could consider using WSL on Windows (it’s Ubuntu with direct GPU access on Windows with near-zero performance overhead, and even GUI support in newer WSL/Windows versions), then follow e.g. the below on WSL:\n\n- “I managed to make [MSST-WebUI](https://github.com/SUC-DriverOld/MSST-WebUI) work [on Linux] with:\n\n*Torch 2.10.0.dev20251110+rocm7.0*\n\non RX 7600\n\n(...) 
it seems like ROCm 7.0 is about a second faster [than 6.x]”\n(probably by just adding pip install before it)\n\nturns out that if you do:\n\n*export TORCH\\_ROCM\\_AOTRITON\\_ENABLE\\_EXPERIMENTAL=1*\n\nit uses waay less VRAM and processes even faster\n\ninst\\_V1e\\_plus batch\\_size=2 overlap=3 chunk\\_size=485100, 51.78s/it [3:50 of audio in 61 seconds]\n\nFor ROCm 6.x (a tad slower, might work on more GPUs):\n\n*torch 2.9.0+rocm6.3 torchvision 0.24.0+rocm6.3 [--index-url https://download.pytorch.org/whl/rocm6.3]*\n\nThanks, fr4z49.\n\nOfficial support for PyTorch on RX 400/500 (a.k.a. Polaris/GCN 4/GFX803) GPUs was dropped, but you can follow [this](https://github.com/robertrosenbusch/gfx803_rocm) Ubuntu guide for unofficial ROCm 6 support (it might even work from Windows using WSL with almost no GPU performance overhead).\n\nOr for ROCm 5, read [this](https://github.com/nikos230/Run-Pytorch-with-AMD-Radeon-GPU) Ubuntu guide.\n\nAlso, there seems to be an Arch Linux community package for installing PyTorch still compatible with these GPUs ([click](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux)).\n\nThese GPUs might also work with some other specific ROCm versions, e.g. 5.7.2, by setting:\n\n*export ROC\\_ENABLE\\_PRE\\_VEGA=1* (deprecated in ROCm 6; might help with missing dependencies or wheel building issues). Also check out this:\n\n<https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/10435#issuecomment-1555399844>, or alternatively follow the instructions below:\n\n<https://pytorch.org/get-started/locally/> and then execute:\n\n*pip3 install torch torchvision torchaudio --index-url* [*https://download.pytorch.org/whl/rocm5.4.2*](https://download.pytorch.org/whl/rocm5.4.2)\n\n4. 
If you get a SageAttention error, you need a GPU arch such as: RTX 5000, 4000, 3000, H100, H200 (Blackwell, Ada Lovelace, Ampere, Hopper), which will probably work out of the box. Otherwise, it will probably fall back to CPU. To fix it, make sure you have installed versions of CUDA/Torch/Torchvision/Torchaudio compatible with your GPU\n\n(it will probably work down to Maxwell GPUs; not sure about Kepler):\n\n“For the [GTX] 1660 the minimum [CUDA] version is 10” (on GTX 1060, Torch 2.5.1+cu121 can be used), but pip doesn’t find such a Torch package by default (and installing it usually fixes issues where only the CPU is used on those GPUs).\n\nCheck out the index-url method:\n\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\nand\n\npip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\nReplacing cu118 with the newer cu121 seems to give a proper working URL too.\n\nMaybe replacing 2.3.0 with 2.3.1 will work too. 
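If you are unsure whether the installed wheel is a CUDA (or ROCm) build at all, a quick check can help before retrying the pip commands above. This is a minimal sketch, not part of MSST or UVR; the helper name is made up for illustration, while torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_capability() are standard PyTorch calls. A CPU-only wheel reports no CUDA version, which usually explains separation silently running on the CPU, and the compute capability can be compared against the archs listed above.

```python
import importlib.util


def describe_torch_setup():
    # Summarise which PyTorch build is installed and which GPUs it can see.
    if importlib.util.find_spec('torch') is None:
        return {'installed': False}

    import torch

    info = {
        'installed': True,
        'torch_version': torch.__version__,
        # None for CPU-only and ROCm wheels, e.g. '11.8' for a +cu118 build
        'cuda_build': torch.version.cuda,
        # Set on ROCm (AMD) builds, None otherwise
        'rocm_build': getattr(torch.version, 'hip', None),
        # True if a usable NVIDIA (or ROCm) GPU was found at runtime
        'gpu_available': torch.cuda.is_available(),
        'devices': [],
    }
    if info['gpu_available']:
        for index in range(torch.cuda.device_count()):
            info['devices'].append({
                'name': torch.cuda.get_device_name(index),
                # (major, minor) compute capability, e.g. (8, 6) on RTX 3000 cards
                'capability': torch.cuda.get_device_capability(index),
            })
    return info


if __name__ == '__main__':
    for key, value in describe_torch_setup().items():
        print(key, value)
```

Run it with the same Python interpreter that MSST uses; if cuda_build is None on an NVIDIA machine, reinstall Torch with one of the index URLs above.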
cu126 is the latest CUDA build supported on the GTX 1080, while cu128 isn’t (but cu128 supports the RTX 5000 series); cu126 works with torch 2.7.0, which can cause unpickling errors with some models:\n\"pip install torch==2.7.0 torchvision --upgrade --index-url https://download.pytorch.org/whl/cu126\".\n\nJFYI, the official PyTorch page: <https://pytorch.org/get-started/previous-versions/>\n\nlacks links for CUDA 10-compatible versions for older GPUs other than v1.12.1 (which is pretty old and might be a bit slower, if compatible at all), so the only way to install newer versions for CUDA 10 is the --extra-index-url trick, as normally executing “pip install torch==2.3.0+cu118” will end with a version-not-found error.\n\nAlternatively, you can also try installing it from the wheels [here](https://download.pytorch.org/whl/torch_stable.html) with the following command:\n\n“pip install SomePackage-1.0-py2.py3-none-any.whl” - providing the full path with the file name should do the trick. For paths with spaces, you also need \" \" around them. On GTX 1660 and Turing GPUs, you might look for e.g. \"cu121/torch-2.3.1\" and its various cp wheels (there are no newer versions). But the --extra-index-url trick above should be enough.\n\n4.1. After performing all of these, you might still get a SageAttention not found error on GPUs up to the Turing arch. Then perform the following:\n\n“Had to replace cufft64\\_10.dll from C:\\Users\\user\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\torch\\lib\n\nby the one from C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin”\n\nIt is even compatible with the newest Torch 2.8.0 (if you followed the instruction to fix the dict issue above) if you grab that apparently “fixed version of cufft64\\_10.dll from CUDA v10.0” - dca\n\n5. 
If you type python in CMD and it isn't found, start with method 2 described here:\n\n<https://www.liquidweb.com/help-docs/server-administration/windows/adding-python-path-to-windows-10-or-11-path-environment-variable/>\n\nOr make sure you've checked the option to add Python to the PATH environment variable during Python installation.\n\nAlso, you can try “disabling the python executable in app execution aliases.” - neoculture\n\n6. For a state\\_dict = torch.load / unpickling error:\n\n“add the following line above torch.load (at utils/model\\_utils.py line 479):\n\nwith torch.serialization.safe\\_globals([torch.\\_C.\\_nn.gelu]):\n\n[So, the code looks like the following; indent the with line under else: and the torch.load line under with:]\n\nelse:\n\nwith torch.serialization.safe\\_globals([torch.\\_C.\\_nn.gelu]):\n\nstate\\_dict = torch.load(args.start\\_check\\_point, map\\_location=device, weights\\_only=True)\n\nmodel.load\\_state\\_dict(state\\_dict, strict=False)\n\n” (~unwa)\n\n\\*. For an “np.complex” error with an incompatible NumPy version (Python 3.12), execute:\n\npip install numpy==1.26.4\n\npip install -U librosa audiomentations\n\n7. For: “failed to build diffq pesq” [click](https://discord.com/channels/708579735583588363/708595418400817162/1430693276272300052)\n\n8. \"ERROR: No matching distribution found for pedalboard~=0.8.1\" when installing MSST requirements\n\nfix: “downgrade python” - Stray Kids Filters\n\n*More notes:*\n\n- 4GB VRAM GPUs will give out-of-memory errors for Roformers. 
You can use the CPU instead, or decreasing chunk\\_size as described [here](#_c4nrb8x886ob) might help.\n\n- Leave the “extract instrumental” checkbox disabled for duality models or potentially other models with more than one stem target (the extracted instrumental will have worse quality than the dedicated stem output)\n\n*Command-line guide by mesk (working on RTX 3070 Mobile):*\n\n“0 – You need Python:\n\n<https://www.python.org/downloads/>\n\n0a – I would also recommend installing Pytorch too from here: <https://pytorch.org/get-started/locally/>\n\n(grab the command and enter it into the command prompt)\n\n0b – But then you can just also double-click on guiwx.py on the repo, and it is much easier.\n\nThat's the harder method with the command prompt\n\n1 – Go there: <https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\nand clone the repository (click on Code => Download as zip)\n\n2 – Go to the repo folder, create 3 new folders: results, input and separation\\_results.\nPlace your tracks in the input folder. 
Place the checkpoint in results and leave the yaml at the root of the repo (where inference.py and requirements.txt are)\n\n3 – Open command prompt, type in cd C:/Users/[YOURUSERNAME]/Desktop/Music-Source-Separation-Training-main\n(changes directory to the repo folder on your desktop)\n\n4 – Type in: pip install -r requirements.txt\n\n5 – Let it install the requirements\n\n6 – Type in: python inference.py --model\\_type mel\\_band\\_roformer --config\\_path [NAME OF YAML] --start\\_check\\_point results/[NAME OF CHECKPOINT] --input\\_folder input/ --store\\_dir separation\\_results/ --extract\\_instrumental\n\n7 – Make sure to replace the stuff in brackets with the actual stuff you need”\n\nIf you have a decent NVIDIA GPU but no GPU acceleration, maybe “Check these commands to install torch version that handle CUDA”:\n\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118\n\nor\n\npip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\nand\n\npip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118\n\n[various GPU archs have different CUDA requirements for different Torch versions; refer to the documentation]\n\nFAQ\n\nQ: “When running inference, I am getting tired of it creating a folder for each input file and putting individual stems inside that folder. Especially since most of the time I'm running single stem models and I only need the primary stem. I remember on a much older version, it wouldn't create folders, it would just copy the original filename and put the stem name at the end of it. So I was wondering what I could modify in newer versions to restore that behavior. 
I'm guessing that would be in inference.py, but don't exactly know where to look.” - Musicalman\n\nA: “You can download the \"old\" version from my forked repo's \"colab-inference\" branch, with old behaviour for results and folders. It's the version used in my colab notebooks, it should be preselected with that link:\n<https://github.com/jarredou/Music-Source-Separation-Training/tree/colab-inference>” - jarredou\n\n- torch.load and load\\_state\\_dict errors\n\nA: “PyTorch 2.6 and later have improved security when loading checkpoints, which causes the problem. torch.\\_C.\\_nn.gelu must be set to exception” or “add the following line above torch.load (at utils/model\\_utils.py line 479):\n\nwith torch.serialization.safe\\_globals([torch.\\_C.\\_nn.gelu]):”\n\nand “don't forget to align the indentation since it's Python code.” - unwa\n\nAlso, in the case of unwa’s FNO model: “edit the model file\n\nAs I mentioned in the model card, you need to change the MaskEstimator”\n\nQ: “I've been using a really old version of msst for a while and finally decided to update it today. I noticed that the gui-wx.py file was moved to the gui folder (it used to be in the root). So now when I try to launch the gui, I get file not found errors. Gui still works at least for screen reader users like myself, but these errors should definitely be fixed.\n\nI'm wondering if I should be trying to launch the gui from the main msst folder, or if I should be launching it from the gui folder. Either way I get errors. 
I could fix them by modifying paths in gui-wx.py, just need to know which folder I should be starting from lol” - Musicalman\n\nA: “You can use python gui/gui-wx.py and replace (lines 130-131) right now:\n\nfont\\_path = \"gui/Poppins Regular 400.ttf\"\n\nbold\\_font\\_path = \"gui/Poppins Bold 700.ttf\"\n\nI will fix it at next pull-request” - Kisel\n\n##### MVSEP models from UVR5 GUI explained\n\n- Ensemble option - further developed custom code of the original MDX23 (not available in UVR) - custom tech, consisting of various models from UVR and in-house, non-public models unavailable in UVR\n\n- Demucs 4 ft - (settings might be shifts 1 and overlap 0.75 as he tested once) - same as in Colabs or UVR\n\nMDX B (not sure whether min, avg or max is set) - the option has MDX arch models:\n\n- Newest MDX models added - Kimberley - Kim inst (ft other), Kim Vocal 1 & 2, HQ\\_2, 3\n\n- 8.62 2022.01.01 - is NET 1 (9.7) with Demucs 2; this one has a new name now. It had a slightly higher SDR than the newer model below on the same [multisong](https://mvsep.com/quality_checker/leaderboard2.php?&sort=instrum&page=40) dataset - the discrepancy vs UVR5 SDR results might be on the server side (e.g. different chunks), so it might still be the same model. The dates might only relate to when the model was added to the site (not sure here, but it seems likely - NET 1 is indeed an older model than the one below). It looks like the model is used with Demucs 2 enabled (at least he said it was configured like this at some point)\n\n- 8.51 2022.07.25 - might be vocal 423 a.k.a. main, not sure if with Demucs 2 (judging by how the instrumental from inversion in 423 looked - it cannot be any inst model yet, since they were released at the end of 2022 - epoch 418 in September, to be precise) - it was tested on the multisong dataset on page 2 as MDX B (UVR 2022.07.250 - the date is the same as before, so nothing new here); can't say now if Demucs 2 is used here. 
Back in the 9.7/NET 1 days, enabling it decreased SDR a bit on some dataset (not sure which), but instrumentals usually sounded kind of richer with it enabled. Now it's better to use other models for ensembles.\n\nThe change in the MDX-B model scheme was probably made to unify SDR metrics on the multisong dataset.\n\n- Demucs 3 Model B - mdx\\_extra (and rather not mdx\\_extra\\_q, as ZFTurbo said it's \"original\" and used the mdx\\_extra name on the channel when referring to this model; in most cases the one below should be better)\n\n- Ultimate Vocal Remover HQ - the option has VR architecture models\n\nWindow size used - 512\n\n[Here](#_1wojovpsoqy) are all VR arch model names\n\n- UVRv5 Demucs - rather the same names\n\n- MVSEP models - unavailable in UVR5\n\n- MDX B Karaoke - Possibly MDX-UVR Karokee or MDX-UVR Karokee 2 (karokee\\_4band\\_v2\\_sn iirc), maybe the latter\n\nThe rest below on MVSEP’s list is outdated and no longer recommended\n\n*Issues using MVSEP*\n\n- A NaN error during upload is usually caused by an unstable internet connection, and it usually happens on mobile connections when you're already uploading more than one file elsewhere.\n\nIf you get a NaN error, just retry uploading your file.\n\n- Rarely, after uploading, an error saying the file wasn't uploaded can occur - you need to upload your file again.\n\n- If you finish a separation and click back, the model list can disappear until you click on another algorithm and pick yours again. But if you click separate instead, it will process with the first model which was previously on the list (at least if it was also your previous choice).\n\n- Slow download issues. Separation was complete, and I was listening to the preview when playback on the preview page simply stopped, and couldn't be started. The main page didn't load (other sites worked).\n\nAlso, I couldn't download anything. 
It showed 0 B/s when trying to download.\n\nTwo solutions:\n\n- close all MVSEP tabs completely and reopen\n\n- Connect to a VPN and preview some track; after a short time, the same can happen and nothing plays or buffers. Then fire up Free Download Manager and simply copy the download link there, and it will start downloading. Later, the browser can also start downloading something you clicked a moment ago. Crazy.\n\n—\n\nCompared to MDX with a 14.7 kHz cutoff, depending on the track, VR models (not MDX/Demucs) might leave or cut more instruments, or leave more constant vocal residues. In general, VR models are trained up to 20 kHz, with possible mirroring covering 20-22 kHz, and they generally remove vocals less aggressively (with exceptions). Most importantly, compared to MDX, VR tends to leave a specific kind of noise in the form of leftover vocal-bleed artifacts; on the other hand, MDX, especially models with a cutoff, can be muddier and preserve the original track's mastering less.\n\n\\_\\_\\_\\_\\_\\_\n\n#### Manual ensemble Colab\n\nYou can perform a manual ensemble on your own files in UVR5 under \"Audio Tools\", or besides the [DAW method](#_oxd1weuo5i4j), you can also use:\n\n*Ensemble Colab for various AI/models*\n\nIf you want to combine finished result files from various MDX and Roformer models or different archs/AIs from external sources using Google Colab, here’s a notebook for you:\n\n<https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Manual_Ensemble_Colab.ipynb>\n\n*(implementation of ZFTurbo code with drop-down menus plus manual weights and various ensemble algorithms by jarredou)*\n\nOnce you mount GDrive, open the file manager on the left, right-click > copy path > paste into the first input field, then do the same for the second file in the second field, and so on. Then you can change type from max to e.g. 
avg, or set weights manually to get a specified share of one model in the result file (you could listen to all the imported stems together in a DAW to actually know what you’re doing).\n\nUse 896 or 1024 n\\_fft ([more](https://discord.com/channels/708579735583588363/1354486974395846686/1359816902569889824)).\n\nPossible fixes for errors:\n\n- Don’t use spaces in the output file name\n\n- “AttributeError: module 'numpy' has no attribute 'float'\n\nnp.float was a deprecated alias for the builtin float.”\n\n> “Try to rerun the install cell, this issue is because of a problem with numpy version\n\nif it doesn't work you can force numpy upgrade by creating a new code cell with:\n\n!pip install -U numpy\n\nSometimes install cell ask you to restart runtime because of numpy version too, if you don't say yes, you have to restart runtime by yourself to make it work”\n\n“Try forcing librosa update too:\n\n!pip install -U librosa\n\nHave you tried to delete runtime and restart it from scratch ?\n\nIt's weird that these issues happen again, they were lots of these with old colab, but for recent ones, not much”\n\n“If you face this error again, you can update the 2 libs at the same time with:\n\n!pip install -U numpy librosa” - jarredou\n\n- “ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 2) + inhomogeneous part.”\n\n> In some specific cases (not always) converting your file to 32-bit float WAV might help.\n\nNot sure if exactly the same length is necessary, but a detected shape of (2, 2) + inhomogeneous part is what NumPy reports when two stereo arrays of different lengths are stacked, so check the lengths if the above fails.\n\nAlso, lossy files will not align with lossless ones, and neither will files with different sample rates.\n\n> For one person, converting the files to 320kbps mp3 fixed the ValueError\n\n- ValueError: Homogenous shape\n\n>? 
anything from the above\n\n*\\_\\_\\_\\_*\n\n*(Old, no longer working Colab by ZFTurbo*\n\n<https://colab.research.google.com/drive/1fmLUYC5P1hPcycI00F_TFYuh9-R2d_ap?usp=sharing> <https://cdn.discordapp.com/attachments/708912741980569691/1102706707207032833/Copy_of_Ensemble.ipynb>\n\nLast backup\n\n<https://drive.google.com/file/d/1k1jD_sOWKLish2T3_pZoYpeE1DGwGfG3/view?usp=sharing>\n\nWe got two reports that it throws some errors now, and it could stop working due to some changes Google made to Colab this year)\n\nYou should be able to modify it to use three models with different weights like 3, 2, 2, as in the example of Ensemble MDX-B (ONNX) + MVSep Vocal Model + Demucs4 HT on the old SDR chart (so it does not work like avg/avg in the GUI).\n\n\\_\\_\\_\n\n#### Joining frequencies from two models\n\nSometimes a regular ensemble, even with min spec, doesn't give you complete control over what you want to achieve: you have one cleaner narrowband model result and one fullband model result with more vocal residues, but you still want the full spectrum.\n\nInstead of using the ensemble Colab, you can also mix, in a DAW, an MDX-UVR 464/inst3 or Kim inst model result, which has a 17.7 kHz cutoff, with an HQ\\_1-5 or Demucs 4 result from a model trained on the full 22 kHz range.\n\nFirst, import both tracks. The most correct approach to avoid any noise or frequency overlap is probably to use a [brickwall highpass](https://i.imgur.com/5PpmmfI.png) in EQ at 17680 Hz on all Demucs 4 stems and leave MDX untouched - that's it. You can use GSEP instead of Demucs 4 (possibly fewer vocal residues).\n\nIf you want to experiment further with the cut-off, I once ended up with a 17725.00 flat high pass with a -12dB slope for \"drums\" in iZotope Elements EQ Analog and left MDX untouched. “Bass” stem set to 17680.00 in mono and \"other\" in stereo at 17680.00 with Maximiser with IRC 1 -0.2, -2, th. 1.55, st. 0, te 0. 
But it might produce hissy hi-hats in places with a less busy mix or when the hi-hat is very fast, so tweak it to your liking.\n\nFor a free EQ, you can use e.g. TDR Nova - click LP and set 17.7 and slope -72dB.\n\nAs a free DAW, you can use Audacity (new versions support VST), Cakewalk, Pro Tools Intro, or Ableton Live Lite.\n\nThe above will probably leave a small hole in the spectrum and a slight lack of clarity. Alternatively, you can apply a resonant high pass instead of a brickwall, so the hole will be filled without overlapping frequencies.\n\nYou can also consider using a linear-phase EQ/mode, like in the free Qrange and its high pass, to potentially cause fewer phase problems.\n\nA similar method can also be used for *joining YT Opus frequencies above 15750Hz with AAC* (m4a) files, which gives more clarity compared to normal Opus on YT. Read [this](#_6543hhocnmmy).\n\n#### DAW ensemble\n\n*Averaging*\n\nThe counterpart of avg ensemble from UVR ([more](#_lczfb0e870z9)) can also be made in a DAW (Audacity/Cakewalk/Reaper etc.). When you drag and align all stems you want to ensemble in your DAW, you simply need to lower the volume of the stems according to the number of imported stems.\n\nTo replicate avg, each of the N stems has to be scaled by 1/N, i.e. lowered by 20·log10(N) dB. So for a pair, you need to decrease the volume of both stems by 6.02 dB (≈6 dB).\n\nSo, when you add another stem (a 3-model ensemble), you need to decrease the volume of all stems by 9.54 dB, for 4 models by 12.04 dB, and so on.\n\nAs a rule of thumb, that's roughly 3 dB more decrease for all stems every time you import a new track, but the rule drifts from the exact value as N grows (e.g. 15 dB vs. 13.98 dB for 5 stems).\n\n*Weighting manually (more precise)*\n\nYou can change the volume of stems to your liking, just avoid clipping caused by a too loud output on the master fader when you play it back. 
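The dB values for averaging come from the fact that averaging N sample-aligned stems multiplies each of them by 1/N, which lowers each stem by 20·log10(N) dB; the 3 dB-per-stem rule of thumb stays close to that for two to four stems. A minimal Python sketch of the arithmetic, assuming the DAW simply sums the stems on the master bus and that a plain waveform average is what you want to match. The function names are made up for illustration, and the weighted variant just normalises manual weights such as the 3, 2, 2 example given for the old ensemble Colab.

```python
import math


def avg_gain_db(num_stems):
    # Gain in dB that turns the plain sum of N aligned stems into their average:
    # every stem is scaled by 1/N, i.e. lowered by 20*log10(N) dB.
    if num_stems < 1:
        raise ValueError('need at least one stem')
    return -20.0 * math.log10(num_stems)


def weighted_gains_db(weights):
    # Per-stem gain in dB for a manual weighted ensemble. The weights are
    # normalised to sum to 1, e.g. 3, 2, 2 -> 3/7, 2/7, 2/7 of the mix.
    if not weights or any(w <= 0 for w in weights):
        raise ValueError('weights must be positive')
    total = sum(weights)
    return [20.0 * math.log10(w / total) for w in weights]


if __name__ == '__main__':
    for n in range(2, 6):
        print(n, 'stems:', round(avg_gain_db(n), 2), 'dB on each')
    # Roughly -7.36 dB for the first stem and -10.88 dB for the other two
    print([round(g, 2) for g in weighted_gains_db([3, 2, 2])])
```

Because the normalised weights always sum to 1, similar stems mixed this way end up close to the level of a single stem, which helps keep the master fader from clipping.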
You can circumvent the problem to some extent using a limiter on the sum, but it might not be necessary.\n\nAlso, you can automate stem volumes differently for specific verses and choruses, or just change the volume balance of the stems whenever a new verse or chorus appears.\n\nNote: You won’t be able to use this method if one stem has its phase rotated or flipped.\n\n\"I've made some tests by simply overlaying each audio above each other and reducing their volume proportionally of the number of audio overlays (like you would do in a DAW), it scores like ~0.0002 better SDR than UVR's average.\"\n\nYou can use Audacity online at wavacity.com, although it might occasionally crash, at least on a smartphone.\n\nBandLab is more stable when used as an app: <https://www.bandlab.com/explore/home>\n\nbut it also crashes when used in PC mode online.\n\nThe app, at least in vertical view, doesn’t show the master fader, so you cannot monitor the output volume meter. Probably the same in horizontal view. Plus, the app doesn't let you adjust the gain precisely, e.g. only -9 dB instead of -9.5 dB, so to mix single files with the gain values found in Wavacity without crashes, you can use the [manual ensemble Colab](https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Manual_Ensemble_Colab.ipynb).\n\nIf you just want to listen to stems offline and change their volume, panning, solo and mute, you can download [this](https://drive.google.com/file/d/1fgBW5gqPz7J3u-u772Cab88IEVv8TvyX/view?usp=sharing) HTML file (by cypha\\_sarin) and run it locally in your browser.\n\n#### Manual ensemble in UVR5 GUI from single models (e.g. from inference Colabs or online sites)\n\n\"You can use Colabs to make individual model separations and then use the manual ensemble tool from UVR5 GUI to merge them together (you don't need special CPU/GPU to use that tool and it's fast! 
15-year-old computers can handle it).\n\nIn UVR GUI > process method = Audio Tools, then choose \"Manual Ensemble\" and the desired ensembling method.\"\n\nCombine Inputs is even more aggressive than Max Spec.\n\nE.g. it takes two -15 LUFS (integrated) songs and makes a pretty loud -10 LUFS result.\n\nTo potentially deal with the harshness of such output, you can set quality in the options to 64-bit (sic!), or manually decrease the volume of the files before passing them through UVR Combine Inputs.\n\nCombine Inputs was good for ensembling the KaraFan preset result with the fewest residues with preset 5 (more clarity, but a bit more residues). The instrumental result had a fuller sound, better snares and clarity.\n\nThe downside is that you cannot control the gain of the ensembled stems as precisely as in a DAW or with the Colab.\n\n#### Model fusion\n\nYou can perform fusion of models using the [ZFTurbo script](https://drive.google.com/file/d/18E5uTSVJV6rn8gTsOc0RC1m12lJFJGmP/view?usp=sharing) ([src](https://discord.com/channels/708579735583588363/1220364005034561628/1386610707042271243)) or the [Sucial script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) (they’re similar if not the same). 
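If the fusion scripts work the way their description suggests (an assumption; check the scripts themselves), fusing is essentially a weighted average of the models' parameters, which also explains why the architectures need to match. Below is a generic sketch of that idea, not the ZFTurbo or Sucial code: it assumes each checkpoint is already loaded as a dict mapping parameter names to NumPy arrays or PyTorch tensors (e.g. a state dict returned by torch.load), and the function name and toy parameter names are hypothetical.

```python
import numpy as np


def fuse_state_dicts(state_dicts, weights):
    # Weighted average of parameters from models with the same architecture
    # (same layer names and shapes, i.e. same dim/depth etc.).
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError('need at least one model and one weight per model')
    total = float(sum(weights))
    fused = {}
    for name, first in state_dicts[0].items():
        # A KeyError here means a layer is missing in one of the models.
        params = [sd[name] for sd in state_dicts]
        if any(tuple(p.shape) != tuple(first.shape) for p in params):
            raise ValueError(f'shape mismatch for {name}: architectures differ')
        # Works for NumPy arrays and PyTorch tensors alike (elementwise * and +).
        fused[name] = sum(w * p for w, p in zip(weights, params)) / total
    return fused


if __name__ == '__main__':
    # Toy two-layer example with made-up parameter names.
    model_a = {'layer.weight': np.ones((2, 2)), 'layer.bias': np.zeros(2)}
    model_b = {'layer.weight': np.full((2, 2), 3.0), 'layer.bias': np.ones(2)}
    fused = fuse_state_dicts([model_a, model_b], weights=[0.75, 0.25])
    print(fused['layer.weight'])  # every entry 0.75*1 + 0.25*3 = 1.5
    print(fused['layer.bias'])    # every entry 0.25
```

With real checkpoints, the fused dict would then be saved (e.g. with torch.save) and loaded with the original model config, giving a single model to run inference with.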
“I think the models need to have at least the same dim and depth, but I'm not sure about that” - mesk.\n\nThese scripts create one model out of several weighted models with specified parameters, so only one model needs to be run at inference instead of two or more.\n\n*How to separate in UVR using multiple models in batch to compare the results for the best manual ensemble?*\n\nSimply use **Ensemble Mode**, but first go to Settings>Additional Settings and enable “**Settings Test Mode**” (adds a 10-digit number to every separation file name, so you won’t overwrite the results of the same models with different settings) and “**Model Test Mode**” (adds the model name to every output file name, so the file won’t get overwritten by any other model's separation). Then go to Settings>Settings Guide>Choose Advanced Menu>**Ensemble Customization Options** and enable “**Save All Outputs**”. Now, when you choose models to separate in Ensemble, intermediate files won’t be deleted, so besides the min/max/avg mag ensemble result file, you will also keep the separations from the single models, which you can later check manually for a [manual weighted ensemble](#_oxd1weuo5i4j), e.g. in a DAW or in the [Colab](#_surlvvp6mr8f).\n\n## UVR’s VR architecture models\n\n###### (settings and recommendations; *mostly outdated arch for all vocals and instrumental models*) Available on Colab, HF, UVR, UVR old CLI, MVSEP\n\n##### VR Colab by HV\n\n(old) <https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/main/Vocal%20Remover%205_arch.ipynb>\n\nUse [this](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl?usp=sharing) fixed notebook for now (04.04.23)\n\nSometimes Google Colab might break itself (e.g. 
error: No module named 'pathvalidate'). In that case, you can try going to Environment, deleting it entirely and starting over; it might then start working again.\n\n(Since 17.03.23, the official HV Colab link at the very top stopped working (librosa-related issues, and later pysound-related issues, again with YT links, but somehow fixed). Running “!pip install librosa==0.9.1” in the original Colab fixes the issue; it is needed for both YT and local files, and a clean installation works too.)\n\n- HV also made a [new](https://colab.research.google.com/drive/1VnqwFkpjPLjMwUPmgjoZJQR1S8hd6CBJ) VR Colab which, IIRC, doesn’t clutter your whole GDrive, but only downloads the models you use (no VR ensemble, though), and it might work without GDrive mounting.\n\n(Google Colab in general allows separating on a free virtual machine with decent Nvidia GPUs. It's for all those who don't want to use their personal computer for such GPU/CPU-intensive tasks, don’t have an Nvidia GPU or a decent CPU, or don’t want to use online services, where you e.g. 
frequently wait in queues, etc.)\n\nVideo tutorial on how to use the VR Colab (it’s very easy to use): <https://www.youtube.com/channel/UC0NiSV1jLMH-9E09wiDVFYw>\n\nYou can use VR models in UVR5 GUI, or, to use the above tool locally, the old command-line branch (VR models only):\n\n<https://github.com/Anjok07/ultimatevocalremovergui/tree/v5-beta-cml>\n\nInstallation tutorial: <https://www.youtube.com/watch?v=ps7GRvI1X80>\n\nIn case of a CUDA out-of-memory error caused by overly long files, use LosslessCut to divide your song into two parts, or use this Colab, which has the chunks option turned on by default (no ensemble feature here):\n\n<https://colab.research.google.com/drive/1UA1aEw8flXJ_JqGalgzkwNIGw4I0gFmV?usp=sharing#scrollTo=I4B1u_fLuzXE>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nBelow, I'll explain Ultimate Vocal Remover 5 (VR architecture) models only (a fork of vocal-remover by tsurumeso).\n\n**For more information on VR arch, see here for official documentation and settings:**\n\n<https://github.com/Anjok07/ultimatevocalremovergui/tree/v5-beta-cml>\n\n<https://github.com/Anjok07/ultimatevocalremovergui>\n\n**The best**\n\n#### VR settings\n\nExplained in detail\n\nSettings are available in the [Colab](https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl?usp=sharing), the CLI branch, and also UVR 5 GUI (but without at least mirroring2. 
Mirroring in UVR5 GUI for VR arch was replaced entirely by High End Process, which works as mirroring now, not like the original High End Process, which was dedicated to very old 16kHz VR models only).\n\nThese VR models can be used in 1) this [Colab](https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/main/Vocal%20Remover%205_arch.ipynb), 2) [UVR5 GUI](https://github.com/Anjok07/ultimatevocalremovergui), 3) [mvsep.com](http://mvsep.com) (uses 512 window size, aggressiveness option, various models), or 4) x-minus.pro/uvronline.app (one unreleased UVR model for free, without parameters (\"lo-fi\" option, MP3, 17.7 kHz cutoff); Demucs 4 for registered users, IIRC; the site is by Aufr33, one of the authors of UVR5).\n\nI had at least one report that results for VR models alone are better in the Colab above/old CLI branch than in the newest UVR5 GUI, so be aware. Apart from the mirroring settings (only mirroring works, as High End Process, and there's no mirroring2), all settings should be available in the GUI, and the 272 window size was added back as user input. Interestingly, I received a similar report for MDX models in UVR5 GUI compared to Colab (be aware, just in case). The problems might also be tied to VRAM and not occur on GPUs with 11GB and more, or in CPU mode.\n\nBefore we start:\n\n*Issue with additional vocal residues when postprocess is enabled*\n\n- “*postprocess* option masks instrumental part based on the vocals volume to improve the separation quality.” (from: <https://github.com/tsurumeso/vocal-remover>)\n\nIn HV Colab, this option is labeled “Mute low volume vocals”. 
So, if it enhances separation quality, it should presumably cancel some vocal residues (\"low volume vocals\"), so that's not a bad description.\n\nBut with that setting enabled, at least in the Colab, some vocal residues may be left:\n\n(it’s fixed in UVR GUI: \"the very end bits of vocals don't bleed anymore no matter which threshold value is used\")\n\nCustomizable postprocess settings (threshold, min range and fade size) were removed from HV's Colab; they were last available in this revision:\n\n<https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/b072ad7418f6b1825d3dcff7cef70c5b0985d540/Vocal%20Remover%205_arch.ipynb#scrollTo=CT8TuXWLBrXF>\n\nSo, if you have the issue when using postprocess, change the default threshold value of 0.3 or 0.2 (depending on the revision) to 0.01.\n\nSetting the *threshold* parameter to 0.01 fixes the issue (so, I believe, the default settings did quite the opposite of what this option is meant to do).\n\nAlso, the default postprocess threshold changed from 0.3 to 0.2 in later revisions of the Colab.\n\n- Setting the *Window size* option to anything other than 512 somehow decreases SDR, although most people prefer lower values (at least 320, I even use 272; 352 is also possible, but anything above that changes the tone of the sound more noticeably). We don’t know yet why lower window sizes hurt SDR (a similar situation to GSEP). 512 might be a good setting for ensembles with non-VR models or for further mastering. Sometimes, compared to a 512 window size, 272 can lead to a bit more noticeable vocal residues. 
In general, you might find bigger window sizes less noisy, but some people also find them more blurry.\n\n- *Aggressiveness/Aggression* - “A value of 10 is equivalent to 0.1 (10/100=0.1) in Colab”.\n\nStrangely, the best SDR for aggressiveness with the MSB2 instrumental model turned out to be 100 in GUI, 10 in Colab, while we usually used 0.3 for this model and for 500m\\_x as well, and HP models usually behave best with lower values than HP2 models (0.09/10 in GUI).\n\n- Mirroring turned out to improve SDR. It adds content to the spectrum above the VR model's base training frequency, e.g. above 20kHz (all 4 bands).\n\nnone - No processing (default)\n\nbypass - This copies the missing frequencies from the input.\n\nmirroring - This algorithm is more advanced than correlation. It uses the high frequencies from the input and mirrored instrumental's frequencies. More aggressive.\n\nmirroring2 - This version of mirroring is optimized for better performance.\n\n--*high\\_end\\_process* - In the old CLI VR, this argument restored the high frequencies of the output audio. It was intended for models with a narrow bandwidth of 16 kHz and below (only the oldest “lowend” and “32000” ones). But now in UVR5 GUI, High End Process is the counterpart of mirroring.\n\n(current 500MB models cover 20kHz rather than the full 22kHz, so if you want a fuller spectrum, consider using mirroring rather than none)\n\n- Be aware that even for VR arch, the same rule for GPUs with less than 8GB VRAM applies (inb4: Colab's T4 has 15GB): separations on 6GB VRAM have worse quality with the same parameters. To work around the issue, you can split your audio into specific parts (e.g. all choruses, verses, etc.).\n\n#### VR models settings and list\n\nFor VR architecture models, you can start with these two fast models:\n\nModel: **HP\\_4BAND\\_3090\\_arch-124m** (1\\_HP-UVR)\n\n1) Fast and reliable. V2 below has more “polished” drums, while here they’re more aggressive and louder. 
Sometimes V2 might be safer and fit more cases where the track isn’t hip-hop and the music isn’t drum-oriented, but in rare cases it harms some instruments more in busier mixes, e.g. with a repetitive synth. You may want to separate with both models and pick the best results, even within the same album.\n\nWindow size: 272\n\nAggressiveness: 0.09 in Colab/CLI/MVSEP (9 in UVR 5.6.x)\n\nTTA: ON (OFF if snare is too harsh)\n\nPost-processing: OFF (at least for this model; in some cases it can muffle background instruments other than drums, e.g. guitar)\n\n\"Mirroring\" (High End Process in GUI) (rarely \"Mirroring2\" here, since the model itself is less smooth and usually has better drums, but mirroring sometimes leads to overkill; in that case, check mirroring2 in CLI or the V2 model below)\n\n*Better yet, to increase the quality of the separation (drums in e.g. hip-hop frequently get damaged too much during the process), go straight to the Demucs section and read \"Anjok's tip\".*\n\nIf you have too many vocal residues compared to the 500m\\_1 model, increase aggressiveness from 0.09 to 0.2 or even 0.3, but it’s destructive for some instruments (at least without the Demucs trick mentioned above).\n\nModel: **HP-4BAND-V2\\_arch-124m** (2\\_HP-UVR)\n\n2) Fast and nice model, but sometimes gives lots of vocal residues compared to the one above; thanks to this, though, it may harm the snare less in some cases (and it's still 4 times faster than 500m\\_1). It’s ~55/45 which model is better, and it depends on the album, even within the same genre:\n\nWindow size: 272 (the lowest possible; in some very rare cases it can spoil the result on 4-band models; then check 320)\n\nAggressiveness: 0.09 (9 in GUI)\n\nTTA: ON (better-quality instrumental separation)\n\nPostprocess: (sometimes ON; it rather complements the sound of this model, especially when the result sounds a bit too harsh, but it can also spoil drums in some places when e.g. 
strong synths suddenly appear in the mix for a short time, probably because it misidentifies them as vocals, so be aware)\n\nMirroring (it fits this model pretty well, unlike mirroring2, which is not “aggressive” enough here) (mirroring doesn’t seem to be present in the GUI, so be aware)\n\nProcessing time for this model is 10 minutes using the weakest GPU in Colab (but currently you should be getting the better Tesla T4).\n\n(for users of x-minus) “slightly different models [than in GUI] are used for minimum aggressiveness. When we train models, we get many epochs. Some of these models differ in that they better preserve instruments such as the saxophone. These versions of the models don't get into the release, but are used exclusively on the XM website.”\n\nModel: **HP2-4BAND-3090\\_4band\\_arch-500m\\_1** (9\\_HP2-UVR)\n\n3) Older, good model, but resource-heavy. Check it if you get too many vocal residues, or when your drums are too muffled. Rarely, there might be more bleeding and generally more damage to other instruments compared to the models above; it depends on the track. In some cases it bleeds vocals less than HP\\_4BAND\\_3090\\_arch-124m.\n\nWindow size: 272\n\nAggressiveness: 0.3-0.32 (30-32 in GUI)\n\nTTA: ON\n\nPostprocess: ON in most cases, with exceptions (it polishes the high end, and the problem of postprocess (ppr) muffling instruments doesn’t seem to exist with this model)\n\nMirroring2 (I find mirroring[1] too aggressive for this model, but with exceptions)\n\n! Be aware that these settings are very slow (40 minutes per track in Colab on the former default K80 GPU, but it's faster now), so if you want to increase processing speed at the cost of separation precision, you might want to experiment with a 320/384 or, at worst, even 512 window size.\n\nColab’s former default Tesla K80 is slower than even a GTX 1050 Ti, so if you have a decent Nvidia GPU, consider using UVR locally. 
Since May 2022, the faster Tesla T4 has been available by default, so there shouldn't be any problem.\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_2 (8\\_HP2-UVR) was worse in, I think, every case I tested, but it’s a good partner for an ensemble (more about ensembles in the section below).\n\nModel: HP2-MAIN-MSB2-3BAND-3090\\_arch-500m (7\\_HP2-UVR.pth)\n\n4) Last resort, e.g. when you have a lot of artifacts (heavily filtered vocal residues), some instruments spoiled, and inconsistent sound across the track. It's a last resort because it’s 3-band instead of 4-band and lacks some high end/clarity, but if it's very hard to filter the vocal residues out of your track, it’s a good choice. It has the best SDR among VR-arch models.\n\nWindow size: 272\n\nAggressiveness: 0.3\n\nTTA: ON\n\nPostprocess: ON\n\nMirroring\n\nWith these settings, it’s as nightmarishly slow in Colab as 500m\\_1/2 (1 hour per track on K80) if you accidentally get the slower Tesla K80 assigned instead of the Tesla T4.\n\nHighPrecison\\_4band\\_arch-124m\\_1\n\n\\*)\n\nMay sometimes harm instruments less than HP\\_4BAND\\_3090\\_arch-124m, but may leak vocals more in many cases. Instrumentals generally lack some clarity, but they sound more neutral than 500m\\_1 with mirroring (not always an upside). It’s not available in the GUI by default because its results were not fully satisfactory compared to the models above.\n\nWindow size: 272\n\nAggressiveness: 0.2\n\nTTA: ON\n\nPostprocess: OFF\n\nMirroring\n\nSP in the GUI models stands for \"Standard Precision\". Those models use the least computing resources of any models in the application. HP, on the other hand, stands for \"Higher Precision\"; those models use more resources but perform better.\n\nSo, what's the best VR arch model?\n\nI'd stick to **HP\\_4BAND\\_3090\\_arch-124m** (1\\_HP-UVR) as long as it gives good results for your song (e.g. hip-hop). 
If this model gives unsatisfactory results for a specific song and you're forced to use another VR model, current MDX models will probably achieve better results.\n\nThe second most usable model for me was 500m\\_1 (9\\_HP2), and then HP-4BAND-V2\\_arch-124m (2\\_HP-UVR), or something in between, but compared to MDX-UVR models, it might not be worth using them anymore due to the possibility of more vocal residues.\n\n- 13/14\\_SP models (called 4-band beta 1/2 in the Colab) - less aggressive than above\n\n(these are older UVR5 models by the UVR team; they're less aggressive and frequently leave more vocal residues; the mid ones have less clarity but might be less noisy; they’re surpassed by MDX models)\n\n- v4 models:\n\nEven older models from the times of the previous VR codebase.\n\n\"All the old v5 beta models that weren't part of the main package are compatible [with UVR] as well. Only thing is, you need to append the name of the model parameter to the end of the model name\"\n\nAlso, V4 models are still compatible with UVR using this method.\n\n“Main Models\n\nMGM\\_MAIN\\_v4\\_sr44100\\_hl512\\_nf2048.pth -\n\nThis is the main model that does an excellent job removing vocals from most tracks.\n\nMGM\\_LOWEND\\_A\\_v4\\_sr32000\\_hl512\\_nf2048.pth -\n\nThis model focuses a bit more on removing vocals from lower frequencies.\n\nMGM\\_LOWEND\\_B\\_v4\\_sr33075\\_hl384\\_nf2048.pth -\n\nThis is also a model that focuses on lower end frequencies, but trained with different parameters.\n\nMGM\\_LOWEND\\_C\\_v4\\_sr16000\\_hl512\\_nf2048.pth -\n\nThis is also a model that focuses on lower end frequencies, but trained on a very low sample rate.\n\nMGM\\_HIGHEND\\_v4\\_sr44100\\_hl1024\\_nf2048.pth -\n\nThis model slightly focuses a bit more on higher end frequencies.\n\nMODEL\\_BVKARAOKE\\_by\\_aufr33\\_v4\\_sr33075\\_hl384\\_nf1536.pth -\n\nThis is a beta model that removes main vocals while leaving background vocals intact.\n\nStacked 
Models\n\nStackedMGM\\_MM\\_v4\\_sr44100\\_hl512\\_nf2048.pth -\n\nThis is a strong vocal artifact removal model. This model was made to run with MGM\\_MAIN\\_v4\\_sr44100\\_hl512\\_nf2048.pth -\n\nHowever, any combination may yield a desired result.\n\nStackedMGM\\_MLA\\_v4\\_sr32000\\_hl512\\_nf2048.pth -\n\nThis is a strong vocal artifact removal model. This model was made to run with MGM\\_MAIN\\_v4\\_sr44100\\_hl512\\_nf2048.pth -\n\nHowever, any combination may yield a desired result.\n\nStackedMGM\\_LL\\_v4\\_sr32000\\_hl512\\_nf2048.pth -\n\nThis is a strong vocal artifact removal model. This model was made to run with MGM\\_LOWEND\\_A\\_v4\\_sr32000\\_hl512\\_nf2048.pth -\n\nHowever, any combination may yield a desired result.”\n\n#### VR ensemble settings\n\nFor VR architecture, an ensemble is the most universal and versatile solution for lots of tracks. It delivers when single models fail, e.g. when the snare is too muffled or distorted along with some instruments, but sometimes a single model can still provide more clarity, so it’s not universal for every track.\n\nIn most cases, an ensemble of VR models only is meant for tracks where single VR model(s) don’t produce major bleeding in the busiest parts of the mix, because such an ensemble rarely removes vocal residues from instrumentals better than current MDX models, and with high aggressiveness it becomes too destructive.\n\nThe order of models is crucial (at least in the Colab)! Put the model with the best results first. Usually, using more than 4 models has a negative impact on quality. Be aware that you cannot use postprocess in HV Colab in this mode; otherwise, you’ll encounter an error. 
*Please note that UVR 5 GUI now allows an ensemble of UVR and MDX models (exclusive to the app), so feel free to check it too.* Here you will find settings for ensembles of UVR (VR) models only.\n\n- HP2-4BAND-3090\\_4band\\_arch-500m\\_1.pth (9\\_HP2-UVR)\n\n- \\*\\*HP2-4BAND-3090\\_4band\\_arch-500m\\_2.pth (8\\_HP2-UVR)\n\n- HighPrecison\\_4band\\_arch-124m\\_1.pth (probably removed from the GUI; you’d need to copy this model from [here](https://drive.google.com/file/d/13Tm0AveW5yKEmxaTnkRWXxjshIXi415c/view?usp=sharing) to your GUI folder manually, if it works at all)\n\n- HP\\_4BAND\\_3090\\_arch-124m.pth (1\\_HP-UVR)\n\n(the order in Colab is important, keep it that way!)\n\nOr, for less bleeding but a bit more muffled snare, use this one instead:\n\nHP-4BAND-V2\\_arch-124m.pth (model available only in Colab, recommended)\n\n\\*On the slower Tesla K80, you can run out of time due to runtime disconnection, but you should get the faster Tesla T4 by default on the account's first Colab connection within 24h.\n\n*Aggressiveness*: 0.1 (pretty universal in most cases; 0.09 rarely fits).\n\nOr, for a more vivid snare, if bleeding doesn’t kick in too much: 0.01 (when there's more singing than rapping; for the latter it can result in more unpleasant bleeding (or just in some parts of the track). 
The very low aggressiveness suggested here doesn’t leak as much as the same setting would on a single model, but in general it leaks more than the suggested single-model settings).\n\n0.05 is not good enough for anything AFAIK.\n\n*high\\_end\\_process*: mirroring2 (just ON in GUI)\n\n(for a less vivid snare, check “bypass” (not “mirroring” for ensembles; for some reason, both make the sound more muffled); be aware that bypass in an ensemble results in fewer vocal leftovers)\n\n*ensembling\\_parameter*: 4band\\_44100.json\n\n*TTA*: ON\n\n*Window size*: 272\n\n*FlipVocalModels*: ON\n\n*Ensemble algorithm*: default in Colab (min\\_mag for instrumentals)\n\nOther ensemble settings\n\n* For clap leftovers in the vocal stem, check out [this](https://i.imgur.com/BmvcUEt.png) ensemble setting.\n* For creaking sounds, process your separation output more than once, until they're gone, with [this](https://i.imgur.com/jDYlYFk.jpeg) setting.\n* Clean instrumentals were also reported with [this](https://i.imgur.com/iaHdw5M.png) setting.\n\nAfter the process, check the separated file and make sure its length matches the original file. Occasionally, the result file can be cut in the middle, and you’ll need to start the separation again. Also, you can accidentally start the separation before the source file has finished uploading. In that case, it will be cut as well.\n\nWith these 4-model settings, it takes 45 minutes on a Tesla T4 (~RTX 3050 in CUDA benchmarks). Swap in your next songs for processing FAST once the task finishes; otherwise you’ll be disconnected from the runtime when the notebook is idle for some time (it can even freeze in the middle).\n\nIn reality, the Tesla T4 may have much more memory, but what takes 30 minutes on a real RTX 3050 might take even more than 2 hours here; it is sometimes slower and sometimes slightly faster (usually slower). 
So you're warned.\n\n\\*\\*Be aware that in most cases, this 4-model ensemble setting with both 500m models won’t fit within the limits of the slowest (and no longer available in 2023) Tesla K80, since such a long operation exceeds 2 hours (it takes around 2:25h). Tasks running well over 2 hours end up with a runtime disconnection, so you're warned.\n\nAlso be aware that Colab's busy icon on the open tab sometimes doesn’t refresh when the operation is done.\n\nFurthermore, Colab may hang around 1:45-2:17h into the operation. To proceed, you can press F5 and click Cancel when prompted whether to reload. The site will then be functional again, but the process will stop without any notice. It is most likely the same as when your internet connection suddenly drops: the process still runs virtually until you reconnect to the session, but here you don’t have to click the reconnect button in the top right. You most likely have very limited time to re-establish the connection before the process stops permanently (or before the progress tracker/Colab stops responding). So, in the worst case, you need to watch whether the process is still working between 1:45 and 2:17h of processing. If you see that your GPU shows 0.84GB instead of ~2GB, you’re too late: your process has been permanently interrupted, and the result is gone. 
It’s harder to track how long it has been processing if you have already used the workaround once, because the timer stops and you don't know how long it has been separating.\n\nThe limit for the faster Tesla T4 is between 1:45 and 2:00h+ of constant batch operation (sometimes 2:25, but it can disconnect sooner, so try not to exceed two hours), which is enough for 2 tracks (rarely 3) separated with the ensemble settings above using both 500m models.\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_1 (9\\_HP2-UVR) - I think it tends to give the most consistent results for various songs (at least for songs where vocal residues are not too prevalent)\n\nHP-4BAND-V2\\_arch-124m (2\\_HP-UVR) - much faster and can give crisp results, but with too many vocal residues for some songs (as VR arch generally tends to)\n\nHP\\_4BAND\\_3090\\_arch-124m (1\\_HP-UVR) - something between the two above, and it can give the best results among VR models for some songs too\n\nHP2-MAIN-MSB2-3BAND-3090\\_arch-500m (7\\_HP2-UVR.pth) - tends to leave the fewest vocal residues of the VR models listed above, but at the cost of instrumentals not sounding so \"full\"\n\nHighPrecison\\_4band\\_arch-124m\\_1 (I think it's not available in UVR; you'd need to install it manually) - can be a good companion if you only have VR models for an ensemble\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_2 (8\\_HP2-UVR) - the same situation; I think it rarely, if ever, gives better results than 500m\\_1, but it's good for a VR-only ensemble\n\n\\_\\_\\_\\_\\_\\_\\_VR algorithms of ensemble \\_\\_\\_\\_\\_\\_\\_\n\nby サナ(Hv#3868)\n\n“np\\_min takes the highest value out, np\\_max does vice versa\n\nit's also similar to min\\_mag and max\\_mag\n\nSo the min\\_mag is better for instrumental as you could remove artefacts.\n\ncomb\\_norm simply mixes and normalizes the tracks. I use this for acapella as you won't lose any data this way”\n\n*Batch conversion on UVR Colab*\n\nThere’s a “ConvertAll” batch option available in Colab. 
You can search for “idle check” in this document to prevent disconnections in long Colab sessions, but at least if you get the slowest K80 GPU, the limit is currently 2 hours of constant work, after which the session is simply terminated with a GPU limit error. The limit is enough for 5 tracks - 22 minutes with ~+/-3m17s overhead (HP\\_4BAND\\_3090\\_arch-124m/TTA/272ws/noppr/~2it/s), so it's better to run bigger jobs in smaller batches across various accounts; also, after 3-5 attempts, you may finally get a better GPU than the K80.\n\nTo get a faster GPU, simply go to Runtime>Manage session>Close, then reconnect and run the Colab again until you get the faster Tesla T4 (up to 5 times). But be aware that 5 reconnections will hit the limit on your account, and you will need to switch accounts. It’s easier to get a T4 without hitting the reconnection limit around 12:00 CET on working days. At 14:30, it was impossible to get a T4, but that probably depended on my having already used a T4 that day, since I received one immediately on another account.\n\nFor single-file separation instead of batch conversion, I think it took me 6-7 hours to reach the GPU limit, and I processed 19 tracks using a 272 window size in that session.\n\nJFI: Even a 5800X CPU is slower than the slowest Colab GPU.\n\n*Shared UVR installation folder among various Google accounts*\n\nSince we can no longer use the old GDrive mounting method, which allowed mounting the same drive across various Colab sessions, you can avoid cluttering all of your accounts with UVR installations by sharing a folder with editing privileges and creating a shortcut to it in your new account. Sadly, the trick only works for one session at a time.\n\nFirstly, you may sometimes have problems opening the shared folder on the proper account, even after switching accounts once the link is open (it may leave you on the old account anyway). In that case, you need to manually insert the ID of the account where you want to open the link. E.g. 
https://drive.google.com/drive**/u/9/**folders/xxxxxxxx (where 9 is an example account ID, shown right after you switch accounts on the main Google Drive page).\n\nAfter opening the shared UVR link on your desired account, add the shortcut to your drive (the arrow next to the folder’s name). When that's done, create the “tracks” and “separated” folders yourself: delete/rename the shared “tracks” and “separated” folders and create them manually; otherwise you will get an error during separation. If you still get an error, refresh the file browser on the left side of Colab and/or retry running the separation three times until the error disappears (from then on, it shows the error occasionally, and you need to retry from time to time, click the refresh button in the file manager view on the left, or even navigate manually to the tracks folder to refresh it). Colab picks up changes such as moved files and folders on your drive with a certain delay. Also be aware that this way of installing UVR will most likely prevent any further updates on the account using the shared UVR files, and on the account you shared the UVR files from, you need to repeat the folder operations if you use it on Colab again.\n\nComparing 500m\\_1 and arch\\_124m above, in some cases you may notice that the snare is louder in the former, but you can easily compensate for this by using mirroring instead of mirroring2. 
The downside of normal mirroring might be more pronounced vocal residues due to the higher output frequency.\n\nAlso, in 500m\\_1 more instruments are damaged or muffled, though the higher default aggressiveness of 500m\\_1 sometimes gives the impression that more vocal residues are cancelled.\n\n([evaluation tests](https://discord.com/channels/708579735583588363/767947630403387393/870722720546041856) window size 272 vs 320: it’s much slower, doesn’t make a noticeable difference on every sound system, and 272 got a slightly worse score, but based on my personal experience, I insist on using 272 anyway)\n\n([evaluation tests](https://discord.com/channels/708579735583588363/767947630403387393/868770117205520434) aggressiveness 0.3 vs 0.275: doesn’t apply to all models, e.g. MGM: 0.09)\n\n([evaluation tests](https://discord.com/channels/708579735583588363/767947630403387393/868594718915829841) TTA ON vs OFF: in some cases, people disable it)\n\n5a) (I haven’t thoroughly tested these aggressiveness parameters yet)\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_1.pth\n\nWindow size 272, aggressiveness 0.01, TTA, Mirroring\n\n5c)\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_1.pth\n\nWindow size 272, aggressiveness 0.0, TTA, Mirroring2\n\nLow or 0.0 aggressiveness leaves more noise, but sometimes it makes the instrumental cleaner, if you don’t mind more vocal bleeding (how well you can hear it also depends on your sound system, e.g. whether you listen on headphones or speakers).\n\nBut be aware that:\n\n“A 272 window size in v5 isn't recommended [in all cases]. Because of the differing bands. In some cases it can make conversions slightly worse. 
272 is better for single band models (v4 models) and even then the difference is tiny” Anjok (developer)\n\n(so on some tracks it might be better to use 320 and not below 352, but personally I haven’t found such a case yet)\n\nDeepExtraction is very destructive, and I wouldn’t recommend it with current good models.\n\nKarokee V2 model for UVR v5 (MDX arch)\n\n(leaves backing vocals, 4-band, not in Colab yet, but available on MVSep)\n\nModel:\n\n<https://mega.nz/file/yJIBXKxR#10vw6lRJmHRe3CMnab2-w6gAk-Htk1kEhIp_qQGCG3Y>\n\nBe sure to update your scripts (if you use the older command-line version instead of the GUI):\n\n<https://github.com/Anjok07/ultimatevocalremovergui/tree/v5-beta-cml>\n\nRun:\n\n`python inference.py -g 0 -m modelparams\\4band_v2_sn.json -P models\\karokee_4band_v2_sn.pth -i <input>`\n\n5d) Web version for UVR/MDX/Demucs (alternative, no window size parameter for better quality):\n\n<https://mvsep.com/>\n\nHow to use this free online stem splitter with a variety of high-quality algorithms:\n\n1. Put your audio file in.\n\n2. Choose an algorithm. Usually, you really only need to choose one of two algorithms:\n\n- The best algorithm for getting clean vocals/instrumental is Ultimate Vocal Remover. Once you have selected Ultimate Vocal Remover, select HP-4BAND-V2 as the \"Model type\".\n\n- The best algorithm for getting clean separate instrument tracks, like bass, drums and other, is Demucs 3 Model B.\n\n3. Hit Separate, and MVSEP will process it for you. This means you can do everything yourself, with no need to ask for other people's isolations if you can't find them.\n\n6) VR 3-band model (gives better results on some songs, e.g. K-pop)\n\n[HP2-MAIN-MSB2-3BAND-3090](https://mega.nz/file/2MpjGIoQ#rUz2_AzTqISYTm7Yy8YTnoTmWqAhq3JlLbhwor4rYiI)\n\n(I think the default aggressiveness was 0.3)\n\n7) Deprecated: in many cases, a lot of bleeding (not every time), but in some cases it hurts some instruments less than all the models above (e.g. 
quiet claps).\n\nMGM-v5-4Band-44100-BETA2/\n\n(MGM-v5-4Band-44100-\\_arch-default-BETA2)\n\n/BETA1\n\nAgg 0.9, TTA, WS: 272\n\nSometimes I use LosslessCut to merge certain fragments of beta1 and beta2.\n\nThe models from point 4 surpass an ensemble of both the BETA1 and BETA2 models.\n\n(!) Interesting results (back in 2021)\n\n“Whoever wants to know the HP1, HP2 plus v4 STACKED model method, I have a [...] group explaining it”\n\n<https://discord.gg/PHbVxrV4yS>\n\nLong story short: you need to ensemble the HP1 and HP2 models, then apply the stacked model from v4 on top of the result.\n\nBe aware that ensemble with postprocessing in Colab doesn't work.\n\nInstructions:\n\n1. Open this link\n\n<https://colab.research.google.com/drive/189nHyAUfHIfTAXbm15Aj1Onlog2qcCp0?usp=sharing>\n\n2. Go through all the steps\n\n3. After mounting GDrive, upload your song (ideally lossless) to GDrive\\MDX\\tracks\n\n4. Uncheck download as MP3 and begin the isolation step\n\n5. Download the track from the \"separated\" folder on your GDrive. You can use the GDrive preview on the left.\n\n1\\*. Alternatively, if you have a paid account there, upload your song to: <https://x-minus.pro/ai?hp>\n\nMake sure you have \"mdx\" selected for the AI Model option. Wait for it to finish processing.\n\n2\\*. Set the download format to \"wav\", then click \"DL Music.\" Store the resulting file in the ROOT of your UVR installation.\n\n6. Use a combination of UVR models to remove the vocals. Experiment to see what works with what. Here's a good starting point:\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_1.pth\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_2.pth\n\nHP\\_4BAND\\_3090\\_arch-124m.pth\n\nHP-4BAND-V2\\_arch-124m.pth\n\n7. Store the resulting file in the ROOT of your UVR installation alongside your MDX result.\n\n8. Finally, ensemble the two outputs together. 
cd into the root of your UVR installation and invoke spec\\_utils.py like so:\n\n$ python lib/spec\\_utils.py -a crossover <input1> <input2>\n\nThe output will be stored in the ensembled folder.\n\n9\\* (optional). Ensemble the output from spec\\_utils with the output from the UVR 4 stacked models using the same algorithm.\n\nEnsemble\n\nspec\\_utils.py, which performs the ensembling, is standalone and doesn't require UVR to be installed in order to work. It accepts any audio files.\n\nmul - multiplies two spectrograms\n\ncrossover - mixes the high frequencies of one spectrogram with the low frequencies of another spectrogram\n\nDefault usage from aufr33:\n\npython lib/spec\\_utils.py -o inst\\_co -a crossover UVR\\_inst.wav MDX\\_inst.wav\n\n<https://github.com/Anjok07/ultimatevocalremovergui/blob/v5-beta-cml/lib/spec_utils.py>\n\nCustom UVR Piano Model:\n\n<https://drive.google.com/file/d/1_GEEhvZj1qyIod1d1MX2lM6u65CTpbml/view?usp=s>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n#### VR Colab troubleshooting\n\nIf you somehow can't mount GDrive in the VR Colab because you get errors, or your separation fails:\n\n- Use the same account for Colab and for mounting GDrive (or you’ll get an error)\n\n- If you’re on mobile, you might be unable to use Colab without PC mode checked in your browser settings (although it now works without it in Chrome on Android)\n\n- In some cases, you won’t be able to type “Y” in the empty box to continue when mounting for the first time on some Google accounts. In that case, e.g. 
switch your browser to Chrome and check PC mode.\n\n- In some cases, when in PC mode on Android, you won’t be able to paste text from the clipboard into Colab if some open on-screen applications block access. You’ll need to close them or use mobile mode (PC mode unchecked)\n\n- (probably fixed) If you started having problems with logging into Colabs:\n\n> Actually, it doesn't show that you're logged in while the button says to log in.\n\nSo, it should respect redirections in Colab links to specific accounts, but if you're mounting GDrive and it fails with a Colab error, simply click the button in the top right corner to log in. It will log you in; it just won't show that you did. Then Colab will start working.\n\n- Don't use postprocessing in ensemble, or you'll encounter an error\n\n- You can try checking force update in case of errors\n\n- Go to Runtime > Manage sessions > Terminate session and then try again with Trigger force update checked (ForceUpdate may not work without terminating the session if the Colab was already launched).\n\n- Make sure you have 4.5GB of free space on GDrive and that the mounting method is set to \"new\". You can try \"old\", but it shouldn't work.\n\nTry a few times.\n\n- If that still doesn't help, delete the VocalRemover5-COLAB\\_arch folder from GDrive and retry without Trigger update.\n\nOn a fresh installation, make sure you still have 4.5GB of space on GDrive (empty the recycle bin: a successful automatic model installation leaves separate files there as well, so you can easily run out of space on a cluttered GDrive)\n\n- If that still doesn't help (e.g. 
when models can’t be found on a separation attempt), download the folder linked below and extract it to the root (main) directory of GDrive, so it looks like this: Gdrive\\VocalRemover5-COLAB\\_arch, with the files inside, as in the following link:\n\n<https://drive.google.com/drive/folders/1UnjwPlX1uc9yrqE-L64ofJ5EP_a8X407?usp=sharing>\n\nand then try running the Colab again:\n\n<https://colab.research.google.com/drive/16Q44VBJiIrXOgTINztVDVeb0XKhLKHwl>\n\n- If you can no longer connect to a GPU and/or you've exceeded your GPU limit,\n\ntry logging into another Google account.\n\n- Try not to exceed 1 hour when processing one file or one batch of files, otherwise you'll get disconnected.\n\n- Always shut down the runtime before you close the tab with the Colab.\n\nThat way, you will be able to connect to the Colab again after some time, even if you previously connected to the runtime and stopped using it. If you don't shut down the runtime before exiting, it sits idle until it hits the timeout. Then, when you try to connect to Colab again, a “limit reached” error will appear. You'll then need to wait up to 24h or switch Colab accounts, while using the same Google account for Colab and for the mounting cell (otherwise you'll get an error when one account is used for Colab and a different one for GDrive mounting).\n\n- New layer models may not work with a 272 window size, causing the following error:\n\n“raise ValueError('h1\\_shape[3] must be greater than h2\\_shape[3]')\n\nValueError: h1\\_shape[3] must be greater than h2\\_shape[3]”\n\n- (fixed) Sometimes when running the mounting cell you can get a short “~from Google Colab error” on startup. It happens if you didn’t log into any account in the top right corner of the Colab. 
Sometimes it will show a blue “log in” button, but you're actually logged in, and Colab will work.\n\n- *A network error occurred, and the request could not be completed.*\n\n*GapiError: A network error occurred and the request could not be completed.*\n\nTo fix these errors in Colabs, open the hosts file at c:\\Windows\\System32\\Drivers\\etc\\hosts and check that you don’t have any lines like:\n\n127.0.0.1 clients.google.com\n\n127.0.0.1 clients1.google.com etc.\n\nSuch lines can be added by RipX Pro DAW.\n\n- Various Colabs might occasionally become unstable: the environment disk might get unmounted, or you might get weird errors. In that case, simply kill the current environment and start over.\n\n- These are all the lines that fixed problems in our VR Colabs since the beginning of the year, when new versions of these dependencies became incompatible (but usually the linked Colab is forked when needed and is already up to date with these fixes applied)\n\n!pip install soundfile==0.11.0\n\n!pip install librosa==0.9.1\n\n!pip install torch==1.13.1\n\n!pip install yt-dlp==2022.11.11\n\n!pip install git+https://github.com/ytdl-org/ytdl-nightly.git@2023.08.07\n\nLater, in February 2024, we needed to switch to the older Python 3.8 to make numpy work correctly with the deprecated functions used. 
More details on these fixes and the lines used are below the [Similarity Extractor](#_3c6n9m7vjxul) section (all those fixes should already be applied in the latest fixed Colab at the top).\n\n###### MDX-Net [Colab](https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb) by HV (March 2025). Models trained by the UVR team (aufr33 & Anjok)\n\n## *First vocal models trained by UVR for the MDX-Net arch:*\n\n## *The 9.703 model is UVR-MDX-NET 1, UVR-MDX-NET 2 is UVR\\_MDXNET\\_2\\_9682, and NET 3 is 9662; all trained at 14.7kHz*\n\n(instrumental based on processed phase inversion)\n\nA list of all (newer) available MDX models is at the very top.\n\nI think main was 438 in UVR 5 GUI at some point. At least now it's simply main\\_438 (if it wasn't from the beginning; it was easy to confuse it with the plain main model or even inst main)\n\n(MDX is now the way to go over VR) Generally, use MDX when the results achieved with the VR architecture are not satisfactory, e.g. too much vocal bleeding (e.g. with deep and low voices) or damaged instruments. If you only want an acapella, it’s currently the best solution. Actually, it's the best in most cases now.\n\nMDX-UVR models are also great for cleaning artifacts from inverts (e.g. mixture (regular track) minus official instrumental or acapella).\n\n(outdated) 9.682 might be better for instrumentals and inversion in some cases, and 9.7 for vocals, but better also check newer models like 464 from the KoD update (should be better in most cases) and the Kim model in the GUI.\n\nGenerally, on MVSEP's multisong dataset these models received different SDRs than on the MDX21 dataset back in the day.\n\nOn MVSEP there’s the 9.7 (NET 1) model, and it doesn't apply any cutoff above the training frequency to inverted instrumentals like the GUI currently does. 
For the (new) model, it’s the vocal 423 model, possibly with Demucs 2 enabled like in the Colab, but it doesn’t have the characteristic jagged spectrum above the MDX training frequency that is specific to inverted vocal 4XX models from that period, including Kim’s model.\n\nNon-ONNX version of the voc\\_ft model in .pth format by MusicMan - 20x faster on MPS devices:\n\n<https://discord.com/channels/708579735583588363/887455924845944873/1204148534790852608> (roughly the same model size)\n\nIt won’t work in UVR. Inference code mirror: <https://drive.google.com/file/d/1aSe0bwgIWhR7vvF1aoHQlCHpj39Kd-YK/view?usp=sharing>\n\nMirror:\n\n<https://drive.google.com/drive/folders/16QbwuCBT0_w9nmNDg22m1niq0odtaZUP?usp=sharing>\n\nAnd the rest of the MDX-Net v2 models: HQ\\_1-5, inst3, Kim inst, Kim Vocal 1-2, the older narrowband vocal and instrumental ones, and the Karaoke models.\n\n#### **(the old) Google Colab by HV**\n\n(with OG Demucs 2 ensembling for vocal models) <https://colab.research.google.com/drive/189nHyAUfHIfTAXbm15Aj1Onlog2qcCp0?usp=sharing>\n\nAdd a separate cell as follows, or else it won’t work:\n\n!pip install torch==1.13.1 (probably with numpy 1.25 for this old Torch)\n\nIf you're still getting errors, delete the whole MDX\\_Colab folder, terminate your session, make a clean installation afterward, and don't forget to run this torch line after mounting (this might happen if you manually replaced models.py with one of the ones below and didn't restore the correct old one).\n\n(The Colab to use MDX easily in Google’s cloud. 
Newer models are not included, and it gives an error if you add other models manually: a custom models.py is necessary. Only the 9.7 [NET 1-3] and karaoke models are included above)\n\n(In case of “RuntimeError: Error opening 'separated/(trackname)/vocals.wav': System error.”, simply retry)\n\nMore MDX models are explained in the UVR section at the beginning of the document, since they're part of the UVR GUI now.\n\nOptionally, the 423 model can be downloaded separately [here](https://1.filedit.ch/1/wIfYtOBtLvIIsKKbHPK.rar) (just in case, it’s main). It's on MVSEP as well.\n\n#### (defunct) Upd. by KoD & DtN & Crusty Crab & jarredou, HV (12.06.23) (probably now requires !pip install numpy==1.26 and restarting the env). It might have more models than above (e.g. some beta HQ ones)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nThe newest MDX Colabs, now with automatic model downloading (no more manual GDrive model installation as in older updates). Consider everything in the divided section below as unnecessary.\n\n<https://colab.research.google.com/github/kae0-0/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb> (stable; lacks voc\\_ft and batch processing, plus manual parameter loading per model like in the two above)\n\n<https://colab.research.google.com/github/jarredou/Colab-for-MDX_B/blob/main/MDX_Colab.ipynb> (Beta. Might lack HQ\\_3 and voc\\_ft. It supports batch processing. 
It works with a folder as input and will process all files in it.\n\n\"tracks\\_path\" must be a folder containing (only) audio files (not a direct link to a file).\n\nBut the one below might still work.)\n\n<https://colab.research.google.com/drive/1CO3KRvcFc1EuRh7YJea6DtMM6Tj8NHoB?usp=sharing> (older revision, also with the automatic model downloader, but with manual n\\_fft, dim\\_f, and dim\\_t parameter settings like HV added)\n\nand the working one by HV linked at the top:\n\n<https://colab.research.google.com/github/NaJeongMo/Colab-for-MDX_B/blob/main/MDX-Net_Colab.ipynb>\n\n(new one by HV with community edits - 2025)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nOld update from before the model downloader was implemented (May of which year?)\n\n[MDX Colab](https://colab.research.google.com/drive/1MssLvN06i4gUC-N6QiEgeRdq5IyBiyem) with separate inputs for the 3 model parameters, so you don’t need to change models.py every time you switch to another model. Settings for all models are listed in the Colab. It now uses the reworked main.py and models.py, downloaded automatically (made by jarredou). From now on, don’t replace models.py with the ones from the packages below. A denoiser is now optionally added as well.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n(older Colab instructions)\n\nTo use more recent MDX-UVR models in Google Colab:\n\n1. Use and install this [Colab](https://colab.research.google.com/drive/1NV6Ewjn9CLSrKFGCZGFFgd5nG45_Ow7T?usp=sharing) ([new](https://colab.research.google.com/drive/1MssLvN06i4gUC-N6QiEgeRdq5IyBiyem)) to GDrive at least once and run all the cells, nothing more. If you've used the MDX HV Colab (the one in the section above) on your specific Google Drive account before, skip this step.\n2. 
Copy these files to the onnx folder in MDX\\_Colab on your GDrive (inst1-3, 427) (down) <https://drive.google.com/drive/folders/13SsV7b_kC6SqkICeX5wKhx-Z05uC8dLl> (down)\n3. Overwrite models.py in the MDX\\_Colab folder with the one provided below (not for the new Colab)\n\n(compatible with inst1-3, 427, Kim vocal and others)\n\n<https://cdn.discordapp.com/attachments/945913897033023559/1036947933536473159/models.py> (a completely different one, with self.n\\_fft set to 7680 - incompatible with NET-1/9.x and 496 models)\n\n4. Use this notebook with the added models\n\n(the same as the link in point 1):\n\n<https://colab.research.google.com/drive/1zx7DQM-W9i7MJuEu6VTYz1xRG6lKRKVL?usp=sharing>\n\n5. For Kim's vocal model (poor instrumentals on Colab and no cutoff after inversion), copy vocals.onnx\n\n(use the same models.py from point 3): <https://drive.google.com/drive/folders/1exdP1CkpYHUuKsaz-gApS-0O1EtB0S82?usp=sharing>\n\nto the onnx subfolder named \"MDX-UVR-Kim Vocal Model (old)\"\n\n6. For the 496 inst model (inst main/MDX 2.1), go to the link below and put the model into the onnx subfolder named “*MDX-UVR Ins Model 496 - inst main-MDX 2.1*”. You must also replace models.py in your GDrive with the one attached in the link (it’s from the OG HV Colab). It is incompatible with the rest of the models in this new Colab, so make a copy of/rename the previous models.py to be able to go back to it\n\n(The 496 model is not as effective as 464/inst3, leaving more vocal residues in some cases, but it might work well in specific scenarios). 496 is the only model requiring the old models.py from the 9.7/NET1-3 models (attached below). <https://drive.google.com/drive/folders/1iI_Zvc506xUv_58_GPHfVKpxmCIDfGhx?usp=share_link> (if you place the model in the wrong place, you’ll get a missing vocals.onnx error [e.g. wrong folder structure or name] or “Got invalid dimensions for input: input for the following indices index: 2 Got: 3072 Expected: 2048.” [when using the wrong models.py])\n\n7. 
Demucs turned on works only with the default mixing algorithm and vocal models (or else you’ll get “ValueError: operands could not be broadcast together with shapes (8886272,2) (8886528,2)”). Also, chunks might have to be decreased.\n8. Be aware that after following these steps, if you launch the old HV Colab above, it may overwrite models.py with the old one from point 6, which is compatible only with inst main/496 or full band models, so you'll need to repeat step 3 or 10 in case of an invalid dimensions error or a cutoff with a full band model.\n9. In case of a runtime error, to use the Kim model, decrease chunks from 55 to 50, and with Demucs on, decrease it to 40 (or even lower, respectively)\n10. (beta) Full band beta 292 model (with a new models.py file that only works for this model, with self.n\\_fft changed to 6144).\n\nGo to the link below, copy the model file to the onnx subfolder called “MDX-UVR Ins Model Full Band 292” as in the link, and replace models.py (ideally back up/rename the old one so you can still use the previous models)\n\nThanks to Kim for the help\n\n<https://drive.google.com/drive/folders/1CTJ6ctldr_avwudua1qJJMPAd7OrS2yO?usp=sharing>\n\n11. (beta) Full band beta 403 model (with the same modified models.py for these two models)\n\nCopy the model file to:\n\n“Gdrive\\MDX\\_Colab\\onnx\\MDX-UVR Ins Model Full Band 403\\” as in the link below, and replace models.py in Gdrive\\MDX\\_Colab\n\n<https://drive.google.com/drive/folders/1UXPxQMVAocpyDVb3agXu0Ho_vqFowHpA?usp=sharing>\n\n12. (final) Full band 450/HQ\\_1 model (with the same modified models.py for the full band models)\n\nCopy the model file to:\n\n“Gdrive\\MDX\\_Colab\\onnx\\MDX-UVR Ins Model Full Band 450 (HQ\\_1)\\” as in the link below, and replace models.py in Gdrive\\MDX\\_Colab (if you haven't already for the full band models)\n\n<https://drive.google.com/drive/folders/126ErYgKw7DwCl07WprAXWPD_uX6hUz-e?usp=sharing>\n\n13. From now on, you have to run the newly added torch cell separately to fix PyTorch issues\n14. 
Newer full band 498/HQ\\_2 model (with the same modified models.py for the full band models)\n\nCopy the model file to:\n\n“Gdrive\\MDX\\_Colab\\onnx\\MDX-UVR Ins Model Full Band 498 (HQ\\_2)\\” as in the link below, and replace models.py in Gdrive\\MDX\\_Colab (if you haven't already for the full band models)\n\n<https://drive.google.com/drive/folders/1O5b-uBbRTn_A9B2QkefklCT41YR9voMq?usp=sharing>\n\n15. For full band models, use only the modified models.py attached above, or you’ll get a cutoff at 14.7kHz instead of 22kHz in spectrograms when using the 427 models.py file.\n16. For the Kim ft other instrumental model, with a cutoff but the highest SDR (even higher than inst3):\n\nCopy both (vocals and other) model files to:\n\n“Gdrive\\MDX\\_Colab\\onnx\\Kim ft other instrumental model\\” as in the link below, and replace models.py in Gdrive\\MDX\\_Colab (if you haven't already for the full band models)\n\n<https://drive.google.com/drive/folders/1v2Hy4AgFOJ9KysebGuOgn0rIveu510j6?usp=sharing> (it will give only 1 stem output; the duplicated models fix errors in Colab; models.py is from the inst3 model)\n\n17. If you use models.py from the full band model, it will output full band for the ft other model, but with much more vocal residue. It still might be even better than VR models in some busy mix parts like the chorus, where it still has less vocal residue. Definitely use min\\_mag here.\n18. To fix the following error, make sure both vocals and invert vocals are always checked:\n\n*shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory*\n\n*Intel MKL FATAL ERROR: Cannot load* /usr/local/lib/python3.9/dist-packages/torch/lib/libtorch\\_cpu.so.\n\nThe above error can also mean you need to terminate your session and start over. It happens randomly after using the Colab.\n\n19. 
I've reverted the old \"Karokee\" and \"Karokee\\_AGGR\" models for use with HV’s oldest [models.py](https://cdn.discordapp.com/attachments/708595418400817162/1102627615803723817/models.py) file, but these are old models (maybe they will do the trick, though).\n20. *ModuleNotFoundError: No module named 'models'*\n\nSometimes switching models.py doesn’t work correctly (especially when working on a previously shared Colab folder with editing privileges). In that case, check in Colab’s file manager whether models.py is actually present after you’ve made a change on GDrive. If not, rename the file back to models.py (it might have been renamed to something else).\n\n21. A collection of all three models.py files for all models, for your convenience:\n\n<https://drive.google.com/drive/folders/1J35h9RYhPFk8dH-vShSW_AUharXY1YsN?usp=sharing>\n\n22. Main\\_406 vocal model\n\n<https://mega.nz/file/dcREzKTR#PYKk3s1NPicC3mBBYH8ejC2rK_Im3sAj0p9xcOi1cpE>\n\n\"compensate\": 1.075,\n\n\"mdx\\_dim\\_f\\_set\": 3072,\n\n\"mdx\\_dim\\_t\\_set\": 8,\n\n\"mdx\\_n\\_fft\\_scale\\_set\": 7680,\n\nThe models included here are only: baseline; the instrumental models 415 (inst\\_1), 418 (inst\\_2), and 464 (inst\\_3), trained at 17.7kHz; the vocal model 427; Kim’s vocal model (old) (the instrumental should be made automatically by the inversion option, but it’s not a very good model for that); and the 292 and 403 full band models. 
If you want to use the older 9.7 models, use the old HV Colab above.\n\n464/inst 3 should be better than the previous 9.x models for instrumentals and vocals in most cases, but depending on the song, 418 can achieve better results in up to half of the cases, while full band 403 might give better results than inst3/464 in half of the cases.\n\n**Settings**\n\nmax\\_mag is for vocals\n\nmin\\_mag is for instrumentals\n\ndefault\n\n(deleted from the new HV Colab, still in the Kae Colab above)\n\nBut \"min mag solve some unwanted vocal soundings, but instrumental [is] more muffled and less detailed.\"\n\nAlso check out the “default” setting (whatever that is; compare checksums if it's not one of these).\n\nChunks\n\nAs low as possible, or disabled.\n\nThe equivalent of min\\_mag in UVR is min\\_spec.\n\nBe aware that UVR5, as opposed to the MDX Google Colab, applies a cutoff to the inverted output matching the training frequency, e.g. 17.7kHz for the inst 1 and 3 models. This is to avoid some noise and vocal leftovers. You might have to apply it manually.\n\nAlso, you can uncomment the visibility of the compensation value in the Colab and change it to e.g. 1.08 to experiment.\n\nThe compensation value for the 464 MDX-UVR inst. model is 1.0568175092136585\n\nThe default 1.03597672895 is for the 9.7 model, and it also does the trick with at least the Kim (old) model in the GUI (where 1.08 had a worse SDR).\n\nOr check + 3.07 in a DAW (it worked on the Karokee model).\n\nIn the Colab above, I also enabled the visibility of the max\\_mag (for vocals) and min\\_mag (for instrumentals) settings (mixing\\_algoritm).\n\nAlso, if you want to use the Demucs option (ensemble) in the Kae Colab, note that it uses stock Demucs 2, whereas UVR5 was rewritten to use Demucs-UVR models with Demucs 3, or the currently even better Demucs 4.\n\nAccording to MVSEP SDR measurements, for ensembles Max Spec/Min Spec was better than Min Spec/Max Spec, but Avg/Avg was still better than both.\n\nAlso, for ensembles, Avg/Avg is better compared to e.g. 
Max Spec/Max Spec: 10.84 vs 10.56 SDR in the other result.\n\nHow the denoiser works\n\nIt's not frequency-based. It processes “the audio in 2 passes, one pass with inverted phase, then after processing the phase is restored on that pass, and both passes mixed together with gain \\* 0.5. So only the MDX noise is phase cancelling itself.”\n\nOr the other way round:\n\n“it's only processing the input 2 times, one time normal and one time phase inverted, then phase restored after separation, so when both passes are mixed back together only the noise is attenuated. There's no other processing involved”\n\nThe denoiser serves to fix the so-called MDX noise present in all inst/voc MDX-NET (v2) models.\n\n\\_\\_\\_\\_\\_\\_\n\nWeb version (32-bit float WAV output for instrumentals; just use MDX-B for single MDX-Net models).\n\nIt was the 9.682 MDX-UVR model in 2021, but at the end of 2022 it's probably inst 1, judging by SDR (not sure, as the results are not exactly the same). Later, more models were added (e.g. HQ\\_3):\n\n<https://mvsep.com/>\n\nWeb version (paid for MDX, lossless):\n\n<https://x-minus.pro/>\n\nIn the kae Colab, you can keep the Demucs option off (ONNX only). It may provide better results in some cases, even with the old MDX narrowband (non-HQ) models.\n\nIn the Colab, you can change chunks to 10 if your track is shorter than 5:00. It will take a bit more time and the result will be a bit cleaner, but more vocal residues can kick in (esp. 
short sudden ones).\n\nBe aware that the MDX Colabs for single models have 16-bit output.\n\nAlso, the noise cancellation implementation for MDX models in the kae and HV Colabs can differ a bit, and there is also a separate denoise method available as a separate model.\n\nCode for the denoise method in the HV Colab is [here](https://discord.com/channels/708579735583588363/887455924845944873/1021652469320781834).\n\nAs for all other settings, just use the defaults, since they're the best and up to date.\n\nFor vocals alone, it’s one of the best free solutions on the market, very close to the results of the paid and (partly) closed Audioshake service (#1 AI in a Sony separation contest; SDRs are from the contest evaluation based on a private dataset). A very effective, high-quality instrumental isolation *AI* and custom model (but the old models are trained at 14.7kHz [NET-X a.k.a. 9.x], compared to VR models, and the newer models like inst X and Kim inst at 17.7kHz).\n\nIn most cases, MDX-UVR inverted models give less bleeding than VR (especially with bassy voices). Occasionally the result can be worse than with VR above, especially in terms of high-end frequency quality, but in general, MDX with UVR team models performs best for vocals and instrumentals.\n\nEven the instrumental obtained by inverting the output of vocal models is less impaired than in VR, since the vocal filtering is less aggressive, but with even more bleeding in some cases. It depends on the song.\n\nYou can support the creators of UVR on <https://www.patreon.com/uvr> or <https://boosty.to/uvr>, where the newest MDX model is also available, or visit <https://x-minus.pro/> to get an online version of MDX as well (with exclusive paid models).\n\nAt least a paid x-minus subscription allows you to use the MDX HQ\\_2 498 (or already HQ\\_3) instrumental model, 2\\_HP-UVR (HP-4BAND-V2\\_arch-124m) for the VR arch, and Demucs 6s on their website. Feel free to listen to and download lots of instrumentals already uploaded on x-minus. 
Dozens of instrumentals are available.\n\n*Outdated*\n\nAlternatively, you can experiment with the 9662 model and ensemble it with the latest UVR 5's 4 band V2 using -a min\\_mag, as Anjok suggested (but that was before the new models were released).\n\nFor remote use, I only know of an old Colab which ensembles any two audio files, but if I'm not mistaken it uses the old algorithm, so it is not as good (better to use the ensemble Colab linked at the very top of the document):\n\n<https://colab.research.google.com/drive/1eK4h-13SmbjwYPecW2-PdMoEbJcpqzDt?usp=sharing>\n\n\\_\\_\\_\\_\\_\n\nNote\n\nDon’t disable invert\\_vocals in the Colab, even if you only need the vocals instead of the instrumental, otherwise the Colab will end up with an error.\n\n*MDX noise*\n\nThere is noise when using any of the MDX-UVR inst/vocal models, and it’s model dependent (iirc 4-stem models don’t have it). It's fixed in the Colabs using a denoiser: \"however by using my method, conversions will be 2x slower as it needs to predict twice.\n\nI see no quality degradation at all, and I can't believe it actually worked rofl\" -HV\n\nAlso, UVR 5 GUI has the same noise filtering implemented (if not better, also with an alternative model).\n\nThe current MDX Colab has a normalization feature that “normalizes all input at first and then changes the wave peak back to original. This makes the separation process better, also less noise. 
IDK if you guys have tried this, but if you split a quiet track, and normalize it after MDX inference the noise sounds more audible than normalizing it and changing the peak back to original.”\n\nIf you want to experiment with the MDX sound, the Colab from before that change is below:\n\n<https://colab.research.google.com/drive/1EXlh--o34-rzAFNEKn8dAkqYqBvhVDsH?usp=sharing> (might no longer work due to changes Google made to the Colab environment; the last maintained ones are the kae and HV (new) Colabs)\n\nFurthermore, you can also try manually mixing the vocals with the original track using phase inversion. Apply a specific gain to the vocal track (+1.03597672895 or +3.07) for the 9.7 model (other models need different values), using both this and the Colab below, and save the result as 32-bit float (this might have more bleeding, but it uses 32-bit while chunking):\n\n<https://colab.research.google.com/drive/1R32s9M50tn_TRUGIkfnjNPYdbUvQOcfh?usp=sharing#scrollTo=lkTLtOvyBuxc>\n\n(e.g. the best compensation value for the 464 MDX-UVR inst. 
model is 1.0568175092136585,\n\nand it's not constant)\n\nAlso, be aware that MVSEP uses 32-bit for MDX-UVR models, including the ready-made inversion of any model.\n\nIf you're looking to eliminate the noise from MDX-UVR instrumentals, the method described in Zero Shot below might also work.\n\n\"I just run the MDX vocals thru UVR to remove any remaining buzz noises and synths, it works great so far\" (probably meant one of the VR models)\n\nAn average track in Colab is processed in 1:00-1:30 minutes using the slower Tesla K80 (much faster than even UVR’s HP-4BAND-V2\\_arch-124m model).\n\nIf you want to get rid of some artifacts, you can further process the output vocal track from MDX through Demucs 3.\n\nOptions in the old HV MDX Colab or the kae fork Colab (from the very top)\n\nDemucs model in the older MDX-Net Colab\n\nWhen it's enabled, it sounds better to me when used with the old narrowband 9.X and newer vocal models, as the Demucs 2 model is full band. However, opinions on the superiority of this option are divided, and the MVSEP dev made some SDR calculations where it achieved worse results with Demucs enabled. But be aware that inverted results from narrowband models are still full band despite the narrowband training frequency, as there’s no cutoff matching in the Colab (in UVR GUI it's implemented as a separate option). Using a cutoff matching the training frequency (which can be observed in the non-inverted stem) might lead to less noise and fewer residues in the results. The Demucs model will work correctly only with vocal models in the Colabs. We didn’t have any MDX instrumental models back then, so the naming scheme is reversed for these models. Hence, the Demucs model combined with an instrumental model produces distorted sound, mixing vocals with the instrumental in a weird way.\n\n“The --shifts=SHIFTS performs multiple predictions with random shifts (a.k.a. the shift trick) of the input and average them. This makes prediction SHIFTS times slower but improves the accuracy of Demucs by 0.2 points of SDR. 
It has limited impact on Conv-Tasnet as the model is by nature almost time equivariant. The value of 10 was used on the original paper, although 5 yields mostly the same gain. It is deactivated by default, but it does make vocals a bit smoother.\n\nThe --overlap option controls the amount of overlap between prediction windows (for Demucs one window is 10 seconds). Default is 0.25 (i.e. 25%) which is probably fine.”\n\nYou can even try 0.1, but for Demucs 4 it decreases SDR in ensembles if you’re trying to separate a track containing vocals. If it’s an instrumental, then 0.1 is the best (e.g. for drums).\n\n(outdated/for offline use/added to Colab)\n\nHere's the new MDX-B Karokee model! <https://mega.nz/file/iZgiURwL#jDKiAkGyG1Ru6sn21MkIwF90C-fGD0o-Ws58Mn3O7y8>\n\nThe archive contains two versions: normal and aggressive. The second removes the lead vocals more. The model was trained using a dataset that I completely created from scratch. There are 610 songs in total. We ask that you please credit us if you decide to use these models in your projects (Anjok, aufr33).\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n## **Demucs 3**\n\nfor 4 stems\n\n(SDR 7.7 for 4 stems; it’s better than Spleeter (SDR 6.5-7) and also better than MDX 4 stem. 
In most cases, it’s even better than Audioshake - at least on tracks without lead guitar)\n\nAccompanied by the MDX-UVR 9.7 vocal model, it gives very good 4-stem separation results.\n\n(For Demucs 4, a.k.a. \"htdemucs\", check below)\n\n<https://colab.research.google.com/drive/1yyEe0m8t5b3i9FQkCl_iy6c9maF2brGx?usp=sharing> (by txmutt), alternatively with [float32](https://colab.research.google.com/drive/1Q-mD-ypAoaaTzmt52nMiml10Wz0NgOMT?usp=sharing) here\n\nOr <https://huggingface.co/spaces/akhaliq/demucs>\n\nOr <https://mvsep.com/>\n\nPick Demucs Model B from the list there.\n\nYou can export result files in MP3 320kbps, WAV and FLAC. The file size limit is 100MB, and there is a 10-minute audio length limit.\n\nTo use Demucs 3 locally: <https://discord.com/channels/708579735583588363/777727772008251433/909145349426384917>\n\nCurrently, the main branch of the code is Demucs 4 (previously HT), but these Colabs use the old mdx\\_extra model.\n\nThe 2-stem Demucs 3 UVR models are only available on MVSEP.com or in the UVR5 GUI. They give nice results in cases when you suffer from vocal bleeding in regular UVR5, GSEP, or MDX 9.7. Model 1 is less aggressive, model 2 is more destructive, and the model bag has the most bleeding of all three.\n\nIn Colab, judging by the quality of the drums track, I prefer using overlap 0.1 (only for instrumentals). However, the default set by the author is 0.25, which is better for the sound of the instrumental as a whole.\n\nBut it still provides decent results with instrumentals.\n\nAlso, HV got better overall separation quality using shifts=10, but it increases separation time (this is also reflected in MVSEP's SDR calculations). Later we found out it can be further increased to 20.\n\nAlso, I have a report that you may get better results in Demucs by using a previously separated instrumental (e.g. from UVR) as input.\n\nAnjok’s tip for better instrumentals: “I recommend removing the drums with the Demucs, then removing the vocals and then mixing the drums back in”. 
It yields much better results than a simple ensemble.\n\nIt works best in cases when drums get muffled after isolation, e.g. in hip-hop. You need to ensure that tracks are aligned correctly. E.g. if you isolate a drumless UVR track, also isolate the regular track. That makes it easier to align the drumless UVR track with the drums track from Demucs; otherwise, it will be hard to find the same peaks. Then simply align the drumless UVR track the same way the regular track is aligned, and mute/delete the regular UVR (instrumental) track.\n\nBe aware! This is not a universal solution for the best isolation in every case. E.g. in tracks with a busy mix like Eminem - Almost Famous, the guitar in the background can get impaired, and even the drums. UVR tends to impair guitars in general, but on the drumless track it was even more prevalent; in that case, normal UVR separation did a better job.\n\n**Also**, if you slow down the input file, it may allow you to separate more elements in the “other” stem.\n\nIt also works when you need an improvement for instruments like snaps, human claps, etc.\n\nNormally, the instrumental sounds choppy when you revert it to normal speed. The trick is - \"do it in Audacity by changing sample rate of a track, and track only (track menu > rate), it won't resample, so there won't be any loss of quality, just remember to calculate your numbers\n\n44100 > 33075 > 58800\n\n48000 > 36000 > 64000\n\n(both would result in x 0.75 speed)\n\netc.\".\n\nAlso, dithering is enabled in Audacity by default. It might be worth disabling it in some cases. It may not help, but it's still worth trying out, as there should be less noise.\n\nBTW, if you have some drum remnants in an acapella from UVR or MDX, simply use Demucs and invert the drums track.\n\n“The output will be a wave file encoded as int16. 
You can save as float32 wav files with --float32, or 24 bits integer wav with --int24”, but it doesn’t seem to work in Colab.\n\n## Demucs 4 (+ Colab) (4, 6 stem)\n\n4 stems: SDR 9 for vocals on the MUSDB HQ test set, and SDR 9 for mixed-down instrumentals. The 5/6-stem models add experimental piano [bad] and guitar.\n\n<https://github.com/facebookresearch/demucs> (all these models are available in the UVR 5 GUI or on MVSEP [only x-minus doesn’t have the ft model, at least for free users; it had the mmi model at some point, but it then got replaced by MDX-B, which “turned out to be not only higher quality, but also faster”])\n\nGoogle Colab (all 4-6 stem models available, 16-32 bit output)\n\n<https://colab.research.google.com/drive/117SWWC0k9N2MBj7biagHjkRZpmd_ozu1>\n\nor a Colab with an upload script that doesn't require Google Drive:\n\n<https://colab.research.google.com/drive/1dC9nVxk3V_VPjUADsnFu8EiT-xnU1tGH?usp=sharing>\n\nor a Colab by Bezio with batch processing (only mp3 output and no overlap/shifts parameters besides model choice; choose demucs\\_ft for 4 stems):\n\n<https://colab.research.google.com/drive/15IscSKj8u6OrooR-B5GHxIvKE5YXyG_5?usp=sharing>\n\nor a Colab with batch processing by jarredou (less friendly GUI, but it should be usable too; lossless output):\n\n<https://colab.research.google.com/drive/1KTkiBI21-07JTYcTdhlj_muSh_p7dP1d?usp=sharing>\n\n\"I'd recommend using the “htdemucs\\_ft” model over normal “htdemucs” since IMHO it's a bit better\", and SDR measurements confirm that. The 6s model might have more vocal residues than both, but it will be a good choice in some cases (possibly songs with guitar).\n\nThe best stock models:\n- htdemucs\\_ft (f7e0c4bc-ba3fe64a.th, d12395a8-e57c48e6.th, 92cfc3b6-ef3bcb9c.th, 04573f0d-f3cf25b2.th [drums, bass, other, vocals])\n\n“fine-tuned version of htdemucs, separation will take 4 times more time but might be a bit better. 
Same training set as htdemucs”.\n\nIt can be obtained with UVR5 in the Download Center (04573f0d-f3cf25b2.th, 92cfc3b6-ef3bcb9c.th, d12395a8-e57c48e6.th, f7e0c4bc-ba3fe64a.th; not in order)\n\n- htdemucs - “first version of Hybrid Transformer Demucs. Trained on MusDB + 800 songs.”\n\nDefault Demucs model in e.g. UVR5 (955717e8-8726e21a.th)\n\n- htdemucs\\_mmi = Hybrid Demucs v3, retrained on MusDB + 800 songs\n\n- htdemucs\\_6s = “6 sources version of htdemucs, with piano and guitar being added as sources. Note that the piano source is not working great at the moment.”\n\n“nowhere near Logic Pro” from the May 2025 update.\n\n- mdx\\_extra: The best Demucs 3 model from the MDX 2021 challenge. Trained with extra training data (including MusDB test set), ranked 2nd on the track B of the MDX 2021 challenge.\n\n- mdx\\_extra\\_q: a slightly worse quantized version of the above (a bit faster)\n\nBe aware that the UVR team and ZFTurbo [available on MVSEP and GitHub] also trained their own Demucs models (instrumental and vocal ones, respectively). However, there are some issues with ZFTurbo's model when using inference code other than the one provided on his GitHub. So far it’s not compatible with e.g. UVR, which gives “KeyError: \"'models'\"” for .ckpt Demucs models, as opposed to .th ones.\n\nTo use the best Demucs 4 model in the official Colab (the 2nd link), set the model name to e.g. “htdemucs\\_ft”. 
It can behave better than 6 stems if you don’t need extra stems.\n\nIn other cases, extra stems will sound better in the mix. However, with the 6s model, vocal residues are usually louder than with the ft model (but that might depend on the song or genre).\n\nDespite the fact that 6s is an electric guitar model, it can also pick up acoustic guitar very well in some songs.\n\nThe problem with the 6s model appears “when a song has a piano, because not only the piano model is not the best, but it also makes the sound itself worse; rather than just very filtered piano, it sounds like distorted filtered piano”\n\nSometimes GSEP can be “still better because each stem has its dedicated model”, but it depends on the song. The other stem in GSEP can be better more frequently, but now the MDX23 jarredou fork or Ensemble models on MVSEP return good other stems as well.\n\nBecause GSEP doesn't invert the whole result among the stems like Demucs does, it occasionally won’t preserve all the instruments.\n\n\"htdemucs (demucs 4) comes a bit closer [vs Gsep], most of the time the bass is better and there are few instances where demucs picks up drums better\"\n\n“From my experience and testing: If you decide to process an isolated track through Demucs, it has no trouble identifying what is bass guitar and what isn't bass guitar [does not matter if it's finger/pick/slap, it works on all of them for me, except distorted wah-wah bass]. The leftover noise [the part's that demucs did not pick up, and left it in the (No Bass) stem] is usually lower than minus 40 - 45 DB, and it's either noise, or hisses usually.\n\nThe problem comes when there are instruments besides the bass guitar that are playing beside it [a.k.a. music], since these are separation models, not identification models. 
It starts having trouble grabbing all the upper harmonics [which is the multiple of the root note frequency], and the transients, potentially starts mis-detecting, or in extreme cases, it does not pick up the bass at all.”\n\n“When used with \"--shifts\" > 0, demucs gives slightly different results each time you use it, that can also explain some little score differences”\n\n<https://github.com/facebookresearch/demucs/issues/381#issuecomment-1262848601>\n\nInitially, shifts 10 was considered the max, but it turned out 20 can be used.\n\nOverlap 0.75 is the max before it gets very slow, and 0.95 is where it becomes overkill.\n\nWe also thought overlap 0.99 was the max, but it turned out you can use 0.99999 in UVR and 0.999999 in CLI mode. Both make separations tremendously long; even 0.999 takes much longer than 0.99.\n\nOn a GTX 1080 Ti, for a 1-minute song:\n\n`0.99` = Time Elapsed: `00:09:45`\n\n`0.999` = Time Elapsed: `01:36:45`\n\nAlso, shifts can be set to 0.\n\nWith htdemucs\\_ft, shifts don't matter nearly as much as overlap. I recommend keeping shifts at 2 [for weaker GPUs].\n\nThe difference in drum SDR between 1 and 10 shifts is about 0.005.\n\nSo overlap impacts SDR a bit more than shifts.\n\n“The best way to judge optimum settings is to take a 10-second sample of a vocal extraction where there's evident bleeding and just keep trying higher overlaps etc until you're happy, or you lose patience, then you'll arrive at what I call the 'Patience Ratio'. 
For me, it's 2x song length.”\n\n*Installation of only Demucs for Windows*\n\nUse UVR, or:\n\nDownload the git repo, extract it, then open PowerShell and write\n\n\"pip install \\*insert the directory of the extracted repo here\\*\"\n\n<https://github.com/facebookresearch/demucs#egg=demucs>\n\nAlternatively, execute this command:\n\n`pip install git+https://github.com/facebookresearch/demucs#egg=demucs`\n\nIn case of a “norm\\_first” error, run this line or update torch to 1.13.1:\n\n`python.exe -m pip install -U torch torchaudio`\n\nIn Colab, judging by the quality of the drums track, I prefer using overlap 0.1 (better only for instrumentals) with shifts 10 (it can actually be set to even 20). However, the default set by the author is 0.25, which is better for the sound of the instrumental as a whole.\n\nAlso, we got better overall separation quality using shifts=10, but it increases separation time (this is also reflected in MVSEP's SDR calculations). Higher overlap also increases general separation quality for instrumentals/vocals, at least up to 0.75. Everything above that starts being tremendously slow (a few hours for the 0.99 max setting).\n\nIf you use a particularly high overlap like 0.96 for a full-length song, you can hit the Colab time limit if it’s not the first file processed during this session (for cases when processing takes more than 1 hour). If you exceed the limit, you can switch Google accounts in the top right (don’t use a different account during mounting, or you’ll end up with an error). The limit resets after 12 hours (maybe sooner). Colab is capable of processing one file for two hours, but only if it’s the first file processed for a longer time that day. 
Also, rarely, your file may be processed faster than usual despite the same T4 GPU.\n\nIf you get the *“something has gone terribly wrong”* error right at the start of separation, simply retry. If it appears at the end of a long separation, ignore it and don’t retry; your result is in the folder.\n\n- \\*clipclamp\\* - uncheck it to disable the hard limiter. Be aware that this may cause separation artifacts on some loud input files or change the volume proportions of the stems. I somehow like it enabled.\n\n- Q: How to stop Demucs from rescaling the volume of stems after they're extracted (without adjusting the volume of the input mixture and passing --clip-mode=clamp)?\n\nA: Use the “--clip-mode none argument coupled with export to --float32” (jarredou)\n\n[Picture](https://imgur.com/a/0VxJdz4)\n\n*Demucs parameters explained by jarredou*\n\n- “Overlap is the percentage of the audio chunk that will be overlapped by the next audio chunk. So it's basically merging and averaging different audio chunks that have different start (& end) points.\n\nFor example, if audio chunk is `|---|` with overlap=0.5, each audio chunk will be half overlapped by next audio chunk:\n\n```\n        |---|\n      |---|\n    |---| etc...\n  |---| (2nd audio chunk half overlapping previous one)\n|---| (1st audio chunk)\n```\n\n- Shifts is a random value between 0 and 0.5 seconds that will be used to pad the full audio track, changing its start (& end) point. When all \"shifts\" are processed, they are merged and averaged. (...)\n\nIt's to pad the full song with silence of a random length between 0 and 0.5 sec. Each shift adds a pass with a different random length of silence added before the song. When all shifts are done (and silences removed), the results are merged and averaged.\n\nShifts performs worse than overlap because it is limited to that 0.5 seconds max value of shifting, while overlap is shifting progressively across the whole song. 
Both work because they are shifting the starting point of the separations. (Don't ask me why that works!)\n\nBut overlap with high values is kinda biased towards the end of the audio, it's caricatural here but first (chunk - overlap) will be 1 pass, 2nd (chunk - overlap) will be 2 passes, 3rd (chunk - overlap) will be 3 passes, etc…”\n\nSo overlap has more impact on the results than shifts.\n\n“Side-note: Demucs overlap and MVSEP-MDX23 by ZFTurbo overlap features are not working in the same way. (...)\n\nDemucs is kinda crossfading the chunks in their overlapping regions, while MVSep-MDX23 is doing avg/avg to mix them together”\n\nWhy is overlapping advantageous?\n\nBecause changing the starting point of the separation gives slightly different results (I can't explain why!). The more you move the starting point, the more different the results are. That's why overlap performs better than shifts limited to 0-0.5sec range, like I said before.\n\nOverlap in Demucs (and now UVR) also crossfades overlapping chunks, which probably also reduces the artifacts at audio chunk/segment boundaries.\n\n[So technically, if you could load the entire track in at once, you wouldn't need overlap]\n\nShifts=10 vs 2 gives +0.2 SDR with overlap=0.25 (the setting they've used in their original paper). If you use a higher value for overlap, the gain will be lower, as they both rely on the same \"trick\" to work.\n\nShifts=X can give a little extra SDR as it's doing multiple passes, but will not degrade \"baseline\" quality (even with shifts=0)\n\nLower than recommended values for segment will degrade \"baseline\" quality.\n\nSo in theory, you can equally set shifts to 0 and max out overlap.\n\nThe optimum segment size (in UVR beta/new) is 256.\n\n## Gsep (2, 4, 5, 6 stem, karaoke)\n\n<https://studio.gaudiolab.io/>\n\nPaid (20 minutes free in mp3 - no credit card required)\n\n$7/60 minutes\n\n$16/240 minutes\n\n$50/1200 minutes\n\nElectric guitar (occasionally bad), good piano, output: mp3 
320kbps (20kHz cutoff), wav only for paid users, accepted input: wav 16-32, flac 16, mp3, m4a, mp4. Don’t upload files over 100MB; also, 11-minute files may fail on some devices with the Chrome \"aw snap\" error. It is capable of isolating crowd and sound effects in some cases. Ideally, upload 44kHz files with at least a 320kbps bitrate to always get the maximum mp3 320kbps output for free.\n\n2025 metrics for 2 stems:\n\n<https://mvsep.com/quality_checker/entry/9095>\n\n*(outdated) About its SDR*\n\n[10.02 SDR](https://gaudiolab.com/ai_source_separation_technology/) for the vocal model (vs ByteDance 8.079), seemingly on the MDX21 chart. Newer models without an SDR rating became available from 09.06.22 and later by the end of July, and a new model has been available since 6 September. There were 4 or 5 different vocal/instrumental models in total so far, the last introduced somewhere in September, and no model update was performed with the later UI update. The [MVSEP SDR comparison](https://mvsep.com/quality_checker/leaderboard.php?sort=insrum) chart on their dataset shows it's currently around SDR 9 for both instrumental and vocals, but I think the evaluation done on the [*demixing challenge*](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/leaderboards?challenge_leaderboard_extra_id=869&challenge_round_id=886&post_challenge=true) (first model) was more precise. Be aware that GSEP has an issue of cancelling some sounds, which then can't be found in any stem.\n\nSince the May 2024 update, there has been an average 0.13 SDR increase for mp3 output on the first 19 songs of the multisong dataset evaluation. Judging by the lack of an audible difference for most people, though, they might have simply changed some inference parameters. Actually, it’s muddier now; in some songs there are slightly fewer vocal residues, and in other songs noticeably more. Inverting the mixture with vocals in WAV will muffle the overall sound, e.g. snares, esp. 
in places of these residues, but the residues will disappear as well.\n\nIf WAV download doesn't work, uncheck vocals to download the WAV file, and uncheck instrumental to download vocals in WAV. Don't check all stems if you can't download WAV at all and the download window simply disappears.\n\nIf you still can’t download your WAV files, open Chrome DevTools>Network before starting the download, press CTRL+R, and then start the download. Now both stems should be shown in DevTools>Network, starting with the input file name, e.g. the instrumental ending with “result\\_accom.wav” (usually marked as “fail” in the State column, with xhr as the type). Right-click the entry and choose Open in new tab.\n\nThe download may fail frequently. You may have to resume it manually in the browser multiple times, or wait a bit when the download first starts.\n\nThe free separation option was removed in the May 2024 update. There's only a 20-minute free trial with mp3 output.\n\nVocals and all other stems (including instrumentals/others) are paid, and the length of each stem is deducted from your account separately for each model.\n\nNo credit card is required for the trial.\n\nFor free users, there's only mp3 output and a 10-minute input limit.\n\nFor paid users, there's a 20-minute limit and mp3/wav output. Paid users also get a faster queue, shareable links, and long-term result storage.\n\nIt seems there weren't any changes to the model:\n\n<https://www.youtube.com/watch?v=OGWaoBOkiMg>\n\nSo far, old files from previous separations on your account haven't been deleted if you have premium.\n\n<https://studio.gaudiolab.io/pricing>\n\nA new vocals option called “Vocal remover” was also added - good \"for conservative vocals, it's fine it even has 15 best scoring on SDR.\", with 10.85 SDR for vocals on the multisong dataset.\n\n*Instruction*\n\nLog in, and re-enter the link above if you feel lost on the landing page.\n\nFor instrumental and vocals, simply uncheck drums, 
choose vocal, and two stems will be available for download.\n\nAs for using the 4/5-stem option to get an instrumental by mixing the stems (saving the mix in 24 bit in a DAW like Audacity), it currently produces fewer voice leftovers. However, the instrumental has worse quality and spectrum, probably due to noise cancellation (which is a possible cause of [missing sounds](https://vimeo.com/776925082) in the other stem). Use 5 stems, but cut silence in places where there is no guitar in the guitar stem to get quality comparable to 4 stems in such places.\n\nFor 3-6 stems, you'd better not use the dedicated stem mixing option. Yes, it respects muted stems, so you can get an instrumental that way as well, but the output is always mp3 128kbps. Instead, you can mix down the mp3s to even lossless 64 bit in free DAWs like Audacity or Cakewalk.\n\nIn some very specific cases, you can get slightly better results for some songs by converting your input FLAC/WAV 16 to WAV 32 in e.g. Foobar2000.\n\n*Troubleshooting*\n\n- (fixed for me) Sometimes a very long \"**Waiting**\" (or recently **“Waiting”**) status can disappear after refreshing the site after some time (July 2023). E.g. if you see the “SSG complete” message, you can refresh the site to switch from waiting to the waveform view immediately. I had that once on a fresh account when uploading the very first file on it, and then it stopped happening (later it happened to me on an old account as well).\n\n- (might be fixed too) **If you don’t see all stems after separation** (e.g. while choosing 2 stems, only vocals or only instrumental is shown) and only one stem can be downloaded (can’t be done on mobile browser) - workaround:\n\n- The \"Aw snap\" error on mobile Chrome can also happen with regular FLACs when attempting to download a song. Simply go back to the main page, load the song again, and download it.\n\n- If nothing happens when you press the download button on PC, also go to Chrome DevTools>Network>All and click download again. 
Then new files will appear on the list. Right-click the mp3 file and open it in a new tab to begin the download. Alternatively, log into your account in incognito mode.\n\n- If you get \"An error has occurred. Please reload the page and try again.\", try deleting Chrome on mobile (clearing the cache wasn't enough in one case).\n\n- (rather fixed) If you get the “***no audio***” error every time separation is done, or the preview loading is infinite, or you have only one stem, also:\n\nIn PC Chrome, go to DevTools>Network>[All](https://media.discordapp.net/attachments/708579735583588366/1123682568399749271/image.png?width=1222&height=634) and refresh the audio preview page. New entries will show up on the right. Among them will be filenames made of your input file name with the stem name at the end, e.g. \"rest of targets\".\n\nDouble-click it, or right-click it and choose Open in new tab, and the download will start.\n\nIf no filenames to download appear on the list, press CTRL+R to refresh the site, and now they should appear.\n\nIn specific cases, files in the list won’t show up, and you will be forced to log in to GSEP using incognito mode (the same account and result can be used). Also, make sure you have enough disk space on C:.\n\nAlternatively, clear the site/browser cache (though the latter didn't help me at some point in the past; I don't know about now).\n\nIf it's still the same, use a VPN and/or a new account (all three at the same time, i.e. together with incognito mode, only in very specific cases when everything fails). You can also use a different browser.\n\n- When you see a redirection loop right after logging in, and you see Sign In (?~go to main page), simply enter the main link <https://studio.gaudiolab.io/gsep>\n\n- If you’re getting mp3s with a **bitrate lower than 320kbps**, which is the base maximum quality in this service (e.g. you get 112/128/224kbps output mp3s instead):\n\n> Your input file is probably lossy 48kHz and/or has a bitrate lower than 320kbps. Your file must be at least mp3 320kbps 44kHz (and not 48kHz). 
The same issue exists for the URL option and for Opus files downloaded from YouTube when you rename them to m4a to process them in GSEP. To sum up: GSEP will always match the output bitrate to the input file's bitrate if it’s lower than 320kbps. To avoid this, use a lossless 44kHz file. If you can’t, convert your lossy file to WAV 32 bit, and resample Opus to 44kHz as well, as it’s always 48kHz. For YT files, don’t download AAC/m4a files, as they have a 16kHz cutoff, while Opus has 20kHz. Now you should get a 320kbps mp3 as usual, with no cutoff lower than the usual 20kHz for mp3 320kbps.\n\nIf you still don't get 320kbps, try using incognito mode/VPN/new account (at best all three at the same time).\n\nYou can use Foobar2000 for resampling, e.g. an Opus file (right-click the file in the playlist>convert>processing>resampler>44100, and in output file format>WAV>32 bit). Don’t download audio from YT in any format other than Opus. Otherwise, it will have a 16kHz cutoff and the separation result will be worse.\n\n- (fixed) Also on mobile, the file may not appear on your list after upload, and you need to refresh the site.\n\n- If a FLAC keeps getting stuck on the \"Uploading\" screen, try converting it to WAV (32-bit float at best)\n\n- Check [this](https://vimeo.com/776925082) video for fixing the issue of missing sounds in stems (a known issue with GSEP)\n\n- GSEP separation results don't start at the same point in time as UVR results.\n\n> To fix it, convert the mp3 to WAV or align stems manually if you need them for comparisons or a manual ensemble. Also, some DAWs can correct it automatically on import.\n\nOptionally, join their [Discord](https://discord.gg/tMcqmhu79Y) server and report any issues (but they’ve been pretty much inactive lately).\n\n*Remarks about quality of separation*\n\n“The main difference (vs old model) is the vocals. I can't say for sure if they're better than before, but there is a difference, the \"others\" and \"bass\" are also different. Only the drums remain the same. 
Generally better, but the difference is not massive, depends on the song” (becruily)\n\nGSEP is generally good for tracks where all the previous methods left bleeding (e.g. low-pitched hip-hop vocals) or removed flute sounds. It struggles, though, with “cuts” and heavily processed vocals in e.g. choruses. Also, in some cases it has more bleeding where the very first model didn't, so the new MDX-UVR models can generally achieve better results now.\n\n\"GSEP is good at piano extraction, but it still lacks in vocal separation, in many times the instruments come out together with the voices, this is annoying sometimes.\"\n\nThe electric guitar model got worse in some cases in the last update. Also, bass & drums haven't been as loud since the first release of GSEP.\n\n\"Electric guitar model barely picks up guitars, it doesn't compare to Demix/lalal/Audioshake\".\n\n“I kinda like it. When it works (that's maybe 50-60% of time), it's got merit.”\n\nThe issue (also?) happens when you process a (GSEP) instrumental via 5 stems. If you process a regular song with vocals, it picks up the guitar correctly. It happens only in places where vocals were previously removed by GSEP 2-stem.\n\nSo far I've only tested GSEP instrumentals; I don’t know whether it happens with official instrumentals too (maybe not).\n\n“The cool thing is that when the guitar model works (and it grabs the electric), the remaining 'other' stem often is a great way to hear acoustic guitar layers that are otherwise hidden.\n\nThe biggest thing I'd like to see work done on is the bass training. At present, it can't detect the higher notes played up high... whereas Demucs3/B can do it extremely well.”\n\nIt has a “much superior” other stem to Demucs, even better than Audioshake. 
It changed on 6 September 2022, but it has probably been updated since then and is probably fine.\n\nAs of 14.10.22, the piano model sounds “very impressive”.\n\nThe first version of the model had a vocal stem comparable to MDX-UVR 9.7. However, it had the current limitation to mp3 320kbps, and its drums and bass were worse than Demucs (not in all cases). It usually has less bleeding in instrumentals than VR architecture models.\n\n“Gsep sounds like a mix between Demucs 3 and Spleeter/lalal, because the drums are kind of muffled, but it's so confident when removing vocals, there aren't as many noticeable dips like other filtered instrumentals, and it picks up drums more robustly than Demucs. [it can be better at isolating hi-hats than the Demucs 4 ft model too]\n\nIt removes vocals more steadily and takes away some song's atmospheres, rather than UVR approach which tries to preserve the atmosphere, but [in UVR] you end up with vocal artefacts”\n\nAs for tracks with more complicated drum sections: “GSEP sounds much fuller, Demucs 3 still has this \"issue\" with not preserving complex drums' dynamics”. This refers to e.g. not cancelling some hi-hats even in instrumentals.\n\nSometimes, some instruments can be deleted from all stems. 
“From what I've heard, [it] gets the results by separating each stem individually (rather than subtractive / inverting etc.), but this means some sounds get lost in between the cracks. You can get those bits by inverting the gsep stems and lining them up with the original source; you should then be left with all the stuff gsep didn't catch”.\n\nAlso, I'd experiment with the result of the Demucs ft model and apply inversion just for the specific stem that has your sounds missing.\n\nAs of June 2023, GSEP is still the best for stems in most cases, not anywhere close to being dead.\n\nGSEP loves to show off with loud synths and orchestral elements; every other MDX/Demucs model fails with those types of things.\n\n*Processing*\n\nAfter your track is uploaded (when the 5 moving bars disappear), it’s very fast. It takes 3-4 minutes for one track to be separated using the 2-stem option, and the processing itself takes around 20 seconds. Sometimes the 5 bars keep moving longer than the expected upload time, and you can see that nothing is using your upload bandwidth. In that case, simply press CTRL+R and retry; if it's still the same, log out and log in again. Rarely, the upload can get stuck (e.g. when you minimize the browser on mobile or switch tabs).\n\nGenerally, it’s very fast. Long after the very first GSEP days, I needed to wait briefly in the queue twice at 6-9 PM CEST. Only once in my whole life did I wait around 7 minutes, I think on the Sunday of the weekend a new model was added. Usually you wait in the queue longer than the processing takes, so it’s bloody fast.\n\n\\_\\_\\_\n\n*(outdated)*\n\nIf your stems can’t be downloaded after you click the download button, open Developer Tools in your browser, go to the console, and retry. Now you should see an error with the file address and your file name in it. 
You can simply copy the address to the address bar and start downloading it.\n\n(Outdated - 3rd model changes) The quality of hi-hats is enhanced, sometimes at the cost of a less vivid snare in less busy mixes, while it’s usually better in busy mixes now. However, it sometimes confuses the snare in tracks where it sounds similar to a hi-hat, making it worse than it was. So trap tracks with lots of repetitive hi-hats and also tracks with a busy mix should sound better now.\n\n## dango.ai\n\n([2](https://dango.ai/vocal-remover) or [more](https://dango.ai/stem-separator) [up to 6+] stems, paid only, 30 seconds free preview of mp3 320 output, 20kHz cutoff)\n\nStems: drums, vocal, bass guitar, electric guitar, acoustic guitar, violin, erhu\n\n“10 tracks = €6.33 + needs Alipay or WeChat Pay”\n\nInput files of max 12 minutes are allowed.\n\nThe site now has an English interface.\n\nCurrently, it gives one of the best instrumental results (if not the best), but not-so-good vocals.\n\n(for older models) The combination of 3 different aggression settings (mostly the most aggressive in busy mix parts) gives the best results for Childish Gambino - Algorithm compared to our top ensemble settings so far. But it's still far from ideal. [Not only] the most aggressive one makes instruments very muffled [but vocals are cancelled better too], although our separation does even a bit worse in the busier mix fragment.\n\nAs for drums: better than GSEP, worse than Demucs 4 ft 32, although with slightly better hi-hats. 
It's not an easy track, and it already shows some differences between GSEP and Demucs: the latter has more muffled hi-hats but a better snare, which happens quite often.\n\n(old) Samples:\n\n[Instrumental](https://discord.com/channels/708579735583588363/708579735583588366/1070546464322883664)\n\n[Drums](https://drive.google.com/drive/folders/1RSKoci8gPd4w2fRNb2rsJdrjPhaIfFxm?usp=sharing)\n\nAlso, for the preview it automatically picks the first fragment where vocals appear, so it is difficult to write something like AS Tool for it (probably some manipulation by manually mixing in fake vocals would be needed). Actually, smudge wrote one.\n\nVery promising results even for the earlier version.\n\nThey once wrote somewhere about limited previews for stem mode (more than 2 stems) and free credits, but I haven’t encountered it yet.\n\nThey were accused by aufr33 of using some of the UVR models for 2 stems in the past, without crediting the source (and taking money for that).\n\nNow new, better models have been released. They give better instrumentals than UVR/MVSep, and they're not the same models.\n\nIt used to be possible to get free 30-second samples on dango.ai, but recently 5 samples have been available for free (?also) here:\n\n<https://tuanziai.com/vocal-remover/upload>\n\nYou must use the built-in page translation option in e.g. Google Chrome, because it's Chinese-only. You can pay for it using Alipay outside China.\n\n## [music.ai](https://music.ai/)\n\nPaid - $25 per month or pay as you go ([pricing chart](https://music.ai/pricing/)). 
In fact, there is no free trial.\n\nGood [selection](https://i.imgur.com/ssjEI79.png) of models and an interesting [module stacking](https://i.imgur.com/h3u7Vuw.png) feature.\n\nTo upload files instead of using URLs, “you make the workflow, and you start a job from the main page using that custom workflow” [~ D I O ~].\n\nAllegedly it’s made by the Moises team, but the results seem to be better than those on Moises.\n\n“Bass was a fair bit better than Demucs HT, Drums about the same. Guitars were very good though. Vocal was almost the same as my cleaned up work. (...) An engineer I've worked with demixed to almost the same results, it took me a few hours and it achieved it in 39 seconds” (...) “I'd say a little clearer than MVSEP 4 Ensemble. It seems to get the instrument bleed out quite well.”\n\n“Beware, I've experienced some very weird phase issues with music.ai. I use it for bass, but vocals are too filtered / denoised imo and you can't choose to not filter it all so heavily.”\n\nSam Hocking\n\n## MDX23 by ZFTurbo (jarredou fork) - 2, 4 stems\n\n(2-4 stems, max 32-bit float output)\n\n*As of October 2025, Colabs are defunct due to Google’s runtime changes (possible* [*fix*](https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference/issues/5#issuecomment-3478451287)*).*\n\n[v2.5](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/MVSep-MDX23-Colab.ipynb), [v2.5 /w HQ\\_5](https://colab.research.google.com/drive/1v-a7qcdmUOaXLJd9QUpUBnac9Atxb3dd?usp=sharing) (experimental - muddiness, residues - set HQ\\_5 weight to 2.5 or lower), [/w SCNet XL](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/2.7/MVSep_MDX23_Colab_2_7_Version_Updated.ipynb) (weights not measured), [WebUI fork](https://github.com/RedsAnalysis/MVSEP-MDX23-Colab_v2) (for local installation), [2.4](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.4/MVSep-MDX23-Colab.ipynb) (added BS-Roformer model), 
[2.3](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio/Separate/MDX23C.ipynb) (Kubinka fork of jarredou’s Colab /w FLAC conversion, ZIP unpacking, new fullband preservation), [2.1](https://colab.research.google.com/github/deton24/MVSEP-MDX23-Colab_v2.1/blob/main/MVSep_MDX23_Colab.ipynb) (voc\\_ft instead of Kim Vocal 2, slightly better SDR than 2.0 overall), [2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) (with MDX23C model, may have more vocal residues vs 2.1), org. [2.3](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb) (with VitLarge model instead of instr-HQ3), [GUI/CML](https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model/releases) (GUI only for the older original 1.x release by ZFTurbo), instructions for local installation at the bottom\n\nThis is the ZFTurbo 1.0 Colab, further modified by jarredou to alleviate vocal residues. It adds better models and volume compensation, a fullband trick for narrowband vocal models, a high-frequency bleeding fix and much more. Currently, it achieves an SDR not much worse than the current “Ensemble 4 models” on MVSEP, which already uses some newer private models available only on MVSEP. The initially released 1.x code by ZFTurbo took 3rd place in the MDX 2023 challenge.\n\n“I have successfully processed a ~30min track with vocals\\_instru\\_only mode [on Colab] while I was working on that 2.3 version, but it was probably with minimal settings.\n\n[Errors/freezes are] already happening during Demucs separation when you do 4-stem separation with files longer than ~10-15 min” jarredou\n\nWith v. 
2.4, a 30-minute file was too long, and the Colab hung on the Roformer model separation.\n\nThe Colab combines the results of the then-best public models of different architectures using custom weights for every model (like a manually set volume for every stem, which is then mixed with the others), instead of the usual ensembling methods as in UVR, where e.g. “avg” averages the results of all models (so the same volume is used for every stem). More tricks used in the Colab are explained further below.\n\nAs of v. 2.5, “Baseline ensemble is made with Kim Melband Rofo, InstVocHQ and selected 1296 or 1297 BS Rofo” (so Kim Rofo was added, and VitLarge is no longer enabled by default).\n\n~“Free Google Colab gives you 3h per day, then you need to wait 24h, and next day it gives you 2 free hours, after 24h wait you'll get only 1h, and 24h later, 2h of free credits, the day after 1h of free credits, etc... and once in that pattern, you have to wait 48h to recover the 3h back.” You can just switch Google accounts when the GPU limit is reached, but remember to use the same new account when mounting GDrive, otherwise you may get an error.\n\n“I've opened a donation account for those who would want to support me: <https://ko-fi.com/jarredou>”\n\n*Troubleshooting*\n\n- “PytorchStreamReader error”\n\nSimply restart the environment; it’s a random issue occurring in the Colab.\n\n- “usage: inference.py [-h] --input\\_audio INPUT\\_AUDIO [INPUT\\_AUDIO ...] 
--output\\_folder”\n\n(and the whole list of arguments is shown below)\n\nRun the GDrive mount cell (mounting is not done automatically) or change the input and output file paths.\n\n- “ValueError: Mountpoint must not already contain files”\n\n(when attempting to mount GDrive) Go to the file manager on the left; you probably have a GDrive folder with empty folders that you need to delete first, then retry (this might happen when you use GDrive on this account while it isn't mounted yet, even though the Colab works).\n\n- \"no such file\"\n\n(error in v2.3 during batch processing)\n\n“it's square brackets [ ]\n\nwhen it sees a [ in the filename, it then thinks there's two additional [ ] in the name\n\nchanging to regular parentheses does work”\n\n- “If I input more than a single song it just starts building up on model data without clearing the old one, so it slowly starts running out of VRAM and then gets stuck”\n\n*Experimenting with settings*\n\n- The default settings of the Colab are a good starting point in general.\n\n- Some people like to increase BigShifts to 20 or even 30 with all other settings at default (some songs might be less muddy that way). However, the default of 3 is already a balanced value, and going beyond 5 or 7 may not give a noticeable difference while increasing separation time severely.\n\n- Switching from the 1296 to the 1297 model produces muddier/worse instrumentals in this Colab (more sudden jumps in dynamics from residues). 
The same happens when decreasing BigShifts to 1.\n\n- Enabling voc\\_ft might give less muddy results, but with more residues in instrumentals.\n\n- In 2.5 you can try out the following settings by mesk:\n\n“Set the weights of BS-RoFormer & MDX23C to 0, enable VitLarge, and set the weights of Mel-RoFormer & VitLarge to 8 & 3 respectively.\n\nYou can set BigShifts to whatever you'd like, I think 5 or 7 is optimal” though mesk uses as many as 9.\n\nVitLarge overlap can no longer be changed in v2.5 of the Colab, probably only in the CLI version.\n\n- Or you can test an ensemble of only Kim (weight 10) + Vit (weight 5), with BigShifts e.g. 9\n\n- Or BS-Roformer with MDX23C\n\n- Experimentally, set “Separation\\_mode:” to 4 stems (slower), and \"filename\\_instrum2\" will be the sum of the Drums + Bass + Other stems obtained by processing \"instrum\" with multiple Demucs models. It might have slightly fewer vocal residues or be a bit muddier. Compared to 2.1, the denoiser is less aggressive, as it’s disabled for some stems to save VRAM.\n\n- Increasing overlap might give muddier results, but potentially better ones if you hear some vocal residues.\n\n- In e.g. the older v. 2.4, you might want to disable VitLarge to experiment (it’s disabled in 2.5), as the model adds some noise at times.\n\n- Versions older than 2.4 give very clean instrumentals, although they can occasionally fail to remove some vocals in quiet fragments of a track; still, they achieve higher SDR than the best ensembles in UVR. Versions 2.4 and newer started to use the BS-Roformer arch, which is pretty muddy by itself, but free of the majority of vocal residues.\n\n- For instrumentals, I’d rather stick to instrum2 results (the sum of all 3 stems instead of inversion with e.g. inst only enabled), but some fragments can sound better in instrum, and it also has slightly better SDR; e.g. instrum can give louder snares at times, while instrum2 is muddier but sometimes less noisy/harsh. It can all depend on the song. 
Most people can’t tell the difference between the two.\n\n- If you suffer from some vocal residues in v. 2.2.2, try out these settings:\n\nBigShifts\\_MDX: 0\n\noverlap\\_MDX: 0.65\n\noverlap\\_MDXv3: 10\n\noverlap demucs: 0.96\n\noutput\\_format: float\n\nvocals\\_instru\\_only: disabled (it will additionally give an instrum2 output file with fewer vocal residues in some cases)\n\n- You can adjust the weights.\n\nE.g. a different weight balance, in 2.2 with less MDXv3 and more VOC-FT.\n\n- For vocals in [2.2](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) you can test out [these](https://cdn.discordapp.com/attachments/708579735583588366/1191316113166958622/image.webp) (dead link) settings (21, 0, 20, 6, 5, 2, 0.8)\n\n- In older versions of the Colab, overlap large and small control the overlap of the song during processing. The larger the value, the slower the processing but the better the quality (for both), but a bad setting will crash your separation, at least on certain songs.\n\nQ: Is it possible to use v.2.5 for Melband inference without the need to run the BS model?\n\nA: You can comment out the model(s) you don't want, to disable them, in L621-627 of inference.py [in the line called “vocals\\_model\\_names”\n\nProbably, you could also set the BS weight to 0, but it might trigger separation with that model anyway, making it slower.]\n\n- To experiment with parameters for 4-stem separation only, you can use:\n\n1. \"overlap\\_demucs\" in the Colab (not sure how it behaves in this Colab, but for demucs\\_ft, shifts 10 and overlap 0.1 worked best with original instrumentals as input)\n2. 
Shifts for Demucs are probably set in line 511 (formerly 618 in some other versions iirc): <https://github.com/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/inference.py>\n\n- To bypass the 2-stem separation models and use just your instrumental as input for 4-stem separation, “comment out/delete the name of the models you want to bypass” in line 621 ([screen](https://imgur.com/a/g3mrQTb)). “If you want to use only VOCFT, you have to activate InstVoc too, else it will crash (as it's using InstVoc to fill the spectrum part that is missing because of VOCFT cutoff)” - jarredou\n\n*Using other models not included in the Colab*\n\n- <https://github.com/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.5/inference.py>\n\nE.g. in line 452 you can replace the Kim model with any other vocal model, then either swap in the edited file via the file manager once the Colab has executed the initialization cell, or fork the repo. As for using instrumental Roformer models instead of vocal models, I can't guarantee it will work correctly.\n\n- “Easiest way [to replace MDX HQ model] should be to replace Inst-HQ4 link with HQ5 link line 480 <https://github.com/jarredou/MVSEP-MDX23-Colab_v2/blob/36909309efd4a75dab9f1d093a112785a8f560fb/inference.py#L480>\n\nIf models parameters are same (iirc they are), drop-in replacement should work (and then you control HQ5 with HQ4 settings in colab GUI)”\n\n- Change the args expected by inference.py according to the ones you've changed in the Colab notebook [if you decide to change model names in the Colab]; they're at the bottom of inference.py (line 874 and so on).\n\n- Adding e.g. SCNet is not an easy task; it will also require actually adding the SCNet arch to the script, not only its name (add its core files to the \"modules\" folder, import them in the main script, check if they work with the existing \"demix\" functions, etc... 
else it can't work).\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/tree/main/models/scnet>\n\nYou can study how ZFTurbo is doing it with his script and then try to adapt it to MDX23 Colab. ~jarredou\n\n- What weight should you use for your custom model?\n\n“You must process an evaluation dataset with each model individually, download the separated audio and then use my \"weight finder\" script ([here](https://discord.com/channels/708579735583588363/773763762887852072/1273057160670089236) [[mirrored](https://drive.google.com/drive/folders/1Hn5rFYaUGxRkdrwUCiOwOES06PPghAwu)]) with all the separated audios from each model. It will try many different weights until it finds the best ones for the given model inputs.\n\nElse you can set \"random\" weights, process the multisong dataset from MVSEP and upload the separated audios to the quality checker to get the evaluation scores <https://mvsep.com/en/quality_checker> (and repeat until you're satisfied)\n\nDownload the multisong eval dataset provided on the quality checker link I've shared above. Process the 100 tracks with the model/ensemble you want to evaluate. 
Download the separated audio.\n\nRename the files according to the guidelines provided at the quality checker link, zip them, upload them and wait for the results\n\nAll in the same folder, and named:\n\nsong\\_000\\_instrum.wav\n\nsong\\_000\\_vocals.wav\n\nsong\\_001\\_instrum.wav\n\nsong\\_001\\_vocals.wav\n\nsong\\_002\\_instrum.wav\n\nsong\\_002\\_vocals.wav\n\netc...\n\nSoftware like <https://www.advancedrenamer.com/> can be useful for this”\n\n*About*\n\nThe Colab produces one of the best SDR scores for 4 stems (possibly slightly surpassed by the implementation on MVSEP as an “Ensemble” of 4/5 or more models, although 24- or 32-bit output, which increases SDR, could have been used for that evaluation; jarredou’s v2.3 evaluation was made using 16-bit).\n\nIn version 2.4, the following UVR/ZFTurbo/Viperx models are used for 2 stems:\n\nMDX23C Inst Voc HQ/MDX-Net HQ\\_4 and voc\\_ft (optionally)/VitLarge/BS-Roformer\n\nand for 4 stems:\n\nHow the MDX23 Colab works under the hood in 2.3 iirc (more or less)\n\n- MDX models' vocal outputs (so inversion of one inst model there) + Demucs-only vocals > inversion of these to get the instrumental > demucs\\_ft + demucs 6s + demucs + mmi to get the remaining 3 stems (all steps weighted). Something in this recipe might have changed since then.\n\nOr differently - “The process is:\n\n1. Separate vocals independently with InstVocHQ, VitLarge (and VOC-FT as opt)\n\n2. Mix the vocals stems together as a weighted ensemble to create final vocals stem\n\n3. Create instrumental by inverting vocals stem against source\n\n4. Save vocals & instrumental stems\n\n5 (if 5). Take the instrumental to create the 3 other stems with the multiple demucs models weighted ensembles + phase inversion trick and save them.”\n\nThe modified inference will probably work locally too, e.g. 
if you use [that](https://github.com/deton24/MVSEP-MDX23-Colab_v2.1) 2.1 repo locally (and probably newer ones too), but jarredou's modified inference scripts crash the GUI, so you can only use the CML version locally in that case.\n\nUsage:\n\npython inference.py --input\\_audio mixture1.wav mixture2.wav --output\\_folder ./results/\n\nTo separate locally, an 8GB VRAM Nvidia card is generally required. 6GB VRAM is usually not enough, so lowering overlaps (e.g. 500000 instead of 1000000) or chunking the track manually might be necessary in this case. Also, you can now control everything from the options, so you can set chunk\\_size 200000 and single ONNX. It may work with 6GB VRAM that way.\n\nIf you get a “fail to allocate memory” error, use the --large\\_gpu parameter.\n\nThe chunks option has been removed from newer Colab versions.\n\nJarredou made some fixes in version 2.2.2.x to handle memory better with the MDX23C fullband model.\n\n“I've only removed the denoise double pass for demucs\\_6s, it's activated for other demucs models.”\n\njarredou:\n\n“you can use a workaround to have MDX23C InstVoc-HQ results only with ([dead](https://media.discordapp.net/attachments/767947630403387393/1162948305152655390/image.png?ex=653dcb02&is=652b5602&hm=307fa02087648792bc7a598cab1fd21af1d1d066903cd6619758b1f0930dabf6&=&width=953&height=646)) settings:\n\n(all weights beside MDXv3 set to 0, BigShifts\\_MDX set to min. 
of 1, and demucs overlap 0 [at least for vocal\\_instru\\_only])\n\nYou can use a higher \"overlap\\_MDXv3\" value than in the screenshot to get slightly better results.\n\n(and also, as it's only a workaround, it will still process the audio with other models, but they will not be used for final result as their weights = 0)\n\n(MDX23C InstVoc-HQ = MDXv3 here)\n\nYou can also use the default settings & weights, as they score a bit higher SDR than InstVoc alone”\n\nBe aware that the [2.0](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/2.0/MVSep-MDX23-Colab.ipynb) version wasn’t updated with:\n\n!python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/ort-cuda-12-nightly/pypi/simple/\n\nOr in case of credential issues, you can try out this instead:\n\n!python -m pip -q install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/\\_packaging/onnxruntime-cuda-12/pypi/simple/\n\nHence, it’s slow (so use 2.1-2.3 instead, as they work as intended, or add the line at the end of the first cell yourself)\n\n*Explanations on features added in* [*2.2 Colab*](https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.2/MVSep-MDX23-Colab.ipynb) *(2.2 might have more residues vs 2.1) by jarredou*\n\nWhat are BigShifts?\n\nIt's based on Demucs' shift trick, but for Demucs it is limited to 0.5 second shifting max (with a randomly chosen value).\n\nEach BigShift here shifts the audio by 1 second, no more random shifting.\n\nf.e. bigshifts=2, it will do 1 pass with 0 shifting, and a second pass with 1 second shifting, then merge the results\n\nbigshifts=3 means 1 pass with 0 shifting + 1 pass with 1 sec shift + 1 pass with 2 sec shift, etc...\n\nOverlap is doing almost the same thing but at audio chunks level, instead of full audio, and the way overlap is implemented (in MVSEP-MDX23), f.e. 
with overlap=0.99, first audio chunk will have 1 pass, 2nd audio chunk will have 2 passes, etc... until 99th audio chunk and following ones will have 99 passes. With BigShifts, the whole audio is processed with the same number of passes.\n\nSo BigShifts shifts the audio forward one second each time.\n\nOverlap computing is different between MDXv2 models and the other ones in the fork:\n\nFor MDXv2 models (like VOC-FT), it uses the new code from UVR and goes from 0 to 0.99.\n\nFor MDXv3 (InstVoc) & VitLarge models [introduced in v2.3] it uses code from ZFTurbo (based on MDXv3 code from KUIELab, <https://arxiv.org/abs/2306.09382>) and it goes from 1 to whatever.\n\nI'm using low overlap values in the fork because it's kinda redundant with the BigShifts experimental feature I've added and which is based on Demucs' \"shift trick\" (described here <https://arxiv.org/pdf/1911.13254.pdf>, chapter 4.4). But instead of doing shifts between 0 and 0.5 sec like Demucs by adding silence before input, BigShifts are much larger (and related to input length). Having larger time shifting gives more amplitude in possible results.\n\nInstead of adding silence before input to shift it, which would be a waste of time & resources as BigShifts can be above 30s or 1 min of shifting, instead, it changes the shifted part position in audio input (like move the 1st minutes of audio at the end of the file before processing and restores it after processing).\n\nThen like Demucs original trick all shifted & restored results are merged together and averaged.\n\nFrom my tests, it can influence results from -2 SDR to +2 SDR for each shifted results, depending on input and BigShifts value. It's not linear!\n\nUsing BigShifts=1 (disabled) and high overlap value probably gives more stable results, in the other end, but maybe not always as high and/or fast as what BigShifts can give.\n\nWeights have been indeed evaluated on MVSep's multisong dataset. 
I haven't tried every possible settings, but default values should be not far away from optimal settings, if not optimal [already].\n\nQ: Wasn't the BigShifts trick in the MDX23 Colab relying on a slowed-down and sped-up separation ensembling?\n\nI think increasing the parameter too much rather tends to increase bleeding.\n\nA: It's unrelated to bigshifts, but it was doing that for MDX2 models with a cutoff around 16-17khz (to get fullband results from them) but since it's using only fullband models, I've removed that part (in v2.2 iirc)\n\nThere are a few other \"tricks\" used in the fork:\n\nThe phase inversion denoise trick (was already in original code from ZFTurbo, also used in UVR):\n\nSome archs (MDXv2 mostly, so VOC-FT here) are adding noise to output signals. So to attenuate it, we process the input 2 times, including one time with phase polarity inverted before processing, and restored after processing. So, only the model noise is phase cancelled when the 2 passes are mixed together. (It doesn't cancel 100%, but it's attenuated). This is also applied to Demucs processing (since original code).\n\nMDXv3 & VitLarge don't seem to add noise (or at insignificant volume) so this trick is not used with these models.\n\nSegment\\_size (dim\\_t) original model value is doubled since v2.3 of the fork.\n\nSome benchmarks done by Bas Curtiz showed that it gives a little bit better results ([here](https://discord.com/channels/708579735583588363/767947630403387393/1158057984727973948) with VOC-FT, there's the same benchmark with InstVocHQ model [here](https://discord.com/channels/708579735583588363/767947630403387393/1156687062439821422)).\n\nMultiband ensembling:\n\nI'm using a 2-band ensemble, with different ensemble in frequencies below 10 kHz and above. This is a workaround to get fullband final results even when not fullband models are part of the ensemble (like VOC-FT). 
Without it, the instrumental stem, obtained by vocals phase inversion against the input audio, would have small permanent vocal bleeding above VOC-FT's cutoff, as phase cancellation would be biased there.\n\nIt was a much more essential feature in previous versions, when most of the models were not fullband.\n\nVitLarge is not used in the high-frequency band either, but that's more a matter of personal taste (so in the end, only InstVoc model results remain above the crossover region)\n\nAlternatively, you could separate your instrumental with the single Demucs models used by the Colab (demucs\\_ft, demucs 6s, demucs, mmi) and also use the SCNet XL and BS-Roformer models from [here](#_sjf0vefmplt), as, along with demucs\\_ft, they have the best overall SDR for 4-stem separation (MDX23C model1 can also give interesting results compared to demucs\\_ft).\n\nThen perform a manual weighted ensemble in a DAW by setting the volume of each stem to your liking after importing and aligning the lossless stems.\n\nSince ensembling more than 4 stems rarely gives good results, I guess you could drop some of the lower-SDR Demucs models (I think mmi has the lowest SDR, followed by 6s).\n\nIf that's too much of a hassle, you could change the volume of all stems from a given model by the same amount.\n\n**Guide how to use Colab v 2.5**\n\n(reworked Infisrael text)\n\n0. If you plan to use your GDrive for input files, go there now, create a folder called “input” and upload your files there. Also create an output folder (not sure if the Colab already creates both). That way you may save time before the runtime times out once the Colab is initialized (esp. for people with a slower connection).\n\nNow open the Colab.\n\n1. Click the “play” button on the Installation cell and wait until it's finished (it should show a green checkmark on the side)\n\n2. 
Click the “play” button on the GDrive cell.\n\nIt will ask you for permission for this notebook to access your Google Drive files, you can either accept or deny it (it is recommended to accept it if you want to use Google Drive as i/o for your files).\n\nAfter you've done installing it, go to the Separation section below.\n\nDefault settings are already balanced in terms of SDR, and too resource-intensive.\n\n3. Click on the play button to start the Separation, \\*\\*make sure\\*\\* you uploaded the audio file in the `folder\\_path`.\n\nAfter it's done, it will output the stems in the `output\\_folder`.\n\nAlso note, \"`filename\\_instrum`\" is the inversion of the separated vocals stems against the original audio.\n\n\"`filename\\_instrum2`\" when “Separation\\_mode:” is set to 4 stems (slower) is the sum of the Drums + Bass + Other stems that are obtained by processing \"`instrum`\" with multiple Demucs models.\n\nSo \"`instrum`\" is the most untouched and \"`instrum2`\" can have fewer vocals residues or sound a bit muddier.\n\nExperimenting on settings you can set BigShifts to 5 or 7, although it may not give a noticeable difference vs default 3, while increasing separation time severely, but some people use 20 or even 30.\n\n***Comparisons of MDX23 (probably v. 2.0) vs single demucs\\_ft model by A5***\n\nThe Beatles - She Loves You - 2009 Remaster (24-bit - 44.1kHz)\n\nSo I tried out the MDX23 Colab with She Loves You, which is easily the most ratty sounding of all the Beatles recordings, as it is pure mono and the current master was derived from a clean vinyl copy of the single circa 1980. So if it can handle that, it can handle anything. And well, MDX23 is very nice, certainly on par with htdemucs\\_ft, and maybe even better. I'm surprised. You can hear the air around the drums. Something that is relatively rare with demucs. And the bass is solid, some bleed but the tone and the air, the plucking etc is all there. 
Plus, the vocals are nicer, less drift into the 'other' stem.\n\nJohn Lennon - Now and Then (Demo) - Source unknown (16-bit - 44.1kHz)\n\nOK, another test, this time on a John Lennon demo, Now and Then. The vocals are solid, MDX23 at 0.95 overlap is catching vocals that were previously in htdemucs\\_ft being lost to the piano. So, yeah, it's pretty good. MDX23 is now my favored model. In fact, upon listening to the vocals, it's picking up, from a demo, from a poor recording, on a compact cassette, lip smacks, breathing and other little non-singing quirks. It's like literally going back and having John record in multitrack.\n\nQueen - Innuendo - CD Edition TOCP-6480 (16-bit 44.1kHz)\n\nEvery single model fell down with Freddie Mercury's vocals, not anymore. (...) I've heard true vocal stems from his vocals and the MDX23 separation sounds essentially like that. We're now approaching the 'transparent' era of audio extraction.\n\nNOTE: [voc\\_ft not tested] for Innuendo, will be tested by 07/07/2023\n\n*Colab instruction by Infisrael for old versions*\n\nInstall it: click the play button and wait until it's finished (it should show a green checkmark on the side).\n\nIt will ask you for permission for this notebook to access your Google Drive files; you can either accept or deny it (it is recommended to accept it if you want to use Google Drive as i/o for your files).\n\nOnce it's installed, go to the configuration below the 'Separation' tab.\n\n<https://i.imgur.com/qD9jsYG.png> (dead)\n\n(Recommended settings)\n\nSet \"`overlap\\_large`\" & \"`overlap\\_small`\" to the values you want; at the highest (1.0), processing will be slower but will give you better quality. 
The default values are 0.6 for large and 0.5 for small [with 0.8 still being balanced in terms of speed and quality].\n\nSet \"`folder\\_path`\" to the folder where you have uploaded the audio file you'd like to separate.\n\nSet \"`output\\_folder`\" to the folder where you'd like the separated stems to be saved.\n\nChange the path after `/content/drive/MyDrive/` to your desired one, for example:\n\n> `folder\\_path: /content/drive/MyDrive/input`\n\n> `output\\_folder: /content/drive/MyDrive/output`\n\nYou can also make use of \"`chunk\\_size`\" and increase it a little, but if you experience memory issues, lower it; its default value is 500000.\n\nAfterwards, click the play button to start the separation; \\*\\*make sure\\*\\* you uploaded the audio file to the `folder\\_path` you provided.\n\nAfter it's done, it will output the stems to the `output\\_folder`.\n\nAlso note: \"`filename\\_instrum`\" is the inversion of the separated vocals stem against the original audio.\n\n\"`filename\\_instrum2`\" is the sum of the Drums + Bass + Other stems that are obtained by processing \"`instrum`\" with multiple Demucs models.\n\nSo \"`instrum`\" is the most untouched, and \"`instrum2`\" can have fewer vocal residues.\n\n**Installing the Colab locally**\n\nNVIDIA 12GB VRAM GPU recommended\n\n“I think it's possible to use Colab notebook .ipynb files locally with anaconda and jupyter, but I've never tried.”\n\n“I didn't get it to work yet but a simple tkinter gui should be easy to throw together I reckon”\n\n[Alternatively]\n\n“You can git clone the repo, install requirements and use the inference.py script, but the command line can be really long to type manually (on Colab it's managed with the GUI):\n\npython inference.py \\\n\n--input\\_audio \"{file\\_path}\" \\\n\n--large\\_gpu \\\n\n--BSRoformer\\_model {BSRoformer\\_model} \\\n\n--weight\\_BSRoformer {weight\\_BSRoformer} \\\n\n--weight\\_Kim\\_MelRoformer {weight\\_Kim\\_MelRoformer} 
\\\n\n--weight\\_InstVoc {weight\\_InstVoc} \\\n\n--weight\\_InstHQ4 {weight\\_InstHQ4} \\\n\n--weight\\_VOCFT {weight\\_VOCFT} \\\n\n--weight\\_VitLarge {weight\\_VitLarge} \\\n\n--overlap\\_demucs {overlap\\_demucs} \\\n\n--overlap\\_VOCFT {overlap\\_VOCFT} \\\n\n--overlap\\_InstHQ4 {overlap\\_InstHQ4} \\\n\n--output\\_format {output\\_format} \\\n\n--BigShifts {BigShifts} \\\n\n--output\\_folder \"{output\\_folder}\" \\\n\n--input\\_gain {input\\_gain} \\\n\n{filter\\_vocals} \\\n\n{restore\\_gain} \\\n\n{vocals\\_only} \\\n\n{use\\_VitLarge\\_} \\\n\n{use\\_VOCFT\\_} \\\n\n{use\\_InstHQ4\\_} \\\n\n{use\\_InstVoc\\_} \\\n\n{use\\_BSRoformer\\_} \\\n\n{use\\_Kim\\_MelRoformer\\_}\n\nQ: How do you use the example {useVitLarge}\n\nlike the other stuff ik how to use\n\nA: These last arguments are boolean based, there are generated before the command line and depending on the option selected in the GUI with:\n\nuse\\_InstVoc\\_ = '--use\\_InstVoc' #forced use\n\nuse\\_BSRoformer\\_ = '--use\\_BSRoformer' #forced use\n\nuse\\_Kim\\_MelRoformer\\_ = '--use\\_Kim\\_MelRoformer' #forced use\n\nuse\\_VOCFT\\_ = '--use\\_VOCFT' if use\\_VOCFT is True else ''\n\nuse\\_VitLarge\\_ = '--use\\_VitLarge' if use\\_VitLarge is True else ''\n\nuse\\_InstHQ4\\_ = '--use\\_InstHQ4' if use\\_InstHQ4 is True else ''\n\nrestore\\_gain = '--restore\\_gain' if restore\\_gain\\_after\\_separation is True else ''\n\nvocals\\_only = '--vocals\\_only' if Separation\\_mode == 'Vocals/Instrumental' else ''\n\nfilter\\_vocals = '--filter\\_vocals' if filter\\_vocals\\_below\\_50hz is True else ''\n\nQ: So you don't need to use them?\n\nOnly using the ones with the two -- before right\n\nA: For example, if you want to activate vocals filtering below 50hz, you add \"--filter\\_vocals\" to the command line\n\nQ: How do you do this\n\nA: ([click](https://drive.google.com/file/d/1CI6dwZ7tPUvbolckwjicTl6s1tbfAly-/view?usp=sharing))\n\nQ: oh yeah I just have to change the default number then 
right\n\nIt works [#⁠general⁠](https://discord.com/channels/708579735583588363/708579735583588366/1272994209732755500)\n\nA: If you have multiple GPUs and the CUDA one is not labelled device \"0\", maybe that can be the cause too, it's hardcoded for Colab, but you can change it in first lines of inference.py file gpu\\_use = \"0\"\n\nIf your GPU is not detected in Anaconda, use Python (can be 3.12). If it's the same:\n\n<https://pytorch.org/get-started/locally/#start-locally>\n\nWhere it says \"run this command\" I basically uninstalled the modules it had in there\n\nso I did pip uninstall torch torchvision torchaudio\n\nthen ran that command to install it\n\nand it fucking fixed it (knock)\n\n1.0 original code used kim vocal 1 (later 2), kim inst and (at least for 4 stems) Demucs models.\n\n## KaraFan by Captain FLAM\n\n(2 stems)\n\n[Colab](https://colab.research.google.com/github/Eddycrack864/KaraFan/blob/master/KaraFan_Improved_Version.ipynb) w/ more models (AI Hub fork, also fixed), fixed org. [Colab](https://colab.research.google.com/drive/1HwCKsVMGotBvkHe1bfR8Q5POZsrRx-iu), org. [Colab](https://colab.research.google.com/github/Captain-FLAM/KaraFan/blob/master/KaraFan.ipynb) (slow), [GUI](https://github.com/Captain-FLAM/KaraFan/releases), GH [documentation](https://github.com/Captain-FLAM/KaraFan/wiki/)\n\n[How](https://www.youtube.com/watch?v=uWJvMzu5EyA) to install it locally (advanced), alt. [tutorial](https://www.youtube.com/watch?v=BM5bF_bcYoE),\n\nor [easy](https://github.com/Captain-FLAM/KaraFan/wiki/%F0%9F%9A%80-Install-PC-users) instruction\n\nShould work on Mac with Silicon or AMD GPU (although not for everyone)\n\n& Linux with Nvidia or AMD GPU\n\n& Windows probably with at least Nvidia GPU, or with CPU (v. slow)\n\n- For Colab users - create “Music” in the main GDrive directory and upload your files for separation there (the code won’t create the folder on the first launch).\n\n- Sometimes you’ll encounter soundfile errors during separation. 
Just retry, and it should work.\n\nKaraFan (not to be confused with KaraFun) is a direct derivative of ZFTurbo’s MDX23 code as forked by jarredou, with further tweaks and tricks aimed at the best-sounding instrumentals and vocals, focusing on the overall sound rather than on SDR alone.\n\nIts aim is to keep vocal residues low without making instrumentals too muddy, as e.g. the HQ\\_3 model sometimes does, while leaving fewer vocal residues than the MDX23C fullband model (depending on the chosen preset).\n\nSince v4.4 and 5.x, you have five presets to test out.\n\nPresets 3 and 4 are more aggressive in canceling vocal residues (P4 can be good for vocals).\n\nPreset 5 (12+ minutes on the slowest setting for a 3:25 track on a T4) gives instrumentals more clarity than presets 3 and 4, but also more vocal residues (although fewer than P1 and P2, which take 8 minutes for a 3:24 track on the slowest setting).\n\nOn 23.11.24, “Preset 5 was corrected to be less aggressive as possible”. All the Preset 5 descriptions below refer to the old P5. The original preset 5 is [here](https://i.imgur.com/dJqnFnX.png). It is less muddy, but has more vocal residues (also, the original preset contains more models and is slower).\n\nSpeed and chunk size affect quality. The slower the setting, the muddier the result, but with slightly fewer vocal residues, although they’ll still be there (just slightly quieter). I’d recommend the “fastest” Speed setting and 400K chunks for the current P5 (tested on a 4:07 song; it may not work for longer tracks).\n\n- If you replace the Inst Voc HQ1 model with HQ2 using the AI Hub fork in the current P5, the instrumental will be muddier.\n\n- To preserve instruments that other MDXv2 models count as vocals, use [these](https://i.imgur.com/coxj6Zs.png) modified preset 5 settings - they have more clarity than P5 and preserve hi-hats better. 
But to keep the same processing time as in P5, you need to set the “Speed” slider to medium, which in this case results in more constant vocal residues than P5 with the slowest setting (too many at times, but it might serve well for specific song fragments). It takes 12+ minutes for a 3:24 track on medium. Debug and God mode on the screenshot are unrelated and optional.\n\n- To fix issues with saxophone in P5, use [these](https://i.imgur.com/GeMeBJx.png) settings. They have even more clarity than the settings above, but also more audible vocal residues. They preserve instruments better than the settings above. They can be better than P2 (the consistent vocal residues are less audible, though similar in amount), while for other artists the sax preset gives even more vocal residues than P2. The sax settings are worse at preserving piano than the settings above.\n\n- Using the slowest setting with the sax fix preset will result in a runtime disconnection on a free T4 after 28 minutes of processing, but the separation should succeed anyway (the result files may be uploaded to GDrive after some time).\n\nCompared to medium, the slowest setting gives a more muffled sound, but not always fewer vocal residues. This is best heard in short vocal-only parts. It takes 18 minutes for a 4:07 track on the Fast setting (God Mode and Debug Mode are disabled in KaraFan by default).\n\nAfter 3-4 separations of ~18 minutes each (in this case not made in batch, but with parameters changed manually in between), when you terminate and delete the environment, you might not be able to connect to a GPU again, as the limit will be reached, unless you switch Colab accounts (mount GDrive with the same account you use for Colab to avoid errors).\n\n- Preset 5 gives more muffled results than the two settings above, but with a good balance of clarity and vocal residues. Sometimes it has fewer vocal residues, sometimes the 16.66 MDX23C model on MVSEP does (or possibly the slightly older HQ\\_1 model in UVR); it can even depend on the song fragment. 
Using the newer MDX23C HQ 2 in P5 instead of MDX23C HQ doesn’t seem to produce better results.\n\nAfter the 5th separation (not in batch), you must start your next separation very quickly, or you’ll run out of the free Colab limit while the GUI is idle. In that case, switch Colab accounts and use the same account to mount GDrive (or you might encounter an error).\n\nThe comparisons above were made with normalization disabled and the 32-bit float setting.\n\nThe code handles mono and 48kHz files too, as well as 6:16 tracks (preset 3), and possibly 9-minute tracks (not verified with all presets). It stores models on GDrive, which takes 0.8-1.1GB (depending on how many models you use). One 4:07 song in 32-bit float with debug mode enabled (all intermediate files are kept) takes 1.1GB on GDrive. Instrumentals are stored in files marked Final (at the end), Music Sub (can sound a bit cleaner at times, but with more residues), and Music Extract (from specific models).\n\nThe older 1.3 Colab fork by Kubinka was deleted.\n\nThe Colab fork made by AI HUB server members also includes the MDX23C Inst Voc HQ 2 and HQ\\_4 models, and contains the slow separation fix from the “fixed Colab”.\n\nKaraFan has had many versions that differ in these aspects, aiming for the best result in the most recent Colab/GUI version. E.g. v3.1 had more vocal residues than v1.3, and even more than the HQ\\_3 model on its own; this was partially (if not entirely) fixed in 3.2. But 1.3, IIRC, had an overlapping frequency issue with SRS disabled, which made the instrumentals brighter; it was fixed later. The current version at the time of writing this excerpt is 4.2, with v4.1 receiving pretty good opinions shortly before.\n\n*Colab troubleshooting*\n\n- (no longer necessary in the fixed [Colab](https://colab.research.google.com/drive/1HwCKsVMGotBvkHe1bfR8Q5POZsrRx-iu)) If you suffer from very slow or never-ending separations in Colab using non-MDX23C models (e.g. 
stuck on voc\\_ft without any progress), use the fixed Colab (with the onnxruntime-gpu line added at the end of the first cell).\n\n- Unlike every other Colab in this document, KaraFan uses a GUI which launches after executing the inference cell. It frequently triggers Google’s timeout security checks, especially for free Colab users, because Google behaves as if no separation is running when you start it from the GUI, and it’s generally against their policies to execute such code instead of pasting commands to execute directly in Colab cells. Many RVC Colabs got blocked by Google the same way, but this one is not directly for voice cloning and is not very popular yet, so Google hasn’t targeted it so far.\n\n- Once you start a separation, you can get disconnected from the runtime quickly, especially if you miss some of the multiple captcha prompts (in 2024, captchas stopped appearing entirely, so user inactivity during the separation process no longer seems to be checked).\n\n- After a runtime disconnection error, the output folder on e.g. GDrive can still be constantly populated with new files, while the progress bar stops refreshing after you click close, or even after you close the Colab tab. At a certain point the process can be interrupted, leaving you without all the output files. Be aware that final files always have “Final” in their names.\n\n- It can consume free \"credits\" until you click Environment>Terminate session. This happens even if you close the Colab tab. You can check the “This is the end” option so the GUI terminates the session after separation is done, to avoid draining your free limit.\n\n- (probably fixed) As of version 4.2, session crashes can occur for free Colab users due to running out of memory. 
You can try shorter files.\n\nCurrently, if you rename the output folder of a separation and retry the separation, it will look for the old folder to delete and return an error, and running the GUI cell again may cause GUI elements to disappear.\n\nThis is the default behavior of Colab and the IPython core: syncing of the files Colab sees is not real-time.\n\nTwo possible solutions:\n\n* wait until the sync with Google Drive is done\n* restart & run the Colab\n\n- Sometimes shutting down your environment in Environment options and starting over does the trick if something doesn't work. E.g. (if it wasn't fixed), when you manipulate input files on GDrive while the GUI is still open and you've just finished a separation, you might run into an error when you start separating another file after the input folder content has changed.\n\nTo avoid it, run the GUI cell again after you've changed the input folder content (IIRC it's the \"Music\" folder by default). The cause may also be chunks set too low (below 500k for very long tracks, unless something has changed in the code). Also, first check with another input file that worked before.\n\nAlso, be specific about what doesn't work. Provide a screenshot and/or paste the error.\n\n- You can be logged into a maximum of 10 Google accounts at the same time. You can’t log out of any single one of these accounts in a browser on PC. The only way is to do it on your Android phone, but it might not fix the problem: the account will show as “logged out” on PC, but logging into another one might not work and the limit will still be exceeded. In this situation you can only log out of all accounts (but this breaks the account order, so any authorizations tied to specific accounts in your bookmarked links will be messed up, e.g. those to Colab, GDrive, Gmail, etc. I mean /u/0, and authuser= in Colab links). 
An easier way to access an extra Google account is to log into it in Incognito mode.\n\nIf you have lots of accounts and don’t log into some of them for 2 years, Google can delete them. To avoid this, create a YT channel on the account and upload at least one video, and the account won’t be deleted.\n\n*Tests of four presets of KF 4.4 vs MDX-UVR HQ\\_3 and MDX23C HQ (1648)*\n\n(noise gate enabled, a.k.a. the “Silent” option)\n\nNot a really demanding case, as there's no modern vocal chain in the mix, but probably enough to give a general idea of how the different presets sound here. So, this time, a song more forgiving to the MDX23C model and to less aggressive models with more clarity.\n\nGenre: (older rap) Title: (O.S.T.R. - Tabasko [2002])\n\nBEST Preset: 3\n\nMusic:\n\nVersus P4, hi-hats are preserved better in P3.\n\nThe snare in P3 is not as muffled as in P4.\n\nHQ\\_3 has even more muffled snares than P4.\n\nP3 still had fewer vocal residues than the MDX23C HQ 1648 model. The whole P3 result was more muffled, but the residues are smartly muffled too.\n\nMDX23C had more faithful-sounding snares than P3, to the extent that they can be perceived as brighter (but vocal residues, even on a more forgiving song like this, are more persistent in MDX23C than in P3).\n\nWhether P4 or P3 had more vocal residues depended on the specific fragment, so P3 turned out to be pretty balanced, although P4 had fewer persistent vocal residues (still not as few as HQ\\_3, but that's not much of a problem, as HQ\\_3 is really muffled). 
If it were 4 stems, I'd describe P3/4 as having a very good \"other\" stem, and also good drums, as I mentioned.\n\nWORST Preset (in that case): 1\n\nMusic: Too many consistent vocal residues\n\nThere's a similar situation in P2, but at least P2 has brighter snares than even MDX23C.\n\nIn other songs, P1 can be better than P2, leaving fewer vocal residues in specific fragments for a specific artist, but noticeably more for others.\n\nPreset 4 on the slow setting (but not the slowest) takes 16 minutes for a 5-minute song on a T4 in free Colab (performance of ~RTX 3050). For a 3:30 track, it takes 13:30 on the slowest setting. In KF 5.1 with the default 500K chunks and the slowest setting, a 4:50 song took <10 minutes with preset 2, and 12 minutes with preset 3.\n\nVersus preset 3, the one from the [screenshot](https://cdn.discordapp.com/attachments/1162265179271200820/1173570485733302312/karafan.PNG) (now added as preset 5) is noisier and has more vocal residues, mainly in quiet places or when there is no instrumental. Processing time for a 6:16 track on the medium setting is 22:19 minutes. But it definitely has more clarity than preset 3. And it still has fewer vocal residues than presets 1 and 2, which have more clarity but tend to have too many vocal residues in some tracks. Hence, preset 5 is the most universal for now.\n\nFor the future: “To add or remove some models, you need to edit the .csv file <https://github.com/Eddycrack864/KaraFan/blob/master/Data/Models.csv>\n\nwith the model info (only MDX23C or MDX-NET). You can find the model info in model\\_data\\_new.json: <https://raw.githubusercontent.com/TRvlvr/application_data/main/mdx_model_data/model_data_new.json> - you need to find the hash of the model. And.... that's it!” (Not Eddie)\n\n## Ripple/Capcut/SAMI-Bytedance/Volcengine/BS-RoFormer (2-4 stem)\n\n(Ripple has been discontinued since 31 January 2026)\n\nOutput quality in Ripple: 256kbps M4A (320kbps max) and lossless (introduced later). 50MB upload limit, 4 stems\n\nMin. 
iOS version: 14.1\n\nRipple is only available in the US region (which you can change, more below)\n\n**Ripple no longer separates stems** (there's an error \"couldn't complete processing please try again\")\n\nRipple for iOS: <https://apps.apple.com/us/app/ripple-music-creation-tool/id6447522624>\n\nCapcut for Android: <https://play.google.com/store/apps/details?id=com.lemon.lvoverseas>\n\n(separation only for Pro; Indian users sometimes via VPN)\n\nCapcut a.k.a. Jianying (2 stems) also works on Windows (the separation option is available only in Jianying Pro)\n\nIt can be used instead of Ripple if you're on an unsupported iOS version below 14.1 or don’t have iOS. To get Ripple, you can also use a remote virtual machine instead (instructions below). Ripple can also be run on an M1 Mac using app sideloading (instructions below).\n\nRipple = better quality than CapCut as of now (and fullband),\n\nwith the clicks/artifacts fixed using a cross-fade technique between the chunks.\n\nCapcut = “the results are really low quality but if you export the instrumental and invert it with the lossless track, you will get the vocals with the noise which is easy to remove with mdx voc ft for example, then you can invert the lossless processed vocals with the original and have it in better quality.\n\nThe vocals are very clean from cap cut, almost no drum bleed”\n\nRipple and Capcut use the SAMI-Bytedance arch (later known as BS-Roformer), but it’s a different model with worse SDR than the one on the leaderboard. It was developed by Bytedance (owner of TikTok) for the MDX23 competition, and holds the top of our MVSEP leaderboard. It was published on iOS for the US region as the “Ripple - Music Creation Tool” app. It's a multifunctional app for audio editing, which also contains a 4-stem separation model. Similar situation with Capcut (which is 2 stems only IIRC). 
The model itself is not the same as the one from the MDX23 competition (SAMI ByteDance v1.0). As they said, the models for the apps were trained on 128kbps mp3 files to avoid copyright issues. It’s the same arch, it just scores a bit lower (even when exported losslessly for evaluation). SDR for Ripple is naturally better than for Capcut.\n\nIt seems there is no separate Pro variant of the Capcut Android app, so you need to unlock the regular version to Pro.\n\nAt least the unlocked version on apklite.me has a link to the regular version, so the Pro app doesn't seem to be behind any regional block. But -\n\n\"Indian users - Use VPN for Pro\", as they say, so it's a similar situation to the one we had with Capcut on PC before. I can't guarantee that the unlocked version on apklite.me is clean. I've never downloaded anything from there.\n\n*Bleeding*\n\nBas Curtiz found out that decreasing the volume of mixtures by -3dB (sometimes -4dB) before processing them in Ripple eliminates problems with vocal residues in Ripple's instrumentals. [Video](https://cdn.discordapp.com/attachments/708579735583588366/1165647205600854118/Ripple_vs_-6db_example_2.mp4)\n\nThis is the most balanced value, which still doesn't take too many details out of the song due to volume attenuation.\n\nOther good values, purely SDR-wise, are -20dB > -8dB > -30dB > -6dB > -4dB > without volume decrease.\n\nThe method might also benefit other models, and probably works best for the loudest tracks with brickwalled waveforms.\n\nThe other stem is obtained by inversion to speed up the separation process. The consequence is bleeding in instrumentals.\n\n- If you suffer from bleeding in the other stem of 4-stem Ripple, besides decreasing the volume by e.g. 
3/4dB, also “when u throw the 'other stem' back into ripple 4 track split a second time, it works pretty well [to cancel the bleeding]”\n\nRipple's forte is currently vocals - the algo is very good at differentiating what is vocals and what is not, although they can sound “filtered” at times.\n\nIt currently has the best SDR among public models/AIs, and it gives the best results for vocals in general. For instrumentals, it probably doesn’t beat the paid Dango.ai (and probably not KaraFan, HQ\\_3 or 1648/MDX23C fullband either).\n\nIt's good for vocals, also for cleaning vocal inverts, and surprisingly good for e.g. Christmas songs (it also handled hip-hop, e.g. Drake, pretty well). It's better for vocals than for instrumentals due to residues in the other stem. Bass is very good, drums are also decent, and kicks are among the best, if not the best, of all models, as they said some fine-tuning was applied to the drums stem. Vocals can be used for inversion to get instrumentals, and the result may sound clean, but probably not as good as what the 2-stem option or a 3-stem mixdown gives, as the output is lossy.\n\n**Capcut** (2 stems only)\n\n<https://www.capcut.cn/>\n\nIt is a new Windows and Android app which uses the same arch as Ripple inst/vocal, but a lower quality model, and without an option to export 4 stems.\n\nIt normalizes the input, so you cannot use Bas’ trick of decreasing the volume by -3dB to work around the bleeding issue like in Ripple (unless you trick CapCut, possibly by adding some loud sound to the song with decreased volume).\n\n“At the moment the separation is only available in Chinese version of Windows app which is jianyingpro, download available at capcut.cn [probably here - it’s where you’re redirected after you click “Alternate download link” on the main page, where download might not work at all]\n\nSome people cannot find the settings on 
[this](https://media.discordapp.net/attachments/875539590373572648/1163477679736102932/image.png?ex=653fb807&is=652d4307&hm=567a1806a464224601faa2e16b43ba2ff856d8a70355e3926d116869cefb1360) screen in order to separate.\n\nSeparation doesn't require sign-up/login, but exporting does, and it also requires VIP, which is paid or free depending on whether you’re from a rich or poor country.\n\n- There’s a workaround for people in various regions who are unable to split using Capcut for Windows.\n\n- Bas Curtiz' new video on how to install and use Capcut for separation, incl. exporting:\n\n<https://www.youtube.com/watch?v=ppfyl91bJIw>\n\n\"It's a bit of a hassle to set it up, but do realize:\n\n- This is the only way (besides Ripple on iOS) to run ByteDance's model (best based on SDR).\n\n- Only the Chinese version has these VIP features; now u will have it in English\n\n- Exporting is a paid feature (normally); now u get it for free\n\nThe instructions displayed in the video are also in the YouTube description.\"\n\n- mitmproxy [script](https://cdn.discordapp.com/attachments/708579735583588366/1167458847963746364/mitmproxy_script.py) allowing you to save to FLAC instead of AAC (although it just re-encodes from 113kbps AAC with a 15.6kHz lowpass filter). It involves a bit more than just the script. See the [full](https://www.youtube.com/watch?v=gEQFzj6-5pk) tutorial.\n\n- For some people using mitmproxy scripts for Capcut (but not everyone), Capcut “changed their security to reject all incoming packet which was run through mitmproxy. I saw the mitmproxy log said the certificate for TLS not allowed to connect to their site to get their API. And there are some errors on mitmproxy such as events.py or bla bla bla... 
and Capcut always warning unstable network, then processing stop to 60% without finish.” ~hendry.setiadi\n\n“At 60% it looks like the progress isn't going up, but give it idk, 1 min tops, and it splits fine.” - Bas\n\n“in order to install pydub within mitmproxy, you additionally need to:\n\nopen up CMD\n\npip install mitmproxy\n\npip install pydub”\n\n- IntroC created a [script](https://drive.google.com/drive/folders/12m1qrRNpsTrCxfioG9xzcZUYTV0Gl8Ap) for mitmproxy for Capcut allowing fullband output by slowing down the track. [Video](https://www.youtube.com/watch?v=-34Q5rJ68pI)\n\nOlder Capcut instructions:\n\nA [video](https://cdn.discordapp.com/attachments/708595418400817162/1166823721831501834/Testing_CapCut_workaround.mp4) demonstration of the steps below:\n\n0. Go offline.\n\n1. Install the Chinese version from [capcut.cn](https://www.capcut.cn/)\n\n2. Copy [these](https://wetransfer.com/downloads/301dd839f2b7af2a0bcfdaf3188ff1a420231025163008/049ffb) files over your current Chinese installation in:\n\nC:\\Users\\(your account)\\AppData\\Local\\JianyingPro\n\nDon’t use the English patch provided below (or the separation option will be gone)\n\n3. Now open CapCut, close the welcome screen, then go online. Happy converting!\n\n4. Before you close the app, go offline again (or the separation option will be gone later).\n\n! Before reopening the app, go offline again, open the app, close the welcome screen, go online, separate, go offline, close. If you happen to miss that step, you need to start again from the beginning of the instructions.\n\n(no longer works after the 4.6 to 4.7 update, as it freezes the app) The only thing that seems to enable vocal separation without replacing everything is replacing the contents of that [SettingsSDK](https://cdn.discordapp.com/attachments/708595418400817162/1167169672580440195/SettingsSDK.zip) folder inside User Data. 
It's probably the settings\\_json file inside that is responsible for that.\n\nFYI - the app doesn’t separate files locally.\n\nThe separation quality of Capcut is not exactly the same as Ripple's. Judging by spectrograms, there is a bit more information in vocals in Capcut, while Ripple has a bit more information in the spectrum of instrumentals.\n\nThe separated vocal file is encrypted and located in C:\\Users\\yourusername\\AppData\\Local\\JianyingPro\\User Data\\Cache\\audioWave\n\nThe unencrypted audio file in AAC format is located at \\JianyingPro Drafts\\yourprojectname\\Resources\\audioAlg (ends with download.aac)\n\n“To get the full playable audio in mp3 format, a trick that you can do is drag and drop the download.aac file into Capcut and then go to export and select mp3. It will output the original file without randomisation or skipping parts”\n\n(although it resulted in the VIP option disappearing, Bas somehow managed to integrate it in his new video tutorial and it started to work. The English translation isn't the culprit of the problem; rather, the problem appears if you use both the language pack and the SettingsSDK folder from above)\n\nYou can replace the zh-Hans.po file with an [English one](https://cdn.discordapp.com/attachments/875539590373572648/1163902105006903347/zh-Hans.po) to get English in the Chinese version of the app (the one with the separation feature), in:\n\njianyingpro/4.6.1.10576/Resources/po\n\nIf you can’t use that language pack, you can always use Google Translate on your smartphone to translate the Chinese on your screen into your own language.\n\n<https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQwk_qynMHMwquSfQZFrrn30F355Ihta_GHQNo7vhnPUhfjj-kUiqSRBiLQbPlgmB5Gqro&usqp=CAU>\n\n<https://support.google.com/translate/answer/6142483?hl=en&co=GENIE.Platform%3DDesktop>\n\n“Trying out capcut, the quality seems the same as the Ripple app (low bitrate mp3 quality)\n\nat least the voice leftover bug is fixed, lol”\n\nRandom vocal pops from Ripple are fixed here.\n\nAlso, it still has the 
same clicks every 25 seconds as before in Ripple.\n\nCapcut adds 1024 extra samples at the beginning and 16 extra samples at the end of the file.\n\n**How to change region to US**\n\n**in order to make Ripple work on iOS**\n\nin the Apple App Store, to make \"Ripple - Music Creation Tool\" (SAMI-Bytedance) work.\n\n<https://support.apple.com/en-gb/HT201389>\n\n- Bas' [guide](https://media.discordapp.net/attachments/708595418400817162/1146727313963237406/Ripple_iOS_iPad_mini_2_-_demo.mp4) to changing the region to US for Ripple on iOS\n\n<https://www.bestrandoms.com/random-address-in-us>\n\nOr use this Walmart address in Texas; the number belongs to an airport.\n\nDo it in the App Store (where you have the person icon in the top right).\n\nYou don't have to fill in credit card details. When you are rejected,\n\nreboot, check region/country... and it may already be set to the US.\n\nHowever, for some users it won't let you download anything, forcing your real country.\n\n\"I got an error because the zip code was wrong (I did enter random numbers) and it got stuck even after changing it.\n\nSo I started from the beginning, typed in all the correct info, and voilà\"\n\nIf “you have a store credit balance; you must spend your balance before you can change stores”.\n\nIt needs a SIM card (an old one?) to log your old account out, if necessary.\n\n#### Ripple on Windows or MacOS\n\n- Another way to use **Ripple** without an Apple device -\n\n**virtual machine**\n\nSideloading of this mobile iOS app is possible on at least M1 Macs.\n\n- Saucelabs\n\nSign up at <https://saucelabs.com/sign-up>\n\nVerify your email, upload this as the IPA: <https://decrypt.day/app/id6447522624/dl/cllm55sbo01nfoj7yjfiyucaa>\n\nThe rotating puzzle captcha for the TikTok account can be taxing due to the low framerate. 
Some people can do it after two tries; others will run out of credits first, or be completely unable to do it.\n\n- <https://mobiledevice.cloud/>\n\nMobile device cloud\n\n- Scaleway\n\n\"if you're desperate you can rent an M1 Mac on scaleway and run the app through that for $0.11 an hour using this <https://github.com/PlayCover/PlayCover>\"\n\nIPA file:\n\n<https://www.dropbox.com/s/z766tfysix5gt04/com.ripple.ios.appstore_1.9.1_und3fined.ipa?dl=0>\n\n\"been working like a dream for me on an M1 Pro… I've separated 20+ songs in the last hour\"\n\nMore info:\n\n- <https://cdn.discordapp.com/attachments/708579735583588366/1146136170342920302/image.png>\n\n- “keep in mind that the vm has to be up for 24 hours before you can remove it, so it'll be a couple bucks in total to use it”\n\n**Fixing chunking artefacts** (probably fixed)\n\n- Every 8 seconds, there is a chunking artifact in Ripple. The Heal feature in Adobe Audition works really well for it:\n\n<https://www.youtube.com/watch?v=Qqd8Wjqtx-8>\n\n- The same, explained using RX10 and its Declick feature as an example:\n\n<https://www.youtube.com/watch?v=pD3D7f3ungk>\n\n**Volcengine** (a.k.a. sami-api-bs-4track - 10.8696 SDR vocals)\n\n<https://www.volcengine.com/docs/6489/72011>\n\nRipple/SAMI Bytedance's API was found. If you're Chinese, it's easier to get access -\n\nyou need to pass the Volcengine facial/document recognition, apparently only available to Chinese people.\n\nWe already evaluated its [SDR](https://mvsep.com/quality_checker/entry/4750), and it even scored a bit better than Ripple itself.\n\n\"The API from volcengine only returns 1 stem result per request, and it offers vocal+inst only, other stems not provided. 
So getting a quality checker result for vocals + instrumental will cost 2x the API charge.\n\nSomething good is that the volcengine API offers 100 minutes free for new users\"\n\nThe API costs 0.2 CNY per minute.\n\nIt takes around 30 seconds for one song.\n\nIt cost 1.272 USD to separate 1 stem out of MVSEP's multisong dataset (100 tracks x 1 minute).\n\n\"My only thought is trying an iOS emulator, but none of the free ones I've tried go as far as letting you actually download apps, or import files, that is\"\n\nSo far, Ripple hasn't beaten voc\\_ft or Dango (although there might be cases where it's better).\n\nThe samples we got months ago are very similar to those from the app. Also, the \\*.models files have a SAMI header and MSS in the model files (which use their own encryption), although processing probably relies fully on external servers, as the app doesn't work offline (the model files are also suspiciously small, a few megabytes, although that's typical for mobilenet models). It's probably not the final iteration of their model, as they allegedly told someone they were afraid their model would leak, but it's better than the first iteration judging by SDR, even with lossy input files.\n\nLater they said that it’s a different model than the one they previously evaluated, and that this time it was trained on lossy 128kbps files due to some “copyright issues”.\n\n\"One thing you will notice is that in the Strings & Other stem there is a good chunk of residue/bleed from the other stems, the drum/vocal/bass stems all have very little to no residue/bleed\" - this doesn't occur in all songs.\n\nIt's fully server-based, so they may be afraid of heavy traffic if they publish Ripple worldwide, and it's not certain whether that will happen.\n\nThanks to Jorashii, Chris, Cyclcrclicly, anvuew, Bas and Sahlofolina.\n\nPress 
information:\n\n<https://twitter.com/AppAdsai/status/1675692821603549187/photo/1>\n\n<https://techcrunch.com/2023/06/30/tiktok-parent-bytedance-launches-music-creation-audio-editing-app/>\n\nSite:\n\n<https://www.ripple.club/>\n\n**BS-RoFormer**\n\nThe architecture used in Capcut and Ripple (now defunct). Their paper was published, and the arch was later reimplemented by lucidrains for training and inference:\n\n<https://github.com/lucidrains/BS-RoFormer>\n\nLater, [Mel-Band RoFormer](https://github.com/lucidrains/BS-RoFormer/blob/main/bs_roformer/mel_band_roformer.py), based on band split, was released. It is faster, but doesn't reach as high an SDR as BS. The Mel variant might require some revision of the code, and its paper might lack some features needed to keep up SDR-wise with the extremely slow original BS variant. On paper, it should be better than BS-Roformer, but for some reason, models trained with Mel get worse results than with BS-Roformer (so probably a problem with the reimplementation from the paper). Kim reworked her config, so the results of Mel models improved, but they are still a tad lower than BS-Roformer. ZFTurbo includes training and inference of Roformers in his repository on GitHub.\n\nFor more information, check the [training](#_bg6u0y2kn4ui) section.\n\n*About ByteDance*\n\nWinners of the MDX23 competition. They said at the beginning that it uses a novel arch (so no weighting/ensembling of existing models). In the days of v0.1, it seemingly gave the best vocals but not-so-good instrumentals, as someone who heard samples once said, but they have come a long way lately. It's all about their politics. 
It's a Chinese company responsible for TikTok, famous for d\\*\\*k moves outside China: manipulating their algorithms to encourage stupidity outside China and greedy, wellness-centered attitudes for users in China (TikTok itself is not available in China), manipulating their algorithms to promote only black-white relationships in western countries, spying on users by copying their clipboard, spying even on journalists to find their sources of information about the company, unauthorized remote access to TikTok user data from China, and also being subject to bans in the US and other countries for bad influence on children and for data infringement, by storing non-Chinese users' data directly on their servers, which is against the law in many countries (some actions were taken on this later). [Decompiling TikTok analysis](https://twitter.com/d1rtydan/status/1277081198624337920?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1278204068175818752%7Ctwgr%5E&ref_url=https%3A%2F%2Fwww.wirtualnemedia.pl%2Fartykul%2Fhakerzy-anonymous-usuncie-tiktoka-to-aplikacja-do-szpiegowania-internautow-dlaczego-tiktok-jak-zainstalowac-jak-korzystac-z-aplikacji) (tons of spying and improper behavior by the app). Currently, Bytedance is only around 40% owned by its founders, Chinese investors, and employees, while the rest (60%) is held by global investors (incl. lots of American ones), and it is being pushed to sell more stakes to the US, at the risk of a US ban on the app.\n\nThey said the CEO told them to keep this ByteDance arch to themselves for two years. Initially they planned to release it in some kind of app, first at the end of June; later, something was planned for the end of the year; later still, they said something about two years (maybe more about open sourcing, but we can't have our hopes high). Previously, they said the matter of open sourcing/releasing was stuck in their legal department. Later they said they used MUSDBHQ + 500 songs for their dataset. 
These 500 songs could have been obtained illegally for training (although everyone does it), but they might be extremely cautious about it (or it's just an excuse). Eventually, they released Ripple and CapCut. Then they released the paper for the BS/Mel archs, which was implemented and coded by lucidrains, so it could later be used for training.\n\nLater, they seemingly spread information privately among users that, despite the similarities in SDR, the 18.75 score was the result of trolling by someone other than ByteDance. Some people favoring ByteDance were rumored to engage in disruptive, trolling behavior on our server too: harassing other users, or just being unkind to others, etc. Besides, the same person responsible was also the best informed about ByteDance's next moves, changed nicknames or accounts frequently, and possessed great ML knowledge. Many coincidences. In the end, the same user, zmis (if you check the details of the account above), was behind a lot of newly created accounts, which were banned on our server.\n\nWhenever a previous account was banned, a new account conducting the same behavior was created on the same day or shortly after.\n\nThe core of their activities was spreading misinformation about SDR metrics, claiming SDR is the most important thing in the world because their own arch is good at it, hence the narrative.\n\nSo don't bother, and do a good job of not feeding the troll from another company. They don't like competition, and they make their own moves behind the scenes to become better.\n\nIt’s not impossible to fake SDR results on the MVSEP leaderboard. For current public archs, you’d need to add the songs from the evaluation dataset to your training dataset while keeping your regular big dataset in place, which simply defeats the evaluation purpose of this leaderboard, or you can simply mix your result stems with the original stems. 
“SDR focuses more on lower frequencies, so it can easily be fooled into giving a higher score if the lower frequencies are louder. Bas tested this theory and confirmed it. You can boost the bass, and it will score +1 or +2 SDR higher or something, that's why it's not always reliable” - becruily\n\nResults that are certainly not faked include those uploaded by various users evaluating the same public models available for offline use. Although they are usually uploaded with various parameters which affect SDR (usually, the better the parameters, the higher the SDR, but not always), they remain consistent across different users' evaluations with similar parameters and inference code, so the scaling is correct and preserved; thus those results weren’t faked and can be reproduced with similar SDR. For the other scores from non-public inferences/models/methods, we simply trust ZFTurbo, and probably viperx too, as they are/were our trusted users for years. Also, on different occasions, the leaderboard for the current multisong dataset has tended to give better SDR to results with more residues, so the chart is simply not fully reliable in that respect, but it's rather not manipulated at its core either. It’s more down to the nature of SDR measurement and/or the dataset used.\n\nViperX trained the first community BS-Roformer model, similar to the SAMI v1.0 model, although with 2 stems (and a lower-scoring Mel model at the time). His BS model sounds similar to Ripple (although it's only 2-stem; the 4-stem Ripple variant scores a bit higher than the 2-stem variant, but still lower than ViperX's model and v1.0). Then came a lot of community-trained models, like a private one by ZFTurbo, and fine-tunes by various users (Kim, Unwa, Bas, ZFTurbo).\n\nBas tried to train a model purely on the multisong dataset, but still failed to surpass the SDR score of ByteDance’s v1.1 model. 
v1.1 brings enhancements to the arch and will be presented at ISMIR 2024 (the paper is already out; link in the [Training](#_9i8359eysaoe) section).\n\n## Drumsep - single percussion instruments separation\n\nIf you want to further separate a drums stem (obtained with e.g. MDX23 Colab, Mel-Roformer drums on x-minus.pro premium, MVSEP, or Demucs\\_ft, not necessarily BS-Roformer SW) into single instruments like hihat, cymbals, kick, snare and more, you might want to check the solutions below. Sampling from such separated stems might not be the best idea due to the quality (see [here](https://raredsp.com/drumclone) for the free Drumclone plugin, which even allows synthesising different types of kicks from a mixture; [video](https://www.youtube.com/watch?v=NbDo9DwNtuI)). But it serves well, e.g., for making new mixes/remasters of the same songs or separated instrumentals, e.g. when overlapped with the better-quality, previously separated drums stem. It might give interesting results when aligned with the original drums and rebalanced with effects (the drums stem might end up louder in the mix than the separated percussion, as it will most likely still have better quality). Check out these drum replacers.\n\nTo potentially increase the separation quality of drumsep models, “try using a small pitch shift up or down, like +/- 1 or 2 semitones (...) can sometimes help bring out the lows or highs if they seem weak.” (CZ-84)\n\nAlso, consider using a good instrumental model before using a 4-stem model for drums (if your input isn't an instrumental already) to enhance the drums stem, and in turn the drumsep result.\n\n*Some drumsep models might have a bug where “a small, but relevant portion of audio is being lost when the [drumsep] model is being used”*\n\n*“The solution is to invert the phase on all the drum stems into the original file and save that as its own file, making your own \"other\" file”. 
It has been fixed on MVSEP.*\n\n### Mel-Roformer MVSEP drumsep models\n\n1) 4 stems v2 (kick, snare, toms, cymbals) - “It gives the best metrics with a big gap for kick, snare and cymbals.” - ZFTurbo. The old v1 below was removed.\n\n([metrics](https://imgur.com/a/h924uBF); only toms are worse SDR-wise vs the previous SCNet drumsep models below)\n\n~~1) 4 stems v1~~ removed (kick, snare, toms, cymbals) - average SDR of hihat, ride, crash is 11.52 (but in one stem), and so far it’s the best SDR out of all models (even vs the previous ensemble consisting of three MDX23C and SCNet models).\n\n2) 6 stems (kick, snare, toms, hihat, ride, crash) - average SDR of hihat, ride, crash is 8.18 (but from separated stems), while the snare in 1) has the best SDR out of all available models.\nKick and toms are still the best SDR-wise in the previous 3x MDX23C and SCNet ensemble (no new ensemble with these new Mel-Roformers so far)\n\n- The new models “are very great for ride/crash/hh. And overall, they have the best metrics for almost all stems.” - ZFTurbo\n\n[SDR/L1 Freq/bleedless/fullness chart](https://imgur.com/a/n9WMkSY) of all models\n\n[Evaluations on the new dataset](https://imgur.com/a/vEVYTcJ) (esp. 
check Log WMSE Results with “\"bypass\\_filter\" with torch\\_log\\_wms, ([good] at least for drums or anything rich in low frequency content)” - jarredou)\n\nSometimes jarredou’s newer 5-stem drumsep model below can serve to clean up the “upper frequency range of snare hits” in the cymbals stems of “either MelRoFormer or SCNet-XL four-or-six stem DrumSep models” - Dyslexicon\n\n“The core problem is that the main MVSep Drums model which is used by everything - including the Drumsep models - is not purely drums, it's mixed with other percussion which taints things.” - godzfire\n\n### SCNet MVSEP drumsep models\n\nBetter SDR than the MDX23C and Demucs models below\n\n- MVSEP 8 stems ensemble of all the 4 drumsep models below (along with the MDX23C model, and besides the Demucs model by Inagoy) [metrics](https://i.imgur.com/sR5pNP3.png)\n\n- MVSEP’s SCNet 4 stem (kick, snare, toms, cymbals): out of the following models, the best SDR for kick, and similar to the 6 stem model below for toms (only a 0.01 SDR difference)\n\n- MVSEP’s SCNet 5 stem (cymbals, hi-hat, kick, snare, toms)\n\n- MVSEP’s SCNet 6 stem model (ride, crash, hi-hat, kick, snare, toms): worse snare SDR\n\n### (newer) MDX23C 5 stem drumsep by jarredou\n\n[Download](https://github.com/jarredou/models/releases/tag/DrumSep). 
All SDR metrics are better than those of the previous 6 stem model below:\n\nSDR: kick: 16.66, snare: 11.54, toms: 12.34, hihat: 4.04, cymbals: 6.36 ([all metrics](https://mvsep.com/quality_checker/entry/8460)).\n\nMetric fullness for snare: 25.0361, bleedless for hh: 12.3470, log\\_wmse for snare: 13.8959\n\n“it's more on the fullness side than bleedless” - out of all the metrics, only bleedless for snare is worse than in the previous model:\n\n26.8420 vs 30.4149\n\n“Quite cleaner than the previous [6 stem] one”, “a lot noisier than other drumsep models, but that's not necessarily a bad thing.”\n\nPossible “UnpicklingError: \"invalid load key, '\\x0a'.\"” issue in UVR if you use the old 6 stem yaml.\n\nMaybe if we separate just the snare from an already separated drums stem with the old MDX23C model below, mix/invert to get the rest, and then pass it through the new model, the bleed would be gone.\n\nFor comparison, [metrics](https://mvsep.com/quality_checker/entry/8195) of the old 6 stem jarredou/Aufr33 MDX23C model (which has cymbals divided into ride and crash, which are not in the evaluation dataset):\n\nSDR: kick: 14.55, snare: 9.79, toms: 10.64, hihat: 3.20, cymbals: 6.08\n\nMetric fullness for snare: 25.0361, bleedless for hh: 10.2765, log\\_wmse for snare: 12.4258\n\nThe model was trained with a lightweight config so it could be trained on a subpar T4 GPU in free Colab, using 10 accounts. The metrics don't surpass the exclusive drumsep Mel-Roformer and SCNet models on MVSEP, but at least you can use this one locally.\n\n“Depending on the quality tier of input source material, it can sometimes yield more accurate stem-to-stem separations than either MelRoFormer or SCNet-XL four-or-six stem DrumSep models. (...)\n\nFor example, I often find that MelRoFormer DrumSep can leave the upper frequency range of Snare hits and mis-assign them to the Cymbals stem. 
This is a common issue I have encountered with separating AUD recordings with MelRoFormer DrumSep.\n\nMDX23c 5-stem drumsep is trained in such a way that it separates these snare remainders out of the Cymbals stem, which is extremely useful.” - Dyslexicon\n\n### (older) MDX23C 6 stem drumsep by jarredou/Aufr33\n\nUse it on already separated [drums](#_sjf0vefmplt).\n\nDownload\n\n<https://github.com/jarredou/models/releases/tag/aufr33-jarredou_MDX23C_DrumSep_model_v0.1>\n\nUse on Colab: <https://github.com/jarredou/Music-Source-Separation-Training-Colab-Inference/>\n\nAdded on MVSEP and [uvronline](https://uvronline.app/ai) too.\n\n(jarredou) “Drums Separation model trained by aufr33\n\n(on my not-that-clean drums dataset)\n\nStems:\n\nkick, snare, toms, hh, ride, crash\n\nMVSEP dataset evaluation:\n\nSDR: kick: 14.55, snare: 9.79, toms: 10.64, hihat: 3.20, cymbals: 6.08, hihat & cymbals: 6.77\n\n[More metrics](https://mvsep.com/quality_checker/entry/8195)\n\nTo get potentially better results with the model, “try using a small pitch shift up or down, like +/- 1 or 2 semitones, in the settings you use to extract the drum stem from the instrumental stem. (...) can sometimes help bring out the lows or highs if they seem weak.” (CZ-84)\n\nIt can already be used, but training is not fully finished yet.\n\nThe config allows training on not-so-big GPUs [n\\_fft 2048 instead of 8192]; it's open to anyone to resume/fine-tune it.\n\nFor now, it's struggling a bit to differentiate ride/hh/crash correctly, while kick/snare/toms are cleaner.\n\n[“and has the usual issues with mdx23 models, but it’s an improvement over drumsep I think” - Dry Paint Dealer Undr]\n\n- If you get an error while using jarredou’s Drumsep Colab (object is not subscriptable), replace the code on line 144 of inference.py with this:\n\n```python\nif type(args.device_ids) != int:\n    # wrap in DataParallel only when device_ids is not a single int (e.g. a list of GPU ids)\n    model = nn.DataParallel(model, device_ids=args.device_ids)\n```\n\n(thx DJ NUO)\n\nIt works in UVR too. 
All models should be located in the following folder:\n\n`Ultimate Vocal Remover\\models\\MDX_Net_Models`\n\nDon't forget to copy the config file to `model_data\\mdx_c_configs`.\n\nOnce the model is detected, select the config in the new window, and that’s all.\n\nThe model achieved much better SDR on jarredou's small private evaluation dataset than the previous drumsep model by Inagoy, which was based on a worse dataset and the older Demucs 3 arch.\n\nThe dataset for further training is available in the drums section of [Repository of stems/multitracks](#_k3cm3bvgsf4j) - you can potentially clean it further and/or expand it, so the results might be better after resuming the training from the checkpoint. Using the current dataset, the SDR might stall for quite a number of epochs or even decrease, but it usually increases later, so training it further to 300, 500 or even 1000 epochs might be beneficial.\n\nThe attached config also includes the training parameters necessary to continue training with ZFTurbo's [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training/tree/main).\n\nCurrent model metrics (not on the MVSEP evaluation dataset):\n\n“Instr SDR kick: 18.4312\n\nInstr SDR snare: 13.6083\n\nInstr SDR toms: 13.2693\n\nInstr SDR hh: 6.6887\n\nInstr SDR ride: 5.3227\n\nInstr SDR crash: 7.5152\n\nSDR Avg: 10.8059” - Aufr33\n\nAnd if the evaluation dataset hasn't changed since then, the old Drumsep SDR was:\n\n“kick : 13.9216\n\nsnare : 8.2344\n\ntoms : 5.4471\n\n(I can't compare cymbals score as it's different stem types)” - jarredou\n\nAfter jarredou’s initial training in Colab, Aufr33 decided to train the model for an additional 7 days, to somewhere above epoch 113 (perhaps around 150; it wasn't said precisely), using the same config but on a faster GPU (rented 2x RTX 4090).\n\nEven epoch 5, trained casually on jarredou's dataset in the slow and troublesome free Colab (which uses a Tesla T4 15GB with the performance of an RTX 3050, but more VRAM) with multiple 
Colab accounts and very light, fast training settings, already achieved better SDR than Drumsep, which used a smaller dataset and an older architecture. Metrics of the Colab epochs:\n\n“epoch 5:\n\nInstr SDR kick: 13.9763\n\nInstr SDR snare: 8.4376\n\nInstr SDR toms: 6.7399\n\nInstr SDR hh: 0.7277\n\nInstr SDR ride: 0.8014\n\nInstr SDR crash: 4.4053\n\nSDR Avg: 5.8480\n\nepoch 15:\n\nInstr SDR kick: 15.3523\n\nInstr SDR snare: 10.8604\n\nInstr SDR toms: 10.3834\n\nInstr SDR hh: 4.0184\n\nInstr SDR ride: 2.7248\n\nInstr SDR crash: 6.1663\n\nSDR Avg: 8.2509”\n\nDon't forget to use already well-separated drums, taken from a well-separated instrumental, as input for that model (e.g. drums from Mel-Roformer for premium users on x-minus, the MVSEP Drums ensemble, jarredou’s MDX23 Colab fork v. 2.5, or the all-stems MVSEP 4/+ ensemble, which is premium).\n\nFor separating drums directly from instrumentals, the model might not give good results, hence it needs separated drums first. It was trained just on percussion sounds, not on vocals or anything else.\n\nAlso, the kick and toms, for example, might have somewhat weird-looking spectrograms. It’s due to:\n\n“mdx23c subbands splitting + unfinished training, these artifacts are [normally] reduced/removed along [further] training.” [Examples](https://discord.com/channels/708579735583588363/900904142669754399/1258441408109613209)\n\n### Older [drumsep](https://github.com/inagoy/drumsep) by Inagoy\n\nDemucs 3 model. Just remember to use drums in one stem (e.g. from demucs\\_ft) taken from an already good-sounding instrumental, or from an ensemble on MVSEP or the MDX23 v. 
2.4 Colab first, and use that as input (both are better for instrumentals in most cases than Demucs 4 alone; you can use various ensemble settings to get better instrumentals, and the better the drums, the better the results from drumsep).\n\n- [Fixed Colab](https://colab.research.google.com/drive/1wws3Qm3I1HfMr-3gAyW6lYzUHXG_kuyz?usp=sharing)\n\n- or [Kubinka Colab](https://colab.research.google.com/github/kubinka0505/colab-notebooks/blob/master/Notebooks/AI/Audio/Separate/Drumsep.ipynb) (you can provide direct links there)\n\n- Available on MVSEP.com (but you can use more intensive parameters in Colab for slightly better quality)\n\n(Use these solutions instead of the original GitHub Colab, as the model's GDrive link in it has been deleted, so drumsep won’t work correctly there unless you replace the GDrive link with this reupload of the .th model:\n\n<https://drive.google.com/file/d/1S79T3XlPFosbhXgVO8h3GeBJSu43Sk-O/view>)\n\n- Windows installation - execute the following:\n\n```\ndemucs --repo \"PATH_TO_DrumSep_MODEL_FOLDER\" -n modelo_final \"INPUT_FILE_PATH\" -o \"OUTPUT_FOLDER_PATH\"\n```\n\n- You can also use drumsep in the UVR 5 GUI (besides using the fixed Colab or the command line):\n\nGo to UVR settings and open the application directory.\n\nFind the folder \"models\" and go to \"demucs models\", then \"v3\\_v4\".\n\nCopy and paste both the [.th](https://drive.google.com/file/d/1S79T3XlPFosbhXgVO8h3GeBJSu43Sk-O/view) and [.yaml](https://drive.google.com/file/d/1LJ_C4h-hXAJEMaP7lbPTpcrVH-i_VKjm/view?usp=drivesdk) files there, and it's good to go.\n\nBe aware that the stems will be labelled incorrectly in the GUI when using drumsep.\n\nIt's much more sensitive to shifts than to overlap, where values above 0.6-0.7 can become placebo. 
Consider testing it with shifts 20.\n\nBut some people find shifts 10 with overlap 0.99 better than shifts 20 with overlap 0.75.\n\nIf you’re willing to wait, you can increase shifts to 20 with overlap 0.99 to get the best of both worlds.\n\nAlso, consider testing it with -6 semitones, e.g. in UVR 5.6/+, or with a 31183Hz sample rate (44100Hz × 2^(-6/12) ≈ 31183Hz), which decreases both tempo and pitch.\n\n-12 semitones from 44100Hz gives 22050Hz, which should be less usable in most cases. Tempo preservation should be off as well.\n\nBe aware that it can sometimes “consistently put hi hats in snare stem”, that the results can contain some artefacts, and that they might not null with the source.\n\n“From what I've tested (on drums already extracted with demucs4\\_ft from a live band recording from the output of the soundboard... so shitty sounding!), It is quite good at separating cymbals from shells, and kick from snare, but there are parts of kick or snare sounds that can go into the toms stem (...it's easy to fix manually in a DAW)”\n\n\"Ok I did test it.\n\n- You're right, Drumsep is good if shifts are applied, this makes a HUGE difference, first time I did test it with 0 or 1 shift and results were meh. Shifts (from about 5/6/10 depending on source) clean it nicely.\n\nMinuses: only 4 outputs. Not enough for a lot of drumtracks (but hey you can Regroove results, and this is what I will be doing probably from now) - It takes a long time with a lot of shifts, - it doesn't null with original tracks\n\n- Regroove allows me more separations, especially when used multiple times, so as a producer it allows me to remove parts of kicks, parts of snares etc, noises etc. More deep control. 
Plus it nulls easily (it always adds the same space in front) so I can work more transparently.\n\nBut you're right, I will use drumsep in the Colab with a lot of shifts as a starting point in most cases now.\"\n\n\"It's trained with 7 hours of drum tracks that I made using sample-based drum software like Addictive Drums, trying to get as many different-sounding drums as I could. As everything was controlled with MIDI, I could export the isolated bodies: kick, snare, toms (all on one track), and cymbals (including hi-hat). So every dataset example is composed of kick, snare, toms, cymbals, and the mixture (the sum of all of them).\" - Inagoy, the author\n\nAmong paid solutions for separating drum sections, there's mainly [FactorSynth](#_cz4j2d3uf48s); other alternatives are more problematic or less accurate.\n\nUse the free Zero Shot to separate other single instruments from, e.g., the \"other\" stem of Demucs or GSEP.\n\n### Moises.ai drumsep\n\n(only for Pro)\n\n- Kick, snare, toms, hi-hat, cymbals, other\n\nIt’s not well documented in their promotional materials, but the option is available after dragging your input file onto the site, under the drums button.\n\n### FactorSynth\n\nSince version 3, it's available as a plugin for most DAWs. The demo runs for 20 minutes at a time, with exporting and component editing disabled.\n\nUp to v. 2, it was an Ableton-only add-on, and it could (probably) be used with the free Ableton Live.\n\nAlso, it's not for separating drums from a full mix, but for separating your already separated drums into further layers like kick, snare, transients, cymbals, etc. 
from Demucs or GSEP (the latter usually has better shakers, and at least hi-hats, when they're in a fast tempo).\n\n[Up to v2, the demo version limit was 8 seconds, with no limit for the full version] “It’s amazing”.\n\nIt works the same way as the Regroover VST (which may have some problems with creating a trial account).\n\nIts quality is comparable or better (both are better than Zero Shot, at least for drums).\n\n“Factorsynth has more granularity, but drumsep is easier to work with and gets less confused between toms and kicks.”\n\nFreeware prototype versions 0.1-0.4 from 2017 for Mac are available to download:\n\n<https://www.jjburred.com/software/factorsynth/proto.html>\n\n### Regroover\n\nRegroover only handles 30-second chunks, and they require manual alignment due to phasing issues, as additional silence is added at the beginning and end.\n\n“Get your 30-second drum clip, then drag and drop it into Regroover.\n\nMake sure to de-select the Sync option, as it will time stretch it by default.\n\nOn the right-hand side, I recommend changing the split to 6 layers instead of 4, simply for flexibility.\n\nOnce it has processed that, you can choose export -> layers.\"\n\nThere was a report that newer versions might no longer be suitable for this task.\n\nIn other words:\n\nIt’s much more hassle to use than drumsep, but it’s very good “if you need a particular sound, and not the pattern etc.\n\n1. Separate the drums from the whole track (demucs).\n\n2. Cut the drum track into max 30-second cuts [Regroover's limit], ideally cutting right on a transient; some space before the kick helps.\n\n3. Use Regroover for the first time and, for example, try to separate into 4 tracks, just for an overall separation.\n\n4. Those separations sum exactly to what was given; sometimes they just need to be realigned by a few ms.\n\n5. And if, for example, the kick still has some unneeded parts, you just regroove it once again.\n\nIf you're looking for something fast and overall, and for patterns, use drumsep. Regroover is for a painful but precise job. 
Also, in most cases the hi-hats are trash, but you can often find perfectly usable snares and kicks. I'm not sure about metal, but overall.”\n\n### UnMixingStation\n\n\"Very, very old and almost impossible to [find](https://www.magesypro.com/audionamix-unmixingstation-v0-90-115-8730-assign/), but the separations are 95% close to Regroover\". The software is 13 years old, its site is down, and the tool doesn’t seem to be available to buy anywhere.\n\n### LarsNet\n\nAdded on MVSep. [Colab](https://github.com/jarredou/larsnet-colab). Source: <https://github.com/polimi-ispl/larsnet>\n\nIt separates previously separated drums into 5 stems: kick, snare, cymbals, toms, hihat.\n\nIt’s worse than Drumsep as it uses a Spleeter-like architecture, but “at least they have an extra output, so they separate hihats and cymbals.”\n\n“Baseline models don't seem better quality than drumsep, but the provided checkpoints are trained with only 22 epochs, it doesn't seem much (and the StemGMD dataset was limited to only 10 drumkits), so it could probably be better with a better dataset & training”\n\nAs with Drumsep, you should provide drums separated with, e.g., a Demucs model.\n\nThere’s also Zynaptiq Unmix Drums, but it’s not exactly a separation tool; it's meant to “Boost Or Attenuate Drums In Mixed Music”.\n\n- For kick and hi-hat separation only, now free -\n\n### VirtualDJ 2023/Stems 2.0 (kick, hi-hat)\n\nUsing drums from Demucs 4 or GSEP first will probably give better results, but it's not perfect. In many cases, it may leave a little snare bleed in both the hi-hat and kick tracks. Sadly, it sometimes confuses these elements of a mix.\n\n\"If you are not using it professionally, and do not use any professional equipment like a DJ controller, or a DJ mixer, then VirtualDJ is (now) FREE\".\n\n### RipX DeepAudio (-||-) (6 stems [piano, guitar])\n\nPopular tool. 
Decent results for separating specific drum sections (but for vocal/instrumental/4-stem separation, all the tools mentioned at the very top of the document outperform RipX, so use it only for separating specific drum sections, ideally on a drums stem from Demucs 4 or GSEP).\n\n\"It can separate a file into a buncha things into a lot more types of instruments than just the basic 4 stems (with varying degrees of success ofc).\n\nMight be a case that old cracked versions of RipX don't allow separating drums sections well, or just the opposite - check both the newest version and Hit'n'Mix RipX DeepAudio v5.2.6, but probably the latter doesn't support separating single drums yet.\n\nIt’s basically UVR but with their custom models + SFX single stem.\n\nIt's good for guitar, but not in all cases (possibly Demucs for 4 stems).\n\nPiano and guitar models were added recently (around January 2023).\n\n- Hit 'n' Mix RipX DAW Pro 7 has been released. For GPU acceleration, the minimum requirement is 8GB VRAM and a 10XX card or newer (the official documentation mentions: 1070, 1080, 2070, 2080, 3070, 3080, 3090, 40XX). Additionally, for GPU acceleration to work, exactly NVIDIA CUDA Toolkit v11.0 is required. When moving from some older versions, the separation quality of harmonies can occasionally increase. With GPU acceleration, separation time can drop from as much as 40 minutes on CPU to 2 minutes on a decent GPU.\n\nThey say it uses Demucs.\n\nWe have reports about crashes, at least on certain audio files. There are various RipX versions uploaded on archive.org, maybe one will work, but some keys work only on versions from 2 and up.\n\n### Spectralayers 10\n\nIt received an AI update: it no longer uses Spleeter but Demucs 4 (6s), and it now also offers good kick, snare and cymbals separation. Good opinions so far. Compared to drumsep, sometimes it's better, and sometimes it's not. 
Compared to MDX23 Colab V2, its instrumentals sometimes sound much worse, so don’t bother using it for instrumentals.\n\n## USS-Bytedance (any; esp. SFX)\n\n<https://github.com/bytedance/uss>\n\n(The command `conda install -c intel icc_rt` solves the LLVM error.)\n\nYou provide e.g. a sample of any instrument or SFX, and the AI isolates that sound from a song or movie fragment of your choice.\n\nIt works in mono, so you need to process the left and right channels separately.\n\nUpdate 29.04.25 (Python No such file or directory fix; thx epiphery)\n\n<https://colab.research.google.com/drive/1rfl0YJt7cwxdT_pQlgobJNuX3fANyYmx?usp=sharing>\n\n(old) ByteDance USS with Colab by jazzpear94\n\n<https://colab.research.google.com/drive/1lRjlsqeBhO9B3dvW4jSWanjFLd6tuEO9?usp=share_link>\n\n(old) Probably a mirror (fixed March 2024):\n\n<https://colab.research.google.com/drive/1f2qUITs5RR6Fr3MKfQeYaaj9ciTz93B2>\n\nIn 2025, it errors out with:\n\n“sed: can't read /usr/local/lib/python3.10/dist-packages/uss/config.py: No such file or directory”\n\nIt works (much) better than Zero Shot (not only “user-friendly wise”).\n\nIt gives better results and divides them into many [categories](https://media.discordapp.net/attachments/708579735583588366/1106629812119941341/image.png).\n\nGreat for isolating SFX, but worse for vocals than current vocal models. Even providing an acapella didn't give better results than current instrumental models. It just serves well for other purposes.\n\n\"Queries [i.e. example samples] for ByteDance USS taken from the DNR dataset. 
Just download and put these on your drive to use them in the Colab as queries [i.e. sounds similar to the ones you want to separate from your songs].\"\n\n<https://www.dropbox.com/sh/fel3hunq4eb83rs/AAA1WoK3d85W4S4N5HObxhQGa?dl=0>\n\nAlso, grab some crowd samples from here:\n\n<https://youtu.be/-FLgShtdxQ8>\n\n<https://youtu.be/IKB3Qiglyro>\n\n<https://youtu.be/Hheg88LKVDs>\n\nQ&A by Bas Curtiz and jazzpear\n\nQ: What is the difference between running with and without reference query audio?\n\nA: Query audio lets you input audio for it to reference and extract similar sounds based on it (like Zero Shot but way better), whereas without a query it automatically splits many stems of all kinds without needing to be fed a query.\n\nQ: Let's say there is this annoying flute you wanna get rid of...\n\nand keep the vocals only....\n\nYou feed a snippet of the flute as reference, so it tries to ditch it from the input?\n\nA: Quite the reverse. It extracts only the flute, which I guess you could then invert to get rid of it.\n\n## Zero Shot (any sample; esp. instruments)\n\n<https://github.com/RetroCirce/Zero_Shot_Audio_Source_Separation>\n(since [USS Bytedance](#_4svuy3bzvi1t) came out, Zero Shot can be regarded as obsolete, although it might still be better for single instruments than for SFX)\n\nYou provide e.g. 
a sample of any trumpet or any other instrument, and the AI returns it from a song you choose to separate.\n\n[Guide and troubleshooting](https://discord.com/channels/708579735583588363/947867283752108122/947867478271340556) for local installation (get the Discord invitation from the footer first if necessary).\n\nGoogle Colab [troubleshooting](https://discord.com/channels/708579735583588363/947867283752108122/950263276514725898) and [notebook](https://colab.research.google.com/drive/1kUj5DQe6HzkPo4WyWYTVfHZg14FyTgZY?usp=sharing) (though it may not work at times when the GDrive resources exceed their download limit, and it has had some torch issues since the Colab updates in 2023).\n\nAlso check out this Colab alternative:\n\n<https://replicate.com/retrocirce/zero_shot_audio_source_separation>\n\nIt's faster (mono input required).\n\nAlso available on <https://mvsep.com/> in the form of 4 stems without custom queries, and in this form it’s not better than Demucs.\n\n\"Zero shot isn't meant to be used as a general model, that's why it accelerates on a specific class of sounds with some limitations in mind.... It mostly works the best when samples match the original input mixture, of course there are limitations\"\n\n\"You don’t have to train any fancy models to get decent results [...] And it’s good at not destroying music\". But it usually leaves some vocal bleed, so process the result with MDX to get rid of these low-volume vocals. Zero Shot is also capable of removing crowd noise from recordings pretty well.\n\nAs for drum separation (e.g. snares), it’s not as good as drumsep/FactorSynth/RipX, and it has a cutoff.\n\n\"I did zero shot tests a week or two ago, and it was killing it, pulling harmonica down to -40dB, synth lines gone, guitars, anything. 
And the input sources were literally a few seconds of audio.\n\nI've been pulling out whole synths and whistles and all sorts.\n\nKnocks the wind model into the wind, zero shot with the right sample to form the model backbone works really well\n\nThe key is to give it about 10 seconds of a sample with a lot of variation, full scales, that kinda thing\"\n\n**Dango.ai**\n\nCustom stem separation feature (paid, 10 seconds for free)\n\nExpensive\n\n*Special method of separation by viperx (ACERVO DOS PLAYBACK) edited by CyberWaifu*\n\n1. Process the music with Demucs to get drums and bass.\n\n2. Process the music with MDX to get vocals.\n\n3. Separate the left and right channels of the vocals.\n\n4. Process each vocal channel through Zero-Shot with a noise sample from that channel.\n\n5. Phase-invert Zero-Shot's output against the channel to remove the noise.\n\n6. Join the channels back together to get the processed vocals.\n\n7. Invert the processed vocals against the music to get the instrumental.\n\n8. Separate the left and right channels of the instrumental.\n\n9. Process each instrumental channel through Zero-Shot with a noise sample from that channel.\n\n10. Phase-invert Zero-Shot's output against the channel to remove the noise.\n\n11. Join the channels back together to get the processed instrumental.\n\n12. Process the instrumental with Demucs to get other.\n\n13. Combine other with the drums and bass to get a better instrumental.\n\nSo it sounds like Zero-Shot is being used for noise removal.\n\nAs for how Zero-Shot and the noise sample works…\n\n## AudioSep\n\n“I decided to try AudioSep: <https://github.com/Audio-AGI/AudioSep> on the MultiSong Dataset.\n\nI used the prompt 'vocals'. I was sure it would be bad, but I didn't think it would be this bad.\n\n<https://mvsep.com/quality_checker/entry/8408>\n\nI also tried it on the Guitar dataset - it's even worse - negative SDR. Maybe I'm doing something wrong. But I tried the cat example from the demo page, and it worked the same as there. 
So I think I have no errors.”\n\nsdr: 0.33\n\nsi\\_sdr: -2.39\n\nl1\\_freq: 17.62\n\nlog\\_wmse: 6.72\n\naura\\_stft: 3.66\n\naura\\_mrstft: 5.55\n\nbleedless: 9.29\n\nfullness: 16.58\n\nThe Colab on GH probably gives an unpickling error. You might be able to fix it by executing:\n\n`!pip install torch==2.5.0`\n\nafter you execute all the installation-related cells.\n\nSince then, some further dependency changes are probably also needed, as currently coded in the inference Colab.\n\n### Medley Vox (different vocalists)\n\nUse already separated [vocals](#_n8ac32fhltgg) as input (e.g. by Roformers, vox\\_ft or MDX23C fullband a.k.a. 1648 in UVR or 1666 on MVSEP).\n\nLocal installation video tutorial by Bas Curtiz:\n\n[https://youtu.be/VbM4qp0VP80](https://youtu.be/VbM4qp0VP80)\n(NVIDIA GPU acceleration supported, or perhaps CPU, which might be slow)\n\nCyrus' version of the MedleyVox Colab has chunking built in, so you don't need to chunk manually:\n\n<https://colab.research.google.com/drive/10x8mkZmpqiu-oKAd8oBv_GSnZNKfa8r2?usp=sharing> (07.02.25 fork with fairseq fix and GDrive integration)\n\nCurrently, we have a duet/unison model 238 (default in Colab),\n\nand main/rest 138 to uncomment in Colab.\n\nRecommended model is located in vocals 238 folder (non ISR-net one).\n\nWhile:\n\n“The ISR\\_net is basically just a different type of model that attempts to make audio super resolution and then separate it. I only trained it because that's what the paper's author did, but it gives worse results than just the normal fine-tuned.”\n\nMedleyVox is also available on MVSEP, but it has more bleeding and “doesn't work as well as the Colab iteration with duets”. (Isling/Ryanz)\n\nThe \"duet/unison model 238\" will be used by default.\n\n*``and main/rest 138 to uncomment in Colab``* if you need it.\n\nThen go to the first cell again. 
To \"uncomment\" means to delete the \"#\" at the beginning of the \"!wget\" line, so the line is executed and downloads the model files.\n\nDo it for both the pth and json lines.\n\n(You might be asked whether to replace the existing pth and json files with the alternative model you just downloaded.)\n\n*``Recommended model is located in vocals 238 folder (non ISR-net one).``*\n\nThat's the model used in the Colab by default, so you can ignore that information. It's for users running MedleyVox on their own machine.\n\nThe 238 model outputs at a 24kHz sample rate (so its spectrum ends at 12kHz in Spek).\n\nYou might want to upscale the results using e.g. [AudioSR](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.i7mm2bj53u07) or maybe even Lew’s vocal enhancer, located further below the linked section.\n\nThe output is mono.\n\nYou might want to create a \"fake stereo\" input by copying one channel to both sides, then do the same with the other channel, and then create the stereo result from both channels processed separately (dual mono) with MV.\n\nOtherwise, the AI creates a downmix of both input channels instead of processing them separately.\n\nBe aware that “dual mono processing with AI can often create incoherencies in stereo image (like the voice will be recognized in some part only in left channel and not the other, as they are processed independently)” jarredou\n\n\"The demos sound quite good (separating different voices, including harmonies or background [backing] vocals)\"\n\nIt's for already separated or original acapellas.\n\nThe model was trained by Cyrus. 
The problem is, it was trained with a 12kHz cutoff… “audiosr does almost perfect job [with upscaling it] already, but the hugging page doesn’t work with full songs, it runs out of memory pretty fast”.\n\nAt some point it seemed possible that the later stages of training, which looked like overfitting, were responsible for the higher-frequency output.\n\nIt’s sometimes already better than BVE models, and the model already gives results similar to the demos on their site.\n\nSadly, the training code is extremely messy and broken, but a fork by Cyrus with instructions is planned, along with the release of the datasets, including the geo-locked one. The datasets are huge and heavy.\n\nOriginal repo (Vinctekan fixed it - the video at the top contains it)\n\n<https://github.com/jeonchangbin49/medleyvox>\n\n*\\_\\_\\_\\_*\n\n*Outdated*\n\n<https://colab.research.google.com/drive/1StFd0QVZcv3Kn4V-DXeppMk8Zcbr5u5s?usp=sharing> (pip issues fixed 29.08.24, defunct as of 06.02.25)\n\n(outdated instructions, the current Colab explains everything)\n\n“Run the 1st cell, upload song to folder infer\\_file, run the 2nd cell, get results from folder results = profit”\n\nFurther explanations of how to use the Colab:\n*``Run the 1st cell``*\n\nSo press the first \"play\" button after you load the Colab.\n\n*``upload song to folder infer\\_file``*\n\nIt looks like the input folder has changed from infer\\_file to input in newer Colabs.\n\nSo, once the first cell has finished, open the Colab file manager (folder icon on the left) and go to /content/MedleyVox/input\\\n\nNow paste your song there and wait until the upload is done.\n\n*``run the 2nd cell``*\n\nThat's the next play button below the first one once you scroll down a bit. It will start the separation.\n\n*``get results from folder results``*\n\nGo to the file manager again and find /content/MedleyVox/results\n\nRight-click on the result file and download it. 
Wait till it's done.\n\n*``Currently, we have a duet/unison model 238 (default in Colab)``*\n\nSo you don't have to change anything in the Colab to separate with it.\n\nOld info\n\n<https://media.discordapp.net/attachments/900904142669754399/1050444866464784384/Screenshot_81.jpg> (dead)\n\nColab old\n\n<https://colab.research.google.com/drive/17G3BPOPBPcwQdXwFiJGo0pKrz-kZ4SdU>\n\nOlder Colab\n\n<https://colab.research.google.com/drive/1EHJFBSDd5QJH1FQV7z0pbDRvz8yXQvhk>\n\n(The same one, but here you need to replace the .ckpt, .json and .pth files with the ones from Cyrus [more details in the video above].)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n###### **About other services:**\n\nCheck [this](https://docs.google.com/spreadsheets/d/1-UTexxQpvZpxliGjcmRUDcnKSY-nikkNCqVDcq_0ROE/edit#gid=0) chart by Bas Curtiz to see which AIs various (including online) services use, plus their pricing.\n\n*At this point, everything mentioned above this link, at least for instrumentals, vocals and 4-6 stems, is better than the commonly known services below (with exceptions for some single stems described at the top):*\n\n###### Spleeter\n\n*and its implementation in:*\n\n###### iZotope RX-8/9/10\n\n*which just uses 22kHz models instead of the 16kHz ones in the original Spleeter. There is no point in using these anymore. 
The same goes for most AIs described below (in some cases only for specific stems):*\n\n*voiceremover.org, lalal.ai,*\n\n###### phonicmind\n\n###### melody.ml\n\n*RipX, Demix,*\n\n###### ByteDance Ripple/CapCut\n\n[beatstorapon](https://beatstorapon.com/ai-stem-splitter)\n\nFor reference, you can check a [comparison](https://mvsep.com/quality_checker/leaderboard.php?sort=insrum) chart on MVSEP.com,\n\nor the results of Sony's [demixing challenge](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/leaderboards?challenge_leaderboard_extra_id=869&challenge_round_id=886&post_challenge=true) (kimberley\\_jensen there is the 9.7 MDX-UVR vocal model, 2nd best at the time).\n\nTo hear 4-stem model comparison samples, you can watch [this](https://youtu.be/gl5AKCgMSSc) video comparison (December 2022).\n\nAll of this also applies to the new\n\n### real-time\n\nAI separation tools like\n\n###### Serato\n\nand\n\n###### Stems 2.0\n\nTensorFlow model (which can be found in newer Virtual DJ 2023 versions, now free for home users, and is better than the Serato and Spleeter implementations). They do not perform better than the best offline solutions at the very top of the document. But “Esp. since it's on-the-fly [...] results are more than decent (compared to others).”\n\n###### Acon Digital Remix\n\n(Vocals, Piano, Bass, Drums, and Other)\n\n“Just listened to the demo, not great [as for realtime] but still”\n\n*Others*\n\n##### FL Studio (Demucs)\n\nIt’s actually not realtime. 
It takes some time to process tracks first (hence maybe it’s the best of the three).\n\nIt's Demucs 4, but maybe not the ft model, and/or with low parameters applied, and/or it's their own model.\n\n\"Nothing spectacular, but not bad.\"\n\n\"- FL Studio bleeds beats, just like Demucs 4 FT\n\n- FL Studio sounds worse than Demucs 4 FT\n\n- Ripple clearly wins\"\n\n##### djay Pro 5.x\n\n“very good realtime stems with low CPU”. Allegedly “faster and better than Demucs, similar”, although “They are not realtime, they are buffered and cached.” It’s very fast anyway. It uses AudioShake. It can be better for instrumentals than UVR at times.\n\n##### Neutone VST\n\nHas a Demucs model for realtime use in a DAW\n\n(it uses a light “retrained, smaller version” of Demucs\\_mmi)\n\n<https://neutone.space/>\n\n<https://neutone.space/models/1a36cd599cd0c44ec7ccb63e77fe8efc/>\n\nIt doesn't use the GPU and is configured to be fast with very low parameters; also, the model is not the best on its own. It doesn't give decent results, so it's better to stick to other real-time alternatives. It won’t work correctly on low-end CPUs, producing dropouts and an inconsistent audio stream.\n\n##### Peel Stems\n\n<https://products.zplane.de/products/peelstems/>\n\nVST for real-time source separation (probably the same models as in MPC Stems)\n\n<https://www.youtube.com/watch?v=0Js5bWQWY7M>\n\n- Service rebranded to\n\n#### Fadr.com from SongtoStems.com\n\nis just Demucs 4 HT, but paid.\n\n\"My assumption, Fadr uses Gain Normalize [for instrumentals] was right [...].\n\nDemucs 4 HT seems to get a cleaner result. 
The rest = practically identical.\" And someone even said that VirtualDJ with Stems 2.0 had fewer artifacts on vocals.\n\n#### Apple Music Sing\n\n“I heard a few snippets, and what stood out is, whether intentional or not, the vocals remained in the background just enough to actually hear them.\n\nNow that could be great for Karaoke, so u have a kind of lead to go on.” But as for instrumentals alone, it’s bad.\n\n#### Voxless\n\nVST “uses AI to separate vocals and instrumental in real time. Now it is designed to be used in a DAW, but you can also run it in soundsource [on Mac, or probably SAVIHost (VST2/3) or Equalizer APO (VST3) or JBridge on Windows] so you can use it on your system audio live. It has low latency and doesn't use CPU a lot. The software has a very simple interface, just two knobs to increase/ decrease the instrumental or vocals or a mute/solo button for each. As for the quality it sounds like the first ever days of audio separation with AI like Demucs v1 or Spleeter in 2019 - 2020 but a little worse somehow, since it is very low latency not CPU heavy, but it does the job. Voxless has a trial of 7 days if you wanna check it but the license costs 100$ which I think is quite a lot for a software that separates vocals with barely first gen quality.” midol\n\n#### Ozone 11 Master Rebalance\n\n“I select vocals and have them dialed down to max using an EQ inside of it (may sound complicated, so you gotta watch a tutorial to see how ozone works). However, the results were far from voxless quality. It leaves so much bleed and whenever vocals are quite loud you can barely hear anything from the way it's fighting it, so it sounds like a complete mess. 
Both from the master rebalance and the main AI interface” midol\n\n(x) BL-Rebalance\n\n“The most important thing which is the separation quality, is horrible unfortunately, dialing the vocals all the way down to -120db the max, barely picks up vocals to cancel, the song sounds like it's just playing normally with vocals being suppressed in a very horrible way, it's muddy, and it leaves a lot of bleed, also again, when vocals are quite loud, you barely hear anything else because it's fighting hard.” midol\n\nalgoriddim djay\n\nApp for Windows, Android, Mac.\n\nJudging by strings in stemseparation.dll, they seem to use “bytesep” which is a package name of this repository: [bytedance/music\\_source\\_separation](https://github.com/bytedance/music_source_separation).\n\n*\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_*\n\n#### Music to MIDI transcribers/converters\n\nMultitrack stem MIDI transcription:\n\n<https://github.com/magenta/mt3>\n\n<https://colab.research.google.com/github/magenta/mt3/blob/main/mt3/colab/music_transcription_with_transformers.ipynb>\n\nGood results for piano:\n\n<https://github.com/magenta/magenta/tree/main/magenta/models/onsets_frames_transcription>\n\n<https://colab.research.google.com/notebooks/magenta/onsets_frames_transcription/onsets_frames_transcription.ipynb>\n\n“On high quality piano recordings its almost flawless”\n\nIf you have notes:\n\nmusescore\n\n[transkun](https://github.com/Yujia-Yan/Transkun) transcriber (now also on MVSEP)\n“it's the most accurate piano transcription algorithm ever trained and is unequalled in accuracy and absolute indifference to literally \\*any audio quality\\*\n\nas long as the piano being transcribed is at A440 it'll spit out a 95 percent accurate transcription from virtually any recording no matter how absolute garbage it is”\n\n#### 
[Piano2Notes](https://klang.io/piano2notes/)\n\n(notes and MIDI output, paid, 30 seconds for free, very good results)\n\n[basicpitch.spotify.com](https://basicpitch.spotify.com/)\n\n“Tried Basic-Pitch and It is way worse than MT3 as It produces MIDI tracks without an identifier.”\n\n“not 100% accurate but it's certainly the easiest to use and maybe most versatile”\n\n**Harmonic mixing** (find the song key)\n\n“Since mixed in key change from 10 to 11 the software has several failures especially when overwriting the file name and an error that base 84 error, and you are left without the analysis of the file thing. Which is essential when doing remixes and having a clarity of the tone and bpm. Someone knows of an alternative that does not make a mistake”\n\n<https://www.reddit.com/r/DJs/comments/n5byah/key_detection_comparison_mixed_in_key_10_vs_85/>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nOlder separation services\n\n#### Audioshake\n\nPaid, $16 per WAV stem, 2 or 5 stems (6? with guitar and piano), or 4 stems for preview (Indie creators)\n\nIt has a better piano model than GSEP.\n\n\"gsep piano model is very clean but sometimes fails in bigger mix, when there are a lot of instruments\"\n\nIt also has a guitar stem.\n\nInstead of Audioshake, you can use:\n\n- myxt.com (also paid, 3 stem model, prob. 16kHz cutoff which Audioshake normally doesn't have. No other stem. 
Results, maybe slightly better than Demucs)\n\n- Algoriddim djay pro\n\n- Neural Mix Pro (part of Algoriddim, also uses Audioshake), but it’s only for macOS\n- LANDR Stems (cheaper, also [uses](https://www.audioshake.ai/press-releases/landr-introduces-landr-stems-plugin-powered-by-audioshakes-award-winning-ai-technology) Audioshake; plugin, probably doesn’t work locally, free access won’t give you access to stems; “LANDR Stems is only included in Studio Standard + Studio Pro”, and it’s not included in the trial; SDR: [1](https://mvsep.com/quality_checker/entry/7603) | [2](https://mvsep.com/quality_checker/entry/7602))\n\n- <https://twoshot.app/model/289>\n\nAudioshake is suspected to be just MDX with an expanded dataset, but there’s no evidence of that at the moment. Compared to the UVR/MDX-UVR NET 1 model, its vocal stem scores 9.793 vs 9.702 for the free MDX-UVR, so they’re close for vocals.\n\nTheir researcher said they were training a UMXHQ model around the time of the 2020 Demixing Challenge.\n\nFree Demucs 3 has a much better SDR for drums and bass than Audioshake; however, its [SDR](https://cdn.discordapp.com/attachments/708912656370630717/952035316754174002/unknown.png) for vocals and other is worse.\n\nIt accepts only non-copyrighted music for separation, but you can slow the track down to circumvent the detection (some music, e.g. K-Pop like BTS, is not detected at all), though changing the speed to 110% yields better results, even compared to reversing the track.\n\nThe upload limit is one minute, so theoretically you can cut the song into chunks and merge the results, but AS fades out each chunk, so every next chunk needs to start with some overlap to merge the chunks seamlessly (I don’t remember if it solves the problem of the AS watermark, though).\n\nThen, you can download the preview (chunk) for free using a method similar to the one described in the allflac section (Chrome Dev Tools -> Network -> set filter to amazon), but the resulting file is unfortunately only a 128kbps mp3.\n\nThey now limit how many audio files you can upload for preview, but that can 
easily be mitigated by just using a temporary email provider or adding “+1”, “+2” or “.” to your Gmail address, and you will still receive your emails, e.g. y.o.u.r.m.a.i.l@gmail.com is the same for Google as yourmail@gmail.com.\n\nYou can also ping Smudge, Baul Plart or Bas Curtiz in #request-separation to have them isolate a song for you (edit 09.02.2023: at least Bas’ tool stopped working, so the rest, like AS Tool, might be dead too, at least in terms of API access; not sure).\n\n#### Lalal.ai\n\n7 stem\n\nAcoustic and electric guitar models, piano, bass, drums and vocal with instrumental (for 2 stems, UVR/MDX should do the job better)\n\nOnline service with a 10 minutes/50MB per file limit for free users.\n\nNow they have some voc/inst models that sound like an ensemble of public Roformers; still not as good, but close. Some specific models are worth trying out, e.g. lead guitar (the model got better over time) or piano.\n\nOlder notes:\n\n“I love Demucs 3, although for some specific songs (with a lot of percussion and loops) I still find lalal better.\n\nDemucs is great at keeping punchy drums, for example hip-hop, rap, house etc songs”\n\n“lalal is[n’t] worth it anymore, most of their models like strings or synths are crap and don't work at all” ~becruily\n\nHow to… abuse it. Doesn't always work for everyone, and sometimes you'll receive only 19-second snippets.\n\nGo to the sign-in/register screen and use a temp email from <https://tempail.com/>\n\nWhen you are in, make sure you use the settings with a P icon, P meaning Phoenix, which seems to be some hybrid mvsep lalal shit they made\n\nI'd recommend setting the processing level to normal, although you can play around with the settings to see what sounds better\n\nThey will later process it and since lalal has shorter queues, you get them faster. 
It took me like 10 seconds to get a preview for a song and 20 seconds for the full one, which is wild.\n\nYou will get a sample and if you like it, you can submit it and get your stems!\"\n\nYou can also use dots in Gmail addresses instead of +1 (and more) at the end, which lalal doesn't support. You'll still receive emails sent to the dotted address, and their system treats it as a separate email.\n\nTheir app uploads input files to external servers for separation.\n\n#### DeMIX Pro V3\n\nPaid, 6 stem model, trial\n\nOfficial site:\n\n<https://www.audiosourcere.com/demix-pro-audio-separation-software/>\n\n<https://www.demixer.com/?utm_source=audiosourcere&utm_medium=pop&utm_campaign=exit&utm_term=asre-exit-pop>\n\nPaid: $33/month, or x10 for a year, or x2.5 for a permanent license; 7-day trial available\n\nVocal, Lead Vocal, Drum, Bass & Electric Guitar\n\n<https://www.demixer.com/> has the same models implemented, though they don’t even mention that the guitar model is available; when you log in, it’s [there](https://imgur.com/a/p6WCzoD). 
Its guitar model might be a bit worse than RipX's (not confirmed)\n\n“audioshake [had] the best guitar model [at some point] (its combined [paid only]), second place is deemix pro (electric guitar)”\n\n\"Demix launched a new v4 beta, and it can now process songs locally + new piano and strings models\n\nthe piano model is not bad at all, it sounds a bit thin/weak, but it detects almost all notes\n\nhadn't found good songs to test the strings model yet, but it might be good too\"\n\n#### [Hit'n'Mix RipX DeepAudio](#_1bm9wmdv6hpf)\n\n#### Moises.ai\n\n<https://moises.ai/>\n\nThe models weren't really good before the BS-Roformer ones were introduced; no previews for premium features.\n\nYou can use the APK when it allows previewing for free without downloading; isling also found a workaround by googling for “moises premium free apk”.\n\nSome information here might be outdated.\n\n“also has a guitar and a b.v. model, and a new strings model, but it's not that good, in my opinion it is not worth buying a premium account.\n\n4-STEM model is something like demucs v2 or demixer.\n\nB.V. model is worse than the old UVR b.v..\n\nGUITAR model is not really good, it's probably MDX, it has a weird noise, and it tries to take the \"guitar\" where is not at all. It takes acoustic and electric guitar together.\n\nPIANO model is just splitter, maybe better at some songs.\n\nSTRINGS model is interesting, It's good for songs with orchestra, but still not that clean\n\nTheir service is very interesting, and the appearance of their site is clear and simple, but the models have better competitors.” thx, sahlofolina.\n\nByte Dance\n\navailable on <https://mvsep.com/>\n\n“This algorithm took second place in the vocals category on Leaderboard A in the Sony Music Demixing Challenge. 
It's trained only on the MUSDB18HQ data and has potential in the future if more training data is added.\n\nQuality metrics are available here (SDR evaluated by his authorship non-aircrowd method):\n\n<https://mvsep.com/quality.php>\n\nDemos for Byte Dance: <https://mvsep.com/demo.php?algo=16>”\n\n(8.08 SDR on AIcrowd for vocals)\n\nMDX-UVR SDR vocal models (kimberley\\_jensen a.k.a. KimberleyJSN) were evaluated on the same dataset as ByteDance above (AIcrowd):\n\n<https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021/leaderboards?challenge_round_id=886&challenge_leaderboard_extra_id=869&post_challenge=true>\n\n<https://discord.com/channels/708579735583588363/887455924845944873/910677893489770536>\n\nand presumably the same goes for GSEP and their very first vocal model (10 SDR), since their chart showed the same ByteDance SDR score as on AIcrowd.\n\n###### \\_\\_\\_UVR settings for ensemble (section deprecated, see the section above)\\_\\_\n\nAn ensemble can give different results than a single current main model, but not necessarily better ones in all cases, so it’s also a matter of taste and careful evaluation.\n\n* Aggressiveness shouldn’t be set to more than 0.1\n\n(also check 0.01)\n\n* high\\_end\\_process: bypass (official recommendation) or mirroring 2 (in some cases)\n* In most cases, you shouldn’t use more than 4 models, so as not to decrease the quality (developer recommendation)\n\nDon't use postprocessing in the HV Colab for ensemble (it doesn't work).\n\nOther recommended models for ensemble:\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_1.pth,\n\nHP2-4BAND-3090\\_4band\\_arch-500m\\_2.pth\n\n(+new 3 band?)\n\nas they are currently the best (15.08.21), but feel free to experiment with more (I also used the old MGM beta 1 and 2 with the two above; some people also used vocal models, and later the HP2-MAIN-MSB2-3BAND-3090\\_arch-500m model was released, which gives good results solo).\n\n###### (old) \\_\\_\\_Good UVR acapella models\\_\\_\\_\\_\\_\\_\n\nIn 
general, it’s better to use MDX-UVR models for clean acapellas, but for UVR, these are going to be your best bet:\n\n- Vocal\\_HP\\_4BAND\\_3090 - This model comes out with less instrumental bleed.\n\n- Vocal\\_HP\\_4BAND\\_3090\\_AGG - This is a more aggressive version of the vocal model above.\n\n“If you wanna removes the vocals but keeping the backing vocals, you can use the latest BV model”\n\nHP-KAROKEE-MSB2-3BAND-3090.pth\n\n(HV)\n\nFor clean vocals, you can also use an ensemble of the following models:\n\n<https://cdn.discordapp.com/attachments/767947630403387393/897512785536241735/unknown.png>\n\n(REUim2005)\n\n###### \\_\\_How to remove artefacts from an inverted acapella?\\_\\_\\_\\_\\_\n\nThis section is old, and the “cleaning inverts” part of the [current models](#_rz0d5zk9ms4w) section may provide more up-to-date solutions.\n\n0) Currently, GSEP is said to be the best at cleaning inverts. But at least for vocals, you can use an MDX model like Kim, or, even better, MDX23 from the MVSEP beta.\n\n1. 
by charm\n\n(rather outdated) Use Vocal\\_HP\\_4BAND\\_3090\\_arch-124m.pth at 0.5 aggressiveness, TTA enabled\n\nthen use any model you like on the Vocal\\_HP\\_4BAND\\_3090\\_arch-124m.pth instrumental results to filter out any vocals that the Vocal\\_HP\\_4BAND\\_3090\\_arch-124m.pth model didn't detect\n\ncombine the two results\n\nthen use a model ensemble with whatever models you like (I used HP2 4BAND 1 and 2)\n\ndrag both the vocal hp 4band+another model result and the ensemble result into Audacity\n\nuse the Amplify effect on both tracks and set it to -6.03 dB (roughly halving each track, so the rendered mix is an average of the two)\n\nrender\n\nthen use StackedMGM\\_MM\\_v4\\_1band\\_arch-default.pth\n\ntbh vocal models even at 0 aggressiveness really help inverts\n\nOr 0.3\n\nI mostly use acapellas for mashups and remixes, so the little bit of bleed I get at 0.0 aggressiveness is fine\n\ndrums-4BAND-3090\\_4band.pth\n\n0.5 optionally (less metallic sound)\n\n2)\n\n###### [Utagoe](https://drive.google.com/file/d/1_QabafcnbEUOSbmL6J3JDtNS7Da9V06m/view?usp=sharing) (English version with guide and error messages translated by Anjok) - if the invert isn't good, then try Utagoe, but it’s not the best (although rather better than Align Inputs in UVR).\n\nSettings for Utagoe by HV:\n\n“if your tracks don't invert perfectly” (even when aligned)\n\n[https://imgur.com/a/ZC14xlE](https://imgur.com/a/uuuJ7Ws)\n\n“if it's perfectly inverting”:\n\n<https://imgur.com/a/Qb4pKeX>\n\nSome other settings:\n\n<https://imgur.com/a/fvQwbMO>\n\n“It has a weird issue sometimes tho, even when everything is perfectly aligned and inverts perfectly, utagoe misses some places, and it won’t insert for a second or so”\n\nby Mixmasher00\n\n“There is no actually settings depending on songs, but that is what I use, which is the default one.\n\n<https://imgur.com/a/kSDrTAB>\n\nGoing higher than 1.3 [of extractable level] imo won't do good at cleaning. 
Additional tip too, if you want to do just an \"invert\" and \"keep the original vocal volume\" just choose \"by waveform”.\n\nI have been using Utagoe for inversions recently because it keeps the original volume of the vocals, and then I ran it on UVR or MDX. If the chunks are soft, I prefer using UVR but if there are chunks are that heavy like drums, I'd use MDX.\n\nAlso, I find it better to clean an invert via UVR or MDX than Utagoe because it's better and cleaner without destroying the vocals“\n\n- “When using Utagoe, or UVR5, for aligning inputs, and inverting them, I get this really strange cracking noise, that not even doing a vocal separation with AI later can get rid of. Anyone know what could possibly be causing this?\n\nSo, I actually found a solution to this, for anyone running into a similar issue.\n\nWith inversions like this, you're already gonna have to use AI to get rid of the left over noise, since it's not gonna be a perfect inversion. So, the solution isn't to get a perfect one, it's simply to get rid of the noise that the AI cannot recognize, right?\n\nWith this particular inversion, I originally was using really compressed MP3s from around 2007 for the instrumentals, because the lossless versions of the instrumentals were lost media, up until a few months ago.\n\nI thought it was odd, because i don't remember this noise being an issue with the MP3s, and that's when it hit me, MP3s cut off the noise, with compression, and added just a small bit more of that noise you get with imperfect inversions.\n\nSo I converted the lossless instrumentals to an MP3 with Foobar, and it was better, but still had that damned drum crackling! So i kept trying. I used OPUS, OGG, different bitrates of MP3, even AAC.\n\nI have found that OPUS is the best at removing the drum overlap, I cannot hear any in fact, with OPUS.\n\nSo, my final guide is,\n\nIf you are getting cracking/crackling/overlap on drum hits in your inversions, then:\n\n1. 
Convert the instrumental to OPUS (128 Kbps) with Foobar2000.\n2. Use a software like Audacity to amplify it to a peak amplitude zero DB (since apparently OPUS auto declips to floating levels?)\n3. Export it as a WAV at the original sample rate (since OPUS only supports 48 kHz, I actually tried resampling the instrumental, and original to 48 kHz before converting to OPUS, but found that results in a WORST output.)\n4. Do your inversion (hopefully in Utagoe)\n5. Use whichever vocal model you like the output for best for cleanup.” - sausum\n\nPS. I once ran into a similar issue and fixed it in a similar way. I think I was trying to invert an mp3 VBR file against a lossless one, so I converted the lossless file to the same codec and bitrate/V preset.\n\nIt wasn't perfect, but it was better.\n\nI wonder if simply applying a 20kHz cutoff wouldn't be a better solution. That's more or less what Opus does, plus resampling to 48kHz.\n\n- Although separation is done in 32-bit float, the Align Inputs option in UVR uses a lower bit depth internally, hence clipping may occur.\n\n- As a better alternative to Utagoe and UVR’s Align feature, you can use the paid Auto Align Post 2 (or maybe the cheaper MAutoAlign).\n\n## \\_\\_Sources of FLACs for the best quality for separation process\\_\\_\n\n*Introduction*\n\n* Don’t use YouTube or mp3 as input files for separation. Compression decreases the quality of the output. If you need a music video, Tidal has videos with up to 320 kbps AAC audio, with bitrates varying by content and quality settings. None of the online rippers download videos from Tidal. 
You need a valid subscription and a local ripper for it (more below).\n* If you're forced to use YT, download the audio, preferably as Opus if it's available for your video and exceeds 16kHz on the spectrogram (AAC might be better up to 16kHz).\n* To enhance results from YT by combining Opus and AAC, [read](#_6543hhocnmmy)\n* To verify whether your input file is really lossless, use:\n\n<https://fakinthefunk.net/en> (sometimes streaming services share bad/lossy versions)\n\n* To check the real bit depth of the file, use:\n\n<https://www.stillwellaudio.com/plugins/bitter/>\n\n* For output files, don’t set the format to mp3 in [Colabs](#_wbc0pja7faof) (export your separations as WAV/FLAC) - YT will recompress it\n* To upload your lossless output file to YT while avoiding recompression, [read](#_tu3sw6pao8fp)\n\n###### *Various versions of the same song*\n\n###### Sometimes the track you're trying to isolate exists in a few versions, e.g.:\n\n0) album version\n- on streaming services - sometimes both explicit and non-explicit album versions are available, and sometimes both 44kHz 16-24 bit and 48-192kHz versions - they might give slightly different results (if your old player app struggles to play these files, download the latest MKVToolNix, drag and drop the file and start multiplexing; it doesn’t re-encode/recompress the audio stream)\n\n- on CD - sometimes these are two different masters - if the total time and track gain of the lossless files scanned by F2K differ, e.g. by 1dB, or the integrated LUFS reading differs, it’s a different master.\n\n~ Sometimes recent masters of older music are louder on CD than on streaming services, with less dynamic range, and in most cases such a CD mastered to -9 LUFS integrated should be worse for AI separation than the -14 LUFS streaming master, although it can be the total opposite for some releases too\n\n~ Regional CD version - in the past, certain albums had different releases for some countries, e.g. 
Japan, with a different track order or even slightly different mastering\n\n- on DSD: if available, these are different masters and might be worth checking for separation too\n\n- on SACD - -||-\n\n- on vinyl (so-called “vinyl rip”): might give slightly different results for problematic tracks\n\n- on DVD-Audio (sometimes also 2.0 releases): if AC3 was used, it's lossy; the bit depth and sample rate depend on the release, and it can be a different master\n\nVinyl releases are usually different masters, with the bass more or mostly in mono.\n\n1) single version: in the old days, singles contained official instrumentals or a cappellas. An a cappella doesn’t always invert cleanly against the original mix to give you an instrumental when none was released, but if it inverts at least partially, it might still give you a better result (always use lossless files; lossy ones might not invert well)\n\n- on CD: sometimes a different track list than on vinyl, with extra tracks, remixes, etc.\n\n(rarely available on streaming services in this form now); always check Discogs to find all the releases you're interested in\n\n- on vinyl - -||- (won’t invert correctly due to constant playback speed fluctuations)\n\n2) deluxe edition/reissue/remaster (separated instrumentals from remastered versions can sometimes be crisper than leaked multitracks, which are rarely even mastered; also, different remasters might be available on streaming platforms, and fan-made ones on YT or elsewhere on the internet)\n\n3) Music video for the song - although lossy, it can be a completely different mix or master, giving different separation results (YT, Tidal, Vimeo, Dailymotion, etc.)\n\n4) Official remix - sometimes it might be easier to separate vocals from such a version\n\n5) Leaks of earlier versions of the song (might have a different mix, lyrics, or even instrumental) - YT (often taken down), fan-made Discord servers, the internet (use various search engines; more below)\n\n6) Leaks of multitracks or 
stems of the song - usually a different master than the final song, but you might experiment with Matchering, using a mixdown of the stems without vocals as the target and a good-sounding separation of the final song as the reference, to make the mixdown more similar to the final song\n\n7) Leak of the instrumental/vocals - it can be lossy or slightly different from the final song, e.g. from a near-final stage, sometimes even with dry vocals without any effects\n\n*Surround versions*\n\n8) 5.1 - e.g. DTS on DVD-Audio/Blu-ray or SACD (you can search Discogs for multichannel versions released on disc, e.g. the whole DTS Entertainment label)\n\n9) 360 Reality Audio or Atmos (7.1 or more) - e.g. on Tidal or Apple Music (downloading is described below).\n\nIn surround releases, vocals are sometimes simply in the center channel, though not always. Still, such a release can be a better source, e.g. when you adjust the volume of specific channels, or, for vocals, when the center channel alone contains very little instrumentation, which may be easier for AI to separate (for the instrumental, you could invert the vocal model's result against the center channel to get the remaining instrumentation in the center).\n\n“For me, I convert left and right together, then the center alone, then Ls+Rs+LFE together; then I have 3 audio files, process them, then remix into 5.1 again.\n\nTwo are stereo and one is mono, which is the center.”\n\n- killdubo\n\n“I do the same, except that I don't process the LFE, only the other 5.”\n\n- santilli\\_/Michael\n\nE.g. 
\"With the Dolby Atmos release of Random Access Memories, some vocals and instrumentals can be separated almost like stems.\"\n\nAlternatively, you can simply downmix everything to stereo and then separate (just to compare the outcome with the regular 2.0 versions).\n\n*Tape*\n\n10) Cassette tape - I wouldn’t recommend it as a source (unless it sounds superb or you have a great deck to rip the recording with); even though cassettes are still occasionally released, contemporary music usually sounds worse on them than it used to, and, possibly to compensate, they might contain different masters (cassettes won’t invert either, due to speed fluctuations)\n\n11) Reel-to-reel tape (better quality, mostly old music; in the past also usually the base medium for archiving a recording's original stems before the final mix and master, but the tape degrades over time and can change the sound even within the song's production period)\n\n*General*\n\nFor good-quality music on streaming services, you can get 16-bit and 24-bit FLAC on Qobuz, Amazon, or Tidal (Tidal now also offers up to 24-bit FLAC for Max quality, formerly 24-bit MQA). In the past, most Max (formerly Master) quality on Tidal was 16-bit MQA, while High (formerly Hi-Fi) is and always was 16-bit FLAC; MQA was lossy (though less so than other lossy formats), but a 24-bit MQA file could give better results than a 16-bit FLAC. 
Tidal gradually transitioned from MQA to FLAC, but it seems that old 24-bit MQA uploads are no longer 24-bit, just 16-bit FLACs, so you might want to use Qobuz to find 24-bit versions that are no longer available on Tidal.\n\nMost importantly:\n\nFeel free to experiment with different versions and find which one gives the best result for your song; 24-bit FLAC should generally be the best, although not everyone will notice the difference.\n\nIf you have a seemingly identical FLAC Audio CD rip from before the streaming era (~pre-2013), the lossless file from a streaming service will, in most cases, be slightly different (same length, but slight changes in Spek across the whole track, which normally don’t appear when comparing FLACs from different streaming services with the same Audio MD5 checksum; sometimes the track also ends in a slightly different place). Sometimes it sounds better, sometimes worse.\n\n([*outdated: there are no longer MQA files on Tidal]*; this assumes the file is not 16-bit MQA like some \"Master\" quality files on Tidal [though many were 24-bit as well; it's better to get those from e.g. Qobuz if a 24-bit version of the track is available, or at least compare both, because they can give slightly different results]).\n\nAlso, for whatever reason, a 24-bit MQA on Tidal can sound better than a seemingly better FLAC on Qobuz, possibly because the provider/label sent different files to different streaming services.\n\nTo spot the difference between MQA and FLAC on a spectrogram (e.g. in Spek), look at frequencies above 18 kHz (differences appear only in certain places) and, in all cases, above 21 kHz. Switch between the two windows with Alt+Tab rather than hovering over the window previews; you’ll notice the changes more easily. 
That way, you’ll also notice any differences between a CD rip and a streaming version.\n\nGenerally, MQA is the least lossy of the lossy codecs, so where its 24-bit variant is available, you might consider picking it over a regular 16-bit FLAC (separate the track from both, and you should notice any differences more easily if you can’t already hear them in the mixtures/original songs).\n\n*Comparing versions of FLAC files on streaming services*\n\nUse Audio MD5 in Foobar2000 (file Properties), or download [AudioMD5Checker](https://github.com/moisespr123/AudioMD5Checker/releases/), to avoid running in circles looking for different versions of the same track with the same length. F2K doesn’t show MD5 checksums for some FLAC files, so you’ll need AudioMD5Checker for those.\n\nE.g. on Tidal, Recovery by Eminem returns the same MD5 for the Deluxe and regular albums, but on [~~https://free-mp3-download.net~~](https://free-mp3-download.net) (Deezer), the checksums differ (to tell them apart: the albums on that site have different release dates), while the Deluxe on Tidal and the regular on Deezer have the same MD5. Where the Audio MD5 checksums differed, the separation results differed too. In this case of one unique vs 3 identical MD5s, the unique one gave a worse separation (but other factors may play a role in other cases). Be aware that non-explicit versions will naturally have a different checksum.\n\n###### **Sites and rippers**\n\n*List of ways to get lossless files for separation*\n\nMusic is scattered across various streaming services nowadays. If you can’t find your track on one service, or its ripper currently doesn’t work (this changes constantly), check another streaming service or the net (more below).\n\nLossless streaming services with rippers listed below:\nTidal, Qobuz, Apple Music, Amazon Music (these support hi-res), Deezer (16/44.1 FLAC).\n\n*Note: Since October 1st 2025, Qobuz has been banning accounts using Qobuz-DL, so no sites support Qobuz anymore. 
Deezer also stopped being supported quite some time ago.*\n\n*0)* [lucida.to](https://lucida.to/) | [lucida.su](https://lucida.su/) (sometimes works, sometimes redirects to doubledouble.top)\n\n*(Qobuz, Deezer [may not work], Tidal [incl. 24/96 kHz FLAC stereo], Amazon Music, Beatport; lossy: Spotify, Yandex Music, SoundCloud).* Various subscription regions; no Apple Music for now. Works with URLs generated by the share option on these services.\n\nIf you get errors after ripping has started, retry 2-3 times (at least for Tidal links).\n- Send your request again if you get a “site unreachable” browser error during the download\n- Sometimes it fails to generate the download in the first place\n- It frequently shows as down; simply try entering again\n- “if a track has a status code 404, it means it is unavailable (region locked or completely unavailable) you have to try with another country/account under 'advanced…'”\n- Downloading more than an EP (7-10 songs) at a time is clunky: it’s slow (at least from Tidal), and around track 10 it might give an error, so you need to click retry, sometimes a few times, before the whole album finishes ripping. This can happen more than once per album\n\n- Clicking retry while downloading whole albums from e.g. might end up in loops of “An error occurred. Track #1 error: Max retries attempted.” errors after a long wait on “Sending request for item 1”, or in a series of interrupted downloads, while downloading single songs works fine\n- You cannot download Dolby Atmos versions of songs from e.g. Tidal\n- For \"An error occurred. Unexpected token '<', \"<html> <h\"... is not valid JSON.\", just retry the task or uncheck the add metadata option.\n\n- If a search result shows up as “unidentified” with a generic cover, e.g. 
on Amazon, set the region to Japan and retry the search\n\n- All linking schemes like:\n\n1) https://www.deezer.com/xx/track/XXXXXXXXXXXX?host=XXXXXXX&utm\\_campaign=clipboard-generic&utm\\_source=user\\_sharing&utm\\_content=track-XXXXXXXXXXXX&deferredFl=1&universal\\_link=1\n\n2) Later converted by lucida automatically to: https://www.deezer.com/xxx/track/XXXXXXX or\n\n3) https://link.deezer.com/s/xXxXXxxXxxxXxxXX\n\ncan give an “uh-oh!” error; if the [stats](https://lucida.to/stats) page shows 0 downloads for Deezer (or any other service), that service just doesn’t work at the moment\n\n- Sometimes Tidal links will work after you paste the link again, or if you cut everything that follows <https://tidal.com/track/xxxxxxxx/>, i.e. the “u” (not sure which one).\n\n- In the Amazon linking scheme, the referral number appears in the middle of the link, not at the end, so you can remove it to strip the tracker, turning: https://music.amazon.com/albums/B073JR1FBD?marketplaceId=A3K6Y4MI8GDYMT&musicTerritory=PL&ref=dm\\_sh\\_xxxxxxxxxxxxxxxxxxxxxxx&trackAsin=B073RRBQR5\ninto:\n\nhttps://music.amazon.com/albums/B073JR1FBD?marketplaceId=A3K6Y4MI8GDYMT&musicTerritory=PL&trackAsin=B073RRBQR5\nWhen you use shared Amazon links, you sometimes have to strip some parameters after an “&” to avoid an error on the site, but because the links contain an “album” string, you might end up downloading the whole album instead of a single song if you strip the wrong part.\n\nAt some point, it was better to change the region string to US instead of whatever your shared link had (the accounts seem to be US-region), but now other regions in the links work too and are automatically changed to US, and changing them to US manually can sometimes cause another error. 
Setting the Lucida region to Japan might alleviate some issues, but it will give Japanese artist names in the output files (both in tags and file names).\n\n0) [doubledouble.top](https://doubledouble.top/) (currently only Qobuz, Tidal [doesn’t seem to work for the EU region for now, and only rarely for the US] and SoundCloud work). Sometimes works, sometimes redirects to Lucida. Unlike Lucida, it supported Apple Music URLs (but later that stopped working too, or worked only rarely or with specific music, and then Lucida started supporting it), and currently it returns only MP3 128 kbps from Deezer (check the current service status at the bottom).\n\nIn some cases, a streaming service might not have FLAC for your song; then use another streaming service.\n\n0) <https://tidal.qqdl.site> - TIDAL (it doesn’t seem to download anything higher than 16/44 even when the file quality is Max, but check; some smartphones might run out of memory running it; it’s a new site by the Lucida staff)\n\n0) <https://tidal.squid.wtf/> - TIDAL\n\n\\*) [yams.tf](https://yams.tf/) (now requires a very simple registration without email; it lets you search for songs, play the files in FLAC in your browser, and create your own playlists.\nDownloading is only possible via Settings>DevTools>Media>Start playing>New entry appears>Open in new tab>...>Download, but output file names are generic and tags are empty). 
I haven’t checked which streaming service it uses now.\n\nPreviously, it supported Qobuz, Tidal, Deezer, Spotify, and Apple Music, only in 320 kbps, and worked only with URLs.\n\n*Telegram ripping bots*\n\n*Note*: To use Telegram in a browser, visit\n\n<https://web.telegram.org/>\n\nYou need the app installed and an account registered on your phone, and then you scan the QR code with the phone app to log in.\n\n1a) Bot\n\n<https://t.me/onlydonuts>\n\n1b) Amazon bot (up to 24/96 FLAC)\n\n<https://t.me/GlomaticoAmazonMusicBot>\nQ: “I hit start bot, but nothing happens”\nA: “Once you start the bot, you must type /codec and send it; then it will show a menu where you pick the format you want (mp3, flac, atmos)\n\nAfter selecting a codec, you simply need to send a link to a track or album.\n\nThe bot will download all the tracks in the format you pick, but if the track is not available in Atmos it will be ignored”\n\n1c) Apple Music Bot\n<https://t.me/GlomaticoAppleMusicBot>\n\n<https://t.me/bayapplemusicbot>\n\n1d) Deezer Telegram Bot\n\n<https://t.me/deezload2bot> - for ARLs [see](https://rentry.org/firehawk52#deezer-arls) (but public ones get taken down often)\n\n1e) Spotify / Deezer / Tidal / Yandex / VK / FLAC / 25 Daily bot\n\n<https://t.me/BeatSpotBot>\n\n1f) Deezer mp3/FLAC bot\n\n<https://t.me/DeezerMusicBot>\n\n1g) [VK Bot](https://t.me/vkmsaverbot), [vkmusbot](https://t.me/vkmusbot) or [Meph Bot](https://t.me/mephbot) - VK / 320kbps MP3\n\n*(sites that don’t rip directly from streaming services are listed further below)*\n\n*Rippers*\n\n*2)* [Murglar](https://telegra.ph/murglar-en-05-12) app - [apk](https://github.com/badmannersteam/murglar-downloads/releases/tag/murglar_2.1.3_279_stable) for Android *- player and downloader working with Deezer, SoundCloud, VKontakte and Yandex Music (alternatively, you can use it in an Android virtual machine)*\n\n3) Apple Music ALAC/Atmos downloader\n\n<https://github.com/adoalin/apple-music-alac-downloader> (*valid subscription required, you can’t use an account that’s on a family 
sharing plan*; more about installation below in the [Dolby Atmos](#_ueeiwv6i39ca) section)\n\nIt might be harder for beginners to install. It requires Android (ideally rooted, e.g. in Android Studio without Google APIs), a specific Frida server version (more complicated on non-rooted devices), and a specific version of the Apple Music app.\n\nRefer to the GitHub link above and the Frida website for further instructions.\n\n*(the section continues further below)*\n\n4) <https://github.com/zhaarey/apple-music-downloader>\n\n[Instructions](https://rentry.co/zhaareywrapper)\n\n“and an additional note, if that does not work:\n\nreplace Part 2, Line 2\n\nDownload and extract the NDK needed to build the wrapper.\n\nwget https://dl.google.com/android/repository/android-ndk-r23b-linux.zip && unzip android-ndk-r23b-linux.zip -d ~\n\nwith\n\nDownload and extract the NDK needed to build the wrapper.\n\nwget https://dl.google.com/android/repository/android-ndk-r27c-linux.zip && unzip android-ndk-r27c-linux.zip -d ~\n\nthe change is the Android NDK version from 23b to 27c”\n\nBas Curtiz tutorial:\n\n<https://www.youtube.com/watch?v=eJ7a3W8qy5o>\n\n\\_\\_\n\nGeneral bot usage instructions\n\nGo to the appropriate dl request channel and type\n\n!dl\n\nand after !dl (on Discord, $dl), paste a link to your album or song from the desired streaming service and send the message, e.g.\n\n!dl https://www.deezer.com/en/track/XXXXXX\n\nFollow this link pattern. 
Sometimes the sharing option on the site changes the link pattern, so you need to open the changed link, and it will redirect to one similar to the above.\n\nTo use the Deezer player to search for files without an active subscription, log in and go to:\n\n<https://www.deezer.com/search/rjd2%20deadringer>\n\nThen replace the search query with your own after opening the link.\n\nIf the bot doesn’t respond immediately, the track/album is probably region-blocked, and you should use a link from another service or another channel (alternative UK and NZ servers are available). It can also print unavailability errors.\n\nSome bots rip tracks or whole albums from Qobuz, Deezer, and Tidal, all losslessly, while:\n\nSpotify, SoundCloud Go+, YouTube Music, and JioSaavn are lossy.\n\n*Providing valid links for the bot*\n\nFor convenience, register and log into every streaming service, so you can share links to specific tracks or albums from them (e.g. instead of pasting full album links) when you can’t simply find a specific single track for that service in Google, or can’t easily share a link to it. For Qobuz, go to <https://play.qobuz.com/>,\n\nwhere you can share single tracks to paste for the bot to download. This is available only after logging into a free account, and only via the link above, not in the regular Qobuz search results you find in Google, where you can’t share single songs for the bot. You might see an error saying Qobuz is not available in your country. That’s fine: you don’t need to buy a subscription to use their search. 
It’s enough to log in via this specific link instead of the main page:\n\n<https://play.qobuz.com/search/tracks>\n\nAnd it will allow you to log in.\n\nBecause the bot rips from Qobuz, it’s the best source of 24-bit files, which I recommend whenever available (44.1, 48, or 96 kHz), as Qobuz delivers FLAC to end users instead of the partly lossy MQA on Tidal, where Master quality is compulsory for 24-bit (44.1/48) albums/songs; some Master albums are also 16-bit MQA (avoid 16-bit MQA). Of course, there might be some exceptions where 24-bit MQA on Tidal sounds better than 24-bit FLAC on Qobuz, as I mentioned above; an example is Eminem - Music To Be Murdered By (Deluxe Edition) - Volume 1 (the newer Side B, track: Book of Rhymes).\n\nTo use Deezer links with the bot, find a song/album, use the option to share a link to the track or album, open the shared link so it gets redirected, and then edit the link into this form for a single song (otherwise the bot will return “processing” instead of ripping, or possibly even an error):\n\n<https://www.deezer.com/en/track/XXXXXXX>\n\n(The ARL trick doesn’t work anymore.) Hint: there's also the ARL, a session cookie identifier that can be shared, so anyone can log into the premium account and download FLACs, using ARLs from different regions with different regional locks. Might be useful for some specific tracks. 
ARLs are frequently shared online, though they're harder to find nowadays (Reddit censorship).\n\nIIRC, Deemix can use ARLs besides the regular account login.\n\n5) [Tidal Downloader Pro](https://github.com/yaronzz/Tidal-Media-Downloader-PRO/releases), a GUI app (the fastest method for batch and local downloading).\n\nA Hi-Fi Plus subscription is no longer necessary, just a valid Hi-Fi subscription (at least for Hi-Fi albums; the two tiers have now been merged into one at the price of the cheaper one).\n\nWith this downloader, you won’t be able to download anything above 24-bit/48 kHz or in Atmos (for that, use [orpheusdl\\_tidal](#_2myqsboh95hp) instead, [tidal-dl-ng](https://github.com/exislow/tidal-dl-ng) ([mirror](https://pypi.org/project/tidal-dl-ng/), [2](https://github.com/rodvicj/tidal_dl_ng-Project); not sure whether it downloads Atmos files), or [tiddl](https://github.com/oskvr37/tiddl)).\n\nInstall the Tidal app on Windows and log in, then open the downloader, click log in, and copy and paste the given code into the browser tab that opens. Voilà.\n\nIf the GUI temporarily doesn’t work, go to <https://github.com/yaronzz/Tidal-Media-Downloader/releases> and download the newest source code. 
It contains a CMD version for downloading, located in: Tidal-Media-Downloader-202x.xx.xx.x\\TIDALDL-PY\\exe\n\nDocumentation: <https://doc.yaronzz.com/post/tidal_dl_installation/>\n\nIf you have problems running the app and people in the GitHub issues also report that the current version doesn’t work, keep checking for new versions or read through all the issues for that version; someone else may update the app before the developer does.\n\nVersions “2022.01.21.1” and “1.2.1.9” stopped working entirely and need to be updated to newer versions.\n\n(not needed anymore, as the current version should still work)\n\nAlternatively, you can grab this [recompiled](https://mega.nz/folder/DR1k1CCa#Opt5RbSvXZVMMsq_QabAvA) version by another user.\n\nWith these downloaders, you can easily download whole albums, including hi-res ones, in a GUI (Pro), and a queue for automatically downloading single tracks is also available (Pro).\n\nSometimes certain songs are region-blocked and can’t be downloaded by any Divolt or Discord bot, resulting in an error.\n\nIn such a case, you’ll need to use the above downloader locally, with a Hi-Fi Plus subscription bought for your region. 
Accounts bought elsewhere, or paid for in a foreign currency, will most likely be region-locked to another country, so after you log into the service, certain songs won’t show up in search. The only way to see them without a proper account (at least for your region) is to log out of the region-locked account, create a new account, and visit <https://listen.tidal.com/> (you don’t need a valid subscription to search for songs on Tidal).\n\nBesides the trial, you can find cheap Tidal Hi-Fi subscriptions at [https://www.hotukdeals.com/vouchers/tidal.com](https://www.hotukdeals.com/search?q=Tidal), [pepper.pl](https://www.pepper.pl/search?q=tidal), or [mydealz.de](http://mydealz.de), which always have some free or nearly free giveaways (linked to a ready-made search). Then install the desktop Tidal app, log in, and open the downloader; this might automate the login process in the downloader\n\n(if you need to switch accounts, you’d better delete the Tidal-GUI folder from your Documents folder in case of any problems). A monthly Argentinian subscription is now the most reliable solution if you don’t want to change accounts every month or two searching for new offers.\n\nUnlike some other streaming services, Tidal has some tracks in Master quality, which is 24-bit, and it gives better results for separation as the dynamics are usually better. But check whether your downloaded file is really 24-bit and your downloader is configured properly (read the documentation in case of any issues).\n\nBut on Tidal there ~~are~~ were some fake Master files in the past, which were really 16-bit MQA (to save space on their servers or to mislead people), so there is no benefit to using them over a 16-bit Audio CD rip, since MQA alters quality in the higher frequencies (only), and this influences the separation process. To verify that your downloader is set up properly, check whether you can download any track from Music To Be Murdered By, by Eminem, in 24-bit. 
If you can, you have properly installed and authorized the downloader, and it can download 24-bit files, or files at sample rates higher than 44.1 kHz, where available.\n\nYou can paste links from Tidal into the GUI browser to find that track. Just delete “?u” at the end of the shared link.\n\n5b) Colab for downloading from Tidal and Qobuz using your own valid account (based on streamrip, active subscription required):\n\n[colab.research.google.com/github/r-piratedgames/rip/blob/master/rip.ipynb](http://colab.research.google.com/github/r-piratedgames/rip/blob/master/rip.ipynb)\n\n6) For Deezer, <https://archive.org/details/deemix> lets you download MP3 320 and FLAC files with premium Deezer accounts, and only MP3 128 kbps with free Deezer accounts.\n\nBe aware that the deemix.pro site is unofficial, and the 2020 PC version linked there doesn’t work. The last (2022) version is on the archive.org page linked above (found via Reddit).\n\nQobuz or Deezer might give better results, since Tidal has recently been dropping 16-bit FLAC on some albums, making all the files 16-bit MQA, which is not a fully lossless format, but close (Tidal Downloader just converts the same MQA to FLAC). It also provides some high-resolution files, but most likely fewer than Tidal.\n\nBe aware that in some streaming service downloaders, or even the official Deezer/Tidal/Spotify apps, you might not be able to find or even play some specific tracks or albums due to:\n\na) premium lock (it won't play for free users)\n\nb) regional lock (search will come up empty [the same applies to Tidal here])\n\nExample: Spyair - Imagination (instrumental) probably shows up in search in Japan, but it couldn't be downloaded using https://free-mp3-download.net; deemix with a premium Deezer subscription did download it (not sure if it was a Japanese account).\n\nPS. 
You can cancel your Deezer or Tidal trial immediately to avoid being charged in the future, while keeping premium access until the date you would have been charged.\n\n**Warning:** Qobuz has started banning accounts using Qobuz-DL; there's no guarantee other downloaders aren't affected.\n\n7) <https://github.com/yarrm80s/orpheusdl>\n\nSupports Qobuz, Tidal (with [this](https://github.com/Dniel97/orpheusdl-tidal) module; unlike tidal-dl, it also downloads files above 24/48 and Atmos), and probably more\n\n7a) Bas Curtiz [GUI](https://www.youtube.com/watch?v=RAXsW67SjGU) for Orpheus (still requires a working subscription)\n\n7b) [QobuzDownloaderX-MOD](https://rentry.org/firehawk52#qobuzdownloaderx-mod)\n\n(\\*May not work anymore)\n\n7\\*) If you have a Qobuz subscription, you can just use [qobuz-dl](https://github.com/vitiko98/qobuz-dl) (last updated a year ago; it probably no longer works, but I’m not sure, and there might already be an alternative).\n\nAlternatively, check:\nQobuz Downloader X\n\nor Allavsoft (both require a subscription)\n\n<https://www.qobuz-dl.com/> (a taken-down browser frontend for downloading music from Qobuz; the code for hosting it yourself is on [GH](https://github.com/QobuzDL/Qobuz-DL))\n\n7c) <https://github.com/nathom/streamrip>\n\nA scriptable stream downloader for Qobuz, Tidal, Deezer (active subscription required) and SoundCloud.\n\n8\\*) For Deezer, you can use [Deezloader](https://deezloader.site/) or Deezloader Remix - it doesn’t require any subscription for MP3 128 kbps; just register a free Deezer account first and use it in the app. 
For free users, it gives only MP3 128 kbps with a 16 kHz cutoff, so it's worse than YT and Opus; don't bother.\n\n9a) For Spotify, you can use Soggfy, or\n\n9b) SpotiDown (a premium subscription is required for 320 kbps downloads, and you need to compile the app)\n\n9c\\*\\*) You can seemingly use <https://spotiflyer.app/>\n\nbut it “doesn't download from Spotify, but from Saavn, in 128kbps/low-quality.\n\nAlso, since it doesn't d/l from Spotify, you can't d/l exclusives released from there.”\n\nIIRC, it doesn't require a valid subscription, and it also allows playing and sharing music inside the app.\n\n9d\\*\\*) Sadly, the same goes for this Telegram bot downloader:\n\n<https://t.me/Spotify_downloa_bot>\n\n9e) <https://spotify-downloader.com/>\n\nOther lists of rippers and sites:\n\n<https://rentry.org/firehawk52>\n\n<https://ripped.guide/Audio/Music/>\n\n<https://fmhy.net/audiopiracyguide#audio-ripping-sites>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n**Other sites**\n\n(not ripping directly from streaming services)\n\n10) [allflac.com](http://allflac.com) - it’s paid, but they don’t pay royalties to artists or their labels (I spoke with at least one). They don’t keep up with the streaming services' catalogs, but they also share material not available on streaming services, even including vinyl rips as hi-res files. Most, if not all, files on the site are CD rips taken from around the net.\n\nI’ll explain how to download files for free from allflac and flacit:\n\n0. Log in\n\n1. Find the desired album (don't press play yet!)\n\n2. Open the Chrome Menu in the upper-right-hand corner of the browser window and select More Tools > Developer Tools, then navigate to “Network”\n\n3. Press CTRL+R as prompted\n\n4. Play the audio file\n\n5. If it's 16/44 FLAC, go to Media, sort by size, right-click the entry and open it in a new tab to download (sometimes it appears only after playing for a while, and only under “All” instead of “Media”)\n\n6\\*. 
For some 24-bit files, go to All, play the file, and sort by size. If it’s not shown in the Media tab, you will find an entry of increasing size with the xhr type and a FLAC name.\n\n7. Once, step 5 stopped working and the FLAC link was red. In that case, go to the Console, open the link with ERR\\_CERT\\_DATE\\_INVALID in a new tab, and open the site by clicking Advanced.\n\nIf the download runs at 32 KB/s, get Free Download Manager and paste the download link there; with 2 active connections in the downloader, it occasionally speeds up to 96 KB/s (a properly configured JDownloader also lets you increase the number of connections).\nI haven’t tried switching accounts to check whether that brings the speeds back to normal (it wasn’t like that before).\n\nSome albums on allflac.com don't have separate tracks; the whole album is in track 1.\n\nIf you want to split the audio file physically:\n\nYou can search for a cue sheet here: <https://www.regeert.nl/cuesheet/>\n\nPlace it next to the file, rename it if needed, and it's ready, but only for playback and playlist purposes; it doesn't split the audio file physically. To cut the file losslessly, you need LosslessCut <https://github.com/mifi/lossless-cut/releases/>, which can import cue sheets to cut the album. Once all the files are split, you can probably use MusicBrainz (a Foobar2000 plugin is probably available) to tag the files (but not the file names; for that, you need Mp3tag and tagged files, to copy tags to file names with specific masks). LosslessCut might not be precise, which may cause problems with automatic content detection in MusicBrainz, but I know that this tool (or a similar one) lets you search for the specific album you're looking for, instead of just marking files > album detection in Foobar, which may fail. 
So technically, cutting and tagging the files should be possible, but time-consuming.\n\nIt looks like, unlike 24/48 files, all 24/96 and 24/192 ones are just vinyl rips taken from various torrents. If, again, there are only one or two files containing the whole album (originally accompanied by a cue sheet), you should be able to find the specific cue file by simply searching Google for its exact file name in quotes (the file list is below the track list there). Of course, you can also cut your album manually, or even make your own cue sheet to cut it.\n\nAlso, be aware that if you don’t press CTRL+R in Network before you start playing the file, it won’t appear as FLAC and you won’t be able to download it; in that case, close and reopen the tab and press CTRL+R in Network again.\n\nSuch files can also reset near the end of the download (a downloaded file cannot exceed 1GB; otherwise, it gets reset for some reason). To prevent this, copy the download link from your browser and paste it into a download accelerator. Even the free BitComet will do the trick, since it supports multi-connection HTTP downloading. If you’re lazy, to avoid losing at least that 1GB, simply open the still-downloading file in MPC-HC: once the download starts resetting, Chrome won’t be able to delete the file (because it's now in use). Wait for the reset of the download, then make a copy of the file and change its extension from the temporary one added by e.g. Chrome to FLAC. Now you can stop the download in Chrome. The downside is that the reset happens before the download ends, so the file is not fully complete, but mostly. Of course, you may be lucky enough to find the original torrent with the files and simply finish downloading by verifying the checksums of the existing ones in a P2P client (the filenames must match the torrent's files; simply replace them and find the option to verify checksums).\n\n10b. 
All of the above applies to <https://www.flacit.com/>\n\nIt looks like it has the same library, taken from\n\nadamsfile.com\n\nwhich is also a warez site that allows playing files and downloading them using the method above.\n\nYou also need to register before playing any file there (registration is free).\n\n11. <http://flacmusicfinder.com/>\n\nIt has a small library, though.\n\n\\*. FLAC sites listed [here](https://fmhy.net/audiopiracyguide#download-sites)\n\n12. Soulseek - but it’s simply a P2P-based client, so be careful with it, and preferably use a VPN (ideally a good one). GUIs: [Nicotine+](https://nicotine-plus.org/), and Seeker for Android\n\n13. Rutracker (the same advice as above)\n\n14. Chomikuj.pl (use their search engine, or alternatively Yandex, DuckDuckGo, or Bing) - free 50MB per week for an unlimited number of accounts, plus free transfer for points earned from files uploaded or shared from other people’s profiles. People upload loose individual tracks there as well, but they frequently get taken down, so search for full album titles in archives rather than single files. MP3 files, and other files that allow a preview, can be downloaded for free with JDownloader, but occasionally some of them might not work in JDownloader, and they’ll have to be downloaded manually.\n\n15. <https://music.binimum.org>\n\n16. <https://monochrome.tf>\n\n17. <https://dab.yeet.su> (requires free account creation)\n\n18. Simply search for the track on Google, or even better, Yandex, DuckDuckGo, or alternatively Bing, because Google frequently blacklists some sites or search entries. Also, your provider may block connections to some sites, so you'll need a VPN when a search engine shows a result from a site you cannot open.\n\n19. Redtopia archives on torrent sites\n\n20. YouTube Music - a higher bitrate (256kbps) than the max of 128/160kbps on regular YT for Opus/webm (20kHz) and 128kbps AAC/M4A (16kHz). 
Similarly to Spotify, it can have some exclusive files that are unavailable in lossless form on any other platform, but most tracks on YTM are available losslessly on other streaming services, so use those instead.\n\nIf you have YouTube Premium, you can apparently download files from it if you properly provide your token to yt-dl.\n\nLogging into a Google account with Premium enabled in JDownloader 2 may do the trick as well.\n\nAnyway, the Divolt bot (or any other currently available one) will work too.\n\n\\_\\_\\_\\_\\_\\_\n\n***Outdated/closed/defunct***\n\n~~0)~~ [~~us.deezer.squid.wtf~~](https://us.deezer.squid.wtf/) | (out of order for now, Deezer only; search didn’t respond before) [~~eu.deezer.squid.wtf/~~](https://eu.deezer.squid.wtf/) (also offline) - works with queries, not URLs, for single songs or albums. If you don’t check “Save songs on download” (supported in Chrome), you need to press the download button manually after ripping finishes (once you’re notified). Sometimes ripping progresses very slowly, but steadily, as you can see when you zoom in on the progress bar. Also, downloading in your browser might sometimes get interrupted midway (press resume in your browser's download queue if necessary).\nSome users (e.g. French ones) are unable to reach the site (e.g. 
403 error); in that case, use a VPN.\n\n~~0)~~ [~~https://us.qobuz.squid.wtf/~~](https://us.qobuz.squid.wtf/) (went offline) | [~~https://eu.qobuz.squid.wtf/~~](https://eu.qobuz.squid.wtf/) (back offline; Qobuz only) - -||- It can happen that searching on Qobuz itself won’t give you the desired results, while searching on this site will be successful.\n\n0) <https://qqdl.site/> - ~~Qobuz~~ (currently redirects to TIDAL); a new site by the Lucida team, which previously redirected to doubledouble.top; some smartphones might run out of memory running it\n\n0) [~~https://deezmate.com~~](https://deezmate.com/) - Deezer - MP3 or FLAC, works with links\n\n0) <https://free-mp3-download.net> (Deezer, FLAC, individual song downloads)\n\nHere you can find (all?) MP3/FLAC files from Deezer. If the site doesn't work for you, use a VPN. If the search doesn't work, check \"use our VPN\". Single-file downloads with a captcha. No tags for track numbers or file names; FLAC/MP3 16-bit only.\n\n- If you see the error “file not found on this server”, don't refresh; go back and click download again.\n\n- From time to time, the FLAC option didn’t show up and was marked “unavailable”; sometimes it shows up after a while. The site had some problems, but they have already been fixed.\n\n- Open every track you find in a new tab, as the back button won't take you back to your search results\n\n1 b) (no longer works as of 07.02.24)\n\nDiscord server with a sharing bot (albums and songs)\n\nhttps://discord.gg/MmE4JnUVA\n\n-||-\n\nhttps://discord.gg/2HjATw6JF\n\n(another invite link, valid until 12.11.13; it needs to be renewed every month, and current invitations will probably be on the Divolt server here when the above expires)\n\nLater, they required DMing the bot to access the welcome channel with requests. 
Once I couldn't access the channel and needed to update Discord or wait 10-15 minutes for the input form to appear.\n\nTo download, paste this in the welcome channel:\n\n$dl [link to the song or album on streaming service without brackets]\n\nMore detailed usage instructions below.\n\n(Defunct)\n\n2) <https://slavart.gamesdrive.net/tracks>\n\n(it sometimes worked, but not very often)\n\nAs of June 2023-March 2024, it is defunct and throws “There was an error processing your request!” on a track download attempt. In the past, it would load forever with nothing happening after multiple tries; before that, it worked once the download button stopped being gray and turned green again, so you could click it and the download would start shortly. Then it stopped working, and more recently it worked again; you only needed to wait a bit.\n\nA similar search engine for FLACs. Files are sourced from Qobuz (including hi-res when available). Songs listed twice are sometimes in a higher bit depth/resolution (different versions of the same track).\n\nIf you want to know which version you're downloading, go to https://play.qobuz.com/, share the track from there, and use the download bots.\n\n1 b) Join their Divolt server directly via this link (if the above stopped working):\n\nhttps://divolt.xyz/invite/Qxxstb7Q (currently the bot doesn’t allow posting and the server contains only a Discord invite; check it again later for a valid link if necessary)\n\nFree registration required.\n\nIf this Divolt server is also down, go here:\n\n<https://slavart.gamesdrive.net/> (defunct)\n\nto get a valid Divolt invite link (it might have changed). 
But it kept showing the old link for a long time afterwards.\n\n### Dolby Atmos ripping\n\n“Streamed Dolby Atmos is E-AC-3 (5.1 Surround) with JOC (Joint Object Coding); it's a hybrid file of channels and objects that decodes the 5.1 + JOC to whatever your speaker layout is, from 2.0 up to 9.1.6.\n\nIt's not a multitrack, although clearly what some mixers do is put all the vocals in the center channel, so effectively you have an a cappella in the center and the instrumental in everything else, but many labels forbid engineers from doing it and have policies that they must mix other sounds into the center, so people don't rip the a cappella.\n\n[“apparently Logic Pro does it automatically as well” isling, src: ScrepTure]\n\nYouTube only supports FOA Ambisonics as spatial audio, but you can encode Dolby Atmos to Ambisonics. [by e.g. <https://www.mach1.tech/>]\n\nApple Music has a larger amount [of Atmos songs] because Apple pays the labels for exclusive Atmos deals.” ~Sam Hocking\n\nTidal only supports 5.1 or maybe 5.1.4, and Apple Music at least up to 7.1.4 (9.1.6 support may have been dropped as of macOS Sonoma, not sure; “On latest MacOS you do now have the ability to decode directly to 7.1.4 pcm realtime from Apple Music.”).\n\n“I tried using channels from an Atmos mix to get better instrumentals and very surprisingly it sounds a lot worse\n\nI rendered it into 7.1 and upmixed the channels into 3 separate stereo tracks, and processed each using unwa's v1e+\n\nIt ended up sounding more muddy than using a lossless stereo mix” - santilli\\_\n\n“sometimes rendering into 9.1.6 is good for some instruments yet everyone says it’s really unnecessary\n\nwhich is kinda true\n\nlike the 1-2 channel for 9.1.6 dolby is insanely clean on the songs i’ve tried but some other stems sound a bit ehh” - Isling\n\n*- from Tidal (via ~~Tidal-Media-Downloader-PRO [Tidal-DL GUI]~~)*\n\n*(no longer works for Atmos; see further below)*\n\nJust get Tidal-dl with HiFi Plus [subscription](#_89rwiestpg95) - now merged 
into one subscription (CLI version; for one user on our server it worked as of 13.10.22, but strangely not for some people).\n\nAs of 30.04.24, with the Tidal app installed on Windows and tidal-gui authorized via a browser prompt or automatically, Atmos files are not downloaded (I checked all qualities in settings, incl. 720/1080), at least on a subscription automatically converted to a higher plan due to recent changes (MQA files have played since then, so it might not be a subscription issue).\n\nIf you have problems, use tidal-dl (non-GUI) and a Tidal account with a valid subscription and the proper plan, set up to use the Fire TV device API (option 5 iirc).\n\nBut I cannot guarantee it will work for Atmos.\n\n###### **> from Tidal (with** [**orpheus\\_dl\\_tidal**](https://github.com/Dniel97/orpheusdl-tidal) **installed over** [**orpheusDL**](https://github.com/OrfiTeam/OrpheusDL)**; max 5.1[.4?])**\n\nDownloads [Atmos](https://tidal.com/browse/track/280733977) and [high resolution](https://tidal.com/browse/track/200566143) files above 24/48.\n\nIt’s a CLI-only app (a valid [subscription](#_89rwiestpg95) is still required).\n\nThe installation is a bit convoluted.\n\nIf you have problems using git in the Windows command line, use [this](https://drive.google.com/file/d/1V6ZwVgzYXjqc4QH9BGRoBqFZE-ZPNO16/view?usp=sharing) ready-made OrpheusDL package (works as of 30.04.24, but it may get outdated later; it already has Tidal settings and Atmos enabled) after you install python-3.9.13 or newer (it currently also works on python-3.12.3-amd64).\n\nAlternatively, if you install manually following the GH instructions, fix the git issue by executing:\n\npip install gitpython\n\nand/or installing git from [here](https://git-scm.com/download/win)\n\n(one or both of these should fix git in the command line when pip install git cannot find a supported distribution and the git command is not recognized).\n\nOnce Python and the OrpheusDL package are correctly installed, command-line usage is:\n\norpheus https://tidal.com/browse/track/280733977\n\nYou can place it 
as a parameter of a shortcut to orpheus.py on your desktop, in the Target field (right-click the shortcut > Properties).\n\nE.g. \"C:\\Program Files\\OrpheusDL\\orpheus.py\" https://tidal.com/browse/track/280733977\n\nAlternatively, press Win+R > cmd.exe, and if you’re not currently on the same partition as Orpheus (e.g. you're on C:\\), type e.g.\n\nd:\\\n\nand navigate to the folder where Orpheus is installed, e.g.\n\ncd D:\\Program Files\\OrpheusDL\\\n\nthen execute\n\norpheus https://tidal.com/browse/track/280733977\n\nAlways delete “?u” at the end of the link copied from Tidal, or it won’t work.\n\nOnce you execute the command, it will ask you for a login method (I tested the first one, TV); it will then redirect you to your browser to authorize.\n\nMQA is disabled by default (not used by Atmos), but you can enable it in config\\settings\\\n\nby setting \"proprietary\\_codecs\": to true in line 21.\n\nDownloaded files are located in the OrpheusDL\\downloads folder.\n\nThe *spatial\\_codecs* flag is enabled by default and supports Dolby Atmos and 360 Reality Audio.\n\n\"Some of the 360 stuff is impossible to split right now. Not sure what is going on. Maybe some type of new encryption. I have the MP4 to David Bowie Heroes 22 channels, and it's a brick, useless…\"\n\nThe output of downloaded Atmos files is m4a encoded in E-AC-3 JOC (Enhanced AC-3 with Joint Object Coding) - Dolby Digital Plus with Dolby Atmos and possibly AC-4, and FLAC for hi-res.\n\nDownloaded hi-res and Atmos files can be played in e.g. 
MPC-HC or VLC Media Player, but will fail on some old players like Foobar2000 1.3 and 1.6.\n\n###### **> from Tidal (with** [**https://github.com/exislow/tidal-dl-ng**](https://github.com/exislow/tidal-dl-ng)**)**\n\n- *from Apple Music (Android, max 7.1[.4?])*\n\n<https://github.com/adoalin/apple-music-alac-downloader>\n\nInstallation tutorial:\n<https://www.youtube.com/watch?v=blazHnCh6jQ>\n\n“Pre-Requisites:\n\nx86\\_64 bit device (Intel/AMD Only)\n\nInstall Python: <https://www.python.org/>\n\nInstall Go: <https://go.dev/doc/install>\n\nInstall Android Platform Tools: <https://developer.android.com/tools/releases/platform-tools>\n\nand set it to environment variables / path\n\nDownload and extract Frida Server - <https://github.com/frida/frida/releases/download/16.2.1/frida-server-16.2.1-android-x86_64.xz>\n\nDownload Apple Music ALAC Downloader - <https://github.com/adoalin/apple-music-alac-downloader>\n\nExtract content to any folder.\n\n1)\n\nInstall Android Studio\n\nCreate a virtual device on Android Studio with an image that doesn't have Google APIs.\n\n2)\n\nInstall SAI - <https://github.com/Aefyr/SAI>\n\nInstall Apple Music 3.6.0 beta 4 - <https://www.apkmirror.com/apk/apple/apple-music/apple-music-3-6-0-beta-release/apple-music-3-6-0-beta-4-android-apk-download/>\n\nLaunch Apple Music and sign in to your account. 
Subscription required.\n\n3)\n\nOpen Terminal\n\nadb forward tcp:10020 tcp:10020\n\nif u get a msg that there are more than 1 emulator/devices running, seek up NTKDaemonService in task manager/services and stop it\n\nadb root\n\ncd frida-server-16.2.1-android-x86\\_64\n\nadb push frida-server-16.2.1-android-x86\\_64 /data/local/tmp/\n\nadb shell \"chmod 755 /data/local/tmp/frida-server-16.2.1-android-x86\\_64\"\n\nadb shell \"/data/local/tmp/frida-server-16.2.1-android-x86\\_64 &\"\n\nThe steps above place Frida-server on your Android device and starts the Frida-server.\n\n4)\n\nOpen a new Terminal window\n\nChange directory to Apple Music ALAC Downloader folder location\n\npip install frida-tools\n\nfrida -U -l agent.js -f com.apple.android.music\n\n5)\n\nOpen a new Terminal window\n\nChange directory to Apple Music ALAC Downloader folder location\n\nStart downloading some albums:\n\ngo run main.go https://music.apple.com/us/album/beautiful-things-single/1724488123\n\ngo run main\\_atmos.go \"https://music.apple.com/hk/album/周杰倫地表最強世界巡迴演唱會/1721464851\"\n\n*- from Apple Music (alternative tool)*\n\n<https://rentry.co/AppleMusicDecrypt>\n\n(after March 5, 2025, get WSA from here:\n\n<https://github.com/MustardChef/WSABuilds>)\n\n*from Apple Music (MacOS, virtual soundcard recording)*\n\n(Guide by Mikeyyyyy/K-Kop Filters, [source](https://discord.com/channels/708579735583588363/708579735583588366/992461132755369984))\n\nYou will need a Mac to do this (it only works on macOS), an Apple Music subscription, \"Blackhole 16ch\", and any DAW of your choice; I prefer FL Studio (Audacity also works).\n\nStep 1. Install the BlackHole audio driver (search for it on Google)\n\nStep 2. Download the song you want in Dolby Atmos (if you don't know how, go to Settings > General in Apple Music and toggle Download Dolby Atmos)\n\nStep 3. 
Go to your DAW of choice, and in your mixer, select the input: it will show your 16 outputs. Select 1 (Mono) for the first mixer track, then do the same for mixer track 2 but select 2, and so on until you reach 6\n\nStep 4. Hit record and play the track in Dolby, and you're done!\n\nA [similar](https://sharemania.us/threads/tutorial-how-to-rip-apple-musics-dolby-atmos-channels-7-1-4.209629/?__cf_chl_tk=e.AYlwMiV9oW5skYALD286x_IM5QgjDS0GaB_FxH610-1682477648-0-gaNycGzNDFA) tutorial based on BlackHole and Audacity on Mac (open the link in incognito mode if you get an infinite captcha)\n\n“On the latest MacOS you now have the ability to decode directly to 7.1.4 pcm realtime from Apple Music. If you use a loopback virtual audio driver you can record the 12 channels. Depending on how the song was mixed might mean the C channel has even clearer vocals.\n\nProbably worth mentioning Dolby Atmos is delivered as dd+ (Dolby Digital 5.1 Surround downmix) but JOC allows it to be decoded up to 9.1.6 16 channels. To do that you need either an AVR or Dolby Reference Player or Cavern/Cavernize.” Sam\n\nYou won't be able to do the same on Windows with [LoopBeAudio](https://www.nerds.de/en/loopbeaudio.html) instead (paid, but the trial works for 60 minutes after every boot), because Apple Music on Windows (including the one in the MS Store) doesn't provide Dolby Atmos (7.3.1) files at all (only stereo hi-res lossless), no matter what virtual soundcard you use, so you'll need a Hackintosh or VMware.\n\n\"Vmware kinda lag\n\nand find own seri to fix login apple services\"\n\n- [ittiam-systems/libmpegh: MPEG-H 3D Audio Low Complexity Profile Decoder](https://github.com/ittiam-systems/libmpegh)\n\nUsing this program, you can extract the 12 channels of the Dolby Atmos tracks.\n\n“MPEG-H is essentially Sony360, just Sony360 licenced decoders needed. 
Fraunhofer allow it to be used for free, though.\n\nia\\_mpeghd\\_testbench.exe -ifile:\"FILENAME.mhm\" -ofile:track1.wav\n\nor:\n\nia\\_mpeghd\\_testbench.exe -ifile:\"input file name.m4a\" -ofile:\"output file name.wav\" -cicp:13\n\n“renders to 22.2 as well”\n\n<https://mpegh.lze-innovation.de/#LZE>\n\nBut it seems you need to send them a message to get it.\n\nIt says there that it's for professionals, but try your luck:\n\n[https://www.iis.fraunhofer.de/en/ff/amm/broadcast-streaming/mpegh.html?source=post\\_page](https://www.iis.fraunhofer.de/en/ff/amm/broadcast-streaming/mpegh.html?source=post_page---------------------------)\n\nYou should also have success extracting stems with MMH Atmos Helper (it “includes a MPEG-H decoder built-in apparently”)\n\n360RA/MPEG-H 3D online decoder by run4r.ses/r4r\n\n<https://run4r-ses.github.io/mpeghdeconline/>\n\n“It uses WASM and the libmpegh testbench\\* to decode files. However, that means that it uses MEMFS, which can lead to the decoder using up to 1.5GB for a 22.2 file so make sure you have plenty of RAM for the decoder to use.\n\nif you don't like that RAM limitation (for mobile devices etc.) you can download builds of the testbench from <https://run4r-ses.github.io/libmpegh_builder/> (includes Android builds which you can run through terminal emulators) although the web decoder shouuulddd be fine for most devices (except iOS, it mostly crashes on it)\n\nI hope this is helpful to anyone! If you encounter any issues, please let me know, as this is pretty much still a WIP.\n\n\\* The original libmpegh testbench is made by Ittiam Systems, so all credits to them! 
Their licenses are linked in both websites”\n\nIsling: “do you happen to know what all of the names of the channels are?\n\nfor the 22.2 layout for example, i have no idea what the channels/speakers are called, i see you have the acronyms for it though”\n\nA: <https://en.wikipedia.org/wiki/22.2_surround_sound#Channels>\n\nIsling: “Great site too, it has customisability too, the one by ITTIAM doesn't seem to keep the surround sound effect so i'll try this one out.\nLike, when i decode the 360ra files they lose their surround sound, and they don't sound as good in DAWs compared to if i was playing them with amazon music for example\n\nthey just sound like stereo mixes without mastering, i want them to still have that cool effect”\n\nR: “Well, if you decode to stereo then it might sound worse than Amazon Music because the binaural renderer that ittiam uses is very basic, and you will get the same result with this too. if you want a \"custom surround sound\" then you'll probably have to render to something like 22.2, use your own HRTF model etcetera etcetera which is what i imagine Amazon Music does.”\n\nI: Yeah I decode it to 22.2. And I didn't realise there were extra steps but that makes sense.\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nAll Dolby Atmos is encoded, so to play it, it basically has to be decoded through a Dolby-licensed decoder. There are other ways to decode it, though. The easiest is to use Cavern.\n\n<https://cavern.sbence.hu/cavern/>\n\nAtmos is a lossy format: 768kbps across 6 channels, so not the highest resolution, but to decode it to a multichannel .wav, just download Cavern and put your DD+ JOC file through Cavernize. Streamed Atmos [is lossy]; TrueHD Atmos isn't. Atmos music is only distributed in lossy form, though.\n\nYou can encode Dolby Atmos to Ambisonics by e.g. 
<https://www.mach1.tech/>.\n\nAtmos files downloaded from Tidal with OrpheusDL are simply FLACs in an m4a container, and can be read by MPC-HC, VLC, and Foobar 2.x.\n\nAs a side note, “The process of making Atmos [*from an engineer standpoint*] is:\n\nDAW > 128 channel ADM\\_BWF > 7.1.4 >5.1(joc). So basically those 128 channels are encoded to 6, but the object audio is still known where it should exist in the space and pulls that audio out of the 5.1 channels to make up to 9.1.6 (max supported for music)”\n\nAs for whether Atmos authoring is available on Windows:\n\n“Traditionally it's not been unless you ordered a DELL from Dolby configured to use as a Rendering machine, but today both Dolby Atmos Renderer, DAWs like Cubase and Nuendo and 3rd party VST exist to do it on Windows now. I use Fiedler Atmos Composer on a stereo DAW called Bitwig to build demix projects for Atmos engineers to then master to Atmos from Stereo (sometimes all they have left to work with as multitrack tapes lost/destroyed/politics/easier)” ~Sam Hocking\n\n### 360 Reality Audio FAQ\n\n“A lot more stems in 360, and it has no bleeding or filtered sounding artifacts.\n\nA big problem with Dolby stems is artifacts, essentially none of that in 360.\n\nAnd if there is bleeding in 360 then the volume is completely balanced and doesn’t change all the time” - I.\n\nQ (isling): Is there a way to just listen to 360 files locally while still getting the immersive 3d effect?\n\nDecoding them and listening in DAWs don't sound like how they do on Amazon Music for example.\n\nIs there even a way to listen to them without decoding? 
I've downloaded the MPEG-H software stuff.\n\nThe MPEG-H format player doesn't read the decoded WAVs nor supports m4a which is what the original non-decoded files are.\n\nThe 360 WalkMix player didn't work either.\n\n<https://github.com/ittiam-systems/libmpegh/releases>\n\nThis one is the one I use, works great to use, but doesn’t have the same 3D effect.\n\nI tried playing the decoded 360 Reality Audio stems in VLC as it has the best Dolby effect, but it didn't have all the channels playing.\n\nA (Sam Hocking): I've worked it out, there's a free decoder. These are the main MPEG-H tools for both creating and playing MPEG-H 3D: <https://mpegh.lze-innovation.de/>\n\nAll free. The plugin is really cool. Request the plugin from their site.\n\nDolby and Sony 360 are object based [not channel-based like Ambisonics]. Sony 360 is just protected MPEG-H 3D.\n\nWhen you rip from Tidal, you can choose how you decode the file to channel-based audio. This is the point of object-based audio, you decode it to what number of speakers you have\n\nIf you use tidal-gui, enter an Android token from your Tidal app on the phone and download the file, then you can decode the Sony 360 / MPEG-H by the following:\n\nDescription in format Front/Surr.LFE\n\n1: 1/0.0 - C\n\n2: 2/0.0 - L, R\n\n3: 3/0.0 - C, L, R\n\n4: 3/1.0 - C, L, R, Cs\n\n5: 3/2.0 - C, L, R, Ls, Rs\n\n6: 3/2.1 - C, L, R, Ls, Rs, LFE\n\n7: 5/2.1 - C, Lc, Rc, L, R, Ls, Rs, LFE\n\n8: NA\n\n9: 2/1.0 - L, R, Cs\n\n10: 2/2.0 - L, R, Ls, Rs\n\n11: 3/3.1 - C, L, R, Ls, Rs, Cs, LFE\n\n12: 3/4.1 - C, L, R, Ls, Rs, Lsr, Rsr, LFE\n\n13: 11/11.2 - C, Lc, Rc, L, R, Lss, Rss, Lsr, Rsr, Cs, LFE, LFE2, Cv, Lv, Rv, Lvss, Rvss, Ts, Lvr, Rvr, Cvr, Cb, Lb, Rb\n\n14: 5/2.1 - C, L, R, Ls, Rs, LFE, Lv, Rv\n\n15: 5/5.2 - C, L, R, Lss, Rss, Ls, Rs, Lv, Rv, Cvr, LFE, LFE2\n\n16: 5/4.1 - C, L, R, Ls, Rs, LFE, Lv, Rv, Lvs, Rvs\n\n17: 6/5.1 - C, L, R, Ls, Rs, LFE, Lv, Rv, Cv, Lvs, Rvs, Ts\n\n18: 6/7.1 - C, L, R, Ls, Rs, Lbs, Rbs, LFE, Lv, Rv, Cv, Lvs, 
Rvs, Ts\n\n19: 5/6.1 - C, L, R, Lss, Rss, Lsr, Rsr, LFE, Lv, Rv, Lvr, Rvr\n\n20: 7/6.1 - C, Leos, Reos, L, R, Lss, Rss, Lsr, Rsr, LFE, Lv, Rv, Lvs, Rvs\n\nNote: CICP 13 is applicable for baseline profile streams with only object audio.\n\nBut the delivery is all contained within 12 channels (7.1.4)\n\nThere are different levels of MPEG-H. For streaming, it's 7.1.4 which is level 3 IIRC.\n\nQ: 3rd order Ambisonics you mean? And 7.1.4 is literally just Dolby, right?\n\nA: No, 7.1.4. You can consider it the same as Dolby Atmos.\n\nIt's object-based audio, Ambisonics is channel based audio.\n\nQ: Isn't 360 RA Ambisonics though? Dolby is object based, right?\n\nA: Yep, Dolby and Sony 360 are object based. Sony 360 is just protected MPEG-H 3D\n\nQ: So Sony 360 is also object based? So it's not Ambisonics\n\nA: [Sony 360 7.1.4 “decoded to normal channel-based audio” looks like just 12 stems which can be imported into Audacity]\n\nQ: The one I got was from Amazon Music, not Tidal. Shouldn't make a difference?\n\nA: It's actually far more powerful than Dolby Atmos in this sense.\n\nMusic Media Helper can decode Sony 360. Last time i checked they are possible to rip from Tidal although iirc Tidal are dropping 360 support?\n\n<https://www.quadraphonicquad.com/forums/threads/music-media-helper-tools-for-multichannel-audio-music-videos.22693/> (Sam)\n\nAlternatively, this paid plugin can handle 360RA downmix to binaural audio: <https://www.perfectsurround.com/>, but you need an iLok ID even for the free trial (jarredou)\n\n## \\_\\_\\_AI mastering services\\_\\_\\_\n\nThese might be useful even for enhancing the quality of instrumentals after separation (or of your own mixed music).\n\nBe aware that at least some advanced mixing beforehand may fool the Content ID detection system, so your song won't be detected. 
If a label blocks their material on YT straight after you upload a regular file, you may still get a copyright strike some time after uploading a mastered instrumental, as they also periodically use YT's search engine to find their tracks.\n\nIf you don't find satisfying results with the services below, read [this](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?usp=drivesdk).\n\n**Paid**\n\n<https://emastered.com/> (unlimited free preview, 150$ per year)\n\nThe preview is just an MP3 at 320kbps with a 20kHz cutoff, which is claimed to have a watermark, but it cannot be heard or seen in Spek. The preview file can be downloaded by opening Developer Tools in your browser and playing the preview; the proper file should then appear in the \"media\" list (don't confuse it with the original file). Open that link in a new tab, open the media player's options, and simply click download.\n\nIt's the most advanced and best-sounding service compared to all the free ones I tested (even if you only have access to the MP3; I also listened to up to 24-bit WAVs on their site with a paid account). It's also one of those that are potentially destructive if you apply the wrong settings, but leaving everything at default is a good starting point and works decently for e.g. mixtures, and to some extent even for previously mastered music, at least music that doesn't hit 0dB (e.g. even -1dB is OK, but it is claimed to work best between -3dB and -6dB). Generally, I recommend it. Worth trying out.\n\nNote for paid users: be aware that preview files can be MP3 files as well. 
So what you hear while changing various parameters is not exactly the same as the final WAV output.\n\n[https://distrokid.com/mixea](https://distrokid.com/mixea/) (99$ per year/first master for free)\n\n“[vs LANDR, BandLab and eMastered] I experienced that Mixea mastered with a much stronger sound and brighter (in a good way, the trebles are very clear) than the others.”\n\n[https://www.masteringbox.com](https://www.masteringbox.com/) >\n\n<https://www.landr.com/> (now also available as a plugin)\n\n<https://masterchannel.ai> (15/20$ per month; free previews only; can also convert stereo to multichannel audio)\n\n<https://ariamastering.com/en/Pricing> (from 50$ per month or 9.90$ per master; mastering based on fully analog gear, with a robotic arm making adjustments in real time)\n\n<https://glowmastering.com/> (8$ for 5 masters, unlimited for 80$, 3 for free)\n\n**VST plugins**\n\n[iZotope Ozone Advanced](https://www.izotope.com/en/shop/ozone-11-advanced/) 9 and up (paid)\n\nThe Advanced version has a new AI mastering feature that automatically detects parameters, which can be manually adjusted after the process. It works pretty well and fixes lots of problems with muddy mixes (especially with manual adjustments; don't be afraid to experiment, AI is never perfect).\n\nMastering Assistant, built into recent versions of the [Logic Pro](https://www.apple.com/logic-pro/) DAW (macOS only)\n\nIt can give more natural results than iZotope above.\n\n[AI Master by Exonic UK](https://www.exonicuk.com/product-page/ai-master) (paid)\n\n[master\\_me](https://bedroomproducersblog.com/2023/03/05/master_me/) (free)\n\nIt contains a decent mastering chain that automatically adjusts settings for your song (they can be changed later), and you can also change the target iLUFS value manually. 
By default, it's -14 iLUFS, which can be too quiet for songs already mastered louder, and with that setting it can become destructive for some songs.\n\n**Free** **online services** (all remarks below apply to mastering AI-separated instrumentals)\n\n<https://aimastering.com/> (redirects to <https://bakuage.com/app/>)\n\nInput: WAV, MP3, MP4; output: WAV 16-32 bit, MP3 320kbps, 44 or 48kHz\n\nYou can optionally specify a reference audio file.\n\nTons of options, but no convenient preview while tweaking them. There’s also one completely automatic option. Generally, it can be destructive to the sound, even with the most automatic setting (attenuated bass, exaggerated higher tones).\n\nPreferred options when working with a slightly muffled snare in the mix, for an instrumental rap separation result from the 500m1 model\n\n(automatic (easy master) is (only) good for mixtures [vocal+instr]):\n\n* True Peak, Oversampling 2x, AM Level 0.3, WAV 32. SAO, 0/22000 (the rest untouched)\n\nFor a sound that's still too muffled (e.g. when it gets lost among lots of hi-hats):\n\n* YouTube Loudness, OVS to 1x and AM Level 0.2 and 24 bit (+ true peak, SAO, 0/22000)\n\nAlternative (good for mixtures and previously mastered music with a slightly muddy snare):\n\n* YouTube Loudness, Target Loudness -8, Ceiling -0.2, OVS to 2x, True Peak and AM Level 0.3 and 32 bit, SAO, 0/22000\n\nThe most complicated tool, but the most capable of all the free ones mentioned here so far. After the first two files, it puts you in a short queue. Processing takes 2-3 minutes. You can't upload more than one track at a time. Great metrics, e.g. one measuring the overall \"professionality\" of the resulting master. At this point, it can also start exaggerating vocal residues from the separation process. 
Equalize Loudness doesn’t do anything when checked just before download (it probably only works after you click Remaster).\n\nThey also have an offline app: [https://github.com/ai-mastering/phaselimiter-gui/releases/](https://github.com/ai-mastering/phaselimiter-gui/releases/tag/v0.1.1)\n\nwith some of the features used on Bakuage/aimastering.com.\n\n“but most of the settings you want are on their site, their offline version is set and forget. (...) doesn't give you some specific settings to adjust.”\n\n<https://moises.ai/>\n\n16-32 bit WAV output (WAV is now premium only), any input format. Their separation tools are bad, but their mastering AI is great and neutral. It works very well for vinyl rips. You can master more than 5 tracks per month for free (I don’t know exactly how many; the 5-track limit applies to separation, not to the mastering feature, and at least 30 worked in 2022).\n\nThe mastering feature is only available in the web version, so if you’re on a phone, open the site in PC (desktop) mode.\n\n24 bit at -9 LUFS, or without the limiter, does the best job in most cases, e.g. for GSEP results (use the latter when you don’t want to smooth out the sound). -8 LUFS tends to harm the dynamics of songs, but in some cases it can be useful for making your snare louder.\n\nThe interface has a bug: you need to pick your file for upload twice, otherwise you won’t be able to change the parameters and confirm the upload (also, on mobile, the parameters don’t always appear immediately after you pick your file or paste a link; listing the options manually doesn’t let you confirm the step and proceed to upload, so you need to pick the file again, and then you can proceed).\n\nSometimes the upload gets stuck at 99% for a very long time, and if you leave your phone in sleep mode and come back after 15 minutes, it will restart the upload at this 99%, but eventually it will return an error. 
You simply need to retry uploading the file (it will get stuck at 99% again, but it will still be uploading in the meantime).\n\nAlso, importing the same file via GDrive may not work.\n\nAdditionally, if you pick 32-bit output quality, then when mastering is done and you go to download the file, the WAV option will show 24 bit, but the file will be 32 bit as you selected before.\n\nIt’s the most neutral-sounding compared to the two below.\n\nIf you plan to master your own music, read “Preparing your tracks” here: <https://moises.ai/blog/how-to-master-a-song-home/>. I think these tips are pretty universal for all of these services.\n\n<https://www.mastering.studio/>\n\nFour presets with live preview, only 16-bit WAV for free, and only WAV accepted as input (for the best quality, convert any MP3s to 32-bit float WAV, e.g. with Foobar2000; 64-bit WAV input is unsupported).\n\nIf you see \"upload failed\", register and activate a new account in incognito mode and do everything over a VPN (probably an ISP block, which I had).\n\nJudging by the 16-bit output alone (an unfair comparison to 24 bit on moises.ai) and for GSep 320kbps files, I found it worse; even the London Smooth preset is not as neutral overall as moises, and it can be destructive to the sound quality. But if you need to get something extra out of a blurry mix, it’s a good choice (and some people may find eMastered too pricey).\n\nBandLab Assistant mastering\n\nFirst, you need to download their assistant here:\n\n<https://www.bandlab.com/products/desktop/assistant>\n\nThen insert the file, pick a preset, and listen; the file is then uploaded for further processing, and you’re redirected to the download page.\n\nThey write more about it here:\n\n<https://www.bandlab.com/mastering>\n\nFour presets, including CD, enhance and bass boost; 16-bit WAV output max. Compared to the paid eMastered, it’s average. 
But in some cases it’s better than the free mastering.studio when you have a muffled snare in the instrumental. On GSEP results, only the CD preset was usable. The sound is even crustier than LA Punch: more saturated (less neutral), a bit too bassy and compressed, but it may work for some songs where you don’t have a better choice and everything above failed.\n\nIf your file doesn’t start uploading (hangs on “Preparing Master”), make sure the “Set as a metered connection” option isn't enabled in Windows 10/11. If it is, disable it and restart the assistant.\n\nRight after your file finishes uploading, it gets processed, so don’t bother going to the BandLab site too fast. Sometimes it’s still being processed even after the download button has appeared, and you end up waiting in a queue for a few minutes after pressing the WAV button; you can't make this any faster.\n\nA side note: the audio you hear in the preview is not exactly the same as the result downloaded from the site. The preview is a bit louder, stresses vocal residues more, and the snare is less present in the mix. Although the preview is clearer, it's sadly also 16 bit, and overall it doesn't seem better. Also, the preview file doesn’t seem to be stored locally anywhere. But if you’re desperate enough to get it, fasten your seatbelt. If you processed other files before, close the assistant and open it again, then process the file so the preview can be played, and pause it.\n\nOn Windows, open Task Manager, go to Details, sort by CPU, right-click BandLab Assistant.exe (the one occupying the most memory) > Create dump file. Open the dump (located in Temp) in HxD, change the bytes per row from “16” to “4000”, and search for the string “RIFF,”. If you cannot find it, it’s the wrong process; dump another assistant process (one of the three most intensive). 
If you found the “RIFF,” string, delete everything above it (select everything by dragging the mouse to the top with Page Up pressed, then hold Shift and press the left arrow to also select the first row, then press Delete), and save it as WAV. The file can be played, but it’s too big. To find the end, open Find (Ctrl+F), choose hex, and search for FF 00 00 02 00 01 00 (the match shouldn’t be at the beginning of the file; press F3 more than once if necessary). Then select everything up to the top by dragging the mouse with Page Up pressed, copy it (Ctrl+C), paste it into a new file, and save it as WAV.\n\nYou can also use **Matchering**. You provide a reference file, and it tries to match the sound of your audio to that reference.\n\nReference file(s) to use\n\n“Mastering The Mix” (an all-in-one collection of reference songs in one file):\n\n<https://drive.google.com/file/d/1kqPmcVC3qvh_Mqd9vIssGUKpz3jTddPc/view?usp=sharing>\n\nYou need 7-Zip with the WavPack plugin to extract it.\n\nBrown/pink noise:\n\n<https://drive.google.com/file/d/1wJHKRb2SIgJZIc-J8kEDD1k4OQj_OXzp/view?usp=sharing>\n\n“Try to use this as reference track in Matchering to get nice wide stereo and corrected high frequencies.” zcooger\n\nBut you can also use a whole song of your choice, or a short fragment of it (e.g. an instrumental part, to get a better result for a separation).\n\n- New Colab:\n\n<https://colab.research.google.com/github/kubinka0505/matchering-cli/blob/master/Documents/Matchering-CLI.ipynb>\n\n- Old Colab:\n\n<https://discord.com/channels/708579735583588363/814405660325969942/842132388217618442>\n\n- UVR5 (in Audio Tools), which incorporates Matchering 2\n\n- [Songmastr](https://www.songmastr.com/) can be used online instead of Colab; it uses Matchering 2 (7 free masters per week).\n\nBe aware that there’s a length limit, at least in UVR5, of 14:44 (or possibly just 15 minutes). 
Instead of the hit-or-miss approach with lots of reference songs in one file, you can simply use one song you think fits your track best. You can even cut it down to a smaller fragment, e.g. with LosslessCut, to avoid re-encoding. It can work even more effectively that way.\n\nSometimes I use Matchering on different master versions of the same song, when I have a few masters that each have things I like, but none is good enough on its own.\n\nUsually, the file with the richest spectrum should go in the target field (but feel free to experiment).\n\nThe target can be, e.g., a file after a lot of spectral restoration that lost some warmth and fidelity, when you need something back from the previous master version.\n\nYou can also try reprocessing your result up to 6 times, feeding in a new file as target or reference each time, until you find the best result. But usually 2-3 passes do the trick, sometimes swapping target and reference for different result files.\n\nWhen using Matchering in UVR5, be sure to check “Settings Test Mode” in the additional settings. It adds a 10-digit number to each result, preventing you from overwriting your old files during multiple experiments. UVR doesn’t ask before overwriting!\n\nFeel free to experiment with the WAV output quality. Probably, the further you go from 24 bit, the more different your result will sound after it's converted back to 16 bit by a lossy codec like Opus on YT. But if you care mostly about the result file itself, be aware that you can use the output quality to your advantage, knowing how a specific bit depth affects the result. E.g. muddier results start with PCM\\_32 (non-float); 64 bit has that too, plus some grittiness, but you can convert it back to 16 bit without dithering, e.g. in F2K (not in a DAW, to avoid another conversion to 32 bit), to glue it back together while keeping more clarity than PCM\\_32. 
Sometimes 16 bit can be good for gluing together audio that already sounds good, with loud snares, but it's frequently muddy or harsher than e.g. 32-bit non-float. In most cases the 16-bit result won't be that good, hence I’d encourage using bit depths higher than 16 bit here; but 24 bit can make your audio too bright at times, so in such cases try 32-bit float and non-float. There’s no single setting that works for every song, but the most universal ones I've found so far are 32-bit non-float (sometimes converted to 16 bit manually afterwards) or 64 bit converted to 16 bit. These are the most balanced across the whole song.\n\nSometimes it's good to use the file that is richest on a spectrogram as the target, so that content isn't lost in processing.\n\nMatchering is generally useful when you have different versions of your masters and you’re running in circles trying to find the best one. You can use the different versions as target and reference (or the other way round), check what sounds best, take the result, put it into one of the fields, retry, and repeat up to 4 times until it sounds best. Then you could potentially master it further and/or separate it into stems and rebuild the session from there.\n\nIf you need more customizable Matchering settings, e.g. controlling the limiter intensity or disabling the limiter completely, consider using [ComfyUI-Matchering](https://github.com/MuziekMagie/ComfyUI-Matchering) (standalone/portable ComfyUI package for CPU or Nvidia: [new\\_ComfyUI\\_windows\\_portable\\_nvidia\\_cu124\\_or\\_cpu](https://github.com/comfyanonymous/ComfyUI/releases/download/latest/new_ComfyUI_windows_portable_nvidia_cu124_or_cpu.7z))\n\n- **.masterknecht**\n\n<https://masterknecht.klangknecht.com/>\n\nA web-based competitor of Matchering (not affiliated with it). 
All the processing is done locally on your machine, without uploading files to a server.\n\nWith default settings, the results usually sound a bit softer/warmer than those from Matchering, the output is 48kHz, and there are many more customizable settings.\n\n[EQ curve/master transfer](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.7re8afevk1p)\n\n*Others*\n\nWindows app\n\n<https://www.curioza.com/>\n\nSome newer AI:\n\n<https://huggingface.co/spaces/nateraw/deepafx-st>\n\nAlso try this one:\n\n<https://github.com/jhtonykoo/music_mixing_style_transfer>\n\nOr:\n\n[https://github.com/joaomauricio5/AssistedSpectralRebalancePlugin](https://github.com/joaomauricio5/AssistedSpectralRebalancePlugin/releases/tag/v2.0)\n\nOthers (by bizzare808)\n\n<https://reedmayhew-audiomaster-ai.hf.space/>\n\n<https://tamamologics.de/>\n\n<https://entrepeneur4lyf.github.io/Web-Audio-Mastering/>\n\n<https://www.munute.com/ai-mastering>\n\nChatGPT\n\nIt can now master songs based on prompts, making them sound like whatever you ask for. In the video below, the author wanted to make three instrumentals sound like Juice WRLD. One result was decent; another had an overcompression/overlimiting issue, so the guitar faded in and out whenever another instrument kicked in. Some prompts may fail to return a result file, and he provided examples.\n\n<https://www.youtube.com/watch?v=0kGJVgiyhAk>\n\n“GPT does a lot of things right now, but the biggest problem is that it can't get larger files (wav) into the buffer and thus can't process them. 
Compared to unstable work results, not working is more serious.” - tat\\_evop1us\n\n<https://beatstorapon.com/ai-mastering> (only MP3 192kbps for free)\n\nFor enhancing 4-stem separations from Demucs/GSEP:\n\n<https://github.com/interactiveaudiolab/MSG> (16kHz cutoff)\n\n[Platinum Notes](https://www.platinumnotes.com/)\n\n(paid Windows/Mac software)\n\n\"corrects pitch, improves volume and makes every file ready to play anywhere (...)\", adds warmth and dynamics, and removes clipping.\n\nMastering services I'm yet to test:\n\nLANDR, Aria, SoundCloud, Master Channel, Instant Mastering (IIRC an April Fools' joke), Bakuage, Mixea.\n\nAI mixing services\n\n<https://automix.roexaudio.com>\n\nAn online AI auto-mixing service. Various instruments, genre settings, stem priority, pan priority.\n\n1 free mix per month.\n\nMight be useful for enhancing 4-stem separations.\n\n\"I tried 2 songs with it. Wasn't really pleased with results\"\n\n\"The biggest problem I had [...] while I am trying to balance my vocals in instrumental like Hollywood style\"\n\nAnother open-source tool by Sony\n\n<https://github.com/sony/fxnorm-automix>\n\nYou can also train your own models on wet music data.\n\nA new free tool by Sony:\n\n<https://github.com/SonyResearch/MEGAMI>\n\n<https://www.arxiv.org/abs/2511.08040>\n\n[HyMPS list](https://github.com/FORARTfe/HyMPS/blob/main/Audio/AI-based.md#mixing-) of AI/CML tools\n\nAI mixing plugins\n\niZotope Nectar\n\niZotope Neutron (Mix Assistant)\n\nSonible Pure Bundle\n\nCreating mashups and also DJ sets (two options)\n\n<https://rave.dj/mix>\n\nIt can give better results than manual mixes made by some less experienced users (but I doubt it will work with more than 2 stems).\n\n[Ripple](https://apps.apple.com/us/app/ripple-music-creation-tool/id6447522624)\n\niOS-only app (currently US region only)\n\n\"Ripple seems to be SpongeBand just translated into English, it was released last year: 
<https://pandaily.com/bytedance-launches-music-creation-tool-sponge-band/>\n\n(more info about its capabilities)\n\nBack then, it just didn't have separation into 4 stems (but the separation feature is defunct now anyway).\n\n\\_\\_\\_\\_\\_\\_\\_\n\nFor enhancing a vocal track, you can use WSRGlow, or better yet, process it with the Spectral Recovery tool in iZotope RX (7-9) (in RX 10 it’s only in the more expensive edition, IIRC), and then master it or send it to one of the services above.\n\n<https://replicate.com/lucataco/wsrglow>\n\nThere are a lot of requests for music upscaling on our Discord. You can use online mastering services for that as well. Technically, in most cases it's not upscaling, but the result can be satisfactory at times.\n\nIf you try out all the solutions and learn how they work and sound, you can easily get any track in better quality within a few minutes.\n\nFor very low-resolution music (if you manage to run it):\n\nAudioSR (lately used more often than the ones below; vocals/instrumentals)\n\nAudio Super Resolution\n\n<https://github.com/olvrhhn/audio_super_resolution>\n\nhifi-gan-bwe\n\n<https://github.com/brentspell/hifi-gan-bwe/>\n\nMore details, links, and Colabs for these are in the full upscalers [list](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit#heading=h.i7mm2bj53u07)\n\nIf you want to start making your own remasters (even if your file is in terrible quality, especially 22kHz):\n\n<https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?usp=drivesdk>\n\nIt might also be useful for low-quality, crusty vocals; it's a general music mixing guide, but also focused on audio restoration.\n\n### \\_\\_\\_Best quality on YouTube for your audio uploads\\_\\_\\_\\_\n\n1. If you already have a finished video which is not just a single frame (e.g. a cover shown for the whole video), download MKVToolNix and replace the rendered lossy audio track with a lossless one. 
This avoids the recompression/re-encoding that happens when rendering a video normally.\n2. If you can, upscale the video to at least 1440p. This avoids the deferred transition of your AAC (16kHz) audio stream to Opus (20kHz), which happens once your video gets popular or old enough (to check the current YT audio format, see Stats for nerds). QHD or higher makes your video play with the better Opus codec from the beginning, and it will sound better than after a deferred AAC to Opus transition on an FHD clip (the Opus audio stream checksums differ between FHD and QHD videos made from the same source file, so most likely something goes wrong on YT's side during the process; both Opus files are 20kHz, so the FHD one is not recompressed from AAC, but perhaps from some other audio file created during YT rendering, rather than from the source video).\n3. Alternative: if you just have one image to make a video from (e.g. a cover), make sure it’s at least 1440p. If not, simply upscale it (e.g. XnView has some basic upscaling filters). Then place the image next to this batch [FFmpeg script](https://disk.yandex.com/d/w7gmg_9mKSni2Q) along with your lossless audio files. It will render videos with audio streams identical to the original files, just muxed into the output MKV files (you can verify this with the Audio MD5 comparison in Foobar2000, or with AudioMD5Checker if the MD5 checksum isn't embedded, which you can see in the F2K file properties), so nothing gets recompressed on your end while making a video for YT (yes, YT supports MKV!). It’s faster than MKVToolNix, and you can convert multiple files with the same image at once (it's very fast, incomparable to normal video rendering, and the output is only 1 FPS, so it will also buffer very fast on YT).\n4. You don’t have to wait until YT finishes processing your HD version for Opus to appear. Opus appears once the FHD resolution becomes available, before QHD, while processing is still in progress. 
So check it from time to time before you hit the publish button.\n5. Because Opus is 16 bit, and the audio in your Matroska container may have a higher bit depth, it's good to encode your input file to Opus VBR 128kbps for testing purposes, to check how it will sound on YT (of course, don’t use that encode for the MKV file later). The conversion performed by the encoder can occasionally introduce some unwanted changes to the sound. It’s most noticeable when the input audio is 64 bit, but lower bit depths can still be good enough.\n6. YouTube videos from the early 2010s on archive.org have 192kbps AAC for 1080p ([example](https://web.archive.org/web/20130116173108/https%3A//www.youtube.com/watch?v=iD-NBs5kJqI)) (thx theamogusguy)\n7. If your YT audio uploads sound harsh and the audio in the uploaded video file keeps its original 44.1kHz sample rate, consider upsampling it manually to 48kHz before uploading. This bypasses the built-in Opus resampler. You can do that with e.g. iZotope RX (smooth), although it may sound smoother simply because RX Editor forces a conversion to 32 bit on import and converts it back down on export, especially without dithering.\n   Alternatively, you can use dBpoweramp/SSRC (F2K plugin) or SoX in the newest Foobar2000 x64 and save the output in a lossless format.\n   Currently, the best resampler on the Hydrogenaudio SRC [chart](https://src.hydrogenaudio.org/) is available here:\n   <https://github.com/rorgoroth/mingw-cmake-env/releases/tag/latest>\n   Usage:\n\nffmpeg -i \"C:\\input96or48.wav\" -af asf2sf=dblp,ardftsrc=44100:quality=61656210:bandwidth=0.9941249 -c:a pcm\\_f32le C:\\output44.wav\n\nIf your audio doesn't clip, you can drop -c:a pcm\\_f32le to get plain 16-bit output ([other formats](http://trac.ffmpeg.org/wiki/audio%20types)).\n\n24GB of RAM is recommended. It doesn’t show progress. With insufficient RAM it can take more than 10 minutes (normally less than 5 for a 5-minute track).\n\n8. 
Since MVSEP now supports batch API conversions, you can use Case Changing in Ant Renamer to restore the uppercase letters in song titles for your YouTube uploads.\n\nSo if you batch upload to YouTube, the name will appear not as \"song artist song title\" but as \"Song Artist Song Title\" (since YT removes dashes, commas, etc.)\n\n9. It seems that 720p is now enough to get Opus after upload (thx dca100fb8; points 9 and 8)\n10. The Safari browser probably still doesn’t support Opus, so it always uses the AAC 128 kbit/s 44.1kHz audio stream on YT for all videos instead (you can check it by right-clicking the playing video > Stats for nerds)\n\n## \\_\\_\\_Best quality from YouTube and Soundcloud - how to squeeze out the most from the music taken from YT for separation\\_\\_\\_\n\nSometimes a better source just doesn’t exist, and only YouTube audio can be used for separation.\n\n*Introduction*\n\nIn most cases, audio on YT is available in two formats: AAC (m4a) and Opus. As I mentioned, the latter appears for older or popular uploads, or for videos uploaded in QHD or 4K. Most videos already have both formats available. Currently, only browsers without Opus support play the AAC stream (IIRC Safari).\n\n1) AAC on YT is 128kbps, 44.1kHz, with a 16kHz cutoff (that’s not an artificial cutoff; that’s how the codec normally behaves at this bitrate).\n\n2) Opus on YT is 96/128/152kbps with a 20kHz cutoff (the spectrum goes up to 24kHz for videos uploaded before ~2020, but only with some aliasing above 20kHz, probably as a result of the resampler used), always 48kHz (44.1kHz audio is always upsampled by the resampler built into Opus; that’s how Opus works, its output is always 48kHz).\n\n1) and 2) can be downloaded, e.g. 
via JDownloader 2 (after downloading one file, you must delete the previously shown entry in the LinkGrabber, add the link once more, and now pick Opus for download; M4A is the default).\n\nYou can also do it online:\n<https://ytdlp.online/> - a YouTube-dl web interface that allows downloading Opus when all the others fail; just use:\n-f 251 <https://www.youtube.com/watch?v=XXXXXXX>\n\nAfterwards, the download link will be shown at the end of the command output; right-click it and save the link. Registering and going to a separate download page afterwards shouldn’t be necessary.\nBe aware that Opus (format 251) is sometimes just not available for some videos; check it in Stats for nerds when you open the video on YT (see [all YT formats](https://gist.github.com/MartinEesmaa/2f4b261cb90a47e9c41ba115a011a4aa)).\n\nThese sites are frequently unreliable:\n<https://cobalt.tools/> (stopped supporting YT; it was probably just a GUI for yt-dlp)\n<https://yt-dl-web.vercel.app/> (doesn't always work)\n\n<https://savefrom.net/> (the one ssyoutube.com redirects to) works when all of the above fail, but sometimes doesn't allow getting Opus/WebM or resolutions above 720p (then [downr.org](http://downr.org) can download in better quality, but the audio may still be M4A).\n\nJudging by the spectrum, Opus files downloaded with JDownloader differ from the Opus stream in WebM files, but I can’t compare them with Cobalt, as Spek doesn’t cooperate with its WebM files, at least in progressive mode (“direct vimeo stream”). yt-dlp with the -x argument might be free of the issue, but I haven’t checked yet.\n\nDon't download as Opus from JDownloader 2. 
The quality will be affected.\n\nAlways download as WebM, in any quality: all qualities contain the same Opus audio stream at the same bitrate.\nBe aware that JDownloader sometimes wrongly reports the bitrate as 96kbps, while if you demux the WebM file with MKVToolNix GUI and then MKVExtractGUI2 (compatible with MKVToolNix v20), the resulting Opus file (add the extension manually afterwards) will have an average bitrate not much below 128kbps (that’s how VBR works).\n\nDon’t download as OGG from Cobalt. It's a recompression of the WebM/Opus stream. OGG is not on the variants list in JDownloader (and probably not in command-line tools like yt-dlp either, so it’s simply not available on YT).\n\nIt will, however, have some additional information below 16kHz compared to the Opus downloaded from JDownloader, probably because it was sourced from the WebM and not from JDownloader's Opus, but that’s it. The recompression adds some ringing and compression artefacts. Details and spectrograms [here](https://discord.com/channels/708579735583588363/708579735583588366/1210343573388525619).\n\nSometimes M4A (AAC) sounds better than Opus. It all depends on the track. It's more likely to happen when both have the same cutoff in the spectrogram, due to how the video was uploaded to YT.\n\n*What to do to improve the audio gathered from YT?*\n\n*#1 Joining frequencies with EQ method*\n\n1) Download both the M4A and the Opus audio from YT (if Opus is available for your video).\n\n2) Upsample the M4A to 48kHz (otherwise you won’t be able to align the two files perfectly), e.g. with Resampler (PPHS) in Ultra mode in Foobar 1.3.20 > Convert > ...\n\n3) To get the frequencies above 16kHz from Opus and the better-sounding frequencies up to 16kHz from AAC, combine the best of both worlds by:\n\na) applying a [resonant highpass](https://i.imgur.com/QLwsGzK.png) to the Opus file at 15750Hz in e.g. 
Ozone 8/9 EQ\n\nb) aligning it with the M4A audio file (converted to 48kHz 32-bit WAV), added as a separate track in a free DAW like Audacity, Cakewalk, Ableton Live Lite, or Pro Tools Intro (or Reaper with its unlimited trial).\n\nExport the mixdown as 24-bit WAV. It should be more than enough.\n\n*Using a brickwall highpass instead will leave a hole in the frequencies of the resulting spectrogram (check it in Spek afterwards, and also check that no frequencies overlap at the crossover; consider also trying linear phase in e.g. the free QRange EQ).*\n\n*#2 Manual ensemble in UVR*\n\n*Files ensemble with Max Spec* in UVR\n\nInstead of EQ, you can use an ensemble after manually upsampling the M4A file. UVR can align the files for you.\n\nBe aware that this method is not fully transparent: it produces slightly brighter files, still with a cutoff, though not a brickwall one like in the M4A.\n\nWithout the upsampling step, the Max Spec method also gives great results for *SoundCloud*, which provides 64kbps Opus, 128kbps MP3 and 256kbps AAC.\n\nYou only need to amplify the MP3 file by 3dB. The alignment step is also necessary here, but it can be performed in UVR.\n\n(Fixed in UVR 5.6) Be aware that there's a bug in manual ensemble which forces 16-bit output despite choosing e.g. 32-bit float. 
To work around it, run a regular separation of a song with any AI model with 32 bit set, then return to manual ensemble without changing any settings; from then on, manual ensemble will keep 32-bit float.\n\nYou can also fix this by changing line 510 of lib\\_v5/spec\\_utils.py to:\n\nsf.write(save\\_path, normalize(output.T, is\\_normalization), samplerate, subtype='FLOAT')\n\nand then restarting the program (you may not find that file if you're not running UVR from source).\n\nTBH, I didn’t directly compare the EQ method with the Max Spec method, but the latter definitely sounds brighter than the Opus and the M4A.\n\n“while it helps to make trebles more defined, it's a bit flawed, due ensembling 3 different compression methods, so 3 different compression flaws/errors and noises”.\n\nPS. For YT, I also tried downsampling the Opus to 44.1kHz and leaving the M4A intact, but it gave worse results (probably because the resampler affects more frequencies in this case).\n\n*Explanation*\n\nThe file sizes and bitrates are the same for both formats. Since the AAC cutoff is not artificial, and the codec efficiently encodes only audio up to 16kHz, leaving everything above blank, we can conclude that frequencies up to 16kHz may sound better in AAC than in Opus. With the same size and bitrate, AAC most likely spends none of its bitrate on frequencies above 16kHz, so the full 128kbps goes to the range up to 16kHz, while Opus spreads it over the whole spectrum up to 20kHz (or even 24kHz in some old videos, until around 2020). That might be more harmful to the frequencies up to 16kHz than in AAC.\n\nPS. 
After some time, I received an explanation/confirmation of the rationale behind this process [here](https://imgur.com/a/4qBnIx3), saying it’s generally justified, but that Opus is actually better than AAC already above 9600Hz, so one more cutoff is needed on the AAC (i.e. take AAC only below 9600Hz). It might also be worth using a linear-phase EQ to get rid of some coloration in the resulting file.\n\nWhen experimenting with it, make sure you don't end up with overlapping frequencies in the crossover area (e.g. visible [here](https://cdn.discordapp.com/attachments/1070055072706347061/1088074452824248481/image.png) as a slightly brighter area from 9.6kHz up to 12kHz). To avoid it in e.g. RX Editor, one filtered signal needs to be 10Hz away from the other, i.e. if the lowpass is at 12000 Hz, the highpass is at 12010 Hz. “But there is a catch with iZotope RX. The 10Hz away I described is only applied to the Copy operation (when you basically select the frequency range, and just CTRL+C by copying it). But there is also Silence operation (when you select freq. range and press Delete, it eliminates the freq. in this range), and it is another way around: you need to get the other signal 10Hz inward, so they overlap. I.e.: 12000 Hz lowpass, 11990 Hz highpass. Here is the video demo: <https://youtu.be/h5yE5cpqqMU>”\n\n*#3 Bash script to automate the AAC/Opus quality combining from YT audio*\n\nintroC eventually wrote a bash script which aligns the files (trimming 1600 samples from the M4A), performs the cutoffs, and joins the frequencies of both files for you, without the overlap issue (tested with white noise). The script works on multiple M4A and WebM files with matching names. 
To run this script on Windows, you probably need MSYS2 (or Cygwin), or WSL on Windows 10/11 ([read](https://www.thewindowsclub.com/how-to-run-sh-or-shell-script-file-in-windows-10)).\n\nHe also took a more conservative approach and changed the cutoff frequency from 9600Hz to 1400Hz, since AAC didn’t perform better in one song, whereas below 1400Hz it should be better in pretty much every case. The best cutoff may sometimes depend on the song. The [script](https://drive.google.com/file/d/16jarqz3gyFcjB4HEBS_24-i9eJ5iyP3D/view?usp=sharing) is subject to change.\n\n*#4 Method for better quality of instrumental leaks on YT by theamogusguy*\n\n“I did something really odd. (...) since you can only rip max 128kbps I did something really odd to get a higher quality instrumental:\n\nI inverted the 128kbps AAC YouTube rip into the original to get the acapella\n\nI took the subtracted acapella and ran it through AI (mel-roformer 2024.10) to reduce the compression artifacts\n\nI then inverted the isolated acapella and mixed it with the lossless to get an... unusual lossless instrumental file?\nalso the OPUS stream goes up to 20khz but I feel like the sample rate difference is gonna cause issues, so I ended up ripping AAC (OPUS is 48khz while most music is 44.1khz)”\n\n###### \\_\\_\\_\\_\\_Custom UVR models\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\nMostly outdated models; see [here](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1#issuecomment-2156069553) for more submissions from 2024\n\n0) BubbleG — 15.06.2021\n\n[Final drum model](https://cdn.discordapp.com/attachments/708580573697933382/854421893166530609/drums-4BAND-3090_4band.pth) (for UVR 5 and 4band\\_44100.json)\n\n1) Dry Paint Dealer Undr — 08.07.2021\n\nSharing a [WIP piano model](https://drive.google.com/file/d/1_GEEhvZj1qyIod1d1MX2lM6u65CTpbml/view?usp=s) trained on almost 300 songs; I might continue training it, or might not. It has an issue where it also removes bass guitar.\n\n2) 
BubbleG — 16.06.2021\n\n[Temp. bass model](https://cdn.discordapp.com/attachments/708580573697933382/854812714115268608/bass-4BAND-3090_4band.pth). Must be used with 4band\\_44100.json\n\n1. viperx — 04.08.2021\n\nMy [simple karaoke model](https://drive.google.com/file/d/1Ra55Gb55Df-x9NrlJGCDcDlKSNECusMH/view?usp=sharing), which I trained in May up to epoch 25/28. I didn't complete the training because I've been busy with other projects and left this one aside, but this simple model removes the second voice. It can be useful only in some cases; it's bad, but acceptable\n\n1. [centre isolation model](https://cdn.discordapp.com/attachments/708580573697933382/874518732737765376/model_0_1.pth) epoch 0 inner epoch 1 - 150 pairs for UVR [4.0.1](https://github.com/tsurumeso/vocal-remover/tree/develop)\n2. K-POP FILTERS — 02.07.2021\n\n[model\\_0\\_0\\_1024\\_2048.pth](https://drive.google.com/file/d/12kRtL6XfRLiOwUx1JF2vXKE_-8P2g5Wt/view?usp=sharing)\n\nfeedback will be appreciated\n\nCheck [#model-sharing](https://discord.com/channels/708579735583588363/708580573697933382/874518735816368168) for current WiP models\n\n###### \\_\\_Repository of old Colab notebooks\\_\\_\n\nUVR 5 (Colab by HV): <https://colab.research.google.com/github/NaJeongMo/Colaboratory-Notebook-for-Ultimate-Vocal-Remover/blob/main/Vocal%20Remover%205_arch.ipynb>\n\n(On Mobile Chrome use PC mode)\n\nAlternative up-to-date UVR 5 notebook (not HV’s):\n\n<https://colab.research.google.com/github/lucassantilli/UVR-Colab-GUI/blob/main/UVR_v5.ipynb#scrollTo=-KYA8iOZ8BKq>\n\nMDX (Colab by CyberWaifu, 4 stem; cannot be used in Mobile Chrome even in PC mode, as there's no GDrive mounting and track downloading is always stuck at 0%. 
Model A is cleaner but has more bleeding. Audioshake is based on it, but with a different model trained on a larger dataset iirc. The UVR team is considering training it on their own bigger dataset to get better results. It’s phase-based, unlike UVR, but tsurumeso is working on adding phase, so it might then get rewritten for UVR)\n\n<https://colab.research.google.com/drive/1R32s9M50tn_TRUGIkfnjNPYdbUvQOcfh?usp=sharing>\n\n(wait patiently, it doesn’t show the progress)\n\nUVR 5 (old version by HV with an ensemble feature for any 2 files; put the tracks in a separate folder. As for x/z: similar results, but not the same. Put first the one you want the result to be more similar to)\n\n[https://colab.research.google.com/drive/1eK4h-13SmbjwYPecW2-PdMoEbJcpqzDt?usp=sharing](https://colab.research.google.com/drive/1eK4h-13SmbjwYPecW2-PdMoEbJcpqzDt?usp=sharing#scrollTo=CT8TuXWLBrXF)\n\n<https://colab.research.google.com/drive/1C6i_6pBRjdbyueVw27FuRpXmEe442n4k?usp=sharing#scrollTo=CT8TuXWLBrXF> (+12 ens, no batch ens, deleted)\n\n2021-ISMIR-MSS-Challenge-CWS-PResUNet (byteMSS) (if you run out of memory, split up the input file)\n\n<https://colab.research.google.com/drive/17m08bvihZAov_F_6Rg3luNj030t6mtyk?usp=sharing>\n\nWoosung Choi's ISMIR 2020 (Colab by CyberWaifu)\n\n<https://colab.research.google.com/drive/1jlwVgC9sRCGnZAKZTpqKgeSnzP3sIj8U>\n\nVocal Remover 4:\n\n<https://colab.research.google.com/drive/1z0YBPfSexb4E7mhNz9LJP4Kfz3AvHf32>\n\nTo fix the librosa error, try adding the `!pip install librosa==0.8.0` line (0.9.? works as well), and if the error is still the same, one for pysound as well:\n\n
<https://discord.com/channels/708579735583588363/767947630403387393/1089518963253317652>\n\n<https://colab.research.google.com/github/burntscarr/vocal-remover/blob/main/vocal_remover_burnt.ipynb>\n\n(UVR4 + models description: <https://github.com/Anjok07/ultimatevocalremovergui/tree/v4.0.1>; search for \"Models included\" at the bottom.)\n\nUVR 2.20 (it achieved good results for me on old ’70s pop music where cymbals got muffled by current models, but be prepared for more bleeding in some places vs VR4 and newer)\n\n<https://colab.research.google.com/drive/1gGtjAo3jK3nmHcMYTz0p8Qs8rZu8Lhb6?usp=sharing>\n\nSpleeter (11/16kHz, 2, 4, 5 stems, currently doesn’t work): <https://colab.research.google.com/drive/1d-NKFQVRGCV5tvbd0GOy9spMMel6mrth?usp=sharing>\n\nIn my experience, if you don’t need the piano stem, the 4-stem model does a better job than the 5-stem one (and even than the 2-stem one, which is also reflected in SDR results). Use the 11kHz models only if your input files are sampled at 22kHz (only in that case will they give better results).\n\nIf you can, use iZotope RX 8 for 22kHz 4-stem separation, as it provides better separation quality with an aggressiveness option. It’s Spleeter, but with a better (full-band) model.\n\nDemucs 3.0\n\n<https://colab.research.google.com/drive/1yyEe0m8t5b3i9FQkCl_iy6c9maF2brGx?usp=sharing>\n\nTo install it locally (by britneyjbitch):\n\nI cracked the Da Vinci code on how to install Demucs V3! For anybody who struggled (on Windows), I got you!\n\n1. Download a zip of Demucs 3 from GitHub (<https://github.com/facebookresearch/demucs>) and extract it to a folder of your choice\n\n2. Open cmd inside the extracted folder\n\n3. If you just want to separate tracks, run the following command:\n\n`python.exe -m pip install --requirement requirements_minimal.txt`\n\n4. 
If you want to be able to train models too, run the following command:\n\n`python.exe -m pip install --requirement requirements.txt`\n\n5. If a red error about incompatible versions of any of the modules (e.g. torch) appears, run the following command:\n\n`pip install desired_module==version_of_desired_module`\n\ne.g. `pip install torch==1.9.0`\n\n6. Repeat step 5 for any incompatibilities that might occur\n\n7. Separating tracks:\n\n`python.exe -m demucs -n \"desired_model_to_run_separation\" \"path_to_track\"`\n\n8. If you want help finding all additional options (for example overlap or shifts), run:\n\n`python.exe -m demucs --help`\n\nAt least that worked for me; feel free to let me know if this worked for others as well.\n\nOh, and I forgot: between steps 6 and 7, don't pay attention to a potential red error \"torchvision 0.9.1+cu111 has requirement torch==1.8.1, but you'll have torch 1.9.0 which is incompatible.\"\n\nDo NOT change back to torch 1.8.0, because you won't be able to run demucs.\n\n
If \"torchvision 0.9.1+cu111 has requirement torch==1.8.1, but you'll have torch 1.9.0 which is incompatible.\" is the only red error you're getting after executing the commands from steps 3, 4 and/or 5, you're good to go with separation!\n\nDemucs (22kHz, 4 stem):\n\n<https://colab.research.google.com/drive/1gRGRDhx9yA1KtafKhOaXZUpUoh2MuF_8?usp=sharing>\n\n<https://colab.research.google.com/github/facebookresearch/demucs/blob/master/Demucs.ipynb>\n\nOther one(s):\n\nLaSAFT:\n[https://colab.research.google.com/drive/1XIngzXDi2mF\\_y6WwDrLLx4XZtI8\\_1FAz?usp=sharing](https://colab.research.google.com/drive/1XIngzXDi2mF_y6WwDrLLx4XZtI8_1FAz?usp=sharing#scrollTo=ZNfiadwPdWbK)\n\n(original, cannot define model ATM)\n\n[https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/main/colab\\_demo/LaSAFT\\_with\\_GPoCM\\_(large)\\_Stella\\_Jang\\_Example.ipynb](https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/blob/main/colab_demo/LaSAFT_with_GPoCM_%28large%29_Stella_Jang_Example.ipynb)\n\nIf you cannot load the file, upload it manually to your Colab, or just wait patiently. 
Refresh the GitHub page with CTRL+R if you can’t see the code preview.\n\nAlso check out this LaSAFT [download](https://www.mediafire.com/file/j6hhuubmbb4m0po/lasaft2021.rar/file) with a [message](https://discord.com/channels/708579735583588363/708579735583588366/819091933967679529) about the superiority of the 2020 model (posted in March 2021).\n\nClone voice:\n\n<https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/RealTimeVoiceCloning.ipynb>\n\nMatchering:\n\n<https://cdn.discordapp.com/attachments/814405660325969942/842133128851750952/MatcheringColabSimplified.ipynb>\n\nFor more Colabs, search for colab.research.google.com on our Discord server.\n\n###### \\_\\_Google Colab troubleshooting (old)\\_\n\n###### *Authorisation error during mounting:*\n\nTL;DR: you need to be logged into Colab with the same account whose drive you want to mount later, or just change your Colab account.\n\nThis was introduced to Colab at some point. Once, when I tried to log into another account during mounting, it displayed a new window with only one account, where the wanted account didn't appear, and when I manually signed in to it, Colab showed an error, something about unsuccessful authorisation. When I switched the account in the top-right corner to the same account I wanted to choose when mounting, everything went fine as it always used to, and the full list of accounts appeared. 
HV’s Colabs already implement the new mount method, so the old one doesn’t cause errors there, and in the UVR notebook you can choose between the new (default) and the old one (just in case Google changes something again).\n\n* Try logging into another Google account if you can no longer connect to a GPU and/or have *exceeded your GPU limit*\n* (can’t really say if it’s still helpful at this point)\n\nPaste this code into the console (Chrome: CTRL+Shift+I or …>More tools>Developer tools>Console) to avoid disconnections from the runtime while you’re AFK, e.g. if you can’t connect to a GPU after reconnecting following idle time (possibly once your code has finished executing and you’ve been AFK for too long). It won’t save you from the one captcha per session.\n\n```javascript\n// Clicks Colab's \"Connect\" button every 60 s so the session isn't treated as idle.\n// To stop it, run clearInterval(interval) in the same console.\nvar interval = setInterval(function () {\n  console.log(\"working\");\n  var selector = \"#top-toolbar > colab-connect-button\";\n  document.querySelector(selector).shadowRoot.querySelector(\"#connect\").click();\n  // Click once more a second later\n  setTimeout(function () {\n    document.querySelector(selector).shadowRoot.querySelector(\"#connect\").click();\n  }, 1000);\n}, 60 * 1000);\n```\n\nIt keeps re-clicking Colab’s Connect button to get past the idle check.\n\n### Repository of stems/multitracks from music to create your own dataset\n\nDatasets search engine\n\n<https://datasetsearch.research.google.com/>\n\nUp-to-date list of datasets\n\n<https://github.com/Yuan-ManX/ai-audio-datasets-list#music>\n\n33 datasets compilation list:\n\n<https://sites.google.com/site/shinnosuketakamichi/publication/corpus>\n\nZFTurbo’s list (contains duplicates from below):\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/40>\n\nAlso check out:\n\n[#resources](https://discord.com/channels/708579735583588363/773763762887852072) | [#datasets](https://discord.com/channels/708579735583588363/1286052299931652106) ([invite](https://discord.gg/ZPtAU5R6rP))\n\n**musdb18-hq** (for known errors in 
the dataset, [read](https://sigsep.github.io/datasets/musdb.html#errata))\n\n<https://drive.google.com/file/d/1ieGcVPPfgWg__BTDlIGi1TpntdOWwwdn/view?usp=sharing> (14GB 7z)\n\n<https://zenodo.org/record/3338373#.Yr2x0aQ9eyU> (mirror, 22GB zip, it can be slow at times)\n\n**Slakh2100** (2100 tracks), mono, guitar + piano, and a LOT of other stems, no vocals\n\nIf we ever train a multi-source Demucs model, it would be very helpful\n\n<https://drive.google.com/file/d/1baMOSgbqogexZ5VDFsq3X6hgnIpt_bPw/view>\n\n<https://github.com/ethman/slakh-utils>\n\n<https://drive.google.com/file/d/1sxdNk0kekvv8FwDvzNypYe6Nf7d40Iek/view?usp=drivesdk>\n\n**Jammit** ([torrent](https://drive.google.com/file/d/1yL_ni6ENH9g9gJF4APCNwni7KiJTlZ1k/view?usp=sharing))\n\n\"the audio files can't be mixed directly. You need to apply a gain reduction of 0.77499997615814209 (in dB: -2.2139662170837942) on each track to get a perfect mixdown. This factor is about to set a 0dB on the original jammit mixtable.\"\n\n**MoisesDB**\n<https://music.ai/blog/news/introducing-moisesdb-the-ultimate-multitrack-dataset-for-source-separation-beyond-4-stems/>\n\n\"Total tracks: 240\n\nHow often folders exist per track: ('vocals', 239), ('drums', 238), ('bass', 236), ('guitar', 222), ('other\\_keys', 110), ('piano', 110), ('percussion', 99), ('bowed\\_strings', 45), ('other', 39), ('wind', 26), ('other\\_plucked', 7)\"\n\nScript to convert MoisesDB into MUSDB18 format:\n\n<https://gist.github.com/kiselecheck/df62174c5d986afcc5875300fd38bf9a>\n\n**Cambridge Multitrack Library**\n\n<https://multitracksearch.cambridge-mt.com/ms-mtk-search.htm>\n\nA nice collection of legally available multitracks.\n\n\"I believe about 2/3rds of musdb18's tracks are taken from this.\"\n\nGreat as a dataset for creating stem-specific models like acoustic guitars, electric guitars, piano, etc. 
Just take the stem you want and combine the rest.\n\n**DAMP-VSEP**\n\n<https://zenodo.org/record/3553059>\n\nSmule Digital Archive of Mobile Performances - Vocal Separation\n\nSeems to be a really big dataset of instrumental / amateur vocal / mix triplets, with compression and such.\n\n**Metapop**\n\n<https://metapop.com/competitions?p=1&status=ended&type=all>\n\n“Most of them have a click-through to download stems. You might need to automate downloads using the Simple Mass Downloader browser extension or something. Some are remix competitions, some a production, but all have stems.”\n\n**Guitar Hero / Rockband stems**\n\n[remixpacks.ru](https://remixpacks.ru) / [remixpacks.club](https://remixpacks.club/) (taken down; now it’s at <https://remixpacks.net/>, not sure if the site content is the same)\n\n[Python script](https://drive.google.com/file/d/1hQFEqotf1JDraxScCIMOxIDTWqNokiFk/view?usp=sharing) by MissAllure for downloading stems from: <https://docs.google.com/spreadsheets/d/1BtUSgPffbcaW4bMuGClYi8FGvaYmYyc1p4SkfpNty-U/edit?gid=0#gid=0> (only 10, but you can change it; saves you from having to open links; written by AI)\n\nRemix packs master post (removed dead links) - still has 2000+ stems\n\n<https://drive.google.com/file/d/11NrElQSjrXT_DbTL00r9OeMrrEBral3V/view?usp=sharing>\n\nTorrent:\n\nor here:\n\nmagnet:?xt=urn:btih:45a805dbd78b8dec796a0a127c4b4d2466ddbb9a\n\n(list with names:\n\n<https://docs.google.com/spreadsheets/d/1uCWmuAUfvVLonbXp9sQUb9dEODYTHmPAOyvGxulMOCA/edit?usp=sharing>)\n\nRenamer - python script\n\n<https://mega.nz/file/gEgwwaaB#BCDDMpl-VcIZDnNYQziyklOV9Vpf43wuc76hsS3JTlw>\n\nShowcase\n\n<https://www.youtube.com/watch?v=95Q31HjU04E>\n\nArchive.org copy\n\n[~~https://web.archive.org/web/20230105142738/https://telegra.ph/Remixpacks-Collection-Vol-01-04-12-25~~](https://web.archive.org/web/20230105142738/https%3A//telegra.ph/Remixpacks-Collection-Vol-01-04-12-25)\n\nOr here (but you can’t access all the sections at the 
bottom, and after some time you get an “Unable to load” error; blocking the specific site element with the old (Manifest V2) uBlock would probably work, not sure):\n\n[https://web.archive.org/web/20230521064118/https://docs.google.com/spreadsheets/d/1\\_dIFNK3LC8A40YK-qCEHhxOCFIbny7Jv4qPEoOKBrIA/edit](https://web.archive.org/web/20230521064118/https%3A//docs.google.com/spreadsheets/d/1_dIFNK3LC8A40YK-qCEHhxOCFIbny7Jv4qPEoOKBrIA/edit)\n\n(separate downloads)\n\nThe OG subreddit source was deleted along with the file, and back when it was online, it was probably locked from downloading, so scraping it was difficult.\n\nQ: [~~https://web.archive.org/web/20230105142738/https://telegra.ph/Remixpacks-Collection-Vol-01-04-12-25~~](https://web.archive.org/web/20230105142738/https%3A//telegra.ph/Remixpacks-Collection-Vol-01-04-12-25) contents list? I don’t want to download all of them just to find one thing (genie in a bottle stems)\n\nA:\n\n<https://docs.google.com/spreadsheets/d/1eN2-l0OBD3R8AHRGjKuHpxTHbevYi0kg1O7zJZHvylY/edit?usp=drivesdk>\n\nIf there are no seeds (i.e. the torrent is dead), “a major part of these stems are on the songstems telegram chat, including new stems that aren't in these packs”\n\n[https://t.me/+mrluHEcfixwwNzRk](https://t.me/%2BmrluHEcfixwwNzRk)\n\n“For those that aren't able to d/l the torrents anymore, or just want to d/l some of the remixpacks content,\n\nI uploaded all 26 collections (~3TB) here: <https://remixpacks.multimedia.workers.dev/>\n\nDM me to request username/password.” Bas Curtiz#5667\n\n<https://clubremixer.com/> - outrageously big database, probably reuploads from remixpacks too (but on slow Nitroflare, or simply paid, iirc)\n\n<https://songstems.net/> - lots of remixpacks stuff reuploaded from masterposts of clubremixer.com to Yandex (free Nitroflare is 20KB/s)\n\n~~Mega collection of stems/multitracks (remixpacks - Guitar Hero, Rock Band, 
OG)~~\n\n[~~https://docs.google.com/spreadsheets/d/1\\_dIFNK3LC8A40YK-qCEHhxOCFIbny7Jv4qPEoOKBrIA~~](https://docs.google.com/spreadsheets/d/1_dIFNK3LC8A40YK-qCEHhxOCFIbny7Jv4qPEoOKBrIA)\n\nRock Band 4 stems (free Nitroflare mirror)\n\n<https://clubremixer.com/rb4-stems/>\n\nDifferent mixing of the RB tracks was a factor in models trained by the community. “Also, RB tracks never fade out. They are also never brickwalled.”\n\n“brickwall audio has negative influence on waveform based archs, but on spectrogram based ones like all recent ones, it doesn't seem to have big impact on results quality” - jarredou\n\nGH stems from X360 instead of Wii for better quality: <https://www.fretsonfire.org/forums/viewtopic.php?f=5&t=57010&sid=3917a8e390f65097f07d69595dd5ba55>\n\n(free registration required; basically the content of all the zippyshare links from the PDF below)\n\nPDF with separate RB3-4 stem descriptions and DLs (lots of links are offline as zippyshare is down); page 6 shows a table of contents with evaluation progress.\n\n[toaz.info-stemspdf-pr\\_7a1e446f01c9b1666a9bebe9fd51f419.pdf](https://drive.google.com/file/d/1ku9KB0GQGkxh1a6z7Wm2WteMWeCyTkFV/view?usp=sharing) (reupload)\n\nHuge database (probably contains some of the above)\n\n<https://songstems.net/>\n\n*Others*:\n\nfrp.live instrumentals/acapellas\n\n<https://docs.google.com/spreadsheets/d/1NuQV8cfFPehvIwPBUGOMbiC4FSei2p923qC6af5tCV8/>\n\n22 instrumental albums and some single tracks ([DL](https://disk.yandex.com/d/Y3qPsnphNoKM2Q)) - hard to align for inversion, even the lossless ones (the timing sometimes shifts every verse); artefacts/bleeding left after inversion may need further cleaning with [models](#_tv0x7idkh1ua).\n\n127 hip-hop instrumentals with vocal chops (duplicates from the above), and 80 with scratches or harmonies ([DL](https://buzzheavier.com/bjbhhr92sd8d))\n\n*Giliaan stems* for 4 songs (EDM/Dance/House) and Mainstream Dataset with 20 
songs:\n\n<https://drive.google.com/drive/folders/1JbQRMYH9DT_vUHpf4jHwD80eC6VvpDZX>\n\nMirror (with messages below)\n\n<https://discord.com/channels/708579735583588363/1286052299931652106/1304884347492372580>\n\n50 Produce Like a Pro multitracks\n<https://producelikeapro.com/blog/happy-new-year-2022-3/>\n\n<https://producelikeapro.lpages.co/keep-truckin-multitracks-form/>\n\nPotentially more: <https://www.youtube.com/playlist?list=PLnLOmVwRMCqS1ia3o9Vv0nG5sMgcFR9Tc>\n\n*Instrumentals/vocals/stems*\n\n- Metal genre dataset\n\nContact @33meskvlla33 (iirc 2K unique songs)\n\n- (dead) Here is a smaller version of the metal dataset + the validation dataset (there is also not metal in there, but lots of the data is metal oriented)\n\n302 vocals + 802 instrumentals\n\n<https://drive.google.com/drive/folders/1TlY1FXP54sVA9T0Kfq0oOXJXq03czxrv?usp=drive_link>\n\nIf anybody wants to train an instrumental/vocal model on metal, this can get you started (I'm severely limited by my hardware).\n\nA lot of the instrumentals are official instrumental versions of albums\n\nthe stuff with my username is from my stempacks except for Omega Virus, Behold the Void and Rings of Saturn (IDK why I named these with that xd)\n\n- mesk dataset reworked (dead)\n\n<https://drive.proton.me/urls/D89RY8EEE8#x687Dk7ukX30>\n\n“This is the \"stem part\" of the metal dataset\n\nthis contains 255 instrumentals and 242 vocals spanning over:\n\n- 24 albums + 6 songs released officially as stems [a.k.a. 
the 30 stem packs I legally bought]\n\n- 13 Nail the Mix stem packs (labeled NTM)\n\n- 33 stems from various games, mostly Rockband (labeled RMX)\n\n- 2 tracks from the second part (coming later) I did prematurely, so they’re included [254 and 255, \\*BLOODHOST\\* by DARKO US and The Sea Starts Here by The Dali Thundering Concept]\n\nThere’s one track without numbering in the \"vocals\" folder, Crawl by Bad Omens, because it doesn’t have a corresponding instrumental, it’s only vocals.” - Mesk\n\n- Mesk 60GB dataset\n\n<https://drive.proton.me/urls/ZFFRW5BRSC#7S8qTF07Cm4d>\n\n“This is the new, full version of my dataset. (...) I split it into two 30GB RARs.\n\nThis contains 1357 instrumentals and 1648 vocals, for a total of 3005 files.\n\nVarious things were used, such as:\n\n- official stem packs\n\n- officially released instrumentals\n\n- playthroughs from youtube, to feed compressed files to the model (some of the instrumentals were also mp3 originally but 320 kbps, these being the Deftones, Catsclaw, Crown Magnetar, Loathe and Cabal instrumentals)\n\n- official stem pack vocals\n\n- inverts of said officially released instrumentals, labeled [METAL-INVERT-VOCALS], cleaned up with 07.2025\n\n- 3 BS-Rofo 07.2025 results, these being: DEITY by DIR EN GREY (it's there twice) and Sonne by Rammstein\n\n- some of my own vocals, labeled [MESK-VOCS]: these were recorded from my phone, some are reverb heavy, but most of these are just me whispering stuff\n\n- some of Kanye West’s ‘Graduation’ vocals (I did say this dataset was kinda weird)\n\n- Isling's vocals he graciously provided me, thank you :D\n\n- some random vocals, labeled [VOCALS-VARIOUS-IDK] and a random sample from Vulvodynia’s ‘Psychosadistic Design’: track ‘Triple OG Slamdown’, labeled [RANDOM-ASS-SAMPLE]\n\n- the entirety of the Extreme Metal Vocals Dataset\n\nI spent a month on this, so please tell me if I made a mistake! 
I probably did some at some point haha.\n\nmistakes found (I’ll probably update this message as I go)\n\n1: there's only a partial output of \"Frozen Tomb\" by Shadow of Intent, the one with vocals.\n\n2: vocal track numbered 34 of my stems, A Viscious Reforming of Features, is in mono!\n\nstereo version linked here: <https://drive.proton.me/urls/8FMHAXVJF8#df5xtHLvVP0p>\n\n3: Food for the Maggots has a sample at the beginning; new version without it here:\n\n<https://drive.proton.me/urls/Q7A0W1R2DG#pNGOCRMzFuW2>\n\n\\_\\_\\_\n\n- Index of ~7.5K songs in multitracks in the wild - 13.03.2023 (updated link above)\n\n<https://krakenfiles.com/view/XiDE82aLOR/file.html>\n\nNo download links. Some will probably be available around the net if you search well.\n\n- Here’s a magnet link with some stems:\n\n[https://web.archive.org/web/20200606113408/https://pastebin.com/6bZtpvur](https://web.archive.org/web/20200606113408/https%3A//pastebin.com/6bZtpvur)\n\n- “From the 90s hit maker Moby himself, 500 multitracks (unreleased songs, copyright free):\n\n<https://mobygratis.com/>”\n\n- Official acapellas, instrumentals and stems\n\n<https://infinity101.wolf.usbx.me/filebrowser/share/Q9HHlUB6>\n\n- “beatmania the rhythm game makes charts very interestingly because they are all keysounded, but what's interesting is that someone made a chart-to-REAPER-project converter, and essentially it just gives you stems. 
I think people could probably export a shit ton of electronic stems and improve models because there are a LOT of bms charts” [src](https://discord.com/channels/708579735583588363/1286052299931652106/1368998765507252254)\n\n- [Songstems.net](http://songstems.net) Telegram group where you might find some music stems\n\n[https://t.me/+mrluHEcfixwwNzRk](https://t.me/%2BmrluHEcfixwwNzRk)\n\n- Lots of instrumentals (sometimes with backing vocals) - [click](https://warnermusic.disco.ac/playlist-new/13287836?date=20230412&user_id=871447&signature=WiCwv1XExpGMH98TrQ9f9HKdK7M%3Ab6EnfZ2f)\n\n- The Spheres Dataset\n\n(orchestral)\n\n<https://zenodo.org/records/17347681>\n\n- Metal dataset\n\n<https://zenodo.org/records/8406322>\n\n760 audio excerpts, 1 to 30 seconds long, in mono.\n\n“iirc, the audio samples can be very short, it may need pre-processing (merging multiple samples in 1 file) to be used for training” - jarredou\n\nI think mesk already did this for his dataset (it’s listed further below).\n\nAround a hundred Eminem acapellas leaked:\n\n<https://drive.google.com/drive/folders/141t33Qa2h3rEi2T0lYokvBPiiDBbC6dQ>\n\nOfficial and unofficial Eminem instrumentals (single links):\n\n<https://docs.google.com/spreadsheets/d/1x9tTOOqH5WpKOoptdQzABSN_x8oZbMgzIGlGH9w1IKA/edit?gid=965054462#gid=965054462>\n\nMesk’s metal dataset (old)\n\n“resharing my metal dataset for people to claw their hands on\n\n<https://drive.google.com/drive/folders/1ajlzmyAuX-fsiKiaypN8y2GT8EYBAws5?usp=drive_link>\n\nthis consists of:\n\nofficial instrumentals, straight from my stems, what's not labelled with my name are official as well [remixpacks stuff were curated]\n\nvocal folder has official vocals (from the stems again), remixpacks (curated), inverted vocals and some weird whispery sh!t from yours truly”\n\nMesk’s metal dataset, full:\n\n<https://drive.google.com/drive/folders/1ajlzmyAuX-fsiKiaypN8y2GT8EYBAws5?usp=drive_link>\n\n“this has 1982 instrumentals and 1807 vocals\n\ncredit to 
both me and @ bascurtiz (i didnt want to ping him) if you decide to use it for various projects. most of the dataset contents is from his part.\n\nall things titled with my name were 100% legally bought stems.”\n\n<https://multitracks.pages.dev/> (only a list, no DL links)\n\nEnglish and Spanish multitracks\n\n- Around 30GB of T.Swft stems ([1](https://mega.nz/file/o8wRmZBR#1-XGfJ-051o8JUL10wRth4dskPnvI7jf6qc22zloZ0w) (not necessarily mirror) | [2](https://mega.nz/folder/JPtSFLTB#8IQTBnSgUnnwwWBKvt_VCA))\n\nOrchestra:\n\n<https://www.upf.edu/web/mtg/phenicx-anechoic>\n\n[https://web.archive.org/web/20241209233028/https://www.lam.jussieu.fr/Projets/index.php?page=AVAD-VR](https://web.archive.org/web/20241209233028/https%3A//www.lam.jussieu.fr/Projets/index.php?page=AVAD-VR)\n\n<https://www.openair.hosted.york.ac.uk/?page_id=310>\n\n<https://zenodo.org/records/4955282>\n\n- *Expressive Anechoic Recordings of Speech (**EARS**) dataset*\n\n* **100 h** of speech data from **107 speakers**\n* high-quality recordings at **48 kHz** in an anechoic chamber\n* **high speaker diversity** with speakers from different ethnicities and age range from 18 to 75 years\n* **full dynamic range** of human speech, ranging from whispering to yelling\n* 18 minutes of **freeform monologues** per speaker\n* sentence reading in **7 different reading styles** (regular, loud, whisper, high pitch, low pitch, fast, slow)\n* emotional reading and freeform tasks covering **22 different emotions** for each speaker\n\n<https://github.com/facebookresearch/ears_dataset>\n\nDL:\n\n[1](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p001.zip) | [2](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p002.zip) | [3](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p003.zip) | [4](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p104.zip) | 
[5](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p105.zip) | [6](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p106.zip) | [7](https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p107.zip)\n\n“The dataset is made of 107 zip files that you can download one by one manually”\n\n“What is great with this dataset is that it was recorded in anechoic chamber, so no reverb, no echo, with high-end hardware. You can use it as baseline for reverb removal, speech enhancing, etc...” jarredou\n\nSites:\n\nMultitracks section of rutracker (requires a free account):\n\n<https://rutracker.org/forum/tracker.php?f=2492>\n\nMultitracks/multitrack queries on The Pirate Bay\n\n<https://thepiratebay.org/search.php?q=multitrack&all=on&search=Pirate+Search&page=0&orderby=>\n\n<https://thepiratebay.org/search.php?q=multitracks&all=on&search=Pirate+Search&page=0&orderby=>\n\n“You can just go here <https://rutracker.org/forum/tracker.php?f=1674> (sample libraries category) and type the instrument you want, it will pop all the sample packs.\n\nMaybe add \"loop\" to the search too, will filter out some weird packs”\n\nYou might also find something useful on sharemania.us (160 lossy/261 lossless).\n\nIt seems 'acapella tools' or 'instrumental tools' are good keywords to search for.\n\nSome are covers of original tracks, but that shouldn't matter, since they represent the same thing.\n\nThis is on Deezer, but you might find others on Tidal.\n\nThere’s also some stuff available on Soulseek (P2P service)\n\n<https://promodj.com/tools>\n\nThere is a lot of filtered trash, but you can also find official acapellas.\n\n<https://www.acapellas4u.co.uk/>\n\nCollection of 40K instrumentals and acapellas (lossy; better avoid using such files for training and look for lossless ones if possible)\n\n<https://isolated-tracks.com/>\n\nMultitracks. Looks paid, but it also has a few pages with some free ones (e.g. 
Fleetwood Mac, not sure if free). “They’re 16kHz mp3s re-encoded to 48kHz”; the same goes for:\n\n<https://backtracks4all.com/>\n\n<https://www.multitracks.com/>\n\nThis is also paid, but it has lesser-known music\n\n“those are covers from famous songs, but all in multitracks.\n\nAnd from what I've listened to so far, they are pretty conservative.\n\nThe vocals all seem to be dry and none seem to contain bleed so far.\n\nAlso, the instrument stems are proper / not mixed up with other instrumentals.\n\nThe stems are the exact same duration.\n\nAll in all, a solid dataset right off-the-bat imo.\n\nI should've calculated it beforehand: what the better subscription was, the 10GB or 20GB a day one, vs. price vs. approx. total content.\n\n52mb (wav) \\* 12 (multitracks) = 624mb per song\n\n4,766 songs \\* 624 = 2973984 mb = 2.97tb\n\nmonthly limit = 70gb (per week) \\* 4 (weeks) = 280gb = 280000mb\n\n2973984 / 280000 = 10.6 months in total, so ~11 months x $30 = ~330 bucks”\n\n<https://www.epidemicsound.com/music/search/>\n\nCan be ripped. Some tracks there will have to be ruled out due to bleeding. Plenty of genres; might be good for a diverse dataset.\n\n<https://bleep.com/stream/stems>\n\nLooks like official stems for sale. ~45 songs in total.\n\n**FullSOL** (premium users only; min. €200 per year)\n\n<https://forum.ircam.fr/projects/detail/fullsol/>\n\n19.91 GB of audio samples. 
No percussion.\n\nInstruments: Bass Tuba, Horn, Trombone, Trumpet in C, Accordion, Harp, Guitar, Violin, Viola, Violoncello, Contrabass, Bassoon, Bb Clarinet, Flute, Oboe, Alto Saxophone\n\n(jarredou has it)\n\n***Vocals/speech***\n\n**MedleyVox dataset** (for separating different singers; the authors refrained from releasing their model, and Cyrus eventually made one single-handedly):\n\n<https://github.com/CBeast25/MedleyVox> (13 different singing datasets of 400 hours and 460 hours of LibriSpeech data for training)\n\n<https://zenodo.org/record/7984549>\n\nk\\_multisinger (data folder structure reconstructed into male/female subfolders only; labels not included), for training a RoFormer chorus male/female separator:\n\n<https://drive.google.com/file/d/18evyY82ec4IdNT2z8q76zm30EWhfc-9j/view?usp=sharing>\n\nk\\_multitimbre a.k.a. K\\_multitembre (original folder structure):\n\n<https://drive.google.com/file/d/1Ic4P8gCGwbLshR118N8V3tbAU-D9Us-i/view?usp=sharing>\n\nPotentially more here:\n\n<https://sites.google.com/site/shinnosuketakamichi/home>\n\nBe aware that the only MedleyVox dataset that remains unobtainable to this day is TONAS, but it’s small, esp. compared to the Korean datasets. Besides that one, queer and Cyrus have them all on our Discord, but they’re huge: Ksinger and Ktimbre take ~300GB unzipped combined.\n\nChoralSynth dataset\n\n<https://zenodo.org/records/10137883>\n\n- screaming, cheering, applause, whistling, mumble, etc... 
dataset by jarredou (@rigo2), “collected from all the sources I've found, to help model creation:\n\n5000+ stereo wav files, 44100Hz\n\n~37 hours of audio data”\n\nHit him up on Discord for the link\n\n- “Ultimate **laugh** tracks for sitcoms, game shows, talk shows, and comedy projects” (available on Amazon Music and Apple Music ([ripped](https://mega.nz/file/1Nsh1YZa#laCLTYV1Rx_LpMQX63hT4hk_76ajw0sOLrVfqjLt42Y)); the YT upload has similar-looking spectrograms)\n\n- Laughter-Crowd Dataset #2.zip <https://terabox.com/s/1xLuZWvpGX0LTQypO1p7u_g>\n\n- There are 768.44 GB of K-pop stems somewhere in the wild (maybe ask .mikeyyyyy)\n\n- Gabox karaoke dataset (2GB)\n\n<https://gofile.io/d/TyzaH8> (dead; “I won't upload it again since becruily told me that dataset type 4 (iirc) was the best for karaoke”)\n\n“(may need a check, iirc there were songs without bv, also it doesn't have the vocals part)”\n\nGabox [type 4](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/dataset_types.md#type-4-musdb-aligned) karaoke dataset V2 (21GB)\n\n(thanks dca100fb8)\n\n<https://gofile.io/d/bUkLAE>\n\n“Some vocals have inst bleed; if someone is willing to help, it'd be much appreciated (...)\n\nmy dataset's \"other\" folder vocals are private”\n\n- dca100fb8’s WIP Karaoke dataset (30.2GB, 674 pairs)\n\n<https://gofile.io/d/OgOxfG> (“it might expire soon, so remember to download it if it's useful for you”\n\nReupload: <https://gofile.io/d/pSbWEL>)\n\n“I apologize for the mistakes in this dataset, there ARE errors like duplicate songs, BVinsts without vocals at all, LV stems with noise/bleed, or even low-quality files. If someone is able to clean this dataset, it would be appreciated.
Also, I'm aware it has double leads (double stack: 2 lead vocals sung together in the mono center channel, layered), which prob shouldn't be kept in the data as it could confuse the AI, while other songs from the list clearly separate lead vocals (thanks gilliaan for noticing). Thank you for your understanding”\n\n- RawStems\n\n<https://huggingface.co/datasets/yongyizang/RawStems> (DL)\n\n<https://github.com/yongyizang/music-source-restoration/blob/main/preprint.pdf> (paper)\n\n“A dataset annotation of 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories”.\n\n**DnR** (speech, music, effects)\n\n*Divide and Remaster v3: Multilingual Validation & Test Set*\n\n<https://zenodo.org/records/12658755>\n\n“but there are a bunch of other versions of v3 for specific languages, I'm not sure what the difference is.\n\nThere are separate versions for English, Spanish, French, etc.\n\n<https://zenodo.org/communities/opencass/records?q=&l=list&p=1&s=10&sort=newest=>”\n\nIn fact, you could train on the validation set too, but that's no longer necessary, as the full dataset has since been published:\n\n<https://github.com/kwatcharasupat/divide-and-remaster-v3/wiki/Getting-the-Dataset>\n\n###### **SFX**\n\nDatasets for a potential SFX separation model\n\n- <https://cocktail-fork.github.io/> (speech, music, SFX (3 stems), 174GB)\n\n- <https://www.sounds-resource.com/>\n\n- <https://mixkit.co/free-sound-effects/game/>\n\n- <https://opengameart.org/content/library-of-game-sounds>\n\n- <https://pixabay.com/sound-effects/search/game/>\n\n- <https://www.boomlibrary.com/shop/?swoof=1&pa_producttype=free-sound-effects>\n\n- <https://www.adobe.com/products/audition/offers/adobeauditiondlcsfx.html>\n\n- Spongebob stems
(500MB)\n\n<https://drive.usercontent.google.com/download?id=1P19Diyw7CRteqeLs0beDpCaexFyZJiDs&export=download&authuser=0>\n\n- Nickelodeon leak (2024 Nick Giga Leak7.zip/nick.7z) (10.7GB)\n\n<https://myrient.erista.me/files/Miscellaneous/Nickelodeon%20Leaks/>\n\n(not the full leak, which is 500GB and which only some people have)\n\n- HQ sound effects by Sonniss, 2024 (27.5GB+)\n\n<https://gdc.sonniss.com/>\n\n- “Free sound FX sample packs from Adobe”:\n\n<https://www.adobe.com/products/audition/offers/adobeauditiondlcsfx.html>\n\n- GTA San Andreas lossless SFX\n\n<https://gtaforums.com/topic/957917-sa-uncompressed-sfx-pack/?tab=comments#comment-1071266243>\n\n- Collection of SFX from many games\n\n<https://sounds.spriters-resource.com/>\n\n**Piano**\n\n“**MAESTRO**” is a dataset composed of about 200 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms.\n\n<https://magenta.tensorflow.org/datasets/maestro>\n\n**GiantMIDI-Piano** is a classical piano MIDI dataset that contains 10,855 MIDI files of 2,786 composers. The curated subset, obtained by constraining composer surnames, contains 7,236 MIDI files of 1,787 composers.
The MIDI files in GiantMIDI-Piano are transcribed from live recordings with a high-resolution piano transcription system.\n\n<https://github.com/bytedance/GiantMIDI-Piano>\n\n**Drums**\n\n[StemGMD: A Large-Scale Audio Dataset of Isolated Drum Stems for Deep Drums Demixing](https://zenodo.org/records/7860223)\n\n(although drumsep used a bigger dataset rendered from MIDI to avoid bleeding, using XLN only)\n\n*Virtual drumkits*\n\nThe advantage is “that you can have zero bleed between elements, which is not possible with real live drums.”\n\nYou can create “more than 300 drumkits as virtual instruments (Toontrack, Kontakt, XLN, Slate, BFD; XLN ones are nice too, from their trigger and drums VST) + a Reaper framework to multiply that by 10 (using heavily different mixing processes for each drum element), so potentially 3000 different-sounding drumkits”.\n\n“One could use producer sample packs/kits for more modern samples”; there are tons of packs around the net.\n\njarredou (rigo2):\n\n“For those interested, I'm sharing my drums separation dataset on demand.\n\nIt's not a final version. I realised after generating 130h of audio data that I'd made a mistake in routing, leading to some occasional cowbell in snare stems.
So it's [Kick/Snare-cowbell/Toms/HiHat/Ride/Crash] stems.\n\nI've stopped its rendering and will not do the final \"mastering\" stage that was planned.\n\nI will make a clean no-cowbell version, but as I'm lacking free time, I don't know when, and since this one is already here and sounds great, why not use it in the meantime.\n\nJust don't mind the cowbell!”\n\nThis looks like the one:\n\n[~~http://rigaudio.fr/datasets/DrumsDataset.zip~~](http://rigaudio.fr/datasets/DrumsDataset.zip)\n\nA newer, better-formatted version, with separate train/valid parts, generated mixtures, and a fixed filename that contained an extra space:\n\n<https://rigaudio.fr/datasets/DrumsDatasetv2.zip> (25GB)\n\n(It still has the same issue with occasional cowbell in the snare stems. “There are also a few other percussions here and there on some little parts of some tracks (like tambourine in the ride stem).”)\n\n“I realise now that I totally forgot to lowercase all filenames before reuploading the dataset.\n\nTo avoid issues where some expected filenames are hardcoded in ZFTurbo's script, the best way is to lowercase all filenames in the train/valid parts and convert the valid part to .wav files (no need for the train part, which handles flac correctly).\n\nAnd lowercase the stem names in the training part of the config file accordingly.”\n\n“Can probably be useful to create an electro drums separation dataset, free 50,000 drums MIDI files:” <https://abasynthphony.gumroad.com/l/50000MIDIFilesforDanceMusicDrum?layout=profile>\n\nmesk’s metal drums dataset (drums in one stem):\n\n???
(maybe ask him; it looks like I forgot to paste the link and can’t find it anymore)\n\n*Rhythm and lead guitar*\n\n<https://www.mrtabs.com/>\n\n“He has isolated tracks for the videos he makes, and it's free (or he does not know how to properly Patreon-lock certain content on his website).\n\nYou can navigate to any tab page, look for the header \"Isolated TRACKS (mp3)\" and find the textbox below where it says:\n\n\"Please sign up on Patreon, or if you are already a member, please login.\"\n\nThe last word links to a Patreon signup page, and if you sign in, it does not check whether you are subscribed to his Patreon or not; it will give you access regardless.\n\nBoom! Now you have access to 250+ lead and rhythm guitar pairs. There is a goldmine of metal stuff in there too.\n\nThis is probably the closest we could ever get to having a contemporary rhythm/lead guitar dataset that is relatively large, has rhythm and lead on their own tracks, is diverse, and actually includes songs that we like/listen to.\n\nThe only problem is that all of them are exported as mp3 with a 16kHz cutoff, so it's equivalent to 128 kbps, and the denoising that was done in post is pretty lazy.\n\nHowever, I think it would be great if these parts were upscaled with FlashSR.\n\nOr maybe re-amp a low-pass-filtered version of the stems with the AmpliTube 5 presets that he also attaches to all tabs, and ensemble the remaining frequencies that way.\n\nI personally would not recommend using the drum and bass stems, only the guitar parts, since the drums and bass are both programmed and uniform.\n\nThe tone for every video is unique and tone-matched to the respective albums and tracks.
Even if it's not dead on, it's better than trying to use some yahoo's guitar doodles that use the same amp/cabinet/simulator for every track.” Vinctekan\n\nErhu\n\n[Chinese traditional music instrument dataset]\n\n<https://zenodo.org/records/8012071>\n\nEGFxSet: Electric guitar tones processed through real effects of distortion, modulation, delay and reverb\n\n<https://zenodo.org/records/7044411>\n\n\\_\\_\\_\\_\\_\\_\\_\n\n<https://www.monotostereo.info/>\n\n“Helped me find not only tools but also other resources like research papers, etc. on audio source separation in general. A fantastic resource for anyone into audio source separation”\n\nFor more links, check [#resources](https://discord.com/channels/708579735583588363/773763762887852072), [#datasets](https://discord.com/channels/708579735583588363/1286052299931652106) and [Post dataset](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/40) (you may encounter duplicates)\n\n\\_\\_\\_\\_\n\n## List of cloud services with a lot of space or for temporary storage\n\nUnlimited\n\n<https://filegarden.com/>\n\nNo info on any limits, URL shortener, browser bar player, open [source](https://github.com/filegarden), registration required.\n\nUnlimited\n\n<https://imgur.gg/>\n\n500MB/file and 5GB/file for registered users, no expiration, no registration required.\n\nAudio file previewing.\n\nUnlimited\n<https://krakenfiles.com/>\n\n1GB/file and 5GB/file for premium users, no registration required.\nAudio file previewing.\nWithout an adblocker, you can accidentally download a virus, esp. on iOS, so be careful ([example](https://discord.com/channels/708579735583588363/708579735583588366/1337956279624274022)).\n\nUnlimited\n<https://pillows.su/>\n\n200MB/file and 500MB/file for registered users\n\nPerhaps not the best solution of them all. “It had issues during December and January 2025”; uploading might randomly stop working.
Even in August 2025, someone had issues accessing pillowcase uploads.\n\nAudio files only (also zip for registered users); allows playing audio files and shows spectrograms. I’ve encountered a case where some files uploaded one by one couldn’t be played or downloaded (HTTP 500 error), at least after some time. For the HTTP 503 error, at least, reloading the page the error appeared on was enough to start the download.\n\n(formerly [pillowcase.su](http://pillowcase.su), thx Nick088)\n\n?Unlimited\n\n<https://vocaroo.com/upload>\n\nNot sure about the file size limit or expiration\n\nFor audio files only, no registration option\n\nUnlimited\n\n<https://buzzheavier.com/>\n\n(mirrors: <https://trashbytes.net/> / <https://flashbang.sh/>)\n\nFiles kept “forever”. Almost 600 Mbit/s of upload speed.\nBesides a download link, it can optionally create a torrent file and seed it. Optional expiration (including “never”). “The owner doesn't give a damn about DMCA.” On unstable connections, the download might break in the middle, forcing you to start from scratch. In that case, refrain from using the connection for anything else during the download; using Free Download Manager might also fix the issue. Be aware that without uBlock Origin, the first download click on the site redirects you to an advertisement, which might start downloading malicious files instead. Also, uploading gets stuck at 0% from certain hosts.\nNo audio previewing.\n\nUnlimited\n\n<https://qiwi.gg/>\n\n(at least, no info about limits can be found in the account panel)\nRegistration required\n\nUnlimited\n\n<https://pd.heracle.net/drive>\nNo file size limit (999TB storage). No download speed limit.\n\nSlow upload (around 5 Mbit/s), registration required. Unlimited file size upload and storage space.\n\n*Expiring/temporary or problematic*\n\nUnlimited\n\n<https://catbox.moe/>\n\n200MB max file size,\n\nfiles kept forever, donations.\n\nDon’t use it.
Some providers block the site and its subdomains: “litter.catbox.moe” (the domain for downloading files uploaded with litterbox.catbox.moe) and litterbox.catbox.moe (expiring, with up to 1GB file limit), so e.g. not everyone on our server is able to download files from there. Also, not all VPNs work with it. Possible issues: SSL errors, DNS block (then using 108.181.20.36 might work) or IP block (then use a VPN, but the same issues may occur at times), timeouts. Also be aware that files deleted by users or already offline show a weird old-school “404 Not Found nginx/1.18.0 (Ubuntu)” page, which might misleadingly suggest that something is wrong with the provider, but it just means the file is offline. Countries blocking the site: Australia, UK, Ireland, Afghanistan, Iran. Providers: Verizon, Spectrum, Rogers, Quad9 DNS, Comcast/Comcast Business. [More](https://catbox.moe/faq.php)\n\nUnlimited\n\n<https://transfer.it/>\n\nNo file size limit, up to 90 days expiration\n“New file upload service hosted by MEGA”, no account required, optional MEGA integration\n\n8 minutes/25MB for free, 5 hours/250MB for pro accounts\nFor audio files only, it compresses all non-mp3 files to mp3 320kbps (mp3s remain untouched). Files uploaded without an account expire after 24 hours.\n<https://whyp.it/>\n\n2GB max file size, free, expiring after up to 7 days (or paid), now requires email\n\n<https://wetransfer.com/>\n\n2.5GB max file size, free, no account, expiring\n\n<https://send.vis.ee/>\n\n5GB max file size/storage\n\n<https://www.sendgb.com/>\n\nExpiring after 1-90 days, paid 1TB storage and 500GB max file size\n\n15 GB, expiring\n<https://fileditch.com/>\n\n10 GB, expiring\n\n<https://tmp.ninja/>\n\nUnlimited, but usually 14GB (expires 10 days after the last download):\n\n[*https://gofile.io*](https://gofile.io)\n\n*Don’t use it,*\n\nas certain files, e.g.
with 1GB size (at least in some cases), can only be downloaded with a premium account if the servers are overloaded: you can keep visiting the link for 10 days until it expires, hoping the server load drops, and still not be able to download the file at all. GFY, WS.\n\nUnlimited (until their server space is full, which sadly is often the case; files get deleted after 6 days):\n\n<https://filebin.net/>\n\n2TB (once you've used their mobile app)\n\nBaidu Pan\n\nUsers outside of China are restricted from registering, but there are ways to circumvent the issue with [Baidu Cloud](https://www.infinityfolder.com/how-to-create-a-baidu-account-from-outside-china/) or [duspeaker](https://www.gizdev.com/create-baidu-account-without-china-number-and-vpn/). Guides: [1](https://www.baiduinenglish.com/baiduyun-pan-baidu-cloud-wangpan-disk.html) | [2](https://github.com/junh1024/junh1024-Documents/blob/master/Computers/How%20to%20use%20Baidu%20Pan.md)\n\n“I don't recommend it, unless you pay, you're stuck with 150KB/s downloads maximum”, plus\n\n“a requirement to use their app to actually download (at least if you decide to share your files)”\n\n100GB (files expire after 21 days/50 downloads, max 6GB per file)\n\n<https://filetransfer.io>\n\nIt could mess up filenames when downloading from a direct link, which was also possible [at least in the past] and could be used e.g.
in Google Colab; there was a sneaky method of extracting direct links by clicking the download buttons instead of sharing classic links\n\n50GB without registration (expires after up to 30 days)\n\n<https://www.swisstransfer.com/>\n\n50GB without registration (expires after up to 14 days):\n\n<https://dropmefiles.com/>\n\n250GB (expires 15/45 days after the last download [unregistered/registered], or never with the 1TB plan at $6 per month)\n\n<https://filelu.com/>\n\n1000GB\n\n<https://www.terabox.com/>\n\n(but I have some reports that after uploading a certain amount of data (it depends), they block the account and tell you to pay)\n\n100GB\n\n<https://degoo.com/>\n\n(but Degoo has bots which look for DMCA content, and they close even paid accounts in such cases, or even delete some files without any reason)\n\nUnlimited\n\nDepositfiles (now [dfiles.eu](https://dfiles.eu/))\n\n10GB/file, FTP\n\nThey have existed since forever (2006) and probably haven't collapsed so far due to nightmarishly slow download speeds (20 or even 50KB/s, can't remember) for at least non-Gold accounts. But links go offline there occasionally too (maybe less frequently than on some other services). Registration required.\n\nUnlimited\n\n[Chomikuj.pl](https://chomikuj.pl/)\n\nOnly 50MB of downloading for free (even for your own files)\nBe aware that in the past, over the years, someone with a very big collection of files had some files deleted (I think even encrypted private ones), but usually only DMCAed files are taken down.\nSince around the end of 2023, they started sending PMs warning users that some of the files on their account will be deleted, moving them to special folders with a deletion date. You can reupload your file after the deletion date. The same action needs to be repeated at least once a year.\n\nAlso, the site has existed since “forever” (2006) and hasn't collapsed despite many court cases, probably due to creative changes of owners from specific countries.
It can also be used as a public sharing disk, with transfer points awarded for content downloaded from your disk.\n\nUnlimited\n\n[Pixeldrain.com](https://pixeldrain.com/)\n20GB/file, expiring for free accounts (4 months) or as long as the Pro account is valid (min. 8 months?)\n6 GB per day download cap for free accounts, after which it downloads at 1 MiB/s.\n\n~~Some files uploaded on Pixeldrain are only available for Pro users.\nOn some files it will just tell you that servers are overloaded, and the error will last for days, weeks even, and not let you download. So I'd rather refrain from using it.~~ I think I confused it with an issue with gofile I once had.\n\nHow to download faster from sites like PixelDrain:\n\n<https://pixeldrain-bypass.cybar.xyz/>\n\n50GB\n\nNo registration required, files expire after 7 days for free\n\n<https://fex.net/en/>\n\n32GB\n\n(down) <https://www.transferfile.io/>\n\nDecentralized file hosting. If it's down, perhaps the link can be replaced with ipfs.io\n\nCode/datasets/models repository sites\n\n[GitHub](https://github.com/)\n\n2GB/file\n\nOn the release page of (at least) public repositories, you can upload any files, including bigger ones than directly in the repository (e.g. encrypted, to avoid problems: copyrighted content, even a single music file in the repository, can get taken down after some time).\n\nYou can split your archive into parts if necessary.\n\nIt’s perfect to use in Colab notebooks. Very reliable and fast.\n\n[Huggingface](https://huggingface.co/)\nA lot of big models are stored there nowadays too. There probably weren’t any problems with their hosting in the past. Prob.
50GB file limit.\n\n[Zenodo](https://zenodo.org/)\n\nWe had some issues with slow downloading from there in Colab in the past, but tons of big datasets are stored there.\n\n*Bigger popular cloud services\n(size provided for total storage space per account)*\n\n20 GB\n\n[mega.nz](https://mega.nz/)\n\nNo expiry, usual DMCAs\nOne user from my other community had his whole account deleted years ago. It happened after a few earlier file takedowns on his account. That time, he had publicly uploaded all the music segments on a forum, basically assets extracted from a computer game, and shortly afterwards he was banned. It wasn’t even arranged and ready for a normal release. The publisher had probably been snitching on him for quite some time already.\n\n15GB\nGoogle Drive\nBig files that suddenly become popular can suddenly go offline or show some other error, which disappears later in the day.\nAlso, it happened in the past that very popular, bigger files became visible only to the owner after reaching the quota. You could circumvent it by adding the file to your own GDrive (if you had enough space); not sure if the trick still works.\nGoogle reserves the right to delete the account after 2 years of inactivity. From what I’ve found, they don’t delete accounts with a YouTube account that has at least one uploaded video (not sure if it must be online and public). A few times, my access to some documents on GDocs was revoked; they were changed to private without my knowledge.
I can’t guarantee the same won't happen with GDrive files.\n\n15GB\n\nMediafire\nI’ve seen very old uploads from there still available.\n\n15GB\n\n[4shared.com](https://www.4shared.com/)\n\nMax 3GB of daily traffic, 30GB per month\n\n(Once, over the years, I got my account deleted, maybe due to inactivity; even if I was warned, the messages were going to Gmail spam)\n\n15 GB\n\n[fileditch.com](https://fileditch.com/)\n\nIt allows sharing direct links, like from FTP or from a GitHub release page (where bigger files can be uploaded, vs 50MB in source files). I see 9-month-old links from fileditch still active. However, downloads can be slow (5 Mbit/s) for old files that haven't been accessed in 30 days.\n\n10.2 GB\n\n[safe.fiery.me](https://safe.fiery.me)\n\nI think this has no expiration, but I'm not sure\n\n10GB\n\n<https://box.com>\n\n250MB file limit for free\n\n11GB\n\n[yandex.com/client/disk](https://disk.yandex.com/client/disk)\n\nAt least since the war started, some users have reported very slow downloads.\nDMCAs are rarer than on Mega.\nSince 2022, registration is difficult without a phone, and/or when you use a public SMS-receiving gateway and/or a VPN (e.g. a Russian one): they can block your access to the disk right after registration if they detect something suspicious during the registration process. After the war started, they decreased the max file size limit to 1GB iirc.\nAlso, there are very few means to recover your account in case your secret question miraculously gets changed and you’re asked for it when using a new ISP or after long inactivity (I had such a situation in the past, and IDK how long the files are stored on inactive accounts). Attempts to use the automated account recovery forms are in vain (esp. if you don’t know the precise account creation date, didn’t use their email, etc.).
Even if you’re logged into their Disk app on the phone, you cannot change any account-related information like the secret question or password without being redirected to their page and having to log into the account whose secret question you forgot (or which got suspiciously changed; quite honestly, mine got changed either by them after around 2 years of inactivity, or by some attacker [but that would require the secret question]).\n\n10GB\n\n<https://ufile.io>\n\n5GB\nOneDrive\nMS took back some storage expansions that were once given away for free (some older accounts might still be bigger than that)\n\n5GB\n\nProton Drive\n\nEnd-to-end encryption that also works for sharing (although MEGA probably has the same)\n\n2GB\nDropbox\n\nWith some options to expand it for free, including referrals\n\n20GB/month\n\n[files.fm](https://files.fm/)\n\n5GB per-file upload limit/5GB zip file download limit; no information on expiration provided (iirc it's possible to set it manually, at least); unregistered uploads are kept up to 60 days.\n\n*Temporary file uploads that expire anytime (by HV):*\n\n<https://litterbox.catbox.moe/> (don’t confuse with catbox.moe) (1 GB)\n\n<https://pomf.lain.la> (512 MiB)\n\n<https://cockfile.com/> (128 MiB, IK funny name, but)\n\n<https://uguu.se/> (128 MiB)\n\n<https://sicp.me/> (114 MB)\n\n<https://www.moepantsu.com/> (128 MiB)\n\nHint: in the case of free, not very well-known services, which can even disappear after a longer period of time (do you remember RapidShare, copy.com, hotfile or megaupload, catshare, freakshare, uploaded.to, fileserve, share-online.biz, odsiebie, hostuje.net?),
it’s better to keep your files in more than one service (I recommend 3 copies for important data kept long-term), or stick to popular big tech companies, which are unlikely to disappear soon (if they don’t take your upload down), e.g. if another war breaks out and rising energy costs make smaller services unprofitable, like not long ago.\n\nPaid:\n\n- *You can get 1TB OneDrive with an .edu email*\n\n- If you sign up for the Google Workspace (it was called G Suite until recently) version of Google Drive, you can get 1TB for ~$10 USD a month, but here's the thing... I have been way over 1TB for a couple of years now, and they have never charged me more. I am over 4TB now and have been for ~3 months, and it is still only ~$10. If you do it, just create a 1-user account and keep filling it up until they say you need to add more users or pay more.\n\nWell, it looks like it is $12 now, but it's for 2TB, and maybe that is what they changed my plan to and are charging me now too… I thought there was some kind of surcharge and tax (never really paid attention to the exact amount), but I guess it is just $12 + tax now...\n\n<https://workspace.google.com/pricing.html>\n\nIt looks like they might have gotten rid of it, but it used to be $50/month for unlimited storage. As long as you do what I do, I think it is probably close to unlimited for $12/month\n\nIt's pretending you're at a college, and college drives have infinite storage\n\nI used to have one for 1-2 years, but it suddenly got removed, so it's not safe. All the files are gone too, without notice\n\nBTW, for Workspace you still need to have your own domain (with the possibility of changing DNS entries, so free ones are out).
The yearly cost is negligible, but you have to remember to pay it.\n\n- Also, if you have ProtonVPN on the Proton Unlimited plan, you get 500GB of storage on Proton Drive for free.\n\n- Also, Google Pixel 1 phones used to come with unlimited, or at least bigger, GDrive plans iirc (this was withdrawn for later Pixel phones). Some people bought these phones just for the space.\n\n- You can get a very cheap 2TB plan (around $16) for a year on Google Drive in liras (I think they only changed it in the payment methods, not necessarily the whole region), but some people say it's better to get it in Brazil due to fewer problems.\n\nI've heard it's better not to buy it in liras on your main account, because your apps can get region-locked (e.g. Tidal). Some people even had problems with currency in their other accounts; you can change it only once a year per account, and in case of some emergency, you might be forced to use Revolut cards. There is a lot of misinformation about that promo trick, so verify it all, but there should be a reasonable amount of info scattered around the net already (e.g. hotukdeals, pepper.pl, or the site’s German counterpart).\n\n- 50GB on Dropbox from some Galaxy phones (e.g. S3) can no longer be redeemed (since ~2016 I believe)\n\n<https://www.multcloud.com/>\n\nA service for moving files across various cloud accounts and services for free\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n(outdated)\n\n(for old MDX SDR 9.4 UVR model):\n\n(input audio file on Colab can be max 44kHz and FLAC only).\n\nThe original MDX model B was updated, and to get the best instrumental, you need to download the inverted instrumental from Colab.\n\nModel A is 4-stem, so for the instrumental, mix the stems without the vocals stem, e.g. in Audacity (import all 3 tracks one under another and render).
Isolation might take up to ~1 hour in Colab, but recently it takes under 20 minutes for a 3:00+ track.\n\nIf you want to use it locally (no auto inversion):\n\n<https://discord.com/channels/708579735583588363/887455924845944873/887464098844016650>\n\nB 9.4 model:\n\n<https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/MDX-Net>\n\nOr remotely (by CyberWaifu):\n\n<https://colab.research.google.com/drive/1R32s9M50tn_TRUGIkfnjNPYdbUvQOcfh?usp=sharing>\n\nSite version (currently includes 9.6 SDR model):\n\n<https://mvsep.com/>\n\n(You can choose between MDX A or B, Spleeter 2/4/5 stems, and UnMix 2/4 stems, but the output is mp3 only)\n\nThe new MDX model released by the UVR team is currently also available on mvsep. If you have problems separating in a mobile browser (file type not supported), add an additional extension to the file: trackname.flac.flac.\n\nMDX is really worth checking out, even if you get some bleeding, whereas the UVR model cuts some instruments in the background.\n\nCyberWaifu Colab troubleshooting\n\nIf the result files turn into noise after a few seconds, try using FLAC. After an unsuccessful isolation attempt, you can try restoring the runtime to its default state in the options. The bug suddenly appeared one day, a few days after the Colab was released, and persists to this day (so WAV no longer works). If, after running the first cell to upload and opening the file view, one of the 001-00X wav files is distorted (000 after a few seconds), it means it failed, and you need to start over until all the files play correctly. But longer isolation may make you hit the GPU usage limit, after which you will not be able to connect to a GPU. To fix it, switch to another Google account.
If your stems are too long and mixed with a different song, restore the default runtime settings as well, or delete the files manually\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n(outdated; feature deleted from HV Colab) Be aware that turning on normalization in Colab for instrumentals obtained via inversion may introduce some artifacts during inversion, but the general mix quality and the snare might be louder and sound more proper with normalization on. It's not necessarily a universal solution, though: in some cases, the track might sound a bit off compared to the flatter sound with normalization turned off (at least in some parts of it).\n\n## AI-killing tracks: difficult ones to get instrumentals (or vocals) from, with a lot of vocal (or instrumental) leftovers in current models\n\n\"instrument-wise, the problematic ones I can remember are:\n\nalto sax, soprano sax, any type of flutes/whistles (including synths), trombone slides, duduk, some organ sounds (close to sine wave sound)\" plus harmonica, erhu, theremin, \"anything with violin\".\n\n\"even if some models do a slightly better job than others, these instruments are still problematic because their timbres are close to [human] voice\"\n\nAnd in general: heavily sidechained songs, songs with robotic, heavily processed vocals, sometimes with lots of weird-sounding overdubs where some get missed (e.g. in trap), and also laughs and moans.\n\nAnjok stated that the hardest genres to separate are metal and vocal-centered mixes. If the instrumental has a lot of noise, e.g. distorted guitars, the instrumental will come out muddier.\n\nTracks from the 70s and 80s can separate well. Tracks from the 50s-60s will be harder, e.g. those recorded in mono.
The early stereo era gets a little better.\n\nThere is an open [GSheet](https://docs.google.com/spreadsheets/d/1umHpYbh1NzXIkoLj_7aM2tFwX5SHFMdaxJnhe75j8bA/edit?usp=sharing) with more songs, to which anyone with a Google account can contribute (we tried not to duplicate too many songs between the two places).\n\nInstrumentals\n\n* Childish Gambino - Algorithm (robotic vocal effects, autotune, echoes, specific processing plugins on vocals, constant audible vocal residues for all current models)\n* tatu - Not gonna get us / Nas ne dogonyat (\"This song is impossible to quality separate by any model. Our dataset contains several songs by this artist, but this did not improve the result in any way. Just forget about it for a few years\") - IIRC, the result was improved by slowing the track down, a.k.a. the soprano option on x-minus.\n* Eric Prydz - Call On Me (aggressive sidechain compression: “It's literally ditching the vocal part [and instruments] out to make room for the kick. So yeah, good luck in getting that vocal back.”)\n* Jamiroquai - Virtual Insanity (\"One of the most difficult challenges of all my experience has been that is not very well handled even when maxing out quality in v5.\")\n\nOthers:\n\n* Beyonce - I'm that girl\n* Half Alive - Still feel\n* Queen - White Queen\n* Queen - Bohemian Rhapsody (very complicated song; mix of various vocals and guitars)\n* Queen - These Are The Days Of Our Lives (useful for evaluating BVE models and how they react to harmonies; if a model works on this track, it will probably work on all the others)\n* J Dilla - Don't Cry (lots of so-called lo-fi “cuts” or chops of vocals from old vinyl records, characteristic of hip-hop productions, which are sometimes harder to separate)\n* Lots of Juice WRLD songs (his tracks have leftovers here and there, e.g. 
in \"Off the rip (Gamble)\")\n* Eminem - No regrets (constant low-volume vocal leftovers)\n* Louis The Child - Better not (problems with vocals using the currently best MDX23 MVSEP beta model, as well as Demucs ft and the Kim model)\n* A$AP Rocky - Fashion Killa (same as for “Night Lovell - Dark Light” - \"almost every AI can't separate the main vocals from the melody, the melody has a part that sounds like vocals, so just about every AI picks some of it up in the vocals section instead of the instrumental section\")\n* Porcupine Tree - Don't Hate Me (\"Quiet bits, loud bits, flutes and strings, things I can't even name plus all the usual suspects, drums etc, and Steven Wilson has a crisp clean voice a lot of the time\")\n* Thomas Anders - You Will Be Mine (vocal residues in instrumentals using all current models as of April 2023)\n* Modern Talking - One in a million (also minor vocal residues)\n* Modern Talking - Mrs. Robota (too many synthesizer effects bleed into the vocals of the MVSEP MDX23 10.04.23 ensemble [at that time consisting only of Kim vocal 2, Kim inst and Demucs 4])\n* Crush 40 - Live & Learn\n* JPEGMAFIA - HAZARD DUTY PAY! 
(hard to get vocals from the rap section; Kim vocal 1)\n* Bjork - All Is Full Of Love (at 2:06, 2:12, 2:50 and throughout from that point, the vocals still partially bleed).\n  The song was tested on MDX Inst 3, Inst HQ\\_1 and 3, Inst Main, Kim Inst, HTDemucs, HTDemucs\\_FT and 6S, and ensembles including (Kim vocal 2, Kim Inst, Inst Main, 406, 427, HTDemucs FT) and (Kim Inst, Voc FT, Inst HQ 3)\n* Stray Kids - GO生 (GO LIVE)\n* Royce da 5'9\" - I'm The King ([vinyl rip](https://www.youtube.com/watch?v=2fKW6q8tcVs))\n* Mike Mareen - Love Spy\n* Bomfunk MC's - Freestyler\n* Lasgo - Something\n* South Park - Chocolate Salty Balls (bad results with most models)\n* Tally Hall - Never meant to know (the almost impossible goal for now is to remove \"with\" at 2:39).\n* WWE - Demon in Your Dreams (a track that sounds bad: the parts where the vocals usually are sound muffled and dull, and the guitars are barely audible; HQ\\_3 and Demucs 6s tested)\n* Taylor Swift - Better Than Revenge (Taylor's Version) (background vocals remain with all models including HQ3 and voc\\_ft; using the Dolby Atmos version and (I think just) muting the (vocal?) channel(s) helped)\n* Bon Jovi - I believe\n* Twenty One Pilots - The Hype (“voc\\_ft leaves too many perc/drums/synths [in vocals] that sounds like t's and s' or just sound like vocals, and it's really annoying, also because of this nearly no other model can separate it either because they think it's part of the vocals, but it's mostly just synths, Ripple put a lot of echo into the other stem”)\n* The Weeknd - Until I Bleed Out (the vocal stem includes a lot of drum and synth bleed. 
Tested on htdemucs\\_ft, VocFT, Inst HQ 3, InstVoc HQ 2, Kim 1, Kim 2, Kim inst, and ensembles (htdemucs\\_ft, VocFT, Inst HQ 3, InstVoc HQ 2), (Kim 2, Kim inst, Inst Main, 406, 427, htdemucs\\_ft) and (Kim inst, VocFT, Inst HQ 3))\n* Travis Scott - Nightcrawler (vocal residues at 1:27, 2:24, 3:56, and 4:50 using BS-Roformer 1296 in UVR beta with overlap 2/8 - fewer than with other models, though the result is muddier; the 04.24 model on MVSEP leaves fewer, [discussion](https://discord.com/channels/708579735583588363/773763762887852072/1229378741226963005))\n* Shaft - Mambo Italiano\n* Yello - Oh Yeah (both suggested by Aufr33)\n* song\\_45\\_vocals in the multisong dataset has some weird effects which some models struggle with\n* Black Gryph0n - Jester (Pomni's Song) (feat. Lizzie Freeman) “the vocoded vocals in the drop of this song [are] impossible to isolate with current models either”\n* Isabela Souza - Tu color para pintar (violin ends up in the vocal stem with a lot of models, e.g. Bas Curtiz FT 25.10.24)\n* George Michael - Amazing (vocal bleed starting at 3:52 with ZFTurbo's Mel-Roformer 2024.10; residues with Dango too)\n* Eminem - The Warning (quiet laugh residue at 0:04; instr unwa v1)\n* Gregory Brothers - Dudes a Beast (trumpets in vocal stem at 0:51; unwa’s beta4 and inst v1e, fixed in Gabox voc\\_fv5)\n* DJ Krust / Saul Williams - Coded Language (“I can hear a light bleed of the lowcutted bassline at times” becruily Mel)\n* Dayseeker - Sleeptalk (“serious volume clipping issues throughout when removing vocals (...) 
I've tried every model and several ensembles, edited UVR settings, and also tried Dango.” - comfortable couch)\n* Darko - Starfire (almost no model can remove that particular scream; the shift conversion pitch method kind of did - mesk)\n* Meshuggah - Ayahuasca Experience\n* Fredrik Thordendal's Special Defects - Vitamin K Experience (A Homage to The Scientist / John Lilly)\n* Thievery Corporation - Culture of Fear (“only supported by MDX Kar v2 to have an instrumental with backing vocals, all the other karaoke models/bve fail” dca)\n* Inside Out - Bad Suit (“all UVR models I've tried (all that jarredou had on his Colab) struggle with the slap bass. Slap sounds (which contains mid to high frequencies) goes into guitar stem, so that's still hard” - 03.2025)\n* “Songs that were produced at Cheiron Studios from the 90s/00s still don't isolate well; the mixes are so d\\*\\*n complex” - JadDeluxe\n* Yello - Oh Yeah\n* Duck Sauce - Barbra Streisand (Radio Edit) (sidechain is probably the problem)\n* Al Bano & Romina Power - Sharazan\n* Billie Eilish - L’AMOUR DE MA VIE [OVER NOW EXTENDED EDIT] (“it seems like none of the existing models can pick up those high pitched vocals” - black\\_as\\_night)\n* AJR - 100 Bad Days (~“always my go to with models to see which includes a little \"aaaahhhhh\" vocal” - mrmason347)\n\n*Complex vocals and vocoder - severe bleeding on every model tested as of 06.12.24, including Dango (dca100fb8 contributions)*\n\n* Tame Impala - One More Year\n* Daft Punk - Around the World\n* Daft Punk - Television Rules the Nation\n* Daft Punk - Doin' it Right\n* Daft Punk - Human After All\n* Daft Punk - Get Lucky\n* Daft Punk - Lose Yourself to Dance\n* Daft Punk - Robot Rock\n* Deep Forest - Sweet Lullaby (Version 1992) (yodelling difficult to remove; BV bleed starting at ~1:21)\n* Radiohead - The National Anthem (vocal effects difficult to separate; LV effects bleed starting at ~1:36)\n* Moby - One Last Time (vocal bleed in instrumental; LV effects 
bleed starting at 1:39)\n* Air - Run\n* Kodex - Do jutra (especially at 1:30, vocals bleed with a lot of models that have a low bleedless metric)\n\n*Bleeding on every model tested as of 06.12.24, incl. Dango - not so severe, but containing vocal pop-ins (dca100fb8):*\n\n* \"Supersonic\" by Jamiroquai (BV bleed starting @~0:07)\n* \"Amazing\" by George Michael (BV bleed starting @~3:45)\n* \"El lilady\" by Samo Zaen (BV bleed starting @~3:30)\n* \"Here Comes the Rain Again\" by Eurythmics (BV bleed starting @~1:17)\n* \"Sun is Shining\" by Bob Marley & The Wailers (BV bleed starting @~1:52)\n* \"Sun is Shining (Kaya 40 Mix)\" by Bob Marley & The Wailers (BV bleed starting @1:52)\n* \"Road to Zion\" by Damian Marley (BV bleed starting @~0:00)\n* \"Les Rubans\" by Daniel Masson (LV bleed starting @~2:13)\n* \"Remember\" by Air (BV bleed starting @~0:46 + LV bleed starting @~0:58)\n* \"Strangers\" by Portishead (LV bleed starting @~0:30)\n* \"Samsam (Chanson du générique)\" (BV bleed starting @~0:00)\n* \"Aicha\" by Khaled (BV bleed starting @~3:45)\n* \"Forest Hymn\" by Deep Forest (BV bleed starting @~0:33)\n* \"33 Degree\" by Thievery Corporation (LV effects bleed starting @~1:44)\n* \"Run\" by Air (BV bleed starting @~1:08)\n* \"An Indian Summer\" by Al-Pha-X (BV effects bleed starting @~3:13)\n* \"Forest Hymn\" by Deep Forest (BV effects bleed starting @~0:33)\n* \"In The Air Tonight\" by Phil Collins (LV effects bleed starting @~3:03)\n* Goldfrapp - Utopia (BV bleed starting @~0:00)\n* Depeche Mode - Sacred (BV bleed starting @~0:00)\n* Seal - Love's Divine (BV bleed starting @~3:26)\n* Da Lata - Alice (No Pais Da Malandragem) (LV or BV bleeding at a different time with each model)\n* ABBA - Gimme! Gimme! Gimme! 
(A Man After Midnight) (BV bleed starting @1:22)\n* 311 - Beyond the Gray Sky (LV effects bleed @~1:24)\n\nVocal bleed in instrumentals using the MVSEP Mel-Roformer 2024.10 model, which Dango fixes (dca100fb8)\n\n- \"For You\" by Coldplay (BV bleed starting @1:32 --> v1e, Kim Mel and even Dango don't have this issue)\n\n- \"L'été indien\" by Joe Dassin (BV bleed starting @0:24 --> v1e or Dango fixes the problem)\n\n- \"Porcelain\" by Moby (LV bleed starting @2:10 --> Dango and SCNet XL high fullness models on MVSEP fix the issue)\n\n- \"Night Bird\" (from \"Essence of the Forest\" album) by Deep Forest (BV bleed starting @0:28 --> Dango fixes the issue)\n\n- \"Desert Walk\" (from \"Essence of the Forest\" album) by Deep Forest (BV bleed starting @0:03 --> Dango fixes the issue)\n\n- \"Desert Walk (Version 1992)\" by Deep Forest (BV bleed starting @0:07 --> Dango fixes the issue)\n\n- \"Love Is Gone (Fred Riester & Joachim Garraud Radio Edit Remix)\" by David Guetta (BV bleed starting @0:49 --> Dango fixes the issue)\n\n- \"Attention Mesdames et Messieurs\" by Michel Fugain & Le Big Bazar (BV bleed starting @0:50 --> Dango fixes the issue)\n\n- \"Everything In Its Right Place\" by Radiohead (BV bleed starting @0:52 --> Dango fixes the issue)\n\n- \"Sweet Dreams (Are Made Of This)\" by Eurythmics (BV bleed starting @0:48 --> Dango fixes the issue)\n\n- \"Lebanese Blonde\" by Thievery Corporation (BV bleed starting @0:52 --> Dango fixes the issue)\n\n- \"Lift Me Up\" by Moby (LV chops starting @2:45 --> Dango fixes the problem)\n\n- U.S.A. 
for Africa - We Are The World (BV bleed starting @5:34, fixed using v1e)\n\n- Led Zeppelin - Kashmir (LV bleed starting @2:16, fixed using v1e)\n\n- Jamiroquai - White Knuckle Ride (LV bleed starting @0:15, fixed using v1e)\n\nDuplicates from the [GSheet](https://docs.google.com/spreadsheets/d/1umHpYbh1NzXIkoLj_7aM2tFwX5SHFMdaxJnhe75j8bA/edit?usp=sharing)\n\n* Moby - Porcelain (in the GSheet; in the instrumental, vocal reverb bleeds at 1:00, with more bleed at 2:10/2:20; all good MDX models, GSEP and MDX23 by ZFTurbo were tested with more or less the same results; Dango and SCNet XL fix the issue)\n* Queen - March Of The Black Queen (always causes issues; the best result is with Full Band 8K FFT as of 06.08.23, but a lot of BV is still missed)\n* Night Lovell - Dark Light (\"almost every AI can't separate the main vocals from the melody, the melody has a part that sounds like vocals, so just about every AI picks some of it up in the vocals section instead of the instrumental section\")\n* Bob Marley - Sun is Shining (all current models bleed at the same timestamps: 1:02, 1:42, 1:54, 1:57, 2:50)\n* Daft Punk - Give Life Back to Music (the vocoder in the vocals leads to bad instrumental results)\n* Daft Punk - Within (robotic voices)\n* Frank Ocean - White Ferrari\n\nList of songs where all current Mel-Roformer instrumental models fail to recognize some instruments correctly in the instrumental track (they mainly struggle with sax and harmonica), whereas SCNet XL doesn't, except for talkbox, theremin and erhu (where SCNet still fails), by dca100fb8\n\n* Cowboy Junkies - I Don't Get It (harmonica picked up in vocal, starting @00:12)\n* Pink Floyd - Shine On You Crazy Diamond (Parts I-V) (saxophone picked up in vocal, starting @11:09)\n* Travis - Sing (FX picked up in vocal, starting @00:00)\n* Thievery Corporation - Revolution Solution (FX picked up in vocal, starting @00:00)\n* Thievery Corporation - Safar (The Journey) (instrument picked up in 
vocal, starting @00:02)\n* Samsam Song (International Version) (flute picked up in vocal, starting @00:00)\n* Portishead - Humming (FX and theremin picked up in vocal, starting @00:00)\n* Asian Dub Foundation - Tu Meri (instrument picked up in vocal, starting @02:33)\n* Goldfrapp - Horse Tears (organ picked up in vocal, starting @00:00 + elec guitar @01:28 + talkbox @01:22)\n* Goldfrapp - Lovely Head (talkbox picked up in vocal, starting @01:15)\n* Goldfrapp - Pilots (talkbox picked up in vocal, starting @00:00)\n* Moby - Lift Me Up (synth picked up in vocal, starting @00:00)\n* Phil Collins - In The Air Tonight (elec guitar picked up in vocal, starting @02:10)\n* Pink Floyd - Money (saxophone picked up in vocal, starting @02:00)\n* Thievery Corporation - Radio Retaliation (FX picked up in vocal, starting @00:00)\n* Thierry David - Huong Vietnam (erhu picked up in vocal, starting @00:35)\n* Supertramp - The Logical Song (saxophone picked up in vocal, starting @01:52)\n* Supertramp - School (harmonica picked up in vocal, starting @00:00)\n* Archive - Again (harmonica picked up in vocal, starting @00:00)\n* Deep Forest - Desert Walk (Version 1992) (flute picked up in vocal, starting @00:00)\n* Deep Forest - Dignity (elec guitar picked up in vocal, starting @00:36)\n\nVocal bleeding absent with MVSEP’s SCNet XL high fullness model, unlike with Roformers\n\n* Tame Impala - On Track\n\n*You can visit our* [*#request-separation*](https://discord.com/channels/708579735583588363/1020514529601409084) *channel to find interesting cases of people seeking help with specific songs they struggle with, and the new* [*#your-bad-results*](https://discord.com/channels/708579735583588363/1325284456382075041) *channel.*\n\nSongs for comparing weaker vs. more effective models for instrumentals (e.g. inst 464/Kim inst or HQ\\_2/3 or 4 vs. all others)\n\n* O.S.T.R. 
- Incognito (non-Snap Jazz version) (lo-fi Polish hip-hop with constant vocal leftovers in all models and AIs except MDX-UVR inst 1-3 and inst main, where inst 3/464 performs the best; it's also good for testing the influence of various chunk settings at 1:53. Songs publicly available for datasets usually don't include hip-hop at all, especially not in low, weird-sounding languages with loud, bassy, over-processed voices. In the Snap Jazz version, 464 also leaves fewer vocal residues than GSEP, though they are still slightly audible).\n* Kaz Bałagane - Stara Bida (constant vocal leftovers in all models and AIs except MDX-UVR inst 1-3 and inst main, where inst 3/464 performs the best [good for testing weaker models or specific epochs]; the flute from 1:11 gets deleted by MDX-UVR HQ models).\n* The Weeknd - Hardest To Love (htdemucs\\_ft did well here).\n* NNFOF - Jeśli masz nierówno pod sufitem (all MDX-UVR instrumental models inconsistently filter the flute out of the track, while GSEP handles the song well; these models do this with all kinds of songs containing flute and oriental instruments)\n* Ace of Base songs (“any of them have those flute-ish synthetic instruments which have always been a nightmare in terms of getting a flawless a cappella”).\n* Różewicz Interpretacje (Sokół) - Wicher (very deep and low rap voices cause problems with weaker models, e.g. the original MDX23 on mvsep1.ru (now MVSEP.com)/ZFTurbo MDX23 Colab; you can also try the Sokół - Nic and Sokół - Wojtek Sokół albums)\n* Chaos (O.S.T.R., Hades) - Powstrzymać Cię (lots of bleeding at 2:00 with e.g. the MDX23 model on MVSEP; not that much with Kim inst)\n* DJ Skee & The Game (from 2012 mixtape) or Tyler the Creator (album version from 2011) - Yonkers (same beat prod. by Tyler the Creator)\n\n(the first is from a mixtape with more cuts/vocal chops that are difficult to get rid of. 
HQ models usually confuse vocal chops with vocals, but here it might be useful)\n* Avantasia - The Scarecrow (HQ3 generally has problems with (here bowed) strings. mdx\\_extra from Demucs 3 had a better result; sometimes the 6s model can compensate well for these lost instruments)\n* Static Major - CEO (if someone wants to test out isolation of many vocal layers using e.g. Melodyne)\n* oikakeru yume no saki de - sumikeke (here both Karaoke models and Melodyne fail at extracting the vocal layers)\n* Dizzy Wright - No Writer Block (a hard track for keeping hi-hats, and even some snares, consistent throughout the whole output, as they can easily get washed out; also more vocal leftovers in the MDX23C ensemble on MVSEP1.ru vs. the MDX23 2.1 Colab [despite better SDR]; GSEP's result isn't bad, but it makes the hi-hats sound a bit out of rhythm, probably due to some built-in processing in GSEP)\n* Dariacore - will work for food (generally that whole Dariacore album can be demanding due to its loudness and “craziness”)\n* Centrala Katowice - Reprezentowice (the first version of GSEP, in 192 kbps, consistently failed to pick up vocals and also left strong vocal residues)\n* Cher - The Music's No Good Without You (“there is progress on removing Cher's vocals. Previously, this song was an AI killer, but the mel-roformer model removes the vocals almost completely (...) 
Removing vocals from \"Believe\" is no problem either.” Aufr33)\n\nA list of songs which have vocal bleed in the instrumental using unwa's v1e model, by dca100fb8\n“These songs also present issues after using Mel 2024.10 and BS 2024.08 from MVSEP, but the timestamps where the bleed occurs might be different”, though the Mel 2024.10 model might have fewer of these residues.\n\n- \"Supersonic\" by Jamiroquai (BV bleed starting @0:07)\n\n- \"Amazing\" by George Michael (BV bleed starting @3:45)\n\n- \"Here Comes the Rain Again\" by Eurythmics (BV bleed starting @1:17)\n\n- \"Porcelain\" by Moby (LV bleed starting @1:00 --> Dango and SCNet XL fix the issue)\n\n- \"Sun is Shining\" by Bob Marley & The Wailers (BV bleed starting @1:52)\n\n- \"Sun is Shining (Kaya 40 Mix)\" by Bob Marley & The Wailers (BV bleed starting @1:52)\n\n- \"Give Life Back to Music\" by Daft Punk (LV bleed starting @0:49 --> Dango fixes the issue)\n\n- \"Road to Zion\" by Damian Marley (BV bleed starting @0:00)\n\n- \"El lilady\" by Samo Zaen (BV bleed starting @3:30)\n\n- \"Night Bird\" (from \"Essence of the Forest\" album) by Deep Forest (BV bleed starting @1:12 --> Dango fixes the issue)\n\n- \"Desert Walk\" (from \"Essence of the Forest\" album) by Deep Forest (BV bleed starting @0:44 --> Dango fixes the issue)\n\n- \"Desert Walk (Version 1992)\" by Deep Forest (BV bleed starting @0:18 --> Dango fixes the issue)\n\n- \"Love Is Gone (Fred Riester & Joachim Garraud Radio Edit Remix)\" by David Guetta (BV bleed starting @1:15 --> Dango fixes the issue)\n\n- \"Attention Mesdames et Messieurs\" by Michel Fugain & Le Big Bazar (BV bleed starting @0:50 --> Dango fixes the issue)\n\n- \"Strangers\" by Portishead (LV bleed starting @0:30)\n\n- \"Everything In Its Right Place\" by Radiohead (BV bleed starting @3:28 --> Dango fixes the issue)\n\n- \"The National Anthem\" by Radiohead (LV effects bleed starting @1:36)\n\n- \"Samsam (Chanson du générique)\" (BV bleed starting @0:00)\n\n- \"33 Degree\" by Thievery 
Corporation (LV effects bleed starting @1:42)\n\n- \"Sweet Dreams (Are Made Of This)\" by Eurythmics (BV bleed starting @0:48 --> Dango fixes the issue)\n\n- \"Sweet Lullaby (Version 1992)\" by Deep Forest (BV bleed starting @1:21)\n\n- \"Lebanese Blonde\" by Thievery Corporation (BV bleed starting @0:52 --> Dango fixes the issue)\n\n- \"Forest Hymn\" by Deep Forest (BV bleed starting @0:33)\n\n- \"Aicha\" by Khaled (BV bleed starting @3:45)\n\n- \"Run\" by Air (BV bleed starting @1:08)\n\n- \"Remember\" by Air (LV bleed starting @0:31)\n\n- \"Doin' it Right\" by Daft Punk (LV bleed starting @1:21)\n\n- \"Human After All\" by Daft Punk (LV bleed starting @0:49)\n\n- \"Get Lucky\" by Daft Punk (BV bleed starting @4:06)\n\n- \"Lose Yourself to Dance\" by Daft Punk (BV bleed starting @1:55)\n\n- \"Robot Rock\" by Daft Punk (LV bleed starting @1:02)\n\n- \"J'ai demandé à la lune\" by Indochine (BV bleed starting @1:45)\n\n- \"Hey Jude\" by The Beatles (LV bleed starting @0:00)\n\n- \"Within\" by Daft Punk (LV bleed starting @1:42)\n\n- \"Lift Me Up\" by Moby (LV chops starting @2:45)\n\n- \"Nothing Else\" by Archive (LV effect bleed starting @1:13)\n\n- \"One More Year\" by Tame Impala (BV bleed starting @1:03)\n\n- \"Around the World\" by Daft Punk (LV bleed starting @3:57)\n\n- \"Television Rules the Nation\" by Daft Punk (LV bleed starting @1:49)\n\nList of songs with significant crossbleeding of vocals into the instrumental using the \"basic\" SCNet XL model from MVSEP (not to be confused with the undertrained SCNet XL on ZFTurbo's GitHub), by dca100fb8\n\n* Thievery Corporation - Where It All Starts (crossbleeding starting @0:15)\n* Thievery Corporation - Le Monde (crossbleeding starting @0:02)\n* Thievery Corporation - Lebanese Blonde (crossbleeding starting @0:52)\n* Jamiroquai - Canned Heat (crossbleeding starting @0:56)\n* Jamiroquai - Black Crow (crossbleeding starting @0:55)\n* Jamiroquai - Supersonic (crossbleeding starting @0:07)\n* Andru Donalds - Mishale (crossbleeding 
starting @0:50)\n* Moby - Porcelain (crossbleeding starting @2:10)\n* George Michael - Amazing (crossbleeding starting @0:28)\n* Samo Zaen - El lilady (crossbleeding starting @3:30)\n* Zero 7 - In The Waiting Line (crossbleeding starting @0:48)\n* Coldplay - For You (crossbleeding starting @1:33)\n* Kool & The Gang - Fresh (Single Version) (crossbleeding starting @0:51)\n* Kool & The Gang - Too Hot (Single Version) (crossbleeding starting @1:07)\n* Khaled - Aicha (crossbleeding starting @3:22)\n* Keane - Put The Radio On (crossbleeding starting @1:30)\n* Nelly Furtado - Say It Right (crossbleeding starting @0:23)\n* Black Sabbath - Planet Caravan (crossbleeding starting @0:10)\n* Groundation - Smile (crossbleeding starting @0:10)\n* Eurythmics - Here Comes the Rain Again (crossbleeding starting @1:23)\n* Damian Marley - Road to Zion (crossbleeding starting @0:10)\n* David Guetta - Love Is Gone (Fred Riester & Joachim Garraud Radio Edit Remix) (crossbleeding starting @1:15)\n* Indochine - J'ai demandé à la lune (crossbleeding starting @1:47)\n* Bob Marley & The Wailers - Sun is Shining (crossbleeding starting @1:53)\n* Morcheeba - Blindfold (crossbleeding starting @0:53)\n* Daniel Masson - Les Rubans (crossbleeding starting @2:13)\n* Depeche Mode - Sacred (crossbleeding starting @0:00)\n* Goldfrapp - Utopia (crossbleeding starting @0:11)\n* Da Lata - Alice (No Pais Da Malandragem) (crossbleeding starting @0:38)\n* Joe Dassin - L'été indien (crossbleeding starting @0:24)\n* Led Zeppelin - Kashmir (crossbleeding starting @2:16)\n\nList of songs which have bleed in the vocal track using the new BS-Roformer Revive v1 experimental vocal model by unwa (dca’s contribution as well):\n\n* Travis - Sing (FX picked up in vocal, starting @00:09)\n* Thievery Corporation - Safar (The Journey) (instrument picked up in vocal, starting @00:03)\n* Asian Dub Foundation - Tu Meri (instrument picked up in vocal, starting @02:33)\n* Thievery Corporation - Radio Retaliation (FX picked up 
in vocal, starting @00:00)\n* Thierry David - Huong Vietnam (erhu picked up in vocal, starting @00:35)\n* Deep Forest - Dignity (elec guitar picked up in vocal, starting @00:36)\n* Zaz - Je veux (whole kazoo solo picked up in vocal, starting @02:08)\n* Archive - Fool (harmonica picked up in vocal, starting @00:00)\n* Portishead - Humming (FX and theremin picked up in vocal, starting @00:00)\n* Moby - Lift Me Up (synth picked up in vocal, starting @00:00)\n* The Cardigans - My Favourite Game (guitar picked up in vocal, starting @00:10)\n* Talk Talk - It's My Life (FX picked up in vocal, starting @00:05)\n* Radiohead - The National Anthem (various FX and instruments picked up in vocal throughout the song)\n* Zero 7 - Distractions (synth/FX bleed @00:07/@04:29)\n* Porcupine Tree - What Happens Now? (synth/FX/guitar bleed @~07:31)\n* Porcupine Tree - Start of Something Beautiful (synth bleed @04:55)\n* Porcupine Tree - The Start of Something Beautiful (Live) (synth bleed @04:38)\n* Porcupine Tree - Don't Hate Me (guitar bleed @00:29/@03:54)\n* Porcupine Tree - Dark Matter (synth bleed @03:18/@05:42)\n* Porcupine Tree - Arriving Somewhere but Not Here (synth bleed @04:12, elec guitar bleed @04:46)\n* Porcupine Tree - Way out of Here (Live) (synth bleed @00:13, @06:36)\n* Radiohead - Pyramid Song (FX bleed @00:05/@04:08)\n* Talk Talk - Happiness is Easy (wind instrument bleed @04:20)\n* The Cinematic Orchestra - Evolution (scratching bleed @04:38)\n* Pink Floyd - Dogs (elec guitar/fx/synth bleed at times)\n* Archive - Again (harmonica/fx/synth bleed at times)\n* Archive - Lights (fx/synth bleed at times)\n\nList of songs where Roformer SW and BS 2025.06 solve the problem of vocal/BV crossbleeding, by dca100fb8\n\n* Jamiroquai - Canned Heat (no crossbleeding)\n* George Michael - Amazing (no crossbleeding)\n* Samo Zaen - El lilady (no crossbleeding)\n* Coldplay - For You (no crossbleeding)\n* Kool & The Gang - Fresh (Single Version) (no crossbleeding)\n* Kool 
& The Gang - Too Hot (Single Version) (no crossbleeding)\n* Khaled - Aicha (no crossbleeding)\n* Nelly Furtado - Say It Right (no crossbleeding)\n* Eurythmics - Here Comes the Rain Again (no crossbleeding)\n* Eurythmics - Sweet Dreams (Are Made of This) (no crossbleeding)\n* David Guetta - Love Is Gone (Fred Riester & Joachim Garraud Radio Edit Remix) (no crossbleeding)\n* David Guetta - Love Don't Let Me Go (no crossbleeding)\n* Bob Marley & The Wailers - Sun is Shining (no crossbleeding)\n* Bob Marley & The Wailers - Running Away (no crossbleeding)\n* Daniel Masson - Les Rubans (no crossbleeding)\n* Michael Jackson - Workin' Day And Night (no crossbleeding)\n* Radiohead - Morning Bell (from Kid A) (no crossbleeding)\n* Radiohead - Scatterbrain (no crossbleeding)\n* Portishead - Strangers (no crossbleeding)\n* Moby - Lift Me Up (no crossbleeding)\n* Seal - Love's Divine (no crossbleeding)\n\nUnwa’s BS Roformer Resurrection Inst model fixes crossbleeding of vocals in the instrumental for the first time. 
So it fixes the crossbleeding problems like BS Roformer SW/2025.06/07 did on these songs, for some reason (dca):\n\n* Bob Marley - Running Away (Kaya 40 Mix)\n* Samo Zaen - Tonight\n* Eurythmics - Here Comes The Rain Again\n* Eurythmics - Sweet Dreams (Are Made of This)\n* George Michael - Amazing\n* Kool & The Gang - Fresh (Single Version)\n* Kool & The Gang - Too Hot (Single Version)\n\nSongs which can't be separated into an instrumental with BVs (generally because lead vocals can't be differentiated from backing vocals), from before the becruily karaoke model release, by dca100fb8\n\n* Nelly Furtado - Maneater (@0:16, LV are counted as BV because of the panning, but using stereo 50% with uvr bve v2 doesn't solve the issue)\n* Lenny Kravitz - Low (@1:03, BV are counted as LV by BVEs or Kar models, including Dango and LALAL.AI)\n* Timbaland - The Way I Are (@0:25, LV are counted as BV by BVEs or Kar models, including Dango and LALAL.AI, uvr bve v2 50% stereo LV panning failed)\n* Timbaland - Give It To Me (@0:25, LV are counted as BV by BVEs or Kar models, including Dango and LALAL.AI, uvr bve v2 50% stereo LV panning failed)\n* Timbaland - Morning After Dark (@2:02, BVEs and Kar models couldn't differentiate LV from BV, including Dango and LALAL.AI, uvr bve v2 50% stereo LV panning failed)\n\nFixed:\n\n- Simply Red - Sunrise (@0:45, LV are counted as BV because of the panning, but using stereo 50% with uvr bve v2 doesn't solve the issue; fixed by Dango Backing Vocal Keeper by processing the left, then the right channel)\n\nList of songs from which it is difficult to extract the BVs with becruily's karaoke model, divided into three categories, by dca100fb8\n\nVery important:\n\n* Phil Collins - In The Air Tonight (LVs crossbleeding during the whole song starting @00:52; BVE v2 fixes it)\n* UB40 - Red Red Wine (LVs crossbleeding during the whole song starting @00:00)\n* Supertramp - It's Raining Again (it still has the lead vocals during the whole song after conversion to Inst w/ BV; 
MDX Kar v2 fixes the problem)\n* Phil Collins - I'm Not Moving (LV crossbleeding; it seems Becruily Kar has issues with the way Phil Collins's voice is often mixed)\n\nImportant:\n\n* Charlie Winston - Kick the Bucket (BVs are missing starting @01:42)\n* Coldplay - Daylight (LVs are still present starting @00:28)\n* Massive Attack - Spying Glass (LVs are still present in the whole song starting @00:19)\n* Indochine - J'ai demandé à la lune (BVs are missing starting @02:28)\n* Pierpoljak - Pierpoljak (Radio Edit) (BVs/LVs crossbleeding starting @00:05)\n* Jamiroquai - King for a Day (BVs are missing starting @00:55)\n* Jamiroquai - Light Years (BVs are missing starting @00:22)\n* Jamiroquai - Rock Dust Light Star (BVs are missing starting @01:42)\n* ABBA - Dancing Queen (LVs are still present starting @00:19)\n* Archive - You Make Me Feel (BVs/LVs crossbleeding starting @00:24)\n* Justin Timberlake - Rock Your Body (BVs/LVs crossbleeding starting @00:28)\n* Simply Red - Turn It Up (BVs/LVs crossbleeding starting @00:04)\n* Justin Timberlake - SexyBack (BVs/LVs crossbleeding starting @00:15)\n* Demis Roussos - From Souvenirs to Souvenirs (BVs/LVs crossbleeding starting @01:34)\n\nLess important:\n\n* Jamiroquai - Hot Tequila Brown (LVs are still present starting @01:19)\n* Bee Gees - How Deep Is Your Love (LVs are still present starting @00:47)\n* Simply Red - How Could I Fall (BVs are missing starting @01:40)\n* Simply Red - Something Got Me Started (BVs are missing starting @00:53)\n* Phil Collins - Don't Let Him Steal Your Heart Away (BVs/LVs crossbleeding starting @01:51)\n* Phil Collins - Another Day in Paradise (BVs/LVs crossbleeding starting @01:13)\n* Phil Collins - That's Just the Way It Is (BVs/LVs crossbleeding starting @00:59)\n* Groundation - Confusing Situation (LVs bleed starting @00:24 + crossbleeding starting @01:06)\n* Wham! 
- Everything She Wants (BVs missing starting @00:41)\n* Thievery Corporation - Thief Rockers (BVs missing starting @00:00)\n* Thievery Corporation - Radio Retaliation (BVs missing starting @00:01)\n\nDue to uncommon panning:\n\n* Nelly Furtado - Maneater (@0:16, LV are counted as BV because of the panning, but using stereo LV Panning doesn't solve the issue)\n* Lenny Kravitz - Low (@1:03, BV are counted as LV by BVEs or Kar models)\n* Timbaland - The Way I Are (@0:25, LV are counted as BV by BVEs or Kar models, LV panning failed)\n* Timbaland - Give It To Me (@0:25, LV are counted as BV by BVEs or Kar models, LV panning failed)\n* Timbaland - Morning After Dark (@2:02, BVEs and Kar models couldn't differentiate LV from BV, LV panning failed)\n* Simply Red - Sunrise (@0:45, LV are counted as BV because of the panning, but using Lead Vocal Panning doesn't solve the issue)\n* Jamiroquai - Automaton (Lead vocal panning didn't help)\n* Thievery Corporation - Culture Of Fear (MDX v2 works)\n* Jamiroquai - Supersonic (Lead vocal panning didn't help)\n* Jamiroquai - Travelling Without Moving (Lead vocal panning didn't help)\n* Philip Bailey & Phil Collins - Easy Lover (LVs very difficult to differentiate from BVs)\n* Thievery Corporation - Sol Tapado (Lead vocal panning didn't help)\n* Moby - Landing (LV panning didn't help)\n* Moby - We Are All Made of Stars (LV panning didn't help)\n* The Weeknd - Sacrifice (@01:25, LV panning didn't help)\n* Nitin Sawhney feat. The London Symphony Orchestra - Songbird (LV panning didn't help)\n\nUncategorized\n\n* Night Lovell - Dark Light (thx 97chris)\n* Supertramp - Dreamer (LVs difficult to distinguish from BVs - dca)\n\nIssues that also occur with Becruily & Frazer/Anvuew/MVSEP Team/Dango models\n\n* Massive Attack - Spying Glass\n* Demis Roussos - From Souvenirs to Souvenirs\n* Archive - You Make Me Feel\n* Justin Timberlake - Rock Your Body\n* Thievery Corporation - Sol Tapado\n* Moby - Landing\n* Nitin Sawhney feat. 
The London Symphony Orchestra - Songbird\n* Miley Cyrus ft. Future - My Darlin' (makesomenoiseyuh)\n\nWarning. If you upload lots of music to our server (or any other server), be aware that our long-term users have recently received warnings from Discord about possible deletion of their accounts, and the whole good results channel got deleted, so we advise sharing only links to e.g. GDrive or any other cloud instead of uploading music directly to Discord. So far, one of our users has received two warnings from Discord, but the account hasn't been deleted yet. The whole good results channel got deleted even though, after the last clean-up we got, we had switched to linking to uploads instead of uploading. Recently, we added a bot that automatically deletes audio files uploaded directly to Discord (links are allowed), and the channel has been reopened.\n\n## Training models guide\n\n[Read mesk’s guide](https://docs.google.com/document/d/1jUcwiPfrJ8CpHqXIRHuOu70cFDMv_n-UzW53iaFuM9w/edit?tab=t.0) (new link #2), then proceed below for arch explanations and more details.\n\nAs for training code for Roformers, MDX23C, SCNet, or for adding new archs, most people use [MSST](https://github.com/ZFTurbo/Music-Source-Separation-Training) by ZFTurbo. It also comes with plenty of documentation.\n\n“You can start with Sucial MSST [WebUI](https://github.com/SUC-DriverOld/MSST-WebUI) ([link](https://huggingface.co/Sucial/MSST-WebUI/tree/main/1.7.0)). I use that to train all my models” - Gabox\n\n*Introduction*\n\n“There are three components to the model scaling laws.\nThey are the size of the data set, the number of parameters in the model, and the computational resources.” unwa\n\nFor training, depending on the model type (explained above), there can be e.g. three files per song when training a vocal model: vocals, instrumental, and mixture. 
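\n\nSince the mixture of an aligned vocal/instrumental pair is normally just the sum of the two stems, here is a minimal sketch of building it. It assumes both stems are already loaded as float arrays of shape (samples, channels) at the same sample rate (e.g. with a library such as soundfile, not shown), and the `make_mixture` helper is hypothetical, not part of MSST:\n\n```python\nimport numpy as np\n\n\ndef make_mixture(vocals: np.ndarray, instrumental: np.ndarray) -> tuple[np.ndarray, float]:\n    \"\"\"Hypothetical helper: sum two aligned stems and return (mixture, peak).\"\"\"\n    if vocals.shape != instrumental.shape:\n        raise ValueError(f\"stem shapes differ: {vocals.shape} vs {instrumental.shape}\")\n    mixture = vocals + instrumental\n    # Peak absolute sample value; 16-bit PCM cannot hold anything above 1.0\n    peak = float(np.max(np.abs(mixture)))\n    return mixture, peak\n\n\n# Usage with synthetic one-second stereo stems at 44.1 kHz\nsr = 44100\nt = np.arange(sr) / sr\ntone = 0.6 * np.sin(2 * np.pi * 440 * t)\nvocals = np.stack([tone, tone], axis=1)  # shape (44100, 2)\ninstrumental = np.stack([tone, tone], axis=1)\nmix, peak = make_mixture(vocals, instrumental)\nprint(mix.shape, peak > 1.0)\n```\n\nIf the peak is above 1.0, saving the mixture as 32-bit float WAV keeps it intact, while a 16-bit file would clip it.\n\n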
If you try to train without the mixture, the results will be “terrible” (iirc it was said somewhere around the time of the Mel Kim model).\n\n\"You can train any sound you want with any architecture (MDX-Net, Demucs, Spleeter)\"\n\nBut don’t use Spleeter; it has been deprecated since so many newer archs were released ([Kim](https://cdn.discordapp.com/attachments/911050124661227542/1158015998318882856/image.png?ex=651ab5f0&is=65196470&hm=0fe6a1f080140fe3887c64a34a9475ea353561691151c49dce4ede42f362943d&)).\n\nJust be aware that not every arch is a good choice for some specific tasks or instruments.\n\n*(Among others, the following is also based on Anjok’s* [*interview*](https://www.youtube.com/watch?v=-pcVN54cgw0)*, around 0:40:00)*\n\nFor training a new model, use at least 200 samples to achieve any good results. With fewer, you might not be happy with the results, and of course, more samples will give better results.\n\nFor BS/Mel Roformers, at some point 525 songs were not enough to train a good model from scratch.\n\nFor fine-tuning, have at least 500 pairs - Gabox\n\nQ: Anyone know how many songs are generally needed to finetune a Mel-Roformer model?\n\nA: A few thousand - Unwa.\n\nMesk’s metal dataset at some point had 2135 instrumentals and 1779 vocals (3914 tracks in total)\n\nQ: Is 7 thousand tracks typical for a model or just a really good one?\n\nA: It’s very good for finetune - frazer\n\nGabox: \"the biggest epoch i trained was 30\" - although they were fine-tunes\n\n*Overfitting*\n\n“Is when a model is still improving on training data but not on unseen data, and if training is pushed too far, it can even start to perform worse on unseen data.\n\nIt's a more important issue when you want a model that generalises well than when, [e.g. 
targeting only 909 hihats], you want a model which targets one really precise sound (with some variation, but still 909 hihats, so it's not really about generalisation).” jarredou\n\nFor training, Anjok used an A6000 48GB with a Ryzen 7 5800, 128GB RAM and a 3TB NVMe. You need an SSD for training, as the training process is I/O-intensive with a massive amount of data.\n\n“Training source sep models [from scratch] requires way more time than training an RVC model, especially with Roformers (it takes weeks/months to get good models with high-end hardware)\n\ntho, finetuning a Rofo to achieve a new target role can take long too, depending on the baseline model/goal/context/expectation” - jarredou\n\n- Training starting weights (models trained from scratch, which can be used for finetunes):\n\nBS-Roformer SW, Mel Kim,\n\nor even Mel viperx 1143, BS viperx 1297/1296\n\n- When you struggle to get good results, fine-tuning a base model instead of an existing fine-tune sometimes gives better results, but the problem can also lie in the dataset (specific songs, some material within a specific genre, or bleeding), regardless of parameters, training time, a subpar GPU, etc.\n\n- When ZFTurbo trains specific instrument models, he doesn't train from scratch, but retrains a model which already understands what music is. Training is easier that way.\n\nMakidanyee:\n\nI think generally finetuning with too short dataset and overtraining it could lose the inherited ability of the original model to handle unseen data (barely similar to the finetune dataset but maybe so to the original model's dataset) well\n\nCross:\n\nI believe any lower limit would be you need to have at least 1 set of stems that are as long as or longer than the currently set segment length in the config\n\na smaller cleaner dataset is better than a gigantic one with leakage\n\ncheck your dataset make sure the samples are clean\n\n(...) 
The transformer's self-attention mechanism is surprisingly good at ignoring small dataset impurities and mistakes, but it cannot work magic. If most of your dataset is one way, the model will turn out that way\n\nlike if there's one or two instances of a blip or glitch in the dataset, the transformer can actually learn that it's a statistical outlier and ignore it\n\nFor fine-tuning existing models of these archs, an RTX 3070 Ti and a 4060 (both 8GB) were used by unwa and Gabox respectively (the RTX 2000 series doesn’t support the Flash Attention implementation in ZFTurbo’s training [repo](https://github.com/ZFTurbo/Music-Source-Separation-Training/)).\n\n“You COULD train using 8GB of VRAM, it is doable, but not recommended, you at least need 16 or more. Training is difficult because it quickly fills up your VRAM even with gradient checkpointing enabled” - mesk\n\nunwa also tested training, and it works on an RX 7900 XTX 24GB (gfx1100) on Ubuntu 24.04 LTS using PyTorch 2.6 for ROCm 6.3.3 and PyTorch 2.6 for ROCm 6.2.4.\n\n“No special editing of the code was necessary. 
All we had to do was install a ROCm-compatible version of the OS, install the AMD driver, create a venv, and install ROCm-compatible PyTorch, Torchaudio, and other dependencies on it.”\n\n“To install only the minimum necessary items, I first installed PyTorch, then ran train.py many times to install the missing items little by little.”\n\n“Basically, it is almost no different from PyTorch for CUDA.\n\nFor example, when specifying a device in your code, you can just use `'cuda'` as is.\n\nAlso, Flash Attention can be used by setting the environment variable `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`.”\n\nFor now, the only [supported](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) consumer AMD Radeon GPUs for ROCm on Linux are:\n\nRX 7900 XTX, RX 7900 XT, RX 7900 GRE and AMD Radeon VII (the GPUs from the fuller list [here](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html) should probably work with ROCm too),\n\nbut “even right now hipcc in ROCm 6.3.3 has gfx1200 and gfx1201 targets [namely RX 9070 and RX 9070 XT]. You'll still be able to build and run stuff with ROCm. For whatever reason, AMD feels it's not ready to give its stamp of approval.” (EmergencyCucumber905)\n\nE.g. 
RX 6700 XT 12GB seems to work with ROCm too, but its performance might not be good enough, judging by how ROCm-based ZLUDA performed (more on ZLUDA below; it’s rather not feasible for training, even in its fork state).\n\n“The 7900 XTX is great but probably no match for the 9070 XT, which will be optimized in time; the 7900 XTX certainly has more VRAM, but RDNA4 has FP8 support and greatly enhanced performance at FP16/BF16.\n\nAlso, the 7900 XTX is a top-end GPU and generates tremendous heat.\n\nThe room becomes unbearably hot after running it for a while.” Unwa\n\nUnwa half a year later:\n\n“If you want to use AI properly with an AMD GPU, the MI300X is the best choice.”\n\n“Honestly, I miss how comfortable CUDA is.”\n\nUnwa some months later:\n\n“If you're buying an AMD GPU, anything less than the Instinct MI300X/MI350X will make training AI models a painful experience.”\n\n*GPU handling*\n\n“I’ve reduced mine's core clock down to 60% with barely any reduction in performance, but it’s much cooler now (stays about 55 degrees while training as opposed to 75-80)” becruily (iirc it was on a 3090 or 3090 Ti).\n\nBad thermal paste (e.g. MX-2) can dry out within a year at high temperatures of up to 80 degrees.\n\nSome GPU brands allow you to replace the thermal paste before the warranty period ends.\n\nConsider a PC case with good airflow. Thermal pads might degrade after 5 years, or when you disassemble the card, and as a result increase memory temperatures. In that case, check the specific pad density for your model and replace them. You should be able to monitor VRAM temperature in e.g. GPU-Z. Sometimes the stock pads stay good even longer.\n\nNative PyTorch support for ROCm on Windows (without WSL) was unavailable before ([old documentation](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/component-support.html)); for consumer GPUs, it is now supported with ROCm 6.4.4, but only on the RX 7000/9000 series.\n\nJudging by e.g. 
Stable Diffusion WebUI, any potential DirectML forks will be much slower than ROCm ([src](https://www.theregister.com/2024/06/29/image_gen_guide/?page=2)), so for now, ROCm is the only reasonable way to go on Radeons (ZLUDA forks are probably still too far behind in development to be of any use: for now, the official repo supports only Geekbench, and the further maintained [fork](https://github.com/lshqqytiger/ZLUDA/releases) of the old base doesn’t work with e.g. UVR [or we just haven’t tried hard enough yet]).\n\nUnwa: “The transition from GeForce to Radeon was not too difficult.\n\nIt may be a bit cumbersome to build the environment.”\n\n“So far I have not had any problems. Running the same thing appears to use a little more VRAM than when running on the NVIDIA GPU, but this is not a problem since my budget is not that large and if I choose NVIDIA I end up with 16GB of VRAM (4070 Ti S/4080 S).\n\nProcessing speeds are also noticeably faster, but I did not record the results on the previous GPU, so I can't compare them exactly.”\n\n-\n\n[becruily](https://discord.com/channels/708579735583588363/1220364005034561628/1454786646393487410) - “unwa didnt you have amd gpu\n\neither way just use this, one click install (its a bit better than native torch fa for me)\n\n<https://github.com/AuleTechnologies/Aule-Attention>”\n\n“I switched back to RoPE and tried using flash\\_attn, which reduced memory usage to one-third (21.9GB -> 7.7GB).\n\nHowever, speed decreased from 1.16 it/s to 1.46 s/it.”\n\n[Later on]\n\nif u set `export FLASH_ATTENTION_TRITON_AMD_AUTOTUNE=1` itll do rdna for u too\n\n[code](https://discord.com/channels/708579735583588363/1220364005034561628/1454929468249342033) at the end\n\nQ: plz enlighten me in ai source sep knowledge bcs i cant sleep rn\n\nA: if you haven't read it yet, it's a must read when you want to learn more about source sep: <https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/95818/Thesis_Sarkar.pdf>\n\n*MDX-Net v2*\n\n*(not incl. 
in MSST, see Kim’s MDX-Net v2 training repo* [*here*](https://github.com/KimberleyJensen/mdx-net)*) - lighter and older arch than Roformers, less effective and aggressive, less filtered*\n\nPicking proper training parameters turned out to be easier than for VR.\n\nIn the case of e.g. MDX-Net, you decide how big your model is intended to be via the fft parameter, which determines the cutoff of the model, and via the in/out channels (long story short, the size of the channels). Raising them increases the size of the model and the resources needed for training.\n\nSo if you have a smaller dataset, your model doesn’t have to be that large.\n\nIf you crank up the model size too much for a small dataset, you're putting yourself at risk of overfitting. It means that the model will work very well on the data it was trained on, but not so well on unknown songs it wasn’t trained on.\n\nWith a large dataset and a small model size, there won’t be much effective training at all; the model will basically forget features of the larger dataset. You need to find a balance here.\n\nBatch size is the number of samples fed into the model at each training step. Smaller batch sizes take longer to learn, but you might get a better result in the end. A larger batch size can make the model worse, because it has to learn bigger passages at once, but the model will train faster.\n\nYou need to tweak and balance these settings to find what works best for the model you’re training. 
Balancing things out might also help end users with slower GPUs, or even CPUs [although bigger MDX23C (v3) models are very difficult to use for separation on CPU: nearly impossible on the oldest 4-core CPUs, and still noticeably slower than MDX-Net models on GPUs like the 3050].\n\nThe section continues below.\n\n*MDX23C*\n\nNoticeably slower for separation than MDX-Net, even on GPUs like the 3050.\n\nWith 3000 samples of 3-4 minutes each and a batch size of 8, training is going to take at least a month and a half (?on A6000 and MDX-Net). Anjok didn't want to make models too big, having end users without the best hardware in mind (hence the choice of the older arch).\n\n(here the interview section ends)\n\nEverything should be trained for at least 200 epochs (at least for a model trained from scratch), and preferably for 500 (e.g. MDX-Net HQ\\_2 was trained to 450 epochs). From around epoch 200 upward, the SDR gains can stay very small for a long time. Experimentally, HQ\\_4 was trained to epoch 1149, and it kept progressing beyond that, slowly but steadily. In general, some people do train models up to 750 or 1000 epochs, but it takes longer.\n\nAround the beginning of 2023, the UVR dataset consisted of 2K songs (maybe for voc\\_ft, can’t remember), probably more for MDX23C, and 700 pairs for the BVE model. In the case of the vocal model, the one trained on 7K songs didn't achieve much better SDR than the one trained on 2K. That could have been caused by overfitting, no cutoff for the vocal model, or some other problem with dataset creation we will tackle here later.\n\nThe best publicly available archs for training instrumentals/vocals that the community has already used are:\n\nMelBand Roformer (faster and can surpass MDX23C and BS-Roformer SDR-wise with e.g. 
Kim config below [and not only], and can sound better and less muddy than BS), BS-Roformer (very demanding, better for specific tasks), MDX23C (can produce more residues in instrumentals than MDX-Net v2, but can give a bit more clarity), MDX-Net v2 2021 (instrumentals can get a bit muddy even in fullband models, still more residues than in Roformers), Demucs HT a.k.a. Demucs 4 (Anjok failed at training a single-stem model with it), vocal-remover (VR) 5 by tsurumeso (good for specific tasks like Karaoke/BVE models or dereverb, but for instrumentals it leaves lots of unpleasant residues), VR 6 (now takes phase into consideration, so there should be fewer residues, but it’s outperformed by newer archs), VitLarge (probably the fastest), SCNet (still faster than Roformers).\n\nJudging by HQ\\_3 and the 16.xx models, I think it’s safe to say that MDX-Net v2 fullband models leave fewer vocal residues in instrumentals than the newer MDX23C arch, but they are also much more muffled, and which arch fits best depends on the specific song.\n\nAbout BS-Roformer: e.g. the model trained by ByteDance didn’t output the other stem, which is obtained by inversion, and initially the results had lots of vocal residues in instrumentals (or instruments in the other stem), but this can be alleviated by decreasing the volume of the input file by 3dB before separation (the best SDR among lots of tested values). Generally, viperx models sound similar to Ripple. The arch itself currently has the potential for the best SDR (although there’s only a small SDR difference between the two best Mel and Rofo models, 2024.08.07 and 2024.10 on MVSep.com, while BS models are muddier).\n\nThere are other good archs like BSRNN, which is already better than Demucs, and the later released SCNet (but its results weren’t as good as Roformers: they had more noise, and training wasn’t as straightforward as initially thought). 
It's faster than BS-Roformer, but, probably due to arch differences, rather not better, although it might still be decent in some cases (you can hear the results on MVSEP).\n\nViperx trained the far more demanding BS-Roformer arch on 8xA100-80GB (half of what ByteDance used) with 4500 songs, and by epoch 74 they had already surpassed all previous UVR and ZFTurbo’s/MVSEP models, including ensembles/weighted results (more info on that below).\n\nViperx made a private Mel-Roformer model that was trained to around epoch 3100. He uploaded the SDR results to MVSEP, but they have since been taken down [presumably by viperx himself]. Even then, unfortunately, the result was not above 9.7, not much better than MDX23C SDR-wise, despite probably having a bigger dataset.\n\nLater, Kim fixed the low SDR issues of Mel-Roformer with her config and released the model which became the base of all the fine-tunes by Unwa/Gabox/Syh-Aname-Super-YH (more below).\n\n- “as training progresses, the metrics will improve slower and slower until a point where it's too slow = stop training” - becruily\n\n- I always stop when [loss, avg\\_loss=] nan\n\nQ: it went back to SDR 20 again\n\nA: “that progress on valid updates per track, so some tracks are 20 SDR some will be like 2 SDR” (frazer)\n\nQ: yea, some are still at 11 etc\n\nA: “train until either avg loss nan or fixed sdr (example: 14 for me) with 5.0e-05\n\nuse best checkpoint from that run with 1.0e-05 to get some boost\n\nidk why it works (and if it works im at step 1 rn xD)” - mesk\n\nQ: is it normal that when i use a checkpoint with 12.53 sdr, then restart training, the results at epoch 0 and 1 drop back to 11.77?\n\nA: yes\n\nonly if u restart with a higher LR\n\nso i had a checkpoint that was 12SDR trained with 5e-5, then if i trained that with 1e-5, youd expect it to start at like 11.Xsdr\n\nif i had a checkpoint 12SDR trained 1e-5, and i train with 5e-5, id expect it to either start at 12, then drop, then 
start to increase\n\nkeep it going 5e-5 for literally as long as u can\n\nthis is called overfitting - just make sure it doesnt do this\n\nits the point where the model begins to not generalize but memorize the training set - happens on finetuning if u train too long\n\nwhat u do is just keep the redline score if u still have it ([pic](https://imgur.com/a/POe76dn)) - frazer\n\n### **Creating dataset**\n\nLet’s get started.\n\nFirst, check the Repository of stems [section](#_k3cm3bvgsf4j) of this document.\n\nThere you will find that most stems don't match contemporary loudness standards, and they clip when mixed together.\n\n*About sidechain stem limiting guide by Vinctekan (*[*moved*](https://docs.google.com/document/d/1WDDK7E8-HY7EYUOM-evIcXNYX_Of0gYOVjG95zLQfKo/edit?tab=t.0)*)*\n\nThe sidechain limiting method might not be as beneficial for SDR as we initially thought; iirc, it’s explained in the [interesting links section](#_8uxrvfoxzav6) with the given [paper](https://arxiv.org/abs/2402.18407).\n\nOther useful links:\n\n<https://arxiv.org/pdf/2110.09958.pdf>\n\n<https://github.com/darius522/dnr-utils/blob/main/config.py>\n\n“You can also just utilize this <https://github.com/darius522/dnr-utils/blob/main/audio_utils.py>\n\nand make a script suited to your own needs, the one already on this repo is a bit difficult to repurpose.\n\nI just concatenated a lot of sfx, music and speech together into 1hr chunks and used audacity tho (set LUFS and mix)\n\noh and then further split into 60 second chunks after mixing them” - jowoon\n\n“Aligned dataset is not a requirement to get performing models, so you can create a dataset with FL/Ableton with random beats for each stem. 
Or using loops (as long as they contain only 1 type of sound).\n\nYou create some tracks with only kick, some others with only snare, others with only... etc.\n\nAnd you have your training dataset to use with a random mixing dataloader (dataset type 2 in ZFTurbo's script: one folder with all kick tracks, one folder with all snare tracks, one folder with… etc.).\n\nThen you have to create a validation dataset according to the type of stems used in training, preferably with a kind of music close to the kind you want to separate, or \"widespread\", with a more general representation of current music, but this means it has to be way larger.\n\nThe only requirements are:\n\n- 44.1kHz stereo audio\n- Lossless (wav/flac)\n- Only 1 type of sound per file (and no bleed like it would happen with real drums)\n- Audio length longer than 30s (current algos mostly use ~6/12 second chunks, but better to have some margin and longer tracks so they can be used in future when longer chunks can be handled by archs & hardware).” jarredou\n\n“You can use flac too; saves space (though make them 44.1kHz / 16-bit / stereo, even if u use mp3's or whatever other format - convert upfront)\n\nvalidation set however needs to remain in wav with mixture included.” Bas Curtiz\n\n“A quite unknown Reaper script to randomize any automatable parameters on any VST/JS/ReaXXX plugin with MIDI notes. It's REALLY a must-have for dataset creation, adding sound diversity without hassle.\n\n<https://forum.cockos.com/showthread.php?t=234194>” jarredou\n\n*(Guides for stem limiting moved to the end of the section for archival purposes - rather outdated approaches due to the statements in the paper above)*\n\n*FAQ*\n\nYour evaluation/validation data shouldn't be the same as your training data.\n\nYou can use a multisong dataset from MVSEP, and make sure you don't have any of those songs in your dataset. Or it can be e.g. 
Moises.\n\nQ: Does evaluation data matter for the final quality of the model?\n\nA: Absolutely not. It's merely an indication\n\nQ: How big should the validation dataset be?\n\njarredou: Enough to have a good idea of how your model is performing. If your dataset is already limited, like 50 tracks, maybe use 5 of them, the most different ones, as validation. But more is better there too (when you can).\n\nIf training is done with vocals and instrumental stems, the script expects vocals, instrumental and mixture stems for validation\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/dataset_types.md#dataset-for-validation>\n\n“Even when experimenting I try to use at least 5 validation tracks, which are highly different, it doesn't cost a lot of time and gives a better point of view of what the model is really able to do\n\nFor some experiments I was using only 1 song, it was always giving negative results, but when I've added more songs to validation, I saw that the model was in fact learning things, just that with that initially selected track it was just bad (and after some further experiment it was because of dataset, that had no track sounding like this one)\n\nmodel doesn't learn from validation content, it's just to evaluate a model performance on content it hasn't seen during training” - jarredou\n\n- Preparing de-reverb validation dataset ([link](https://discord.com/channels/708579735583588363/911050124661227542/1425108782953791589))\n\n\\_\\_\n\nQ: Could i also add instrumentals without the pairing vocals\n\nlike for example can i add gaga's instrumentals to mayhem and other albums without the vocalsxx.wav\n\nFrazer: yeah u can but u get better scores when they’re aligned\n\nMesk: most of the albums i got dont have vocals x)\n\nQ: Can’t tell if its working or not Instr Vocals sdr: -0.4951 (Std: 1.2174)\n\nA: if it's improving over time then yes\n\nbut it's gonna take a very long time to get decent results training from scratch\n\n- 
Gabox: \"the biggest epoch i trained was 30\" - although they were fine-tunes iirc\n\n- He said, for finetuning have at least 500 pairs\n\n- it will be better to use type 4 for adlibs\n\nso each song has its own folder (i assume the songs are not random)\n\n- No tracks found\n\nusually validation is strictly the same as type 1 so:\n\n- validation\n  - song 1\n    - other\n    - vocals\n    - mixture\n  - song 2\n    - other\n    - vocals\n    - mixture\n\nyou're missing this:\n\n`--dataset_type 2`\n\n- (Neoculture, while attempting to create a new inst model that preserves vocal chops, with e.g. a lot of hip-hop in the dataset):\n\n“i removed some of metal tracks from my dataset, my vocals bleed issue is almost fixed\n\n(...) ofc, there's a tracks that still have vocals bleed and i still trying to find and delete it”\n\n- (mesk) this was my biggest complaint about 07.2025 the fact that it was a muddy vocal model, so i'm very excited. sure i cant release it bcs ft of said model, but atp i don't really care\n\nafter a smaller period away from the server, i finally know how to fix the hollowness in my finetunes lmfao\n\n([link](https://discord.com/channels/708579735583588363/708912597239332866/1454871500850200738))\n\nremove these then put the loss\\_weight to 0.1 instead\n\nand i added this in the code (line 655):\n\n`multi_stft_resolution_loss = multi_stft_resolution_loss + F.l1_loss(torch.abs(recon_Y), torch.abs(target_Y))`\n\ninstead of:\n\n`multi_stft_resolution_loss = multi_stft_resolution_loss + F.l1_loss(recon_Y, target_Y)`\n\nQ: Should I de-clip my dataset? i checked it in audacity and a lot of tracks have red clipping indicators\n\nA: Training is done in 32-bit, so it's not needed; it wouldn’t help anyway - becruily\n\nQ: Does somebody know the best way to make a dataset smaller? I have a very huge dataset in flac format, so one idea is to truncate the parts of the songs where there's only music without vocals? Also, I can convert it to opus format, is it worth it? 
Or maybe there is something better that I don't know?\n\nA (jarredou): If you plan to use random mixing of stems during training (so a non-aligned dataset), then you can remove all silent parts from the stems before training; on instrumentals it will not change a lot, but for vocals it can save a lot of space (h/t Bas Curtiz for the idea)\n\nQ: Currently the dataset is aligned, but is this random mixing the standard approach? I am going to train the official SCNet model, so maybe it will require modifications for this?\n\nA: <https://arxiv.org/abs/2402.18407> (Why does music source separation benefit from cacophony?)\n\n<https://www.merl.com/publications/docs/TR2024-030.pdf> (same paper, single-column formatting)\n\n“It thus appears that a small amount of random mixes composed of stems from a larger set of songs performs better than a large amount of random mixes composed of stems from a smaller set of songs.”\n\nIf needed, ZFTurbo's training script handles both random and aligned datasets and also has an SCNet implementation: <https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\nQ: As far as I know, SCNet is supported only for inference here.\n\nA: It supports training too; ZFTurbo has recently trained an SCNet-large model on the MUSDB18 dataset\n\nDataset types doc: <https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/dataset_types.md>\n\n(he just didn't update the help string)\n\nQ: also I added moisesdb to my dataset and now my disk is bottlenecking\n\nI wonder if going from ssd -> nvme would help\n\n>nvme is the best\n\n>having more tracks in dataset shouldn't make the tracks loading slower\n\nNvme is ntfs format and that is slow on Linux\n\n>not if its a mounted drive\n\n>linux disk io is faster than windows' in every way in my experience\n\n(quotes)\n\nQ: does anyone know if hdds are still ok for training or they will cause bottleneck due to low speed\n\nssds have gone so expensive alongside rams\n\nA: if you're training a real model like bs rofo on a 
consumer card the bottleneck will probably be somewhere else imo\n\nA: as long as the model is slower/equal than ur hdd speed ur good\n\njust do a time around the model forward +backward and calculate based on the batch\\*chunksize and how big that batch\\*chunksize is on disk (ie if ur doing 10s chunks, batchsize 4, say those are like 5mb on disk each - thats 20mb per batch, how fast did it go through the model, say 2s, thats 10mb per second - if ur hdd is faster than 10mb per sec ur good)\n\nConsider a RAID\n\n- “you can use the model to filter your dataset. I tried it, and it brings a bit of improvement\n\nOr demucs4 if it's too slow” - anvuew\n\nQ: This new type isn't the same as dataset type 4, right? The audio won't be fed in sync, from the same start, even if I use dataset type 5?\n\nbecruily: <https://github.com/ZFTurbo/Music-Source-Separation-Training/commit/68da5331068104819705b3190e242296fda9a658>\n\nlooks the same as 4 (it will be in sync)\n\nQ: I will try this type and see if it will get similar results with the same dataset (and dataset type 4).\n\nQ: super beginner question here but if I'm playing around with the musdb set where 'other.wav' means everything \\*except\\* drums, vocals, bass, when training a [vocals, other] model with other\\_fix on, I just leave the dataset as is and the script will subtract vocals from mixture when validating, right? No need to rename anything? Just cross referencing valid.py and train\\_accelerate.py in the zfturbo repo to check...\n\nA: Yes - anvuew\n\nQ: Do I also need to collect lead vocal tracks for the karaoke dataset? 
i saw in the guide there are two folders in the dataset section (other and vocals)\n\nBecruily: this is for dataset type 2, for karaoke models i personally believe it should be aligned tracks (type 4)\n\nthe stem names must match in the config too, seems like in your config they're \"Vocals\" (upper case) and \"Instrumental\" (upper case too)\n\nwhen you use full paths put them in quotations\n\nfor example `--config_path C:\\Users\\Administrator\\Documents\\Music-Source-Separation-Training-main\\result` becomes `--config_path \"C:\\Users\\Administrator\\Documents\\Music-Source-Separation-Training-main\\result\"`\n\nalso if you're training from scratch it should be `--start_check_point \"\"` not a path to a folder\n\nQ: I'm a bit hesitant to include lossy quality audio in my dataset\n\nA: i have mp3, opus, m4a, ogg, and mpeg in my dataset - Gabox\n\nA: I have youtube stuff in mine its not really a problem tbh - Mesk\n\nQ: so lossy audio doesn’t actually make our model noisy af?\n\nA: i don't think so\n\ni play with the yaml to get different types of noise xd\n\n- Gabox\n\n### **Creating dataset**\n\n### **Guide by Bas Curtiz**\n\n(now also [video](https://www.youtube.com/watch?v=Wmt_0zu94L8) available)\n\nHow to: Create a dataset for training\n\n1. Download any sample pack that is focused around inst/synth x or y.\n\n(sources to seek on: audioz.download, magesy.blog, rutracker.org, freesounds.org, etc.)\n\n2. Use real-debrid.com or alldebrid.com to speed things up DL-wise\n\n(costs a few bucks but worth it, so better prepare so u can sign up for a free trial or a 3 bucks access for x amount of days)\n\n3. Unzip all (so each pack is in its own separate folder)\n\n4. 
Convert all to make it consistent (I use https://www.dbpoweramp.com/ to batch-process)\n\na) Convert all to WAV/16-bit/Stereo & delete any other audio format like AIFF, MP3, OGG.\n\nb) Convert all to FLAC (saves space without degrading quality)\n\nc) Delete all \\*.WAV files\n\n5. a) Move all files to the root of their individual folder.\n\n(I use a python script for that. Hit me up so I can share.)\n\nb) Remove empty directories\n\n(I use https://sourceforge.net/projects/rem-empty-dir/ to batch-process)\n\n6. Rename files by adding the name of the folder they're in as a prefix.\n\nFor convenience, add a tag like [ORGAN] or so to it:\n\nexample: `[PERC] - Aaroh South Indian Percussion - AR_SIP_80_percussion_small_nagara_double_rhythm.flac`\n\n(I use https://www.bulkrenameutility.co.uk/ to batch-process)\n\n7. Sort by length. If below 11s, move them elsewhere.\n\n(I use Mp3tag to sort and move in batch, but Windows Explorer can do so too)\n\n8. Loop those up to 11 seconds.\n\n(I use https://www.dbpoweramp.com/ > Loop DSP for that)\n\n9. Move \\*.flac files into 1 folder (the looped + untouched audio files) - now u can ditch all unneeded files/folders\n\n10. Undupe\n\n(I use <https://www.similarityapp.com/>, <https://www.duplicatecleaner.com/> or <https://dupeguru.voltaicideas.net/> to batch-process)\n\n(optional - only when applicable)\n\n11. Sanitize based on SDR\n\na) Process the original files with HTdemucs.\n\nBased on whether your dataset should contain bass/drums/other/vocals, set the proper output.\n\nb) Rename the output so it matches the original filename again (using <https://www.bulkrenameutility.co.uk/>)\n\nc) Use SDRCALC.exe like `sdrcalc \"c:\\organ\" \"c:\\organ-htdemucs\" > sdr-organ.txt`\n\nd) copy the output in the text file over to a GSheet for convenience, to sort by SDR\n\ne) move all unprocessed/original files above a certain SDR to a new folder\n\n(I use a python script for that. Hit me up so I can share.)\n\n12. 
Review the files by filename, and play the ones you aren't sure about to hear what they sound like.\n\nThey can be hit or miss for your specific dataset. So if a filename mentions something you know belongs to something totally different, move it elsewhere, to keep the dataset close to what you're trying to obtain sound-wise.\n\n(e.g. \\*timpani\\* is part of percussion, not so much part of a string dataset)\n\nYou could try to cluster the samples upfront with software like https://www.sononym.net/ - also available at audioz.download\n\n13. Zip those and upload them, so you can share them with those who have the ability/experience to train.\n\nThis is also needed when you rent a cloud GPU setup, to copy the dataset over to its server.\n\n(I use sharepoint/onedrive for that, but u can use buzzheavier.com for unlimited storage)\n\nDone.\n\n“In some cases when there aren't clean versions available, you can use a portion of the song where it doesn't have vocals (but has the bleed instruments) and add random clean vocals - it's not an aligned dataset, but it works for fine-tuning” - becruily\n\nEven aufr33 admitted that he makes models with isolated tracks.\n\n“Lossless is always better (and if needed you can use mp3 encoding as an augmentation during training, based on the lossless files)\n\nBut as 320kbps has quite a high cutoff (20kHz or something), it would be less problematic than more compressed audio with a hard cutoff at 16kHz or 17kHz.\n\nI would say that, like for other less regular stuff in your dataset, make it obvious in the filename that it's not lossless if you share these files\n\nQ: So... maybe not? 
IDK I feel like I could make an entire 20-song dataset out of those, because the best ones aren't lossless\n\nbut would it actually be helpful?\n\na lot of leaked stuff is in 320kbps mp3\n\nA: I think as long as the codec cutoff is around 20kHz or above, it's OK, because that will not bias the model's output.\n\nWith the first Roformer models from ByteDance on Ripple, which were trained on mixed lossless and 128kbps mp3 with a hard cutoff around 16~17kHz, we could see that bias in the separated audio even with lossless input.\nMaybe it's a question of balance between lossless/compressed content. I remember the first Ripple bsroformer outputs with these incoherencies in high frequencies, while we knew it was trained on mixed lossless/128kbps mp3” - jarredou\n\nBas: diversity is key, as we learned from this paper: https://arxiv.org/pdf/2402.18407\n\nso I take that literally, and as a starting point.\n\nQ: so it should also have compression for it to work better after all?\n\nA: so diversity also covers audio compression\n\nwe don't know for sure, but since my model doesn't seem to perform badly, let's pretend\n\nD: I’m not sure if it’s really worth worsening the quality of already lossy stems to create diversity in the dataset artificially. Yes, the model might behave better with lossy inputs, but people should use lossy inputs only occasionally, so I wouldn’t sacrifice the quality of lossless inputs that way, plus the dataset “can be degraded in many ways on the fly during training with augmentations if needed”.\n\nQ: Can the training files for the dataset be mp3?\n\nI added over 2k tracks and deleted the metadata, but it keeps scanning only the original 2k tracks, not the 4k+\n\nA: “I think you can add \"mp3\" extension to the list there in dataset.py\n\narg... it's not that simple, there are other places with wav/flac hardcoded... 
” [jarredou](https://discord.com/channels/708579735583588363/708579735583588366/1327685234988417175)\n\n(if it wasn’t implemented already)\n\n“If you were annoyed by the dataset metadata generation step being slow with MSST, update or apply this:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/pull/178/files>\n\nit's like a hundred times faster than before” - jarredou\n\n*Bas Curtiz’ Q&A*\n\n“1. Is it really necessary to have vocals for every track when training?\n\nDepends. Do you wanna make a model that can split vocal from instrumental? Or vocal from whatever 'other' is?\n\n2. Could adding more instrumental-only data be beneficial for variety?\n\nMy dataset wasn't 50/50 either. Does that benefit? No idea.\n\n3. Like is it okay to just fill up the dataset with instrumentals to complete it or is there a risk it’ll start underfitting on vocals?\n\nUnderfitting is always lurking, so make sure u have plenty of data in general.\n\n4. I know in a lot of songs there are instrumental breaks with no vocals, so I'm just wondering.\n\nHence, see my \"how to create a dataset\" video: <https://www.youtube.com/watch?v=Wmt_0zu94L8>\n\nWe ditch the silent parts at the start/in between/at the end, from both vocal and instrumental.\n\nIf you use dataset type 4, then don't throw away silence, since then there's no cohesion any longer between the inst/vocal or drums/bass/vocal/other or whatever u are training.”\n\nQ: Why is my model bleeding the least vocals in the instrumental output?\n\nA: Cause I didn't use pre-processed/cleaned up vocals (I did, but only for ~10% of the dataset).\n\n[he refers to his fine-tune exclusively on MVSEP]\n\nQ: What has that to do with the absence of vocal bleeding in the instrumental output?\n\nA: Cause the full spectrum is being used to determine what is a vocal. Even low rumble and potential noise.\n\nQ: So why does yours bleed less compared to other models?\n\nA: As stated, aufr33 for ex. 
used pre-processed vocals to train on.\n\nThis means part of the noise/low rumble is already gone.\n\nSo it's trained as if that stuff isn't part of the vocal. And if it is there, it's dumped into the other (the instrumental output).\n\nThis is my gut feeling about why we do hear vocal leftovers in the instrumental output.\n\nQ: So what is the solution?\n\nA: My model 🙂 If... you want clean instrumental output.\n\nQ: But yours bleeds percussion/noise in the vocal output though...\n\nA: Yes. That's the side-effect of training on non-processed/non-cleaned-up vocals, I think.\n\nQ: Solution?\n\nA: Fine-tune my model, but this time based on de-noised vocals (and add a shitload of percussion samples).\n\nMy prediction is that we then have the best of both worlds:\n\nThe current model as-is = great for instrumentals (as described by the community several times, due to using non-processed/non-cleaned-up vocals as input)\n\nAnother fine-tune based on my current model = great for vocals (due to the lack of noise/low rumble/percussion bleed)\n\nWarning:\n\nThis is based on logic and gut-feeling.\n\nDill: caught it going to NaN again, very suddenly between epochs, when everything was going smoothly\n\nZFTurbo: I think it's never happened to me on SCNet, but often on htdemucs as I remember.\n\n`use_amp: false` can help\n\nAfterwards you can switch it back on\n\nDry Paint: I can speak from experience that it only kinda helps\n\nI have the exact same issue with SCNet\n\nmaking amp=false does solve it but causes the loss to skyrocket to like 3.2e32 around the same time nan loss would appear\n\nQ: Can I use a 4-hour file as one of the dataset's songs? Or should I split it?\n\nA: During training the script uses chunks anyway, so yes, you can feed it a 4-hour file (in theory)\n\nA: I did this, but I had to modify the training code to check the file length without loading the whole thing into memory\n\n#### Leading architectures\n\n**MDX-Net (2021) architecture** (a.k.a. 
v2) (<https://github.com/kuielab/mdx-net>). Old.\n\nAmong public archs, before MDX v3 (2023), it gave us the best results for various applications (vocal, instrumental and single-instrument models) compared to the VR arch. But denoise and dereverb/deecho models turned out to be better on the VR architecture, and the same goes for Karaoke/BVE models, where, contrary to 5/6\\_HP, the MDX model sometimes does nothing.\n\nIn the Demucs 3 days, a custom UVR instrumental model was also trained, but it didn’t achieve results as good as the MDX-UVR instrumental models.\n\nA UVR **Demucs 4** model was once in the works, but the training was cancelled due to technical difficulties. It looks like ZFTurbo managed to train his model for the SDX23 challenge and also a vocal model, but “[the] problem is that Demucs4 HT [training is] very slow. I think there is some bug. Bug because sometimes I observe large slow-downs on inference too. And I see high memory bandwidth - something is copying without reason...”\n\nSpleeter might seem to be a good choice, because training is pretty well documented, but it isn’t worth it, seeing how these models sound (also it was the very first AI for audio separation at the time, and even the VR arch is better than Spleeter, hence the UVR team started to train on the VR arch with much better results than Spleeter).\n\nYour starting point to train an MDX model would be here:\n\n<https://github.com/KimberleyJensen/mdx-net>\n\n(visit this repo, it has some instructions and explanations)\n\nZFTurbo released his training code for various other archs here:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\n\"It gives the ability to train 5 types of models: mdx23c, htdemucs, vitlarge23, bs\\_roformer and mel\\_band\\_roformer.\n\nI also put some weights there to not start training from the beginning.\"\n\n“Setting it up on Colab is simple:\n\nYou only have to create one cell for installation with:\n\n```python\nfrom google.colab import drive\ndrive.mount('/content/drive')\n\n%cd /content/drive/MyDrive\n!git clone https://github.com/ZFTurbo/Music-Source-Separation-Training\n%cd /content/drive/MyDrive/Music-Source-Separation-Training\n!pip install -r requirements.txt\n```\n\nAnd a cell to run training:\n\n```python\n%cd /content/drive/MyDrive/Music-Source-Separation-Training\n!python train.py \\\n  --model_type mdx23c \\\n  --config_path 'configs/config_vocals_mdx23c.yaml' \\\n  --results_path results/ \\\n  --data_path '/content/drive/MyDrive/TRAININGDATASET' \\\n  --valid_path '/content/drive/MyDrive/VALIDATIONDATASET' \\\n  --num_workers 4 \\\n  --device_ids 0\n```\n\nDon't forget to edit the config file for training parameters.\n\nYou can also resume training from an existing checkpoint by adding the `--start_check_point 'PATH/TO/checkpoint.ckpt'` parameter to the command in the training cell.\n\nThe checkpoints are saved in the path provided by the `--results_path results/` parameter of the command, so here, in the \"results\" folder.\n\nWith ZFTurbo's script, mixtures are needed for the validation dataset, to evaluate epoch performance” - jarredou\n\n“It saves every checkpoint as `last_archname.ckpt` (the file is overwritten at each epoch), and also saves each new best checkpoint on validation as `archname_epxx_SDRscore.ckpt`.\n\nIt also lowers the learning rate when validation eval is stagnant for a chosen number of epochs (reduceonplateau); you can tweak the values in the model config file.”\n\nQ: What does this gradient accumulation step/grad clip mean exactly?\n\nA: “Accumulation lets you train with a larger batch size than what you can fit on your GPU; your real batch size will be batch\\_size multiplied by gradient\\_accumulation\\_steps.\n\ngrad\\_clip clips the gradients; it can stop the exploding gradients problem\n\nExploding gradients = model ruined basically, i had this problem with Demucs training, but I used weight decay (AdamW) to solve it instead of grad\\_clip\n\nI don't think grad\\_clip uses any resources, 
but accumulation uses a little bit of VRAM, i don't know the exact number” - Kim\n\nQ: Why can’t models have like an auto stop feature or something IDK, like if the model stops improving it’ll stop automatically, or on overtraining? But IDK if models can overtrain\n\nA: Nothing is stopping you from adding something that stops training once SDR (or whatever) is stagnant; some people even plot it in a chart\n\nA: That’s easy to get done in PyTorch, just use EarlyStopping after the overall validation loss computation and the training will stop depending on the patience you set on EarlyStopping… [plain PyTorch has no built-in EarlyStopping class; PyTorch Lightning provides one as a callback, otherwise it's a simple patience counter you write yourself]\n\n- [Colab](https://colab.research.google.com/drive/17SSjougcnVhX6WewW88QoKKFuFiKNz8t?usp=sharing) by jazzpear96 for using ZFTurbo's MSS training script. “I will add inference later on, but for now you can only do the training process with this!”\n\n- Training lots of epochs on Colab might be extremely taxing - for free users they currently give only a slow GPU with performance around an RTX 3050 in CUDA, but with 11GB of VRAM. It’s only good enough for inference.\n\nQ: How can I train a heavier MDX-NET model with a higher frequency cutoff like recent UVR MDX models?\n\nKimberleyJSN:\n\nA: These are the settings used for the latest MDX models; you can change them in `configs/model/ConvTDFNet_vocals.yaml` and `configs/experiment/multigpu_vocals.yaml`\n\n- overlap: 3840\n\n- dim\\_f: 3072\n\n- g: 48\n\n- n\\_fft: 7680\n\nThese actually seem to be the parameters for the last Kim ft other instrumental model, while e.g. half of the MDX-UVR HQ models without a cutoff have n\\_fft/self n\\_fft set to 6144. [For reference: MDX-Net only processes the lowest dim\\_f STFT bins, so the cutoff is roughly dim\\_f × sample rate / n\\_fft, i.e. 3072 × 44100 / 7680 ≈ 17.6 kHz, versus 3072 × 44100 / 6144 = 22.05 kHz (full band).]\n\nAlternatively, see this guide:\n\n<https://github.com/kuielab/mdx-net/issues/35#issuecomment-1082007368>\n\nYou also need to be aware of a few additional things:\n\n(Bas Curtiz, and brackets mine)\n\n**Few [training] key points**:\n\n- If you don't have a monster PC incl. a top range GPU [RTX 3080 min?] (or at work), don't even consider. 
[smaller models than good inst/voc ones, with fewer epochs (around 50), might still be within your range though]\n\n- If you don't have money to spend renting a server instead, don't even consider.\n\n- If you aren't tech-savvy, don't even consider.\n\n- [If training] a particular singer, [then does it have] roughly 100 tracks with original instrumental + vocal?\n\n- IDK, but I don't think that will be enough input to get some great results, you could try though [good models so far have varying genres and artists in the dataset, not just one].\n\n- If you need some help setting it up, Kimberley (yes, she's the one who created the Kim\\_vocal\\_1 model, based on an instrumental model by Anjok), you can ask her (@)KimberleyJSN.\n\n###### **MDX23C a.k.a. MDX-Net 2023 (v3)** (not always better than v2)\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\nMDX23C+ by [**jarredou**](https://discord.com/channels/708579735583588363/1220364005034561628/1462589158949257389) ([DL](https://drive.google.com/file/d/1MmidXBnDs_laZ6ZMGiedX_bnTVM2RTRo/view?usp=sharing)) (explained later below)\n\nOG repo:\n\n<https://github.com/kuielab/sdx23>\n\nLots of general optimizations to the quality while keeping decent training and separation performance. Theoretically the go-to architecture over MDX-Net v2, although currently the SAMI ByteDance reimplementation (BS-Roformer, see below) gives much less bleedy results at the cost of much more compute. ZFTurbo used it for his trained models. On the same (if not better) dataset as the previous V1 models, it got only slightly worse SDR than the V1 arch for narrowband, but with much fuller vocals, although with more bleeding (also in instrumentals). For fullband, SDR was high enough to surpass previous models, but SDR stopped reflecting bleeding on the multisong dataset.\n\n\"It doesn't need pairs anymore.\n\nThis... 
is HUGE.\n\nIt randomizes chunks of 6 seconds from random instrumental and random vocal to learn from.\n\nIn other words, no more need to find the instrumental+vocal for track x.\n\nJust plop in any proper acapella or instrumental u can find.\n\nThe downside so far is the validation.\n\nIt takes way longer.\" So you might want to run evaluation only every, e.g., 50 epochs.\n\nThe dataset structure looks like this:\n\n- train folder > folders with pairs > other.wav + vocals.wav\n\n- validation folder > folders with pairs > other.wav + vocals.wav + mixture.wav\n\nlibsndfile can read FLACs renamed to WAV, which can save a lot of space.\n\nI think in the old MDX-Net, no model from an epoch later than 464 had better SDR, although 496, with lower SDR, also had its own unique qualities (though more vocal residues at times). Also, training is frequently stopped at epoch 300, while SDR might not progress for a long time (maybe till 400+).\n\n<https://cdn.discordapp.com/attachments/911050124661227542/1136258986677645362/image.png> (dead)\n\n(written before Roformers) We may already be hitting the wall SDR-wise, as Bas once conducted an experiment, training a model on the evaluation dataset, and the result was only 0.31 higher than the best current ensemble (although it used lower parameters for separation). Generally, to break through that wall, we may need to utilize multi-GPU with batch size “16 or even 8”.\n\n“If you did this experiment with batch size 16 or even 8 you would see much better performance I think” - Kim\n\n“mhm but that requires multi GPU” - Bas\n\n“yeah that is the wall I think” - Kim\n\n- “at least for vocals, using the default 8192 n\\_fft for mdx23c and reducing hop\\_length from 2048 to 1024 gave better results (it's InstVocHQ config iirc).\n\nIn the mdx23c paper, they got a better score with higher n\\_fft/hop\\_length resolution.\nHop\\_length means the portion of audio that will NOT be overlapped during STFT processing. 
So the lower it is, the higher the overlap you get in the end. It's a bit like the overlap we use during inference (which is different, as it's applied on the waveform, at a way larger scale), but in the same way: the higher the overlap you use, the higher the quality you get, but at the cost of more resources, and at some point it stagnates (or could even reduce quality if set too high)” - jarredou [e.g. with n\\_fft 8192, hop\\_length 2048 means consecutive STFT frames overlap by (8192 - 2048) / 8192 = 75%, while hop\\_length 1024 gives 87.5%]\n\n- For “ringing and unpleasant artifacts at subband edges, (...) more of a dip, that seems to get filled progressively along training”\n\n> “It seems like overlapped subbands is the thing (not surprised about it) but the dips are maybe a bug, I'll see if I can improve that.\n\nI'm still experimenting to find the best way to alleviate the subband artifacts of mdx23c, but here's an already working fork with separable depthwise convs (model size divided by ~6)\n\n[DL](https://drive.google.com/file/d/1lgZRtKMqzWbbFnpxhn_QWkzH8EAx1HuT/view?usp=sharing)\n\n- “[The] opposite to BS/Mel Rofos, the more sub-bands you have with mdx23c, the fewer resources are needed to train a model (until the model quickly collapses if too many sub-bands are used, from what I've experimented until now)” - -||-\n\nFrazer: “wouldn't another fix be to change how the transposed convolutions work? So that the input to the TConv includes the other bands' first or last N indices, then crop it out afterward or sum/avg those extra indices\n\nsurely if what's happening is that as the bands are downsampled either they drop indices at the edges because it's not a perfect crop or that for whatever reason the latents at the edges don't receive proper gradients and don't optimize - then accounting for that by either expanding input bands or allowing the convs to work on the borders might fix it. I like your idea more, it's cleaner, but I'm just trying to come up with some system where it's cheaper than just having bigger bands, yk. 
I mean, your system would fix it, but I think you'd probably need to drop some indices right near the border on the expanded bands, since those would be broken in the same way as they are currently, if that makes sense” [artefacts](https://imgur.com/a/TMJCymC)\n\n[the modified code] Confirmed with vanilla OG mdx23c code, with only that change and nothing else.”\n\nQ: How about changing it to a depthwise separable convolution + pointwise convolution?\n\nA: “That's what I've done with my latest version.\n\nDuplications are “kinda back, differently, but still can lead to similar audible ringing artifacts with some configs. Taking it from the opposite side: if the low band's high energy is the issue, reduce the low band's high energy (using preemphasis and deemphasis - [pic](https://imgur.com/a/BMUrnqM)). It seems to get rid of the duplicated stuff and ringing artifacts at training start. [pre-emphasis is typically a first-order filter such as `y[n] = x[n] - a * x[n-1]` with `a` close to 1, which attenuates lows relative to highs; de-emphasis applies the inverse filter afterwards]\n\n“masking + preemphasis seems to do the trick with mdx23c, around SDR 4 after 8 epochs, which is the expected range with the config used” - jarr\n\n“Idk if anyone has tried transformer layers in mdx23c yet but it seems like it could work well”\n\n[More](https://discord.com/channels/708579735583588363/1220364005034561628/1414727336649166951)\n\n“CMX reshaping doesn't create artifacts like the original subband splitting”\n\n<https://arxiv.org/abs/2509.26007v1>\n\nWhen I have time, I'll try that to replace cac2cws/cws2cac in mdx23c\n\nI was just talking about that \"cmx\" part\n\n>well in that case check this paper out\n\n<https://arxiv.org/html/2502.20388v2>\n\nthis paper, if you pixel shuffle the channels, well you'd probably not want to predict the next token but instead the next chunk, so a little square of all the channels per timestep, chunk by chunk (next cell prediction)\n\nthat might be very very good\n\n>https://github.com/chynggi/Music-Source-Separation-Training-Vibe-Experimental\n\nMDX23 CMX implemented with vibe coding. 
should i open a pull request?\n\n>MDX23C uses InstanceNorm with affine=True, but this appears to be using BatchNorm. Additionally, MDX23C used GELU as the activation function, but this uses ReLU.\n\nWhat exactly has been “improved”?\n\n>you're right. I was relying too much on VibeCoding, so I missed it. So I just fixed that part. Also, that file is an experimental file and isn't actually used.\n\nQ: Does it fold not only frequency, but also time?\n\nA: Yes. + that CMX reshaping doesn't create artifacts like the original subband splitting\n\n>This diagram shows it clearly [[more](https://discord.com/channels/708579735583588363/1220364005034561628/1423365182637346968)]\n\n“I've tried CMX [4,4] with a longer training session with my \"small\" training config; it's giving worse results despite having no artifacts. Probably everything is sliced and scale-reduced too much with that config. Trying now a [16,1] config to compare potatoes with potatoes (16 subbands / 16 freq multiplexing, flat time)” - jarredou\n\nMDX23C mixer modification from some paper:\n\n`self.mixer_layer = nn.Linear(in_channels, out_channels)`\n\nwould be faster and identical\n\nLLMs told me that nn.Linear applied like this would kill the ability to change chunk\\_size during inference, so I let it switch to a 1x1 conv (the original mixer code is using linear)\n\n[code](https://discord.com/channels/708579735583588363/1220364005034561628/1458255333410607156) - jarredou\n\nThe LLM is wrong, because you rearrange `b n c t -> b t (n c)` when you use linear\n\nnn.Linear operates on the last dimension of the tensor, so as long as that's not time, the model will be chunk-size agnostic\n\n**MDX23C+ by** [**jarredou**](https://discord.com/channels/708579735583588363/1220364005034561628/1462589158949257389) ([DL](https://drive.google.com/file/d/1MmidXBnDs_laZ6ZMGiedX_bnTVM2RTRo/view?usp=sharing))\n\n“It's done with an ultra-minimal config (same for both runs), heavily undertrained, but already a big diff.\n\nThe bigger 
the config is, the larger the gap between the OG and new version (it seems)\n\nOG mdx23c is 240 lines of code, my version is 1000+ lines\n\n(there's lots of optional stuff)”\n\nQ: What kind of stuff did you change?\n\nA: All are optional:\n\n- the mixer thing (improves overall results in multistem configs; there are multiple types of mixer, I haven't tested all of them yet, the current one is a simple linear one like the OG mixer)\n\n- the small added conv layer with subband groups that I shared some days ago (improves overall spectral quality, subband artifacts are better/faster managed)\n\n- the first conv layer of the model is also changed to kinda group real and imag (Claude made this, and it seems to also give better quality)\n\n- depthwise sep conv in the tfc\\_tdf block (no big effect in my current test, aside from reducing model size, but it has a bigger effect with a bigger config)\n\n- a \"LearnableMelSTFT\" thing to replace the regular linear STFT (not tested enough yet)\n\n- a dual path bottleneck (not used here and not tested enough)\n\n- an alternative to the \"small added conv layer with subband groups\" with a small rnn (not used here and not tested enough)\n\nThe impact on required resources is very low with the ones I've used\n\nIt's messy + there's some stuff I haven't checked works yet”\n\n##### **Using Colab for training MDX23C model (on free T4)**\n\n“Can't train multistem model because of the limited resources of Colab, so it's one by one [if you want multistem].\n\nIt's only 1x Tesla T4, 15GB VRAM, so lots of [GPUs] can be [much] better!\n\nI can run batch\\_size = 8 with it (with n\\_fft=2048 instead of 8192 in the model config; other archs are using n\\_fft=2048 too [Demucs, Rofos…]).\n\nWhen you use the full runtime credits one day, the day after you get only 1h10min of GPU time (2 epochs).\n\n1. It's really boring to do\n\n2. You must have multiple Google accounts\n\n3. You must have a dataset, host it on GDrive and share it with all the accounts (and make it accessible at root for each account)\n\n4. 
Use this fork of ZFTurbo's training script that allows better resuming, which is required for Colab, as sessions are deleted after 3h~4h max (often less) <https://github.com/jarredou/Music-Source-Separation-Training/tree/wandb%2Bresume>\n\n5. Edit this config baseline according to your dataset/needs ([click](https://drive.google.com/file/d/1xl7pe8Atv89X6yt0siIqcVBH1S6l9btK/view?usp=sharing))\n\n6. Set the parameters and the GDrive connection accordingly, and run.\n\n7. When you've burnt all credits from one account, switch and rerun... and do that step-7 loop for weeks.\n\nI've only spent 3€ to get enough Google Drive space to host the dataset, other than that I was abusing free Colab (...)\n\nIn GDrive, you share the content with all accounts, then from the \"Shared with me\" tab of each account, create a shortcut to access the content at the root of each GDrive account, and then each account can access that shared content in Colab (...) I could be online to switch accounts every 3 hours or so easily. 
(...)\n\nI've started this experiment just to see if a lightweight mdx23c model could be trained with free Colab, but as I saw it was quickly achieving higher SDR than drumsep on my tiny eval dataset, I'm continuing the training; it's almost at 18 SDR now for kick” - jarredou\n\nQ: Any particular reason why the num steps are 1268 instead of 1000? Is that a random number or a calculation?\n\nA: I've read that it's better to use a multiple of the number of tracks in the dataset (here 317 × 4 = 1268), to avoid some of the tracks being used more than others in each epoch. (jarredou)\n\nThe first jarredou/Aufr33 drumsep model was trained with 3-second chunks.\n\n- Here you can find a [Colab](https://colab.research.google.com/drive/1EZP6tPoaZzm3NjEum4wj6-7E3cVWOHeO?usp=sharing) made by yukunelatyh (dead)\n\n- [Colab](https://colab.research.google.com/drive/17SSjougcnVhX6WewW88QoKKFuFiKNz8t?usp=sharing) by jazzpear96 (with the OG MSST repo; maybe you could just replace the link)\n\nLater on, I think I quote jarredou on training with extremely low-parameter configs on other archs as well.\n\n- [Here](https://discord.com/channels/708579735583588363/708912597239332866/1225994922822467674) the community guides a user through training on a 3060 12GB ([invite](https://discord.gg/ZPtAU5R6rP))\n\n\\_\\_\\_\\_\n\nJFI - Multi Source Diffusion\n\n<https://github.com/gladia-research-group/multi-source-diffusion-models>\n\nSome results posted by ByteDance were labelled as “MSS”, but it’s probably not the same arch. 
In the original MSS paper above, only Slakh2100 was used.\n\nByteDance probably expanded it further, and it was said they had issues with their legal department over making their work public, so they may equally use unauthorized stems just like us, or they're looking for ways to monetize their new discovery for TikTok, as their company has been investing heavily in GPUs lately, so something might happen, maybe at the end of the year, and maybe it will be released in their exclusive service (Ripple and CapCut were indeed released later). TBH, it's hard to get a good model using only public datasets. For public archs, it's even impossible. They probably know it too, so it’s kinda a grey zone, sadly, and the model trained later for Ripple was probably done from scratch and contains only lossy files for training from now on.\n\nByteDance (Ripple too?) was said to train on only 500 songs + MUSDBHQ\n\n###### **BS-Roformer**\n\n(now incorporated in [MSST](#_2y2nycmmf53) by ZFTurbo -\n\nFor weird errors with depth and stuff like:\n\n`Wrong shape: expected 4 dims. Received 3-dim tensor`\n\n\"zf has been testing some things with roformers lately, i already notified him that the rofos have bugs\n\nyou'll have to use an older version of mel\\_band\\_roformer.py, for example from 1-2 months ago\" - becruily\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/tree/v.1.0.4>\n\nold ass one, it doesn't even have sage attention, LoRA and all that shit - mesk)\n\nCurrently one of the two best archs SDR-wise, but the slowest of all the archs tested in this doc. Once considered SOTA (state of the art), but it has its own caveats, like very strong denoising (a double-edged sword, which can frequently give too muddy results). Using Mel-Roformer, proper config tweaks, prioritizing a stem other than vocals, and using a proper loss (which determines the [bleedless and fullness](#_le80353knnv5) of the model) helped with the muddiness issues in instrumentals. 
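\n\nThe band-splitting difference quoted further below is that BS-Roformer groups linear-frequency STFT bins into contiguous, non-overlapping bands, while Mel-Roformer uses mel-spaced bands that overlap their neighbours. A rough way to picture it is the numpy sketch below. It is only an illustration under assumptions, not code from lucidrains' repo or MSST. The band widths, the HTK mel formula, n\\_fft = 2048 (1025 bins), 44.1 kHz and 60 mel bands are all picked for the example, and `linear_bands`/`mel_bands` are hypothetical helpers.\n\n```python\nimport numpy as np\n\n\n# Assumption: HTK-style mel formula (librosa defaults to the Slaney variant instead).\ndef hz_to_mel(f):\n    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)\n\n\ndef mel_to_hz(m):\n    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)\n\n\ndef linear_bands(n_bins, widths):\n    # BS-style split (hypothetical helper): contiguous bin ranges, each bin used exactly once.\n    if sum(widths) != n_bins:\n        raise ValueError('band widths must add up to the number of bins')\n    edges = np.cumsum([0] + list(widths))\n    return [(int(lo), int(hi)) for lo, hi in zip(edges[:-1], edges[1:])]\n\n\ndef mel_bands(n_bins, sr, n_bands):\n    # Mel-style split (hypothetical helper): band i takes the bins under a triangle\n    # spanning mel edge points i .. i + 2, so neighbouring bands share bins.\n    bin_hz = np.linspace(0.0, sr / 2, n_bins)\n    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_bands + 2))\n    bands = []\n    for i in range(n_bands):\n        idx = np.nonzero((bin_hz > edges_hz[i]) & (bin_hz < edges_hz[i + 2]))[0]\n        if idx.size == 0:\n            raise ValueError('too many mel bands for this FFT size')\n        bands.append((int(idx[0]), int(idx[-1]) + 1))\n    return bands\n\n\n# Illustrative widths that add up to 1025 bins (narrow bands at the bottom, wide at the top).\nwidths = (2,) * 24 + (4,) * 12 + (12,) * 8 + (24,) * 8 + (48,) * 8 + (128, 129)\nbs = linear_bands(1025, widths)\nmel = mel_bands(1025, 44100, 60)\nprint(len(bs), bs[:3], bs[-1])    # 62 bands, each one starts where the previous ends\nprint(len(mel), mel[:3])          # 60 bands, each one overlaps the next\n```\n\nIn the BS split every bin belongs to exactly one band. In the mel split, the bins between two adjacent mel edge points are shared by two bands, which is the overlap jarredou and unwa mention.\n\n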
Plus, it seems like training a 2-stem model (dual/duality/deux) and then retraining it into a single-stem model gives good results. Also, Mel-Roformer has higher SDR according to the Mel paper (with the exception of bass), and it’s better for vocals than BS. Plus, Mel seems better at creating a duet singing separation model. “BS (...) uses more VRAM; unlike Mel, BS has no overlap between bands, so VRAM usage and model size are smaller.” - unwa. More below.\n\n“I find it cute how they call the Transformer based models (which destroy the older convnets) \"Roformers\" because they use RoPE embeddings. By that naming scheme, all llama-like models are Roformers too...” - kalomaze\n\n“By the way, it wasn't us who started calling Transformers using RoPE \"Roformers\".\n<https://arxiv.org/abs/2104.09864>\n\nMelBand-Roformer and BS-Roformer may also be considered as generative objective models in a sense. The goal of these models is to generate a mask to extract the desired stem from the mixed source” - unwa\n\n<https://github.com/lucidrains/BS-RoFormer> / [mirror](https://codeberg.org/lucidrains/BS-RoFormer) (OG repo, it incorporates both BS and Mel variants, implemented in the MSST training repo above)\n\nIt’s safe to say it’s the SAMI ByteDance arch from the MVSEP chart, reimplemented from their paper by lucidrains.\n\nArch papers:\n\nBS (band split): <https://arxiv.org/abs/2211.11917>\n\nMel (mel scale): <https://arxiv.org/pdf/2310.01809.pdf>\n\n*Technical details about Roformers trained from scratch*\n\n“ByteDance didn't give any info on training duration for these scores, but in their last [ISMIR2024] paper for Mel-Roformer:\n\n<https://arxiv.org/abs/2409.04702>\n\nwith L=12, they get 12.08 SDR on vocals with Musdb only by using 8 Nvidia A100-80GB GPUs with batch\\_size=64, and the training stopped at 400K steps (~40 days).”\n\n32x V100 will require two months of training (most likely for 500 songs only + MUSDBHQ)\n\n“It’s better to have 16-A100-80G”, viperx trained BS-Roformer 
with 4500 songs on 8xA100-80GB; after 4 days he reached epoch 74, and at epoch 162 achieved only 0.0467 better SDR for instrumentals.\n\nZFTurbo, having 4x A6000, gave up training it, facing 1 year of training time.\n\nAfter the BS variant, [**Mel-Band RoFormer**](https://github.com/lucidrains/BS-RoFormer/tree/main/bs_roformer), based on the band split one, was released (“Mel-Roformer uses a Mel-Band spectrogram whereas BS-Roformer doesn't”), which is faster (although becruily and stray\\_kids\\_filters claim that Mel is more GPU intensive than BS for training, and BS is faster). Initially, it achieved worse SDR than BS-Roformer, unlike in the paper. That was until Kimberley Jensen released her new Mel model and, on that occasion, tweaked the config in a way that brought the SDR on par with the BS variant, presumably while training on an even smaller dataset.\n\nLater, viperx trained a drums-only model on both BS and Mel-Roformer, and BS-Roformer was still slightly better, but the difference between the two was no longer big (12.52 vs 12.40 SDR).\n\n“Main diff is that BandSplit is using linear Hertz scale for frequency splitting while MelBand is using Mel scale (which is a more close representation of how humans are hearing frequency distances than linear Hertz scale).\n\nMelBand matrixing is by design using overlapping frequency bands, while with BS, there's no overlap in the frequency range.” jarredou\n\n“remember, just because the melscale is more perceptual, it doesn't necessarily translate into the neural net learning the representation better. It might be a good idea to use a modified zipformer” frazer\n\n**Kim’s Mel training** [config](https://drive.google.com/file/d/1Bx1UpCCtWbANF8V-mGkWrrDpefaO2P4g/view?usp=sharing) was made for H100 (the model on x-minus was trained for 3 weeks), and viperx probably later used it for his drums model or reworked it for his GPU.\n\nKim says 5.0e-05 is already a low enough learning rate; setting it lower may make training too slow. 
Kim “also said that towards the end of her training the loss would plateau, but the SDR was improving”, “no patience, no ReduceLROnPlateau at all. While, if set incorrectly, it can ruin your training very quickly” - jarredou\n\nWith unwa’s inst v1 model, it turned out that prioritizing a stem in the config matters a lot, so set it in the config depending on which model you want (other, instrumental, null, multistem like in the duality models, or vocals). However, prioritizing the instrumental stem produced noise similar to (but not the same as) VR or MDX-Net v2 models. We ended up with [code](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=drive_link) based on Aufr33's idea, recreated by becruily, that copies the phase from the Mel-Kim model (which is free of that noise and doesn’t prioritize the vocal stem like unwa's inst 1/1e/v2 models); this gets rid of some of the noise in those models. Aufr33's own implementation is added to the ensemble on x-minus.pro/uvronline, and the latest UVR patches include becruily's [script](#_j14b9cv2s5d9) rewrite.\n\nAccording to mesk, fine-tuning a fine-tuned model might be a worse solution than simply fine-tuning Kim's model (from experience training on a genre-oriented dataset, like the metal one mesk tried to train).\n\n“I got an error when I set num\\_stems to 2.” unwa\nYou can use “target\\_instrument: null” instead, which is also required for multistem training like in [this](https://imgur.com/a/eOSW8I7) example ~jarredou\n\n“increasing num\\_stems increases model size” “multistem is like having multiple checkpoints in one file (1 for each stem). All model types work like that with ZFTurbo's script AFAIK”\n\nSetting\n\nuse\\_torch\\_checkpoint: true\n\nin the current MSST repo will reduce VRAM usage.\n\nUsing various chunk\\_size values during different stages of training can be helpful, and so can using different dataset sizes (e.g. 
leaving only the cleaner or official tracks at certain points).\n\nTraining a 2-stem/dual fine-tuned model like deux and then retraining it to a single stem seems to yield good results.\n\nWarning! For weird errors with depth and stuff like:\n\nWrong shape: expected 4 dims. Received 3-dim tensor.\n\n>Becruily: ZFTurbo \"has been testing some things with Roformers lately, I already notified him that the Rofos have bugs.\n\nYou'll have to use an older version of mel\\_band\\_roformer.py, for example from 1-2 months ago\"\n\nMesk: I'm using this one:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/tree/v.1.0.4>\n\nOld ass one, it doesn't even have sage attention, LoRA and all that shit\n\nBecruily: They're useless anyway\n\nSo the error was fixed that way.\n\n\\_\\_\\_\n\nThe first BS-Roformer models were trained on ZFTurbo's dataset. Later, viperx trained on his own, presumably larger dataset (and possibly with better GPUs) and achieved better SDR; then another model was made by fine-tuning the viperx model on ZFTurbo's dataset. Kim’s Mel model was trained on Aufr33's dataset from UVR, and later unwa trained on Bas Curtiz’ dataset (too?).\n\nYou can use ZFTurbo's code as a base for training Roformers:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training>\n\n“change the batch size in config tho\n\nI think ZFTurbo sets the default config suited for a single A6000 (48GB)\n\nand chunksize” joowon\n\nSo, to sum up, BS-Roformer is the best publicly available arch SDR-wise for now (in practice, Mel-Roformer scores a bit lower on the same datasets with the same public code we currently have), although both are very, very demanding compared to MDX23C, MDX-Net v2 or VR (vocal-remover by tsurumeso).\n\nE.g. Aufr33 said that BS-Roformer turned out to be better for training a BVE model. “Although I get about zero or even negative SDR with both, the BS does the job better.\n\nMaybe it's not the architecture but the augmentation, I disabled it for BS”. 
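\n\nThis relates to joowon's tip above about adjusting batch size and chunk size. In ZFTurbo's configs, chunk\\_size is counted in samples, going by values such as 131584 and 352800 quoted later in this section. Its length in seconds therefore depends on the sample rate. Below is a small worked example. It assumes 44.1 kHz audio and, for the frame count only, a hypothetical hop\\_length of 512 with a centered STFT. Both are assumptions, not values taken from a specific config:\n\n$$T_{\\text{chunk}} = \\frac{\\text{chunk\\_size}}{f_s}: \\quad \\frac{352800}{44100} = 8.00\\ \\text{s}, \\quad \\frac{131584}{44100} \\approx 2.98\\ \\text{s}$$\n\n$$N_{\\text{frames}} = 1 + \\left\\lfloor \\frac{\\text{chunk\\_size}}{\\text{hop\\_length}} \\right\\rfloor: \\quad 1 + \\left\\lfloor \\frac{131584}{512} \\right\\rfloor = 1 + 257 = 258$$\n\nRoughly speaking, the Roformer's time-axis attention works on one position per STFT frame, so activation memory per training example grows with chunk\\_size. That is why chunk\\_size and batch\\_size usually get traded off against each other when a config doesn't fit in VRAM.\n\n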
Some other explanations:\n\n“In BS-Roformer they don't do any downsampling or compression”, hence it’s so slow to train.\n\n“I've noticed that it separates mono (i.e., panned in the center) harmonies well if the two voices belong to different people, and much worse if it's the same singer (even though they differ by a third interval).\n\nThis leads me to believe that the BS architecture would work well for separating female and male voices.” (after some problems with SDR being in the region of 0 or 1) “I have another thought. Roformer works differently with stereo, not like VR. It's like it partially merges channels, which is bad for backing vocal detection. It seems to me that using a stereo expander would help.” - Aufr33. He later started training Mel from scratch on 4x RTX 6000 Ada using a stereo expander and a lower learning rate, and eventually the SDR started to increase, but later negative values showed up. He then continued training BS, and the SDR got above 1, but it stopped progressing: “It seems that 525 songs is not enough to train a [BS/Mel] model.”\n\nSome things here may already be outdated, as some optimizations have since been introduced to the training code (read more below):\n\n\\_\\_\n\nTime-wise, BS vs Mel: instead of 16x A100 for BS-Roformer, Mel might need something like 14x A100 to train in decent time. But at best, without the config tweak, its SDR will only be on par with MDX23 and MDX-Net v2 archs, and BS-Roformer will achieve better SDR than Mel-Band. There might be some issue in the Mel-Band Roformer reimplementation, or maybe the paper is missing something. 
Only for BS-Roformer did some of the original ByteDance authors take part in reviewing lucidrains' reimplementation code.\n\nOn Mel-Band, epoch 3005 took 40 days on 2xA100-40GB with the previous viperx model.\n\nviperx trained his own vocal model using BS-Roformer on 4500+ songs (studio stems, 270+ hours) on 8xA100-80GB, and already at epoch 74 he almost surpassed the sami-bytedance-v.0.1.1 result (which was actually a multistem model iirc), achieving 16.9279 for instrumentals and 10.6204 for vocals.\n\nAt epoch 162, he achieved 16.9746 and 10.6671, which for instrumentals is now only a 0.0017 SDR difference vs the v.0.1.1 result.\n\nTraining settings:\n\nchunk\\_size 7.99s\n\ndim 512 / depth 12\n\nTotal params: 159,758,796\n\nbatch\\_size 16\n\ngradient\\_accumulation\\_steps: 1\n\nSince epoch 74: “added +126 songs to my dataset”\n\nTraining progress:\n\n<https://ibb.co/1zfFX82>\n\nSource:\n\n[https://web.archive.org/web/20240126220641/https://mvsep.com/quality\\_checker/multisong\\_leaderboard?sort=instrum](https://web.archive.org/web/20240126220641/https%3A//mvsep.com/quality_checker/multisong_leaderboard?sort=instrum)\n\n[https://web.archive.org/web/20240126220559/https://mvsep.com/quality\\_checker/entry/5883](https://web.archive.org/web/20240126220559/https%3A//mvsep.com/quality_checker/entry/5883)\n\nIt sounds similar to the Ripple model.\n\n\"7 days training on 8xA100-80GB\": 7 × 24 × 15.12 (RunPod 8xA100 price per hour) = $2540.16\n\nviperx trained on dataset type 2, meaning that he had 2 folders:\n\nvocals and other, and no augmentations\n\n“For more detailed infos, you can read ZFTurbo's doc about dataset types handled by his script <https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/dataset_types.md>”\n\nviperx had trained the faster Mel-Roformer arch variant before: at epoch 3005, after 40 days on 2xA100-40GB with 4500 songs, he achieved only 16.0136 for instrumentals and 9.7061 for vocals, which is on par with 
MDX-Net voc\\_ft model (2021 arch).\n\n“Each epoch [in Mel-Roformer] with 600 steps took approximately 7 to 10 minutes, while epochs with 1000 steps took around 14 to 15 minutes. These are estimated times.\n\nInitially, I suspected that the SDR was not improving due to using only 2xA100-40GB GPUs. After conducting tests with 8 x 80GB A100 GPUs, I observed that the SDR remained stagnant, suggesting that the issue might be related to an error in the implementation of the Mel-Roformer architecture.” [More info](https://mvsep.com/quality_checker/entry/6019) ([copy](https://web.archive.org/web/20240211181707/https%3A//mvsep.com/quality_checker/entry/6016)). This was probably the issue (at least partially?) fixed by Kim's config tweaks.\n\nLater, viperx’s BS-Roformer model was further trained from its checkpoint by ZFTurbo, and it surpassed all the previously released models, and even ensembles, at least SDR-wise. Then it was fine-tuned on a different dataset. Still, like all Roformers, it may share some characteristic features, like occasional muddiness and a filtered sound at times, but the Mel variant seems to be less muddy.\n\nMore insights:\n\n<https://github.com/lucidrains/BS-RoFormer/issues/4#issuecomment-1738604015>\n\n<https://media.discordapp.net/attachments/708579735583588366/1156700109682262129/image.png?ex=6516952c&is=651543ac&hm=988a5acc32f075988c1701d41c2090321a25955c4ffedd64516e0062fa1002e0>\n\n<https://cdn.discordapp.com/attachments/708579735583588366/1156700305069707315/image.png?ex=6516955b&is=651543db&hm=bf5737f95f3a93fd3e3a23a679e2ad0031e0feb6c622fbb85eafa053ed483e08>\n\n<https://media.discordapp.net/attachments/708579735583588366/1156700453585829898/image.png?ex=6516957e&is=651543fe&hm=06ed766b39c3c7f4a8329420a22bcc572e856116a6e1cea030d158c984c46825>\n\nZFTurbo:\n\n\"1) The best [BS-Roformer] model with the best parameters can be trained only on A100, and you need several GPUs. The best is to use 8. 
It reduces possibilities of training by enthusiasts.\n\n[later it was found that checkpointing decreased VRAM usage, probably allowing more modest GPUs to be used]\n\nAll other models like HTDemucs or MDX23C can be trained on a single GPU. Lower parameter BS-Roformers don't give the best results. But maybe it's possible. Solution:\n\nWe need to try to train a smaller version which will be equal to the current big version. Lower depth, lower dim, less chunk\\_size. We need to achieve at least batch 4 for a single GPU. Having such a model can be useful as a starting point for fine-tuning for other tasks/stems\n\n[perhaps unwa’s 400MB exp value residue model meets those requirements]\n\n2) I also noticed a strange problem I didn't solve yet. If you try to finetune a version trained on A100 on some cards other than A100, then SDR drops to zero after the first epoch. Looks like \"Flash attention\" has some differences (???).\n\n3) Training is extremely slow. And I noticed BS-Roformer is more sensitive to some errors in the dataset.\n\n[probably for 4090]\n\nchunk\\_size: 131584\n\ndim: 256\n\ndepth: 6\n\nI think these settings can give batch\\_size > 4\n\nFor example, I can't finetune the viperx model on my computer with a 48GB A6000 because the model is too large.\n\nchunk\\_size is what affects model size the most, I think. 
And I saw it's possible to get a good result with a small chunk size.\n\nI put the table here:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/bs_roformer_info.md>”\n\n[see also <https://lambdalabs.com/gpu-benchmarks> (choose batch size in Metric, fp16), but ZFTurbo said that training in fp32 is also possible]\n\nThe 0 SDR issue was later fixed: “the non-A100 issue is fixed with the latest torch, but I think the general rule is that batch size 1 (like in my case with a 3090) won't give good results on Roformers.\n\nBut I’ve been doing batch size 2-4 with A6000 and A100 and no issues there.\n\nBut the model is also large; when I fine-tuned a smaller Mel-Roformer, the 3090 worked there”\n\n“Full sized Roformers are very heavy to train, and unless you have crazy hardware like an A6000 (the very minimum), A100 etc., training from scratch will take months to get a good model (with SDR similar to current ones, and this is without considering the dataset).\n\nMaybe someone with a 4090 can give more insight, but I personally can't train/fine-tune a full sized Roformer model with my 3090, it's way too weak; I'd have to make the model smaller, meaning the quality won't be as good as the original checkpoint” - becruily\n\n##### **More hints/FAQ for training Roformers**\n\nQ: Can we change the model size of an existing model and fine-tune it? 
Or must it have been trained from scratch with the same chunk size?\n\nA: 1) If you just decrease chunk size, it will work almost the same as with a larger one (as I remember)\n\n2) If you decrease dim or depth, the score will drop very much\"\n\nDon't forget: each time you change something in the dataset, you have to delete the metadata\\_x.pkl file so that a new database is created on training launch, taking the new changes into account (it drove me crazy during my first tests when I forgot to delete it)\n\nI've just checked ZFTurbo's code, and for dataset type 2, the \".wav\" extension is still required for the script to find the files (it doesn't work with any other)\n\n- Changing dims in the middle of training is maybe not the best thing to do, unless you really know what you're doing. chunk\\_size, yeah, but wait until it becomes stagnant/slows down - jarredou\n\nQ: What can I increase in the config.yaml to train the model better?\n\nA: Depth - unwa\n\nQ: Is it possible to fine-tune on a 3080 Ti 8GB? unwa did it [it was actually a 3070 Ti 8GB]\n\nA: “By reducing the chunk\\_size and using AdamW8bit for the optimizer, I was able to train even with 8GB of VRAM.” Usually, fine-tuning on an inferior GPU decreased SDR. Here it only dropped by 0.1 SDR after a few thousand steps of training (so only a few epochs) on the musdb18hq+moises+original dataset (41 songs; in total 432 tracks, 16.5GB, FLAC, a very small dataset, while the OG was trained on 5K songs) ~becruily\n\nConfig used for fine-tuning:\n\n<https://drive.google.com/file/d/1gK1_n_bpRHD1i02VA2bgUc3TrpEJUcg9/view?usp=drive_link>\n\ntrain.py with the optimizer (it's probably already integrated into the MSST code):\n\n<https://drive.google.com/file/d/1jLSTDajYxZRSb5wLOwyVuRJrayNLIWUZ/view?usp=sharing>\n\nThe changes were pushed to ZFTurbo's training repo, but the optimizer turned out not to save much VRAM ([diagram](https://discord.com/channels/708579735583588363/911050124661227542/1268191215875129387)).\n\n“'adamw8bit' for training.optimizer in config). 
Also added possibility to provide optimizer parameters in the config file using keyword 'optimizer'.” - ZFTurbo\nOptimizers are explained further below.\n\nIt’s also worth pointing out that the model trained by unwa turned out to have the same 0 SDR issue becruily had, but unwa didn’t notice it due to the lack of a validation dataset. So inference worked correctly despite the 0 SDR, but the model was gradually getting worse during fine-tuning.\n\nFrazer's suggestion (probably already implemented by ZFTurbo):\n\n“I think the issue with the checkpointing / 0 SDR bug is due to\n\n> At least one input and output must have requires\\_grad=True for the reentrant variant. If this condition is unmet, the checkpointed part of the model will not have gradients. The non-reentrant version does not have this requirement.\n\nSo I think the fix is either assigning\n\n*x.requires\\_grad=True*\n\nin train.py to the batch tensor before passing it into the model\n\nor passing\n\n*use\\_reentrant=False*\n\nto all torch.utils.checkpoint.checkpoint calls\n\nI think it's probably better to use the non-reentrant variant, since PyTorch will default to it in later versions\n\n<https://pytorch.org/docs/2.1/checkpoint.html#torch.utils.checkpoint.checkpoint>\n\nAlso, this point here looks interesting to test, whether it causes a significant performance hit or not:\n\n> The logic to stash and restore RNG states can incur a moderate performance hit depending on the runtime of checkpointed operations. 
If deterministic output compared to non-checkpointed passes is not required, supply preserve\\_rng\\_state=False to checkpoint or checkpoint\\_sequential to omit stashing and restoring the RNG state during each checkpoint.”\n\n- jarredou: “Optimizers play a role in the output quality and characteristics more than just speed of training. Using @BortSampson's \"live training stft visualizer\" (with some tweaks, like mel scale) and mdx23c (because it's fast), I made these videos with different optimizers, applied on the exact same chunk. Some are close to each other, but some behave differently, leading to quite different outputs (...) originally 1 video frame = 1 step (so it's training with batch\\_size=1 on a unique chunk to make the visualisation possible), but it was sped up to fit in 30s videos\n\nThis experiment has a lot of possible bias, but the goal was not to show the \"best optimizer\", just that they behave differently, and so give different results” [See](https://discord.com/channels/708579735583588363/708912597239332866/1425662684124418079)\n\nQ: Does gradient clipping still matter, but not patience/reduce\\_factor?\n\nanvuew: with multi-resolution STFT loss, the grad is stable; most of the time you don't need grad clipping\n\nbecruily: if you're fine-tuning with Prodigy, I'd rather set grad clip to 0, but reduce the LR to 0.1 if it explodes\n\nQ: Should the learning rate always be 1.0 with Prodigy? - AngelB\n\nbecruily: If it’s stable and the model learns, yes, leave it at 1.0\n\nQ: If it doesn't improve anymore, is the training basically \"done\"? Or can we still \"downgrade\" the learning rate by 0.95?\n\nA: For Prodigy, you can try 10x lower or 100x lower; if it’s still not learning, then it’s probably done (with the current dataset)\n\nQ: I changed the recommended settings in Colab and it improved a lot; the drums no longer interfere with the vocals. 
Now I have a problem: even after removing backing vocals, reverb, and finally echo, I still have a slight hint of reverb or echo (I don't know which), but it appears from time to time.\n\n- frazer: me and becruily found something weird with Adam: if you reset after convergence (stop training, and rerun with a new start checkpoint) and use the last checkpoint as the warm start... you just keep jumping up in SDR\n\nI think there's something kinda bad about Adam's momentum, where resetting it seems to allow the model to learn more than using the same optimizer state\n\nthe gains are silly as well\n\n[pic](https://imgur.com/a/BzgX579)\n\n- Just noticed that PyTorch 2.9 has officially implemented the Muon optimizer\n\n<https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html>\n\n- It was discovered that using different chunk\\_sizes at various stages of training can be beneficial (iirc esp. for training time in the initial stages, without much SDR sacrifice).\n\nQ: Is it okay to mess around with the config when restarting the training? Or should I just leave it as it was?\n\nA: Unless there are some issues really messing up your training (and most of the time you can spot those in the first trained/fine-tuned epoch's results), with that kind of AI training you'll need hours, days or weeks to really know the outcomes of your changes (especially with Roformers). With a small arch and config, you can get insights more quickly (that's why I'm currently doing these tests with the mdx23c arch) - jarredou\n\n- “I added code to train using the accelerate module:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/train_accelerate.py>\n\nIt's useful on multi-GPU systems. In my experiments, speed improved ~50%.\n\n~1.57 sec per iteration goes down to ~1.07 sec per iteration.\n\nBut I think the script has some flaws - my validation score during training is lower than in reality. 
I didn't find the reason yet.\n\nAlso, the script allows training across multiple machines without changes in code.\n\nMore information here:\n\n<https://huggingface.co/docs/accelerate/index>\n\n<https://huggingface.co/docs/accelerate/quicktour>” - ZFTurbo\n\n- In October 2024, unwa said that what allowed the fine-tunes to be made with 8GB VRAM on an RTX 3070 was gradient checkpointing: “time and space are a trade-off, and [gradient checkpointing](https://github.com/cybertronai/gradient-checkpointing) saves memory at the expense of computation time”\n\nQ: And you don't get the 0 SDR issue?\n\nA: “Yes. I'm using the L1Freq metric now. As it turns out, it was not a failure to train properly, but just a problem with the validation function.” - unwa\n\n[More](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/82) on the issue. “The validation issue should be solved now [in valid.py], but not sure if it was the same issue ZFTurbo was facing”\n\n- “<https://huggingface.co/pcunwa/Mel-Band-Roformer-small>\n\nIn the experiments with the Mel-Band Roformer big model, it was confirmed that increasing the number of parameters for the Mask Estimator did not improve performance.\n\nTherefore, I conducted an experiment to see if I could reduce the number of parameters while maintaining the performance.\n\nIt even runs well on 4 GB cards due to the reduced memory used.” - unwa\n\n(10.10.24) ZFTurbo: “I looked into unwa's code for small Roformers. Roformers have one parameter, mlp\\_expansion\\_factor, which couldn't be changed from the config and was fixed at 4. 
It uses a lot of memory:\n\n│ └─MaskEstimator: 2-8 [1, 1101, 7916] --\n\n│ │ └─ModuleList: 3-73 -- 201,465,304\n\nif set to 1 (memory reduced 10 times 200 MB to 23 MB):\n\n│ └─MaskEstimator: 2-8 [1, 1101, 7916] --\n\n│ │ └─ModuleList: 3-73 -- 23,836,120\n\nYesterday I already added to my repository the possibility to change mlp\\_expansion\\_factor from the config.\n\nUnfortunately, while the overall number of weights is greatly reduced, it won't allow to greatly increase speed or batch size for training:\n\nMel band (384, 6, chunk: 352800)\n\nmlp\\_expansion\\_factor = 4, normal training: batch size: 2 (1.27 s/it)\n\nmlp\\_expansion\\_factor = 3, normal training: batch size: 2 (1.20 s/it)\n\nmlp\\_expansion\\_factor = 2, normal training: batch size: 2 (1.19 s/it)\n\nmlp\\_expansion\\_factor = 1, normal training: batch size: 2 (1.16 s/it)\n\nEven in the last case, batch size 3 is not possible\n\n- I will check how the \"checkpointing\" method works\n\nOMG checkpointing technique impressed me a lot!!\n\n**It reduced required memory ~20 times**\n\nMel band (384, 6, chunk: 352800) Single A6000 GPU 48 GB\n\nmlp\\_expansion\\_factor = 4, normal training: batch size: 2 (1.27 s/it) - 0.635 sec per image\n\nmlp\\_expansion\\_factor = 3, normal training: batch size: 2 (1.20 s/it) - 0.600 sec per image\n\nmlp\\_expansion\\_factor = 2, normal training: batch size: 2 (1.19 s/it) - 0.595 sec per image\n\nmlp\\_expansion\\_factor = 1, normal training: batch size: 2 (1.16 s/it) - 0.580 sec per image\n\nmlp\\_expansion\\_factor = 1, low mem training: batch size: 2 (0.60 s/it) - 0.300 sec per image\n\nmlp\\_expansion\\_factor = 1, low mem training: batch size: 40 (6.32 s/it) - 0.158 sec per image\n\nmlp\\_expansion\\_factor = 4, low mem training: batch size: 40 (6.87 s/it) - 0.171 sec per image\n\n*So my batch size for single GPU grew from 2 to 40*\n\n*[So maybe 16x A100 won’t be necessary to train a good model]*\n\nAnd speed per image increased ~4 times.\n\nOK, changes are in the repo. 
To train with low memory, you need to replace only one thing: mel\\_band\\_roformer -> mel\\_band\\_roformer\\_low\\_mem. And increase batch\\_size in the config. All weights and model parameters are the same.\n\nThe same can be done for BS-Roformer as well (need to add).\n\nWith the current memory improvements, we can try big depths for training\n\nBS-Roformer with depth 12 now has batch\\_size: 32\n\nWe can add a sum of inputs, for example for every 3 blocks of freq and time transformer blocks.\n\nOr even use the DenseNet approach.\n\nI found a problem: if the internal loss calculation for Roformers based on FFT is used, batch size is reduced to 12 instead of 40.\n\nLoss calculation inside the model consumes too much memory.” - ZFTurbo\n\nunwa: The core of the model is the Roformer block, and the Mask Estimator probably did not need that many parameters.\n\nAccording to the paper, the entire model has 105M parameters, whereas when mlp\\_expansion\\_factor is 4, the Mask Estimator alone exceeds that number by a wide margin.\n\nSorry, I forgot about this, I had removed 4096 from multi\\_stft\\_resolutions\\_window\\_sizes.\n\nQ: Is the speed also faster despite the use of gradient checkpointing because memory bandwidth was the bottleneck?\n\nA: I don't know, maybe yes. Now we need to ensure it doesn't affect the training process, and that the quality of models stays the same.\n\nSo there is no need to decrease mlp\\_expansion\\_factor from 4 to 1 currently (maybe later, for training new models).\n\nI will add the possibility to train with low mem to my repo in several minutes\n\nI think the speed-up is because of the benefits of large matrix multiplication (because it's calculated for 40 images at the same time).\n\nQ: Can gradient checkpointing be applied to other architectures? 
For example, SCNet?\n\nA: I think yes, but maybe with lower benefits.\n\n- “One thing to keep in mind too is that Rofos are using flash attention by default, which is not compatible with all GPUs (not with small ones), and this flash attention is greatly reducing training duration, from what I get. Non-compatible GPUs use lower performing attention.\n\n<https://github.com/Dao-AILab/flash-attention>”\n\nSupported GPUs: <https://imgur.com/a/QGtSuKR> ([src](https://github.com/Dao-AILab/flash-attention#nvidia-cuda-support)). Half a year later, FlashAttention-2 is still unsupported on the RTX 2000 series, and an FA3 beta is available for Hopper GPUs (e.g. H100).\n\nQ: The training script defaults to memory efficient attention unless you use an A100 though; does this mean ZF didn't implement flash attention for the other GPUs listed there?\n\nA: I don't know. I think it's more related to the custom flash attention module made by lucidrains, who doesn't use this flash-attention repo. frazer and unwa had a discussion about that in #dev-talk some days ago; I didn't understand everything, I just learnt that it was a custom implementation\n\nanvuew made a pull request (“Enable flash attention for compute capability >= 8.0”, so GPUs from the RTX 3000 series onward) <https://github.com/ZFTurbo/Music-Source-Separation-Training/pull/52> (already merged)\n\nQ: Is Adam or AdamW better for Mel-Roformer?\n\nA: “This is not particularly relevant if weight decay is not used.\n\nIn Adam it was implemented with L2 regularization; in AdamW it is implemented in its original [decoupled] form.” - unwa\nBas Curtiz later conducted some experiments to find the best optimizer, and the winner was:\nProdigy (with LR 1.0), compared on the same number of steps (~14K) and ~20 hours in (at least when training from scratch). 
[More](https://discord.com/channels/708579735583588363/708579735583588366/1303100703144804394)\n\n- On unwa’s v1e model:\n\n“1) the v1e fine-tune model was trained with a custom loss; if you don't use the same or a similar multi-resolution loss, your results will be bad\n\n2) because of [the] 1[st], SDR is not the best metric to keep track of how good the model is; SDR will be lower when training with mrstft losses\n\n3) you must set the target instrument to other, and yes, you need more vocals” - becruily\n\n- unwa “reduced mask estimator depth from 3 to 2, and he said that it didn't hurt the quality but reduced the size significantly. Also there's an additional line 'mlp\\_expansion\\_factor: 1' in his small model config. Maybe that helped somehow too.”\n“The mask estimator is already 2 by default on the Kim model, for example (and in Bort's config too)”\n“I have unwa's beta and duality models, and on batch size 1 they don't eat that much memory”\n“inference maxes out the GPU mem and sits at 0%\n\nckpt file is 3GB.\n\nMoral of this story is try running inference before you get to epoch 80”\n\nQ: Does anyone here know how much VRAM is required to train a Roformer model with the same specs as unwa's and Kim's models?\n\nA: ZFTurbo made this small benchmark with BS-Roformer some months ago:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/bs_roformer_info.md>\n\nand a [newer](https://imgur.com/a/G97LcMz) one for Mel-Roformer\n\n- ZFTurbo experimented with creating a 4-stem Mel model on MUSDB18, and struggled to get results as good as in the paper. 
[Here](https://imgur.com/a/cWme9ZW) are the parameters he evaluated and the SDR he achieved.\nEventually, he released a checkpoint with different parameters [here](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/mel_roformer_experiments.md).\n\nLater, he trained a 4-stem BS-Roformer model on MUSDB18.\n\nQ: Is it not possible to emphasize both SDR and Aura scores?\n\nA: “Training the model using [l1\\_freq and AuraMRSTFT] metrics is prone to phase problems. Like my 5e and v1e models” - unwa\n\n- “You can enable spectral phase with auraloss (never really tried)” J.\n\nA: It makes training less stable; the loss is more likely to be NaN\n\nIt is difficult to optimize a phase spectrogram that looks like noise.\n\nA: “You don't emphasize a model with them, they're just metrics, and you're tracking how well each epoch scores\n\nthe metrics will go up and down, but it doesn't specifically emphasize the chosen metric B.”\n\n“Phase also has a significant impact on sound quality and cannot be ignored. 
However, these metrics ignore phase; models that emphasize fullness are those that compromise phase optimization to some degree in favor of optimizing the amplitude spectrogram, and clearly these metrics favor such models.\n\nI would endorse the log\\_wmse metric.\n\nIt is a relatively new time-domain metric that, compared to SDR and SI-SDR, is not overly sensitive to low frequencies (like SDR is) and can accurately evaluate silent intervals.\n\nIn addition, time-domain metrics can be evaluated for both amplitude and phase.”\n\nQ: Is there a version of Mel-Roformer without all the nonsense value residuals and stuff like that?\n\nA: Yeah, ZFTurbo separated those into a new file a few weeks ago\n\n“<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/models/bs_roformer/mel_band_roformer.py>\n\nor I guess lucidrains' file is as bare/OG as it gets”\n\nLater, frazer’s blocks were posted somewhere in #dev-talk.\n\n- If anyone wants to see what I consider to be a readable implementation of BS-Roformer, at least to my brain:\n\n<https://github.com/polson/MusicSep-Lightning/blob/main/model/bs_roformer/model.py>\n\n- BortSampson\n\n- Some Roformer training insights on the far right of [this](https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?gid=1025734018#gid=1025734018) Bas Curtiz sheet (the phase image may overlap the text; navigate with the arrows to read it down below).\n\n- “bleedless/fullness metrics are based on STFT magnitude only, and as they discard the phase data, they have some kind of blind spots.\n\nI guess this noise could also be reduced by using higher n\\_fft values for the model (smaller bins, finer freq separation, but way more resources needed to train models)” - jarredou\nQ: High n\\_fft values increase the frequency resolution but decrease the time resolution\n\nA: Yeah. But Roformers are trained with 2048, it's not a high value. 
The original MDX23C models use 8192 by default, with improved results compared to lower values. (ZFTurbo ran some tests back then comparing different n\\_fft/hop\\_length configs)\n\nQ: Does it matter that much when the model is trained with multi-resolution? It should cover both low and high n\\_fft values.\n\nA: n\\_fft=2048 is around 21.53 Hz resolution per bin (on a linear scale; at 44.1 kHz the bin spacing is sample rate / n\\_fft = 44100 / 2048)\n\nwhile n\\_fft=8192 gives about 5.38 Hz resolution per bin (44100 / 8192)\n\nThis should benefit most stem types (maybe not drums and transient-heavy content tho)\n\nWe don't have a multi-resolution arch yet, even if it could be interesting. Only the loss is multi-resolution, not the model. - jarredou\n\nQ: Does fine-tuning need to reach hundreds of epochs?\n\nA: “Not always. Depends on the amount of data - if it's small you could literally get away with training for a day or half a day” 40 hours is enough for just fine-tuning (frazer)\n\nQ: Is there something wrong with the config I’m using? I’ve already tried training 3 times, and the pattern is always the same: after the training metrics improve about 6 times, it becomes really difficult for them to improve any further in the next epochs. I’ve already tried changing the lr to 5e-5 and 1e-5, but it's still the same\n\nA: “try fiddling with Adam's betas - change from betas=(0.9, 0.999) to betas=(0.8, 0.99)\n\nAdafactor might be a cool thing to test as well\n\nadd this into the config, it might work”\n\n*optimizer:*\n\n*betas: (0.8, 0.99)*\n\n(frazer)\n\nIndentation: put optimizer: on a new line at the same level as training and inference, placed below training (so actually no indentation)\n\nQ: So far the metrics keep improving, unlike the previous training. It didn't just cap like last time.\n\nA: After e.g. 14 epochs, you can try restarting from the checkpoint with the best SDR weights, because it will do this \"it's weird.
The optimizer for whatever reason gets fucked up after a while\": [pics](https://imgur.com/a/41XSTNO) ([src](https://discord.com/channels/708579735583588363/708912597239332866/1416006626497794048))\n\n(frazer)\n\nQ: How do I set up these graphs?\n\nA: --wandb\\_key YOURKEY\n\n- Loss 0 issue\n\n<https://discord.com/channels/708579735583588363/1220364005034561628/1416389629866672261>\n\nQ: Is this right? I resumed yesterday’s training, but it went back to epoch 0\n\nA: yea, that's normal because you're restarting; I think there's something to make it save that data, but IDK\n\n- 2nd epoch takes a long time issue\n\n<https://discord.com/channels/708579735583588363/708912597239332866/1415577958646808636>\n\n“When VRAM runs out and begins using main memory, processing becomes extremely slow.”\n\n- SDR is measured in decibels (a logarithmic scale): +10 dB of SDR means a 10x better ratio of target energy to distortion energy, while +1 dB is only about 1.26x.\n\nQ: Why do I have negative SDR values (based on HTDemucs)?\n\nA: Make sure there are no empty stems in your training and/or validation dataset (a silent reference stem has no target energy, so any bleed in the estimate makes its SDR hugely negative).\n\nBelow is just a theory for now and probably wasn't strictly tested on any model yet, but it seems promising\n\nQ: Can you not calculate the average dB of the stems and fit one limiting value to them all?\n\nA: the stems are divide-maxed beforehand\n\nmeaning they are made so that when joined together, they won't clip\n\nbut are normalized\n\nso they will be kinda standardized already\n\nbased on that, I should be able to just go with one static value for all\n\nExample\n\n<https://www.youtube.com/watch?v=JYwslDs-t4k>\n\nQ: This is great, I actually used this method before with a few sets of stems, before I decided to try sidechain compression/Voxengo Elephant method, but I'm not too sure if I am on the right path.
However, I'm pretty sure this works best only for evaluation, if the resulting mixture has consistent loudness like today's music.\n\nA: Yeah, it's a different approach than compression/Voxengo indeed.\n\nBut it scored high in SDR, and the UVR dataset is already compressed/Elephant-ed\n\nI think it's a good combo to use both in the set, a bit like new-style tracks and oldies [so to use both approaches inside the dataset]\n\nsome tracks in real life are compressed like fuck - some aren't\n\nso it mimics real-life situations\n\nQ: If it's true, that's awesome; with that, the model basically has the potential to work with multiple mixing styles, without having to create new data or change it, right?\n\nWhile still adding new data\n\nA: Yeah, since the UVR dataset is already compressed - and then add these ones of mine with the more delicate way of mastering (incl. divide-max beforehand)\n\nQ: How do I make more metrics show up like this during training?\n\nA: --metrics sdr si\\_sdr log\\_wmse l1\\_freq aura\\_stft aura\\_mrstft bleedless fullness\n\n- “inverts can bring good quality shit if you clean the remaining instrumentals in your vocals with a Roformer (...) (ofc this only works if you possess both the original songs and instrumental versions, which you have to align perfectly)” - mesk\n\n- I discovered that viperx's BS-Roformer weights contain a large number of consecutive zeros. I suspect this might be due to using smaller-dim weights (192) that do not match the config (512) as a starting checkpoint.
To what extent would this affect the model's performance?\n\nQ: I was wondering, with mixup vs aligned stems, does anyone know if much accuracy is lost (if any)?\n\nA: ZF used to say aligned works better; lately I tend to agree with him\n\nthe Roformer papers used mixup to improve quality, so it's contradictory (probably more useful when the dataset is small)\n\n“One observation:\n\n1) If I fine-tune the model on aligned tracks (take all stems from the same place in the song), I can increase the score on the multisong dataset, but the synth dataset score dropped. So the model cares more about song structure, because the Synth dataset is a random mix of samples.\n\nSo the model became less universal and more song-related.” - ZFTurbo ([evals](https://discord.com/channels/708579735583588363/911050124661227542/1214822263111815198))\n\nQ: I want to make a vocal model that is really good for 1 specific artist\n\nGabox: Try LoRA.\n\nI tried to do that with a Japanese singer with my only BS karaoke model.\n\nKinda worked but idk\n\nQ: If I have like 100 sessions of an artist, will that be enough?\n\nG: No. 900\n\n- Saving weights as float16 halves file sizes, but keeps inference absolutely the same if you used `use\\_amp: True` for training. - ZFTurbo\n\nQ: What [metrics](#_le80353knnv5) should I monitor if I want to train fullness?\n\nunwa: aura mrstft\n\nmesk: I actually think the amplitude spectrogram fix with the multi\\_stft\\_window\\_sizes change is the best way to make a fullness model sound less noisy; well, you need to have a balance between bleedless and fullness, so you have to know when to stop the training by looking at the metrics.\nthis was my biggest complaint about 07.2025, the fact that it was a muddy vocal model, so I'm very excited.
sure, I can't release it because it's a fine-tune of said model, but at this point I don't really care\n\n- Starting with a chunk size of about three seconds will yield good results - unwa\n\nQ: I read some old messages where you were training very small chunk sizes with a large batch to start off Roformer training. Did you pick an overall strategy there to swap in larger chunk sizes at a certain point to build past like 8/9 SDR? Heard you might be able to do that with RoPE encodings without changing the model size so you can continue, but don't know yet.\n\nI'm training a Roformer from zero and it's steadily improving, but it's been about 1.5 days and I'm up to like 5.2 SDR lol. Happy to read up on old threads with pointers too. - sammithy\n\nA: The BS-Roformer paper config is a large, slow-to-train model; you can try a lower chunk size and increase batch size if you don't want to decrease params, yeah. It might train faster, but eventually you gotta increase chunk size once it plateaus.\n\nMost of the time I just trained my own made-up arch for fun, and I never trained BS-Roformer long enough to get anywhere cool with it, so idk how long it takes - BortSampson\n\nQ: Must we use only 1000 steps per epoch? Or can we increase the steps? I'm worried increasing steps per epoch might lead to overfitting. I am using dataset type 4. Or is there a way for each epoch to see the entire training dataset?\n\nan epoch is how many times the model sees the entire dataset, basically, 1000 steps means it sees the dataset 1000 times - mesk\n\nbecruily: not really, there was a new dataset type added recently (5) that was actually made for the purpose of using the entire dataset; the step count will show much higher because that's how many steps it needs to read the entire dataset, otherwise one epoch rarely passes over the entire dataset\n\nQ: Ah, I see.
Is it connected to the batch size, or is this independent of batch size?\n\nA: ye, batch size is, uh, roughly how many chunks the model learns from per step; a batch size of 3 would mean 3 chunks of XX seconds per step\n\nQ: So if batch size was, let's say, 2 and each chunk is roughly 3 seconds, would it still see the entire training dataset, even if its overall length is like 4 hours?\n\nso in this case:\n\n1000 steps with batch size 3 and 11-second chunks would mean: 1000 steps, each taking 3 random 11-second chunks from the dataset (if I understand that correctly, feel free to correct me xDDD)\n\n[Rough arithmetic, assuming random chunk sampling: 1000 steps × 2 × 3 s = 6000 s, about 1.7 hours of audio per epoch, so a 4-hour dataset is not fully covered in one epoch; 1000 steps × 3 × 11 s = 33,000 s, about 9.2 hours per epoch, but drawn at random, so some parts may be seen several times and others not at all.]\n\nQ: Today I installed the latest version of MSST. But how can I choose the loss function(s) and the model now? It shows this when I start training (I don't know what that is supposed to be)\n\nThere are no errors shown when I start training with CMD. It doesn't show which loss function is used. With an earlier version, it always showed which loss function was used, along with the other information, when you started a training.\n\nA: it looks like you need to add the loss function to the print code here <https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/cbe87cbdf57f9fd3a5e08ad014a509dfce113ac7/train.py#L355>\n\nbcr: it's a single thing, you can use it as a drop-in for Prodigy, OG Adam, whatever:\n\nrandom\\_string\\_of\\_character: not really, no. There is a variant that does that, but base Muon doesn't store second moments (the running variance of the gradients), so it uses less memory.
There's a good recent [video](https://www.youtube.com/watch?v=bO5nvE289ec) about it.\n\n[More](https://discord.com/channels/708579735583588363/1220364005034561628/1436057433775673524)\n\n- linear + act -> two kernel launches, one for the linear and one for the activation; fused means both in a single combined kernel - sammithy\n\nah right, and they ship all the activation funcs - frazer\n\nyeah yeah, it's not deep, it just means that when that layer runs, instead of stacking a bunch of kernels, it'll automatically use their custom bespoke all-in-one kernel\n\nit's goated coz if it's really fast, you can throw them everywhere and anywhere\n\ndoes some CUDA magic to basically allow LSTM/GRU to scale to crazy lengths without slowdown\n\nit's just an optimizer, but you can also manually define which layers use Muon and which use Adam\n\n\"Muon isn’t replacing AdamW entirely, but rather complementing it with specialized handling for matrix parameters.\"\n\n[https://medium.com/@jenwei0312/going-beyond-adamw-a-practical-guide-to-the-muon-optimizer-93d90e91dbd3](https://medium.com/%40jenwei0312/going-beyond-adamw-a-practical-guide-to-the-muon-optimizer-93d90e91dbd3)\n\nit's just that inside Muon there is Adam as well\n\nbut it makes steps slower, almost twice as slow\n\nOhhhhhh\n\nSo the Muon optimizer as shipped in PyTorch splits up some stuff for you to use these different techniques, and you just focus on the overall settings\n\n- [jarredou](https://discord.com/channels/708579735583588363/1220364005034561628/1465121129805905982):\n\nAnother loss based on spec\\_rmse:\n\nSwitching to L1 for the log\\_mag part seems to give better results.\n\nAlso added separate weighting for the real and imaginary parts of the \"complex\" part\n\n([code](https://discord.com/channels/708579735583588363/1220364005034561628/1465121129805905982))\n\n- Frazer\n\nthere was something I tried that really improved scores, but it needs some developing to get some meat out of it
- (I got 1/2 SDR extra with 4 layers, minimal increase in memory)\n\n([code](https://discord.com/channels/708579735583588363/1220364005034561628/1469654446853197966))\n\n- There's also <https://docs.pytorch.org/docs/2.8/generated/torch.nn.HuberLoss.html>\n\n(not implemented, but easy to add)\n\n> This loss combines advantages of both L1Loss and MSELoss\n\nyeah, from what I read:\n\nL1 loss when the dataset is noisy\n\nMSE loss when the dataset is clean\n\nalso keep in mind the authors trained with L1 loss only; the default in MSST is masked loss\n\n- FFT parameters troubleshooting:\n\nif you want full-band, then dim\\_f must be n\\_fft / 2 + 1 (e.g. 1025 for n\\_fft=2048)\n\nn\\_fft=2048 and hop\\_length = n\\_fft/4 is kinda the standard.\n\nbut other configs are possible; some models use n\\_fft=4096 by default, and we have seen MDX23C InstVocHQ, which uses n\\_fft=8192 and hop\\_length = n\\_fft/8 (and was the previous SOTA before the Roformers got open-sourced)\n\n- jarredou\n\n[More](https://discord.com/channels/708579735583588363/1220364005034561628/1426995241663201350)\n\nThe sweet spot for models seems to be between n\\_fft=2048 and 8192 (InstVocHQ, which is using hop=1024 iirc)\n\nBortSampson\n\nhop\\_size = n\\_fft / 4 is usually good imo\n\ncan very low n\\_fft/win size and hop length sizes make high SDR impossible?\n\nunwa\n\nSetting n\\_fft to a very small value allows you to identify the exact timing of the sound, but makes it difficult to identify its frequency.\n\nThis seems too crude to determine which frequencies to pass and which to block.\n\nbecruily\n\ncutting Roformer STFTs in half really helps with transients and punchiness, but SDR struggles to pick it up\n\nMost source separation projects use `hop\\_length = n\\_fft // 4` as the default value.
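\n\nTo make the n\\_fft numbers in this thread concrete, here is a minimal Python sketch (standard library only, not taken from MSST) of the usual STFT bookkeeping: a full-band model needs dim\\_f = n\\_fft / 2 + 1 bins, each bin is sample\\_rate / n\\_fft Hz wide, and the window and hop durations show the frequency-vs-time trade-off. It assumes a 44.1 kHz sample rate and torch.stft-style framing with center=True (frames = chunk\\_size // hop\\_length + 1); the helper name stft\\_geometry is hypothetical.\n\n```python\n# Minimal STFT bookkeeping sketch (assumptions: 44.1 kHz audio and\n# torch.stft-style framing with center=True; not MSST's own code).\nSAMPLE_RATE = 44100\n\n\ndef stft_geometry(n_fft, hop_length=None, chunk_size=None, sr=SAMPLE_RATE):\n    # Hypothetical helper: derives the quantities discussed in the thread.\n    if hop_length is None:\n        hop_length = n_fft // 4  # the common default mentioned above\n    geo = {\n        'dim_f': n_fft // 2 + 1,           # bins needed for a full-band model\n        'bin_hz': sr / n_fft,              # frequency spacing between bins\n        'window_ms': 1000 * n_fft / sr,    # time span of one analysis window\n        'hop_ms': 1000 * hop_length / sr,  # time step between frames\n    }\n    if chunk_size is not None:\n        # With center=True, a chunk of chunk_size samples yields this many frames.\n        geo['frames'] = chunk_size // hop_length + 1\n        geo['divisible_by_hop'] = chunk_size % hop_length == 0\n    return geo\n\n\nfor n in (2048, 4096, 8192):\n    g = stft_geometry(n)\n    print('n_fft=%d: dim_f=%d, %.2f Hz/bin, window %.1f ms, hop %.1f ms'\n          % (n, g['dim_f'], g['bin_hz'], g['window_ms'], g['hop_ms']))\n```\n\nFor example, it reports 21.53 Hz/bin for n\\_fft=2048 and 5.38 Hz/bin for n\\_fft=8192: doubling n\\_fft halves the bin width but doubles the window length. stft\\_geometry(2048, chunk\\_size=131072) gives 257 frames, and that chunk size is divisible by the default hop of 512.\n\n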
And it seems that using even lower values like n\\_fft // 8 could help the models separate the stems a bit better (that said, it doesn't seem to be a game changer, and it costs more resources).\n\n- jarredou\n\n<https://arxiv.org/abs/1504.07372>\n\n<https://arxiv.org/abs/2210.01719v3>\n\n<https://sevag.xyz/blog/fft/>\n\nA: I'm not sure that's worth the memory increase tbh\n\njust based on testing\n\n(just in case someone is thinking about trying it) - Cross\n\nQ: Has anyone got this error before? ValueError: operands could not be broadcast together with shapes (1,2,9526236) (1,2,9525812)\n\nA: You have to make sure all 3 [files] begin and end at the right spot, i.e. have exactly the same length (here they differ: 9526236 vs 9525812 samples)\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n**Roformers arch enhancements**\n\n**- LoRA training repository** by frazer - only for Mel and BS models at the moment\n\n(merged into ZFTurbo’s MSST training repo already)\n<https://github.com/fmac2000/Music-Source-Separation-Training-Models/tree/lora>\n\n“LoRA could specialize in a particular singer or genre.”\n\n“You can use LoRA as a replacement for full-weight fine-tuning\n\nfor now, all I can say is that it'll be faster to train and way more memory-efficient than fine-tuning - whether the performance competes with full fine-tuning is yet to be determined”\n\n“did a small test, and it achieved in hours results that took me days to achieve with an A100”\n\n“I trained it for a week - 180 epochs\n\nI took Kim Melband and just trained a LoRA on the standard MUSDB - it took vocals SDR from 11 to around 12.3 after 100 epochs with batch\\_size = 1.\nI didn't save each epoch, so that 100-epoch voc 12.3 checkpoint isn't there.\n\n(...)
my bet is it sucks ass since it's overtrained” frazer\n\n- ByteDance and Asriver prepared some **enhancements** for the Roformer arch and already published a paper, which will be presented at ISMIR 2024:\n\n<https://arxiv.org/abs/2409.04702> (already linked above)\n\nIIRC, the sami-bytedance-v.1.1 model is already some derivative of the above with higher parameters/settings and, from what I remember, was trained on 16x A100, which cannot even be rented. Bas tried to train a model (model\\_mel\\_band\\_roformer\\_ep\\_617\\_sdr\\_11.5882) better than that by training purely on the multisong dataset, but he couldn’t surpass that score.\n\n- At the end of December 2024, Lucidrains implemented \"**Value Residual Learning**\" in his BS-Roformer [repo](https://github.com/lucidrains/BS-RoFormer), based on the following paper:\n\n<https://arxiv.org/abs/2410.17897>\n\n“The paper argues that this mechanism can reduce the over-focus of attention and further reduce the vanishing gradient problem.”\n\nUnwa trained a small 400MB experimental instrumental [model](https://huggingface.co/pcunwa/BS-Roformer-Inst-EXP-Value-Residual) based on it.
Doesn’t work in UVR.\n\nNow it’s also added into the ZFTurbo MSST repo.\n\n**- CFM**\n\nBefore the middle of 2025, somewhere, probably in #dev-talk of our Discord server, jarredou\n\n“posted a repo from the ByteDance team which explained a method but didn't implement it in the code”\n\nBecruily: “Gemini did a quick pseudo-implementation, and it trains 4 times slower”\n\nFrazer: “chat models can't even code JavaScript, let alone AI”\n\nB: yeah, the hallucinations and unnecessary edits are insane, but 2.5 Pro is prob the best I've seen as long as it has all the context\n\nF: “just standard CFM, it doesn't need something audio-specific for it to work 2s\n\ntime = torch.randn(B, device=device, dtype=dtype).sigmoid()  # logit-normal t in (0, 1)\n\ntime = time.view(B, 1, 1)\n\nx = time \\* OUTPUTTENSOR + (1 - time) \\* INPUTTENSOR\n\ntarget = OUTPUTTENSOR - INPUTTENSOR\n\ncondition = self.timeembedder(time)\n\ncondition2 = self.someconditioningembedder(otherinfo here)\n\nsomecondition = condition + condition2\n\nx = model(x, somecondition)\n\nloss = F.mse\\_loss(x, target)\n\nyou need time embeddings for it to work, but it's copy-paste 10 minutes work”\n\nQ: what about the train/inference/valid, don't they have to be adapted to CFM as well or nah\n\nF: yeah - so inference will have to be something like this\n\ndef evaluate(self, x: torch.Tensor, N: int = 50) -> torch.Tensor:\n\nx = x.transpose(1, 2)  # x must be (B, T, C) (if that's what BS-Roformer uses, idk)\n\nt\\_span = torch.linspace(0, 1, N + 1, device=x.device, dtype=x.dtype)\n\ndt = t\\_span[1] - t\\_span[0]\n\nfor t in t\\_span[:-1]:  # Euler steps from t=0 (input) to t=1 (output)\n\nk = self.forward\\_eval(x, t)\n\nx = x + dt \\* k\n\nx = x.transpose(1, 2)\n\nreturn x\n\nwhere you then define a new eval forward\n\ndef forward\\_eval(self, x: torch.Tensor, time: torch.Tensor):\n\nB, T = x.shape[0], x.shape[1]\n\ntime = torch.full((B,), time, device=x.device, dtype=x.dtype)\n\ntime = self.embedding\\_time(time).view(B, 1, -1)  # continuous t, as in training: do not cast it with .long(), which would round every step down to t=0\n\nconditioning = time  # can add in other embeddings here
\n\nx = self.forward\\_model(x, conditioning)\n\nreturn x\n\nif you want different losses, you have to adapt them; what you're trying to do is iteratively change the input by adding something to the latent across N timesteps”\n\n**HyperACE**\n\nUsed in a few of unwa's BS-Roformer models called HyperACE. They’re much slower than BS-Roformer Resurrection (probably the SW fine-tune), despite the size. The results mostly had more clarity but more noise, but it made a breakthrough with a few songs, and people generally like the model (sometimes after phase fixing with a vocal model, in the instrumental variant). The required .py file for inference is the same for both v2 variants:\n\n<https://huggingface.co/pcunwa/BS-Roformer-HyperACE/blob/main/v2_inst/bs_roformer.py>\n\n“In v2, the TFC-TDF module used in models like the MDX23C has been added to the FreqPixelShuffle module.\n\nAdditionally, frequency-domain downsampling is now performed downstream of the Backbone module.”\n\nSome components in the SegmModel module were implemented based on this paper: <https://arxiv.org/abs/2506.17733>\n\n“HyperACE is the core of YOLOv13” (and when I asked about that, he replied with [this](https://miro.medium.com/v2/resize%3Afit%3A720/format%3Awebp/1%2AuJ2USYRtHQ7S7Uhn6WiUoA.png) graph [from [here]](https://sh-tsang.medium.com/brief-review-yolov13-real-time-object-detection-with-hypergraph-enhanced-adaptive-visual-a93200963687))\n\nbecruily:\n\nI'm sharing **bs cmhsa**, which now doesn't use one gigaton of memory, and params/speed/VRAM should match the original BS. 2 days of training got me to 8.4 SDR, but it started plateauing (green was using global flattening, which was the original way); icbf training more\n\n[(click)](https://discord.com/channels/708579735583588363/1220364005034561628/1450158929932718163)\n\nwhat's it like against baseline? it looks good\n\nconfig is the same as Apple's, so idk; I mainly wanted it to go beyond SDR 12, but I'm not spending a few hundred
\n\n\\_\\_\\_\\_\\_\\_\n\n##### Links for research on source separation archs and more experimental stuff on training\n\nDue to the GDoc size limit, the training miscellanea continue here:\n<https://docs.google.com/document/d/1WDDK7E8-HY7EYUOM-evIcXNYX_Of0gYOVjG95zLQfKo/edit?usp=sharing>\n\n\\_\\_\\_\\_\\_\n\nIf you have already [prepared your dataset](#_wyh707wdm55j), here is a step-by-step guide by Bas Curtiz on:\n\n##### **Setting up training on your local machine** (read also [mesk’s guide](https://docs.google.com/document/d/1jUcwiPfrJ8CpHqXIRHuOu70cFDMv_n-UzW53iaFuM9w/edit?tab=t.0))\n\n“Make sure you have all dependencies installed for ZFTurbo's Training & Inference script:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/gui.md>\n\nUpdate Nvidia drivers: <https://www.nvidia.com/en-us/drivers/>\n\nSet the power plan to best performance and never sleep.\n\nDetermine which drive in your PC is the fastest:\n\n<https://www.guru3d.com/download/crystal-diskmark-download/>\n\nPut your dataset on the fastest drive.\n\nRead and apply the steps at:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training?tab=readme-ov-file#how-to-train>\n\nNotes:\n\na) ZFTurbo's repo has a lot of config files to start with.
Pick the one for the model type you want to use, inside the /configs folder\n\nb) Alternatively, if you are going to fine-tune an existing model, use the .yaml associated with it\n\n(unsure, implementation looks unstable or smth wonky)\n\nTo train locally, assuming you don't have a powerhouse of a GFX card, add this line to the config: use\\_torch\\_checkpoint: True\n\na) Learning-rate-wise: if you are training a model from scratch, you want to set it higher: 5.00e-04 (=0.0005) is used by ByteDance, for example\n\nb) If you are fine-tuning an existing model, set it lower: 5.00e-06 is recommended (=0.000005)\n\nSave the altered config and add your handle to its filename (like the \\_bascurtiz suffix in the command below).\n\nFor a full overview of the (optional) parameters that can be used:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/train.py#L106>\n\n- optional but recommended -\n\nCreate a free account at <https://wandb.ai> - it gives you more insight into the training progress with graphs.\n\nA free personal/cloud-hosted account should suffice.\n\nAdd the parameter --wandb\\_key YOUR\\_API\\_KEY (which you can get from https://wandb.ai/authorize)\n\nSave the full command in a text file, handy for future use.
Here's mine, which you can alter:\n\npython train.py --model\\_type mel\\_band\\_roformer --config\\_path configs/config\\_musdb18\\_mel\\_band\\_roformer\\_bascurtiz.yaml --dataset\\_type 2 --results\\_path results --data\\_path datasets/train --valid\\_path datasets/validation --num\\_workers 4 --device\\_ids 0 --wandb\\_key e304f2CENSOREDSOYOUNEEDTOUSEYOUROWNdecc122e\n\nRun the command in the repo's root folder with CMD.\n\nCheck your progress/graphs at https://wandb.ai/[yourusername]/projects\n\nThe latest update of the repo also gives you insight into fullness/bleedless, using the parameter:\n\n--metrics sdr bleedless fullness l1\\_freq si\\_sdr log\\_wmse aura\\_stft aura\\_mrstft”\n\n“the augmentations help tho\n\neven tho it's slower\n\nthey give it many more situations from real songs”\n\n“ye, if your dataset isn't one of the biggest, it will help”\n\n“If you plan to stop/resume training many times, it could be interesting to also save the optimizer (and scheduler) state with the checkpoint; it can help train faster when you stop/resume a lot (as you resume everything in the state it was in when you stopped training, instead of restarting the optimizer from scratch each time).”\n\n*Patience parameter*\n\n“If set too low, it will reduce the learning rate way too fast and lead to stagnant learning”.\n\nSo, if \"the model did not improve for like 10 epochs (weights did not save)\" while patience is set to the default of 2, \"you should set it to like 1000 to disable it for now.\"\n\n\"patience = X means that if, during training, X consecutive epochs don't give any improvement (using the SDR metric by default), it will reduce the learning rate.
If not set correctly, it can kill a training run by reducing the learning rate too fast and too early.\n\nWhen not sure, it's better to set it to a really high value (like 1000 here) so it will never be triggered.\" - jarredou\n\nA from-scratch training script by Dill: <https://github.com/dillfrescott/mvsep-beta>\n\n“it uses a neural operator architecture with something I call Kernel Scale Attention to capture a range of details. I'm training it now. No guarantees on the quality tho\n\nbut it's def working”\n\n*Troubleshooting of ZFTurbo’s training code by Vinctekan*\n\nIssue: GPU isn't available, using CPU instead, it will be very slow\n\n“Turns out that the group and order of specific Python packages that Turbo listed in the [.txt] is pretty cursed.\n\nAt the very end, pip just decides to remove your previous instances of torch, torchvision, and torchaudio for some reason, and replaces them with the CPU versions, even if you decide to install PyTorch CUDA beforehand. Tried removing torch==2.0.1 from the requirements, but somehow it still stuck.\n\nIf you try to install PyTorch CUDA AFTER installing the requirements, then it registers as already installed. I thought about it for a while, as to how that could be possible, but I slowly figured out that the CPU versions were installed because of it.\n\nThe way I found the fix is by pip-uninstalling all 3 packages, and then reinstalling PyTorch with the command from the website.
It ultimately does not matter if it's 118 or 121 [i.e. the cu118 or cu121 PyTorch build].”\n\nQ: If I change something in the audio, model, training, augmentations, or inference sections, or if I decide to remove augmentations entirely, will training still start from where it left off, or is it going to start all over again?\n\nA: “as long as you provide a starting checkpoint in the training code, it will continue where it left off” becruily\n\nA: “Don't change the \"audio\" and \"model\" configs, these must be the same as in the base checkpoint when resuming/fine-tuning, I think, but for the \"augmentations\" part, you can edit as you want, as it's pre-processing of the audio done on the fly. MP3 encoding, pitch-shifting and time-stretching are quite resource-heavy augmentations and can slow down training; other types of augmentations are more lightweight.\n\nFor \"inference\", you can reduce the overlap value if you want the validation step between epochs to be a bit faster (overlap=1 will create clicks at chunk boundaries [fixed in newer MSST code])\n\nFor the \"training\" part, you'll probably have to edit batch\\_size to find the max value your GPU can handle.” jarredou\n\nQ: Isn't changing chunk size in audio fine as long as it's divisible by the hop length?\n\nI recently tried a lower chunk size while keeping everything else the same (for Mel-Band) to help with VRAM issues, and it seemed to work (didn't train for long, just wanted to try)\n\nA: I have never tried, but yeah, I think you're right; that's probably why we can also use different chunk\\_size/dim\\_t values for inference and the models still work.\n\n
- “Negative SDR means that there's more distortion (bleed/artifacts) than target signal in the evaluated audio”\n\n\\_\\_\\_\n\nThe lightest arch that still performs great seems to be\n\n###### **Vitlarge**\n\n“This arch is trickier than others, even if lighter.” jarredou\n\n(?) segm\\_model in the script (or something like that)\n\nmusdb configs are for 4-stem training, vocals ones are for 2-stem\n\nQ: What’s the minimum length requirement?\n\nA: “The default segment\\_size in the HTDemucs config is 11-second audio chunks, so your training audio files should be at least 11 seconds long.\n\nIt can be lower, if there’s no other choice.”\n\n- [Here](https://discord.com/channels/708579735583588363/708912597239332866/1225994922822467674), one user is being helped with training a hi-hat model from scratch using ZFTurbo's code, on the example of an RTX 3060 12GB\n\n- “I believe a length of about one minute per song is appropriate for the validation dataset.”\n\nQ: “My avg loss is always in the 130–120 range, is it worth the time to keep waiting for the training?
The previous training is also like this, it never touched 110 or went under 100”\n\nA: Don't worry about avg loss; look at the SDR in the metrics - is it improving?\n\nQ: No improvement so far, the last one was at epoch 9, now I’m heading into epoch 13\n\nA: “Yeah, don’t worry, what you’re seeing is the loss curve.\n\nYou've been shooting down that ramp, but improvements slow down after a while”\n\n[pic](https://imgur.com/a/GASs52U) (frazer)\n\n##### Get fast GPUs for training (how to)\n\n*By Bas Curtiz (and others; redacted)*\n\n\"Budget\" option - RTX 4090, or\n\nBuy an A6000, preferably multiple.\n\nOr rent them in the cloud.\n\nBest bang for your buck for now:\n\n<https://vast.ai/>\n\n[<https://www.tensordock.com/> - similar prices (although as of November 2024, worse for 8x A100)]\n\n<https://www.runpod.io/>\n<https://app.hyperbolic.xyz/compute> - “5x and 8x H100 GPU instances with 1.8 TB storage for $5 and $8 per hour” - Kim\n\n[<https://www.thundercompute.com/> - A100XL (80GB) at $1.05/hour - Essid]\n\n<https://www.vultr.com/> - Sign up for an account at:\n<https://sites.google.com/site/vultrfreecredit?pli=1>\n\nGet 250 bucks free.\n\nAdd 50 bucks.\n\nNow GPU rental is unlocked. Start there and on vast.ai, and wait for a server with 8x A6000 at a good price.\n\n[<https://lambda.ai/> - frazer]\n\n[<https://www.cerebrium.ai/> - but that's more for inference, it's pretty good but I was looking for faster cold starts - cheguevara6172]\n[coreweave - probs ur best bet for cold starts - frazer]\n\nBut if you have enough time on your hands, an RTX 4090 is cheaper in the long run [than Vultr].\n\nIt depends on your electricity costs, though, which vary per country.\n\n“The easiest would be Colab; if you pay for the compute units, the V100 is identical to training with a 3090 locally, but Colab can get expensive quickly” - becruily\n\nPaid Colab now has the Nvidia A100, vs the Tesla T4 in the free tier.
The A100 is also faster than the V100 and L4.\n\n[“Check out the 5090 and get a memory mod from china\n\nlike 5090 on paper is slower than a6000 but because you get fp4 its faster” - Frazer]\n\nMost in-depth and handy article: <https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/>\n\nGPU performance chart:\n\n<https://i0.wp.com/timdettmers.com/wp-content/uploads/2023/01/GPUS_Ada_raw_performance3.png?w=1703&ssl=1>\n\ntldr; <https://nanx.me/gpu/>\n\n[Performance in training\n\nNVIDIA H100>A100 (40/80GB)>RTX 4090>RTX A6000 Ada>Nvidia L40 (also 18K CUDA cores)>prob. RTX 5000 Ada (12.8K CUDA cores)>RTX 4080 (9728 CUDA cores)>3090 Ti (10752)>V100 (32/16GB)>RTX 3090\n\n<https://i0.wp.com/timdettmers.com/wp-content/uploads/2023/01/GPUS_Ada_raw_performance3.png?w=1703&ssl=1>]\n\nCheaper GPUs for training\n\nTwo used RTX 3090s are cheaper than one 4090 (which has 16384 CUDA cores);\niirc, multi-GPU performance doesn’t scale linearly, so that might not be as good a deal as it looks\n\n?RTX 5060 Ti 16GB (CUDA cores: 4608; check whether it can outperform the 3090, as it does in the Wan2.2 video generation model)\n\nRTX 3090 24GB (CUDA cores: 10496)\n\nRTX 5070 Ti 16GB (CUDA cores: 8960)\n\nRTX 4070 Ti Super 16GB (CUDA cores: 8448)\n\n~RTX 4070 Ti 12GB (CUDA cores: 7680; training on it is (still) taxing, and Roformers will reach a worse SDR because you have to start training with smaller parameters)\n\n2x GTX 1080 (CUDA cores: 2560, if dual-GPU scaling is decent enough - it’s not linear)\n\nNot really recommended:\n\nRTX 2080 Ti (CUDA cores: 4352)\n\nRTX 3060 12GB (CUDA cores: 3584)\n\nGTX 1080 Ti (CUDA cores: 3584)\n\nTraining and inference performance per dollar for each GPU\n\n<https://i0.wp.com/timdettmers.com/wp-content/uploads/2023/01/GPUs_Ada_performance_per_dollar6.png?ssl=1>\n\nBe aware that multi-GPU configurations don’t scale linearly.\n\nWe had an interesting discussion on the server about choosing between the GTX 1080 Ti and the RTX 3060 12GB for training.
We’re yet to find out the final result, but unwa claims that the 1080 Ti might turn out to be slower, even though both cards have the same number of CUDA cores (3584). The possible reasons:\n\n- Pascal GPUs (GTX 1000 series) are “limited by FP16 performance by a factor of 64”\n\n- No tensor cores (“Normally, AMP (Automatic Mixed Precision) is turned on when training a model, but AMP uses Tensor Core to speed up the computation.” plus “Tensor Core generation is also newer [in 3000 series], with support for more precisions, including BF16.”)\n- “3060 is one generation newer than RTX 20xx / GTX 16xx and can use Flash Attention2. This is not very relevant for music source separation model, but may be very useful for LLM inference. (Roformer models have a Flash Attention entry in the configuration, but Memory Efficient Attention is used unless A100 GPU(s) are used; maybe it got changed since then as RTX 3000 started to be recommended as a minimum.)”\n\n- “This is an extreme example, but it's a table comparing video generation times for each GPU under the settings of Wan2.2 Q6 K, 1280x704, 84 seconds, and 8 steps. The 5060 Ti 16GB even outperforms the 3090” - Unwa\n\n[pic](https://imgur.com/a/6znQaPa)\n\nQ: How long does it take to train a model?\n\nA: \"Depends on input and parameters and architecture.\n\nMDX old version:\n\n5k input (15k actually: inst/mixture/vocals) + 100 validation tracks (300, same deal), fullband, 300 epochs would have taken 3 months on an RTX 4090.\n\nYou can speed it up by using multiple GPUs and more memory, therefore:\n\nA6000 (48gb) x 8 was like 14 days.\n\nDamage on 300 epochs: ~700 bucks.\"\n\n\"7 days training of e.g.
BS-Roformer on 8xA100-80GB\": 7 days × 24 h × $15.12/h (RunPod 8xA100 pricing) = $2540.16\n\n4 days achieved epoch 74, and on epoch 162 for ~4200/4500 songs\n\nQ: “4070 [8GB] works, but I would only use for testing IMO\n\nA: I’ve trained some convtasnet in the past with really decent times [on 4070 8] (the new Ada Lovelace on 40 series makes faster tensor cores, which kinda compensates for the lower number of cores compared to 30 series)\n\nA: [4070 8GB] is fine for non transformers.\n\nIf mamba blocks are used good it could be fine TBF.\n\nThe thing with transformers is that it is really reliant on VRAM.\n\nA: Depends on what's inside the transformer, if it's flashatten then you need Ada.\n\nMamba has custom kernels, but I'm pretty sure 4090 can run it - what'll be cool is mamba + reversible net, super memory efficient in training, but it ends up being slower per step (around 2x compared to backprop).\n\nI guess in reversible net you can have gigantic batch sizes which kinda circumvent the problem of a slow step speed”\n\n- 80GB h200 card $1.5 an hour\n\nu dont have to train by maxing out batchsize and model size, you can run 5 different models all at once\n\n- also might as well rent a B200 its like quite a bit more powerful than an H100\n\n- Mesk in his guide added some GPU renting solutions\n\nThere is a potential alternative to GPUs -\n\n**Training using TPU**\n\n<https://sites.research.google/trc/about/>\n\n“(...)
equivalent in performance to an a100\n\nI'm not sure how good torch xla support is now (...)” Cyclcrclicly\n\nTurns out total usable VRAM for Pro is sadly 16GB, and 48GB of system memory\n\n24 hours of uninterrupted training is possible, and 12 hours for free users.\n\nIt turned out to be extremely convoluted to fix compatibility issues.\n\nBcr: models need to be rewritten in JAX, I think; can't just train like this\n\nFr:\n\nimport torch\_xla\n\nmodel.to('xla')  # instead of .to('cuda')\n\nfor inputs, labels in train\_loader:\n\nwith torch\_xla.step():  # added; the two lines below go inside this block\n\ninputs, labels = inputs.to('xla'), labels.to('xla')\n\nmodel(...)  # forward, loss, backward and optimizer step as before\n\n#after the loop above (once the epoch is finished):\n\ntorch\_xla.sync()  # added\n\nAnv: there's a JAX version <https://github.com/flyingblackshark/jax-bs-roformer>\n\nFr: You dont need JAX - all you need to do is import torchxla and move it to xla device\n\nwherever the model is instantiated, I think train.py probs, add that import at the top, where the model is moved to device - change that from .to('cuda') to .to('xla'), then in the train loop you add the with torch\_xla step, move inputs to xla then after that train loop you put xla sync\n\nBas got stuck while trying to fix it earlier; here’s the conversation where he got stuck:\n\n\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\n\nWhy does training the bs\_roformer model with 8s chunksize, 256 dim, 8 depth consume only 13GB of VRAM now, compared to 21GB last time [VRAM usage may have been reduced in the code since then]\n\nStuck troubleshooting of TPU training by Bas Curtiz (Q) and frazer, DJ NUO, jarredou and Cyclcrclicly (A):\n\nQ: Is it as simple as adding pytorch lightning though?\n\nA: try using \"xla\" as the device instead of cuda and if you're lucky everything will Just Work™️\n\nQ: PyTorch's implementation of stft does not work on XLA (TPU),\n\nbecause it internally uses some unsupported functions.\n\nThere are no feasible workarounds for it available.\n\nOnly some 3x PhD discussion, which discusses
the underlying function not working,\n\nwhich would require forking pytorch to get it working, IF the solution was actually even feasible:\n\n(hacky super slow workaround, or just \"use different shit\").\n\nOnly \"realistic\" solution I've found is porting the mel band roformer to tensorflow.\n\nWhich is bruh, but the thing is in their docs STFT says:\n\nImplemented with TPU/GPU-compatible ops and supports gradients.\n\nAlso tensorflow is by google, the TPU as well, so yk, it might have better support.\n\nBasically the same error is described here:\n\n<https://github.com/pytorch/xla/issues/2241>\n\nA: As frazer said, you'll have better luck with jax than tensorflow\n\nA: Can you try putting data to CPU & running it there, and then put the result back on TPU?\n\nI encounter similar issues when running on Mac MPS (GPU), and this code helps to alleviate the issue:\n\nstft\_repr = torch.stft(raw\_audio.cpu() if x\_is\_mps else raw\_audio, \*\*self.stft\_kwargs, window=stft\_window.cpu() if x\_is\_mps else stft\_window, return\_complex=True).to(device)\n\n(of course, in your case the code might be a bit different, but it demonstrates the idea)\n\nQ: obviously slow\n\nit is called in a loop in the forward function (= very slow)\n\n...if it was like only once / before each step, but not inside step.\n\nwe'll try anyways, thanks\n\nTimed out after 10 mins, 0 steps were finished.\n\nImagine doing 4032 steps.\n\nJAX is like an optimizer/JIT.\n\nSTFT of it, is just Scipy's STFT but running under JAX.\n\nScipy's implementation is CPU-based.\n\nSo it expects CPU data.
Not Tensor/GPU/TPU data.\n\nA: Or this might help (custom implementation of STFT): https://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/main/stft.py\n\nA: There's also https://github.com/qiuqiangkong/torchlibrosa/ that has a stft implementation\n\nQ: Hmm both use numpy which is cpu based\n\nA: yeah its some weird operation in the torch spec, i use https://github.com/adobe-research/convmelspec anytime incompatibility occurs\n\nQ: Maybe we need to replace mel spec with this in MelRoformer.\n\nI got a boilerplate/minimal production ready, but 2 things...\n\nno TPU for me right now to test - maybe someone else has better luck / paid Colab sub.\n\nLast outcome, which might be fixed by now: RuntimeError: Attempted to call variable.set\_data(tensor), but variable and tensor have incompatible tensor type.\n\nA: you can use kaggle for tpuv3 with probably better availability\n\nQ: https://github.com/qiuqiangkong/torchlibrosa/ result:\n\nCalling \"torch.nn.functional.fold\" just gets stuck, when interrupting, the error stack has mentions of copying to CPU.\n\n...smth to do with the fold function.\n\nNumpy only in initialization (cpu), so that's fine.\n\nhttps://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/main/stft.py result:\n\nNumpy and basically cpu all-the-way, so no-go.\n\nhttps://github.com/adobe-research/convmelspec result:\n\nNot an STFT library / whole spectrogram, don't wanna dissect it, the STFT part seems internal,\n\ndidn't notice (would have to double-check) the inverse, but wasted 2 days already. done with it.\n\nA: Just a guess (have no experience with Tensorflow): what if STFT portion of the code can be executed by TensorFlow code -> convert result to numpy CPU -> convert to PyTorch tensor\n\nQ: Problem is...
It simply takes too much time, copying to cpu and back is expensive resource-wise\n\nA: In some part of torchlibrosa they use a workaround for the nn.functional.fold function, maybe that can be reproduced/adapted to the other failing part where fold is used.\n\nA: line 239 is the drop in, you have to make sure the settings are the same from what i remember https://github.com/adobe-research/convmelspec/blob/main/convmelspec/stft.py\n\nQ: It got thru the whole ass forward step. But now it's stuck at backward step.\n\nyk, recalculate the weights based on this step to improve the model.\n\nreplaced the backwards function of stft with empty one, and yet: stuck.\n\nso since backwards step of stft/istft is disabled...\n\nthe problem is elsewhere.\n\nNo idea where, no idea how to debug, out of my expertise.\n\nA: I might be 100% wrong here, but I think you should disable the backward pass through that class if it is an nn.Module\n\nstft.requires\\_grad\\_(False)  # the nn.Module method; a plain stft.requires\\_grad = False assignment does not affect its parameters\n\nor wrap the stft call in a no\\_grad context manager\n\nwith torch.no\\_grad():\n\nx = stft(x)\n\n###### *Other archs:*\n\n###### **SCNet: Sparse Compression Network**\n\nLarge models by ZFTurbo turned out to sound between Roformers and MDX.\n\n“SCNet is maybe a bit more bleedy than MDX23c” and/or possibly noisy, judging by the MVSEP model(s).
Training guitars with it was “seemingly impossible”\n\n\"Very bleedy, but quicker to train and can get very close to metrics like SDR to Roformers, often sounds comparable too (for normal \"busy\" songs).\n\nIt just isn't as good at middle and high frequency due to it being 3 band model and the lower frequencies get the most features\" - becruily\n\n[July 10th 2024] “Official SCNet repo has been updated by the author with training code. <https://github.com/starrytong/SCNet>”\n\n“ZF's script already can train SCNet, but currently it doesn't give good results”\n\n[https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/](https://github.com/ZFTurbo/Music-Source-Separation-Training/releases/tag/v.1.0.6)\n\nThe author’s checkpoint:\n\n<https://drive.google.com/file/d/1CdEIIqsoRfHn1SJ7rccPfyYioW3BlXcW/view>\n\nJune 2025\n\nZFTurbo: “I added new version of SCNet with mask in main MSST repository. Available with key 'scnet\\_masked'. Thanks to becruily for help.”\n\n“the main thing is removing the SCNet buzzing” - Dry Paint\nHow heavily undertrained weights look on spectrograms with vs. without the mask: [click](https://imgur.com/a/ZM0h6tI)\n\n“One diff I see between author config and ZF's one, is that dev has used learning rate of 5e-04 while it's 4e-05 in ZF config. And main issue ZF was facing was slow progress (while author said it worked as expected using ZF training script <https://github.com/starrytong/SCNet/issues/1#issuecomment-2063025663>)”\n\nThe author:\n\n“All our experiments are conducted on 8 Nvidia V100 GPUs. When training solely on the MUSDB18-HQ dataset, the model is trained for 130 epochs with the Adam [22] optimizer with an initial learning rate of 5e-4 and batch size of 4 for each GPU.
Nevertheless, we adjust the learning rate to 3e-4 when introducing additional data to mitigate potential gradient explosion.”\n\n“Q: So that means that you have to modulate the learning rate depending on the size of the dataset?\n\nI think it's first time I read something in that way\n\nA: Yea, I suppose because the dataset is larger you need to ensure the model sees the whole distribution instead of just learning the first couple of batches”\n\nPaper: <https://arxiv.org/abs/2401.13276>\n\n<https://cdn.discordapp.com/attachments/708579735583588366/1200415850277130250/image.png> (dead)\n\nOn the same dataset (MUSDB18-HQ), it performs a lot better than Demucs 4 (Demucs HT).\n\n“melband is still sota cause if you increase the feature dimensions and blocks it gets better\n\nyou can't scale up scnet cause it isn't a transformer\n\nit's a good cheap alt version tho”\n\nZFTurbo: “I trained small model because author post weights for small. Now I'm training large version of model, but it's slow and still not reach quality of small version.\n\nI use the same dataset for both models\n\nMy SCNet large stuck at SDR 9.1 for vocals. I don't know why\n\nMy small SCNet has SDR 10.2\n\nI added config of SCNet to train on MUSDB18:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config_musdb18_scnet_large.yaml>\n\nOnly changes comparing to small model are these parts:\n\nSmall:\n\ndims: [4, 32, 64, 128]\n\nband\\_SR: [0.175, 0.392, 0.433]\n\nLarge:\n\ndims: [4, 64, 128, 256]\n\nband\\_SR: [0.225, 0.372, 0.403]”\n\nZFTurbo eventually trained an SCNet large model; it turned out to sound similar to Roformers, but with more noise.
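To make the band\_SR lists from ZFTurbo's small and large configs above a bit more concrete, here is a rough Python sketch that turns each list into approximate band edges in Hz and STFT bin counts. It is only an illustration under stated assumptions: 44.1 kHz audio, the three bands splitting the frequency axis linearly in proportion to band\_SR, and 2048 frequency bins (roughly what n\_fft = 4096 gives). The helper names are made up, and the actual SCNet code may place or round the split points differently.

```python
import math

NYQUIST_HZ = 44100 / 2  # assumption: 44.1 kHz audio, as in MUSDB18-HQ


def band_edges_hz(band_sr, nyquist_hz=NYQUIST_HZ):
    """Upper edge of each band in Hz, assuming a linear split proportional to band_SR."""
    if not math.isclose(sum(band_sr), 1.0):
        raise ValueError("band_SR ratios are expected to sum to 1")
    edges, acc = [], 0.0
    for ratio in band_sr:
        acc += ratio
        edges.append(acc * nyquist_hz)
    return edges


def band_bins(band_sr, n_bins):
    """Approximate STFT bins per band: split points rounded up, last band takes the rest."""
    counts, start, acc = [], 0, 0.0
    for ratio in band_sr[:-1]:
        acc += ratio
        end = math.ceil(n_bins * acc)
        counts.append(end - start)
        start = end
    counts.append(n_bins - start)
    return counts


small = [0.175, 0.392, 0.433]  # band_SR of the small config
large = [0.225, 0.372, 0.403]  # band_SR of the large config

for name, cfg in (("small", small), ("large", large)):
    edges = [round(e) for e in band_edges_hz(cfg)]
    print(name, "edges (Hz):", edges, "bins per band:", band_bins(cfg, 2048))
```

Under these assumptions the small config splits at roughly 3.9 kHz and 12.5 kHz (359 / 803 / 886 bins), while the large one splits at roughly 5.0 kHz and 13.2 kHz (461 / 762 / 825 bins), so the large model gives its lowest band a noticeably wider slice of the spectrum.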
You can test the model on MVSEP.com\n\nSCNet Large turned out to be good for piano (vs MDX23C and MelRoformer) and also for drums, according to ZFTurbo.\n\n“He also said SCNet didn't work that well for strings, Aufr didn't have luck with BV model as well”\n“MDX23c is already looking better on guitar after 5 epochs than scnet after 100 epochs”\n\n“with SCNet I've had the fastest results with prodigy [optimizer]” becruily\n\nLater, ZFTurbo released SCNet 4 stems (in his repo) and an exclusive bass model on MVSEP.\n\nThere is also an older, unofficial (apparently not fully finished) implementation of SCNet: <https://github.com/amanteur/SCNet-PyTorch>\n\n“Just as a curiosity I got a scratch masked scnet (normal size) vocals up to 9.85 musdb which is just a hair shy of the standard scnet in the paper. In the long run the complex loss started to outperform <@751869403364196563> hybrid spec magnitude signal loss for that config. Only change was swapping out gelu/silu activations for xatlu which <@198414985779609600> posted about a couple days ago”\n\nBortSampson: btw throw this into scnet, you could ensure that the separated instruments sum to mixture\n\nbase = mixture / 2  # 1. start each source from the \"fair share\" (average)\n\nraw\_residual\_a, raw\_residual\_b = model(mixture)  # 2. the model predicts the adjustments (residuals)\n\nmean\_residual = (raw\_residual\_a + raw\_residual\_b) / 2  # 3. center the residuals so they cancel each other out (e.g. if A is +5, B becomes -5)\n\nres\_a = raw\_residual\_a - mean\_residual\n\nres\_b = raw\_residual\_b - mean\_residual\n\nsource\_a = base + res\_a  # 4. add to base; source\_a + source\_b == mixture\n\nsource\_b = base + res\_b\n\n- Band-SCNet: A Causal, Lightweight Model for High-Performance Real-Time Music Source Separation\n\n<https://www.isca-archive.org/interspeech_2025/yang25d_interspeech.pdf>\n\n[becruily code](https://discord.com/channels/708579735583588363/1220364005034561628/1423786716757622834) ([mirror](https://drive.google.com/drive/folders/1AjCrRJVLrp9eVG0cMg3VBHBH37iPMuof?usp=sharing))\n\n“inference.py needs slight mod to inference (valid works)”\n\nExperimental **BS-Mamba**\n\ngit clone https://github.com/mapperize/Music-Source-Separation-Training.git --branch workingmamba\n\nReimplementation with weights\n\n<https://github.com/EuiYeonKim/BSMamba2>\n\n**TS-BSmamba2**\n\n<https://arxiv.org/abs/2409.06245>\n\n<https://github.com/baijinglin/TS-BSmamba2>\n\nAdded to ZFTurbo training repo:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/>\n\nAt this moment, training works only on Linux or WSL.\n\nSDR seems to be higher than all the current archs, except maybe Mel/BS Roformers (which weren’t tested). “It's in between SCNet and Rofos but maybe more lightweight than them.\n\n(...)
From the scores from MelBand paper [it seems] the Rofos are still like +0.5 SDR average above the other archs when trained on musdb18 only.\n\nBut it's great to finally see some mamba-inspired MSS arch with great performance”.\n\nAs of 22.09.24, ZFTurbo had problems with low SDR during training.\n\n<https://discord.com/channels/708579735583588363/1220364005034561628/1286650425596186645>\n\n<https://discord.com/channels/708579735583588363/1220364005034561628/1284221988294099102>\n\nThree more very promising archs at the moment:\n\n**Conformer**\n\n*“performs just as well if not better than a standard Roformer”*\n\n<https://arxiv.org/pdf/2005.08100>\n\n<https://github.com/lucidrains/conformer>\n\n(people already train with it, and its implementation might be pushed to the MSST repo in the near future)\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/169>\n\nEssid pretrain:\n\n<https://huggingface.co/Essid/MelBandConformer/tree/main>\n\n<https://mvsep.com/quality_checker/entry/9087>\n\n“Due to cost issues, I'm discontinuing the Mel-Band-Conformer MUSDB18HQ-based train.
I'm sharing the ckpt and config, so anyone who wants to continue can use them.”\n\nIt has [shown](https://imgur.com/a/yk64g2G) steady improvement over the last 12 hours of training, from epoch 0 to 83 (a 1 SDR increase on a private validation dataset, probably on an A100XL (80GB) on Thunder Compute at $1.05/hour), and the shared weights are from epoch 200+.\n\n“Testing a version of conformer with RoPE inside @Cross's repo and im really surprised its giving me 9sdr on the other side on the first 4k steps already, 2 sdr on vocals is more or so what I expected lol” - mesk\n\n(I can't find it; kindly ask one of them, or maybe look here:\n\n<https://github.com/dillfrescott/mvsep-beta>)\n\n**TF-Locoformer**\n\n<https://arxiv.org/abs/2408.03440>\n\n<https://github.com/merlresearch/tf-locoformer/blob/main/espnet2/enh/separator/tflocoformer_separator.py>\n\n“I see only now that the tf-locoformer repo was updated to include the variants published few months ago (TF-Locoformer-NoPE and BS-Locoformer)\n\n<https://github.com/merlresearch/tf-locoformer>” - jarredou\n\n**dTTnet**\n\n<https://github.com/junyuchen-cjy/DTTNet-Pytorch>\n\n“They report very good performance on vocals with low parameters” - Kim\n\nBack at the end of 2023, it got the best SDR on one indie pop song from the multisong dataset (one of the two there) - Bas Curtiz\n\n“better than SCNet imo, remains to see if it can beat rofos”\n\n“not fast to train.
I'm back with vanilla mdx23c\n\nTrying a config to train model with less than 4GB VRAM, almost at 7 SDR for vocals in 8 hours of training (on moisesdb+musdb18, and using musdb18 eval, with my 1080Ti and batch\\_size=1, chunk\\_size is around 1.5sec)”\n\nModification of the training code for MSST by becruily, with jarredou's mods ([DL](https://drive.proton.me/urls/NDNTT9RQQW#gwYFcDEeQEAD); [old](https://drive.google.com/drive/folders/1e2NmyxxJU1h2wGxomBl7D-NXJBYHMXsU?usp=sharing)).\n\nBreaks compatibility with the authors' checkpoint.\n\n“Also keep in mind authors trained with l1 loss only, default in msst is masked loss”\n\n“from what I read, l1 loss when dataset is noisy, mse loss when dataset is clean”\n\n“the loss is defined from msst, but in the original dttnet it was in the code itself\n\nyou can just --loss l1\\_loss”\n\n“just realised i mispelled it in the code, it's dTTnet, not dDTnet”\n\nTo jarredou: “I copied your tfc and tfc\\_tdf classes to my files (and used that latest stft/istft I sent) - and seems to be better, just like the og dttnet\n\nthe tfc/tdf fixed the nan issue for me” - becruily\n\nInstallation instructions (for the old version, iirc):\n\n“In the latest MSST [at least as of 13.10.25]\n\nadd the ddtnet folder to \"models\" and replace your settings file in utils with [this](https://discord.com/channels/708579735583588363/1220364005034561628/1426979843932815430)”\n\n[Fixed](https://discord.com/channels/708579735583588363/1220364005034561628/1427041482409382020)\n\n“The weird thing is, it sounds like a fullness model despite not being one, I barely can find dips in instrumentals\n\nddtnet vs kim melband, if anyone is curious <https://drive.google.com/drive/folders/12an8wnKC-FKE48gVu9pHvUaLSxzpC6C8?usp=sharing>\n\nKeep in mind ddtnet was trained only with musdb and has 10-20x less params while being comparable in quality”\n\n“the baseline models were evaluated on mvsep <https://mvsep.com/quality_checker/entry/5392>” -jarr\n\n- “I’ve found the issue in my
DTTNet version leading to the \"noisy\" outputs. It was just the \\* changed to a + in forward [here](https://imgur.com/a/IgiHZkL)” - jarredou\nEverything is uploaded at the top.\n\nMesk’s config for training an instrumental model (it went from SDR 6 to 9.3 by the third epoch [counting the first as 0]):\n\npython train.py --model\\_type dttnet --config\\_path config-ddtnet-other.yaml --start\\_check\\_point results/vocalsg32\\_ep4082.ckpt --results\\_path results/ --data\\_path [YOUR DATASET] --valid\\_path [YOUR VALIDATION DATASET] --dataset\\_type [TYPE 1/2/3/4/5] --num\\_workers 8 --device\\_ids 0 --metric\\_for\\_scheduler sdr --metrics fullness bleedless l1\\_freq\n\nchange these accordingly:\n\n>data\\_path\n\n>valid\\_path\n\n>dataset\\_type\n\n- “without the 'masking', so yeah, it's not needed at all\n\njust comment out\n\n# x = x \\* mix\\_stft.unsqueeze(1)\n\nin forward pass just before inverse stft” - [jarredou](https://discord.com/channels/708579735583588363/1220364005034561628/1427358045780049920)\n\n**KAN-Stem**\n\nThis might be interesting for multistem training; it’s based on Demucs:\n\n<https://github.com/waefrebeorn/KAN-Stem>\n\n**VR architecture** by tsurumeso <https://github.com/tsurumeso/vocal-remover>\n\n(VR models in UVR use modified v5 training code in order to support e.g. 4 bands; inference of v6 models is not yet supported in UVR)\n\nThe arch is obsolete for instrumentals - bleeding and vocal artefacts.\n\nNot really recommended anymore, except for specific tasks like de-noise, de-reverb, Karaoke or BVE where MDX V1 wasn't giving good enough results.\n\n(guide by Joe)\n\nQ: How do I train my own models?\n\nA:\n\nModel Training Tutorial\n\nRequirements:\n\n- Windows 10\n\n- Nvidia GeForce graphics card (at least 8 GB of VRAM)\n\n- At least 16GB of RAM\n\n- 1-2TB of disk space recommended\n\nSet up your dataset\n\n1.
You need to know...\n\nAttention:\n\n- Although you can train your model with mp3, m4a or flac files, we recommend converting them to wav.\n\n- For high-resolution audio sources, the samples are reduced to 44.1kHz during conversion.\n\n- If possible, match the playback position and volume of the OnVocal and OffVocal sound sources.\n\n- The dataset requires at least 150 pairs of songs\n\n2. Rename the files...\n\nAttention:\n\nCreate a \"mixtures\" folder for tracks with vocals and an \"instruments\" folder for tracks without vocals\n\nPlease separate the sound sources with and without vocals as shown below.\n\nThere is also a rule for file names: name the files with numbers and add \"\\_mix\" / \"\\_inst\" at the end.\n\nExample:\n\nInstrumental with vocal:\n\nD:\dataset\mixtures\001\_mix.wav\n\nD:\dataset\mixtures\002\_mix.wav\n\nD:\dataset\mixtures\003\_mix.wav\n\n.\n\n.\n\n.\n\nInstrumental only:\n\nD:\dataset\instruments\001\_inst.wav\n\nD:\dataset\instruments\002\_inst.wav\n\nD:\dataset\instruments\003\_inst.wav\n\n.\n\n.\n\n.\n\n3. Download vocal-remover from GitHub\n\nLink: <https://github.com/tsurumeso/vocal-remover/releases/>\n\n4. Install the requirements (use the command below)...\n\npip install --no-cache-dir -r requirements.txt\n\n5. Start training\n\npython train.py --dataset D:\dataset\ --reduction\_rate 0.5 --mixup\_rate 0.5 --gpu 0\n\nAttention:\n\nIf you want to pause, press Ctrl+Shift+C\n\n6. Continue training\n\nExample:\n\npython train.py --dataset D:\dataset\ --pretrained\_model .\models\model\_iter(number).pth --reduction\_rate 0.5 --mixup\_rate 0.5 --gpu 0\n\n\_\_\n\nCompared to VR5 arch, VR6 now can handle phase.
Although I’m not sure whether it implements Aufr33's multiband functionality, which the VR5 models trained for UVR5 use (I’m also not sure whether that training code is in the old CML UVR5 code).\n\n*MedleyVox*\n\nExcellent for training duet/unison and, separately, main/rest vocals.\n\nThe original code is extremely messy and broken at the same time, and the dataset is big and hard to obtain. Cyrus was to publish their own repository with fixed code and complete dataset at some point.\n\nThe problem with the model trained by Cyrus was the cutoff used during training.\n\n\"The ISR\_net is basically just a different type of model that attempts to make audio super resolution and then separate it. I only trained it cuz that's what the paper's author did, but it gives worse results than just the normal fine-tuned\" ~Cyrus\n\nApart from the training code, the authors released no model, only result snippets.\n\n<https://github.com/JusperLee/TDANet>\n\n\"I think this arch should worth a try with multiple singer separation, as it's performing quite well on speaker separation, and it seems it can be trained with a custom number of voices (same usual samplerate & mono limitations tho)\" jr\n\nMossFormer2 may perform better\n\n“These archs are not implement in ZF's script but are really promising for multiple speakers separation, and should be working for multiple singers separation if trained on singing voice:\n\n<https://github.com/dmlguq456/SepReformer> (current SOTA)\n\n<https://github.com/JusperLee/TDANet>\n\n<https://github.com/alibabasglab/MossFormer2>\n\n” - jarredou\n\nBas Curtiz\n\n“A few takeaways I learned from the issues at its GitHub, <https://github.com/dmlguq456/SepReformer/issues>\n\ncurrently only supports 8khz sample rate (so downsample your 44.1khz samples to this prior)\n\nsamples only: max 10-20 seconds input, otherwise potential memory issues (so chunk a full song into such segments prior)\n\nIndividual samples are not supported (so it's folder-based, put your samples
in there)”\n\nAlso, there are various errors which some users tend to encounter, at least on Windows machines.\n\nNew sep algo\n\n<https://github.com/OliverRensu/xAR>\n\n[This repository includes the official implementation of our paper \"Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation\"]\n\nMore Expressive Attention with Negative Weights\n\n<https://arxiv.org/abs/2411.07176>\n\nthis seems a bit more than just another gimmick paper and might actually be a huge boost to trf models. its one of those things, I feel, has been staring people in the face for years (since prob like 2017) and was only recently thought of (2024)\n\nseems like you could apply the concept to non trf based architectures as well\n\nlucidrains recently added it to x-transformers, which is how i heard about it\n\nusually hes a bit picky on what features he adds to that repo as well\n\nso now the answer is, does it benefit mvsep models? if so how much?\n\nive gone through a lot of \"this is the next big thing\" papers and out of all of them this seems most likely to actually hold to that claim\n\nim thinking if the claims hold true this could provide a significant boost to at least language models\n\n- Cross/jreoka\n\n<https://arxiv.org/abs/2510.21450>\n\n[Frazer](https://discord.com/channels/708579735583588363/1220364005034561628/1435326150267244554) “idk how the hell these types work but this should work for GRU layers\n\nthis will be sick inside of scnet.\n\nbut it does increase memory use due to having to hold onto more stuff for backpropagation in xatlu since it's not fused like gelu/relu/silu. 
Had to decrease by about a batch size of 4 - x\n\nimport torch.nn as nn\n\nfrom pararnn.parallel\_reduction.parallel\_reduction import NewtonConfig\n\nfrom pararnn.rnn\_cell.gru\_diag\_mh import GRUDiagMHConfig, GRUDiagMH\n\nclass ParaGRU(nn.Module):\n\ndef \_\_init\_\_(self, dim, intermediate\_dim):\n\nsuper().\_\_init\_\_()\n\nnewton\_cfg = NewtonConfig()\n\ncfg = GRUDiagMHConfig(\n\nstate\_dim=intermediate\_dim,\n\ninput\_dim=dim,\n\nnewton\_config=newton\_cfg,\n\nmode='parallel\_FUSED' #https://github.com/apple/ml-pararnn/blob/main/pararnn/rnn\_cell/rnn\_cell\_application.py#L18\n\n)\n\nself.layer = GRUDiagMH(cfg)\n\ndef forward(self, x):\n\nreturn self.layer(x)\n\n-\n\n“I tried replacing the mask estimator with vocos/bigvgan and while it seems to work from early results, it’s very heavy to train and needs a lot of time” - becruily\n\n***On the side.***\n\n**ZLUDA** is a CUDA translation layer that lets CUDA applications run on AMD (and formerly Intel) GPUs without any modifications to the app.\nGPUs weaker than the 7900 XT may show its weaknesses considerably, [compared](https://www.purepc.pl/image/news/2024/02/13_zluda_uklady_graficzne_amd_radeon_moga_skorzystac_z_bibliotek_nvidia_cuda_projekt_porzucony_przez_intela_i_amd_juz_dostepny_4_b.jpg) to better GPUs. The example comes from ZLUDA in Blender, but from the AMD-period code, i.e. before the takedown and rollback to the pre-AMD codebase, so current ZLUDA is more crippled.\nIn never-released code, at some point it was even made to support Batman: Arkham Knight, with general plans to support DLSS, but that will probably never see the light of day.\nMaybe [this](https://github.com/lshqqytiger/ZLUDA) fork still has the old base - its version 3 codebase is still being updated there.
Used on a 6700 XT in [stable-diffusion-webui-amdgpu](https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu), it was as slow as DirectML, but on a 7900 XT it cut the processing time from 3-4 minutes to ~1 minute. The first run can be slow because a cache has to be built. After that it can surpass ROCm performance-wise, if you manage to make it work. Plus, ZLUDA works on Windows and supports older AMD GPUs, even the RX 500 series (use [lshqqytiger’s repo](https://github.com/lshqqytiger/ZLUDA); try e.g. the ROCm 5 version if your app doesn’t start, but it might crash anyway), while ROCm on Linux with older GPUs, e.g. the RX 5700 XT, should work with some quirks (e.g. HIP 5.7 and ROCm around 5.2.\\* - [src](https://www.reddit.com/r/ROCm/comments/1gcf3x4/comment/ma62qop/), although you can try 6.2.1 or another 6.2.x to be sure, as some earlier 6.x releases may not have supported the RX 5700 XT correctly; for the RX 6000 series, ROCm 6.2.4 should be used at the moment).\nIt could be interesting to run the training repo through ZLUDA, e.g. on Windows, instead of ROCm PyTorch on Linux, but Unwa noticed this note in the ZLUDA fork: “PyTorch: torch.stft does not always return correct result.” That might be problematic during training, so ZLUDA might not be a good solution for training currently. Who knows whether it works for inference on e.g. Windows using MSST or UVR, although the latter crashes for me during separation with nvcuda.dll. But I haven't tried messing with the HIP SDK mentioned on the release page, or with that fork's ZLUDA versions other than the newest. I don't even have anything in C:\\Program Files\\AMD\\ROCm (maybe it's futile without it), but I have amdhip64.dll v.
5.5 in system32 (in case 5.7 isn’t shipped with newer drivers and is required).\n\nAlso, I haven't followed these instructions yet, but they might be useful and contain some workarounds for older GPUs:\n\n<https://github.com/vladmandic/sdnext/wiki/ZLUDA>\n\nAll gfx names with corresponding GPU models:\n\n<https://llvm.org/docs/AMDGPUUsage.html#processors>\n\nMore ZLUDA research and workarounds (may work for UVR, but don't have to):\n\n<https://github.com/comfyanonymous/ComfyUI#amd-gpus-experimental-windows-and-linux-rdna-3-35-and-4-only>\n\n(RDNA 3-4 instructions; could be used for a manual Python installation of the DirectML branch of UVR with a different backend)\n\n<https://github.com/comfyanonymous/ComfyUI#for-amd-cards-not-officially-supported-by-rocm>\n\n(flags for RDNA 2-3)\n\n<https://github.com/patientx/ComfyUI-Zluda>\n\n(Instructions for GCN4-RDNA4;\n\nRDNA 2 with HIP 6.2.4 and experimental 6.4.2)\n\nUsing HIP 5.7.1 and the corresponding ZLUDA should be possible on RDNA2 too\n\n<https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides#amd-fooocus-with-zluda>\n\n(Step 5 in \"Setting up Zluda\", a bit below, is for GPUs below the RX 6800 or 9070/60,\n\nand the instructions above that point are for the 6800 or higher too)\n\n<https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides#rocm-hip-sdk-57-with-zluda-setup>\n\n(Instructions for GCN4 [RX 400/500]; it contains a step with\n\npip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118)\n\n<https://github.com/advanced-lvl-up/Rx470-Vega10-Rx580-gfx803-gfx900-fix-AMD-GPU#important-notes-on-installation>\n\n(Instructions for GCN4/GCN5 [GFX803 & GFX900])\n\n<https://github.com/ROCm/ROCm/issues/4749#issuecomment-3117083336>\n\n(some older 7900 XTX (gfx1100) troubleshooting\n\n- HIP SDK 6.5 might be a bit less crashy,\n\n\"Follow the instructions at <https://github.com/patientx/ComfyUI-Zluda> then do the patches at 
<https://github.com/patientx/ComfyUI-Zluda/issues/222>\")\n\nPossible complement for RX 6600/6700 (if it wasn’t already mentioned in the instructions above):\n\n<https://github.com/YellowRoseCx/koboldcpp-rocm/releases/download/deps-v6.2.0/rocblas-6.2.0.dll.7z>\n\nExtract that archive into C:/Program Files/AMD/ROCm/6.1/bin/; it should merge with the \"rocblas\" folder that's already in there.\n\n\\_\\_\\_\\_\\_\\_\n\nOutdated stem limiting methods moved [here](https://docs.google.com/document/d/1WDDK7E8-HY7EYUOM-evIcXNYX_Of0gYOVjG95zLQfKo/edit?tab=t.0)\n\n\\_\\_\\_\n\n*Might be potentially useful for any training in Colab (by HV, 2021):*\n\n“\n\n```javascript\nfunction ConnectButton() {\n  console.log(\"Connect pushed\");\n  document.querySelector(\"#top-toolbar > colab-connect-button\").shadowRoot.querySelector(\"#connect\").click();\n}\nsetInterval(ConnectButton, 60000);\n```\n\n<- enter this in the browser console (not in a cell)\n\nand keep Colab in the foreground.\n\nIt's not really good to train in Colab at all, due to its limitations.\n\nIf you're training because you want a better model than v5/v4 mgm models, stop it, you won't surpass mgm models with just Colab. 
However, you could subscribe to <https://cloud.google.com/gcp>\n\nand watch some YouTube tutorials on how to use its resources in Colab.”\n\n#### Local SDR testing script\n\n<https://drive.google.com/file/d/1GC9pwch0WQXZXwBNTz_QnXE_UyxdKmQF/view?usp=sharing> by Dill\n\n<https://drive.google.com/file/d/1BeqNw3TnRTDMwnoQGMbOqwrcGRwe4Zht/view?usp=sharing> GUI by zmis (but it scores a bit lower for some reason):\n\n“Here's a handy little Python script I made with the help of AI that can calculate the SDR of a separated track against the actual instrumental or vocal of the song.\n\nRun python sdr.py --help for an explanation of how to use the script.\n\nYou just need numpy and scipy for it to work, and Python of course!\n\nI'm not sure if you would like to pin this or not, but I've been using this script to help me improve my separation methods.\n\n<https://github.com/ZFTurbo/Audio-separation-models-checker/tree/main>\n\nBased on MUSDB18-HQ dataset”\n\n“Q: Why does SDR go below 0 in silent parts? (song\\_006)\n\nA: SDR and SI-SDR behave weirdly when one of the inputs is silent, and that's why log\\_WMSE was made: <https://github.com/crlandsc/torch-log-wmse/>\n\nInterestingly, the L1freqMag metric gives the same results as some users here (1296 a bit better for instrumentals, 1297 a bit better for vocals).” jarredou\n\n#### Best ensemble finder for a song script\n\nby Vinctekan\n\n<https://drive.google.com/file/d/1LUtBsCSym1iDHqADEusmACs-LF2lNYLw/view?usp=sharing>\n\nCurrently, this optimized version can find the best combination of nine 3-minute audio files in about 2 minutes and 40 seconds in Colab.\n\nRefactored best *weighted* ensemble finder by jarredou\n\n<https://drive.google.com/file/d/1Rm09z1wpj0Pi-6bFQ15u767n1XV95pDz/view?usp=sharing>\n\n“That's what I've used to find optimal weights for my MDX23 fork v2.5 update.\n\nIt's still a Nelder-Mead-based optimizer, but the code is way simpler/cleaner than the 1st version.\n\nTo use it, you need:\n\nA dataset of clean sources (with exactly the same 
filename scheme as the MVSEP multisong dataset).\n\nProcess the dataset mixtures with all the models you want to ensemble and put the outputs in separate folders, one for each model (again with exactly the same filename scheme as the MVSEP multisong dataset).\n\nThe librosa and scipy Python libs.\n\nThen run (for example):\n\n```\nweight_finder_v2.py \\\n--ref c:\\reference_dataset \\\n--est c:\\InstVoc c:\\bsrofo1296 c:\\kimrofo \\\n--stem vocals \\\n--extension flac \\\n--tracks 100\n```\n\n--ref is the clean sources' folder path\n\n--est is the estimates' (separations') folder paths (multiple inputs)\n\n--stem is the stem name (based on the multisong dataset filename scheme)\n\n--extension is the audio file extension (flac/wav...)\n\n--tracks is the number of tracks in the dataset.\n\nIt will process the datasets many times, changing the weights each time until it finds the best balance. When finished, it will output the weights scaled to a max value of 10.\n\nWarning: it can take hours (or even days, depending on the number of models to ensemble, the size of the dataset and the resources of the computer)\n\nA Python lib to align audio:\n\n<https://github.com/nomonosound/fast-align-audio>”\n\n#### Universal function to make different types of ensembles by ZFTurbo\n\n<https://cdn.discordapp.com/attachments/911050124661227542/1192220574982881320/ensemble.py>\n\nI think it’s the same or newer:\n\n<https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/ensemble.py>\n\n“In my experiments, SDR for avg\\_wave was always the highest.”\n\njarredou has also made a Colab implementing the above, with a comfy 
GUI:\n\n<https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Manual_Ensemble_Colab.ipynb>\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n##### Volume compensation for MDX v2 models\n\n**How to automate calculation of the volume compensation value for all older MDX models**\n\n*(results are not perfect and need to be fine-tuned)*\n\nby jarredou\n\nSo, I may have a protocol to find accurate volume compensation:\n\n- Use a short .wav file of just noise (I've used pink noise here) and pass it through the model you want to evaluate\n\n- Take the resulting audio, the one that has all the noise in it, and compare it to the original noise with this little Python script, which will give you the difference in dBTP and the equivalent VC ratio (you'll need to\n\npip install librosa\n\nif you don't have it installed already). The results I've found with it are coherent with the ones you've found by ear! 
(1.035437 for HQ2 / 1.022099 for KimFT other)\n\nHere's the script:\n\n```python\nimport argparse\n\nimport librosa\nimport numpy as np\n\n\ndef Diff_dBTP(file1, file2):\n    # librosa.load resamples to 22050 Hz mono by default;\n    # pass sr=None, mono=False to compare at the native rate and channel count.\n    y1, sr1 = librosa.load(file1)\n    y2, sr2 = librosa.load(file2)\n    # Sample peaks (a strict true-peak meter would oversample first).\n    true_peak1 = np.max(np.abs(y1))\n    true_peak2 = np.max(np.abs(y2))\n    difference = 20 * np.log10(true_peak1 / true_peak2)\n    print(f\"Diff_dBTP : The difference in true peak between the two audio files is {difference:.6f} dB.\")\n    # ratio == true_peak1 / true_peak2, i.e. the gain that brings file2's peak up to file1's\n    ratio = 10 ** (difference / 20)\n    print(f\"The volume of file1 is {ratio:.6f} times that of file2 (VC ratio).\\n\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description=\"Find volume difference of two audio files.\")\n    parser.add_argument(\"file1\", help=\"Path to original audio file\")\n    parser.add_argument(\"file2\", help=\"Path to extracted audio file\")\n    args = parser.parse_args()\n    Diff_dBTP(args.file1, args.file2)\n```\n\n**Volume compensation values for various models (in reality they may differ +/- e.g. 
by 0.00xxxx, but maybe not much more)**\n\nAll values according to the script made by **jarredou**\n\n*(All settings default except Spectral Inversion: Off and Denoise Output: On; the latter shouldn't affect the results if turned off)*:\n\n- Kim Vocal\\_1 - 1.012819\n\n- Kim Vocal 2 - 1.009\n\n- voc\\_ft - 1.021\n\n- Kim ft other - 1.020 (Bas' fine-tuned and SDR-validated)\n\n- UVR-MDX-NET 1 - 1.017194\n\n- UVR-MDX-NET Inst 2 - 1.037748\n\n- UVR-MDX-NET Inst 3 - 1.043115\n\n- UVR-MDX-NET Inst HQ 1 - 1.052259\n\n- UVR-MDX-NET Inst HQ 2 - 1.047476\n\n- UVR-MDX-NET Inst Main - 1.037812 (actually it turned out to be 1.025)\n\n- UVR-MDX-NET Main - 1.002124\n\n- UVR-MDX-NET-Inst\\_full\\_292 - 1.056003\n\n- UVR-MDX-NET\\_Inst\\_82\\_beta - 1.088610\n\n- UVR-MDX-NET\\_Inst\\_90\\_beta - 1.151219 (wtf)\n\n- UVR-MDX-NET\\_Main\\_340 - 1.002742\n\n- UVR-MDX-NET\\_Main\\_406 - 1.001850\n\n- UVR-MDX-NET\\_Main\\_427 - 1.002091\n\n- UVR-MDX-NET\\_Main\\_438 - 1.001799\n\n- UVR\\_MDXNET\\_9482 - 1.007059\n\n\"Denoise just processes the input twice, the second time phase-inverted (and re-inverted after separation). Summing the two passes amplifies the result while removing the noise introduced by MDX. The sum is then attenuated by 6 dB, so it's still the same volume, just without the MDX noise.\n\nBasically HV's noise removal trick\"\n\n###### UVR-MDX parameters & hashes decoded by Bas Curtiz\n\n<https://github.com/Anjok07/ultimatevocalremovergui/blob/master/models/MDX_Net_Models/model_data/model_data.json> - this file maps model hashes to MDX model parameters.\n\nThe above probably still doesn’t contain all the models added in the update, e.g. 
Foxy model, but there are only 4-5 combinations of settings so far.\n\nFile with newer models' parameters:\n\n<https://raw.githubusercontent.com/TRvlvr/application_data/main/mdx_model_data/model_data_new.json>\n\nAll MDX-Net model parameters in UVR consist of these combinations:\n\n- HQ\\_4: self.n\\_fft = 6144, dim\\_f = 2560, dim\\_t = 8\n\n- All older HQ fullbands: self.n\\_fft = 6144, dim\\_f = 3072, dim\\_t = 8\n\n- Kim Vocal 1/2, Kim ft other (inst), Inst 1-3 (415-464), 427, voc\\_ft: self.n\\_fft = 7680, dim\\_f = 3072, dim\\_t = 8\n\n- 496, Karaoke, 9.X (NET-X): self.n\\_fft = 6144, dim\\_f = 2048, dim\\_t = 8 (dim\\_t = 9 for kuielab\\_a\\_vocals only)\n\n- Karaoke 2: self.n\\_fft = 5120, dim\\_f = 2048, dim\\_t = 8\n\n- De-reverb by FoxJoy: self.n\\_fft = 7680, dim\\_f = 3072, dim\\_t = 9\n\n###### UVR model hash decode\n\n“I made this little script a while back to find those hashes.\n\nUse it with model\\_hash\\_finder.py path\\_to\\_model\\_file.\n\n<https://drive.google.com/file/d/1D4TNKjuObNn6MSiss1PtmXPQoR3XJOwJ/view?usp=sharing>\n\nIt's a checksum hash, but based only on the last 10MB of the model files.” (jarredou)\n\nfull\\_band\\_inst\\_model\\_new\\_epoch\\_309.onnx fea6de84f625c6413d0ee920dd3ec32f\n\nfull\\_band\\_inst\\_model\\_new\\_epoch\\_337.onnx 4bc04e98b6cf5efeb581a0f382b60499\n\nkim\\_ft\\_other.onnx b6bccda408a436db8500083ef3491e8b\n\nKim\\_Vocal\\_1.onnx 73492b58195c3b52d34590d5474452f6\n\nKim\\_vocal\\_2.onnx 970b3f9492014d18fefeedfe4773cb42\n\nUVR-MDX-NET-Voc\\_FT.onnx 77d07b2667ddf05b9e3175941b4454a0\n\nkuielab\\_a\\_bass.onnx 6703e39f36f18aa7855ee1047765621d\n\nkuielab\\_a\\_drums.onnx dc41ede5961d50f277eb846db17f5319\n\nkuielab\\_a\\_other.onnx 26d308f91f3423a67dc69a6d12a8793d\n\nkuielab\\_a\\_vocals.onnx 5f6483271e1efb9bfb59e4a3e6d4d098\n\nkuielab\\_b\\_bass.onnx c3b29bdce8c4fa17ec609e16220330ab\n\nkuielab\\_b\\_drums.onnx 4910e7827f335048bdac11fa967772f9\n\nkuielab\\_b\\_other.onnx 
65ab5919372a128e4167f5e01a8fda85\n\nkuielab\\_b\\_vocals.onnx 6b31de20e84392859a3d09d43f089515\n\nReverb\\_HQ\\_By\\_FoxJoy.onnx cd5b2989ad863f116c855db1dfe24e39\n\nUVR-MDX-NET-Inst\\_1.onnx 2cdd429caac38f0194b133884160f2c6\n\nUVR-MDX-NET-Inst\\_2.onnx ceed671467c1f64ebdfac8a2490d0d52\n\nUVR-MDX-NET-Inst\\_3.onnx e5572e58abf111f80d8241d2e44e7fa4\n\nUVR-MDX-NET-Inst\\_full\\_292.onnx b06327a00d5e5fbc7d96e1781bbdb596\n\nUVR-MDX-NET-Inst\\_full\\_338.onnx 13819d85cad1c9d659343ba09ccf77a8\n\nUVR-MDX-NET-Inst\\_full\\_382.onnx 734b716c193493a49f8f1ad548451c48\n\nUVR-MDX-NET-Inst\\_full\\_386.onnx 2e4fcd9ec905f35d2b8216933b5009ff\n\nUVR-MDX-NET-Inst\\_full\\_403.onnx 94ff780b977d3ca07c7a343dab2e25dd\n\nUVR-MDX-NET-Inst\\_HQ\\_1.onnx 291c2049608edb52648b96e27eb80e95\n\nUVR-MDX-NET-Inst\\_HQ\\_2.onnx cc63408db3d80b4d85b0287d1d7c9632\n\nUVR-MDX-NET-Inst\\_HQ\\_2.onnx 55657dd70583b0fedfba5f67df11d711\n\nUVR-MDX-NET-Inst\\_Main.onnx 1c56ec0224f1d559c42fd6fd2a67b154\n\nUVR-MDX-NET\\_Inst\\_187\\_beta.onnx d2a1376f310e4f7fa37fb9b5774eb701\n\nUVR-MDX-NET\\_Inst\\_82\\_beta.onnx f2df6d6863d8f435436d8b561594ff49\n\nUVR-MDX-NET\\_Inst\\_90\\_beta.onnx 488b3e6f8bd3717d9d7c428476be2d75\n\nUVR-MDX-NET\\_Main\\_340.onnx 867595e9de46f6ab699008295df62798\n\nUVR-MDX-NET\\_Main\\_390.onnx 398580b6d5d973af3120df54cee6759d\n\nUVR-MDX-NET\\_Main\\_406.onnx 5d343409ef0df48c7d78cce9f0106781\n\nUVR-MDX-NET\\_Main\\_427.onnx b33d9b3950b6cbf5fe90a32608924700\n\nUVR-MDX-NET\\_Main\\_438.onnx e7324c873b1f615c35c1967f912db92a\n\nUVR\\_MDXNET\\_1\\_9703.onnx a3cd63058945e777505c01d2507daf37\n\nUVR\\_MDXNET\\_2\\_9682.onnx d94058f8c7f1fae4164868ae8ae66b20\n\nUVR\\_MDXNET\\_3\\_9662.onnx d7bff498db9324db933d913388cba6be\n\nUVR\\_MDXNET\\_9482.onnx 0ddfc0eb5792638ad5dc27850236c246\n\nUVR\\_MDXNET\\_KARA.onnx 2f5501189a2f6db6349916fabe8c90de\n\nUVR\\_MDXNET\\_KARA\\_2.onnx 1d64a6d2c30f709b8c9b4ce1366d96ee\n\nUVR\\_MDXNET\\_Main.onnx 53c4baf4d12c3e6c3831bb8f5b532b93\n\nVR de-reverb models 
decode\n\nUVR-De-Echo-Normal.pth = f200a145434efc7dcf0cd093f517ed52\n\nUVR-De-Echo-Aggressive.pth = 6857b2972e1754913aad0c9a1678c753\n\nUVR-DeEcho-DeReverb.pth = 0fb9249ffe4ffc38d7b16243f394c0ff\n\nSo they all use the \"4band\\_v3.json\" config file (from [here](https://github.com/TRvlvr/application_data/blob/main/vr_model_data/model_data.json))\n\nMore thorough chart by David Duchamp a.k.a. Captain FLAM:\n\n[https://docs.google.com/spreadsheets/d/1XZAyKmgJkKE3fVKrJm9pBGIXIcSQC3GWYYI90b\\_ul1M](https://docs.google.com/spreadsheets/d/1XZAyKmgJkKE3fVKrJm9pBGIXIcSQC3GWYYI90b_ul1M/edit#gid=366525450)\n\n\\_\\_\\_\n\n##### Voice Cloning\n\n“RVC and some of its forks ([Applio](https://docs.applio.org/), Mangio, etc.) are genuinely free, open-source tools for inference and training. For a real-time voice changer that uses RVC models, there's w-okada: <https://rentry.co/VoiceChangerGuide>” There's no guide for Linux, though.\n\n<https://www.tryreplay.io/>\n\n“URL downloads, local files, massive database of models, both huggingface and weightsgg, built-in separation models, options to skip that part if you have vocals, ability to use multiple ai models for one particular result, and the option to either merge or just get multiple results at the end, plus whatever else, de-reverb and stuff” It has the voc\\_ft vocal model from UVR5.\n\n“even my old laptop can still run inference using Applio\n\ni3 3217u 1.8ghz\n\nintel hd 4000”\n\nAnd you’re probably already aware that RVC Colabs for training voice-cloning models are banned.\n\n##### Stable Audio Open Gen\n\nAvailable on MVSEP in the Experimental section. It’s not for separation, but for generating sounds.\n\nZFTurbo: “Algorithm based on model:\n\n<https://huggingface.co/stabilityai/stable-audio-open-1.0>\n\nAudio is generated in stereo with a sample rate of 44.1 kHz and a duration of up to 47 seconds. The quality is quite high. 
It's better to write prompts in English.\n\nExample prompts:\n\n1) Sound effects generation: cats meow, lion roar, dog bark\n\n2) Sample generation: 128 BPM tech house drum loop\n\n3) Specific instrument generation: A Coltrane-style jazz solo: fast, chaotic passages (200 BPM), with piercing saxophone screams and sharp dynamic changes\n\nExamples:\n\nCat meow: <https://mvsep.com/result/20250612092110-b297c082fb-generated.wav>\n\nDog bark: <https://mvsep.com/result/20250612115517-b297c082fb-generated.wav>\n\n128 BPM tech house drum loop: <https://mvsep.com/result/20250612115841-b297c082fb-generated.wav>\n\nViolin solo: <https://mvsep.com/result/20250612120111-b297c082fb-generated.wav>\n\nWoman singing \"Happy Birthday to you\": <https://mvsep.com/result/20250612120433-b297c082fb-generated.wav>”\n\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\n*Visit our* [*#dev-talk*](https://discord.com/channels/708579735583588363/1220364005034561628) *channel for more*\n\n##### [Anjok’s interview](https://www.youtube.com/watch?v=-pcVN54cgw0) on YT\n\n*TL;DW: UVR’s history + training, archs and demudder explained*\n\nAnjok is the developer of Ultimate Vocal Remover 5 (UVR5 GUI).\n\nHe intended UVR to be a Swiss Army knife: containing everything you need for separation, including models made by the community (e.g. dereverb/denoise/deecho).\n\nHistory of UVR\n\nBack when Spleeter was still a thing, Anjok found the VR arch made by Japanese developer tsurumeso, and got better results than with Spleeter. He started making his own model on a laptop GTX 1060 6GB, on 100 or 150 pairs with the absolute minimum parameters, and it turned out to be better than tsurumeso's model. Later he moved to a faster GPU (probably still before the 3090).\n\nAnjok wanted a GUI for VR, so he found BoskanDilan on Fiverr and simply contracted him, paying him to build the foundations of what UVR is today. 
BoskanDilan turned out to be a very good and talented coder.\n\nThey put the work on GitHub, and Aufr33 contacted Anjok with ideas on how the VR models could be improved, etc.\n\nThen BoskanDilan left in mid-2021 for personal reasons, and the GUI work was taken over by Anjok, who had been mentored by BoskanDilan to improve his understanding of the code. Anjok started working on UVR exclusively, spending 10 hours a day on it in 2022.\n\nHe decided to make a simple all-in-one installer, as he received lots of issues on GitHub from people who didn't know how to install it. He also re-coded UVR to make the code easier to maintain. Then Bas Curtiz helped Anjok with the design aspects of UVR, e.g. designing the new logo, and gave advice and a good amount of feedback from a UVR user's perspective. The early-2022 phase of UVR development took a lot of advice from early users of UVR.\n\nIn May 2022, the first installer was released, making UVR accessible without e.g. installing Python or other dependencies, or needing the specialized programming knowledge to set up a proper environment.\n\nAnjok was still the only one in charge of bringing archs other than VR into UVR. Projects of that scale are normally handled by bigger teams, where e.g. different archs could be coded into UVR by different developers. It was a stressful period, because Anjok intended to make software that is free of bugs, without fully relying on the community for bug reporting.\n\nThen the Mac version came out, with M1/M2/M3 support for faster GPU acceleration. Anjok found a part of the code in the Demucs repo that made it easier to port UVR to Macs, and it is used by every model. 
The music community is pretty Mac-centered, and he devoted a considerable amount of time to making it work reliably on Macs too.\n\nThe new UVR version is planned to introduce a demudder (described later), and possibly translations.\n\nAnjok is currently training a new model, coming in several weeks.\n\nIt's intended to be a little smaller so it's less resource-intensive, but also better than the best current MDX-Net model.\n\nUpdate 01.03.24\n\n“I'm going to allow HQ4 to continue training beyond 1500+ epochs as an experiment (it's currently at 1200), and interestingly, the SDR has been steadily increasing. It has significantly surpassed HQ3 in terms of SDR and listening tests, and it also outperformed MDXC23 in listening tests, though not in SDR (yet!). The most recent evaluation on the multi-dataset showed a score of 15.85, using the default settings. Clearly, there's a limit to how much further training can enhance performance, but up to this point, improvements are still being observed. This model has been in training since October! I'm chipping away at the next GUI update as well, and the demudder will be in it.”\n\nThe model was released, with HQ\\_5 already scheduled for the following month(s).\n\n*The archs in UVR and their technicalities summarized*\n\n*VR*\n\nVR converts the audio into spectrograms using the FFT.\n\nVR predicts only magnitude spectrograms, not phase.\n\nPhase represents the timing of where the data is, while magnitude represents the intensity of each frequency.\n\nPhase is much harder to predict.\n\nVR actually keeps the original phase from the mixture during the process, \"and it just does the magnitude\".\n\nThat's the reason why VR tends to have more artefacts. 
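To make the magnitude-only idea concrete, here is a minimal NumPy sketch. It is not UVR's or tsurumeso's actual code, which is multi-band and more involved. The network's predicted stem magnitude is turned into a ratio mask, and because a real, non-negative mask only scales each STFT bin without rotating it, the resulting stem inherits the mixture's phase unchanged. The function name `apply_magnitude_mask` and the clipped 0-1 ratio-mask formulation are illustrative assumptions.

```python
import numpy as np


def apply_magnitude_mask(mix_spec, predicted_mag, eps=1e-12):
    """Rebuild a stem STFT from a predicted magnitude, reusing the mixture phase.

    mix_spec: complex STFT of the mixture, shape (n_freqs, n_frames).
    predicted_mag: non-negative predicted magnitude of the stem, same shape.
    (Hypothetical helper for illustration only.)
    """
    mix_mag = np.abs(mix_spec)
    # Ratio mask clipped to [0, 1]; bins where the mixture is silent stay silent.
    mask = np.clip(predicted_mag / np.maximum(mix_mag, eps), 0.0, 1.0)
    # A real, non-negative mask scales each bin without changing its angle,
    # so every non-zero output bin carries exactly the mixture's phase.
    return mask * mix_spec


mix = np.array([[3 + 4j, -1 + 0j], [0j, 2j]])
pred = np.array([[2.5, 0.5], [1.0, 10.0]])
stem = apply_magnitude_mask(mix, pred)
print(stem)
```

Whatever part of the mixture's phase belongs to the other source stays baked into the estimate. This is one way to read the smearing mentioned in the interview.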
The smearing in VR instrumentals happens because the phase from the mixture is still in there.\n\nAufr33 later introduced 4-band support for UVR.\n\nE.g. the first band (0-700 Hz) gets one resolution, while the other frequency ranges get different ones. Knowing that vocals sit in a specific frequency range, you can optimize it further.\n\nThat feature made UVR and the VR arch much better.\n\nLater they introduced -\n\n*Ensembling*\n\nA way to use multiple models to potentially get better results.\n\nThe three ways of ensembling:\n\n- avg - gets the average of the vocals/instrumentals\n\n- max - takes the maximum result of each stem, e.g. for vocals you'll get the heaviest-weighted vocal from each model, and the same goes for the instrumental, giving a bit cleaner results, but more artefacts\n\n- min - the opposite of max, taking the minimum result of each stem\n\n*MDX-Net*\n\nUses the full spectrogram, with phase and magnitude.\n\nThe tradeoff is muddier results, but a natural, cleaner sound.\n\n*Training*\n\nAnjok separated nearly every genre you can think of, and stated that the hardest genres for separation are metal and vocal-centered mixes. Also, if the instrumental has a lot of noise, e.g. distorted guitars, it will come out muddier.\n\nMDX-Net was the arch that addressed lots of VR's issues at its core.\n\nTracks from the 70s-80s separate well. The 50s-60s will be harder, e.g. tracks recorded in mono. The early stereo era gets a little better.\n\nA model can only be as good as its dataset.\n\nThere was lots of work scraping it from the internet.\n\nAufr33 was the mastermind behind the Karaoke model and its dataset.\n\nThe Demucs model wasn't as successful, probably because it was meant more for multiple stems, and MDX-Net gave better results for 2 stems.\n\nThe training details covered in this interview can be found at the top of the [Training models guide](#_bg6u0y2kn4ui) section of the doc.\n\nThe biggest issue in terms of archs, and the source of muddiness, is phase. 
Currently, in audio separation there's no great way for a model to calculate phase, as the phase spectrogram is not as obvious as the magnitude spectrogram.\n\nYou take the vocal out of a heavy rock track, but the process is not perfect, so it takes some part of the instrumental with it. Even if you don't hear the instrumental in the vocals, there's still instrumental data in the phase of that vocal track.\n\nAt the end of the day, source separation is prediction. It predicts where it thinks each source is, but there will always be some imperfections, e.g. the muddier sound you hear in noisier tracks like metal.\n\nAnjok emphasizes the (current) lack of correlation between SDR and perceived quality: a higher SDR doesn’t necessarily mean better. He tried some top results from the SDR chart before and wasn’t quite happy with them.\n\nBecause phase is a big part of the issue, there's now the new upcoming -\n\n###### *Demudder*\n\nAn upcoming UVR feature (it was also explained earlier on the server by Anjok; if something is not clear, try to find his messages there)\n\nIt uses lots of phasing tricks. It processes the track twice. The first pass takes the instrumental and compares it against the original mixture. It chops the mixture into 3-second chunks and iterates over that list of chunks. For each segment, it cuts out where that segment is in the instrumental and finds similar events that aren't at the exact same place. 
It analyses those chunks against the generated instrumental and tries to find the most similar events in the instrumental that aren't at the same place as that segment. Then it phases it, i.e. phase-inverts that instrumental part.\n\n(56:30) If the level isn't past a certain dB threshold because it's too loud, it doesn't cancel out, so no phase inversion is done; if it is below a certain threshold, it phases that. It then basically stitches together a new mixture that is partly phased from the original instrumental output. It reprocesses that stitched-together mixture (with the instrumental phase changed) through the second pass, and finally phase-inverts the resulting vocal with the original mixture. What you end up with is that similar parts from other places in the track fill in the spectral holes.\n\nSam remarks that this seems somewhat similar to how iZotope Imager probably works.\n\nAnjok says: I'm trying to get a similar part, but also to take it and phase it with that segment. Because it's not the exact same part of the segment, it's not going to be a perfect phase cancellation, because then it would just be the original vocal output.\n\nSo it's kind of still finding the bit of instrumental that is still in the vocal.\n\nSam mentions the frequent situation where separation lowers e.g. the hi-hat level in the instrumental, referring to the information a separated vocal stem can carry. It's part of the muddiness Anjok tried to address with the feature.\n\nAnjok didn't want to compromise vocal quality, and in some cases it makes the vocals better too, but it also depends on how the track was mixed originally. 
If it's an analog track recorded in one session, or even a live track, it won't work so well. The problem is e.g. a 10-minute track, where the demudder won't find phase similarities as effectively. It works best on music made with samples; digital tracks are more likely to work better.\n\nAnjok is currently working on making it work for all tracks.\n\nThe more he works on it, the more breakthroughs are made, but due to his day job he has had less time to work on it lately.\n\nAnjok expresses his appreciation for the group of very talented developers who made the MDX-Net arch at Korea University. It's his favourite network. He's a big fan of Woosung Choi's work.\n\n\\_\\_\\_\\_\\_\n\nLater, Aufr33 invented his own:\n\n###### Simpler demudder\n\nPublished for paid users of x-minus.pro (when you pick a Roformer model for instrumentals, buttons with the methods appear; it's only applied to instrumentals, not vocals)\n\nIn his own words:\n\n“\n\n1. Separate the song into vocals and music\n\n2. Invert the phase of the vocal and mix it with the music\n\n3. Now separate this mix\n\n4. Mix the vocals with the input song\n\nIt actually works in a more complicated way than that. 
I added a high pass filter since the demudder is not needed at low frequencies.”\n\nProbably something in the 100-250 Hz range.\n\nActually, Aufr33 used the following ffmpeg filter:\n\n```\n-filter_complex \"[0:a]highpass=f=900[hp1];[0:a][hp1]amerge,pan=stereo|c0=c0-c2|c1=c1-c3[lp];[1:a]highpass=f=900[hp2];[lp][hp2]amix=inputs=2:duration=longest:normalize=0[out]\"\n```\n\n(As the filter graph reads, it keeps the first input below 900 Hz, by subtracting its high-passed copy from itself, and adds the second input above 900 Hz.)\n\nRephrased by becruily:\n\n“use Roformer on a song\n\nphase invert the vocal file and combine it with the instrumental\n\nseparate again using the same model\n\ncombine the original song and vocals (no inversion or anything) and you will get demudded inst\n\nthis is for instrumental, if you want demudded vocals just switch the two words (acapella and instrumental)”\n\n[Video](https://discord.com/channels/708579735583588363/708579735583588366/1265218940695740498) showing how to apply the demudder method\n\n*Notes*\n\n- For HQ\\_4 with at least the denoise model enabled, the method seems to produce more vocal residues, so it might be better suited to Roformers (it’s used optionally for Kim Mel-Roformer on x-minus).\n\n- \"xminus demudder is more pleasing to the ears\" isling\n\n- Some people might still prefer the max\\_mag ensemble on x-minus, or a Mel-Roformer + BS-Roformer ensemble in UVR\n\nThe phase fixer on x-minus for the unwa inst v1 model copies the phase from the Kim Mel-Roformer model.\n\n###### UVR Demudders released in the [beta Roformer patch](#_6y2plb943p9v)\n\n(in Anjok’s words)\n\n* **Phase Rotate**:\n  “The fullest sounding, but can leave a lot of artifacts with certain models. I only recommend that method for the muddiest models. 
Otherwise, Combined Methods is the best”\n  + First, a filtered instrumental is created, and the left and right channels are swapped.\n  + The phase is shifted by 90 degrees.\n  + This modified filtered instrumental is then inverted with the original mixture, and another inference pass is performed on the resulting mixture.\n  + Finally, the vocal from the second pass is phase-inverted and combined with the original mixture, creating a cleaner instrumental.\n* **Phase Remix** (similar to X-Minus; also available in [this](https://colab.research.google.com/drive/1IC6Q1hLF55_tK6mhky0SWYKGVF9T5WsY) Colab):\n  I don't recommend using phase remix on the Instrumental v1e model. I recommend combined methods or phase rotate for models producing fuller instrumentals.\n  + The mixture is first separated into stems.\n  + The phase of the vocal stem is inverted and mixed with the filtered instrumental to produce a modified \"mixture\". Another inference pass is performed on this new mixture.\n  + The vocal stem extracted from the modified mixture is then reintroduced into the original mixture, creating a cleaner instrumental.\n  + This method is *only* recommended for models that produce very muddy instrumentals!\n* **Combine Methods**:\n  + It's basically a weighted mix of the final instrumentals generated by \"Phase Rotate\", \"Phase Remix\", and the initial instrumental.\n\nThe demudder in UVR doesn’t work on Intel/AMD GPUs with 4GB of VRAM\n([more demudder troubleshooting](#_u1hv6utk5vj))\n\n###### *Phase fixer/swapper* (decreases vocal residues in instrumentals)\n\nThe method was invented by Aufr33 to fix noise in Roformer models trained with an instrumental or other stem target. It takes the phase of the instrumental from a model trained with a vocal stem target (which usually gives muddy instrumentals but a better bleedless metric, e.g. Kim’s or Becruily’s vocal model) and copies it into the instrumental model's result. 
Initially it was added only to x-minus/uvronline (iirc for premium users), but later Becruily wrote his own script doing the same (with both torch and librosa implementations, in case one of them fails for you).\n\nLater Anjok implemented it in one of the UVR Roformer beta [patches](#_6y2plb943p9v) (Tools > Phase Swapper), although there it only allows changing the high and low cutoffs, not the high-frequency weight. santilli\\_ (Michael) found that increasing that weight from 0.8 to 2 is beneficial when phase-swapping from the Becruily vocal model to an instrumental model, and that it’s much better than tweaking the high and low cutoffs.\n\nYou can use their forked Colab [here](https://colab.research.google.com/drive/1uDXiZAHYk7dQajOLtaq8QmYXL1VtybM2?usp=sharing)/[fixed](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm) (it allows inference and batch separation with automatic phase fixing).\n\nYou can use and edit the original phase swap Python scripts for previously separated files [here](https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy?usp=drive_link). Iirc, be aware that the script has hardcoded file names, with Kim as the vocal model, so when using any other model you still need to use the same file name.\n\nThe phase fixer results in less noise in the instrumental, but more muddiness, so it's not necessary in all cases (usage instructions for the script are described in the link above).\n\nOptionally, in Phase Fixer you could set 420 for low and 4200 for high, or 500 for both, with the Mel-Kim model as the source. You can also use the bleed suppressor (by unwa/97chris) to alleviate the noise further (e.g. the phase fixer on its own works better with the v1 model to alleviate the residues). 
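Here is a minimal NumPy sketch of the phase-swap idea. It assumes both stems were already transformed with identical STFT settings. It keeps the magnitude of the instrumental model's output (the target) and takes the phase from the vocal model's instrumental (the reference). This is not Aufr33's, becruily's or UVR's implementation. Two parts are assumptions made for illustration: the linear crossfade between the low and high cutoffs (target phase below the low cutoff, reference phase above the high cutoff), and the hard switch when both cutoffs are equal. The scripts' high-frequency weight is not modelled. Phases are blended as unit phasors rather than raw angles, so the ±π wrap-around can't cause jumps.

```python
import numpy as np


def phase_swap(target_spec, ref_spec, freqs, low_cutoff=500.0, high_cutoff=5000.0, eps=1e-12):
    """Keep |target|, move its phase toward the reference's above the cutoffs.

    target_spec, ref_spec: complex STFTs, shape (n_freqs, n_frames).
    freqs: centre frequency of each bin in Hz, shape (n_freqs,).
    (Hypothetical helper for illustration only.)
    """
    freqs = np.asarray(freqs, dtype=float)
    if high_cutoff > low_cutoff:
        # Per-bin weight: 0 = target phase, 1 = reference phase, linear in between.
        w = np.clip((freqs - low_cutoff) / (high_cutoff - low_cutoff), 0.0, 1.0)
    else:
        # Equal cutoffs (e.g. "500 for both"): hard switch at that frequency.
        w = (freqs >= high_cutoff).astype(float)
    w = w[:, None]
    # Unit phasors e^{i*phase} of both spectra (zero where a bin is silent).
    unit_t = target_spec / np.maximum(np.abs(target_spec), eps)
    unit_r = ref_spec / np.maximum(np.abs(ref_spec), eps)
    blended = (1.0 - w) * unit_t + w * unit_r
    norm = np.abs(blended)
    # Renormalise; if the two phasors cancel out, fall back to the target phase.
    phasor = np.where(norm > eps, blended / np.maximum(norm, eps), unit_t)
    return np.abs(target_spec) * phasor


freqs = np.array([100.0, 2750.0, 8000.0])  # one bin below, between, and above the cutoffs
target = np.array([[2.0 + 0j], [3j], [-4.0 + 0j]])
reference = np.array([[1j], [1.0 + 0j], [5j]])
print(phase_swap(target, reference, freqs))
```

To apply this to audio, one option is to run `scipy.signal.stft` on both stems with the same `nperseg`/`noverlap`, pass its frequency array as `freqs`, and then call `scipy.signal.istft` on the result.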
Besides the UVR default of 500/5000 and the Colab default of 500/9000, you could potentially “even try like 200/1000 or even below for 2nd value.” “I would say that the more noisy the input is, the lower you have to set the frequency for the phase fixer.” - jarredou\n\n*Phase fixer model suggestions*\n\nI usually post suggested models, along with values for phase fixer/swapper, in the descriptions of instrumental models [here](#_2vdz5zlpb27h), but some suggestions are also gathered in one place in the phase fixer [Colab](https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm).\n\nE.g. (reference model/target model):\n\na) becruily voc/becruily inst (very bleedless, but muddy)\n\nb) becruily voc/flowersv10 (not as muddy)\n\nc) BS-Roformer 2025.07 (exclusive to MVSEP)/(any other inst model)\n\nd) Mel-Roformer Bas Curtiz Edition on MVSEP/(any other inst model) - less bleeding/fewer vocal shells than with becruily voc\n\n###### *Phase-fixed output used in UVR’s Manual ensemble*\n\n(method explained by objectbed, corrected by Ari/arxynr)\n\nUsing dca’s [ensemble](#_nk4nvhlv1pnt) from the doc as an example:\n\n\"0) Unwa BS Roformer Resurrection Inst (BS 2025.07 as a reference for phase fix) + MVSEP BS Roformer 2025.07 (Max Spec)\n\n→ the least vocal crossbleeding.\n\nAlternatively, you can use the becruily vocal model instead of 2025.07 for the ensemble:\n\n“Becruily vocal correctly recognizes instruments far better than the instrumental one” - dca100fb8\"\n\n“This would equate to the following steps:\n\n[guide corrected; didn't give correct results before]\n\n1. Separate your mixture using the Unwa BS Roformer Resurrection Inst model.\n\n• Output: inst\\_unwa.wav (instrumental) + optional vocal.\n\n2. Separate your mixture using the MVSEP BS Roformer 2025.07 model.\n\n• Output: inst\\_mvsep.wav + vox\\_mvsep.wav.\n\n[Note: Set WAV output. 
Then MVSEP won't turn down the volume to compensate for clipping; in such cases it should automatically use 32-bit float instead of 16-bit, even though 16-bit is set for free users]\n\n3. Build the ensemble with UVR’s Manual Ensemble mode:\n\n3a. Inputs =\n\n• inst\\_unwa.wav (from Step 1)\n\n• inst\\_mvsep.wav (from Step 2)\n\n3b. Set Algorithm = “Max Spec.”\n\n3c. Click Start Processing.\n\n4. Phase fix with UVR’s Phase Swapper:\n\n4a. Target Audio = inst\\_unwa+inst\\_mvsep (ensembled).wav (from Step 3).\n\n4b. Reference Audio = inst\\_mvsep.wav (from Step 2).\n\n4c. Click Start Processing.\n\n4d. Note the resulting new file (~~~\\_phaseswapped.wav or similar).\n\nThe resulting file is your final instrumental stem (the one referenced in the Google Doc instructions as having the least cross-bleed).”\n\nFAQ\n\nQ: Question about MVSEP BS Roformer 2025.07. Is this a model I can download, or do I have to use it online on their site? I can't find it.\n\nA: It's available on the MVSEP site only; there is no download. If you want to use this model, you have to go to [mvsep.com](http://mvsep.com).\n\nQ: What exactly are we doing with a Phase Fixer/Swapper? As an audio engineer, I understand phase as how it relates to frequency over time. When a vocal stem is used as a \"reference\" for the target instrumental stem, what's actually happening?\n\nA: “There is a phase in the waveform domain, like audio engineers experience it every day, and there is a phase in the STFT domain. In the STFT domain, phase describes how each STFT bin's content is organized relative to the other bins.\n\nThe math concept is similar to phase in the waveform domain, but instead of having 1 phase value for 1 waveform, you have 1 value for each STFT bin and for each STFT frame, so it's way more complex... 
(to make it really short).\n\nThe phase fixer/swapper operates in the STFT domain, so for each STFT bin, it will use the magnitude data of one model and the phase data from another model, and hopefully this can improve the final result.” - jarredou\n\n“In short, what the script does is blend the STFT phase of the \"donor\" file into the target; it uses different blending scales for different frequencies so that it'll only affect the parts that are directly related to the perceived noise.” - santilli\\_ / Michael\n\nQ: Apparently the original Resurrection model only has two models (BS-Roformer 1296/1297 by viperx and BS Large V1 by unwa) that work for phase fixer, but the Gabox one says to use 2025.07 for phase fixer (which works almost perfectly). How does that happen? How did the working models change for Gabox's fine-tune?\n\nA: ML models are black boxes. You never know what you'll get, like in Forrest Gump. Which vocal model works best as the source for the phase fixer depends on the specific instrumental model.\n\n\\_\\_\n\n###### *Pitch shifting algorithms' comparison*\n\n<https://www.youtube.com/watch?v=gaSFt0tT2u4>\n\n<https://www.youtube.com/watch?v=s-5g4I30_eY>\n\n<https://www.youtube.com/watch?v=WH8KDQALYQY>\n\n“I've had much better results with iZotope RX than with Studio One, for example, for stretching.”\n\nAlso, Bitwig can be good.\n\nYou can also try out the paid Lossless Pitch AI on dango.ai ([tuanziai.com/en-US](http://tuanziai.com)).\n\nResearch: <https://discord.com/channels/708579735583588363/911050124661227542/1303058610934382675>\n\n###### *Restoring hi-end in pitched-down tracks -* [*click*](https://docs.google.com/document/d/1GLWvwNG5Ity2OpTe_HARHQxgwYuoosseYcxpzVjL_wY/edit?tab=t.0#heading=h.gko0a4vvgwqs)\n\n\\_\\_\\_\\_\n\n*What does changing batch\\_size from 1 to 2 do?*\n(it wasn’t used in the UVR Roformer beta as of 9 Jan 2025, but that may have changed)\n\n“If your input is (batch, time, sequence), it looks like this:\n\n[ 
batch1->[time->[sequence],time2->[sequence]..], batch2->[time->[sequence],time2->[sequence]..] ]\n\nAs you increase batch\\_size you increase the amount of data the model gets to churn through.\n\nSo a higher batch\\_size allows the model to see more data before you do a thing called backward prop, which calculates another thing called gradients, which are used to improve the model by tuning loads of little values inside the neurons so that the next pass through is more accurate.” - frazer\n\nQ: They told me that increasing batch size to 2 makes it process faster.\n\nA: “So when that user says you can increase the batch\\_size, what they mean is you can process more than one song - i.e. instead of running a single song at batch\\_size = 1, you can run 2 at the same time (batch\\_size = 2).”\n\nQ: “Ah, so the batch\\_size param is used for the number of chunks of the input, so if I set [batch\\_size] to 4, my audio is chunked into 4?”\n\nA: “No, the chunks are split based on the chunk\\_size defined in the config (which is more related to STFT settings), and then the script stacks 'batch\\_size' chunks in the same tensor to process them at the same time (for inference).” - jarredou\n\nA: “Increasing the batch size increases the number of chunks that can be processed at one time, which may speed up processing, but also increases memory usage.\n\nIt will probably not affect quality.” - unwa\n\nThe inference Colab by jarredou forces batch\\_size=1. Iirc, the clicking issue with that value was later fixed in the MSST repo, so you can stick to it. 
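\n\nThe explanation by jarredou above can be illustrated with a minimal NumPy sketch. It is hypothetical and simplified: `run_in_batches` and the `model` callable are stand-ins, and the mono audio is cut into non-overlapping, zero-padded chunks, whereas real MSST inference uses overlapping chunks and a neural network. It shows that `batch_size` doesn't change how the audio is chunked, only how many chunks are stacked into one array per model call.\n\n```python\nimport numpy as np\n\n\ndef run_in_batches(audio, chunk_size, batch_size, model):\n    # Cut mono audio into fixed chunks of chunk_size samples, zero-padding the tail\n    n = audio.shape[-1]\n    n_chunks = -(-n // chunk_size)  # ceiling division\n    padded = np.zeros(n_chunks * chunk_size, dtype=audio.dtype)\n    padded[:n] = audio\n    chunks = padded.reshape(n_chunks, chunk_size)\n\n    # batch_size only controls how many chunks go through the model per call\n    outputs = []\n    for start in range(0, n_chunks, batch_size):\n        batch = chunks[start:start + batch_size]  # shape (<= batch_size, chunk_size)\n        outputs.append(model(batch))\n\n    # Put the processed chunks back in order and drop the padding\n    return np.concatenate(outputs).reshape(-1)[:n]\n```\n\nWith a model that processes each chunk independently, `batch_size=1` and `batch_size=4` give identical output; the larger value only means fewer, bigger calls, which is where both the possible speed-up and the higher memory use come from.\n\n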
The clicking fix has probably reached UVR too, since its latest patches implemented newer inference code from MSST.\n\nSomeone once said that a batch\\_size no bigger than 2 takes no more than 4GB of VRAM, but this will likely differ on AMD/Intel GPUs, where VRAM usage is higher due to the lack of the garbage collector present in CUDA.\n\n\\_\\_\\_\\_\\_\\_\\_\n\nDone by deton24, 2021-2026\n\n*Special thanks to* [*Audio Separation*](https://discord.gg/ZPtAU5R6rP) *Discord\n(and all the people mentioned in the credits section)*"
  },
  {
    "path": "docs/deton24-model-mapping-and-ensemble-guide.md",
    "content": "# Deton24 Doc ↔ Audio-Separator Model Mapping & Ensemble Guide\n\n**Date:** 2026-03-15\n**Source document:** `docs/deton24-audio-separation-info-2026-03-15.md` (converted from [deton24's Google Doc](https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c))\n\n## Purpose\n\nThis document serves as:\n1. A **naming convention lookup table** to map between audio-separator model filenames and the informal names used in deton24's doc and the audio separation community\n2. A **section reference guide** with line numbers into the deton24 doc for finding specific topics\n3. A **task brief for Claude agents** implementing improvements to audio-separator (ensemble presets, new models, phase fix)\n\n---\n\n## Table of Contents\n\n- [Section 1: How to Navigate the Deton24 Doc](#section-1-how-to-navigate-the-deton24-doc)\n- [Section 2: Naming Convention Lookup Table](#section-2-naming-convention-lookup-table)\n- [Section 3: Key Metrics Explained](#section-3-key-metrics-explained)\n- [Section 4: Ensemble Algorithm Mapping](#section-4-ensemble-algorithm-mapping)\n- [Section 5: Recommended Ensemble Presets (Implementable Now)](#section-5-recommended-ensemble-presets-implementable-now)\n- [Section 6: Missing Top-Tier Models to Add](#section-6-missing-top-tier-models-to-add)\n- [Section 7: Phase Fix — What It Is and How to Implement It](#section-7-phase-fix--what-it-is-and-how-to-implement-it)\n- [Section 8: Agent Task Briefs](#section-8-agent-task-briefs)\n\n---\n\n## Section 1: How to Navigate the Deton24 Doc\n\nThe deton24 doc (`docs/deton24-audio-separation-info-2026-03-15.md`) is ~27,000+ lines. Here are the key section locations:\n\n| Topic | Approx. 
Line | Section Heading |\n|-------|-------------|-----------------|\n| **Best models list** | 6845 | `### **The best models**` |\n| **Best instrumental models** | 6858 | `###### > for instrumentals` |\n| **Instrumental ensembles** | 7676 | `###### **>Ensembles**` (for instrumentals) |\n| **Best vocal models** | 8352 | `###### **>** ***for vocals***` |\n| **Vocal ensembles** | 8769 | `###### **Ensembles**` (for vocals) |\n| **Debleeding/cleaning** | 9421 | `###### **Debleeding/cleaning vocals/instrumentals/inverts**` |\n| **Karaoke** | 9636 | `###### **>Karaoke**` |\n| **Lead vocals only** | 10034 | `###### >Keeping only **lead vocals**` |\n| **Drumsep** | 10717 | `###### **>Sep. parts of drums a.k.a. Drumsep**` |\n| **De-reverb** | 11220 | `###### **De-reverb**` |\n| **De-noising** | 11538 | `###### **De-noising (vinyl noise/white noise/general)**` |\n| **Ensemble algorithm explanations** | 12212 | `###### *Ensemble algorithm explanations*` |\n| **4-5 max models rule** | 12351 | `###### *4-5 max ensemble models rule*` |\n| **SDR leaderboard** | 13286 | `##### SDR leaderboard` |\n| **Instrumental bleedless rankings** | 13478 | `##### *Instrumental models sorted by instrumental* ***bleedless*** *metric:*` |\n| **Vocal bleedless rankings** | 13558 | `###### Vocal models/ensembles sorted by instrumental **bleedless** metric` |\n| **Phantom center extraction** | 14731 | `##### Similarity/Phantom Center/Mid channel Extractor` |\n| **Drumsep (detailed)** | 18938 | `## Drumsep - single percussion instruments separation` |\n| **Phase fixer/swapper** | 27072 | `###### *Phase fixer/swapper*` |\n\n### Tips for navigating\n\n- The doc uses **community nicknames**, not filenames. 
E.g., \"v1e+\" = `melband_roformer_inst_v1e_plus.ckpt`, \"deux\" = becruily's dual vocal+instrumental Mel-Roformer, \"Resurrection\" = unwa's BS-Roformer models.\n- Model creators are referenced by their Discord handles: **unwa** (pcunwa), **Gabox** (GaboxR67), **becruily**, **aufr33**, **viperx**, **anvuew**, **jarredou**, **Aname** (Aname-Tommy), **mesk** (meskvlla33), **ZFTurbo** (MVSEP developer).\n- \"MVSEP exclusive\" means the model is only available via mvsep.com and cannot be downloaded — these cannot be added to audio-separator.\n- \"UVR\" = Ultimate Vocal Remover GUI, the main desktop app most community members use.\n- \"MSST\" = Music Source Separation Training, the framework used to train and run inference on newer Roformer models.\n\n---\n\n## Section 2: Naming Convention Lookup Table\n\n### Roformer — Vocals\n\n| Community Name | audio-separator filename | Creator | Key Metrics | Notes |\n|---|---|---|---|---|\n| Kim (original) | `vocals_mel_band_roformer.ckpt` | KimberleyJSN | — | The original Mel-Roformer |\n| Kim FT | `mel_band_roformer_kim_ft_unwa.ckpt` | unwa | — | Fine-tune of Kim |\n| Kim FT2 | `mel_band_roformer_kim_ft2_unwa.ckpt` | unwa | — | |\n| FT2 bleedless | `mel_band_roformer_kim_ft2_bleedless_unwa.ckpt` | unwa | bleedless 39.30 | Very clean vocals |\n| Kim FT3 (preview) | `mel_band_roformer_kim_ft3_unwa.ckpt` | unwa | — | Used as phase fix reference in some setups |\n| becruily vocal | `mel_band_roformer_vocals_becruily.ckpt` | becruily | — | Key phase-fix reference model; part of \"deux\" dual |\n| Gabox voc | `mel_band_roformer_vocals_gabox.ckpt` | Gabox | — | |\n| Gabox voc v2 | `mel_band_roformer_vocals_v2_gabox.ckpt` | Gabox | — | |\n| Gabox voc fv1–fv6 | `mel_band_roformer_vocals_fv1_gabox.ckpt` … `fv6` | Gabox | fv4 best for RVC | fv6 = extreme fullness (24.93) |\n| Big Beta 4 | `melband_roformer_big_beta4.ckpt` | unwa | — | |\n| Big Beta 5e | `melband_roformer_big_beta5e.ckpt` | unwa | — | |\n| Big Beta 6 | 
`melband_roformer_big_beta6.ckpt` | unwa | — | |\n| Big Beta 6X | `melband_roformer_big_beta6x.ckpt` | unwa | SDR 11.12 | Good balance |\n| Revive | `bs_roformer_vocals_revive_unwa.ckpt` | unwa | — | |\n| Revive 2 | `bs_roformer_vocals_revive_v2_unwa.ckpt` | unwa | bleedless 40.07 | Highest bleedless vocal |\n| Revive 3e | `bs_roformer_vocals_revive_v3e_unwa.ckpt` | unwa | fullness 21.43 | High fullness vocal |\n| Resurrection (vocal) | `bs_roformer_vocals_resurrection_unwa.ckpt` | unwa | SDR 11.34, bleedless 39.99 | Top-tier, only 195MB, fast |\n| FullnessVocalModel | `mel_band_roformer_vocal_fullness_aname.ckpt` | Aname | — | |\n| SYHFT v1-v3 | `MelBandRoformerSYHFT.ckpt` etc. | — | — | Not prominently discussed in deton24 |\n\n### Roformer — Instrumentals\n\n| Community Name | audio-separator filename | Creator | Key Metrics | Notes |\n|---|---|---|---|---|\n| becruily inst | `mel_band_roformer_instrumental_becruily.ckpt` | becruily | SDR 17.55, bleedless 41.36 | Part of \"deux\" dual; \"SOTA\" per community |\n| Gabox inst / inst2 / inst3 | `mel_band_roformer_instrumental_gabox.ckpt` etc. | Gabox | — | |\n| Gabox bleedless v1–v3 | `mel_band_roformer_instrumental_bleedless_v1_gabox.ckpt` etc. | Gabox | — | Optimized for low bleed |\n| Gabox fullness v1–v3 | `mel_band_roformer_instrumental_fullness_v1_gabox.ckpt` etc. | Gabox | — | Optimized for fullness |\n| Gabox fullness noise v4 | `mel_band_roformer_instrumental_fullness_noise_v4_gabox.ckpt` | Gabox | fullness 40.40 | a.k.a. 
\"inst_Fv4Noise\" |\n| INSTV5 / INSTV5N | `mel_band_roformer_instrumental_instv5_gabox.ckpt` / `instv5n` | Gabox | — | |\n| INSTV6 / INSTV6N | `mel_band_roformer_instrumental_instv6_gabox.ckpt` / `instv6n` | Gabox | INSTV6N: fullness 41.68 | Extreme fullness, noisy |\n| INSTV7 / INSTV7N | `mel_band_roformer_instrumental_instv7_gabox.ckpt` / `instv7n` | Gabox | — | |\n| INSTV8 / INSTV8N | `mel_band_roformer_instrumental_instv8_gabox.ckpt` / `instv8n` | Gabox | — | |\n| Inst_GaboxFv7z | `mel_band_roformer_instrumental_fv7z_gabox.ckpt` | Gabox | bleedless 44.61 | **Best bleedless inst**, nearly noiseless |\n| Inst_GaboxFv8 | `mel_band_roformer_instrumental_fv8_gabox.ckpt` | Gabox | — | |\n| Inst_GaboxFVX | `mel_band_roformer_instrumental_fvx_gabox.ckpt` | Gabox | — | |\n| v1e | `melband_roformer_inst_v1e.ckpt` | unwa | fullness 38.87 | Classic fullness model |\n| v1e+ | `melband_roformer_inst_v1e_plus.ckpt` | unwa | fullness 37.89 | Best balanced fullness |\n| v1+ | `melband_roformer_inst_v1_plus.ckpt` | unwa | — | |\n| Resurrection Inst | `bs_roformer_instrumental_resurrection_unwa.ckpt` | unwa | SDR 17.25 | Only 200MB, great all-rounder |\n| BS-Roformer SW | `BS-Roformer-SW.ckpt` | — | — | Reversed Apple Logic Pro, 6-stem |\n\n### Roformer — Karaoke\n\n| Community Name | audio-separator filename | Creator | Key Metrics |\n|---|---|---|---|\n| aufr33/viperx karaoke | `mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt` | aufr33/viperx | SDR 10.20 |\n| kar_gabox | `mel_band_roformer_karaoke_gabox.ckpt` | Gabox | — |\n| Karaoke_GaboxV2 | `mel_band_roformer_karaoke_gabox_v2.ckpt` | Gabox | — |\n| becruily karaoke | `mel_band_roformer_karaoke_becruily.ckpt` | becruily | — |\n\n### Roformer — De-reverb / Denoise / Special\n\n| Community Name | audio-separator filename | Creator | Notes |\n|---|---|---|---|\n| Aufr33 Denoise (average) | `denoise_mel_band_roformer_aufr33_sdr_27.9959.ckpt` | aufr33 | Recommended first-choice denoiser |\n| Aufr33 Denoise (aggressive) | 
`denoise_mel_band_roformer_aufr33_aggr_sdr_27.9768.ckpt` | aufr33 | |\n| Gabox denoisedebleed | `mel_band_roformer_denoise_debleed_gabox.ckpt` | Gabox | Preprocessor |\n| anvuew dereverb (stereo) | `dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt` | anvuew | |\n| anvuew dereverb (less aggressive) | `dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt` | anvuew | |\n| anvuew dereverb mono | `dereverb_mel_band_roformer_mono_anvuew.ckpt` | anvuew | SDR 20.40 |\n| crowd removal | `mel_band_roformer_crowd_aufr33_viperx_sdr_8.7144.ckpt` | aufr33/viperx | |\n| bleed suppressor | `mel_band_roformer_bleed_suppressor_v1.ckpt` | unwa/97chris | For instrumentals post-processing |\n| chorus separator | `model_chorus_bs_roformer_ep_267_sdr_24.1275.ckpt` | — | |\n| aspiration/breath models | `aspiration_mel_band_roformer_sdr_18.9845.ckpt` etc. | — | |\n\n### MDX / MDX23C / VR / Demucs\n\n| Community Name | audio-separator filename | Notes |\n|---|---|---|\n| UVR-MDX-NET-Inst_HQ_5 | `UVR-MDX-NET-Inst_HQ_5.onnx` | Older MDX arch, still useful for fast/low-resource ensemble |\n| MDX23C DrumSep | `MDX23C-DrumSep-aufr33-jarredou.ckpt` | 5-stem drum separation (kick/snare/toms/cymbals/other) |\n| MDX23C De-Reverb | `MDX23C-De-Reverb-aufr33-jarredou.ckpt` | |\n| UVR De-Reverb | `UVR-De-Reverb-aufr33-jarredou.pth` | VR arch |\n| BVE (Backing Vocal Extractor) | `UVR-BVE-4B_SN-44100-2.pth` | VR arch, extracts backing vocals |\n| htdemucs_ft | `htdemucs_ft.yaml` | 4-stem (vocals/drums/bass/other), recommended for drum pre-separation |\n\n---\n\n## Section 3: Key Metrics Explained\n\nThe community uses three key metrics to evaluate models (see deton24 doc ~line 6850 and the leaderboard at ~line 13286):\n\n| Metric | What It Measures | Higher = |\n|--------|-----------------|----------|\n| **SDR** (Signal-to-Distortion Ratio) | Overall separation quality vs. 
ground truth | Better overall quality |\n| **Bleedless** | How little the *other* stems bleed into the target stem | Cleaner output, but potentially muddier/more muted |\n| **Fullness** | How much of the target stem's content is preserved | Fuller/richer output, but potentially noisier with more bleed |\n\n**Important:** Bleedless and fullness are in tension, so a single model generally cannot maximize both. The community categorizes models as:\n- **Fullness models**: v1e, INSTV6N, inst_Fv4Noise — preserve more instruments but have noise/vocal residues\n- **Bleedless models**: Fv7z, FNO, Revive 2 — cleaner but may lose some subtle instruments\n- **Balanced models**: Resurrection Inst, becruily \"deux\", Beta 6X — attempt to optimize both\n\nThe best ensembles typically combine a fullness model with a bleedless model using an appropriate algorithm.\n\n---\n\n## Section 4: Ensemble Algorithm Mapping\n\nThe deton24 doc (lines 12212–12365) explains ensemble algorithms. Here's how the community terminology maps to audio-separator's `Ensembler` class (`audio_separator/separator/ensembler.py`):\n\n| Community / UVR Term | audio-separator algorithm | Effect | Best For |\n|---|---|---|---|\n| **Avg Spec** / **Average** | `avg_wave` or `avg_fft` | Averages all inputs | Highest SDR, safest default |\n| **Max Spec** | `uvr_max_spec` or `max_fft` | Keeps loudest frequency bins | Fuller output, more bleed; good for vocals |\n| **Min Spec** | `uvr_min_spec` or `min_fft` | Keeps quietest frequency bins | Cleaner output, less bleed; good for instrumentals |\n| **Max FFT** | `max_fft` | Same concept as Max Spec, in the FFT domain | Used interchangeably with Max Spec in the community |\n| **Min FFT** | `min_fft` | Same concept as Min Spec, in the FFT domain | |\n| **Median** | `median_wave` or `median_fft` | Takes median value | Reduces outlier artifacts |\n\n### Community rules of thumb (from deton24 line 12212+):\n- **Avg/Avg** gets the highest SDR and is the safest default\n- **Max Spec** for vocals (fuller, 
captures more vocal content, more instrument bleed)\n- **Min Spec** for instrumentals (cleaner, less vocal residue, but can sound muffled)\n- **Max/Min** = max for vocal stem, min for instrumental stem (commonly recommended)\n- **Do not ensemble more than 4-5 models** — SDR drops above this (line 12351)\n\n---\n\n## Section 5: Recommended Ensemble Presets (Implementable Now)\n\nThese ensembles use **only models already in audio-separator**. Ranked by community consensus from the deton24 doc.\n\n### Instrumental Ensembles\n\n#### Preset: `instrumental_clean` — Bleedless + All-Rounder\n```\nModels:\n  - mel_band_roformer_instrumental_fv7z_gabox.ckpt\n  - bs_roformer_instrumental_resurrection_unwa.ckpt\nAlgorithm: uvr_max_spec\n```\n**Why:** Fv7z is the bleedless king (44.61) and Resurrection Inst is a great all-rounder (SDR 17.25). Max Spec fills in what Fv7z might miss. Deton24 doc line ~7739 discusses Fv7z in ensembles.\n\n#### Preset: `instrumental_full` — Maximum Instrument Preservation\n```\nModels:\n  - melband_roformer_inst_v1e_plus.ckpt\n  - mel_band_roformer_instrumental_becruily.ckpt\nAlgorithm: uvr_max_spec\n```\n**Why:** v1e+ (fullness 37.89) is the community's classic fullness model. Becruily inst is \"SOTA\" (SDR 17.55). Max Spec preserves energy. Based on ensemble patterns from deton24 line ~7777-7789.\n\n#### Preset: `instrumental_balanced` — Good Balance of Noise/Fullness\n```\nModels:\n  - mel_band_roformer_instrumental_instv8_gabox.ckpt\n  - bs_roformer_instrumental_resurrection_unwa.ckpt\nAlgorithm: uvr_max_spec\n```\n**Why:** Inspired by deton24 line ~7743 (\"Gabox Inst V8 + [model] = good balance between noise and fullness\"). 
Uses Resurrection Inst as the secondary model since the MVSEP model referenced there isn't available locally.\n\n#### Preset: `instrumental_low_resource` — Fast / Low VRAM\n```\nModels:\n  - bs_roformer_instrumental_resurrection_unwa.ckpt\n  - UVR-MDX-NET-Inst_HQ_5.onnx\nAlgorithm: avg_fft\n```\n**Why:** Resurrection Inst is only 200MB and fast. HQ_5 is MDX arch (very fast). Avg is safest. Deton24 line ~7855 mentions HQ_5 in low-resource ensembles.\n\n### Vocal Ensembles\n\n#### Preset: `vocal_balanced` — Best Overall Quality\n```\nModels:\n  - bs_roformer_vocals_resurrection_unwa.ckpt\n  - melband_roformer_big_beta6x.ckpt\nAlgorithm: avg_fft\n```\n**Why:** Resurrection (SDR 11.34, bleedless 39.99) + Beta 6X (SDR 11.12). Two top-tier models averaged for highest SDR. Community recommendations at deton24 line ~8789-8791.\n\n#### Preset: `vocal_clean` — Minimal Instrument Bleed\n```\nModels:\n  - bs_roformer_vocals_revive_v2_unwa.ckpt\n  - mel_band_roformer_kim_ft2_bleedless_unwa.ckpt\nAlgorithm: min_fft\n```\n**Why:** Revive 2 (bleedless 40.07) + FT2 bleedless (39.30) = cleanest possible. Min FFT removes anything not common to both, further reducing bleed.\n\n#### Preset: `vocal_full` — Maximum Vocal Capture\n```\nModels:\n  - bs_roformer_vocals_revive_v3e_unwa.ckpt\n  - mel_band_roformer_vocals_becruily.ckpt\nAlgorithm: max_fft\n```\n**Why:** Revive 3e (fullness 21.43) + becruily vocal (fullness 23.25). Max FFT keeps all vocal content from both. Good for capturing harmonies and backing vocals. Related: deton24 line ~8783-8795.\n\n#### Preset: `vocal_rvc` — Optimized for RVC/AI Training\n```\nModels:\n  - melband_roformer_big_beta6x.ckpt\n  - mel_band_roformer_vocals_fv4_gabox.ckpt\nAlgorithm: avg_wave\n```\n**Why:** Directly recommended at deton24 line ~8789-8791 for RVC: \"beta6x + voc_fv4\". 
Average for clean, consistent output suitable for training data.\n\n### Karaoke Ensembles\n\n#### Preset: `karaoke` — Lead Vocal Removal (3-model)\n```\nModels:\n  - mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\n  - mel_band_roformer_karaoke_gabox_v2.ckpt\n  - mel_band_roformer_karaoke_becruily.ckpt\nAlgorithm: avg_wave\n```\n**Why:** Deton24 reports 3-model karaoke ensembles reach SDR ~10.6 vs ~10.2 for single models. These are the three main karaoke models available in audio-separator.\n\n### Processing Pipelines (Sequential, Not Parallel Ensemble)\n\n#### Pipeline: `clean_vocals` — Full Vocal Cleaning Chain\n```\nStep 1: Separate vocals (vocal_balanced preset or single model)\nStep 2: denoise_mel_band_roformer_aufr33_sdr_27.9959.ckpt (denoise)\nStep 3: dereverb_mel_band_roformer_mono_anvuew.ckpt (dereverb)\n```\n**Why:** Deton24 line ~8807-8817 recommends: vocals → de-reverb → karaoke → de-noise. This pipeline instead denoises before de-reverbing, following community member natethegratevhs, who recommends denoising first.\n\n#### Pipeline: `drumsep` — Detailed Drum Separation\n```\nStep 1: htdemucs_ft.yaml (extract drums stem from mix)\nStep 2: MDX23C-DrumSep-aufr33-jarredou.ckpt (split drums into kick/snare/toms/cymbals)\n```\n**Why:** Deton24 line ~10717+ recommends pre-separating drums with htdemucs_ft, then using drumsep for sub-stem separation. 
The jarredou MDX23C model is the only downloadable drumsep model.\n\n---\n\n## Section 6: Missing Top-Tier Models to Add\n\nThese models are discussed prominently in the deton24 doc, are publicly downloadable, and would significantly improve audio-separator's quality — especially for ensembles.\n\n### High Priority (top-tier, public, frequently referenced)\n\n| Model | Creator | Type | HuggingFace URL | Key Metrics | Deton24 Line |\n|---|---|---|---|---|---|\n| BS-Roformer HyperACE v2 inst | unwa | instrumental | `pcunwa/BS-Roformer-HyperACE` (check exact path) | SDR 17.40, fullness 38.03 | ~7682, 7698, 7702 |\n| BS-Roformer HyperACE v2 voc | unwa | vocal | same repo | SDR 11.40 | ~8777 |\n| BS-Roformer-Inst-FNO | unwa | instrumental | `pcunwa/` (check) | SDR 17.60 (highest public inst) | ~7714, 7755 |\n| Mel \"deux\" (becruily) | becruily | dual (vocal+inst) | `becruily/mel-band-roformer-deux` | inst SDR 17.55, voc SDR 11.37 | ~7682, 7698, 8773 |\n| Big Beta 7 | unwa | vocal | `pcunwa/` (check) | SDR 11.20 | ~8775 |\n| Gabox voc_fv7 + betas | Gabox | vocal | `GaboxR67/MelBandRoformers` | SDR 11.16 | ~8826 |\n| Gabox inst_gaboxFlowersV10 | Gabox | instrumental | same repo | SDR 16.95, fullness 37.12 | ~7759 |\n| Rifforge | mesk | instrumental (metal) | `meskvlla33/rifforge` | — | ~7759 |\n| BS-Roformer 1296/1297 | viperx | vocal | check HF | SDR 12.96/12.97 | ~7753, 7847-7852 |\n| anvuew dereverb BS (stereo, SDR 22.50) | anvuew | de-reverb | check HF | SDR 22.50 | ~100 |\n| BS_RoFormer_mag | anvuew | vocal | `anvuew/BS_RoFormer_mag` | bleedless 32.17, fullness 22.15 | ~59 |\n\n### Medium Priority (useful but less critical)\n\n| Model | Creator | Type | Notes |\n|---|---|---|---|\n| BS-Roformer-Inst-EXP-Value-Residual | unwa | instrumental | Experimental |\n| BS-EXP-SiameseRoformer | unwa | vocal | Experimental, needs custom code |\n| gilliaan MonoStereo Dual Beta1/2 | gilliaan | phantom center | Niche use case |\n| Neo_InstVFX | neoculture | instrumental | 
Preserves vocal chops (K-pop) |\n\n### Cannot Add (MVSEP Exclusive)\n\nThese are referenced constantly in the doc but **cannot be downloaded** — they only work through mvsep.com:\n\n- BS-Roformer 2025.07, 2025.06, 2024.08, 2024.04 (by ZFTurbo)\n- SCNet XL, SCNet Large, SCNet XL IHF, SCNet variants\n- Various instrument-specific models (saxophone, trumpet, violin, etc.)\n- VitLarge23\n- Drumsep 4/5/6 stem Mel-Roformer and SCNet models\n\n---\n\n## Section 7: Phase Fix — What It Is and How to Implement It\n\n### Background\n\nPhase fix (also called \"phase swapper\") was invented by **aufr33** and is discussed in detail at deton24 doc line 27072-27170. It is one of the most impactful post-processing techniques for instrumental separation.\n\n### The Problem It Solves\n\nRoformer models trained with an **instrumental stem target** produce full-sounding instrumentals but with noisy vocal residues. Models trained with a **vocal stem target** produce muddy instrumentals (via inversion) but with better bleedless metrics. Phase fix combines the best of both.\n\n### How It Works (Technical)\n\nFrom jarredou's explanation (deton24 line 27160-27164):\n\n> \"There is a phase in the STFT domain. For each STFT bin, it will use the **magnitude data** of one model [the instrumental model], and the **phase data** from another model [the vocal model's instrumental output], and hopefully this can improve the final result.\"\n\nFrom santilli_ (line 27166):\n\n> \"What the script does is blend the STFT phase of the 'donor' file into the target. 
It uses different blending scales for different frequencies so that it'll only affect the parts that are directly related to the perceived noise.\"\n\n### Algorithm (pseudocode)\n\n```python\nimport librosa\nimport numpy as np\n\ndef phase_fix(target_audio, reference_audio, sr,\n              low_cutoff=500, high_cutoff=5000,\n              high_freq_weight=0.8, n_fft=2048):\n    \"\"\"\n    Apply phase fix to target audio using reference audio's phase.\n\n    Args:\n        target_audio: The instrumental from an inst-trained model (full but noisy),\n            a NumPy array of shape (samples,) or (channels, samples)\n        reference_audio: The instrumental from a vocal-trained model (muddy but clean phase),\n            same shape and sample rate as target_audio\n        sr: Sample rate (Hz)\n        low_cutoff: Frequency below which phase is not replaced (Hz)\n        high_cutoff: Frequency from which high_freq_weight replaces the linear ramp (Hz)\n        high_freq_weight: Blending weight for frequencies above high_cutoff (0.8-2.0)\n        n_fft: FFT size\n    Returns:\n        Phase-fixed audio with the same shape as target_audio\n    \"\"\"\n    # Convert to STFT (librosa >= 0.9 accepts multichannel input with time on the last axis)\n    target_stft = librosa.stft(target_audio, n_fft=n_fft)\n    reference_stft = librosa.stft(reference_audio, n_fft=n_fft)\n\n    # Magnitude comes from the target; phase from both\n    target_magnitude = np.abs(target_stft)\n    target_phase = np.angle(target_stft)\n    reference_phase = np.angle(reference_stft)\n\n    # Frequency-dependent blending weights: 0 below low_cutoff, linear ramp\n    # from 0 to 1 between the cutoffs, high_freq_weight at and above high_cutoff\n    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)\n    span = max(high_cutoff - low_cutoff, 1e-9)  # guard against equal cutoffs (e.g. 100/100)\n    ramp = (freqs - low_cutoff) / span\n    blend_weights = np.where(freqs < low_cutoff, 0.0,\n                             np.where(freqs < high_cutoff, ramp, high_freq_weight))\n    blend_weights = blend_weights[:, np.newaxis]  # (freq_bins, 1) broadcasts over frames and channels\n\n    # Blend along the shortest arc: wrap the phase difference to (-pi, pi] first,\n    # otherwise bins with phases near +pi and -pi would be pulled toward 0\n    phase_diff = np.angle(np.exp(1j * (reference_phase - target_phase)))\n    blended_phase = target_phase + blend_weights * phase_diff\n\n    # Reconstruct with target magnitude and blended phase\n    fixed_stft = target_magnitude * np.exp(1j * blended_phase)\n    fixed_audio = librosa.istft(fixed_stft, n_fft=n_fft, length=target_audio.shape[-1])\n\n    return fixed_audio\n```\n\n### Key Parameters (from community recommendations)\n\n| Parameter | Default (UVR) | Default (Colab) | santilli_ recommendation | Use Case |\n|---|---|---|---|---|\n| `low_cutoff` | 500 Hz | 500 Hz | 100–420 Hz | Lower = more aggressive fix |\n| `high_cutoff` | 5000 Hz | 9000 Hz | 100–4200 Hz | Lower = more aggressive fix |\n| `high_freq_weight` | 0.8 | 0.8 | **2.0** | \"Increasing from 0.8 to 2 is beneficial\" (line 27076) |\n\n### Common Phase Fix Pairings\n\nThese are the recommended target→reference pairings from the doc (line 27086-27098):\n\n| Target (Instrumental Model) | Reference (Vocal/Phase-donor Model) | Cutoff Values |\n|---|---|---|\n| `melband_roformer_inst_v1e.ckpt` | `mel_band_roformer_vocals_becruily.ckpt` (inst output) | 100/100 (aggressive) |\n| `melband_roformer_inst_v1e_plus.ckpt` | `mel_band_roformer_vocals_becruily.ckpt` (inst output) | 100/100 |\n| `bs_roformer_instrumental_resurrection_unwa.ckpt` | `mel_band_roformer_vocals_becruily.ckpt` (inst output) | 3000/5000 |\n| Any Gabox inst model | `mel_band_roformer_vocals_becruily.ckpt` (inst output) | 500/5000 |\n\n**Important:** The \"reference\" is the **instrumental output** of a vocal-targeted model (not the vocal output). 
You separate with the vocal model, take its instrumental stem, and use that as the phase reference.\n\n### Existing Code References\n\n- **Becruily's original scripts:** https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy\n- **Phase Fixer Colab (santilli_):** https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm\n- **MSST repo** has phase fix integrated into its inference pipeline\n\n### Implementation Suggestion for audio-separator\n\nPhase fix should be added as a post-processing step in the `Ensembler` class or as a standalone utility. The flow would be:\n\n1. User runs separation with an instrumental model → gets `instrumental_target.wav`\n2. User runs separation with a vocal model → gets `instrumental_reference.wav` (the \"other\" stem)\n3. Phase fix keeps the magnitude from step 1 and blends in the phase from step 2, weighted by frequency according to the cutoffs\n4. Optionally, the phase-fixed result is then ensembled with other models' outputs\n\nFor presets, this could be exposed as a pipeline configuration:\n```yaml\npreset: instrumental_phase_fixed\nsteps:\n  - model: melband_roformer_inst_v1e_plus.ckpt\n    output: target_instrumental\n  - model: mel_band_roformer_vocals_becruily.ckpt\n    output: reference_instrumental  # use the instrumental stem\n  - phase_fix:\n      target: target_instrumental\n      reference: reference_instrumental\n      low_cutoff: 100\n      high_cutoff: 100\n      high_freq_weight: 2.0\n```\n\n---\n\n## Section 8: Agent Task Briefs\n\n### Task A: Implement Ensemble Preset Configurations\n\n**Goal:** Add preset ensemble configurations to audio-separator that users can invoke by name.\n\n**Key files to modify:**\n- `audio_separator/separator/ensembler.py` — already has ensemble algorithms implemented\n- `audio_separator/separator/separator.py` — main separator class, handles model loading and separation\n- `audio_separator/utils/cli.py` — CLI interface, needs preset selection flags\n- A new config file (e.g., `audio_separator/ensemble_presets.json` or 
similar) for preset definitions\n\n**Presets to implement:** See [Section 5](#section-5-recommended-ensemble-presets-implementable-now) above. Start with these core presets:\n1. `instrumental_clean` — Fv7z + Resurrection Inst (max_spec)\n2. `instrumental_full` — v1e+ + becruily inst (max_spec)\n3. `vocal_balanced` — Resurrection voc + Beta 6X (avg_fft)\n4. `vocal_clean` — Revive 2 + FT2 bleedless (min_fft)\n5. `karaoke` — 3-model karaoke ensemble (avg_wave)\n\n**UX suggestion:** `audio-separator input.wav --ensemble instrumental_clean`\n\n**Reference:** Read the existing ensemble tests at `tests/unit/test_ensembler.py` for the current test patterns.\n\n### Task B: Add Support for Missing Top-Tier Models\n\n**Goal:** Add the high-priority models from [Section 6](#section-6-missing-top-tier-models-to-add) to audio-separator.\n\n**Key files to modify:**\n- `audio_separator/models.json` — model registry with download URLs and config\n- `audio_separator/separator/separator.py` — `list_supported_model_files()` method (lines ~440-608)\n\n**Process for each model:**\n1. Find the model on HuggingFace (check creator repos listed in Section 6)\n2. Identify the correct YAML config file (most Roformers need a paired YAML)\n3. Add entry to `models.json` with the download URL and config\n4. Test that the model loads and produces output\n5. Verify the output stem names match expectations\n\n**Priority order:**\n1. BS-Roformer HyperACE v2 (inst + voc) — used in current best ensembles\n2. Mel \"deux\" by becruily — SOTA instrumental, \"doesn't need phase fix\"\n3. BS-Roformer-Inst-FNO — highest public inst SDR\n4. Big Beta 7 — latest vocal model\n5. Gabox voc_fv7 + inst_gaboxFlowersV10 — latest Gabox models\n6. Rifforge — metal-specific, niche but popular\n7. 
BS-Roformer 1296/1297 (viperx) — classic models, still used for phase fix\n\n**HuggingFace repos to check:**\n- `pcunwa/` (unwa's models)\n- `GaboxR67/MelBandRoformers` (Gabox models)\n- `becruily/mel-band-roformer-deux` (deux dual model)\n- `meskvlla33/rifforge` (Rifforge)\n- `anvuew/` (anvuew models)\n\n### Task C: Implement Phase Fix Support\n\n**Goal:** Add phase fix as a post-processing step in audio-separator.\n\n**What to read first:**\n- [Section 7](#section-7-phase-fix--what-it-is-and-how-to-implement-it) of this document (algorithm, parameters, pseudocode)\n- Deton24 doc, lines 27072-27170, for the full community discussion\n- Becruily's scripts: https://drive.google.com/drive/folders/1JOa198ALJ0SnEreCq2y2kVj-sktvPePy\n- Phase Fixer Colab source: https://colab.research.google.com/drive/1PMQmFRZb_XRIKnBjXhYlxNlZ5XcKMWXm\n\n**Implementation approach:**\n\n1. **Add a `PhaseFixer` class** (new file, e.g., `audio_separator/separator/phase_fixer.py`):\n   - Takes target audio (from inst model) and reference audio (from vocal model's inst stem)\n   - Parameters: `low_cutoff`, `high_cutoff`, `high_freq_weight`, `n_fft`\n   - Works in STFT domain: keeps target magnitude, blends in reference phase\n   - Must handle stereo (process each channel independently)\n\n2. **Integrate with Separator**:\n   - Add `--phase_fix_reference_model` CLI flag\n   - When set, the separator runs the reference model first and keeps its instrumental stem\n   - It then runs the target model and applies phase fix using the reference instrumental\n   - Outputs the phase-fixed instrumental\n\n3. **Integrate with ensemble presets**:\n   - Some presets should include phase fix as a step (see Section 5 pipelines)\n   - The preset config format should support multi-step pipelines\n\n**Key gotcha:** The \"reference\" for phase fix is the **instrumental stem from a vocal-trained model** (not the vocal stem itself). 
So if using `mel_band_roformer_vocals_becruily.ckpt` as reference, you need its \"Instrumental\" / \"No Vocals\" output, not its \"Vocals\" output.\n\n**Testing:** Compare output of phase-fixed v1e (using becruily vocal as reference, cutoffs 100/100) against plain v1e output. The phase-fixed version should have noticeably less vocal residue/\"noise\" in quiet passages while maintaining instrument fullness.\n"
  },
  {
    "path": "environment.yml",
    "content": "name: audio-separator-dev\nchannels:\n  - conda-forge\ndependencies:\n  - python >=3.10\n  - anaconda-client\n  - conda-build\n  - conda-verify\n  - pip >=23.2.1\n  - poetry\n  - hatchling\n"
  },
  {
    "path": "pyproject.toml",
    "content": "[build-system]\nrequires = [\"poetry-core\"]\nbuild-backend = \"poetry.core.masonry.api\"\n\n[tool.poetry]\nname = \"audio-separator\"\nversion = \"0.44.1\"\ndescription = \"Easy to use audio stem separation, using various models from UVR trained primarily by @Anjok07\"\nauthors = [\"Andrew Beveridge <andrew@beveridge.uk>\"]\nlicense = \"MIT\"\nreadme = \"README.md\"\npackages = [{include = \"audio_separator\"}]\ninclude = [\"audio_separator/separator/models.json\", \"audio_separator/ensemble_presets.json\"]\nhomepage = \"https://github.com/karaokenerds/python-audio-separator\"\nrepository = \"https://github.com/karaokenerds/python-audio-separator\"\ndocumentation = \"https://github.com/karaokenerds/python-audio-separator/blob/main/README.md\"\nkeywords = [\"audio\", \"sound\", \"karaoke\"]\nclassifiers = [\n    \"License :: OSI Approved :: MIT License\",\n    \"Operating System :: OS Independent\",\n    \"Development Status :: 4 - Beta\",\n    \"Intended Audience :: Developers\",\n    \"Intended Audience :: Science/Research\",\n    \"Topic :: Multimedia :: Sound/Audio\",\n    \"Topic :: Multimedia :: Sound/Audio :: Mixers\",\n    \"Programming Language :: Python :: 3.10\",\n    \"Programming Language :: Python :: 3.11\",\n    \"Programming Language :: Python :: 3.12\",\n    \"Programming Language :: Python :: 3.13\",\n]\n\n[tool.poetry.dependencies]\npython = \">=3.10\"\nrequests = \">=2\"\nnumpy = \">=2\"\nlibrosa = \">=0.10\"\nsamplerate = \"0.1.0\"\nsix = \">=1.16\"\ntorch = \">=2.3\"\ntorch_directml = {version = \"*\", optional = true}\ntqdm = \"*\"\npydub = \">=0.25\"\naudioop-lts = { version = \">=0.2.1\", python = \"^3.13\" }\nonnx-weekly = { version = \"*\" }\nonnx2torch-py313 = \">=1.6\"\nonnxruntime = { version = \">=1.17\", optional = true }\nonnxruntime-gpu = { version = \">=1.17\", optional = true }\nonnxruntime-directml = { version = \">=1.17\", optional = true }\njulius = \">=0.2\"\ndiffq-fixed = { version = \">=0.2\", platform = 
\"win32\" }\ndiffq = { version = \">=0.2\", platform = \"!=win32\" }\neinops = \">=0.7\"\npyyaml = \"*\"\nml_collections = \"*\"\nresampy = \">=0.4\"\nbeartype = \"^0.18.5\"\nrotary-embedding-torch = \"^0.6.1\"\nscipy = \"^1.13.0\"\nsoundfile = \">=0.12\"\n\n[tool.poetry.extras]\ncpu = [\"onnxruntime\"]\ngpu = [\"onnxruntime-gpu\"]\ndml = [\"onnxruntime-directml\", \"torch_directml\"]\n\n[tool.poetry.scripts]\naudio-separator = 'audio_separator.utils.cli:main'\naudio-separator-remote = 'audio_separator.remote.cli:main'\n\n[tool.poetry.group.dev.dependencies]\nblack = \">=23\"\npytest = \"*\"\npytest-cov = \">=4.1.0\"\nmatplotlib = \">=3.8.0\"\npillow = \">=10.1.0\"\nscikit-image = \">=0.22.0\"\nfiletype = \">=1\"\ngoogle-cloud-firestore = \">=2.0.0\"\ngoogle-cloud-storage = \">=2.0.0\"\n\n[tool.black]\nline-length = 140\n"
  },
  {
    "path": "pytest.ini",
    "content": "# 'audioop' is used by PyDub, which already falls back to a pure-Python implementation when needed, so its deprecation warning is not an issue\n[pytest]\nfilterwarnings =\n    ignore:stft with return_complex=False is deprecated:UserWarning\n    ignore:'audioop' is deprecated:DeprecationWarning\n"
  },
  {
    "path": "scripts/download_preset_models.py",
    "content": "\"\"\"Download ensemble preset models for baking into Docker image.\"\"\"\nimport json\nimport importlib.resources as resources\nfrom audio_separator.separator import Separator\n\nwith resources.open_text(\"audio_separator\", \"ensemble_presets.json\") as f:\n    presets = json.load(f)[\"presets\"]\n\nmodels_to_download = set()\nfor preset_name in [\"instrumental_clean\", \"karaoke\"]:\n    models_to_download.update(presets[preset_name][\"models\"])\n\nprint(f\"Downloading {len(models_to_download)} models for ensemble presets...\")\nfor model in sorted(models_to_download):\n    print(f\"  Downloading: {model}\")\n    sep = Separator(model_file_dir=\"/models\")\n    sep.load_model(model)\n    print(f\"  Done: {model}\")\nprint(\"All models downloaded successfully.\")\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/001-update-roformer-implementation.md",
    "content": "# Feature Specification: Update Roformer Implementation\n\n**Feature Branch**: `001-update-roformer-implementation`  \n**Created**: September 25, 2025  \n**Status**: Draft  \n**Input**: User description: \"update Roformer implementation: this audio-separator project currently has an older implementation of the Roformer architecture inference code, copied from another project over a year ago into folder path audio_separator/separator/uvr_lib_v5/ this works well for many models, but some of the latest Roformer models don't work with the older inference code; the model fails to load with errors such as \"AttributeError: \"'norm'\" - File \"/Users/andrew/miniforge3/lib/python3.13/site-packages/audio_separator/separator/uvr_lib_v5/tfc_tdf_v3.py\", line 155, in __init__  norm = get_norm(norm_type=config.model.norm)\" or \" TypeError: BSRoformer.init() got an unexpected keyword argument 'mlp_expansion_factor'\". I've copied the latest inference code from the other project into this folder path, as a reference: audio_separator/separator/msst-models-new we need to identify the differences between the old and new roformer implementations and modify the audio-separator implementation to work with the newest models without breaking support for older ones. It's critical that we don't break existing functionality, so we should be careful to understand the old and new code fully before making changes, and get me to manually test to validate things still work whenever we've made changes.\"\n\n## Execution Flow (main)\n```\n1. Parse user description from Input\n   → Identified: Need to update Roformer implementation for latest model compatibility\n2. 
Extract key concepts from description\n   → Actors: developers, users with existing models, users with new models\n   → Actions: update implementation, maintain backward compatibility, validate functionality\n   → Data: Roformer models (old and new), inference code, model configurations\n   → Constraints: cannot break existing functionality, must support both old and new models\n3. For each unclear aspect:\n   → [RESOLVED] Clear requirements provided with specific error messages and paths\n4. Fill User Scenarios & Testing section\n   → Clear user flow: load models, separate audio, validate outputs\n5. Generate Functional Requirements\n   → All requirements are testable and specific\n6. Identify Key Entities\n   → BSRoformer, MelBandRoformer, model configurations, inference parameters\n7. Run Review Checklist\n   → No clarifications needed, no implementation details included\n8. Return: SUCCESS (spec ready for planning)\n```\n\n---\n\n## ⚡ Quick Guidelines\n- ✅ Focus on WHAT users need and WHY\n- ❌ Avoid HOW to implement (no tech stack, APIs, code structure)\n- 👥 Written for business stakeholders, not developers\n\n---\n\n## User Scenarios & Testing *(mandatory)*\n\n### Primary User Story\nUsers need to be able to load and use the latest Roformer models for audio separation without losing the ability to use their existing older models. The system should seamlessly handle both old and new model formats, providing the same quality audio separation results.\n\n### Acceptance Scenarios\n1. **Given** a user has an existing older Roformer model, **When** they attempt to separate audio, **Then** the separation should work exactly as it did before the update\n2. **Given** a user has a newer Roformer model with updated parameters (mlp_expansion_factor, sage_attention, zero_dc), **When** they attempt to load and use the model, **Then** the model should load successfully and produce high-quality audio separation\n3. 
**Given** a user has models that use different normalization configurations, **When** they load these models, **Then** the system should handle the normalization appropriately without AttributeError exceptions\n4. **Given** a user switches between old and new model types, **When** they perform multiple separations in sequence, **Then** all separations should complete successfully without conflicts\n\n### Edge Cases\n- What happens when a model configuration contains parameters that exist in new implementation but not old?\n- How does system handle models with missing or invalid configuration parameters?\n- What happens when a user tries to load a corrupted or incompatible model file?\n- How does the system behave when switching between different Roformer variants (BSRoformer vs MelBandRoformer)?\n\n## Requirements *(mandatory)*\n\n### Functional Requirements\n- **FR-001**: System MUST load and execute older Roformer models without any regression in functionality or performance, preferentially using new inference code with fallback to old code if loading fails\n- **FR-002**: System MUST load and execute newer Roformer models that include updated parameters (mlp_expansion_factor, sage_attention, zero_dc, use_torch_checkpoint, skip_connection) using the new inference code\n- **FR-003**: System MUST handle both BSRoformer and MelBandRoformer model variants, attempting new implementation first and falling back to old implementation only if necessary\n- **FR-004**: System MUST gracefully handle model configurations with missing required parameters by failing with detailed error messages that specify which parameters are missing and their expected types/values\n- **FR-005**: System MUST resolve normalization configuration issues that cause AttributeError exceptions in tfc_tdf_v3.py\n- **FR-006**: System MUST maintain identical audio separation quality for existing models after the update, validated using spectral analysis comparison with waveform and spectrogram 
similarity thresholds (≥0.90 for waveform, ≥0.80 for spectrogram)\n- **FR-007**: System MUST provide clear error messages when model loading fails due to incompatible configurations, including specific parameter mismatches and suggested corrections\n- **FR-008**: System MUST support backward compatibility for all existing model files and configurations\n- **FR-009**: System MUST validate model configurations before attempting to instantiate models\n- **FR-010**: System MUST allow seamless switching between different Roformer model types within the same session\n- **FR-011**: System MUST undergo manual testing validation after completing major implementation milestones (e.g., new inference code integration, backward compatibility implementation, error handling updates)\n\n### Key Entities *(include if feature involves data)*\n- **Roformer Model**: Audio separation model with specific architecture parameters, exists in old and new variants with different parameter sets\n- **Model Configuration**: Dictionary containing model parameters including dimension, depth, attention settings, and normalization preferences\n- **BSRoformer**: Band-split Roformer variant that processes audio in frequency bands, requires freqs_per_bands parameter\n- **MelBandRoformer**: Mel-scale band Roformer variant that uses mel-scale frequency bands, requires num_bands parameter\n- **Model Parameters**: Configuration values including mlp_expansion_factor, sage_attention, zero_dc, and other architecture-specific settings\n\n## Clarifications\n\n### Session 2025-09-25\n- Q: The spec mentions \"identical audio separation quality\" must be maintained, but doesn't specify how quality should be measured or what tolerance is acceptable for validation testing. → A: Use spectral analysis comparison with defined similarity thresholds\n- Q: How should the system determine appropriate defaults for missing model parameters? 
→ A: Fail gracefully with detailed error messages rather than assuming defaults\n- Q: How should the system detect whether a model uses the old or new Roformer format? → A: All Roformer models should use new inference code (test first), fallback to old if fails\n- Q: When should manual testing validation be performed during the implementation process? → A: Test only after completing major implementation milestones\n\n---\n\n## Review & Acceptance Checklist\n*GATE: Automated checks run during main() execution*\n\n### Content Quality\n- [x] No implementation details (languages, frameworks, APIs)\n- [x] Focused on user value and business needs\n- [x] Written for non-technical stakeholders\n- [x] All mandatory sections completed\n\n### Requirement Completeness\n- [x] No [NEEDS CLARIFICATION] markers remain\n- [x] Requirements are testable and unambiguous  \n- [x] Success criteria are measurable\n- [x] Scope is clearly bounded\n- [x] Dependencies and assumptions identified\n\n---\n\n## Execution Status\n*Updated by main() during processing*\n\n- [x] User description parsed\n- [x] Key concepts extracted\n- [x] Ambiguities marked\n- [x] User scenarios defined\n- [x] Requirements generated\n- [x] Entities identified\n- [x] Review checklist passed\n\n---\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/contracts/fallback_loader_interface.py",
    "content": "\"\"\"\nInterface contract for fallback loader implementations.\nDefines the expected behavior for fallback loading mechanisms.\n\"\"\"\n\nfrom abc import ABC, abstractmethod\nfrom typing import Dict, Any\nfrom dataclasses import dataclass\n\n@dataclass\nclass ModelLoadingResult:\n    \"\"\"Result of a model loading attempt.\"\"\"\n    model: Any\n    model_type: str\n    config_used: Dict[str, Any]\n    implementation_version: str\n    loading_method: str\n    device: str\n    success: bool\n    error_message: str = None\n\n\nclass FallbackLoaderInterface(ABC):\n    \"\"\"\n    Interface for fallback model loading implementations.\n    \n    Defines the contract that fallback loaders must implement to provide\n    compatibility with legacy models when new implementations fail.\n    \"\"\"\n    \n    @abstractmethod\n    def try_new_implementation(self, \n                             model_path: str, \n                             config: Dict[str, Any], \n                             device: str = 'cpu') -> ModelLoadingResult:\n        \"\"\"\n        Try loading with new implementation.\n        \n        Args:\n            model_path: Path to the model file\n            config: Model configuration dictionary\n            device: Device to load model on\n            \n        Returns:\n            ModelLoadingResult with attempt results\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def try_legacy_implementation(self, \n                                model_path: str, \n                                config: Dict[str, Any], \n                                device: str = 'cpu') -> ModelLoadingResult:\n        \"\"\"\n        Try loading with legacy implementation.\n        \n        Args:\n            model_path: Path to the model file\n            config: Model configuration dictionary\n            device: Device to load model on\n            \n        Returns:\n            ModelLoadingResult with fallback attempt results\n        
\"\"\"\n        pass\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/contracts/parameter_validator_interface.py",
    "content": "\"\"\"\nAPI Contract: Parameter Validator Interface\nThis defines the interface for validating Roformer model parameters.\n\"\"\"\n\nfrom abc import ABC, abstractmethod\nfrom typing import Dict, Any, List, Optional, Tuple\nfrom dataclasses import dataclass\nfrom enum import Enum\n\n\nclass ValidationSeverity(Enum):\n    \"\"\"Severity levels for validation issues.\"\"\"\n    ERROR = \"error\"      # Blocks model loading\n    WARNING = \"warning\"  # Allows loading but may affect performance\n    INFO = \"info\"        # Informational only\n\n\n@dataclass\nclass ValidationIssue:\n    \"\"\"Represents a validation issue found in model configuration.\"\"\"\n    severity: ValidationSeverity\n    parameter_name: str\n    message: str\n    suggested_fix: str\n    current_value: Any = None\n    expected_value: Any = None\n\n\nclass ParameterValidatorInterface(ABC):\n    \"\"\"Abstract interface for validating model parameters.\"\"\"\n    \n    @abstractmethod\n    def validate_required_parameters(self, config: Dict[str, Any], model_type: str) -> List[ValidationIssue]:\n        \"\"\"\n        Validate that all required parameters are present.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model (\"bs_roformer\" or \"mel_band_roformer\")\n            \n        Returns:\n            List of validation issues for missing required parameters\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_parameter_types(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate parameter types match expected types.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for type mismatches\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_parameter_ranges(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        
Validate parameter values are within acceptable ranges.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for out-of-range values\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_parameter_compatibility(self, config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate that parameter combinations are compatible.\n        \n        Args:\n            config: Model configuration dictionary\n            \n        Returns:\n            List of validation issues for incompatible parameter combinations\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_normalization_config(self, norm_config: Any) -> List[ValidationIssue]:\n        \"\"\"\n        Validate normalization configuration.\n        \n        Args:\n            norm_config: Normalization configuration (may be string, dict, or None)\n            \n        Returns:\n            List of validation issues for normalization configuration\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def get_parameter_defaults(self, model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Get default values for optional parameters.\n        \n        Args:\n            model_type: Type of model (\"bs_roformer\" or \"mel_band_roformer\")\n            \n        Returns:\n            Dictionary of parameter names to default values\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def apply_parameter_defaults(self, config: Dict[str, Any], model_type: str) -> Dict[str, Any]:\n        \"\"\"\n        Apply default values to missing optional parameters.\n        \n        Args:\n            config: Model configuration dictionary\n            model_type: Type of model\n            \n        Returns:\n            Configuration with defaults applied\n        \"\"\"\n        pass\n\n\nclass BSRoformerValidatorInterface(ABC):\n    
\"\"\"Specialized validator for BSRoformer models.\"\"\"\n    \n    @abstractmethod\n    def validate_freqs_per_bands(self, freqs_per_bands: Tuple[int, ...], stft_config: Dict[str, Any]) -> List[ValidationIssue]:\n        \"\"\"\n        Validate frequency bands configuration.\n        \n        Args:\n            freqs_per_bands: Tuple of frequencies per band\n            stft_config: STFT configuration parameters\n            \n        Returns:\n            List of validation issues for frequency bands\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def calculate_expected_freqs(self, stft_n_fft: int) -> int:\n        \"\"\"\n        Calculate expected number of frequency bins from STFT configuration.\n        \n        Args:\n            stft_n_fft: STFT n_fft parameter\n            \n        Returns:\n            Expected number of frequency bins\n        \"\"\"\n        pass\n\n\nclass MelBandRoformerValidatorInterface(ABC):\n    \"\"\"Specialized validator for MelBandRoformer models.\"\"\"\n    \n    @abstractmethod\n    def validate_num_bands(self, num_bands: int, sample_rate: int) -> List[ValidationIssue]:\n        \"\"\"\n        Validate number of mel bands.\n        \n        Args:\n            num_bands: Number of mel-scale bands\n            sample_rate: Audio sample rate\n            \n        Returns:\n            List of validation issues for mel bands configuration\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_sample_rate(self, sample_rate: int) -> List[ValidationIssue]:\n        \"\"\"\n        Validate audio sample rate.\n        \n        Args:\n            sample_rate: Audio sample rate in Hz\n            \n        Returns:\n            List of validation issues for sample rate\n        \"\"\"\n        pass\n\n\nclass ConfigurationNormalizerInterface(ABC):\n    \"\"\"Interface for normalizing configuration between old and new formats.\"\"\"\n    \n    @abstractmethod\n    def 
normalize_config_format(self, raw_config: Any) -> Dict[str, Any]:\n        \"\"\"\n        Normalize configuration from various input formats to standard dictionary.\n        \n        Args:\n            raw_config: Configuration in any format (dict, object, etc.)\n            \n        Returns:\n            Normalized configuration dictionary\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def map_legacy_parameters(self, config: Dict[str, Any]) -> Dict[str, Any]:\n        \"\"\"\n        Map legacy parameter names to current parameter names.\n        \n        Args:\n            config: Configuration with potentially legacy parameter names\n            \n        Returns:\n            Configuration with current parameter names\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def extract_nested_config(self, config: Any, path: str) -> Any:\n        \"\"\"\n        Extract nested configuration value using dot notation path.\n        \n        Args:\n            config: Configuration object or dictionary\n            path: Dot notation path (e.g., \"model.norm\")\n            \n        Returns:\n            Extracted configuration value or None if not found\n        \"\"\"\n        pass\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/contracts/roformer_loader_interface.py",
    "content": "\"\"\"\nAPI Contract: Roformer Model Loader Interface\nThis defines the interface for loading Roformer models with backward compatibility.\n\"\"\"\n\nfrom abc import ABC, abstractmethod\nfrom typing import Optional, List, Dict, Any, Tuple\nfrom dataclasses import dataclass\nfrom enum import Enum\n\n\nclass RoformerType(Enum):\n    \"\"\"Supported Roformer model types.\"\"\"\n    BS_ROFORMER = \"bs_roformer\"\n    MEL_BAND_ROFORMER = \"mel_band_roformer\"\n\n\nclass ImplementationVersion(Enum):\n    \"\"\"Available implementation versions.\"\"\"\n    OLD = \"old\"\n    NEW = \"new\"\n    FALLBACK = \"fallback\"\n\n\n@dataclass\nclass ModelLoadingResult:\n    \"\"\"Result of model loading operation.\"\"\"\n    success: bool\n    model: Optional[Any] = None  # Actual model instance\n    error_message: Optional[str] = None\n    implementation_used: ImplementationVersion = ImplementationVersion.NEW\n    warnings: List[str] = None\n    \n    def __post_init__(self):\n        if self.warnings is None:\n            self.warnings = []\n\n\n@dataclass\nclass ModelConfiguration:\n    \"\"\"Model configuration parameters.\"\"\"\n    # Required parameters\n    dim: int\n    depth: int\n    \n    # Common optional parameters\n    stereo: bool = False\n    num_stems: int = 1\n    time_transformer_depth: int = 2\n    freq_transformer_depth: int = 2\n    dim_head: int = 64\n    heads: int = 8\n    attn_dropout: float = 0.0\n    ff_dropout: float = 0.0\n    flash_attn: bool = True\n    \n    # New parameters (with defaults for backward compatibility)\n    mlp_expansion_factor: int = 4\n    sage_attention: bool = False\n    zero_dc: bool = True\n    use_torch_checkpoint: bool = False\n    skip_connection: bool = False\n    \n    # Normalization (may be None in some configs)\n    norm: Optional[str] = None\n    \n    # Model-specific parameters\n    freqs_per_bands: Optional[Tuple[int, ...]] = None  # BSRoformer\n    num_bands: Optional[int] = None  # 
MelBandRoformer\n\n\nclass ParameterValidationError(Exception):\n    \"\"\"Raised when model parameters are invalid.\"\"\"\n    \n    def __init__(self, parameter_name: str, expected_type: str, actual_value: Any, suggested_fix: str):\n        self.parameter_name = parameter_name\n        self.expected_type = expected_type\n        self.actual_value = actual_value\n        self.suggested_fix = suggested_fix\n        \n        message = (\n            f\"Invalid parameter '{parameter_name}': \"\n            f\"expected {expected_type}, got {type(actual_value).__name__} ({actual_value}). \"\n            f\"Suggestion: {suggested_fix}\"\n        )\n        super().__init__(message)\n\n\nclass RoformerLoaderInterface(ABC):\n    \"\"\"Abstract interface for loading Roformer models.\"\"\"\n    \n    @abstractmethod\n    def load_model(self, model_path: str, config: Optional[Dict[str, Any]] = None) -> ModelLoadingResult:\n        \"\"\"\n        Load a Roformer model from the given path.\n        \n        Args:\n            model_path: Path to the model file (.ckpt, .pth)\n            config: Optional configuration override\n            \n        Returns:\n            ModelLoadingResult with success status and model or error details\n            \n        Raises:\n            ParameterValidationError: If model configuration is invalid\n            FileNotFoundError: If model file doesn't exist\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def validate_configuration(self, config: ModelConfiguration, model_type: RoformerType) -> List[str]:\n        \"\"\"\n        Validate model configuration parameters.\n        \n        Args:\n            config: Model configuration to validate\n            model_type: Type of Roformer model\n            \n        Returns:\n            List of validation error messages (empty if valid)\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def detect_model_type(self, model_path: str) -> RoformerType:\n        
\"\"\"\n        Detect the type of Roformer model from the file.\n        \n        Args:\n            model_path: Path to the model file\n            \n        Returns:\n            Detected RoformerType\n            \n        Raises:\n            ValueError: If model type cannot be determined\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def get_default_configuration(self, model_type: RoformerType) -> ModelConfiguration:\n        \"\"\"\n        Get default configuration for a model type.\n        \n        Args:\n            model_type: Type of Roformer model\n            \n        Returns:\n            Default ModelConfiguration for the type\n        \"\"\"\n        pass\n\n\nclass RoformerModelInterface(ABC):\n    \"\"\"Abstract interface for Roformer model instances.\"\"\"\n    \n    @abstractmethod\n    def separate_audio(self, audio_data: Any, **kwargs) -> Any:\n        \"\"\"\n        Separate audio into stems using the model.\n        \n        Args:\n            audio_data: Input audio data\n            **kwargs: Additional separation parameters\n            \n        Returns:\n            Separated audio stems\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def get_model_info(self) -> Dict[str, Any]:\n        \"\"\"\n        Get information about the loaded model.\n        \n        Returns:\n            Dictionary with model metadata\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def cleanup(self) -> None:\n        \"\"\"Clean up model resources.\"\"\"\n        pass\n\n\nclass FallbackLoaderInterface(ABC):\n    \"\"\"Interface for fallback loading mechanism.\"\"\"\n    \n    @abstractmethod\n    def try_new_implementation(self, model_path: str, config: ModelConfiguration) -> ModelLoadingResult:\n        \"\"\"\n        Attempt to load model with new implementation.\n        \n        Args:\n            model_path: Path to model file\n            config: Model configuration\n            \n        Returns:\n   
         ModelLoadingResult indicating success or failure\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def try_old_implementation(self, model_path: str, config: ModelConfiguration) -> ModelLoadingResult:\n        \"\"\"\n        Attempt to load model with old implementation (fallback).\n        \n        Args:\n            model_path: Path to model file\n            config: Model configuration\n            \n        Returns:\n            ModelLoadingResult indicating success or failure\n        \"\"\"\n        pass\n    \n    @abstractmethod\n    def should_fallback(self, error: Exception) -> bool:\n        \"\"\"\n        Determine if the error warrants falling back to old implementation.\n        \n        Args:\n            error: Exception from new implementation attempt\n            \n        Returns:\n            True if should attempt fallback, False otherwise\n        \"\"\"\n        pass\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/data-model.md",
    "content": "# Data Model: Roformer Implementation Update\n\n## Core Entities\n\n### RoformerModel\nRepresents a Roformer audio separation model with configuration parameters.\n\n**Fields:**\n- `model_path: str` - Path to the model file (.ckpt, .pth)\n- `config: ModelConfiguration` - Model configuration parameters\n- `model_type: RoformerType` - Type of Roformer (BSRoformer, MelBandRoformer)\n- `implementation_version: str` - Which implementation version (\"old\", \"new\")\n- `loaded_successfully: bool` - Whether model loaded without errors\n\n**Validation Rules:**\n- `model_path` must exist and be readable\n- `config` must contain required parameters for the model type\n- `model_type` must be one of supported variants\n\n**State Transitions:**\n- `Unloaded → Loading → Loaded` (success path)\n- `Unloaded → Loading → Failed` (error path)\n- `Loaded → Unloaded` (cleanup)\n\n### ModelConfiguration\nDictionary-like object containing model architecture parameters.\n\n**Fields:**\n- `dim: int` - Model dimension\n- `depth: int` - Number of transformer layers\n- `stereo: bool` - Whether model handles stereo audio\n- `num_stems: int` - Number of output stems\n- `time_transformer_depth: int` - Depth of time transformer\n- `freq_transformer_depth: int` - Depth of frequency transformer\n- `freqs_per_bands: Tuple[int, ...]` - Frequency bands configuration\n- `dim_head: int` - Attention head dimension\n- `heads: int` - Number of attention heads\n- `attn_dropout: float` - Attention dropout rate\n- `ff_dropout: float` - Feed-forward dropout rate\n- `flash_attn: bool` - Whether to use flash attention\n- `norm: str` - Normalization type (new field handling)\n\n**New Parameters (for updated models):**\n- `mlp_expansion_factor: int = 4` - MLP expansion ratio\n- `sage_attention: bool = False` - Enable Sage attention\n- `zero_dc: bool = True` - Zero DC component handling\n- `use_torch_checkpoint: bool = False` - Enable gradient checkpointing\n- `skip_connection: bool = False` - 
Enable skip connections\n\n**Validation Rules:**\n- All numeric fields must be positive\n- Dropout rates must be between 0.0 and 1.0\n- `freqs_per_bands` must sum to expected frequency count\n- `norm` must be valid normalization type or None\n\n### BSRoformerConfig\nSpecialized configuration for Band-Split Roformer models.\n\n**Fields:**\n- Inherits all from `ModelConfiguration`\n- `freqs_per_bands: Tuple[int, ...]` - Required, defines frequency band splits\n- `mask_estimator_depth: int = 2` - Depth of mask estimation network\n\n**Validation Rules:**\n- `freqs_per_bands` must be provided and non-empty\n- Sum of `freqs_per_bands` must match STFT frequency bins\n\n### MelBandRoformerConfig  \nSpecialized configuration for Mel-Band Roformer models.\n\n**Fields:**\n- Inherits all from `ModelConfiguration`\n- `num_bands: int` - Number of mel-scale bands\n- `sample_rate: int = 44100` - Audio sample rate for mel calculation\n\n**Validation Rules:**\n- `num_bands` must be positive integer\n- `sample_rate` must be valid audio sample rate\n\n### ModelLoadingResult\nResult object returned from model loading attempts.\n\n**Fields:**\n- `success: bool` - Whether loading succeeded\n- `model: Optional[RoformerModel]` - Loaded model if successful\n- `error_message: Optional[str]` - Error description if failed\n- `implementation_used: str` - Which implementation was used (\"old\", \"new\", \"fallback\")\n- `warnings: List[str]` - Non-fatal warnings during loading\n\n**State Transitions:**\n- `Attempting → Success` (with model)\n- `Attempting → Failure` (with error message)\n- `Success → Cleanup` (model disposal)\n\n### ParameterValidationError\nException raised when model parameters are invalid.\n\n**Fields:**\n- `parameter_name: str` - Name of invalid parameter\n- `expected_type: str` - Expected parameter type\n- `actual_value: Any` - Actual value provided\n- `suggested_fix: str` - Suggestion for fixing the issue\n\n## Entity Relationships\n\n```\nRoformerModel\n├── has_one 
ModelConfiguration\n│   ├── extends_to BSRoformerConfig\n│   └── extends_to MelBandRoformerConfig\n├── produces ModelLoadingResult\n└── may_raise ParameterValidationError\n\nModelConfiguration\n├── validates_against ValidationRules\n└── contains ParameterSet\n    ├── required_parameters\n    ├── optional_parameters\n    └── new_parameters\n```\n\n## Data Flow\n\n### Model Loading Flow\n1. `ModelLoader.load(model_path)` → `ModelLoadingResult`\n2. Parse configuration from model file\n3. Validate configuration parameters\n4. Attempt loading with new implementation\n5. On failure, attempt loading with old implementation\n6. Return result with success/failure status\n\n### Parameter Validation Flow\n1. Extract parameters from model configuration\n2. Check required parameters exist\n3. Validate parameter types and ranges\n4. Check parameter compatibility\n5. Raise `ParameterValidationError` with specific details if invalid\n\n### Configuration Normalization Flow\n1. Load raw configuration from model\n2. Map old parameter names to new parameter names\n3. Set default values for missing optional parameters\n4. Validate final configuration\n5. Return normalized `ModelConfiguration`\n\n## Storage Considerations\n\n- Model files are read-only external artifacts\n- Configuration is derived from model metadata\n- No persistent state storage required\n- Memory usage scales with model size\n- Cleanup required after model disposal\n\n## Performance Considerations\n\n- Model loading is I/O intensive operation\n- Configuration validation should be fast\n- Parameter defaults should be computed once\n- Error messages should be pre-formatted for common cases\n- Fallback mechanism adds latency but ensures compatibility\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/plan.md",
    "content": "# Implementation Plan: Update Roformer Implementation\n\n**Branch**: `001-update-roformer-implementation` | **Date**: September 25, 2025 | **Spec**: [/specs/001-update-roformer-implementation.md](../001-update-roformer-implementation.md)\n**Input**: Feature specification from `/specs/001-update-roformer-implementation.md`\n\n## Execution Flow (/plan command scope)\n```\n1. Load feature spec from Input path\n   → If not found: ERROR \"No feature spec at {path}\"\n2. Fill Technical Context (scan for NEEDS CLARIFICATION)\n   → Detect Project Type from context (web=frontend+backend, mobile=app+api)\n   → Set Structure Decision based on project type\n3. Fill the Constitution Check section based on the content of the constitution document.\n4. Evaluate Constitution Check section below\n   → If violations exist: Document in Complexity Tracking\n   → If no justification possible: ERROR \"Simplify approach first\"\n   → Update Progress Tracking: Initial Constitution Check\n5. Execute Phase 0 → research.md\n   → If NEEDS CLARIFICATION remain: ERROR \"Resolve unknowns\"\n6. Execute Phase 1 → contracts, data-model.md, quickstart.md, agent-specific template file (e.g., `CLAUDE.md` for Claude Code, `.github/copilot-instructions.md` for GitHub Copilot, `GEMINI.md` for Gemini CLI, `QWEN.md` for Qwen Code or `AGENTS.md` for opencode).\n7. Re-evaluate Constitution Check section\n   → If new violations: Refactor design, return to Phase 1\n   → Update Progress Tracking: Post-Design Constitution Check\n8. Plan Phase 2 → Describe task generation approach (DO NOT create tasks.md)\n9. STOP - Ready for /tasks command\n```\n\n**IMPORTANT**: The /plan command STOPS at step 7. Phases 2-4 are executed by other commands:\n- Phase 2: /tasks command creates tasks.md\n- Phase 3-4: Implementation execution (manual or via tools)\n\n## Summary\nUpdate the Roformer implementation to support latest model architectures while maintaining backward compatibility. 
The primary requirement is enabling newer Roformer models with updated parameters (mlp_expansion_factor, sage_attention, zero_dc) while preserving functionality for existing older models. Technical approach involves integrating new inference code with fallback mechanisms and comprehensive validation using existing spectral analysis testing framework.\n\n## Technical Context\n**Language/Version**: Python 3.11+  \n**Primary Dependencies**: PyTorch, librosa, soundfile, numpy, onnxruntime  \n**Storage**: Model files (.ckpt, .pth, .onnx), audio files (FLAC, WAV, MP3)  \n**Testing**: pytest with custom audio validation (SSIM comparison)  \n**Target Platform**: Cross-platform (Windows, macOS, Linux) with GPU acceleration support  \n**Project Type**: single - Python library with CLI wrapper  \n**Performance Goals**: Maintain existing separation quality (≥0.90 waveform, ≥0.80 spectrogram similarity)  \n**Constraints**: Zero regression in existing model functionality, backward compatibility mandatory  \n**Scale/Scope**: Support for BSRoformer and MelBandRoformer variants, integration with existing 4 model architectures\n\n## Constitution Check\n*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*\n\n### I. Library-First Architecture\n- [x] Core functionality implemented in `Separator` class or similar library pattern\n- [x] CLI/Remote API are thin wrappers, not containing business logic\n- [x] Clear separation between model architectures (MDX, VR, Demucs, MDXC)\n\n### II. Multi-Interface Consistency  \n- [x] Feature accessible via Python API, CLI, and Remote API (if applicable)\n- [x] Parameter names identical across all interfaces\n- [x] Same model architectures supported across interfaces\n\n### III. Test-First Development (NON-NEGOTIABLE)\n- [x] Tests written before implementation\n- [x] Unit tests for all core functionality\n- [x] Integration tests with audio validation (SSIM comparison)\n- [x] CLI tests for all exposed functionality\n\n### IV. 
Performance & Resource Efficiency\n- [x] Hardware acceleration support considered (CUDA, CoreML, DirectML)\n- [x] Memory optimization for large files (streaming/batch processing)\n- [x] Tunable parameters for different hardware capabilities\n\n### V. Model Architecture Separation\n- [x] Each architecture in separate modules\n- [x] Inherits from `CommonSeparator` pattern\n- [x] Architecture-specific parameters isolated\n- [x] Loading one architecture doesn't load others\n\n## Project Structure\n\n### Documentation (this feature)\n```\nspecs/001-update-roformer-implementation/\n├── plan.md              # This file (/plan command output)\n├── research.md          # Phase 0 output (/plan command)\n├── data-model.md        # Phase 1 output (/plan command)\n├── quickstart.md        # Phase 1 output (/plan command)\n├── contracts/           # Phase 1 output (/plan command)\n└── tasks.md             # Phase 2 output (/tasks command - NOT created by /plan)\n```\n\n### Source Code (repository root)\n```\n# Option 1: Single project (DEFAULT)\naudio_separator/\n├── separator/\n│   ├── uvr_lib_v5/          # Existing old Roformer implementation\n│   ├── msst-models-new/     # New reference implementation\n│   ├── common_separator.py  # Base class for all architectures\n│   └── roformer_separator.py # Updated Roformer implementation\n└── utils/\n\ntests/\n├── contract/\n├── integration/\n└── unit/\n```\n\n**Structure Decision**: Option 1 - Single project structure, as this is a library enhancement rather than web/mobile application\n\n## Phase 0: Outline & Research\n1. **Extract unknowns from Technical Context** above:\n   - Analyze differences between old and new Roformer implementations\n   - Research parameter compatibility between BSRoformer and MelBandRoformer variants\n   - Investigate normalization configuration handling patterns\n   - Study fallback mechanism implementation strategies\n\n2. 
**Generate and dispatch research agents**:\n   ```\n   Task: \"Research differences between old uvr_lib_v5 and new msst-models Roformer implementations\"\n   Task: \"Find best practices for backward compatibility in PyTorch model loading\"\n   Task: \"Research parameter validation patterns for ML model configurations\"\n   Task: \"Study fallback mechanism implementations in audio processing libraries\"\n   ```\n\n3. **Consolidate findings** in `research.md` using format:\n   - Decision: [what was chosen]\n   - Rationale: [why chosen]\n   - Alternatives considered: [what else evaluated]\n\n**Output**: research.md with all technical unknowns resolved\n\n## Phase 1: Design & Contracts\n*Prerequisites: research.md complete*\n\n1. **Extract entities from feature spec** → `data-model.md`:\n   - RoformerModel: Configuration parameters, validation rules\n   - ModelConfiguration: Parameter dictionaries, normalization settings\n   - BSRoformerConfig: Band-split specific parameters\n   - MelBandRoformerConfig: Mel-scale band specific parameters\n\n2. **Generate API contracts** from functional requirements:\n   - Model loading interface with fallback capability\n   - Parameter validation interface\n   - Error reporting interface for configuration mismatches\n   - Output OpenAPI/GraphQL schema to `/contracts/`\n\n3. **Generate contract tests** from contracts:\n   - Test model loading with old format\n   - Test model loading with new format\n   - Test fallback mechanism activation\n   - Test parameter validation and error reporting\n\n4. **Extract test scenarios** from user stories:\n   - Load existing older Roformer model → separation works identically\n   - Load newer Roformer model → separation works with new parameters\n   - Switch between model types → no conflicts or failures\n\n5. 
**Update agent file incrementally** (O(1) operation):\n   - Run `.specify/scripts/bash/update-agent-context.sh cursor`\n   - Add Roformer implementation context\n   - Preserve existing audio separation knowledge\n   - Update with new model parameter handling\n\n**Output**: data-model.md, /contracts/*, failing tests, quickstart.md, agent-specific file\n\n## Phase 2: Task Planning Approach\n*This section describes what the /tasks command will do - DO NOT execute during /plan*\n\n**Task Generation Strategy**:\n- Load `.specify/templates/tasks-template.md` as base\n- Generate tasks from Phase 1 design docs (contracts, data model, quickstart)\n- Each contract → contract test task [P]\n- Each entity → model creation task [P] \n- Each user story → integration test task\n- Implementation tasks to make tests pass\n\n**Ordering Strategy**:\n- TDD order: Tests before implementation \n- Dependency order: Models before services before UI\n- Mark [P] for parallel execution (independent files)\n\n**Estimated Output**: 20-25 numbered, ordered tasks in tasks.md\n\n**IMPORTANT**: This phase is executed by the /tasks command, NOT by /plan\n\n## Phase 3+: Future Implementation\n*These phases are beyond the scope of the /plan command*\n\n**Phase 3**: Task execution (/tasks command creates tasks.md)  \n**Phase 4**: Implementation (execute tasks.md following constitutional principles)  \n**Phase 5**: Validation (run tests, execute quickstart.md, performance validation)\n\n## Complexity Tracking\n*No constitutional violations identified - all checks passed*\n\n## Progress Tracking\n*This checklist is updated during execution flow*\n\n**Phase Status**:\n- [x] Phase 0: Research complete (/plan command)\n- [x] Phase 1: Design complete (/plan command)\n- [x] Phase 2: Task planning complete (/plan command - describe approach only)\n- [x] Phase 3: Tasks generated (/tasks command)\n- [ ] Phase 4: Implementation complete\n- [ ] Phase 5: Validation passed\n\n**Gate Status**:\n- [x] Initial 
Constitution Check: PASS\n- [x] Post-Design Constitution Check: PASS\n- [x] All NEEDS CLARIFICATION resolved\n- [x] Complexity deviations documented\n\n---\n*Based on Constitution v1.0.0 - See `.specify/memory/constitution.md`*\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/post-implementation-issues.md",
    "content": "### Post-Implementation Issues: Roformer Routing and Execution Paths\n\n- **Observed behavior**: New Roformer models load and separate successfully, but execution is routed through `MDXCSeparator` rather than a dedicated Roformer separator. Logs show the MDXC path with Roformer-specific branching inside `MDXCSeparator`.\n\n- **Root cause**:\n  - `Separator.list_supported_model_files()` groups Roformer entries under the `MDXC` model type, so `Separator.load_model()` instantiates `architectures.mdxc_separator.MDXCSeparator` for Roformer files.\n  - `separator_classes` in `separator.py` has no mapping for a `Roformer` type; the dedicated `architectures/roformer_separator.py` is never routed to and is effectively unused.\n  - `MDXCSeparator` contains an internal Roformer branch (constructs `BSRoformer`/`MelBandRoformer`, loads checkpoints, and implements Roformer chunking/overlap-add). This duplicates responsibility with the new `roformer_loader`.\n\n- **Impact**:\n  - The new Roformer loader (normalization/validation/fallback) is not used in actual runs; loader stats remain zero in logs.\n  - Two overlapping Roformer code paths (inside `MDXCSeparator` and the new loader + `RoformerSeparator`) create confusion and increase maintenance cost.\n  - `RoformerSeparator` is dead code under current routing.\n\n- **Decision**: Proceed with Option A (minimal-change refactor)\n  - Keep routing via `MDXCSeparator` to avoid broad changes.\n  - Refactor the Roformer branch in `MDXCSeparator` to use `RoformerLoader` for model loading/validation/fallback, while preserving the existing Roformer chunking/overlap-add execution.\n  - Align the Roformer detection flag naming (`is_roformer_model`) with `CommonSeparator`, and surface loader stats via existing `CommonSeparator.get_roformer_loading_stats()` and `Separator` logging.\n\n- **Planned edits (Option A)**:\n  - `architectures/mdxc_separator.py`:\n    - Use `self.is_roformer_model` (from `CommonSeparator`) instead of 
a separate `self.is_roformer` detection.\n    - In `load_model()`, when Roformer is detected, call `self.roformer_loader.load_model(model_path=self.model_path, config=self.model_data['model'], device=str(self.torch_device))`. On success, set `self.model_run = result.model`; on failure, fall back to existing direct instantiation path.\n    - Keep current Roformer chunking/overlap-add logic in `demix()` unchanged.\n  - `roformer/roformer_loader.py`:\n    - Ensure it returns a `ModelLoadingResult` that matches the local dataclass (use `success_result`, `failure_result`, `fallback_success_result` helpers), and populate metadata via `model_info` rather than custom fields.\n\n- **Expected outcomes**:\n  - No change in separation outputs or performance characteristics.\n  - Loader stats in logs reflect actual usage (non-zero `new_implementation_success` and/or `fallback_success`).\n  - Reduced duplication and clearer ownership, while deferring larger routing changes (introducing `RoformerSeparator`) to a follow-up if desired.\n\n\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/quickstart.md",
    "content": "# Quickstart Guide: Updated Roformer Implementation\n\nThis guide demonstrates how to use the updated Roformer implementation with both old and new models.\n\n## Prerequisites\n\n- Python 3.11+\n- PyTorch installed\n- audio-separator package installed\n- Access to Roformer model files (.ckpt or .pth)\n\n## Basic Usage\n\n### Loading a Roformer Model\n\n```python\nfrom audio_separator import Separator\n\n# Initialize the separator with model and output directories\nseparator = Separator(\n    model_file_dir='path/to/models',\n    output_dir='path/to/output'\n)\n\n# Load a Roformer model (automatically detects old vs new format)\nseparator.load_model('model_bs_roformer_ep_317_sdr_12.9755.ckpt')\n```\n\n### Separating Audio\n\n```python\n# Separate audio file\noutput_files = separator.separate('input_audio.flac')\n\nprint(f\"Separation complete. Output files: {output_files}\")\n```\n\n## Model Types\n\n### BSRoformer Models\n\n```python\n# BSRoformer models work with frequency band splitting\nseparator.load_model('bs_roformer_model.ckpt')\noutputs = separator.separate('audio.flac')\n# Outputs: ['audio_(Vocals).flac', 'audio_(Instrumental).flac']\n```\n\n### MelBandRoformer Models  \n\n```python\n# MelBandRoformer models work with mel-scale bands\nseparator.load_model('mel_band_roformer_model.ckpt')\noutputs = separator.separate('audio.flac')\n# Outputs depend on model configuration\n```\n\n## Advanced Configuration\n\n### Manual Configuration Override\n\n```python\n# Override model configuration if needed\nconfig_override = {\n    'mlp_expansion_factor': 4,\n    'sage_attention': True,\n    'zero_dc': True,\n    'use_torch_checkpoint': False\n}\n\nseparator.load_model('model.ckpt', config=config_override)\n```\n\n### Error Handling\n\n```python\ntry:\n    separator.load_model('problematic_model.ckpt')\n    outputs = separator.separate('audio.flac')\nexcept ParameterValidationError as e:\n    print(f\"Configuration error: {e}\")\n    print(f\"Suggestion: 
{e.suggested_fix}\")\nexcept Exception as e:\n    print(f\"Loading failed: {e}\")\n```\n\n## Testing Scenarios\n\n### Scenario 1: Existing Older Model\n\nTest that existing models continue to work without changes.\n\n```python\n# This should work exactly as before\nseparator = Separator()\nseparator.load_model('old_roformer_model.ckpt')\noutputs = separator.separate('test_audio.flac')\n\n# Verify outputs match previous results\nassert len(outputs) == 2  # Expecting vocal and instrumental\nassert outputs[0].endswith('_(Vocals).flac')\nassert outputs[1].endswith('_(Instrumental).flac')\n```\n\n### Scenario 2: Newer Model with New Parameters\n\nTest that newer models with additional parameters work correctly.\n\n```python\nimport os\n\n# This should work with the updated implementation\nseparator = Separator()\nseparator.load_model('new_roformer_with_sage_attention.ckpt')\noutputs = separator.separate('test_audio.flac')\n\n# Verify outputs are generated successfully\nassert len(outputs) >= 1\nfor output in outputs:\n    assert os.path.exists(output)\n    assert os.path.getsize(output) > 0\n```\n\n### Scenario 3: Model Type Switching\n\nTest switching between different Roformer variants in the same session.\n\n```python\nseparator = Separator()\n\n# Load BSRoformer model\nseparator.load_model('bs_roformer.ckpt')\nbs_outputs = separator.separate('test1.flac')\n\n# Switch to MelBandRoformer model  \nseparator.load_model('mel_band_roformer.ckpt')\nmel_outputs = separator.separate('test2.flac')\n\n# Both should work without conflicts\nassert len(bs_outputs) > 0\nassert len(mel_outputs) > 0\n```\n\n### Scenario 4: Configuration Validation\n\nTest that invalid configurations are caught with helpful error messages.\n\n```python\n# Test missing required parameter\ntry:\n    separator.load_model('model_with_missing_config.ckpt')\n    assert False, \"Should have raised validation error\"\nexcept ParameterValidationError as e:\n    assert \"missing\" in e.suggested_fix.lower()\n    assert 
e.parameter_name is not None\n```\n\n### Scenario 5: Fallback Mechanism\n\nTest that fallback from new to old implementation works.\n\n```python\n# This would internally try new implementation first, then fallback to old\nseparator = Separator(log_level='DEBUG')  # Enable debug logging\nseparator.load_model('edge_case_model.ckpt')\n\n# Check logs to verify fallback occurred if needed\n# (Implementation should log which version was used)\n```\n\n## Performance Validation\n\n### Audio Quality Validation\n\n```python\nimport subprocess\nimport os\n\n# Run integration test to validate audio quality\nresult = subprocess.run([\n    'python', '-m', 'pytest', \n    'tests/integration/test_roformer_quality.py',\n    '-v'\n], capture_output=True, text=True)\n\nassert result.returncode == 0, f\"Quality tests failed: {result.stderr}\"\n```\n\n### Regression Testing\n\n```python\n# Ensure existing models still produce results that match the reference outputs\nreference_outputs = load_reference_outputs('test_audio.flac')\ncurrent_outputs = separator.separate('test_audio.flac')\n\nfor ref, curr in zip(reference_outputs, current_outputs):\n    similarity = calculate_audio_similarity(ref, curr)\n    assert similarity >= 0.90, f\"Audio similarity {similarity} below threshold\"\n```\n\n## Troubleshooting\n\n### Common Issues\n\n1. **AttributeError: 'norm'**\n   - Cause: Model configuration has different normalization structure\n   - Solution: The updated implementation handles this automatically\n\n2. **TypeError: unexpected keyword argument 'mlp_expansion_factor'**\n   - Cause: Trying to use new model with old implementation\n   - Solution: The fallback mechanism handles this automatically\n\n3. 
**Model loading fails completely**\n   - Check model file exists and is readable\n   - Verify model is a supported Roformer variant\n   - Check logs for specific error details\n\n### Debug Information\n\n```python\n# Enable detailed logging\nimport logging\nlogging.basicConfig(level=logging.DEBUG)\n\nseparator = Separator()\nseparator.load_model('model.ckpt')\n# Check logs for implementation version used and any warnings\n```\n\n## Manual Testing Checklist\n\nAfter implementation changes, manually verify:\n\n- [ ] Existing BSRoformer models load and separate correctly\n- [ ] Existing MelBandRoformer models load and separate correctly  \n- [ ] New models with additional parameters work\n- [ ] Audio quality matches reference outputs (SSIM ≥ 0.90/0.80)\n- [ ] Error messages are clear and helpful\n- [ ] Performance is not significantly degraded\n- [ ] Memory usage is reasonable\n- [ ] Multiple models can be loaded in sequence\n\n## Next Steps\n\nAfter validating the quickstart scenarios:\n\n1. Run full integration test suite\n2. Perform manual testing with real model files\n3. Validate performance benchmarks\n4. Update documentation if needed\n5. Prepare for production deployment\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/research.md",
    "content": "# Research Findings: Roformer Implementation Update\n\n## Key Differences Between Old and New Implementations\n\n### 1. BSRoformer Constructor Parameters\n\n**Decision**: The new implementation includes 5 additional parameters that are missing from the old implementation:\n- `zero_dc = True` (line 385)\n- `mlp_expansion_factor = 4` (line 392) \n- `use_torch_checkpoint = False` (line 393)\n- `skip_connection = False` (line 394)\n- `sage_attention = False` (line 395)\n\n**Rationale**: These parameters enable advanced features in newer models:\n- `mlp_expansion_factor`: Controls MLP layer expansion ratio for better model capacity\n- `sage_attention`: Enables Sage attention mechanism for improved performance\n- `zero_dc`: Controls DC component handling in STFT processing\n- `use_torch_checkpoint`: Enables gradient checkpointing for memory efficiency\n- `skip_connection`: Enables residual connections between layers\n\n**Alternatives considered**: \n- Hardcoding defaults in old implementation (rejected - not maintainable)\n- Creating separate classes (rejected - increases complexity)\n- Parameter validation with fallback (chosen approach)\n\n### 2. Transformer Configuration Changes\n\n**Decision**: New implementation includes `sage_attention` parameter in transformer_kwargs and conditional logic for Sage attention activation.\n\n**Rationale**: The new implementation adds:\n```python\nif sage_attention:\n    print(\"Use Sage Attention\")\n\ntransformer_kwargs = dict(\n    # ... existing parameters ...\n    sage_attention=sage_attention,  # New parameter\n)\n```\n\n**Alternatives considered**: Ignoring sage_attention (rejected - breaks newer models), implementing stub (chosen for backward compatibility)\n\n### 3. 
Instance Variable Additions\n\n**Decision**: New implementation tracks additional state variables:\n- `self.use_torch_checkpoint` \n- `self.skip_connection`\n\n**Rationale**: These are used throughout the forward pass and need to be accessible as instance variables.\n\n**Alternatives considered**: Local variables only (rejected - needed in forward method), property methods (rejected - unnecessary complexity)\n\n### 4. Normalization Configuration Issues\n\n**Decision**: The error `AttributeError: \"'norm'\"` in `tfc_tdf_v3.py` line 155 occurs when `config.model.norm` is accessed but the configuration object structure is different between old and new models.\n\n**Rationale**: Analysis of `tfc_tdf_v3.py` shows:\n```python\nnorm = get_norm(norm_type=config.model.norm)  # Line 155\n```\n\nThe `get_norm` function expects a string value but newer model configs may have different structure or missing `norm` attribute.\n\n**Alternatives considered**: \n- Modify get_norm to handle missing attributes (chosen)\n- Update config parsing (rejected - too invasive)\n- Separate normalization handling (rejected - duplicates code)\n\n### 5. Model Loading Strategy\n\n**Decision**: Implement try-new-first-fallback-to-old pattern for model loading.\n\n**Rationale**: This approach:\n1. Attempts to load with new implementation first\n2. Falls back to old implementation if new fails\n3. Provides clear error messages for debugging\n4. Maintains zero regression for existing models\n\n**Alternatives considered**: \n- Version detection from model files (rejected - unreliable)\n- User-specified format (rejected - poor UX)\n- Parallel implementations (rejected - maintenance burden)\n\n### 6. Parameter Validation Patterns\n\n**Decision**: Implement explicit parameter validation with detailed error messages rather than silent defaults.\n\n**Rationale**: Based on clarification session, failing fast with clear error messages is preferred over assuming defaults. 
This helps users understand what's wrong with their model configurations.\n\n**Alternatives considered**:\n- Silent defaults (rejected per clarification)\n- Configuration auto-correction (rejected - unpredictable)\n- Hardcoded fallbacks (rejected - not maintainable)\n\n## Implementation Strategy\n\n### Phase 1: Extend Old Implementation\n1. Add new parameters to BSRoformer.__init__ with default values\n2. Add new parameters to MelBandRoformer.__init__ with default values\n3. Update transformer_kwargs to include the new sage_attention parameter\n4. Add instance variable assignments for use_torch_checkpoint and skip_connection\n\n### Phase 2: Normalization Compatibility\n1. Modify the get_norm function to handle missing norm attributes\n2. Add defensive checks in TFC_TDF_net.__init__\n3. Provide clear error messages for configuration issues\n\n### Phase 3: Fallback Mechanism\n1. Implement a model loading wrapper that tries the new implementation first\n2. Add fallback to the old implementation on specific exceptions\n3. Log which implementation was used, for debugging\n\n### Phase 4: Testing Integration\n1. Extend existing integration tests to cover new parameters\n2. Add specific tests for the fallback mechanism\n3. Validate that existing models continue to work identically\n\n## Risk Mitigation\n\n1. **Backward Compatibility**: All new parameters have sensible defaults that maintain existing behavior\n2. **Error Handling**: Clear error messages help users identify configuration issues\n3. **Testing**: Comprehensive test coverage guards against regressions\n4. **Fallback Safety**: The old implementation remains available if the new implementation fails\n5. **Manual Validation**: Manual testing after major milestones confirms real-world compatibility\n"
  },
  {
    "path": "specs/001-update-roformer-implementation/tasks.md",
    "content": "# Tasks: Update Roformer Implementation\n\n**Input**: Design documents from `/specs/001-update-roformer-implementation/`\n**Prerequisites**: plan.md (required), research.md, data-model.md, contracts/\n\n## Execution Flow (main)\n```\n1. Load plan.md from feature directory\n   → Extract: Python 3.11+, PyTorch, librosa, soundfile, numpy, onnxruntime\n   → Structure: single project (audio_separator/ library with CLI wrapper)\n2. Load design documents:\n   → data-model.md: 6 entities → model tasks\n   → contracts/: 2 interface files → contract test tasks\n   → quickstart.md: 5 test scenarios → integration test tasks\n3. Generate tasks by category:\n   → Setup: dependencies, linting, project structure\n   → Tests: contract tests, integration tests (TDD)\n   → Core: models, parameter validation, fallback mechanism\n   → Integration: existing separator integration, CLI updates\n   → Polish: unit tests, performance validation, docs\n4. Apply TDD ordering: Tests before implementation\n5. Mark [P] for parallel execution (different files)\n6. Validate: All contracts tested, all entities implemented\n```\n\n## Format: `[ID] [P?] 
Description`\n- **[P]**: Can run in parallel (different files, no dependencies)\n- Include exact file paths in descriptions\n\n## Phase 3.1: Setup\n\n- [x] T001 Update project dependencies in pyproject.toml to ensure PyTorch compatibility for new Roformer parameters\n- [x] T002 [P] Configure linting rules in pyproject.toml for new Roformer implementation files\n- [x] T003 [P] Create backup of existing uvr_lib_v5/roformer/ directory for rollback safety\n\n## Phase 3.2: Tests First (TDD) ⚠️ MUST COMPLETE BEFORE 3.3\n\n**CRITICAL: These tests MUST be written and MUST FAIL before ANY implementation**\n\n### Contract Tests\n- [x] T004 [P] Contract test RoformerLoaderInterface.load_model in tests/contract/test_roformer_loader_interface.py\n- [x] T005 [P] Contract test RoformerLoaderInterface.validate_configuration in tests/contract/test_roformer_loader_interface.py\n- [x] T006 [P] Contract test ParameterValidatorInterface.validate_required_parameters in tests/contract/test_parameter_validator_interface.py\n- [x] T007 [P] Contract test ParameterValidatorInterface.validate_normalization_config in tests/contract/test_parameter_validator_interface.py\n- [x] T008 [P] Contract test FallbackLoaderInterface.try_new_implementation in tests/contract/test_fallback_loader_interface.py\n\n### Integration Tests (from quickstart scenarios)\n- [x] T009 [P] Integration test existing older model compatibility in tests/integration/test_roformer_backward_compatibility.py\n- [x] T010 [P] Integration test newer model with new parameters in tests/integration/test_roformer_new_parameters.py\n- [x] T011 [P] Integration test model type switching (BSRoformer ↔ MelBandRoformer) in tests/integration/test_roformer_model_switching.py\n- [x] T012 [P] Integration test configuration validation error handling in tests/integration/test_roformer_config_validation.py\n- [x] T013 [P] Integration test fallback mechanism activation in tests/integration/test_roformer_fallback_mechanism.py\n\n### Audio Quality 
Validation Tests\n- [x] T014 [P] Audio quality regression test for existing BSRoformer models in tests/integration/test_roformer_audio_quality.py\n- [x] T015 [P] Audio quality validation test for MelBandRoformer models in tests/integration/test_roformer_audio_quality.py\n\n### Roformer Chunking and Overlap Unit Tests\n- [x] T051 [P] Assert chunk_size uses model.stft_hop_length: tests/unit/test_mdxc_roformer_chunking.py::test_chunk_size_uses_model_stft_hop_length\n- [x] T052 [P] Fallback to audio.hop_length if model.stft_hop_length missing: tests/unit/test_mdxc_roformer_chunking.py::test_chunk_size_falls_back_to_audio_hop_length\n- [x] T053 [P] Step clamped to chunk_size (desired_step > chunk_size or ≤ 0): tests/unit/test_mdxc_roformer_chunking.py::test_step_clamped_to_chunk_size\n- [x] T054 [P] overlap_add handles shorter model output safely (safe_len): tests/unit/test_mdxc_roformer_chunking.py::test_overlap_add_short_output_safe\n- [x] T055 [P] Counter increments match overlap_add safe span: tests/unit/test_mdxc_roformer_chunking.py::test_counter_updates_safe_len\n- [x] T056 [P] No NaN/inf on normalization (counter clamp): tests/unit/test_mdxc_roformer_chunking.py::test_counter_clamp_no_nan\n- [x] T057 [P] Short-audio last-block path works and preserves length: tests/unit/test_mdxc_roformer_chunking.py::test_short_audio_last_block\n- [x] T058 [P] Parametrized invariants across dim_t and hop configs: tests/unit/test_mdxc_roformer_chunking.py::test_parametrized_shape_invariants\n\n### Roformer Detection Tests\n- [x] T059 [P] YAML path containing \"roformer\" sets is_roformer and routes Roformer path: tests/unit/test_separator_detection.py::test_is_roformer_set_from_yaml_path\n\n### Roformer Integration (E2E)\n- [x] T060 [P] New BSRoformer SW-Fixed end-to-end separation succeeds: tests/integration/test_roformer_e2e.py::test_bs_roformer_sw_fixed_e2e\n- [x] T061 [P] Legacy Roformer end-to-end separation still succeeds: 
tests/integration/test_roformer_e2e.py::test_legacy_roformer_e2e\n\n### Regression: Size mismatch family\n- [x] T062 [P] Reproduce shorter outputs (N∈{1,16,32,236}) and assert no broadcast errors, output length preserved: tests/regression/test_roformer_size_mismatch.py::test_overlap_add_safe_lengths\n\n### Logging and Performance Sanity\n- [x] T063 [P] Logs include hop/step sources (stft_hop_length, dim_t, desired vs actual step): tests/unit/test_mdxc_roformer_chunking.py::test_logging_for_hop_and_step\n- [x] T064 [P] Iteration count reasonable (ceil calculation within ±1): tests/unit/test_mdxc_roformer_chunking.py::test_iteration_count_reasonable\n\n## Phase 3.3: Core Implementation (ONLY after tests are failing)\n\n### Data Models and Configuration\n- [x] T016 [P] Implement ModelConfiguration dataclass in audio_separator/separator/roformer/model_configuration.py\n- [x] T017 [P] Implement BSRoformerConfig class in audio_separator/separator/roformer/bs_roformer_config.py\n- [x] T018 [P] Implement MelBandRoformerConfig class in audio_separator/separator/roformer/mel_band_roformer_config.py\n- [x] T019 [P] Implement ModelLoadingResult dataclass in audio_separator/separator/roformer/model_loading_result.py\n- [x] T020 [P] Implement ParameterValidationError exception in audio_separator/separator/roformer/parameter_validation_error.py\n\n### Parameter Validation System\n- [x] T021 [P] Implement ParameterValidator class in audio_separator/separator/roformer/parameter_validator.py\n- [x] T022 [P] Implement BSRoformerValidator class in audio_separator/separator/roformer/bs_roformer_validator.py\n- [x] T023 [P] Implement MelBandRoformerValidator class in audio_separator/separator/roformer/mel_band_roformer_validator.py\n- [x] T024 [P] Implement ConfigurationNormalizer class in audio_separator/separator/roformer/configuration_normalizer.py\n\n### Updated Roformer Models\n- [x] T025 Update BSRoformer.__init__ method in 
audio_separator/separator/uvr_lib_v5/roformer/bs_roformer.py to add new parameters (mlp_expansion_factor, sage_attention, zero_dc, use_torch_checkpoint, skip_connection)\n- [x] T026 Update MelBandRoformer.__init__ method in audio_separator/separator/uvr_lib_v5/roformer/mel_band_roformer.py to add new parameters\n- [x] T027 Update transformer_kwargs in BSRoformer to include sage_attention parameter\n- [x] T028 Update transformer_kwargs in MelBandRoformer to include sage_attention parameter\n\n### Normalization Fixes\n- [x] T029 Update get_norm function in audio_separator/separator/uvr_lib_v5/tfc_tdf_v3.py to handle missing norm attributes gracefully\n- [x] T030 Add defensive checks in TFC_TDF_net.__init__ for normalization configuration\n\n### Fallback Loading System\n- [x] T031 [P] Implement RoformerLoader class with fallback mechanism in audio_separator/separator/roformer/roformer_loader.py\n- [x] T032 [P] Implement FallbackLoader class in audio_separator/separator/roformer/fallback_loader.py\n- [x] T033 Update RoformerSeparator class in audio_separator/separator/architectures/roformer_separator.py to use new loading system\n\n## Phase 3.4: Integration\n\n- [x] T034 Integrate new RoformerLoader into CommonSeparator base class in audio_separator/separator/common_separator.py\n- [x] T035 Update CLI model loading logic to use new fallback mechanism in audio_separator/separator/separator.py\n- [x] T036 Add logging for implementation version used (old/new/fallback) in audio_separator/separator/roformer/roformer_loader.py\n- [x] T037 Update error handling and user messages for configuration validation failures\n- [x] T038 Integrate with existing integration test framework (test_cli_integration.py compatibility)\n\n## Phase 3.5: Polish\n\n### Unit Tests\n- [x] T039 [P] Unit tests for ModelConfiguration validation in tests/unit/test_model_configuration.py\n- [x] T040 [P] Unit tests for ParameterValidator methods in tests/unit/test_parameter_validator.py\n- [x] T041 [P] 
Unit tests for ConfigurationNormalizer methods in tests/unit/test_configuration_normalizer.py\n- [x] T042 [P] Unit tests for fallback mechanism logic in tests/unit/test_fallback_loader.py\n\n### Performance and Quality Validation\n- [x] T043 Run existing integration tests to ensure no regression in audio quality (SSIM ≥ 0.90/0.80)\n- [x] T044 Performance benchmark for model loading time (should not significantly increase)\n- [x] T045 Memory usage validation for new parameter handling\n- [x] T046 [P] Update documentation in audio_separator/separator/roformer/README.md for new parameters\n\n### Manual Testing\n- [x] T047 Manual test with existing BSRoformer models from the integration test suite\n- [x] T048 Manual test with existing MelBandRoformer models from the integration test suite\n- [x] T049 Manual test with newer models containing new parameters (if available)\n- [x] T050 Manual validation of error messages for common configuration issues\n\n## Dependencies\n\n### Critical Dependencies (TDD)\n- Tests (T004-T015) MUST complete and FAIL before implementation (T016-T033)\n- T001-T003 (setup) before everything else\n- T016-T020 (data models) before T021-T024 (validators)\n- T025-T028 (model updates) before T031-T033 (loaders)\n- T029-T030 (normalization fixes) before T031-T033 (loaders)\n\n### Implementation Dependencies\n- T016 (ModelConfiguration) blocks T017, T018, T021-T024\n- T020 (ParameterValidationError) blocks T021-T024\n- T025-T030 (model updates) before T031-T033 (loading system)\n- T031-T033 (loading system) before T034-T038 (integration)\n- Implementation (T016-T038) before polish (T039-T050)\n\n## Parallel Execution Examples\n\n### Phase 3.2 - Contract and Integration Tests (can run simultaneously)\n```\nTask: \"Contract test RoformerLoaderInterface.load_model in tests/contract/test_roformer_loader_interface.py\"\nTask: \"Contract test ParameterValidatorInterface.validate_required_parameters in tests/contract/test_parameter_validator_interface.py\"\nTask: \"Integration
test existing older model compatibility in tests/integration/test_roformer_backward_compatibility.py\"\nTask: \"Integration test newer model with new parameters in tests/integration/test_roformer_new_parameters.py\"\n```\n\n### Phase 3.3 - Data Models (can run simultaneously)\n```\nTask: \"Implement ModelConfiguration dataclass in audio_separator/separator/roformer/model_configuration.py\"\nTask: \"Implement BSRoformerConfig class in audio_separator/separator/roformer/bs_roformer_config.py\"\nTask: \"Implement MelBandRoformerConfig class in audio_separator/separator/roformer/mel_band_roformer_config.py\"\nTask: \"Implement ParameterValidationError exception in audio_separator/separator/roformer/parameter_validation_error.py\"\n```\n\n### Phase 3.5 - Unit Tests (can run simultaneously)\n```\nTask: \"Unit tests for ModelConfiguration validation in tests/unit/test_model_configuration.py\"\nTask: \"Unit tests for ParameterValidator methods in tests/unit/test_parameter_validator.py\"\nTask: \"Unit tests for ConfigurationNormalizer methods in tests/unit/test_configuration_normalizer.py\"\nTask: \"Unit tests for fallback mechanism logic in tests/unit/test_fallback_loader.py\"\n```\n\n## Notes\n\n- [P] tasks = different files, no dependencies, can run in parallel\n- All tests must be written first and must fail before implementation (TDD)\n- Commit after each task completion\n- Manual testing (T047-T050) requires actual model files\n- Integration with existing test framework ensures no regression\n- Fallback mechanism provides safety net for edge cases\n\n## Validation Checklist\n\n- [x] All contracts have corresponding tests (T004-T008)\n- [x] All entities have model tasks (T016-T020)\n- [x] All tests come before implementation (Phase 3.2 before 3.3)\n- [x] Parallel tasks are truly independent (different files)\n- [x] Each task specifies exact file path\n- [x] No task modifies same file as another [P] task\n- [x] TDD ordering enforced (tests fail before implementation)\n- 
[x] Integration scenarios from quickstart covered (T009-T013)\n- [x] Audio quality validation included (T014-T015, T043)\n- [x] Manual testing milestones defined (T047-T050)\n"
  },
  {
    "path": "specs/main/plan.md",
    "content": "\n# Implementation Plan: [FEATURE]\n\n**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]\n**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`\n\n## Execution Flow (/plan command scope)\n```\n1. Load feature spec from Input path\n   → If not found: ERROR \"No feature spec at {path}\"\n2. Fill Technical Context (scan for NEEDS CLARIFICATION)\n   → Detect Project Type from context (web=frontend+backend, mobile=app+api)\n   → Set Structure Decision based on project type\n3. Fill the Constitution Check section based on the content of the constitution document.\n4. Evaluate Constitution Check section below\n   → If violations exist: Document in Complexity Tracking\n   → If no justification possible: ERROR \"Simplify approach first\"\n   → Update Progress Tracking: Initial Constitution Check\n5. Execute Phase 0 → research.md\n   → If NEEDS CLARIFICATION remain: ERROR \"Resolve unknowns\"\n6. Execute Phase 1 → contracts, data-model.md, quickstart.md, agent-specific template file (e.g., `CLAUDE.md` for Claude Code, `.github/copilot-instructions.md` for GitHub Copilot, `GEMINI.md` for Gemini CLI, `QWEN.md` for Qwen Code, or `AGENTS.md` for opencode).\n7. Re-evaluate Constitution Check section\n   → If new violations: Refactor design, return to Phase 1\n   → Update Progress Tracking: Post-Design Constitution Check\n8. Plan Phase 2 → Describe task generation approach (DO NOT create tasks.md)\n9. STOP - Ready for /tasks command\n```\n\n**IMPORTANT**: The /plan command STOPS at step 9, after describing the Phase 2 approach in step 8.
Phases 2-4 are executed by other commands:\n- Phase 2: /tasks command creates tasks.md\n- Phase 3-4: Implementation execution (manual or via tools)\n\n## Summary\n[Extract from feature spec: primary requirement + technical approach from research]\n\n## Technical Context\n**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]  \n**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]  \n**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]  \n**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]  \n**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]\n**Project Type**: [single/web/mobile - determines source structure]  \n**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]  \n**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]  \n**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]\n\n## Constitution Check\n*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*\n\n### I. Library-First Architecture\n- [ ] Core functionality implemented in `Separator` class or similar library pattern\n- [ ] CLI/Remote API are thin wrappers, not containing business logic\n- [ ] Clear separation between model architectures (MDX, VR, Demucs, MDXC)\n\n### II. Multi-Interface Consistency  \n- [ ] Feature accessible via Python API, CLI, and Remote API (if applicable)\n- [ ] Parameter names identical across all interfaces\n- [ ] Same model architectures supported across interfaces\n\n### III. Test-First Development (NON-NEGOTIABLE)\n- [ ] Tests written before implementation\n- [ ] Unit tests for all core functionality\n- [ ] Integration tests with audio validation (SSIM comparison)\n- [ ] CLI tests for all exposed functionality\n\n### IV. 
Performance & Resource Efficiency\n- [ ] Hardware acceleration support considered (CUDA, CoreML, DirectML)\n- [ ] Memory optimization for large files (streaming/batch processing)\n- [ ] Tunable parameters for different hardware capabilities\n\n### V. Model Architecture Separation\n- [ ] Each architecture in separate modules\n- [ ] Inherits from `CommonSeparator` pattern\n- [ ] Architecture-specific parameters isolated\n- [ ] Loading one architecture doesn't load others\n\n## Project Structure\n\n### Documentation (this feature)\n```\nspecs/[###-feature]/\n├── plan.md              # This file (/plan command output)\n├── research.md          # Phase 0 output (/plan command)\n├── data-model.md        # Phase 1 output (/plan command)\n├── quickstart.md        # Phase 1 output (/plan command)\n├── contracts/           # Phase 1 output (/plan command)\n└── tasks.md             # Phase 2 output (/tasks command - NOT created by /plan)\n```\n\n### Source Code (repository root)\n```\n# Option 1: Single project (DEFAULT)\nsrc/\n├── models/\n├── services/\n├── cli/\n└── lib/\n\ntests/\n├── contract/\n├── integration/\n└── unit/\n\n# Option 2: Web application (when \"frontend\" + \"backend\" detected)\nbackend/\n├── src/\n│   ├── models/\n│   ├── services/\n│   └── api/\n└── tests/\n\nfrontend/\n├── src/\n│   ├── components/\n│   ├── pages/\n│   └── services/\n└── tests/\n\n# Option 3: Mobile + API (when \"iOS/Android\" detected)\napi/\n└── [same as backend above]\n\nios/ or android/\n└── [platform-specific structure]\n```\n\n**Structure Decision**: [DEFAULT to Option 1 unless Technical Context indicates web/mobile app]\n\n## Phase 0: Outline & Research\n1. **Extract unknowns from Technical Context** above:\n   - For each NEEDS CLARIFICATION → research task\n   - For each dependency → best practices task\n   - For each integration → patterns task\n\n2. 
**Generate and dispatch research agents**:\n   ```\n   For each unknown in Technical Context:\n     Task: \"Research {unknown} for {feature context}\"\n   For each technology choice:\n     Task: \"Find best practices for {tech} in {domain}\"\n   ```\n\n3. **Consolidate findings** in `research.md` using format:\n   - Decision: [what was chosen]\n   - Rationale: [why chosen]\n   - Alternatives considered: [what else evaluated]\n\n**Output**: research.md with all NEEDS CLARIFICATION resolved\n\n## Phase 1: Design & Contracts\n*Prerequisites: research.md complete*\n\n1. **Extract entities from feature spec** → `data-model.md`:\n   - Entity name, fields, relationships\n   - Validation rules from requirements\n   - State transitions if applicable\n\n2. **Generate API contracts** from functional requirements:\n   - For each user action → endpoint\n   - Use standard REST/GraphQL patterns\n   - Output OpenAPI/GraphQL schema to `/contracts/`\n\n3. **Generate contract tests** from contracts:\n   - One test file per endpoint\n   - Assert request/response schemas\n   - Tests must fail (no implementation yet)\n\n4. **Extract test scenarios** from user stories:\n   - Each story → integration test scenario\n   - Quickstart test = story validation steps\n\n5. **Update agent file incrementally** (O(1) operation):\n   - Run `.specify/scripts/bash/update-agent-context.sh cursor`\n     **IMPORTANT**: Execute it exactly as specified above. 
Do not add or remove any arguments.\n   - If exists: Add only NEW tech from current plan\n   - Preserve manual additions between markers\n   - Update recent changes (keep last 3)\n   - Keep under 150 lines for token efficiency\n   - Output to repository root\n\n**Output**: data-model.md, /contracts/*, failing tests, quickstart.md, agent-specific file\n\n## Phase 2: Task Planning Approach\n*This section describes what the /tasks command will do - DO NOT execute during /plan*\n\n**Task Generation Strategy**:\n- Load `.specify/templates/tasks-template.md` as base\n- Generate tasks from Phase 1 design docs (contracts, data model, quickstart)\n- Each contract → contract test task [P]\n- Each entity → model creation task [P] \n- Each user story → integration test task\n- Implementation tasks to make tests pass\n\n**Ordering Strategy**:\n- TDD order: Tests before implementation \n- Dependency order: Models before services before UI\n- Mark [P] for parallel execution (independent files)\n\n**Estimated Output**: 25-30 numbered, ordered tasks in tasks.md\n\n**IMPORTANT**: This phase is executed by the /tasks command, NOT by /plan\n\n## Phase 3+: Future Implementation\n*These phases are beyond the scope of the /plan command*\n\n**Phase 3**: Task execution (/tasks command creates tasks.md)  \n**Phase 4**: Implementation (execute tasks.md following constitutional principles)  \n**Phase 5**: Validation (run tests, execute quickstart.md, performance validation)\n\n## Complexity Tracking\n*Fill ONLY if Constitution Check has violations that must be justified*\n\n| Violation | Why Needed | Simpler Alternative Rejected Because |\n|-----------|------------|-------------------------------------|\n| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |\n| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |\n\n\n## Progress Tracking\n*This checklist is updated during execution flow*\n\n**Phase Status**:\n- [ ] Phase 0: Research 
complete (/plan command)\n- [ ] Phase 1: Design complete (/plan command)\n- [ ] Phase 2: Task planning complete (/plan command - describe approach only)\n- [ ] Phase 3: Tasks generated (/tasks command)\n- [ ] Phase 4: Implementation complete\n- [ ] Phase 5: Validation passed\n\n**Gate Status**:\n- [ ] Initial Constitution Check: PASS\n- [ ] Post-Design Constitution Check: PASS\n- [ ] All NEEDS CLARIFICATION resolved\n- [ ] Complexity deviations documented\n\n---\n*Based on Constitution v1.0.0 - See `.specify/memory/constitution.md`*\n"
  },
  {
    "path": "tests/README.md",
    "content": "# Audio Separator Tests\n\nThis directory contains tests for the audio-separator project.\n\n## Test Structure\n\n### Unit Tests (`tests/unit/`)\n- **Core functionality tests**: `test_cli.py`, `test_stft.py`\n- **Remote API tests**: `test_remote_api_client.py`, `test_remote_cli.py`\n\n### Integration Tests (`tests/integration/`)\n- **CLI integration tests**: `test_cli_integration.py`\n- **Audio output validation**: `test_separator_output_integration.py`\n- **Remote API integration tests**: `test_remote_api_integration.py`\n\n## Remote API Tests\n\nThe project includes comprehensive tests for the remote API functionality:\n\n- **61 tests** covering all remote API components\n- **Unit tests** with mocked HTTP responses\n- **Integration tests** with a mock HTTP server\n- **End-to-end workflow tests**\n\nKey features tested:\n- Multiple model support (upload once, process with multiple models)\n- Full parameter compatibility with local CLI\n- Asynchronous processing with job polling\n- Error handling and edge cases\n- File upload/download workflows\n\nSee `tests/remote/README.md` for detailed documentation on remote API tests.\n\n## Audio Validation\n\nThe integration tests now include validation of output audio files by comparing waveform and spectrogram images with reference images. This helps ensure that the audio separation results remain consistent across different runs and code changes.\n\n### How It Works\n\n1. Reference waveform and spectrogram images are generated from expected output files\n2. During test execution, the same images are generated for the actual output files\n3. The images are compared using Structural Similarity Index (SSIM) to measure perceptual similarity\n4. 
If the images differ significantly, the test fails, indicating a change in the audio output\n\n### Image Comparison with SSIM\n\nThe tests use the Structural Similarity Index Measure (SSIM) to compare images, which is more robust than pixel-by-pixel comparison:\n\n- SSIM considers structural information in the images\n- It's more resilient to small spatial shifts or offsets\n- It better matches human perception of image similarity\n- It works well across different environments (local machines vs CI servers)\n\nThe comparison uses a minimum similarity threshold (0.0-1.0):\n- **Higher values** (closer to 1.0) require images to be **more similar**\n- **Lower values** (closer to 0.0) are **more permissive**\n- A value of 0.99 requires an SSIM score of at least 0.99 between the images\n- A value of 0.0 would accept almost any pair of images as a match\n\nThe default threshold is 0.999, which is quite strict. However, model-specific thresholds can be set to accommodate different models' behavior.\n\n#### Model-Specific Thresholds\n\nSome models inherently produce slightly different outputs on different runs, even with the same input.
To accommodate these models, you can set model-specific thresholds in the `MODEL_SIMILARITY_THRESHOLDS` dictionary:\n\n```python\nMODEL_SIMILARITY_THRESHOLDS = {\n    \"htdemucs_6s.yaml\": 0.990,  # Demucs models need a lower threshold\n    # Add other models that need custom thresholds here\n}\n```\n\nThis allows you to maintain a high threshold for most models while being more flexible with models that naturally exhibit more variation.\n\n### Generating Reference Images\n\nTo generate or update the reference images, use the script provided:\n\n```bash\npython tests/integration/generate_reference_images.py\n```\n\nThis script will create waveform and spectrogram images for all expected output files and store them in the `tests/inputs/reference` directory.\n\n### Skipping Validation\n\nIf you need to skip the audio validation (e.g., when you're intentionally changing the output), you can set the environment variable `SKIP_AUDIO_VALIDATION=1`:\n\n```bash\nSKIP_AUDIO_VALIDATION=1 pytest tests/integration/test_cli_integration.py\n```\n\n### Adding New Models\n\nWhen adding a new model to the tests:\n\n1. Add the model and its expected output files to the `MODEL_PARAMS` list in `test_cli_integration.py`\n2. Run the integration test to generate the output files\n3. Run the `generate_reference_images.py` script to create the reference images\n4. Run the tests again to validate the output files\n5. 
If necessary, add a custom similarity threshold for the new model in `MODEL_SIMILARITY_THRESHOLDS`\n\n### Debugging\n\nTo see detailed validation results, run pytest with the `-sv` flag:\n\n```bash\npytest tests/integration/test_cli_integration.py -sv\n```\n\nThis will show the similarity scores for each comparison and whether they passed or failed.\n\n## Running Tests\n\nTo run all tests:\n\n```bash\npytest\n```\n\nTo run specific tests:\n\n```bash\npytest tests/unit/\npytest tests/integration/\n```\n\nTo run remote API tests:\n\n```bash\n# All remote API tests (unit + integration)\npytest tests/unit/test_remote*.py tests/integration/test_remote*.py -v\n\n# Only unit tests (fast, no HTTP server)\npytest tests/unit/test_remote*.py -v\n\n# Only integration tests (with mock HTTP server)\npytest tests/integration/test_remote*.py -v\n```\n\nTo run with coverage:\n\n```bash\npytest --cov=audio_separator\n\n# Coverage for remote API specifically\npytest tests/unit/test_remote*.py tests/integration/test_remote*.py --cov=audio_separator.remote --cov-report=html\n``` "
  },
  {
    "path": "tests/TODO.txt",
    "content": "- Test running CLI with minimal input file for each major supported model (e.g. at least 1 from each architecture) outputs expected files\n- Test running CLI with pre-warmed cache directory containing model files does not repeat download\n- Test running CLI with corrupt model files in cache directory throws expected error\n- Test loading the separation class rather than using the CLI works as expected with all major supported models\n- Test processing multiple files in a row outputs separate expected files\n- Test processing file with multiple different models outputs separate expected files\n- Test that each of the architecture-specific parameters works as expected in both CLI and class mode\n- Generate oscillogram and spectrogram of model output for a short test file for each major supported model and compare to expected output to ensure separation is actually separating stems\n- Add a few different test files with different properties (e.g. background noise, stems present, or genre of music), and ensure separation works as expected for each\n- Test that processing more than one distinct input file in sequence outputs separate files (not overwriting the first output with the second)\n- Test that every model returned by list_models actually works\n- Test each model architecture at least once using each supported Python version, to ensure we actually do legitimately support all of them!\n- Test that loading a model of one architecture does NOT load the code from any other architecture\n"
  },
  {
    "path": "tests/contract/test_parameter_validator_interface.py",
    "content": "\"\"\"\nContract tests for ParameterValidatorInterface.\nThese tests verify the parameter validation interface contracts.\n\"\"\"\n\nimport pytest\nfrom unittest.mock import Mock\nfrom typing import Dict, Any, List\n\n# Import interfaces from contracts\nimport sys\nimport os\n\n# Find project root dynamically\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\n# Go up until we find the project root (contains specs/ directory)\nwhile project_root and not os.path.exists(os.path.join(project_root, 'specs')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\ncontracts_path = os.path.join(project_root, 'specs', '001-update-roformer-implementation', 'contracts')\nif os.path.exists(contracts_path):\n    sys.path.append(contracts_path)\n\nfrom parameter_validator_interface import (\n    ParameterValidatorInterface,\n    BSRoformerValidatorInterface,\n    MelBandRoformerValidatorInterface,\n    ConfigurationNormalizerInterface,\n    ValidationIssue,\n    ValidationSeverity\n)\n\n\nclass TestParameterValidatorInterface:\n    \"\"\"Test the ParameterValidator interface contract.\"\"\"\n    \n    def test_validate_required_parameters_contract(self):\n        \"\"\"Test validate_required_parameters interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock valid configuration - should return no issues\n        validator.validate_required_parameters.return_value = []\n        \n        config = {\"dim\": 512, \"depth\": 6}\n        issues = validator.validate_required_parameters(config, \"bs_roformer\")\n        \n        assert isinstance(issues, list)\n        assert len(issues) == 0  # Valid config should have no issues\n        \n        # Mock invalid configuration - missing required parameters\n        missing_param_issue = ValidationIssue(\n            
severity=ValidationSeverity.ERROR,\n            parameter_name=\"dim\",\n            message=\"Required parameter 'dim' is missing\",\n            suggested_fix=\"Add 'dim' parameter with positive integer value\",\n            current_value=None,\n            expected_value=\"positive integer\"\n        )\n        validator.validate_required_parameters.return_value = [missing_param_issue]\n        \n        incomplete_config = {\"depth\": 6}  # Missing dim\n        issues = validator.validate_required_parameters(incomplete_config, \"bs_roformer\")\n        \n        assert isinstance(issues, list)\n        assert len(issues) > 0\n        assert all(isinstance(issue, ValidationIssue) for issue in issues)\n        assert issues[0].severity == ValidationSeverity.ERROR\n        assert issues[0].parameter_name == \"dim\"\n    \n    def test_validate_parameter_types_contract(self):\n        \"\"\"Test validate_parameter_types interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock type validation error\n        type_error_issue = ValidationIssue(\n            severity=ValidationSeverity.ERROR,\n            parameter_name=\"dim\",\n            message=\"Parameter 'dim' must be an integer\",\n            suggested_fix=\"Change 'dim' value to an integer\",\n            current_value=\"512\",  # String instead of int\n            expected_value=\"integer\"\n        )\n        validator.validate_parameter_types.return_value = [type_error_issue]\n        \n        config = {\"dim\": \"512\", \"depth\": 6}  # dim should be int, not string\n        issues = validator.validate_parameter_types(config)\n        \n        assert isinstance(issues, list)\n        assert len(issues) > 0\n        assert issues[0].severity == ValidationSeverity.ERROR\n        assert issues[0].parameter_name == \"dim\"\n        assert issues[0].current_value == \"512\"\n    \n    def test_validate_parameter_ranges_contract(self):\n        \"\"\"Test 
validate_parameter_ranges interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock range validation error\n        range_error_issue = ValidationIssue(\n            severity=ValidationSeverity.ERROR,\n            parameter_name=\"attn_dropout\",\n            message=\"Parameter 'attn_dropout' must be between 0.0 and 1.0\",\n            suggested_fix=\"Set 'attn_dropout' to a value between 0.0 and 1.0\",\n            current_value=1.5,\n            expected_value=\"0.0 <= value <= 1.0\"\n        )\n        validator.validate_parameter_ranges.return_value = [range_error_issue]\n        \n        config = {\"dim\": 512, \"depth\": 6, \"attn_dropout\": 1.5}  # Invalid range\n        issues = validator.validate_parameter_ranges(config)\n        \n        assert isinstance(issues, list)\n        assert len(issues) > 0\n        assert issues[0].severity == ValidationSeverity.ERROR\n        assert issues[0].parameter_name == \"attn_dropout\"\n        assert issues[0].current_value == 1.5\n    \n    def test_validate_parameter_compatibility_contract(self):\n        \"\"\"Test validate_parameter_compatibility interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock compatibility validation warning\n        compatibility_issue = ValidationIssue(\n            severity=ValidationSeverity.WARNING,\n            parameter_name=\"sage_attention\",\n            message=\"sage_attention=True may conflict with flash_attn=True\",\n            suggested_fix=\"Consider using only one attention mechanism\",\n            current_value=True,\n            expected_value=\"False when flash_attn=True\"\n        )\n        validator.validate_parameter_compatibility.return_value = [compatibility_issue]\n        \n        config = {\"sage_attention\": True, \"flash_attn\": True}\n        issues = validator.validate_parameter_compatibility(config)\n        \n        assert 
isinstance(issues, list)\n        assert len(issues) > 0\n        assert issues[0].severity == ValidationSeverity.WARNING\n        assert \"conflict\" in issues[0].message.lower()\n    \n    def test_validate_normalization_config_contract(self):\n        \"\"\"Test validate_normalization_config interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock valid normalization config\n        validator.validate_normalization_config.return_value = []\n        \n        issues = validator.validate_normalization_config(\"layer_norm\")\n        assert isinstance(issues, list)\n        assert len(issues) == 0\n        \n        # Mock invalid normalization config\n        norm_error_issue = ValidationIssue(\n            severity=ValidationSeverity.ERROR,\n            parameter_name=\"norm\",\n            message=\"Unknown normalization type 'invalid_norm'\",\n            suggested_fix=\"Use one of: 'layer_norm', 'batch_norm', 'rms_norm', or None\",\n            current_value=\"invalid_norm\",\n            expected_value=\"valid normalization type\"\n        )\n        validator.validate_normalization_config.return_value = [norm_error_issue]\n        \n        issues = validator.validate_normalization_config(\"invalid_norm\")\n        assert isinstance(issues, list)\n        assert len(issues) > 0\n        assert issues[0].parameter_name == \"norm\"\n    \n    def test_get_parameter_defaults_contract(self):\n        \"\"\"Test get_parameter_defaults interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock default parameters\n        default_params = {\n            \"mlp_expansion_factor\": 4,\n            \"sage_attention\": False,\n            \"zero_dc\": True,\n            \"use_torch_checkpoint\": False,\n            \"skip_connection\": False\n        }\n        validator.get_parameter_defaults.return_value = default_params\n        \n        defaults = 
validator.get_parameter_defaults(\"bs_roformer\")\n        \n        assert isinstance(defaults, dict)\n        assert \"mlp_expansion_factor\" in defaults\n        assert defaults[\"mlp_expansion_factor\"] == 4\n        assert isinstance(defaults[\"sage_attention\"], bool)\n    \n    def test_apply_parameter_defaults_contract(self):\n        \"\"\"Test apply_parameter_defaults interface contract.\"\"\"\n        validator = Mock(spec=ParameterValidatorInterface)\n        \n        # Mock config with defaults applied\n        input_config = {\"dim\": 512, \"depth\": 6}\n        expected_config = {\n            \"dim\": 512,\n            \"depth\": 6,\n            \"mlp_expansion_factor\": 4,\n            \"sage_attention\": False,\n            \"zero_dc\": True\n        }\n        validator.apply_parameter_defaults.return_value = expected_config\n        \n        result_config = validator.apply_parameter_defaults(input_config, \"bs_roformer\")\n        \n        assert isinstance(result_config, dict)\n        assert \"dim\" in result_config\n        assert \"mlp_expansion_factor\" in result_config\n        assert result_config[\"dim\"] == 512\n        assert result_config[\"mlp_expansion_factor\"] == 4\n\n\nclass TestBSRoformerValidatorInterface:\n    \"\"\"Test the BSRoformer-specific validator interface contract.\"\"\"\n    \n    def test_validate_freqs_per_bands_contract(self):\n        \"\"\"Test validate_freqs_per_bands interface contract.\"\"\"\n        validator = Mock(spec=BSRoformerValidatorInterface)\n        \n        # Mock validation error for mismatched frequency bands\n        freq_error_issue = ValidationIssue(\n            severity=ValidationSeverity.ERROR,\n            parameter_name=\"freqs_per_bands\",\n            message=\"Sum of freqs_per_bands (126) does not match expected frequency bins (129)\",\n            suggested_fix=\"Adjust freqs_per_bands to sum to 129\",\n            current_value=(2, 4, 8, 16, 32, 64),  # sums to 126\n            
expected_value=\"tuple summing to 129\"\n        )\n        validator.validate_freqs_per_bands.return_value = [freq_error_issue]\n        \n        freqs_per_bands = (2, 4, 8, 16, 32, 64)\n        stft_config = {\"n_fft\": 256}\n        issues = validator.validate_freqs_per_bands(freqs_per_bands, stft_config)\n        \n        assert isinstance(issues, list)\n        assert len(issues) > 0\n        assert issues[0].parameter_name == \"freqs_per_bands\"\n        assert \"sum\" in issues[0].message.lower()\n    \n    def test_calculate_expected_freqs_contract(self):\n        \"\"\"Test calculate_expected_freqs interface contract.\"\"\"\n        validator = Mock(spec=BSRoformerValidatorInterface)\n        \n        # Mock expected frequency calculation\n        validator.calculate_expected_freqs.return_value = 129  # n_fft//2 + 1 for n_fft=256\n        \n        expected_freqs = validator.calculate_expected_freqs(256)\n        \n        assert isinstance(expected_freqs, int)\n        assert expected_freqs > 0\n\n\nclass TestMelBandRoformerValidatorInterface:\n    \"\"\"Test the MelBandRoformer-specific validator interface contract.\"\"\"\n    \n    def test_validate_num_bands_contract(self):\n        \"\"\"Test validate_num_bands interface contract.\"\"\"\n        validator = Mock(spec=MelBandRoformerValidatorInterface)\n        \n        # Mock successful validation\n        validator.validate_num_bands.return_value = []\n        \n        issues = validator.validate_num_bands(64, 44100)\n        assert isinstance(issues, list)\n        assert len(issues) == 0\n        \n        # Mock validation error for invalid band count\n        band_error_issue = ValidationIssue(\n            severity=ValidationSeverity.ERROR,\n            parameter_name=\"num_bands\",\n            message=\"num_bands must be positive integer\",\n            suggested_fix=\"Set num_bands to a positive integer value\",\n            current_value=0,\n            expected_value=\"positive 
integer\"\n        )\n        validator.validate_num_bands.return_value = [band_error_issue]\n        \n        issues = validator.validate_num_bands(0, 44100)\n        assert len(issues) > 0\n        assert issues[0].parameter_name == \"num_bands\"\n    \n    def test_validate_sample_rate_contract(self):\n        \"\"\"Test validate_sample_rate interface contract.\"\"\"\n        validator = Mock(spec=MelBandRoformerValidatorInterface)\n        \n        # Mock sample rate validation\n        validator.validate_sample_rate.return_value = []\n        \n        issues = validator.validate_sample_rate(44100)\n        assert isinstance(issues, list)\n        assert len(issues) == 0\n\n\nclass TestConfigurationNormalizerInterface:\n    \"\"\"Test the ConfigurationNormalizer interface contract.\"\"\"\n    \n    def test_normalize_config_format_contract(self):\n        \"\"\"Test normalize_config_format interface contract.\"\"\"\n        normalizer = Mock(spec=ConfigurationNormalizerInterface)\n        \n        # Mock normalization from object to dict\n        normalized_config = {\"dim\": 512, \"depth\": 6}\n        normalizer.normalize_config_format.return_value = normalized_config\n        \n        # Test with mock object input\n        raw_config = Mock()\n        result = normalizer.normalize_config_format(raw_config)\n        \n        assert isinstance(result, dict)\n        assert \"dim\" in result\n        assert \"depth\" in result\n    \n    def test_map_legacy_parameters_contract(self):\n        \"\"\"Test map_legacy_parameters interface contract.\"\"\"\n        normalizer = Mock(spec=ConfigurationNormalizerInterface)\n        \n        # Mock legacy parameter mapping\n        legacy_config = {\"old_param_name\": 512}\n        mapped_config = {\"dim\": 512}  # old_param_name -> dim\n        normalizer.map_legacy_parameters.return_value = mapped_config\n        \n        result = normalizer.map_legacy_parameters(legacy_config)\n        \n        assert 
isinstance(result, dict)\n        assert \"dim\" in result\n        assert \"old_param_name\" not in result\n    \n    def test_extract_nested_config_contract(self):\n        \"\"\"Test extract_nested_config interface contract.\"\"\"\n        normalizer = Mock(spec=ConfigurationNormalizerInterface)\n        \n        # Mock nested config extraction\n        normalizer.extract_nested_config.return_value = \"layer_norm\"\n        \n        config = {\"model\": {\"norm\": \"layer_norm\"}}\n        result = normalizer.extract_nested_config(config, \"model.norm\")\n        \n        assert result == \"layer_norm\"\n        \n        # Mock missing nested config\n        normalizer.extract_nested_config.return_value = None\n        \n        result = normalizer.extract_nested_config(config, \"model.missing_field\")\n        assert result is None\n\n\n# TDD placeholder test removed - implementation is now complete\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/contract/test_roformer_loader_interface.py",
    "content": "\"\"\"\nContract tests for RoformerLoaderInterface.\nThese tests verify the interface contracts defined in the specification.\n\"\"\"\n\nimport pytest\nimport tempfile\nimport os\nfrom unittest.mock import Mock, patch\nfrom typing import Dict, Any\n\n# Import interfaces from contracts\nimport sys\n\n# Find project root dynamically\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\n# Go up until we find the project root (contains specs/ directory)\nwhile project_root and not os.path.exists(os.path.join(project_root, 'specs')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\ncontracts_path = os.path.join(project_root, 'specs', '001-update-roformer-implementation', 'contracts')\nif os.path.exists(contracts_path):\n    sys.path.append(contracts_path)\n\nfrom roformer_loader_interface import (\n    RoformerLoaderInterface, \n    ModelLoadingResult, \n    ModelConfiguration, \n    RoformerType, \n    ImplementationVersion,\n    ParameterValidationError\n)\n\n\nclass TestRoformerLoaderInterface:\n    \"\"\"Test the RoformerLoader interface contract.\"\"\"\n    \n    def test_load_model_success_contract(self):\n        \"\"\"Test that load_model returns proper ModelLoadingResult on success.\"\"\"\n        # Mock implementation for contract testing\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Configure mock to return expected result structure\n        expected_result = ModelLoadingResult(\n            success=True,\n            model=Mock(),  # Mock model instance\n            error_message=None,\n            implementation_used=ImplementationVersion.NEW,\n            warnings=[]\n        )\n        loader.load_model.return_value = expected_result\n        \n        # Test the contract\n        result = 
loader.load_model(\"/path/to/model.ckpt\")\n        \n        # Verify contract compliance\n        assert isinstance(result, ModelLoadingResult)\n        assert result.success is True\n        assert result.model is not None\n        assert result.error_message is None\n        assert isinstance(result.implementation_used, ImplementationVersion)\n        assert isinstance(result.warnings, list)\n    \n    def test_load_model_failure_contract(self):\n        \"\"\"Test that load_model returns proper error result on failure.\"\"\"\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Configure mock to return error result\n        expected_result = ModelLoadingResult(\n            success=False,\n            model=None,\n            error_message=\"Model file not found\",\n            implementation_used=ImplementationVersion.NEW,\n            warnings=[\"Warning: fallback attempted\"]\n        )\n        loader.load_model.return_value = expected_result\n        \n        result = loader.load_model(\"/nonexistent/model.ckpt\")\n        \n        # Verify error contract compliance\n        assert isinstance(result, ModelLoadingResult)\n        assert result.success is False\n        assert result.model is None\n        assert result.error_message is not None\n        assert isinstance(result.error_message, str)\n        assert isinstance(result.warnings, list)\n    \n    def test_load_model_parameter_validation_error(self):\n        \"\"\"Test that load_model raises ParameterValidationError for invalid config.\"\"\"\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Configure mock to raise validation error\n        validation_error = ParameterValidationError(\n            parameter_name=\"dim\",\n            expected_type=\"int\",\n            actual_value=\"invalid\",\n            suggested_fix=\"Provide an integer value for dim parameter\"\n        )\n        loader.load_model.side_effect = validation_error\n        \n      
  with pytest.raises(ParameterValidationError) as exc_info:\n            loader.load_model(\"/path/to/model.ckpt\", config={\"dim\": \"invalid\"})\n        \n        error = exc_info.value\n        assert error.parameter_name == \"dim\"\n        assert error.expected_type == \"int\"\n        assert error.actual_value == \"invalid\"\n        assert \"Provide an integer value\" in error.suggested_fix\n    \n    def test_validate_configuration_contract(self):\n        \"\"\"Test validate_configuration interface contract.\"\"\"\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Mock valid configuration\n        config = ModelConfiguration(dim=512, depth=6)\n        loader.validate_configuration.return_value = []  # No errors\n        \n        errors = loader.validate_configuration(config, RoformerType.BS_ROFORMER)\n        \n        assert isinstance(errors, list)\n        assert len(errors) == 0  # Valid config should return empty list\n        \n        # Mock invalid configuration\n        loader.validate_configuration.return_value = [\"Missing required parameter: freqs_per_bands\"]\n        \n        errors = loader.validate_configuration(config, RoformerType.BS_ROFORMER)\n        assert isinstance(errors, list)\n        assert len(errors) > 0\n        assert all(isinstance(error, str) for error in errors)\n    \n    def test_detect_model_type_contract(self):\n        \"\"\"Test detect_model_type interface contract.\"\"\"\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Mock successful type detection\n        loader.detect_model_type.return_value = RoformerType.BS_ROFORMER\n        \n        model_type = loader.detect_model_type(\"/path/to/bs_roformer.ckpt\")\n        \n        assert isinstance(model_type, RoformerType)\n        assert model_type in [RoformerType.BS_ROFORMER, RoformerType.MEL_BAND_ROFORMER]\n        \n        # Mock type detection failure\n        loader.detect_model_type.side_effect = 
ValueError(\"Cannot determine model type\")\n        \n        with pytest.raises(ValueError) as exc_info:\n            loader.detect_model_type(\"/path/to/unknown_model.ckpt\")\n        \n        assert \"Cannot determine model type\" in str(exc_info.value)\n    \n    def test_get_default_configuration_contract(self):\n        \"\"\"Test get_default_configuration interface contract.\"\"\"\n        loader = Mock(spec=RoformerLoaderInterface)\n        \n        # Mock default configuration\n        default_config = ModelConfiguration(\n            dim=512, \n            depth=6,\n            freqs_per_bands=(2, 4, 8, 16, 32, 64)\n        )\n        loader.get_default_configuration.return_value = default_config\n        \n        config = loader.get_default_configuration(RoformerType.BS_ROFORMER)\n        \n        assert isinstance(config, ModelConfiguration)\n        assert config.dim > 0\n        assert config.depth > 0\n\n\n# TDD placeholder test removed - implementation is now complete\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/README.md",
    "content": "# Integration Tests\n\nThese tests verify the end-to-end functionality of the audio-separator CLI.\n\n## Test Files\n\n### 16-bit Audio Tests\n- Input: `tests/inputs/mardy20s.flac` (16-bit)\n- Tests: `test_cli_integration.py`\n- Validates separation with various models using 16-bit audio\n\n### 24-bit Audio Tests\n- Input: `tests/inputs/fallen24bit20s.flac` (24-bit)\n- Tests: `test_24bit_preservation.py`\n- Validates that 24-bit input produces 24-bit output\n- Ensures audio quality is not degraded while the bit depth is preserved\n\n## Running the tests\n\nTo run all integration tests:\n\n```bash\npytest tests/integration\n```\n\nTo run only 16-bit tests:\n\n```bash\npytest tests/integration/test_cli_integration.py\n```\n\nTo run only 24-bit preservation tests:\n\n```bash\npytest tests/integration/test_24bit_preservation.py\n```\n\nTo run a specific model test, use pytest's parameter selection (quote the node ID so shells such as zsh don't treat the brackets as a glob pattern):\n\n```bash\n# Run only the kuielab_b_vocals.onnx test (16-bit)\npytest 'tests/integration/test_cli_integration.py::test_model_separation[kuielab_b_vocals.onnx-expected_files0]'\n\n# Run only the BS-Roformer test with 24-bit audio\npytest 'tests/integration/test_24bit_preservation.py::test_24bit_model_separation[model_bs_roformer_ep_317_sdr_12.9755.ckpt-expected_files0]'\n```\n\n## Adding New Model Tests\n\n### For 16-bit tests\nAdd a new entry to the `MODEL_PARAMS` list in `test_cli_integration.py`:\n\n```python\n(\n    \"new_model_filename.onnx\",\n    [\"mardy20s_(Instrumental)_new_model_filename.flac\", \"mardy20s_(Vocals)_new_model_filename.flac\"]\n),\n```\n\n### For 24-bit tests\nAdd a new entry to the `MODEL_PARAMS_24BIT` list in `test_24bit_preservation.py`:\n\n```python\n(\n    \"new_model_filename.onnx\",\n    [\"fallen24bit20s_(Instrumental)_new_model_filename.flac\", \"fallen24bit20s_(Vocals)_new_model_filename.flac\"]\n),\n```\n\n## Generating Reference Images\n\n### For 16-bit tests\n```bash\n
python tests/integration/generate_reference_images.py\n```\n\n### For 24-bit tests\n1. First, generate the output files by running audio-separator:\n   ```bash\n   audio-separator -m model_name.ckpt tests/inputs/fallen24bit20s.flac\n   ```\n2. Move the output files to `tests/inputs/`\n3. Generate reference images:\n   ```bash\n   python tests/integration/generate_reference_images_24bit.py\n   ```\n\n## Notes\n\n- These tests use real audio files and models and run the full audio separation process.\n- They take longer to run than unit tests because they perform actual audio processing.\n- Model files are downloaded automatically if they don't exist locally.\n- The 16-bit tests require the test audio file `tests/inputs/mardy20s.flac`.\n- The 24-bit tests require the test audio file `tests/inputs/fallen24bit20s.flac`.\n- Reference images must be generated before running the tests for the first time."
  },
  {
    "path": "tests/integration/generate_multi_stem_references.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nGenerate reference stems for multi-stem integration tests.\n\nRuns best-in-class models on test clips to produce reference stems that\nother models' outputs can be compared against.\n\nUsage:\n    python tests/integration/generate_multi_stem_references.py\n\nPipelines:\n    1. All 4 clips → bs_roformer_vocals_resurrection_unwa → vocals + instrumental refs\n    2. levee_drums, clocks_piano → htdemucs_ft → drums/bass/other/vocals refs\n    3. levee_drums → htdemucs_ft drums stem → MDX23C-DrumSep → kit part refs\n    4. levee_drums, clocks_piano → karaoke model → karaoke vocal/instrumental refs\n    5. sing_sing_sing_brass → 17_HP-Wind_Inst-UVR → woodwind refs\n    6. only_time_reverb → resurrection vocals → dereverb → noreverb/reverb refs\n\"\"\"\n\nimport os\nimport re\nimport sys\nimport shutil\nimport tempfile\n\n# Make the project root importable (this file lives in tests/integration/)\nsys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))\n\nfrom audio_separator.separator import Separator\n\nINPUTS_DIR = \"tests/inputs\"\nREFERENCE_DIR = \"tests/inputs/reference\"\n\nCLIPS = {\n    \"levee_drums\": f\"{INPUTS_DIR}/levee_drums.flac\",\n    \"clocks_piano\": f\"{INPUTS_DIR}/clocks_piano.flac\",\n    \"sing_sing_sing_brass\": f\"{INPUTS_DIR}/sing_sing_sing_brass.flac\",\n    \"only_time_reverb\": f\"{INPUTS_DIR}/only_time_reverb.flac\",\n}\n\n\ndef run_model(model, input_file, output_dir):\n    \"\"\"Run a model and return output file paths.\"\"\"\n    sep = Separator(output_dir=output_dir, output_format=\"FLAC\")\n    sep.load_model(model)\n    return sep.separate(input_file)\n\n\ndef find_stem_file(output_files, stem_name, output_dir):\n    \"\"\"Find output file matching stem name (uses last parenthesized group for pipeline support; returns None if no file matches).\"\"\"\n    for f in output_files:\n        full = f if os.path.isabs(f) else os.path.join(output_dir, f)\n        if not os.path.exists(full):\n            full = os.path.join(output_dir, os.path.basename(f))\n        matches = 
re.findall(r'_\\(([^)]+)\\)', os.path.basename(f))\n        if matches and matches[-1].lower() == stem_name.lower():\n            return full\n    return None\n\n\ndef main():\n    os.makedirs(REFERENCE_DIR, exist_ok=True)\n    temp_dir = tempfile.mkdtemp(prefix=\"multi-stem-refs-\")\n\n    try:\n        # 1. Vocal/Instrumental on all clips\n        print(\"=== Vocal/Instrumental references (resurrection) ===\")\n        model = \"bs_roformer_vocals_resurrection_unwa.ckpt\"\n        for clip_name, clip_path in CLIPS.items():\n            print(f\"  {clip_name}...\")\n            outputs = run_model(model, clip_path, temp_dir)\n            vocals = find_stem_file(outputs, \"vocals\", temp_dir)\n            inst = find_stem_file(outputs, \"other\", temp_dir)\n            shutil.copy2(vocals, f\"{REFERENCE_DIR}/ref_{clip_name}_vocals.flac\")\n            shutil.copy2(inst, f\"{REFERENCE_DIR}/ref_{clip_name}_instrumental.flac\")\n\n        # 2. htdemucs_ft 4-stem on levee + clocks\n        print(\"\\n=== 4-stem references (htdemucs_ft) ===\")\n        model = \"htdemucs_ft.yaml\"\n        for clip_name in [\"levee_drums\", \"clocks_piano\"]:\n            print(f\"  {clip_name}...\")\n            outputs = run_model(model, CLIPS[clip_name], temp_dir)\n            for stem in [\"Vocals\", \"Drums\", \"Bass\", \"Other\"]:\n                stem_file = find_stem_file(outputs, stem, temp_dir)\n                shutil.copy2(stem_file, f\"{REFERENCE_DIR}/ref_{clip_name}_{stem.lower()}_htdemucs_ft.flac\")\n\n        # 3. 
DrumSep pipeline: levee drums stem → kit parts\n        print(\"\\n=== DrumSep pipeline references ===\")\n        drums_stem = find_stem_file(\n            run_model(\"htdemucs_ft.yaml\", CLIPS[\"levee_drums\"], temp_dir),\n            \"Drums\", temp_dir\n        )\n        outputs = run_model(\"MDX23C-DrumSep-aufr33-jarredou.ckpt\", drums_stem, temp_dir)\n        for stem in [\"kick\", \"snare\", \"toms\", \"hh\", \"ride\", \"crash\"]:\n            stem_file = find_stem_file(outputs, stem, temp_dir)\n            shutil.copy2(stem_file, f\"{REFERENCE_DIR}/ref_levee_drums_{stem}_drumsep.flac\")\n            print(f\"  {stem}: done\")\n\n        # 4. Karaoke on levee + clocks\n        print(\"\\n=== Karaoke references ===\")\n        model = \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\"\n        for clip_name in [\"levee_drums\", \"clocks_piano\"]:\n            print(f\"  {clip_name}...\")\n            outputs = run_model(model, CLIPS[clip_name], temp_dir)\n            vocals = find_stem_file(outputs, \"Vocals\", temp_dir)\n            inst = find_stem_file(outputs, \"Instrumental\", temp_dir)\n            shutil.copy2(vocals, f\"{REFERENCE_DIR}/ref_{clip_name}_vocals_karaoke.flac\")\n            shutil.copy2(inst, f\"{REFERENCE_DIR}/ref_{clip_name}_instrumental_karaoke.flac\")\n\n        # 5. Wind/Brass\n        print(\"\\n=== Wind instrument references ===\")\n        outputs = run_model(\"17_HP-Wind_Inst-UVR.pth\", CLIPS[\"sing_sing_sing_brass\"], temp_dir)\n        ww = find_stem_file(outputs, \"Woodwinds\", temp_dir)\n        no_ww = find_stem_file(outputs, \"No Woodwinds\", temp_dir)\n        shutil.copy2(ww, f\"{REFERENCE_DIR}/ref_sing_sing_sing_brass_woodwinds.flac\")\n        shutil.copy2(no_ww, f\"{REFERENCE_DIR}/ref_sing_sing_sing_brass_no_woodwinds.flac\")\n\n        # 6. 
Dereverb pipeline: only_time vocals → dereverb\n        print(\"\\n=== Dereverb pipeline references ===\")\n        vocals_file = f\"{REFERENCE_DIR}/ref_only_time_reverb_vocals.flac\"  # Already generated in step 1\n        outputs = run_model(\"dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt\", vocals_file, temp_dir)\n        noreverb = find_stem_file(outputs, \"noreverb\", temp_dir)\n        reverb = find_stem_file(outputs, \"reverb\", temp_dir)\n        shutil.copy2(noreverb, f\"{REFERENCE_DIR}/ref_only_time_reverb_vocals_noreverb.flac\")\n        shutil.copy2(reverb, f\"{REFERENCE_DIR}/ref_only_time_reverb_vocals_reverb.flac\")\n\n        print(f\"\\nDone! Generated {len([f for f in os.listdir(REFERENCE_DIR) if f.startswith('ref_')])} reference stems.\")\n\n    finally:\n        shutil.rmtree(temp_dir, ignore_errors=True)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "tests/integration/generate_reference_images.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScript to generate reference waveform and spectrogram images for model outputs.\nThis script should be run whenever the expected output files change.\n\"\"\"\n\nimport os\nimport sys\nfrom pathlib import Path\n\n# Add the project root to sys.path so the `tests` package is importable\nsys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))\n\nfrom tests.utils import generate_reference_images\nfrom tests.integration.test_cli_integration import MODEL_PARAMS\n\ndef main():\n    \"\"\"Generate reference images for all expected model outputs.\"\"\"\n    print(\"Generating reference images for expected model outputs...\")\n    \n    # Get the input file path\n    inputs_dir = Path(__file__).resolve().parent.parent / \"inputs\"\n    input_file = inputs_dir / \"mardy20s.flac\"\n    \n    # Create reference directory if it doesn't exist\n    reference_dir = inputs_dir / \"reference\"\n    os.makedirs(reference_dir, exist_ok=True)\n    \n    # First, generate reference images for the input file\n    print(f\"Generating reference images for input file: {input_file}\")\n    generate_reference_images(str(input_file), str(reference_dir), prefix=\"expected_\")\n    \n    # Then, generate reference images for each expected output file\n    for model, expected_files in MODEL_PARAMS:\n        for output_file in expected_files:\n            file_path = os.path.join(str(inputs_dir), output_file)\n            \n            # Check if the file exists\n            if os.path.exists(file_path):\n                print(f\"Generating reference images for output file: {output_file}\")\n                generate_reference_images(file_path, str(reference_dir), prefix=\"expected_\")\n            else:\n                print(f\"Warning: Output file does not exist: {file_path}\")\n                print(\"You may need to run the CLI command first to generate the output files.\")\n    \n    print(\"Done generating reference images.\")\n\nif __name__ == 
\"__main__\":\n    main() "
  },
  {
    "path": "tests/integration/generate_reference_images_24bit.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScript to generate reference waveform and spectrogram images for 24-bit model outputs.\nThis script should be run after creating the expected output files with 24-bit audio.\n\nUsage:\n    1. First run audio-separator on the 24-bit input file with each model:\n       audio-separator -m model_bs_roformer_ep_317_sdr_12.9755.ckpt tests/inputs/fallen24bit20s.flac\n       audio-separator -m mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt tests/inputs/fallen24bit20s.flac\n       audio-separator -m MGM_MAIN_v4.pth tests/inputs/fallen24bit20s.flac\n    \n    2. Move the output files to tests/inputs/\n    \n    3. Run this script to generate reference images:\n       python tests/integration/generate_reference_images_24bit.py\n\"\"\"\n\nimport os\nimport sys\nfrom pathlib import Path\n\n# Add the parent directory to sys.path\nsys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))\n\nfrom tests.utils import generate_reference_images\nfrom tests.integration.test_24bit_preservation import MODEL_PARAMS_24BIT\n\n\ndef main():\n    \"\"\"Generate reference images for all expected 24-bit model outputs.\"\"\"\n    print(\"=\"*60)\n    print(\"Generating Reference Images for 24-bit Model Outputs\")\n    print(\"=\"*60)\n    \n    # Get the input file path\n    inputs_dir = Path(__file__).resolve().parent.parent / \"inputs\"\n    input_file = inputs_dir / \"fallen24bit20s.flac\"\n    \n    # Create reference directory if it doesn't exist\n    reference_dir = inputs_dir / \"reference\"\n    os.makedirs(reference_dir, exist_ok=True)\n    \n    # First, generate reference images for the 24-bit input file\n    if input_file.exists():\n        print(f\"\\n✓ Generating reference images for input file: {input_file}\")\n        generate_reference_images(str(input_file), str(reference_dir), prefix=\"expected_\")\n        print(f\"  Created: expected_fallen24bit20s_waveform.png\")\n        print(f\"  Created: 
expected_fallen24bit20s_spectrogram.png\")\n    else:\n        print(f\"\\n✗ Error: Input file does not exist: {input_file}\")\n        print(f\"  Please create the 24-bit test file first.\")\n        return 1\n    \n    # Then, generate reference images for each expected output file\n    print(f\"\\nGenerating reference images for model outputs...\")\n    missing_files = []\n    created_files = []\n    \n    for model, expected_files in MODEL_PARAMS_24BIT:\n        print(f\"\\n{'-'*60}\")\n        print(f\"Model: {model}\")\n        print(f\"{'-'*60}\")\n        \n        for output_file in expected_files:\n            file_path = os.path.join(str(inputs_dir), output_file)\n            \n            # Check if the file exists\n            if os.path.exists(file_path):\n                print(f\"✓ Processing: {output_file}\")\n                waveform_path, spectrogram_path = generate_reference_images(\n                    file_path, str(reference_dir), prefix=\"expected_\"\n                )\n                output_basename = os.path.splitext(output_file)[0]\n                print(f\"  Created: expected_{output_basename}_waveform.png\")\n                print(f\"  Created: expected_{output_basename}_spectrogram.png\")\n                created_files.extend([waveform_path, spectrogram_path])\n            else:\n                print(f\"✗ Missing: {output_file}\")\n                missing_files.append((model, output_file))\n    \n    # Summary\n    print(f\"\\n{'='*60}\")\n    print(\"Summary\")\n    print(f\"{'='*60}\")\n    print(f\"✓ Created {len(created_files)} reference images\")\n    \n    if missing_files:\n        print(f\"\\n⚠  Warning: {len(missing_files)} output files were not found:\")\n        print(f\"\\nTo generate missing files, run these commands:\")\n        print()\n        # Print each model's command only once, in first-seen order\n        for model in dict.fromkeys(m for m, _ in missing_files):\n            print(f\"  audio-separator -m {model} tests/inputs/fallen24bit20s.flac\")\n        print(f\"\\nThen move the output files to tests/inputs/ and run this script again.\")\n        print()\n    else:\n        print(f\"\\n✅ All reference images generated successfully!\")\n        print(f\"\\nYou can now run the 24-bit preservation tests:\")\n        print(f\"  pytest tests/integration/test_24bit_preservation.py -v\")\n    \n    print(f\"{'='*60}\\n\")\n    return 0 if not missing_files else 1\n\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n\n"
  },
  {
    "path": "tests/integration/generate_reference_images_ensemble.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScript to generate reference waveform and spectrogram images for ensemble preset outputs.\n\nUsage:\n    1. Run all ensemble presets to generate output files:\n       python -c \"\n       from audio_separator.separator import Separator\n       presets = ['instrumental_clean', 'instrumental_full', 'instrumental_balanced',\n                  'instrumental_low_resource', 'vocal_balanced', 'vocal_clean',\n                  'vocal_full', 'vocal_rvc', 'karaoke']\n       for p in presets:\n           sep = Separator(output_dir='tests/inputs/ensemble_outputs', output_format='FLAC', ensemble_preset=p)\n           sep.load_model()\n           sep.separate('tests/inputs/mardy20s.flac')\n       \"\n\n    2. Run this script to generate reference images from the outputs:\n       python tests/integration/generate_reference_images_ensemble.py\n\"\"\"\n\nimport os\nimport sys\nfrom pathlib import Path\n\nsys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))\n\nfrom tests.utils import generate_reference_images\nfrom tests.integration.test_ensemble_integration import ENSEMBLE_PRESET_PARAMS\n\n\ndef main():\n    \"\"\"Generate reference images for all expected ensemble preset outputs.\"\"\"\n    inputs_dir = Path(__file__).resolve().parent.parent / \"inputs\"\n    reference_dir = inputs_dir / \"reference\"\n    os.makedirs(reference_dir, exist_ok=True)\n\n    # Check if output files exist — they need to be generated first\n    # Look in a few common locations\n    search_dirs = [\n        \"/tmp/ensemble-all-presets\",\n        str(inputs_dir / \"ensemble_outputs\"),\n    ]\n\n    output_dir = None\n    for d in search_dirs:\n        if os.path.isdir(d):\n            output_dir = d\n            break\n\n    if output_dir is None:\n        print(\"ERROR: No ensemble output files found.\")\n        print(\"First run the ensemble presets to generate output files.\")\n        print(\"Output files should be in one of:\")\n    
    for d in search_dirs:\n            print(f\"  {d}\")\n        sys.exit(1)\n\n    print(f\"Using output files from: {output_dir}\")\n    print(f\"Generating reference images in: {reference_dir}\\n\")\n\n    generated = 0\n    missing = 0\n\n    for preset, expected_files in ENSEMBLE_PRESET_PARAMS:\n        print(f\"Preset: {preset}\")\n        for output_filename in expected_files:\n            file_path = os.path.join(output_dir, output_filename)\n\n            if os.path.exists(file_path):\n                print(f\"  Generating images for: {output_filename}\")\n                generate_reference_images(file_path, str(reference_dir), prefix=\"expected_\")\n                generated += 2  # waveform + spectrogram\n            else:\n                print(f\"  WARNING: Missing output file: {file_path}\")\n                missing += 1\n\n    print(f\"\\nDone! Generated {generated} reference images.\")\n    if missing > 0:\n        print(f\"WARNING: {missing} output files were missing.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "tests/integration/requirements.txt",
    "content": "matplotlib\nscikit-image\n\n"
  },
  {
    "path": "tests/integration/test_24bit_preservation.py",
    "content": "\"\"\"\nIntegration tests for 24-bit audio bit depth preservation.\n\nTests that 24-bit input audio files produce 24-bit output audio files\nwhile maintaining audio quality through the separation process.\n\"\"\"\n\nimport os\nimport subprocess\nimport pytest\nimport soundfile as sf\nfrom pathlib import Path\nimport sys\n\n# Add parent directory to path\nsys.path.append(os.path.dirname(os.path.dirname(__file__)))\nfrom utils import generate_reference_images, compare_images\n\n\n@pytest.fixture(name=\"input_file_24bit\")\ndef fixture_input_file_24bit():\n    \"\"\"Fixture providing the 24-bit test input audio file path.\"\"\"\n    return \"tests/inputs/fallen24bit20s.flac\"\n\n\n@pytest.fixture(name=\"reference_dir\")\ndef fixture_reference_dir():\n    \"\"\"Fixture providing the reference images directory path.\"\"\"\n    return \"tests/inputs/reference\"\n\n\n@pytest.fixture(name=\"cleanup_output_files\")\ndef fixture_cleanup_output_files():\n    \"\"\"Fixture that collects output file paths and deletes them after the test.\"\"\"\n    output_files = []\n    yield output_files\n    \n    # Clean up output files after test\n    for file in output_files:\n        if os.path.exists(file):\n            print(f\"Cleaning up test output file: {file}\")\n            os.remove(file)\n\n\ndef get_audio_bit_depth(file_path):\n    \"\"\"Get the bit depth of an audio file, or None if the subtype is not recognized.\"\"\"\n    info = sf.info(file_path)\n    subtype = info.subtype\n    \n    if subtype in ('PCM_S8', 'PCM_U8'):\n        return 8\n    elif 'PCM_16' in subtype:\n        return 16\n    elif 'PCM_24' in subtype:\n        return 24\n    elif 'PCM_32' in subtype or 'FLOAT' in subtype:\n        return 32\n    elif 'DOUBLE' in subtype:\n        return 64\n    else:\n        return None\n\n\ndef run_separation_test_24bit(model, audio_path, expected_files):\n    \"\"\"Helper function to run a separation test with a 24-bit input file.\"\"\"\n    # Clean up any existing output files before the test\n    for file in expected_files:\n        if os.path.exists(file):\n            
print(f\"Deleting existing test output file {file}\")\n            os.remove(file)\n    \n    # Verify input is 24-bit\n    input_bit_depth = get_audio_bit_depth(audio_path)\n    print(f\"Input file bit depth: {input_bit_depth}-bit\")\n    assert input_bit_depth == 24, f\"Input file should be 24-bit, got {input_bit_depth}-bit\"\n    \n    # Run the CLI command\n    result = subprocess.run(\n        [\"audio-separator\", \"-m\", model, audio_path],\n        capture_output=True,\n        text=True,\n        check=False\n    )\n    \n    # Check that the command completed successfully\n    assert result.returncode == 0, f\"Command failed with output: {result.stderr}\"\n    \n    # Check that the output files were created and are 24-bit\n    for file in expected_files:\n        assert os.path.exists(file), f\"Output file {file} was not created\"\n        assert os.path.getsize(file) > 0, f\"Output file {file} is empty\"\n        \n        # Verify output is also 24-bit\n        output_bit_depth = get_audio_bit_depth(file)\n        print(f\"Output file {file} bit depth: {output_bit_depth}-bit\")\n        assert output_bit_depth == 24, f\"Output file should be 24-bit to match input, got {output_bit_depth}-bit\"\n    \n    return result\n\n\ndef validate_audio_output(output_file, reference_dir, waveform_threshold=0.999, spectrogram_threshold=None):\n    \"\"\"Validate an audio output file by comparing its waveform and spectrogram with reference images.\"\"\"\n    if spectrogram_threshold is None:\n        spectrogram_threshold = waveform_threshold\n    \n    # Create temporary directory for generated images\n    temp_dir = os.path.join(os.path.dirname(output_file), \"temp_images\")\n    os.makedirs(temp_dir, exist_ok=True)\n    \n    # Generate waveform and spectrogram images for the output file\n    output_filename = os.path.basename(output_file)\n    name_without_ext = os.path.splitext(output_filename)[0]\n    \n    # Generate actual images\n    actual_waveform_path, 
actual_spectrogram_path = generate_reference_images(\n        output_file, temp_dir, prefix=\"actual_\"\n    )\n    \n    # Path to expected reference images\n    expected_waveform_path = os.path.join(\n        reference_dir, f\"expected_{name_without_ext}_waveform.png\"\n    )\n    expected_spectrogram_path = os.path.join(\n        reference_dir, f\"expected_{name_without_ext}_spectrogram.png\"\n    )\n    \n    # Check if reference images exist\n    if not os.path.exists(expected_waveform_path) or not os.path.exists(expected_spectrogram_path):\n        print(f\"Warning: Reference images not found for {output_file}\")\n        print(f\"Expected: {expected_waveform_path} and {expected_spectrogram_path}\")\n        print(f\"Run generate_reference_images_24bit.py to create them\")\n        return False, False\n    \n    # Compare waveform images\n    waveform_similarity, waveform_match = compare_images(\n        expected_waveform_path, actual_waveform_path,\n        min_similarity_threshold=waveform_threshold\n    )\n    \n    # Compare spectrogram images\n    spectrogram_similarity, spectrogram_match = compare_images(\n        expected_spectrogram_path, actual_spectrogram_path,\n        min_similarity_threshold=spectrogram_threshold\n    )\n    \n    print(f\"Validation results for {output_file}:\")\n    print(f\"  Waveform similarity: {waveform_similarity:.4f} \"\n          f\"(match: {waveform_match}, threshold: {waveform_threshold:.2f})\")\n    print(f\"  Spectrogram similarity: {spectrogram_similarity:.4f} \"\n          f\"(match: {spectrogram_match}, threshold: {spectrogram_threshold:.2f})\")\n    \n    return waveform_match, spectrogram_match\n\n\n# Default similarity threshold for 24-bit audio tests\nDEFAULT_SIMILARITY_THRESHOLDS_24BIT = (0.90, 0.80)  # (waveform, spectrogram)\n\n# Model-specific similarity thresholds for 24-bit tests\nMODEL_SIMILARITY_THRESHOLDS_24BIT = {\n    # Format: (waveform_threshold, spectrogram_threshold)\n}\n\n\n# Parameterized test 
for multiple models with 24-bit audio\nMODEL_PARAMS_24BIT = [\n    # (model_filename, expected_output_filenames)\n    (\n        \"model_bs_roformer_ep_317_sdr_12.9755.ckpt\",\n        [\n            \"fallen24bit20s_(Instrumental)_model_bs_roformer_ep_317_sdr_12.flac\",\n            \"fallen24bit20s_(Vocals)_model_bs_roformer_ep_317_sdr_12.flac\",\n        ]\n    ),\n    (\n        \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\",\n        [\n            \"fallen24bit20s_(Instrumental)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\",\n            \"fallen24bit20s_(Vocals)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\",\n        ]\n    ),\n    (\n        \"MGM_MAIN_v4.pth\",\n        [\n            \"fallen24bit20s_(Instrumental)_MGM_MAIN_v4.flac\",\n            \"fallen24bit20s_(Vocals)_MGM_MAIN_v4.flac\",\n        ]\n    ),\n]\n\n\n@pytest.mark.parametrize(\"model,expected_files\", MODEL_PARAMS_24BIT)\ndef test_24bit_model_separation(model, expected_files, input_file_24bit, reference_dir, cleanup_output_files):\n    \"\"\"Test that 24-bit input audio produces 24-bit output audio with correct content.\"\"\"\n    print(f\"\\n{'='*60}\")\n    print(f\"Testing 24-bit preservation with model: {model}\")\n    print(f\"{'='*60}\")\n    \n    # Add files to the cleanup list\n    cleanup_output_files.extend(expected_files)\n    \n    # Run the test (includes bit depth validation)\n    run_separation_test_24bit(model, input_file_24bit, expected_files)\n    \n    # Validate the output audio quality\n    print(f\"\\nValidating output audio quality for model {model}...\")\n    \n    # Get model-specific similarity threshold or use default\n    threshold = MODEL_SIMILARITY_THRESHOLDS_24BIT.get(\n        model, DEFAULT_SIMILARITY_THRESHOLDS_24BIT\n    )\n    waveform_threshold, spectrogram_threshold = threshold\n    \n    print(f\"Using thresholds - waveform: {waveform_threshold}, \"\n          f\"spectrogram: {spectrogram_threshold}\")\n    \n    for 
output_file in expected_files:\n        # Skip validation if reference images are not required\n        if os.environ.get(\"SKIP_AUDIO_VALIDATION\") == \"1\":\n            print(f\"Skipping audio validation for {output_file} (SKIP_AUDIO_VALIDATION=1)\")\n            continue\n        \n        waveform_match, spectrogram_match = validate_audio_output(\n            output_file, reference_dir,\n            waveform_threshold=waveform_threshold,\n            spectrogram_threshold=spectrogram_threshold\n        )\n        \n        # Assert that the output matches the reference\n        assert waveform_match, f\"Waveform for {output_file} does not match the reference\"\n        assert spectrogram_match, f\"Spectrogram for {output_file} does not match the reference\"\n    \n    print(f\"✅ Test passed for {model}: 24-bit preservation and audio quality verified\")\n\n"
  },
  {
    "path": "tests/integration/test_cli_integration.py",
    "content": "import os\nimport subprocess\nimport sys\nfrom pathlib import Path\n\nimport pytest\nsys.path.append(os.path.dirname(os.path.dirname(__file__)))\nfrom utils import generate_reference_images, compare_images\n\n\n@pytest.fixture(name=\"input_file\")\ndef fixture_input_file():\n    \"\"\"Fixture providing the test input audio file path.\"\"\"\n    return \"tests/inputs/mardy20s.flac\"\n\n\n@pytest.fixture(name=\"reference_dir\")\ndef fixture_reference_dir():\n    \"\"\"Fixture providing the reference images directory path.\"\"\"\n    return \"tests/inputs/reference\"\n\n\n@pytest.fixture(name=\"cleanup_output_files\")\ndef fixture_cleanup_output_files():\n    \"\"\"Fixture that collects output file paths and deletes them after the test.\"\"\"\n    # This list will be populated by the test functions\n    output_files = []\n\n    # Yield to allow the test to run and add files to the list\n    yield output_files\n\n    # Clean up output files after test\n    for file in output_files:\n        if os.path.exists(file):\n            print(f\"Cleaning up test output file: {file}\")\n            os.remove(file)\n\n\ndef run_separation_test(model, audio_path, expected_files):\n    \"\"\"Helper function to run a separation test with a specific model.\"\"\"\n    # Clean up any existing output files before the test\n    for file in expected_files:\n        if os.path.exists(file):\n            print(f\"Deleting existing test output file {file}\")\n            os.remove(file)\n\n    # Run the CLI command\n    result = subprocess.run([\"audio-separator\", \"-m\", model, audio_path], capture_output=True, text=True, check=False)  # Explicitly set check to False as we handle errors manually\n\n    # Check that the command completed successfully\n    assert result.returncode == 0, f\"Command failed with output: {result.stderr}\"\n\n    # Check that the output files were created\n    for file in expected_files:\n        assert os.path.exists(file), f\"Output file {file} was not created\"\n    
    assert os.path.getsize(file) > 0, f\"Output file {file} is empty\"\n\n    return result\n\n\ndef validate_audio_output(output_file, reference_dir, waveform_threshold=0.999, spectrogram_threshold=None):\n    \"\"\"Validate an audio output file by comparing its waveform and spectrogram with reference images.\n\n    Args:\n        output_file: Path to the audio output file\n        reference_dir: Directory containing reference images\n        waveform_threshold: Minimum similarity required for waveform images (0.0-1.0)\n        spectrogram_threshold: Minimum similarity for spectrogram images (0.0-1.0), defaults to waveform_threshold if None\n\n    Returns:\n        Tuple of booleans: (waveform_match, spectrogram_match)\n    \"\"\"\n    # If spectrogram threshold not specified, use the same as waveform threshold\n    if spectrogram_threshold is None:\n        spectrogram_threshold = waveform_threshold\n\n    # Create temporary directory for generated images\n    temp_dir = os.path.join(os.path.dirname(output_file), \"temp_images\")\n    os.makedirs(temp_dir, exist_ok=True)\n\n    # Generate waveform and spectrogram images for the output file\n    output_filename = os.path.basename(output_file)\n    name_without_ext = os.path.splitext(output_filename)[0]\n\n    # Generate actual images\n    actual_waveform_path, actual_spectrogram_path = generate_reference_images(output_file, temp_dir, prefix=\"actual_\")\n\n    # Path to expected reference images\n    expected_waveform_path = os.path.join(reference_dir, f\"expected_{name_without_ext}_waveform.png\")\n    expected_spectrogram_path = os.path.join(reference_dir, f\"expected_{name_without_ext}_spectrogram.png\")\n\n    # Check if reference images exist\n    if not os.path.exists(expected_waveform_path) or not os.path.exists(expected_spectrogram_path):\n        print(f\"Warning: Reference images not found for {output_file}\")\n        print(f\"Expected: {expected_waveform_path} and {expected_spectrogram_path}\")\n       
 return False, False\n\n    # Compare waveform images\n    waveform_similarity, waveform_match = compare_images(expected_waveform_path, actual_waveform_path, min_similarity_threshold=waveform_threshold)\n\n    # Compare spectrogram images\n    spectrogram_similarity, spectrogram_match = compare_images(expected_spectrogram_path, actual_spectrogram_path, min_similarity_threshold=spectrogram_threshold)\n\n    print(f\"Validation results for {output_file}:\")\n    print(f\"  Waveform similarity: {waveform_similarity:.4f} (match: {waveform_match}, threshold: {waveform_threshold:.2f})\")\n    print(f\"  Spectrogram similarity: {spectrogram_similarity:.4f} (match: {spectrogram_match}, threshold: {spectrogram_threshold:.2f})\")\n\n    # Cleanup temp images (optional, uncomment if needed)\n    # os.remove(actual_waveform_path)\n    # os.remove(actual_spectrogram_path)\n\n    return waveform_match, spectrogram_match\n\n\n# Default similarity thresholds to use for most models\nDEFAULT_SIMILARITY_THRESHOLDS = (0.90, 0.80)  # (waveform_threshold, spectrogram_threshold)\n\n# Model-specific similarity thresholds\n# Use lower thresholds for models that show more variation between runs\nMODEL_SIMILARITY_THRESHOLDS = {\n    # Format: (waveform_threshold, spectrogram_threshold)\n    \"htdemucs_6s.yaml\": (0.90, 0.70)  # Demucs multi-stem output (e.g. 
\"Other\" and \"Piano\") is a lot more variable\n}\n\n\n# Parameterized test for multiple models\nMODEL_PARAMS = [\n    # (model_filename, expected_output_filenames)\n    (\"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\", [\n        \"mardy20s_(Instrumental)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\",\n        \"mardy20s_(Vocals)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\",\n    ]),\n    (\"kuielab_b_vocals.onnx\", [\"mardy20s_(Instrumental)_kuielab_b_vocals.flac\", \"mardy20s_(Vocals)_kuielab_b_vocals.flac\"]),\n    (\"MGM_MAIN_v4.pth\", [\"mardy20s_(Instrumental)_MGM_MAIN_v4.flac\", \"mardy20s_(Vocals)_MGM_MAIN_v4.flac\"]),\n    (\"UVR-MDX-NET-Inst_HQ_4.onnx\", [\"mardy20s_(Instrumental)_UVR-MDX-NET-Inst_HQ_4.flac\", \"mardy20s_(Vocals)_UVR-MDX-NET-Inst_HQ_4.flac\"]),\n    (\"2_HP-UVR.pth\", [\"mardy20s_(Instrumental)_2_HP-UVR.flac\", \"mardy20s_(Vocals)_2_HP-UVR.flac\"]),\n    (\n        \"htdemucs_6s.yaml\",\n        [\n            \"mardy20s_(Vocals)_htdemucs_6s.flac\",\n            \"mardy20s_(Drums)_htdemucs_6s.flac\",\n            \"mardy20s_(Bass)_htdemucs_6s.flac\",\n            \"mardy20s_(Other)_htdemucs_6s.flac\",\n            \"mardy20s_(Guitar)_htdemucs_6s.flac\",\n            \"mardy20s_(Piano)_htdemucs_6s.flac\",\n        ],\n    ),\n    (\"model_bs_roformer_ep_937_sdr_10.5309.ckpt\", [\"mardy20s_(Drum-Bass)_model_bs_roformer_ep_937_sdr_10.flac\", \"mardy20s_(No Drum-Bass)_model_bs_roformer_ep_937_sdr_10.flac\"]),\n    (\"model_bs_roformer_ep_317_sdr_12.9755.ckpt\", [\"mardy20s_(Instrumental)_model_bs_roformer_ep_317_sdr_12.flac\", \"mardy20s_(Vocals)_model_bs_roformer_ep_317_sdr_12.flac\"]),\n]\n\n\n@pytest.mark.parametrize(\"model,expected_files\", MODEL_PARAMS)\ndef test_model_separation(model, expected_files, input_file, reference_dir, cleanup_output_files):\n    \"\"\"Parameterized test for multiple model files.\"\"\"\n    # Add files to the cleanup list\n    cleanup_output_files.extend(expected_files)\n\n    # 
Run the test\n    run_separation_test(model, input_file, expected_files)\n\n    # Validate the output audio files\n    print(f\"\\nValidating output files for model {model}...\")\n\n    # Get model-specific similarity threshold or use default\n    threshold = MODEL_SIMILARITY_THRESHOLDS.get(model, DEFAULT_SIMILARITY_THRESHOLDS)\n\n    # Unpack thresholds - DEFAULT_SIMILARITY_THRESHOLDS is now always a tuple\n    waveform_threshold, spectrogram_threshold = threshold\n\n    print(f\"Using thresholds - waveform: {waveform_threshold}, spectrogram: {spectrogram_threshold} for model {model}\")\n\n    for output_file in expected_files:\n        # Skip validation if reference images are not required (set environment variable to skip)\n        if os.environ.get(\"SKIP_AUDIO_VALIDATION\") == \"1\":\n            print(f\"Skipping audio validation for {output_file} (SKIP_AUDIO_VALIDATION=1)\")\n            continue\n\n        waveform_match, spectrogram_match = validate_audio_output(output_file, reference_dir, waveform_threshold=waveform_threshold, spectrogram_threshold=spectrogram_threshold)\n\n        # Assert that the output matches the reference\n        assert waveform_match, f\"Waveform for {output_file} does not match the reference\"\n        assert spectrogram_match, f\"Spectrogram for {output_file} does not match the reference\"\n"
  },
  {
    "path": "tests/integration/test_ensemble_integration.py",
    "content": "\"\"\"\nIntegration tests for ensemble preset separations.\n\nTests that each ensemble preset produces correct output by:\n1. Running the preset on the test input audio\n2. Verifying output stems contain the expected content (vocal vs instrumental)\n3. Comparing output spectrograms against committed reference images\n\"\"\"\n\nimport os\nimport sys\nimport tempfile\nimport shutil\nimport pytest\n\nsys.path.append(os.path.dirname(os.path.dirname(__file__)))\nfrom utils import generate_reference_images, compare_images\nfrom utils_audio_verification import load_references, verify_separation_outputs\n\n\n@pytest.fixture(name=\"input_file\")\ndef fixture_input_file():\n    \"\"\"Fixture providing the test input audio file path.\"\"\"\n    return \"tests/inputs/mardy20s.flac\"\n\n\n@pytest.fixture(name=\"reference_dir\")\ndef fixture_reference_dir():\n    \"\"\"Fixture providing the reference images directory path.\"\"\"\n    return \"tests/inputs/reference\"\n\n\n@pytest.fixture(name=\"temp_output_dir\")\ndef fixture_temp_output_dir():\n    \"\"\"Fixture providing a temporary directory for output files.\"\"\"\n    temp_dir = tempfile.mkdtemp(prefix=\"ensemble-test-\")\n    yield temp_dir\n    shutil.rmtree(temp_dir, ignore_errors=True)\n\n\n@pytest.fixture(name=\"audio_references\")\ndef fixture_audio_references():\n    \"\"\"Fixture providing loaded audio references for content verification.\"\"\"\n    ref_vocal, ref_inst, ref_mix, min_len = load_references()\n    return ref_vocal, ref_inst, ref_mix, min_len\n\n\ndef validate_audio_output(output_file, reference_dir, waveform_threshold=0.90, spectrogram_threshold=0.80):\n    \"\"\"Validate an audio output file by comparing its waveform and spectrogram with reference images.\"\"\"\n    temp_dir = os.path.join(os.path.dirname(output_file), \"temp_images\")\n    os.makedirs(temp_dir, exist_ok=True)\n\n    output_filename = os.path.basename(output_file)\n    name_without_ext = 
os.path.splitext(output_filename)[0]\n\n    actual_waveform_path, actual_spectrogram_path = generate_reference_images(\n        output_file, temp_dir, prefix=\"actual_\"\n    )\n\n    expected_waveform_path = os.path.join(reference_dir, f\"expected_{name_without_ext}_waveform.png\")\n    expected_spectrogram_path = os.path.join(reference_dir, f\"expected_{name_without_ext}_spectrogram.png\")\n\n    if not os.path.exists(expected_waveform_path) or not os.path.exists(expected_spectrogram_path):\n        print(f\"Warning: Reference images not found for {output_file}\")\n        print(f\"Expected: {expected_waveform_path} and {expected_spectrogram_path}\")\n        print(\"Run generate_reference_images_ensemble.py to create them\")\n        return False, False\n\n    waveform_similarity, waveform_match = compare_images(\n        expected_waveform_path, actual_waveform_path,\n        min_similarity_threshold=waveform_threshold\n    )\n\n    spectrogram_similarity, spectrogram_match = compare_images(\n        expected_spectrogram_path, actual_spectrogram_path,\n        min_similarity_threshold=spectrogram_threshold\n    )\n\n    print(f\"  Waveform SSIM: {waveform_similarity:.4f} (threshold: {waveform_threshold:.2f}, match: {waveform_match})\")\n    print(f\"  Spectrogram SSIM: {spectrogram_similarity:.4f} (threshold: {spectrogram_threshold:.2f}, match: {spectrogram_match})\")\n\n    return waveform_match, spectrogram_match\n\n\n# Similarity thresholds — ensemble outputs can vary slightly across runs\nDEFAULT_THRESHOLDS = (0.90, 0.80)  # (waveform, spectrogram)\n\n# All 9 ensemble presets with their expected output stems\nENSEMBLE_PRESET_PARAMS = [\n    (\"instrumental_clean\", [\n        \"mardy20s_(Vocals)_preset_instrumental_clean.flac\",\n        \"mardy20s_(Instrumental)_preset_instrumental_clean.flac\",\n    ]),\n    (\"instrumental_full\", [\n        \"mardy20s_(Vocals)_preset_instrumental_full.flac\",\n        \"mardy20s_(Instrumental)_preset_instrumental_full.flac\",\n    ]),\n    
(\"instrumental_balanced\", [\n        \"mardy20s_(Vocals)_preset_instrumental_balanced.flac\",\n        \"mardy20s_(Instrumental)_preset_instrumental_balanced.flac\",\n    ]),\n    (\"instrumental_low_resource\", [\n        \"mardy20s_(Vocals)_preset_instrumental_low_resource.flac\",\n        \"mardy20s_(Instrumental)_preset_instrumental_low_resource.flac\",\n    ]),\n    (\"vocal_balanced\", [\n        \"mardy20s_(Vocals)_preset_vocal_balanced.flac\",\n        \"mardy20s_(Instrumental)_preset_vocal_balanced.flac\",\n    ]),\n    (\"vocal_clean\", [\n        \"mardy20s_(Vocals)_preset_vocal_clean.flac\",\n        \"mardy20s_(Instrumental)_preset_vocal_clean.flac\",\n    ]),\n    (\"vocal_full\", [\n        \"mardy20s_(Vocals)_preset_vocal_full.flac\",\n        \"mardy20s_(Instrumental)_preset_vocal_full.flac\",\n    ]),\n    (\"vocal_rvc\", [\n        \"mardy20s_(Vocals)_preset_vocal_rvc.flac\",\n        \"mardy20s_(Instrumental)_preset_vocal_rvc.flac\",\n    ]),\n    (\"karaoke\", [\n        \"mardy20s_(Vocals)_preset_karaoke.flac\",\n        \"mardy20s_(Instrumental)_preset_karaoke.flac\",\n    ]),\n]\n\n\n@pytest.mark.parametrize(\"preset,expected_files\", ENSEMBLE_PRESET_PARAMS)\ndef test_ensemble_preset(preset, expected_files, input_file, reference_dir, temp_output_dir, audio_references):\n    \"\"\"Test that an ensemble preset produces correctly labeled and spectrogram-matching output.\"\"\"\n    from audio_separator.separator import Separator\n\n    print(f\"\\n{'='*60}\")\n    print(f\"  Testing preset: {preset}\")\n    print(f\"{'='*60}\")\n\n    # Run separation\n    separator = Separator(\n        output_dir=temp_output_dir,\n        output_format=\"FLAC\",\n        ensemble_preset=preset,\n    )\n    separator.load_model()\n    output_files = separator.separate(input_file)\n\n    # Check expected files were created\n    output_basenames = [os.path.basename(f) for f in output_files]\n    for expected in expected_files:\n        assert expected in 
output_basenames, (\n            f\"Expected output '{expected}' not found. Got: {output_basenames}\"\n        )\n\n    # Check files exist and are non-empty\n    for output_file in output_files:\n        full_path = output_file if os.path.isabs(output_file) else os.path.join(temp_output_dir, output_file)\n        assert os.path.exists(full_path), f\"Output file does not exist: {full_path}\"\n        assert os.path.getsize(full_path) > 0, f\"Output file is empty: {full_path}\"\n\n    # Content verification — ensure stems contain what their labels claim\n    ref_vocal, ref_inst, ref_mix, min_len = audio_references\n    full_paths = [\n        f if os.path.isabs(f) else os.path.join(temp_output_dir, f)\n        for f in output_files\n    ]\n    verifications = verify_separation_outputs(full_paths, ref_vocal, ref_inst, ref_mix, min_len)\n\n    for v in verifications:\n        print(f\"  {v.label:<15} → {v.detected_content:<15} corr_v={v.corr_vocal:.3f} corr_i={v.corr_instrumental:.3f}\")\n        assert v.label_matches, (\n            f\"Stem '{v.label}' contains {v.detected_content} \"\n            f\"(corr_vocal={v.corr_vocal:.3f}, corr_inst={v.corr_instrumental:.3f})\"\n        )\n\n    # Spectrogram comparison against reference images\n    if os.environ.get(\"SKIP_AUDIO_VALIDATION\") == \"1\":\n        print(\"  Skipping spectrogram validation (SKIP_AUDIO_VALIDATION=1)\")\n        return\n\n    waveform_threshold, spectrogram_threshold = DEFAULT_THRESHOLDS\n\n    for output_file in output_files:\n        full_path = output_file if os.path.isabs(output_file) else os.path.join(temp_output_dir, output_file)\n        print(f\"\\n  Validating: {os.path.basename(output_file)}\")\n        waveform_match, spectrogram_match = validate_audio_output(\n            full_path, reference_dir,\n            waveform_threshold=waveform_threshold,\n            spectrogram_threshold=spectrogram_threshold,\n        )\n\n        assert waveform_match, f\"Waveform mismatch for 
{output_file}\"\n        assert spectrogram_match, f\"Spectrogram mismatch for {output_file}\"\n\n    print(f\"\\n  Preset '{preset}' passed all checks\")\n"
  },
  {
    "path": "tests/integration/test_ensemble_meaningful.py",
    "content": "\"\"\"\nIntegration tests verifying ensemble presets produce meaningful results.\n\nThese tests go beyond regression (does output match reference?) to verify\nthat ensembles and specialized models produce *semantically correct* output:\n\n- Vocal ensemble output should closely match the best single-model vocal output\n- Karaoke ensemble should extract only lead vocals (differ from standard vocal split)\n- Karaoke on extracted vocals should produce distinct lead and backing vocal stems\n\nUsage:\n    pytest tests/integration/test_ensemble_meaningful.py -v -s\n    pytest tests/integration/test_ensemble_meaningful.py -v -s -k \"karaoke\"\n    pytest tests/integration/test_ensemble_meaningful.py -v -s -k \"lead_backing\"\n\"\"\"\n\nimport os\nimport sys\nimport re\nimport tempfile\nimport shutil\nimport logging\nimport pytest\nimport numpy as np\nimport librosa\n\nREFERENCE_DIR = \"tests/inputs/reference\"\n\n\ndef correlate(file_a, file_b, sr=44100):\n    \"\"\"Compute Pearson correlation between two audio files (mono-mixed).\"\"\"\n    a, _ = librosa.load(file_a, sr=sr, mono=True)\n    b, _ = librosa.load(file_b, sr=sr, mono=True)\n    ml = min(len(a), len(b))\n    return float(np.corrcoef(a[:ml], b[:ml])[0, 1])\n\n\ndef rms(file_path, sr=44100):\n    \"\"\"Compute RMS energy of an audio file.\"\"\"\n    y, _ = librosa.load(file_path, sr=sr, mono=True)\n    return float(np.sqrt(np.mean(y ** 2)))\n\n\ndef run_separation(model, input_file, output_dir):\n    \"\"\"Run a single model separation.\"\"\"\n    from audio_separator.separator import Separator\n    sep = Separator(output_dir=output_dir, output_format=\"FLAC\", log_level=logging.WARNING)\n    sep.load_model(model)\n    return sep.separate(input_file)\n\n\ndef run_preset(preset, input_file, output_dir):\n    \"\"\"Run an ensemble preset separation.\"\"\"\n    from audio_separator.separator import Separator\n    sep = Separator(output_dir=output_dir, output_format=\"FLAC\", ensemble_preset=preset, 
log_level=logging.WARNING)\n    sep.load_model()\n    return sep.separate(input_file)\n\n\ndef find_stem(output_files, stem_name, output_dir):\n    \"\"\"Find the output file matching a stem name (case-insensitive).\n\n    Uses the last _(StemName) group so pipeline outputs are matched correctly.\n    \"\"\"\n    for f in output_files:\n        full = f if os.path.isabs(f) else os.path.join(output_dir, f)\n        if not os.path.exists(full):\n            full = os.path.join(output_dir, os.path.basename(f))\n        matches = re.findall(r'_\\(([^)]+)\\)', os.path.basename(f))\n        if matches and matches[-1].lower() == stem_name.lower():\n            return full\n    return None\n\n\n# ─── Vocal ensemble quality ─────────────────────────────────────────\n\nVOCAL_ENSEMBLE_CLIPS = [\n    (\"tests/inputs/under_pressure_harmonies.flac\", \"ref_under_pressure_harmonies\"),\n    (\"tests/inputs/levee_drums.flac\", \"ref_levee_drums\"),\n]\n\n\n@pytest.mark.parametrize(\"input_file,ref_prefix\", VOCAL_ENSEMBLE_CLIPS)\ndef test_vocal_ensemble_matches_best_single_model(input_file, ref_prefix, tmp_path):\n    \"\"\"Vocal ensemble output should closely match the best single vocal model.\n\n    The vocal_balanced preset (Resurrection + Beta 6X averaged) should produce\n    output highly correlated (>0.90) with the Resurrection single-model output,\n    confirming the ensemble isn't degrading quality.\n    \"\"\"\n    output_dir = str(tmp_path)\n    clip = os.path.basename(input_file)\n    print(f\"\\n  {clip}: vocal_balanced ensemble vs resurrection single model\")\n\n    # Run ensemble\n    ens_out = run_preset(\"vocal_balanced\", input_file, output_dir)\n    ens_vocals = find_stem(ens_out, \"Vocals\", output_dir)\n    ens_inst = find_stem(ens_out, \"Instrumental\", output_dir)\n    assert ens_vocals and ens_inst\n\n    # Compare against single-model reference\n    ref_vocals = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_vocals.flac\")\n    ref_inst = 
os.path.join(REFERENCE_DIR, f\"{ref_prefix}_instrumental.flac\")\n\n    corr_v = correlate(ens_vocals, ref_vocals)\n    corr_i = correlate(ens_inst, ref_inst)\n    print(f\"    Vocals correlation:       {corr_v:.3f} (should be > 0.90)\")\n    print(f\"    Instrumental correlation: {corr_i:.3f} (should be > 0.90)\")\n\n    # Ensemble should be very similar to best single model\n    assert corr_v > 0.90, f\"Ensemble vocals diverge too much from single model: {corr_v:.3f}\"\n    assert corr_i > 0.90, f\"Ensemble instrumental diverges too much from single model: {corr_i:.3f}\"\n\n    # Also verify it matches its own ensemble reference (regression)\n    ens_ref_vocals = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_vocals_preset_vocal_balanced.flac\")\n    ens_ref_inst = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_instrumental_preset_vocal_balanced.flac\")\n    if os.path.exists(ens_ref_vocals) and os.path.exists(ens_ref_inst):\n        corr_ens_v = correlate(ens_vocals, ens_ref_vocals)\n        corr_ens_i = correlate(ens_inst, ens_ref_inst)\n        print(f\"    Ensemble regression:      vocals={corr_ens_v:.3f}, inst={corr_ens_i:.3f}\")\n        assert corr_ens_v > 0.70, f\"Ensemble vocals regression failed: {corr_ens_v:.3f}\"\n        assert corr_ens_i > 0.70, f\"Ensemble instrumental regression failed: {corr_ens_i:.3f}\"\n\n\n# ─── Karaoke ensemble extracts only lead vocals ─────────────────────\n\ndef test_karaoke_ensemble_extracts_lead_only(tmp_path):\n    \"\"\"Karaoke ensemble should extract only lead vocals, not backing harmonies.\n\n    On Under Pressure (which has prominent backing harmonies), the karaoke\n    ensemble's vocal output should differ significantly from the standard\n    vocal model's output (which extracts all vocals including backing).\n    \"\"\"\n    input_file = \"tests/inputs/under_pressure_harmonies.flac\"\n    output_dir = str(tmp_path)\n    print(f\"\\n  Karaoke ensemble on Under Pressure (lead-only extraction)\")\n\n    # Run karaoke ensemble\n    ens_out = run_preset(\"karaoke\", input_file, output_dir)\n    ens_vocals = find_stem(ens_out, \"Vocals\", output_dir)\n   
 ens_inst = find_stem(ens_out, \"Instrumental\", output_dir)\n    assert ens_vocals and ens_inst\n\n    # Compare karaoke ensemble vocals vs standard all-vocals reference\n    ref_all_vocals = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_vocals.flac\")\n    corr_vs_all = correlate(ens_vocals, ref_all_vocals)\n    print(f\"    Karaoke ensemble vocals vs all-vocals: {corr_vs_all:.3f} (should be < 0.90)\")\n\n    # Karaoke should extract LESS vocal content than standard separation\n    assert corr_vs_all < 0.90, (\n        f\"Karaoke ensemble vocals too similar to standard vocals ({corr_vs_all:.3f}). \"\n        \"Expected karaoke to extract only lead vocals, leaving backing in instrumental.\"\n    )\n\n    # Compare karaoke ensemble vs single karaoke model reference (regression)\n    ref_kar_vocals = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_vocals_karaoke.flac\")\n    corr_vs_single_kar = correlate(ens_vocals, ref_kar_vocals)\n    print(f\"    Karaoke ensemble vs single karaoke:    {corr_vs_single_kar:.3f} (should be > 0.70)\")\n    assert corr_vs_single_kar > 0.70, f\"Karaoke ensemble diverges from single karaoke: {corr_vs_single_kar:.3f}\"\n\n    # Report karaoke vs standard instrumental RMS (informational only, not asserted).\n    # The karaoke instrumental keeps backing vocals, so it should carry more content.\n    ens_inst_rms = rms(ens_inst)\n    ref_std_inst = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_instrumental.flac\")\n    std_inst_rms = rms(ref_std_inst)\n    print(f\"    Karaoke inst RMS: {ens_inst_rms:.3f}, Standard inst RMS: {std_inst_rms:.3f}\")\n\n\n# ─── Lead/backing vocal split pipeline ──────────────────────────────\n\ndef test_karaoke_on_vocals_produces_lead_backing_split(tmp_path):\n    \"\"\"Running karaoke model on extracted vocals should split lead from backing.\n\n    Pipeline: mix → vocal model → vocals → karaoke model → lead + backing\n    The lead and backing outputs should both be non-silent and uncorrelated.\n    \"\"\"\n    output_dir = 
str(tmp_path)\n    print(f\"\\n  Pipeline: Under Pressure mix → vocals → karaoke → lead/backing\")\n\n    # Step 1: Extract vocals (use reference to avoid re-running)\n    vocals_ref = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_vocals.flac\")\n    assert os.path.exists(vocals_ref), \"Missing vocal reference — run generate_multi_stem_references.py\"\n\n    # Step 2: Run karaoke on the extracted vocals\n    print(f\"    Running karaoke on extracted vocals...\")\n    kar_out = run_separation(\n        \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\",\n        vocals_ref, output_dir,\n    )\n\n    lead_file = find_stem(kar_out, \"Vocals\", output_dir)\n    backing_file = find_stem(kar_out, \"Instrumental\", output_dir)\n    assert lead_file, \"No lead vocal stem (labeled 'Vocals') from karaoke\"\n    assert backing_file, \"No backing vocal stem (labeled 'Instrumental') from karaoke\"\n\n    # Both should be non-silent\n    lead_rms = rms(lead_file)\n    backing_rms = rms(backing_file)\n    print(f\"    Lead vocals RMS:    {lead_rms:.3f} (should be > 0.01)\")\n    print(f\"    Backing vocals RMS: {backing_rms:.3f} (should be > 0.01)\")\n    assert lead_rms > 0.01, f\"Lead vocals are too quiet: {lead_rms:.3f}\"\n    assert backing_rms > 0.01, f\"Backing vocals are too quiet: {backing_rms:.3f}\"\n\n    # Lead and backing should be distinct (low correlation)\n    lb_corr = correlate(lead_file, backing_file)\n    print(f\"    Lead vs backing corr: {lb_corr:.3f} (should be < 0.50)\")\n    assert lb_corr < 0.50, f\"Lead and backing are too similar: {lb_corr:.3f}\"\n\n    # Regression: compare against committed references\n    ref_lead = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_lead_vocals.flac\")\n    ref_backing = os.path.join(REFERENCE_DIR, \"ref_under_pressure_harmonies_backing_vocals.flac\")\n    if os.path.exists(ref_lead):\n        corr_lead = correlate(lead_file, ref_lead)\n        corr_backing = correlate(backing_file, 
ref_backing)\n        print(f\"    Lead regression:    {corr_lead:.3f}\")\n        print(f\"    Backing regression: {corr_backing:.3f}\")\n        assert corr_lead > 0.70, f\"Lead vocals regression failed: {corr_lead:.3f}\"\n        assert corr_backing > 0.70, f\"Backing vocals regression failed: {corr_backing:.3f}\"\n"
  },
  {
    "path": "tests/integration/test_multi_stem_verification.py",
    "content": "\"\"\"\nIntegration tests for multi-stem and instrument-specific separation models.\n\nTests a matrix of models × clips, verifying each output stem correlates\nwith the committed reference stem generated by the best available model\nfor that instrument type.\n\nUsage:\n    # Run all multi-stem tests (requires model downloads):\n    pytest tests/integration/test_multi_stem_verification.py -v -s\n\n    # Run a specific test group:\n    pytest tests/integration/test_multi_stem_verification.py -v -s -k \"vocal_instrumental\"\n    pytest tests/integration/test_multi_stem_verification.py -v -s -k \"drumsep\"\n    pytest tests/integration/test_multi_stem_verification.py -v -s -k \"karaoke\"\n    pytest tests/integration/test_multi_stem_verification.py -v -s -k \"wind\"\n    pytest tests/integration/test_multi_stem_verification.py -v -s -k \"dereverb\"\n\nReference stems were generated using these best-in-class models:\n    - Vocal/Instrumental: bs_roformer_vocals_resurrection_unwa.ckpt\n    - 4-stem (drums/bass/other/vocals): htdemucs_ft.yaml\n    - DrumSep (kick/snare/toms/hh/ride/crash): MDX23C-DrumSep-aufr33-jarredou.ckpt\n      (pipeline: mix → htdemucs_ft drums stem → drumsep)\n    - Karaoke: mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\n    - Woodwind: 17_HP-Wind_Inst-UVR.pth\n    - De-reverb: dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt\n      (pipeline: mix → resurrection vocals → dereverb)\n\"\"\"\n\nimport os\nimport sys\nimport re\nimport tempfile\nimport shutil\nimport logging\nimport pytest\nimport numpy as np\nimport librosa\n\nREFERENCE_DIR = \"tests/inputs/reference\"\n\n# Correlation threshold: how closely must the output match the reference stem?\n# Lower than spectrogram SSIM because we're comparing against a different model's\n# output, not the same model re-run.\nDEFAULT_CORRELATION_THRESHOLD = 0.70\n\n\ndef correlate(file_a, file_b, sr=44100):\n    \"\"\"Compute Pearson correlation between two audio files 
(mono-mixed).\n\n    Returns 1.0 when both signals are near-silent (zero variance), since\n    np.corrcoef would return nan there and two silent files count as a match.\n    \"\"\"\n    a, _ = librosa.load(file_a, sr=sr, mono=True)\n    b, _ = librosa.load(file_b, sr=sr, mono=True)\n    min_len = min(len(a), len(b))\n    a, b = a[:min_len], b[:min_len]\n    # If both signals are near-silent, corrcoef returns nan; treat as perfect match\n    if np.std(a) < 1e-7 and np.std(b) < 1e-7:\n        return 1.0\n    return float(np.corrcoef(a, b)[0, 1])\n\n\ndef run_separation(model, input_file, output_dir):\n    \"\"\"Run a single model separation, return list of output paths.\"\"\"\n    from audio_separator.separator import Separator\n    sep = Separator(output_dir=output_dir, output_format=\"FLAC\", log_level=logging.WARNING)\n    sep.load_model(model)\n    return sep.separate(input_file)\n\n\ndef find_stem(output_files, stem_name, output_dir):\n    \"\"\"Find the output file matching a stem name (case-insensitive).\n\n    Uses the *last* _(StemName) group in the filename so pipeline outputs\n    (where the input filename already contains a parenthesized stem) are\n    matched correctly against the current model's stem label.\n    \"\"\"\n    for f in output_files:\n        full = f if os.path.isabs(f) else os.path.join(output_dir, f)\n        if not os.path.exists(full):\n            full = os.path.join(output_dir, os.path.basename(f))\n        matches = re.findall(r'_\\(([^)]+)\\)', os.path.basename(f))\n        if matches and matches[-1].lower() == stem_name.lower():\n            return full\n    return None\n\n\n# ─── Vocal / Instrumental separation ─────────────────────────────────\n\nVOCAL_INSTRUMENTAL_PARAMS = [\n    (\"tests/inputs/levee_drums.flac\", \"ref_levee_drums\"),\n    (\"tests/inputs/clocks_piano.flac\", \"ref_clocks_piano\"),\n    (\"tests/inputs/sing_sing_sing_brass.flac\", \"ref_sing_sing_sing_brass\"),\n    (\"tests/inputs/only_time_reverb.flac\", 
\"ref_only_time_reverb\"),\n]\n\n\n@pytest.mark.parametrize(\"input_file,ref_prefix\", VOCAL_INSTRUMENTAL_PARAMS)\ndef test_vocal_instrumental_separation(input_file, ref_prefix, tmp_path):\n    \"\"\"Test that vocal/instrumental separation produces stems matching references.\"\"\"\n    model = \"bs_roformer_vocals_resurrection_unwa.ckpt\"\n    output_dir = str(tmp_path)\n\n    output_files = run_separation(model, input_file, output_dir)\n    clip = os.path.basename(input_file)\n    print(f\"\\n  {clip} → {model}\")\n\n    # Find vocal and instrumental stems (this model labels them \"vocals\" and \"other\")\n    vocals_file = find_stem(output_files, \"vocals\", output_dir)\n    inst_file = find_stem(output_files, \"other\", output_dir)\n\n    assert vocals_file, f\"No vocals stem found in {output_files}\"\n    assert inst_file, f\"No instrumental stem found in {output_files}\"\n\n    ref_vocals = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_vocals.flac\")\n    ref_inst = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_instrumental.flac\")\n\n    corr_v = correlate(vocals_file, ref_vocals)\n    corr_i = correlate(inst_file, ref_inst)\n    print(f\"    Vocals corr:       {corr_v:.3f}\")\n    print(f\"    Instrumental corr: {corr_i:.3f}\")\n\n    assert corr_v > DEFAULT_CORRELATION_THRESHOLD, f\"Vocals correlation {corr_v:.3f} below threshold\"\n    assert corr_i > DEFAULT_CORRELATION_THRESHOLD, f\"Instrumental correlation {corr_i:.3f} below threshold\"\n\n\n# ─── 4-stem separation (htdemucs_ft) ─────────────────────────────────\n\nFOUR_STEM_PARAMS = [\n    (\"tests/inputs/levee_drums.flac\", \"ref_levee_drums\"),\n    (\"tests/inputs/clocks_piano.flac\", \"ref_clocks_piano\"),\n]\n\n\n@pytest.mark.parametrize(\"input_file,ref_prefix\", FOUR_STEM_PARAMS)\ndef test_four_stem_separation(input_file, ref_prefix, tmp_path):\n    \"\"\"Test htdemucs_ft 4-stem separation matches references.\"\"\"\n    model = \"htdemucs_ft.yaml\"\n    output_dir = str(tmp_path)\n\n    output_files 
= run_separation(model, input_file, output_dir)\n    clip = os.path.basename(input_file)\n    print(f\"\\n  {clip} → {model}\")\n\n    for stem in [\"Vocals\", \"Drums\", \"Bass\", \"Other\"]:\n        stem_file = find_stem(output_files, stem, output_dir)\n        assert stem_file, f\"No {stem} stem found\"\n\n        ref_path = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_{stem.lower()}_htdemucs_ft.flac\")\n        corr = correlate(stem_file, ref_path)\n        print(f\"    {stem:<8} corr: {corr:.3f}\")\n\n        assert corr > DEFAULT_CORRELATION_THRESHOLD, f\"{stem} correlation {corr:.3f} below threshold\"\n\n\n# ─── DrumSep pipeline ────────────────────────────────────────────────\n\ndef test_drumsep_pipeline(tmp_path):\n    \"\"\"Test drumsep pipeline: mix → htdemucs_ft drums → drumsep kit parts.\"\"\"\n    input_file = \"tests/inputs/levee_drums.flac\"\n    output_dir = str(tmp_path)\n\n    print(f\"\\n  Step 1: Extract drums from mix using htdemucs_ft\")\n    step1_files = run_separation(\"htdemucs_ft.yaml\", input_file, output_dir)\n    drums_file = find_stem(step1_files, \"Drums\", output_dir)\n    assert drums_file, \"No drums stem from htdemucs_ft\"\n\n    print(f\"  Step 2: Split drums into kit parts using DrumSep\")\n    step2_files = run_separation(\"MDX23C-DrumSep-aufr33-jarredou.ckpt\", drums_file, output_dir)\n\n    for stem in [\"kick\", \"snare\", \"toms\", \"hh\", \"ride\", \"crash\"]:\n        stem_file = find_stem(step2_files, stem, output_dir)\n        assert stem_file, f\"No {stem} stem from DrumSep\"\n\n        ref_path = os.path.join(REFERENCE_DIR, f\"ref_levee_drums_{stem}_drumsep.flac\")\n        corr = correlate(stem_file, ref_path)\n        print(f\"    {stem:<8} corr: {corr:.3f}\")\n\n        assert corr > DEFAULT_CORRELATION_THRESHOLD, f\"{stem} correlation {corr:.3f} below threshold\"\n\n\n# ─── Karaoke ─────────────────────────────────────────────────────────\n#\n# Karaoke models remove lead vocals while preserving backing vocals 
in\n# the instrumental. The test verifies:\n# 1. Karaoke output matches its own reference (regression check)\n# 2. Karaoke vocal output differs from standard vocal output (confirms\n#    the model extracts less — just lead vocals, not all vocals)\n\nKARAOKE_PARAMS = [\n    (\"tests/inputs/levee_drums.flac\", \"ref_levee_drums\"),\n    (\"tests/inputs/clocks_piano.flac\", \"ref_clocks_piano\"),\n    (\"tests/inputs/under_pressure_harmonies.flac\", \"ref_under_pressure_harmonies\"),\n]\n\n\n@pytest.mark.parametrize(\"input_file,ref_prefix\", KARAOKE_PARAMS)\ndef test_karaoke_separation(input_file, ref_prefix, tmp_path):\n    \"\"\"Test karaoke model produces output matching references and differs from standard split.\"\"\"\n    karaoke_model = \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\"\n    output_dir = str(tmp_path)\n\n    # Step 1: Run karaoke model\n    output_files = run_separation(karaoke_model, input_file, output_dir)\n    clip = os.path.basename(input_file)\n    print(f\"\\n  {clip} → karaoke\")\n\n    # Step 2: Verify stems match karaoke references\n    for stem, ref_suffix in [(\"Vocals\", \"vocals_karaoke\"), (\"Instrumental\", \"instrumental_karaoke\")]:\n        stem_file = find_stem(output_files, stem, output_dir)\n        assert stem_file, f\"No {stem} stem found\"\n\n        ref_path = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_{ref_suffix}.flac\")\n        corr = correlate(stem_file, ref_path)\n        print(f\"    {stem:<15} corr with karaoke ref: {corr:.3f}\")\n        assert corr > DEFAULT_CORRELATION_THRESHOLD, f\"{stem} correlation {corr:.3f} below threshold\"\n\n    # Step 3: Verify karaoke vocals differ from standard vocals\n    # (karaoke extracts only lead vocals; standard extracts all vocals)\n    karaoke_vocals = find_stem(output_files, \"Vocals\", output_dir)\n    standard_vocals_ref = os.path.join(REFERENCE_DIR, f\"{ref_prefix}_vocals.flac\")\n    if os.path.exists(standard_vocals_ref):\n        corr_vs_standard = 
correlate(karaoke_vocals, standard_vocals_ref)\n        print(f\"    Karaoke vs standard vocals corr: {corr_vs_standard:.3f} (should be < 0.95)\")\n        # Karaoke vocals should differ from standard vocals\n        # (not a hard failure — depends on whether there are backing vocals)\n        if corr_vs_standard > 0.95:\n            print(f\"    NOTE: Karaoke and standard vocals are very similar — clip may lack backing vocals\")\n\n\n# ─── Wind / Brass extraction ─────────────────────────────────────────\n\ndef test_wind_instrument_extraction(tmp_path):\n    \"\"\"Test wind instrument extraction on brass-heavy clip.\"\"\"\n    model = \"17_HP-Wind_Inst-UVR.pth\"\n    input_file = \"tests/inputs/sing_sing_sing_brass.flac\"\n    output_dir = str(tmp_path)\n\n    output_files = run_separation(model, input_file, output_dir)\n    print(f\"\\n  sing_sing_sing_brass → wind extraction\")\n\n    for stem, ref_name in [(\"Woodwinds\", \"ref_sing_sing_sing_brass_woodwinds.flac\"),\n                           (\"No Woodwinds\", \"ref_sing_sing_sing_brass_no_woodwinds.flac\")]:\n        stem_file = find_stem(output_files, stem, output_dir)\n        assert stem_file, f\"No '{stem}' stem found\"\n\n        ref_path = os.path.join(REFERENCE_DIR, ref_name)\n        corr = correlate(stem_file, ref_path)\n        print(f\"    {stem:<15} corr: {corr:.3f}\")\n\n        assert corr > DEFAULT_CORRELATION_THRESHOLD, f\"{stem} correlation {corr:.3f} below threshold\"\n\n\n# ─── De-reverb pipeline ──────────────────────────────────────────────\n\ndef test_dereverb_pipeline(tmp_path):\n    \"\"\"Test dereverb pipeline: mix → resurrection vocals → dereverb.\"\"\"\n    input_file = \"tests/inputs/only_time_reverb.flac\"\n    output_dir = str(tmp_path)\n\n    print(f\"\\n  Step 1: Extract vocals from mix\")\n    step1_files = run_separation(\"bs_roformer_vocals_resurrection_unwa.ckpt\", input_file, output_dir)\n    vocals_file = find_stem(step1_files, \"vocals\", output_dir)\n    assert 
vocals_file, \"No vocals stem from resurrection\"\n\n    print(f\"  Step 2: De-reverb the vocal stem\")\n    step2_files = run_separation(\"dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt\", vocals_file, output_dir)\n\n    for stem, ref_name in [(\"noreverb\", \"ref_only_time_reverb_vocals_noreverb.flac\"),\n                           (\"reverb\", \"ref_only_time_reverb_vocals_reverb.flac\")]:\n        stem_file = find_stem(step2_files, stem, output_dir)\n        assert stem_file, f\"No '{stem}' stem from dereverb\"\n\n        ref_path = os.path.join(REFERENCE_DIR, ref_name)\n        corr = correlate(stem_file, ref_path)\n        print(f\"    {stem:<15} corr: {corr:.3f}\")\n\n        assert corr > DEFAULT_CORRELATION_THRESHOLD, f\"{stem} correlation {corr:.3f} below threshold\"\n"
  },
  {
    "path": "tests/integration/test_remote_api_integration.py",
    "content": "import json\nimport pytest\nimport logging\nimport tempfile\nimport os\nimport time\nfrom pathlib import Path\nfrom unittest.mock import Mock, patch\nfrom http.server import HTTPServer, BaseHTTPRequestHandler\nimport threading\nimport urllib.parse\nimport io\n\nfrom audio_separator.remote import AudioSeparatorAPIClient\n\n\n# Mock API Server for Integration Tests\nclass MockAPIHandler(BaseHTTPRequestHandler):\n    \"\"\"Mock HTTP handler for simulating the Audio Separator API.\"\"\"\n\n    # Class variables to store state across requests\n    jobs = {}\n    models_data = {\n        \"model1.ckpt\": {\"Type\": \"MDXC\", \"Stems\": [\"vocals (12.9)\", \"instrumental (17.0)\"], \"Name\": \"Test Model 1\"},\n        \"model2.onnx\": {\"Type\": \"MDX\", \"Stems\": [\"vocals (10.5)\", \"instrumental (15.2)\"], \"Name\": \"Test Model 2\"},\n        \"htdemucs_6s.yaml\": {\"Type\": \"Demucs\", \"Stems\": [\"vocals (9.7)\", \"drums (8.5)\", \"bass (10.0)\", \"guitar\", \"piano\", \"other\"], \"Name\": \"Test Demucs 6s\"},\n    }\n\n    def log_message(self, format, *args):\n        \"\"\"Suppress HTTP server logs.\"\"\"\n        pass\n\n    def do_GET(self):\n        \"\"\"Handle GET requests.\"\"\"\n        path = self.path\n\n        if path == \"/health\":\n            self.send_json_response({\"status\": \"healthy\", \"version\": \"1.0.0\"})\n\n        elif path.startswith(\"/status/\"):\n            task_id = path.split(\"/\")[-1]\n            if task_id in self.jobs:\n                self.send_json_response(self.jobs[task_id])\n            else:\n                self.send_json_response({\"task_id\": task_id, \"status\": \"not_found\", \"error\": \"Job not found\"}, status=404)\n\n        elif path.startswith(\"/download/\"):\n            # Parse task_id and filename from path like /download/task123/output.wav\n            parts = path.split(\"/\")\n            if len(parts) >= 4:\n                task_id = parts[2]\n                filename = 
urllib.parse.unquote(parts[3])\n\n                # Check if job exists and is completed\n                if task_id in self.jobs and self.jobs[task_id][\"status\"] == \"completed\":\n                    # Simulate audio file content\n                    audio_content = b\"fake audio file content for \" + filename.encode()\n                    self.send_response(200)\n                    self.send_header(\"Content-Type\", \"audio/wav\")\n                    self.send_header(\"Content-Disposition\", f\"attachment; filename={filename}\")\n                    self.end_headers()\n                    self.wfile.write(audio_content)\n                else:\n                    self.send_json_response({\"error\": \"File not found\"}, status=404)\n            else:\n                self.send_json_response({\"error\": \"Invalid download path\"}, status=400)\n\n        elif path == \"/models\" or path.startswith(\"/models?\"):\n            # Parse query parameters\n            if \"?\" in path:\n                query_string = path.split(\"?\", 1)[1]\n                params = urllib.parse.parse_qs(query_string)\n                filter_by = params.get(\"filter_sort_by\", [None])[0]\n            else:\n                filter_by = None\n\n            # Filter models if requested\n            filtered_models = self.models_data\n            if filter_by:\n                filtered_models = {k: v for k, v in self.models_data.items() if filter_by.lower() in \" \".join(v[\"Stems\"]).lower() or filter_by.lower() in v[\"Name\"].lower()}\n\n            # Format as plain text (like the real API)\n            text_output = self._format_models_as_text(filtered_models)\n            self.send_response(200)\n            self.send_header(\"Content-Type\", \"text/plain\")\n            self.end_headers()\n            self.wfile.write(text_output.encode())\n\n        elif path == \"/models-json\":\n            self.send_json_response(self.models_data)\n\n        else:\n            
self.send_json_response({\"error\": \"Not found\"}, status=404)\n\n    def do_POST(self):\n        \"\"\"Handle POST requests.\"\"\"\n        if self.path == \"/separate\":\n            content_length = int(self.headers[\"Content-Length\"])\n            post_data = self.rfile.read(content_length)\n\n            # Simple task ID generation\n            task_id = f\"task-{len(self.jobs) + 1:03d}\"\n\n            # Initialize job status\n            self.jobs[task_id] = {\n                \"task_id\": task_id,\n                \"status\": \"submitted\",\n                \"progress\": 0,\n                \"original_filename\": \"test.wav\",\n                \"models_used\": [\"default\"],\n                \"total_models\": 1,\n                \"current_model_index\": 0,\n                \"files\": [],\n            }\n\n            # Simulate processing in background (for testing polling)\n            threading.Thread(target=self._simulate_processing, args=(task_id,)).start()\n\n            self.send_json_response(\n                {\"task_id\": task_id, \"status\": \"submitted\", \"message\": \"Job submitted for processing\", \"models_used\": [\"default\"], \"total_models\": 1, \"original_filename\": \"test.wav\"}\n            )\n        else:\n            self.send_json_response({\"error\": \"Not found\"}, status=404)\n\n    def _simulate_processing(self, task_id):\n        \"\"\"Simulate job processing in background.\"\"\"\n        time.sleep(0.1)  # Brief delay\n\n        # Update to processing\n        self.jobs[task_id].update({\"status\": \"processing\", \"progress\": 25})\n\n        time.sleep(0.1)\n\n        # Update progress\n        self.jobs[task_id].update({\"progress\": 50})\n\n        time.sleep(0.1)\n\n        # Update progress\n        self.jobs[task_id].update({\"progress\": 75})\n\n        time.sleep(0.1)\n\n        # Complete\n        self.jobs[task_id].update({\"status\": \"completed\", \"progress\": 100, \"files\": [\"test_(Vocals)_default.flac\", 
\"test_(Instrumental)_default.flac\"]})\n\n    def _format_models_as_text(self, models):\n        \"\"\"Format models dictionary as plain text table.\"\"\"\n        if not models:\n            return \"No models found\"\n\n        # Calculate column widths\n        filename_width = max(len(\"Model Filename\"), max(len(k) for k in models.keys()))\n        arch_width = max(len(\"Arch\"), max(len(v[\"Type\"]) for v in models.values()))\n        stems_width = max(len(\"Output Stems (SDR)\"), max(len(\", \".join(v[\"Stems\"])) for v in models.values()))\n        name_width = max(len(\"Friendly Name\"), max(len(v[\"Name\"]) for v in models.values()))\n\n        total_width = filename_width + arch_width + stems_width + name_width + 15\n\n        lines = []\n        lines.append(\"-\" * total_width)\n        lines.append(f\"{'Model Filename':<{filename_width}}  {'Arch':<{arch_width}}  {'Output Stems (SDR)':<{stems_width}}  {'Friendly Name'}\")\n        lines.append(\"-\" * total_width)\n\n        for filename, info in models.items():\n            stems = \", \".join(info[\"Stems\"])\n            lines.append(f\"{filename:<{filename_width}}  {info['Type']:<{arch_width}}  {stems:<{stems_width}}  {info['Name']}\")\n\n        return \"\\n\".join(lines)\n\n    def send_json_response(self, data, status=200):\n        \"\"\"Send JSON response.\"\"\"\n        self.send_response(status)\n        self.send_header(\"Content-Type\", \"application/json\")\n        self.end_headers()\n        response = json.dumps(data).encode()\n        self.wfile.write(response)\n\n\n@pytest.fixture(scope=\"function\")\ndef mock_http_server():\n    \"\"\"Start a mock HTTP server for testing.\"\"\"\n    server = HTTPServer((\"localhost\", 0), MockAPIHandler)\n    server_thread = threading.Thread(target=server.serve_forever)\n    server_thread.daemon = True\n    server_thread.start()\n\n    # Get the actual port the server is using\n    port = server.server_address[1]\n    base_url = 
f\"http://localhost:{port}\"\n\n    yield base_url\n\n    server.shutdown()\n    server.server_close()\n\n\n@pytest.fixture\ndef api_client(mock_http_server):\n    \"\"\"Create an API client connected to the mock server.\"\"\"\n    logger = logging.getLogger(\"test\")\n    return AudioSeparatorAPIClient(mock_http_server, logger)\n\n\n@pytest.fixture\ndef test_audio_file():\n    \"\"\"Create a temporary test audio file.\"\"\"\n    with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as f:\n        f.write(b\"fake audio content for testing\")\n    # Yield only after the with block closes (and flushes) the file\n    yield f.name\n    os.unlink(f.name)\n\n\nclass TestRemoteAPIIntegration:\n    \"\"\"Integration tests for the Remote API functionality.\"\"\"\n\n    def test_server_health_check(self, api_client):\n        \"\"\"Test server health check endpoint.\"\"\"\n        version = api_client.get_server_version()\n        assert version == \"1.0.0\"\n\n    def test_list_models_pretty_format(self, api_client):\n        \"\"\"Test listing models in pretty format.\"\"\"\n        result = api_client.list_models(format_type=\"pretty\")\n\n        assert \"text\" in result\n        text = result[\"text\"]\n        assert \"Model Filename\" in text\n        assert \"model1.ckpt\" in text\n        assert \"model2.onnx\" in text\n        assert \"htdemucs_6s.yaml\" in text\n        assert \"Test Model 1\" in text\n\n    def test_list_models_json_format(self, api_client):\n        \"\"\"Test listing models in JSON format.\"\"\"\n        result = api_client.list_models(format_type=\"json\")\n\n        assert \"model1.ckpt\" in result\n        assert \"model2.onnx\" in result\n        assert \"htdemucs_6s.yaml\" in result\n        assert result[\"model1.ckpt\"][\"Type\"] == \"MDXC\"\n        assert result[\"model2.onnx\"][\"Type\"] == \"MDX\"\n        assert result[\"htdemucs_6s.yaml\"][\"Type\"] == \"Demucs\"\n\n    def test_list_models_with_filter(self, api_client):\n        \"\"\"Test listing models with filter.\"\"\"\n        
result = api_client.list_models(filter_by=\"vocals\")\n\n        assert \"text\" in result\n        text = result[\"text\"]\n        # Should include models that have \"vocals\" in their stems\n        assert \"model1.ckpt\" in text\n        assert \"model2.onnx\" in text\n        assert \"htdemucs_6s.yaml\" in text\n\n    def test_separate_audio_submission(self, api_client, test_audio_file):\n        \"\"\"Test audio separation job submission.\"\"\"\n        result = api_client.separate_audio(test_audio_file)\n\n        assert \"task_id\" in result\n        assert result[\"status\"] == \"submitted\"\n        assert result[\"models_used\"] == [\"default\"]\n        assert result[\"total_models\"] == 1\n        assert result[\"original_filename\"] == \"test.wav\"\n\n        task_id = result[\"task_id\"]\n        assert task_id.startswith(\"task-\")\n\n    def test_separate_audio_with_custom_parameters(self, api_client, test_audio_file):\n        \"\"\"Test audio separation with custom parameters.\"\"\"\n        result = api_client.separate_audio(\n            test_audio_file,\n            model=\"model1.ckpt\",\n            output_format=\"wav\",\n            normalization_threshold=0.8,\n            mdx_segment_size=512,\n            vr_aggression=10,\n            custom_output_names={\"Vocals\": \"lead_vocals\", \"Instrumental\": \"backing_track\"},\n        )\n\n        assert result[\"status\"] == \"submitted\"\n        assert \"task_id\" in result\n\n    def test_job_status_polling(self, api_client, test_audio_file):\n        \"\"\"Test job status polling through completion.\"\"\"\n        # Submit job\n        result = api_client.separate_audio(test_audio_file)\n        task_id = result[\"task_id\"]\n\n        # Poll until completion\n        max_attempts = 20  # Prevent infinite loop\n        attempts = 0\n\n        while attempts < max_attempts:\n            status = api_client.get_job_status(task_id)\n\n            assert status[\"task_id\"] == task_id\n    
        assert \"status\" in status\n\n            if status[\"status\"] == \"completed\":\n                assert status[\"progress\"] == 100\n                assert \"files\" in status\n                assert len(status[\"files\"]) == 2  # Vocals and Instrumental\n                break\n            elif status[\"status\"] in [\"submitted\", \"processing\"]:\n                # Continue polling\n                time.sleep(0.05)  # Small delay between polls\n            else:\n                pytest.fail(f\"Unexpected job status: {status['status']}\")\n\n            attempts += 1\n\n        if attempts >= max_attempts:\n            pytest.fail(\"Job did not complete within expected time\")\n\n    def test_file_download(self, api_client, test_audio_file):\n        \"\"\"Test downloading files from completed job.\"\"\"\n        # Submit and wait for completion\n        result = api_client.separate_audio(test_audio_file)\n        task_id = result[\"task_id\"]\n\n        # Wait for completion (simplified polling)\n        for _ in range(20):\n            status = api_client.get_job_status(task_id)\n            if status[\"status\"] == \"completed\":\n                break\n            time.sleep(0.05)\n\n        assert status[\"status\"] == \"completed\"\n        assert len(status[\"files\"]) == 2\n\n        # Download first file\n        filename = status[\"files\"][0]\n        with tempfile.NamedTemporaryFile(delete=False) as temp_output:\n            output_path = temp_output.name\n\n        try:\n            downloaded_path = api_client.download_file(task_id, filename, output_path)\n            assert downloaded_path == output_path\n            assert os.path.exists(output_path)\n\n            # Verify file content\n            with open(output_path, \"rb\") as f:\n                content = f.read()\n                assert content.startswith(b\"fake audio file content\")\n                assert filename.encode() in content\n        finally:\n            if 
os.path.exists(output_path):\n                os.unlink(output_path)\n\n    def test_separate_audio_and_wait_success(self, api_client, test_audio_file):\n        \"\"\"Test the convenience method for separating audio and waiting for completion.\"\"\"\n        with tempfile.TemporaryDirectory() as temp_dir:\n            result = api_client.separate_audio_and_wait(test_audio_file, timeout=5, poll_interval=0.05, download=True, output_dir=temp_dir)\n\n            assert result[\"status\"] == \"completed\"\n            assert \"task_id\" in result\n            assert \"files\" in result\n            assert \"downloaded_files\" in result\n            assert len(result[\"files\"]) == 2\n            assert len(result[\"downloaded_files\"]) == 2\n\n            # Verify files were actually downloaded\n            for file_path in result[\"downloaded_files\"]:\n                assert os.path.exists(file_path)\n                assert file_path.startswith(temp_dir)\n\n    def test_separate_audio_and_wait_no_download(self, api_client, test_audio_file):\n        \"\"\"Test convenience method without downloading files.\"\"\"\n        result = api_client.separate_audio_and_wait(test_audio_file, timeout=5, poll_interval=0.05, download=False)\n\n        assert result[\"status\"] == \"completed\"\n        assert \"files\" in result\n        assert \"downloaded_files\" not in result\n\n    def test_job_status_not_found(self, api_client):\n        \"\"\"Test getting status for non-existent job.\"\"\"\n        import requests\n\n        with pytest.raises(requests.exceptions.HTTPError):\n            api_client.get_job_status(\"nonexistent-task-id\")\n\n    def test_download_file_not_found(self, api_client):\n        \"\"\"Test downloading file for non-existent job.\"\"\"\n        with pytest.raises(Exception):  # Should raise an exception for 404\n            api_client.download_file(\"nonexistent-task-id\", \"file.wav\")\n\n    def test_multiple_concurrent_jobs(self, api_client, 
test_audio_file):\n        \"\"\"Test handling multiple concurrent jobs.\"\"\"\n        # Submit multiple jobs\n        num_jobs = 3\n        jobs = []\n\n        for i in range(num_jobs):\n            result = api_client.separate_audio(test_audio_file)\n            jobs.append(result[\"task_id\"])\n\n        # Wait for all jobs to complete\n        completed_jobs = []\n        max_attempts = 30\n\n        for attempt in range(max_attempts):\n            for task_id in jobs:\n                if task_id not in completed_jobs:\n                    status = api_client.get_job_status(task_id)\n                    if status[\"status\"] == \"completed\":\n                        completed_jobs.append(task_id)\n\n            if len(completed_jobs) == num_jobs:\n                break\n\n            time.sleep(0.05)\n\n        assert len(completed_jobs) == num_jobs, \"Not all jobs completed in expected time\"\n\n    def test_separate_audio_with_multiple_models(self, api_client, test_audio_file):\n        \"\"\"Test separation with multiple models (parameter passing).\"\"\"\n        models = [\"model1.ckpt\", \"model2.onnx\"]\n        result = api_client.separate_audio(test_audio_file, models=models)\n\n        assert result[\"status\"] == \"submitted\"\n        assert \"task_id\" in result\n        # Note: The mock server doesn't fully simulate multiple model processing,\n        # but we can test that the parameters are accepted\n\n\n# CLI Integration Tests (using the mock server)\nclass TestRemoteCLIIntegration:\n    \"\"\"Integration tests for the remote CLI.\"\"\"\n\n    @patch(\"audio_separator.remote.cli.AudioSeparatorAPIClient\")\n    def test_cli_separate_command_integration(self, mock_client_class, test_audio_file):\n        \"\"\"Test CLI separate command integration.\"\"\"\n        from audio_separator.remote.cli import handle_separate_command\n\n        # Set up mock client\n        mock_client = Mock()\n        mock_client.separate_audio_and_wait.return_value = 
{\"status\": \"completed\", \"downloaded_files\": [\"output1.wav\", \"output2.wav\"]}\n        mock_client_class.return_value = mock_client\n\n        # Mock arguments (simplified)\n        args = Mock()\n        args.audio_files = [test_audio_file]\n        args.model = \"test_model.ckpt\"\n        args.models = None\n        args.timeout = 600\n        args.poll_interval = 10\n\n        # Set all required attributes with appropriate defaults\n        default_attrs = {\n            \"output_format\": \"flac\",\n            \"output_bitrate\": None,\n            \"normalization\": 0.9,\n            \"amplification\": 0.0,\n            \"single_stem\": None,\n            \"invert_spect\": False,\n            \"sample_rate\": 44100,\n            \"use_soundfile\": False,\n            \"use_autocast\": False,\n            \"custom_output_names\": None,\n            \"mdx_segment_size\": 256,\n            \"mdx_overlap\": 0.25,\n            \"mdx_batch_size\": 1,\n            \"mdx_hop_length\": 1024,\n            \"mdx_enable_denoise\": False,\n            \"vr_batch_size\": 1,\n            \"vr_window_size\": 512,\n            \"vr_aggression\": 5,\n            \"vr_enable_tta\": False,\n            \"vr_high_end_process\": False,\n            \"vr_enable_post_process\": False,\n            \"vr_post_process_threshold\": 0.2,\n            \"demucs_segment_size\": \"Default\",\n            \"demucs_shifts\": 2,\n            \"demucs_overlap\": 0.25,\n            \"demucs_segments_enabled\": True,\n            \"mdxc_segment_size\": 256,\n            \"mdxc_override_model_segment_size\": False,\n            \"mdxc_overlap\": 8,\n            \"mdxc_batch_size\": 1,\n            \"mdxc_pitch_shift\": 0,\n        }\n\n        for attr, value in default_attrs.items():\n            setattr(args, attr, value)\n\n        logger = Mock()\n\n        # Execute the command\n        handle_separate_command(args, mock_client, logger)\n\n        # Verify the API client method was 
called\n        mock_client.separate_audio_and_wait.assert_called_once()\n\n        # Verify success was logged\n        logger.info.assert_called()\n\n\n# End-to-End Test with Real Audio File\nclass TestRemoteAPIEndToEnd:\n    \"\"\"End-to-end tests using realistic audio data.\"\"\"\n\n    def test_end_to_end_workflow(self, api_client):\n        \"\"\"Test complete workflow from submission to download.\"\"\"\n        # Create a more realistic \"audio\" file\n        with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as f:\n            # Write some data that could represent a small WAV file\n            f.write(b\"RIFF\")  # WAV header\n            f.write(b\"\\x24\\x00\\x00\\x00\")  # File size\n            f.write(b\"WAVE\")  # Format\n            f.write(b\"fake audio data for testing\" * 10)  # Some fake audio data\n            test_file = f.name\n\n        try:\n            # Step 1: Check server health\n            version = api_client.get_server_version()\n            assert version is not None\n\n            # Step 2: List available models\n            models = api_client.list_models()\n            assert \"text\" in models\n\n            # Step 3: Submit separation job\n            result = api_client.separate_audio(test_file)\n            task_id = result[\"task_id\"]\n            assert result[\"status\"] == \"submitted\"\n\n            # Step 4: Poll for completion\n            completed = False\n            for _ in range(20):\n                status = api_client.get_job_status(task_id)\n                if status[\"status\"] == \"completed\":\n                    completed = True\n                    break\n                elif status[\"status\"] == \"error\":\n                    pytest.fail(f\"Job failed: {status.get('error', 'Unknown error')}\")\n                time.sleep(0.05)\n\n            assert completed, \"Job did not complete in expected time\"\n\n            # Step 5: Download results\n            files = status[\"files\"]\n    
        assert len(files) > 0\n\n            with tempfile.TemporaryDirectory() as temp_dir:\n                for filename in files:\n                    output_path = os.path.join(temp_dir, filename)\n                    downloaded_path = api_client.download_file(task_id, filename, output_path)\n                    assert os.path.exists(downloaded_path)\n                    assert os.path.getsize(downloaded_path) > 0\n\n        finally:\n            os.unlink(test_file)\n\n    def test_error_handling_workflow(self, api_client):\n        \"\"\"Test error handling in various scenarios.\"\"\"\n        import requests\n\n        # Test with non-existent file\n        with pytest.raises(FileNotFoundError):\n            api_client.separate_audio(\"/non/existent/file.wav\")\n\n        # Test status for non-existent job\n        with pytest.raises(requests.exceptions.HTTPError):\n            api_client.get_job_status(\"invalid-task-id\")\n\n        # Test download for non-existent job/file\n        with pytest.raises(requests.exceptions.HTTPError):\n            api_client.download_file(\"invalid-task-id\", \"file.wav\")\n"
  },
  {
    "path": "tests/integration/test_roformer_audio_quality.py",
    "content": "\"\"\"\nAudio quality validation tests for Roformer models.\nThese tests ensure audio quality hasn't regressed after the update.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport numpy as np\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerAudioQuality:\n    \"\"\"Test audio quality validation for Roformer models.\"\"\"\n    \n    @pytest.fixture\n    def reference_audio_file(self):\n        \"\"\"Create reference audio file for testing.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:\n            # Create mock audio data (in real implementation, this would be actual audio)\n            sample_rate = 44100\n            duration = 1.0  # 1 second\n            t = np.linspace(0, duration, int(sample_rate * duration))\n            audio_data = np.sin(2 * np.pi * 440 * t)  # 440 Hz sine wave\n            \n            # Write as simple binary data for mock\n            tmp.write(audio_data.tobytes())\n            yield tmp.name\n        \n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_bs_roformer_audio_quality_regression(self, reference_audio_file):\n        \"\"\"Test that BSRoformer models maintain audio quality after update.\"\"\"\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(output_dir=output_dir)\n            \n            # This would load an actual BSRoformer model in real implementation\n            # For now, it's a placeholder that will fail (TDD requirement)\n            separator.load_model(\"bs_roformer_test_model.ckpt\")\n            \n            outputs = separator.separate(reference_audio_file)\n            \n            # Verify 
outputs exist\n            assert len(outputs) >= 2  # Expecting vocal and instrumental\n            \n            for output_file in outputs:\n                assert os.path.exists(output_file)\n                assert os.path.getsize(output_file) > 0\n                \n                # In real implementation, this would:\n                # 1. Load reference output from before update\n                # 2. Calculate SSIM similarity between waveforms\n                # 3. Assert similarity >= 0.90 (per FR-006)\n                # 4. Calculate spectrogram SSIM\n                # 5. Assert spectrogram similarity >= 0.80 (per FR-006)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_mel_band_roformer_audio_quality_validation(self, reference_audio_file):\n        \"\"\"Test that MelBandRoformer models produce high-quality separation.\"\"\"\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(output_dir=output_dir)\n            \n            # This would load an actual MelBandRoformer model\n            separator.load_model(\"mel_band_roformer_test_model.ckpt\")\n            \n            outputs = separator.separate(reference_audio_file)\n            \n            # Quality validation\n            assert len(outputs) > 0\n            \n            for output_file in outputs:\n                assert os.path.exists(output_file)\n                \n                # In real implementation:\n                # 1. Analyze audio for artifacts\n                # 2. Check frequency response\n                # 3. Validate stem separation quality\n                # 4. 
Ensure no clipping or distortion\n    \n    # TDD placeholder test removed - implementation is now complete\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_audio_similarity_calculation_framework(self, reference_audio_file):\n        \"\"\"Test the audio similarity calculation framework.\"\"\"\n        \n        # This test would verify the SSIM calculation framework works correctly\n        # It's a placeholder for the actual similarity calculation implementation\n        \n        # Mock similarity calculation\n        def calculate_waveform_similarity(audio1, audio2):\n            # Placeholder - real implementation would use SSIM\n            return 0.95  # Mock high similarity\n        \n        def calculate_spectrogram_similarity(audio1, audio2):\n            # Placeholder - real implementation would convert to spectrogram and use SSIM\n            return 0.85  # Mock good similarity\n        \n        # Test the calculation functions work\n        mock_audio1 = np.random.random(1000)\n        mock_audio2 = mock_audio1 + np.random.random(1000) * 0.1  # Similar with noise\n        \n        waveform_sim = calculate_waveform_similarity(mock_audio1, mock_audio2)\n        spectrogram_sim = calculate_spectrogram_similarity(mock_audio1, mock_audio2)\n        \n        assert waveform_sim >= 0.90, f\"Waveform similarity {waveform_sim} below threshold\"\n        assert spectrogram_sim >= 0.80, f\"Spectrogram similarity {spectrogram_sim} below threshold\"\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_roformer_backward_compatibility.py",
    "content": "\"\"\"\nIntegration test for existing older model compatibility.\nThis test ensures that existing models continue to work without regression.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport numpy as np\nfrom unittest.mock import patch, Mock\nimport torch\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    from audio_separator.separator.roformer.roformer_loader import RoformerLoader\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerBackwardCompatibility:\n    \"\"\"Test backward compatibility with existing older Roformer models.\"\"\"\n    \n    @pytest.fixture\n    def mock_old_roformer_model(self):\n        \"\"\"Create a mock old Roformer model file for testing.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            # Create a minimal mock model state dict that represents an old Roformer\n            mock_state = {\n                'state_dict': {\n                    'model.dim': torch.tensor(512),\n                    'model.depth': torch.tensor(6),\n                    'model.stereo': torch.tensor(False),\n                    'model.num_stems': torch.tensor(2),\n                    # Old model doesn't have new parameters like mlp_expansion_factor\n                },\n                'config': {\n                    'dim': 512,\n                    'depth': 6,\n                    'stereo': False,\n                    'num_stems': 2,\n                    'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n                    # Missing new parameters: mlp_expansion_factor, sage_attention, etc.\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        \n        # Cleanup\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.fixture\n    def mock_audio_file(self):\n        \"\"\"Create a mock 
audio file for testing.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.flac', delete=False) as tmp:\n            # Create minimal mock audio data (this would normally be actual audio)\n            tmp.write(b'mock_audio_data')\n            yield tmp.name\n        \n        # Cleanup\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_load_existing_older_model_without_regression(self, mock_old_roformer_model, mock_audio_file):\n        \"\"\"Test that existing older models load and work identically to before update.\"\"\"\n        \n        # This test MUST FAIL initially because implementation doesn't exist\n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(\n                model_file_dir=os.path.dirname(mock_old_roformer_model),\n                output_dir=output_dir\n            )\n            \n            # Load the old model - should work with fallback mechanism\n            separator.load_model(os.path.basename(mock_old_roformer_model))\n            \n            # Separate audio - should produce same results as before\n            output_files = separator.separate(mock_audio_file)\n            \n            # Verify outputs exist and are valid\n            assert len(output_files) == 2, \"Should produce 2 stems (vocal/instrumental)\"\n            for output_file in output_files:\n                assert os.path.exists(output_file), f\"Output file should exist: {output_file}\"\n                assert os.path.getsize(output_file) > 0, f\"Output file should not be empty: {output_file}\"\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_old_model_uses_fallback_implementation(self, mock_old_roformer_model):\n        \"\"\"Test that old models automatically use fallback to old implementation.\"\"\"\n        
\n        # Mock the loader to verify fallback behavior\n        with patch('audio_separator.separator.roformer.roformer_loader.RoformerLoader') as mock_loader_class:\n            mock_loader = Mock()\n            mock_loader_class.return_value = mock_loader\n            \n            # Configure mock to simulate fallback scenario\n            from audio_separator.separator.roformer.roformer_loader import ModelLoadingResult, ImplementationVersion\n            \n            mock_result = ModelLoadingResult(\n                success=True,\n                model=Mock(),\n                error_message=None,\n                implementation_used=ImplementationVersion.FALLBACK,  # Should use fallback\n                warnings=[\"Fell back to old implementation due to missing parameters\"]\n            )\n            mock_loader.load_model.return_value = mock_result\n            \n            # Load model\n            loader = RoformerLoader()\n            result = loader.load_model(mock_old_roformer_model)\n            \n            # Verify fallback was used\n            assert result.success is True\n            assert result.implementation_used == ImplementationVersion.FALLBACK\n            assert len(result.warnings) > 0\n            assert \"fallback\" in result.warnings[0].lower()\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_old_model_configuration_compatibility(self, mock_old_roformer_model):\n        \"\"\"Test that old model configurations are properly handled.\"\"\"\n        \n        # This test verifies that missing new parameters don't break loading\n        loader = RoformerLoader()\n        \n        # Load model with old configuration format\n        result = loader.load_model(mock_old_roformer_model)\n        \n        # Should succeed despite missing new parameters\n        assert result.success is True\n        assert result.model is not None\n        \n        # Verify that 
default values were applied for missing parameters\n        model_config = result.model.config\n        assert hasattr(model_config, 'mlp_expansion_factor')\n        assert model_config.mlp_expansion_factor == 4  # Default value\n        assert hasattr(model_config, 'sage_attention')\n        assert model_config.sage_attention is False  # Default value\n    \n    # TDD placeholder test removed - implementation is now complete\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_audio_quality_regression_detection(self, mock_old_roformer_model, mock_audio_file):\n        \"\"\"Test that audio quality hasn't regressed from the update.\"\"\"\n        \n        # This test would compare outputs before and after the update\n        # For now, it's a placeholder that will be implemented with actual audio processing\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(output_dir=output_dir)\n            separator.load_model(mock_old_roformer_model)\n            \n            outputs = separator.separate(mock_audio_file)\n            \n            # In real implementation, this would:\n            # 1. Load reference outputs from before the update\n            # 2. Compare waveforms using SSIM or similar metrics\n            # 3. 
Assert similarity >= 0.90 (per specification)\n            \n            # For now, just verify outputs exist\n            assert len(outputs) > 0\n            for output in outputs:\n                assert os.path.exists(output)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_performance_no_significant_degradation(self, mock_old_roformer_model, mock_audio_file):\n        \"\"\"Test that model loading performance hasn't significantly degraded.\"\"\"\n        \n        import time\n        \n        # Measure loading time\n        start_time = time.time()\n        \n        separator = Separator()\n        separator.load_model(mock_old_roformer_model)\n        \n        loading_time = time.time() - start_time\n        \n        # Should load in reasonable time (this is a placeholder threshold)\n        assert loading_time < 30.0, f\"Model loading took too long: {loading_time}s\"\n        \n        # Measure separation time\n        start_time = time.time()\n        outputs = separator.separate(mock_audio_file)\n        separation_time = time.time() - start_time\n        \n        # Should separate in reasonable time\n        assert separation_time < 60.0, f\"Audio separation took too long: {separation_time}s\"\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_roformer_config_validation.py",
    "content": "\"\"\"\nIntegration test for configuration validation error handling.\nThis test ensures invalid configurations are caught with helpful error messages.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport torch\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    from audio_separator.separator.roformer.roformer_loader import ParameterValidationError\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerConfigValidation:\n    \"\"\"Test configuration validation and error handling.\"\"\"\n    \n    @pytest.fixture\n    def mock_invalid_config_model(self):\n        \"\"\"Create a mock model with invalid configuration.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            mock_state = {\n                'config': {\n                    'dim': \"invalid_string\",  # Should be int\n                    'depth': -1,              # Should be positive\n                    'attn_dropout': 1.5,      # Should be 0.0-1.0\n                    # Missing required 'num_stems'\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        \n        # Cleanup\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_parameter_validation_error_handling(self, mock_invalid_config_model):\n        \"\"\"Test that invalid configurations raise ParameterValidationError.\"\"\"\n        \n        separator = Separator()\n        \n        with pytest.raises(ParameterValidationError) as exc_info:\n            separator.load_model(mock_invalid_config_model)\n        \n        error = exc_info.value\n        assert error.parameter_name is not None\n        assert error.expected_type is not None\n        assert error.suggested_fix is not None\n        assert 
\"integer\" in error.suggested_fix or \"positive\" in error.suggested_fix\n    \n    # TDD placeholder test removed - implementation is now complete\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_roformer_e2e.py",
    "content": "\"\"\"\nEnd-to-end integration tests for Roformer models.\nTests complete separation workflow with both new and legacy models.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport shutil\nfrom unittest.mock import Mock, patch, MagicMock\nimport torch\nimport numpy as np\nfrom pathlib import Path\n\n\nclass TestRoformerE2E:\n    \"\"\"End-to-end tests for Roformer model separation.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.temp_dir = tempfile.mkdtemp()\n        self.test_audio_path = os.path.join(self.temp_dir, \"test_audio.flac\")\n        self.output_dir = os.path.join(self.temp_dir, \"output\")\n        os.makedirs(self.output_dir, exist_ok=True)\n        \n        # Create a mock audio file\n        self._create_mock_audio_file()\n\n    def teardown_method(self):\n        \"\"\"Clean up test fixtures.\"\"\"\n        shutil.rmtree(self.temp_dir, ignore_errors=True)\n\n    def _create_mock_audio_file(self):\n        \"\"\"Create a mock audio file for testing.\"\"\"\n        # Create a simple stereo audio file (mock)\n        sample_rate = 44100\n        duration = 3  # 3 seconds\n        samples = int(sample_rate * duration)\n        \n        # Generate simple test audio (sine waves)\n        t = np.linspace(0, duration, samples)\n        left_channel = np.sin(2 * np.pi * 440 * t)  # 440 Hz\n        right_channel = np.sin(2 * np.pi * 880 * t)  # 880 Hz\n        audio_data = np.stack([left_channel, right_channel])\n        \n        # Mock writing audio file (in real test, would use soundfile)\n        with open(self.test_audio_path, 'w') as f:\n            f.write(\"mock_audio_data\")\n\n    def test_bs_roformer_sw_fixed_e2e(self):\n        \"\"\"T060: New BSRoformer SW-Fixed end-to-end separation succeeds.\"\"\"\n        # Mock the new BSRoformer SW-Fixed model\n        mock_model_path = \"model_bs_roformer_sw_fixed.ckpt\"\n        \n        with 
patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            # Setup mock separator\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            \n            # Mock model loading success\n            mock_separator.load_model.return_value = True\n            \n            # Mock separation results\n            expected_outputs = [\n                os.path.join(self.output_dir, \"test_audio_(vocals).flac\"),\n                os.path.join(self.output_dir, \"test_audio_(instrumental).flac\")\n            ]\n            mock_separator.separate.return_value = expected_outputs\n            \n            # Mock model configuration for new BSRoformer\n            mock_model_config = {\n                'model_type': 'bs_roformer',\n                'architecture': 'BSRoformer',\n                'dim': 512,\n                'depth': 12,\n                'stereo': True,\n                'num_stems': 2,\n                'freqs_per_bands': (4096, 2048, 1024, 512),\n                # New parameters that should be supported\n                'mlp_expansion_factor': 4,\n                'sage_attention': True,\n                'zero_dc': True,\n                'use_torch_checkpoint': False,\n                'skip_connection': True\n            }\n            \n            # Initialize separator\n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            # Test model loading\n            load_success = separator.load_model(mock_model_path)\n            assert load_success, \"BSRoformer SW-Fixed model should load successfully\"\n            \n            # Verify model was loaded with correct parameters\n            separator.load_model.assert_called_once_with(mock_model_path)\n            \n            # Test separation\n            output_files = 
separator.separate(self.test_audio_path)\n            \n            # Verify separation completed\n            assert output_files is not None, \"Separation should return output files\"\n            assert len(output_files) == 2, \"Should produce vocals and instrumental outputs\"\n            \n            # Verify output file paths\n            for output_file in output_files:\n                assert output_file in expected_outputs, f\"Unexpected output file: {output_file}\"\n                # In real test, would verify file exists and has content\n                # assert os.path.exists(output_file), f\"Output file should exist: {output_file}\"\n            \n            # Verify separation was called correctly\n            separator.separate.assert_called_once_with(self.test_audio_path)\n\n    def test_legacy_roformer_e2e(self):\n        \"\"\"T061: Legacy Roformer end-to-end separation still succeeds.\"\"\"\n        # Mock a legacy Roformer model (without new parameters)\n        mock_legacy_model_path = \"legacy_roformer_model.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            # Setup mock separator\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            \n            # Mock model loading success with fallback\n            mock_separator.load_model.return_value = True\n            \n            # Mock separation results\n            expected_outputs = [\n                os.path.join(self.output_dir, \"test_audio_(vocals).flac\"),\n                os.path.join(self.output_dir, \"test_audio_(instrumental).flac\")\n            ]\n            mock_separator.separate.return_value = expected_outputs\n            \n            # Mock legacy model configuration (without new parameters)\n            mock_legacy_config = {\n                'model_type': 'bs_roformer',\n                'architecture': 'BSRoformer',\n                'dim': 
384,\n                'depth': 8,\n                'stereo': True,\n                'num_stems': 2,\n                'freqs_per_bands': (2048, 1024, 512, 256),\n                # Legacy config - missing new parameters\n                # Should fall back to old implementation\n            }\n            \n            # Initialize separator\n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            # Test model loading (should use fallback mechanism)\n            load_success = separator.load_model(mock_legacy_model_path)\n            assert load_success, \"Legacy Roformer model should load successfully via fallback\"\n            \n            # Verify model was loaded\n            separator.load_model.assert_called_once_with(mock_legacy_model_path)\n            \n            # Test separation\n            output_files = separator.separate(self.test_audio_path)\n            \n            # Verify separation completed\n            assert output_files is not None, \"Legacy separation should return output files\"\n            assert len(output_files) == 2, \"Should produce vocals and instrumental outputs\"\n            \n            # Verify output file paths match expected\n            for output_file in output_files:\n                assert output_file in expected_outputs, f\"Unexpected output file: {output_file}\"\n            \n            # Verify separation was called correctly\n            separator.separate.assert_called_once_with(self.test_audio_path)\n\n    def test_mel_band_roformer_e2e(self):\n        \"\"\"Test MelBandRoformer end-to-end separation.\"\"\"\n        mock_mel_model_path = \"mel_band_roformer_model.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            # Setup mock separator\n            mock_separator = Mock()\n            mock_separator_class.return_value 
= mock_separator\n            \n            # Mock model loading success\n            mock_separator.load_model.return_value = True\n            \n            # Mock separation results for MelBandRoformer\n            expected_outputs = [\n                os.path.join(self.output_dir, \"test_audio_(vocals).flac\"),\n                os.path.join(self.output_dir, \"test_audio_(accompaniment).flac\")\n            ]\n            mock_separator.separate.return_value = expected_outputs\n            \n            # Mock MelBandRoformer configuration\n            mock_mel_config = {\n                'model_type': 'mel_band_roformer',\n                'architecture': 'MelBandRoformer',\n                'dim': 256,\n                'depth': 6,\n                'stereo': True,\n                'num_stems': 2,\n                'num_bands': 64,\n                'sample_rate': 44100,\n                # New parameters\n                'mlp_expansion_factor': 4,\n                'sage_attention': False,\n                'zero_dc': True\n            }\n            \n            # Initialize separator\n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            # Test model loading\n            load_success = separator.load_model(mock_mel_model_path)\n            assert load_success, \"MelBandRoformer model should load successfully\"\n            \n            # Test separation\n            output_files = separator.separate(self.test_audio_path)\n            \n            # Verify separation completed\n            assert output_files is not None, \"MelBandRoformer separation should return output files\"\n            assert len(output_files) == 2, \"Should produce two output stems\"\n            \n            # Verify output file paths\n            for output_file in output_files:\n                assert output_file in expected_outputs, f\"Unexpected output file: 
{output_file}\"\n\n    def test_roformer_e2e_with_different_audio_formats(self):\n        \"\"\"Test E2E separation with different audio formats.\"\"\"\n        audio_formats = ['.flac', '.wav', '.mp3', '.m4a']\n        mock_model_path = \"model_bs_roformer_test.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            mock_separator.load_model.return_value = True\n            \n            for audio_format in audio_formats:\n                # Create mock audio file with different format\n                test_audio = os.path.join(self.temp_dir, f\"test_audio{audio_format}\")\n                with open(test_audio, 'w') as f:\n                    f.write(\"mock_audio_data\")\n                \n                # Mock separation results\n                expected_outputs = [\n                    os.path.join(self.output_dir, f\"test_audio_(vocals){audio_format}\"),\n                    os.path.join(self.output_dir, f\"test_audio_(instrumental){audio_format}\")\n                ]\n                mock_separator.separate.return_value = expected_outputs\n                \n                # Initialize separator\n                separator = mock_separator_class(\n                    model_file_dir=self.temp_dir,\n                    output_dir=self.output_dir\n                )\n                \n                # Load model\n                load_success = separator.load_model(mock_model_path)\n                assert load_success, f\"Model should load for {audio_format} format\"\n                \n                # Test separation\n                output_files = separator.separate(test_audio)\n                assert output_files is not None, f\"Separation should work with {audio_format} format\"\n                assert len(output_files) == 2, f\"Should produce 2 outputs for {audio_format}\"\n\n    def 
test_roformer_e2e_error_handling(self):\n        \"\"\"Test E2E error handling scenarios.\"\"\"\n        mock_model_path = \"problematic_model.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            \n            # Test model loading failure\n            mock_separator.load_model.return_value = False\n            \n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            load_success = separator.load_model(mock_model_path)\n            assert not load_success, \"Should handle model loading failure gracefully\"\n            \n            # Test separation failure\n            mock_separator.load_model.return_value = True  # Model loads\n            mock_separator.separate.side_effect = Exception(\"Separation failed\")\n            \n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            separator.load_model(mock_model_path)\n            \n            # Should handle separation exception\n            with pytest.raises(Exception, match=\"Separation failed\"):\n                separator.separate(self.test_audio_path)\n\n    def test_roformer_e2e_performance_validation(self):\n        \"\"\"Test E2E performance characteristics.\"\"\"\n        mock_model_path = \"performance_test_model.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            mock_separator.load_model.return_value = True\n            \n            # Mock timing for performance validation\n            import time\n            \n      
      def mock_separate_with_timing(audio_path):\n                # Simulate processing time\n                start_time = time.time()\n                time.sleep(0.1)  # Simulate 100ms processing\n                end_time = time.time()\n                \n                processing_time = end_time - start_time\n                \n                # Return mock results with timing info\n                return [\n                    os.path.join(self.output_dir, \"test_audio_(vocals).flac\"),\n                    os.path.join(self.output_dir, \"test_audio_(instrumental).flac\")\n                ], processing_time\n            \n            mock_separator.separate.side_effect = lambda x: mock_separate_with_timing(x)[0]\n            \n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            separator.load_model(mock_model_path)\n            \n            # Measure separation time\n            start_time = time.time()\n            output_files = separator.separate(self.test_audio_path)\n            end_time = time.time()\n            \n            processing_time = end_time - start_time\n            \n            # Verify performance characteristics\n            assert output_files is not None, \"Should produce outputs\"\n            assert processing_time < 10.0, \"Processing should complete in reasonable time\"\n            assert len(output_files) == 2, \"Should produce expected number of outputs\"\n\n    def test_roformer_e2e_memory_usage(self):\n        \"\"\"Test E2E memory usage characteristics.\"\"\"\n        mock_model_path = \"memory_test_model.ckpt\"\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            mock_separator.load_model.return_value = True\n            \n            # 
Mock memory usage tracking\n            def mock_separate_with_memory_tracking(audio_path):\n                # Simulate memory usage\n                mock_memory_usage = {\n                    'peak_memory_mb': 1024,  # 1GB peak\n                    'current_memory_mb': 512,  # 512MB current\n                    'gpu_memory_mb': 2048 if torch.cuda.is_available() else 0\n                }\n                \n                return [\n                    os.path.join(self.output_dir, \"test_audio_(vocals).flac\"),\n                    os.path.join(self.output_dir, \"test_audio_(instrumental).flac\")\n                ], mock_memory_usage\n            \n            mock_separator.separate.side_effect = lambda x: mock_separate_with_memory_tracking(x)[0]\n            \n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            separator.load_model(mock_model_path)\n            output_files = separator.separate(self.test_audio_path)\n            \n            # Verify memory usage is reasonable\n            assert output_files is not None, \"Should produce outputs despite memory constraints\"\n            # In real implementation, would check actual memory usage\n            # assert peak_memory < threshold, \"Memory usage should be within limits\"\n\n    def test_roformer_e2e_batch_processing(self):\n        \"\"\"Test E2E batch processing of multiple files.\"\"\"\n        mock_model_path = \"batch_test_model.ckpt\"\n        \n        # Create multiple test audio files\n        test_files = []\n        for i in range(3):\n            test_file = os.path.join(self.temp_dir, f\"test_audio_{i}.flac\")\n            with open(test_file, 'w') as f:\n                f.write(f\"mock_audio_data_{i}\")\n            test_files.append(test_file)\n        \n        with patch('audio_separator.separator.separator.Separator') as mock_separator_class:\n            
mock_separator = Mock()\n            mock_separator_class.return_value = mock_separator\n            mock_separator.load_model.return_value = True\n            \n            # Mock batch separation results\n            def mock_batch_separate(audio_path):\n                basename = os.path.splitext(os.path.basename(audio_path))[0]\n                return [\n                    os.path.join(self.output_dir, f\"{basename}_(vocals).flac\"),\n                    os.path.join(self.output_dir, f\"{basename}_(instrumental).flac\")\n                ]\n            \n            mock_separator.separate.side_effect = mock_batch_separate\n            \n            separator = mock_separator_class(\n                model_file_dir=self.temp_dir,\n                output_dir=self.output_dir\n            )\n            \n            separator.load_model(mock_model_path)\n            \n            # Process all files\n            all_outputs = []\n            for test_file in test_files:\n                outputs = separator.separate(test_file)\n                all_outputs.extend(outputs)\n            \n            # Verify batch processing\n            expected_total_outputs = len(test_files) * 2  # 2 outputs per input\n            assert len(all_outputs) == expected_total_outputs, \"Should process all files\"\n            \n            # Verify each file was processed\n            for i in range(len(test_files)):\n                expected_vocals = os.path.join(self.output_dir, f\"test_audio_{i}_(vocals).flac\")\n                expected_instrumental = os.path.join(self.output_dir, f\"test_audio_{i}_(instrumental).flac\")\n                assert expected_vocals in all_outputs, f\"Missing vocals output for file {i}\"\n                assert expected_instrumental in all_outputs, f\"Missing instrumental output for file {i}\"\n"
  },
  {
    "path": "tests/integration/test_roformer_fallback_mechanism.py",
    "content": "\"\"\"\nIntegration test for fallback mechanism activation.\nThis test ensures fallback from new to old implementation works correctly.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport torch\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    from audio_separator.separator.roformer.roformer_loader import RoformerLoader, ImplementationVersion\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerFallbackMechanism:\n    \"\"\"Test fallback mechanism activation.\"\"\"\n    \n    @pytest.fixture\n    def mock_edge_case_model(self):\n        \"\"\"Create a mock model that triggers fallback.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            # Model that might cause new implementation to fail\n            mock_state = {\n                'config': {\n                    'dim': 512, 'depth': 6, 'num_stems': 2,\n                    'legacy_parameter': True,  # Unknown to new implementation\n                    'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_fallback_mechanism_activation(self, mock_edge_case_model):\n        \"\"\"Test that fallback mechanism activates when new implementation fails.\"\"\"\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_edge_case_model)\n        \n        # Should succeed using fallback\n        assert result.success is True\n        assert result.implementation_used == ImplementationVersion.FALLBACK\n        assert len(result.warnings) > 0\n        assert \"fallback\" in result.warnings[0].lower()\n    \n    # TDD placeholder test removed - implementation is now complete\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_roformer_model_switching.py",
    "content": "\"\"\"\nIntegration test for model type switching (BSRoformer ↔ MelBandRoformer).\nThis test ensures seamless switching between different Roformer model types.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport torch\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerModelSwitching:\n    \"\"\"Test switching between different Roformer model types.\"\"\"\n    \n    @pytest.fixture\n    def mock_bs_roformer_model(self):\n        \"\"\"Create a mock BSRoformer model.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            mock_state = {\n                'config': {\n                    'dim': 512, 'depth': 6, 'num_stems': 2,\n                    'freqs_per_bands': (2, 4, 8, 16, 32, 64),  # BSRoformer specific\n                    'model_type': 'bs_roformer'\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.fixture\n    def mock_mel_band_roformer_model(self):\n        \"\"\"Create a mock MelBandRoformer model.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            mock_state = {\n                'config': {\n                    'dim': 384, 'depth': 8, 'num_stems': 4,\n                    'num_bands': 64,  # MelBandRoformer specific\n                    'sample_rate': 44100,\n                    'model_type': 'mel_band_roformer'\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_switch_bs_to_mel_band_roformer(self, 
mock_bs_roformer_model, mock_mel_band_roformer_model):\n        \"\"\"Test switching from BSRoformer to MelBandRoformer.\"\"\"\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(output_dir=output_dir)\n            \n            # Load BSRoformer first\n            separator.load_model(mock_bs_roformer_model)\n            assert separator.model_name == \"bs_roformer\"\n            \n            # Switch to MelBandRoformer\n            separator.load_model(mock_mel_band_roformer_model)\n            assert separator.model_name == \"mel_band_roformer\"\n            \n            # Both should work without conflicts\n            assert separator.model is not None\n    \n    # TDD placeholder test removed - implementation is now complete\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_roformer_new_parameters.py",
    "content": "\"\"\"\nIntegration test for newer models with new parameters.\nThis test ensures that newer Roformer models with additional parameters work correctly.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport torch\nfrom unittest.mock import patch, Mock\n\n# This will fail initially - that's expected for TDD\ntry:\n    from audio_separator import Separator\n    from audio_separator.separator.roformer.roformer_loader import RoformerLoader\n    IMPORTS_AVAILABLE = True\nexcept ImportError:\n    IMPORTS_AVAILABLE = False\n\n\nclass TestRoformerNewParameters:\n    \"\"\"Test newer Roformer models with new parameters.\"\"\"\n    \n    @pytest.fixture\n    def mock_new_roformer_model(self):\n        \"\"\"Create a mock new Roformer model with additional parameters.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.ckpt', delete=False) as tmp:\n            # Create mock model state dict with new parameters\n            mock_state = {\n                'state_dict': {\n                    'model.dim': torch.tensor(512),\n                    'model.depth': torch.tensor(6),\n                    'model.stereo': torch.tensor(False),\n                    'model.num_stems': torch.tensor(2),\n                },\n                'config': {\n                    'dim': 512,\n                    'depth': 6,\n                    'stereo': False,\n                    'num_stems': 2,\n                    'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n                    # New parameters that should be supported\n                    'mlp_expansion_factor': 6,  # Non-default value\n                    'sage_attention': True,     # Enabled\n                    'zero_dc': False,           # Non-default value\n                    'use_torch_checkpoint': True,\n                    'skip_connection': True,\n                }\n            }\n            torch.save(mock_state, tmp.name)\n            yield tmp.name\n        \n        # Cleanup\n        if 
os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.fixture\n    def mock_audio_file(self):\n        \"\"\"Create a mock audio file for testing.\"\"\"\n        with tempfile.NamedTemporaryFile(suffix='.flac', delete=False) as tmp:\n            tmp.write(b'mock_audio_data')\n        # Yield only after the with block has closed (and flushed) the file\n        yield tmp.name\n        \n        if os.path.exists(tmp.name):\n            os.unlink(tmp.name)\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_load_newer_model_with_new_parameters(self, mock_new_roformer_model, mock_audio_file):\n        \"\"\"Test that newer models with additional parameters load successfully.\"\"\"\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(\n                model_file_dir=os.path.dirname(mock_new_roformer_model),\n                output_dir=output_dir\n            )\n            \n            # Load the new model - should work with new implementation\n            separator.load_model(os.path.basename(mock_new_roformer_model))\n            \n            # Separate audio - should work with new parameters\n            output_files = separator.separate(mock_audio_file)\n            \n            # Verify outputs exist and are valid\n            assert len(output_files) >= 1, \"Should produce audio outputs\"\n            for output_file in output_files:\n                assert os.path.exists(output_file), f\"Output file should exist: {output_file}\"\n                assert os.path.getsize(output_file) > 0, f\"Output file should not be empty: {output_file}\"\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_new_model_uses_new_implementation(self, mock_new_roformer_model):\n        \"\"\"Test that new models use the new implementation (not fallback).\"\"\"\n        \n        # Mock the loader to verify new implementation is used.\n        # patch() swaps the attribute on the roformer_loader module, so the class must be\n        # looked up through that module: the RoformerLoader name imported at the top of\n        # this file is a separate reference that still points at the real class.\n        from audio_separator.separator.roformer import roformer_loader\n        from audio_separator.separator.roformer.roformer_loader import ModelLoadingResult, ImplementationVersion\n        \n        with patch('audio_separator.separator.roformer.roformer_loader.RoformerLoader') as mock_loader_class:\n            mock_loader = Mock()\n            mock_loader_class.return_value = mock_loader\n            \n            mock_result = ModelLoadingResult(\n                success=True,\n                model=Mock(),\n                error_message=None,\n                implementation_used=ImplementationVersion.NEW,  # Should use new implementation\n                warnings=[]\n            )\n            mock_loader.load_model.return_value = mock_result\n            \n            # Load model through the patched module attribute\n            loader = roformer_loader.RoformerLoader()\n            result = loader.load_model(mock_new_roformer_model)\n            \n            # Verify the mocked loader is the one that was exercised\n            mock_loader.load_model.assert_called_once_with(mock_new_roformer_model)\n            \n            # Verify new implementation was used\n            assert result.success is True\n            assert result.implementation_used == ImplementationVersion.NEW\n            assert len(result.warnings) == 0  # No fallback warnings\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_new_parameters_are_properly_handled(self, mock_new_roformer_model):\n        \"\"\"Test that new parameters are properly loaded and used.\"\"\"\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        model_config = result.model.config\n        \n        # Verify new parameters are loaded correctly\n        assert model_config.mlp_expansion_factor == 6  # From mock config\n        assert model_config.sage_attention is True     # From mock config\n        assert model_config.zero_dc is False          # From mock config\n        assert model_config.use_torch_checkpoint is True\n        assert model_config.skip_connection is True\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_sage_attention_parameter_integration(self, mock_new_roformer_model):\n        \"\"\"Test that sage_attention parameter is properly integrated.\"\"\"\n        \n        # This test verifies that sage_attention=True is handled correctly\n        # and doesn't cause the AttributeError that was mentioned in the spec\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        \n        # Verify that sage_attention is passed to transformer_kwargs\n        model = result.model\n        assert hasattr(model, 'sage_attention')\n        assert model.sage_attention is True\n        \n        # Verify no AttributeError is raised during model initialization\n        # (This would be caught in the loading process)\n        assert result.error_message is None\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_mlp_expansion_factor_parameter_handling(self, mock_new_roformer_model):\n        \"\"\"Test that mlp_expansion_factor parameter is handled correctly.\"\"\"\n        \n        # This test verifies that the mlp_expansion_factor parameter\n        # doesn't cause the TypeError mentioned in the spec\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        \n        # Verify parameter is properly set\n        model = result.model\n        assert hasattr(model, 'mlp_expansion_factor')\n        assert model.mlp_expansion_factor == 6  # From mock config\n        \n        # Verify no TypeError during initialization\n        assert result.error_message is None\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_torch_checkpoint_parameter_integration(self, mock_new_roformer_model):\n        \"\"\"Test that use_torch_checkpoint parameter works correctly.\"\"\"\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        model = result.model\n        \n        # Verify checkpoint parameter is set\n        assert hasattr(model, 'use_torch_checkpoint')\n        assert model.use_torch_checkpoint is True\n        \n        # This parameter affects memory usage during forward pass\n        # In real implementation, this would be tested during audio separation\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_skip_connection_parameter_integration(self, mock_new_roformer_model):\n        \"\"\"Test that skip_connection parameter works correctly.\"\"\"\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        model = result.model\n        \n        # Verify skip connection parameter is set\n        assert hasattr(model, 'skip_connection')\n        assert model.skip_connection is True\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_zero_dc_parameter_handling(self, mock_new_roformer_model):\n        \"\"\"Test that zero_dc parameter is handled correctly.\"\"\"\n        \n        loader = RoformerLoader()\n        result = loader.load_model(mock_new_roformer_model)\n        \n        assert result.success is True\n        model = result.model\n        \n        # Verify zero_dc parameter is set to non-default value\n        assert hasattr(model, 'zero_dc')\n        assert model.zero_dc is False  # Non-default value from mock\n    \n    # TDD placeholder test removed - implementation is now complete\n    \n    @pytest.mark.skipif(not IMPORTS_AVAILABLE, reason=\"Implementation not available yet (TDD)\")\n    def test_new_model_audio_quality_validation(self, mock_new_roformer_model, mock_audio_file):\n        \"\"\"Test that new models produce high-quality audio separation.\"\"\"\n        \n        with tempfile.TemporaryDirectory() as output_dir:\n            separator = Separator(output_dir=output_dir)\n            separator.load_model(mock_new_roformer_model)\n            \n            outputs = separator.separate(mock_audio_file)\n            \n            # Verify outputs exist and meet quality standards\n            assert len(outputs) > 0\n            for output in outputs:\n                assert os.path.exists(output)\n                \n                # In real implementation, this would:\n                # 1. Load and analyze the audio\n                # 2. Check for artifacts or quality issues\n                # 3. Compare against reference outputs if available\n                # 4. Verify SSIM >= 0.80 for spectrograms (per spec)\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/integration/test_separator_output_integration.py",
    "content": "import os\nimport pytest\nimport tempfile\nimport shutil\nfrom pathlib import Path\n\nfrom audio_separator.separator import Separator\n\n\n@pytest.fixture(name=\"input_file\")\ndef fixture_input_file():\n    \"\"\"Fixture providing the test input audio file path.\"\"\"\n    return \"tests/inputs/mardy20s.flac\"\n\n\n@pytest.fixture(name=\"temp_output_dir\")\ndef fixture_temp_output_dir():\n    \"\"\"Fixture providing a temporary directory for output files.\"\"\"\n    temp_dir = tempfile.mkdtemp()\n    yield temp_dir\n    # Clean up after test\n    shutil.rmtree(temp_dir)\n\n\ndef test_separator_output_dir_and_custom_names(input_file, temp_output_dir):\n    \"\"\"Test that Separator respects output_dir and custom_output_names parameters.\"\"\"\n    print(\"\\n>>> TEST: Checking output_dir with custom output names\")\n    \n    # Define custom output filenames\n    vocal_output_filename = \"custom_vocals_output\"\n    instrumental_output_filename = \"custom_instrumental_output\"\n\n    # Create output name mapping\n    custom_output_names = {\"Vocals\": vocal_output_filename, \"Instrumental\": instrumental_output_filename}\n\n    # Initialize separator with specified output directory\n    print(f\"Creating Separator with output_dir: {temp_output_dir}\")\n    separator = Separator(output_dir=temp_output_dir, log_level=20)  # INFO level\n\n    # Load model\n    print(\"Loading model: MGM_MAIN_v4.pth\")\n    separator.load_model(model_filename=\"MGM_MAIN_v4.pth\")\n\n    # Run separation with custom output names\n    print(f\"Running separation with custom_output_names: {custom_output_names}\")\n    output_files = separator.separate(input_file, custom_output_names=custom_output_names)\n    print(f\"Separator.separate() returned: {output_files}\")\n\n    # The separator adds .wav extension since the default output format is WAV\n    expected_vocal_filename = vocal_output_filename + \".wav\"\n    expected_instrumental_filename = 
instrumental_output_filename + \".wav\"\n    \n    # Check that the returned filenames match the expected names\n    output_filenames = [os.path.basename(f) for f in output_files]\n    print(f\"Extracted filenames from output_files: {output_filenames}\")\n    \n    # NOTE: The Separator class returns only the filenames, not the full paths with output_dir\n    print(\"EXPECTED BEHAVIOR: The Separator.separate() method returns filenames without the output_dir path\")\n    print(f\"Expected filenames (without path): {expected_vocal_filename} and {expected_instrumental_filename}\")\n    \n    assert expected_vocal_filename in output_filenames, f\"Expected {expected_vocal_filename} in output files\"\n    assert expected_instrumental_filename in output_filenames, f\"Expected {expected_instrumental_filename} in output files\"\n    \n    # Check that files physically exist in the specified output directory\n    expected_vocal_path = os.path.join(temp_output_dir, expected_vocal_filename)\n    expected_instrumental_path = os.path.join(temp_output_dir, expected_instrumental_filename)\n    \n    print(f\"Checking that files exist in output_dir: {temp_output_dir}\")\n    print(f\"Full expected vocal path: {expected_vocal_path}\")\n    print(f\"Full expected instrumental path: {expected_instrumental_path}\")\n    \n    assert os.path.exists(expected_vocal_path), f\"Vocals output file doesn't exist: {expected_vocal_path}\"\n    assert os.path.exists(expected_instrumental_path), f\"Instrumental output file doesn't exist: {expected_instrumental_path}\"\n    assert os.path.getsize(expected_vocal_path) > 0, f\"Vocals output file is empty: {expected_vocal_path}\"\n    assert os.path.getsize(expected_instrumental_path) > 0, f\"Instrumental output file is empty: {expected_instrumental_path}\"\n    \n    print(\"✅ Test passed: Separator correctly handles output_dir and custom_output_names\")\n    print(\"   - Files were saved to the specified output directory\")\n    print(\"   - Custom 
filenames were used (with .wav extension added)\")\n    print(\"   - Returned paths include just the filenames (not the full paths)\")\n\n\ndef test_separator_single_stem_output(input_file, temp_output_dir):\n    \"\"\"Test that Separator correctly respects output_single_stem with custom output name.\"\"\"\n    print(\"\\n>>> TEST: Checking output_single_stem with custom output name\")\n    \n    # Define custom output filename for single stem\n    vocal_output_filename = \"only_vocals_output\"\n\n    # Create output name mapping\n    custom_output_names = {\"Vocals\": vocal_output_filename}\n\n    # Initialize separator with specified output directory and single stem output\n    print(f\"Creating Separator with output_dir: {temp_output_dir} and output_single_stem: Vocals\")\n    separator = Separator(\n        output_dir=temp_output_dir,\n        output_single_stem=\"Vocals\",  # Only extract vocals\n        log_level=20  # INFO level\n    )\n\n    # Load model\n    print(\"Loading model: MGM_MAIN_v4.pth\")\n    separator.load_model(model_filename=\"MGM_MAIN_v4.pth\")\n\n    # Run separation with custom output name\n    print(f\"Running separation with custom_output_names: {custom_output_names}\")\n    output_files = separator.separate(input_file, custom_output_names=custom_output_names)\n    print(f\"Separator.separate() returned: {output_files}\")\n\n    # The separator adds .wav extension since the default output format is WAV\n    expected_vocal_filename = vocal_output_filename + \".wav\"\n    \n    # Check that the output files list contains only the expected file\n    print(f\"Checking that only one file was returned and it has the correct name\")\n    assert len(output_files) == 1, f\"Expected 1 output file, got {len(output_files)}\"\n    assert os.path.basename(output_files[0]) == expected_vocal_filename, f\"Expected {expected_vocal_filename} in output files\"\n    \n    # Check that file physically exists in the specified output directory\n    
expected_vocal_path = os.path.join(temp_output_dir, expected_vocal_filename)\n    print(f\"Checking that file exists in output_dir: {expected_vocal_path}\")\n    \n    assert os.path.exists(expected_vocal_path), f\"Vocals output file doesn't exist: {expected_vocal_path}\"\n    assert os.path.getsize(expected_vocal_path) > 0, f\"Vocals output file is empty: {expected_vocal_path}\"\n    \n    # Make sure the instrumental file was NOT created\n    instrumental_files = [f for f in os.listdir(temp_output_dir) if \"instrumental\" in f.lower()]\n    print(f\"Checking that no instrumental files were created, found: {instrumental_files}\")\n    assert len(instrumental_files) == 0, f\"No instrumental file should be created when using output_single_stem, found: {instrumental_files}\"\n    \n    print(\"✅ Test passed: Separator correctly handles output_single_stem with custom output name\")\n    print(\"   - Only the specified stem (Vocals) was extracted\")\n    print(\"   - The custom filename was used (with .wav extension added)\")\n    print(\"   - No other stems (like Instrumental) were created\")\n\n\ndef test_separator_output_without_custom_names(input_file, temp_output_dir):\n    \"\"\"Test that Separator respects output_dir without custom_output_names.\"\"\"\n    print(\"\\n>>> TEST: Checking output_dir without custom output names\")\n    \n    # Initialize separator with specified output directory\n    print(f\"Creating Separator with output_dir: {temp_output_dir}\")\n    separator = Separator(output_dir=temp_output_dir, log_level=20)  # INFO level\n\n    # Load model\n    print(\"Loading model: MGM_MAIN_v4.pth\")\n    separator.load_model(model_filename=\"MGM_MAIN_v4.pth\")\n\n    # Run separation without custom output names\n    print(\"Running separation without custom_output_names\")\n    output_files = separator.separate(input_file)\n    print(f\"Separator.separate() returned: {output_files}\")\n\n    # Check that output files exist and have content\n    
print(\"Checking that two files were returned (one for each stem)\")\n    assert len(output_files) == 2, f\"Expected 2 output files, got {len(output_files)}\"\n    \n    # Check if the files were created in the output directory\n    # Note: The separator doesn't include the full path in the returned output_files\n    output_file_basenames = [os.path.basename(f) for f in output_files]\n    print(f\"Extracted filenames from output_files: {output_file_basenames}\")\n    \n    print(\"EXPECTED BEHAVIOR: Default filenames are being used (format: inputname_(StemName)_modelname.wav)\")\n    \n    # Check all output files exist in the specified output directory\n    print(f\"Checking that files exist in output_dir: {temp_output_dir}\")\n    for basename in output_file_basenames:\n        full_path = os.path.join(temp_output_dir, basename)\n        print(f\"Checking file: {full_path}\")\n        assert os.path.exists(full_path), f\"Output file doesn't exist: {full_path}\"\n        assert os.path.getsize(full_path) > 0, f\"Output file is empty: {full_path}\"\n    \n    print(\"✅ Test passed: Separator correctly handles output_dir without custom output names\")\n    print(\"   - Files were saved to the specified output directory\")\n    print(\"   - Default naming scheme was used (input_name_(Stem)_model.wav)\")\n    print(\"   - Returned paths include just the filenames (not the full paths)\")\n"
  },
  {
    "path": "tests/model-metrics/test-all-models.py",
    "content": "#!/usr/bin/env python\nimport os\nimport time\nimport museval\nimport numpy as np\nimport soundfile as sf\nfrom audio_separator.separator import Separator\nimport json\nfrom json import JSONEncoder\nimport logging\nimport musdb\nfrom decimal import Decimal\nimport tempfile\nimport argparse\n\n\n# Setup logging\nlogging.basicConfig(level=logging.INFO, format=\"%(asctime)s - %(levelname)s - %(message)s\")\nlogger = logging.getLogger(__name__)\n\n\n# Custom JSON Encoder to handle Decimal types\nclass DecimalEncoder(JSONEncoder):\n    def default(self, obj):\n        if isinstance(obj, Decimal):\n            return float(obj)\n        return super().default(obj)\n\n\nMUSDB_PATH = \"/Volumes/Nomad4TBOne/python-audio-separator/tests/model-metrics/datasets/musdb18hq\"\nRESULTS_PATH = \"/Volumes/Nomad4TBOne/python-audio-separator/tests/model-metrics/results\"\n# Find project root dynamically\nimport os\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\nwhile project_root and not os.path.exists(os.path.join(project_root, 'audio_separator')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\nCOMBINED_RESULTS_PATH = os.path.join(project_root, \"audio_separator\", \"models-scores.json\")\nCOMBINED_MUSEVAL_RESULTS_PATH = \"/Volumes/Nomad4TBOne/python-audio-separator/tests/model-metrics/results/combined-museval-results.json\"\nSTOP_SIGNAL_PATH = \"/Volumes/Nomad4TBOne/python-audio-separator/tests/model-metrics/stop-signal\"\n\n\ndef load_combined_results():\n    \"\"\"Load the combined museval results file\"\"\"\n    if os.path.exists(COMBINED_MUSEVAL_RESULTS_PATH):\n        logger.info(\"Loading combined museval results...\")\n        try:\n            with open(COMBINED_MUSEVAL_RESULTS_PATH, \"r\") as f:\n                # Use a custom parser to handle Decimal values\n                def decimal_parser(dct):\n                    
for k, v in dct.items():\n                        if isinstance(v, str) and v.replace(\".\", \"\").isdigit():\n                            try:\n                                dct[k] = float(v)\n                            except (ValueError, TypeError):\n                                pass\n                    return dct\n\n                return json.load(f, object_hook=decimal_parser)\n        except Exception as e:\n            logger.error(f\"Error loading combined results: {str(e)}\")\n            # Try to load a backup file if it exists\n            backup_path = COMBINED_MUSEVAL_RESULTS_PATH + \".backup\"\n            if os.path.exists(backup_path):\n                logger.info(\"Attempting to load backup file...\")\n                try:\n                    with open(backup_path, \"r\") as f:\n                        return json.load(f, object_hook=decimal_parser)\n                except Exception as backup_e:\n                    logger.error(f\"Error loading backup file: {str(backup_e)}\")\n            return {}\n    else:\n        logger.info(\"No combined results file found, creating new one\")\n        return {}\n\n\ndef save_combined_results(combined_results):\n    \"\"\"Save the combined museval results file\"\"\"\n    logger.info(\"Saving combined museval results...\")\n    try:\n        # Create a backup of the existing file if it exists\n        if os.path.exists(COMBINED_MUSEVAL_RESULTS_PATH):\n            backup_path = COMBINED_MUSEVAL_RESULTS_PATH + \".backup\"\n            try:\n                with open(COMBINED_MUSEVAL_RESULTS_PATH, \"r\") as src, open(backup_path, \"w\") as dst:\n                    dst.write(src.read())\n            except Exception as e:\n                logger.error(f\"Error creating backup file: {str(e)}\")\n\n        # Save the new results using the custom encoder\n        with open(COMBINED_MUSEVAL_RESULTS_PATH, \"w\") as f:\n            json.dump(combined_results, f, cls=DecimalEncoder, indent=2)\n        
logger.info(\"Combined results saved successfully\")\n        return True\n    except Exception as e:\n        logger.error(f\"Error saving combined results: {str(e)}\")\n        return False\n\n\ndef update_combined_results(model_name, track_name, track_data):\n    \"\"\"Update the combined results file with new track data\"\"\"\n    try:\n        # Load existing combined results\n        combined_results = load_combined_results()\n\n        # Initialize model entry if it doesn't exist\n        if model_name not in combined_results:\n            combined_results[model_name] = {}\n\n        # Add or update track data\n        combined_results[model_name][track_name] = track_data\n\n        # Write updated results back to file\n        save_combined_results(combined_results)\n        return True\n    except Exception as e:\n        logger.error(f\"Error updating combined results: {str(e)}\")\n        return False\n\n\ndef check_track_evaluated(model_name, track_name):\n    \"\"\"Check if a track has already been evaluated for a specific model\"\"\"\n    combined_results = load_combined_results()\n    return model_name in combined_results and track_name in combined_results[model_name]\n\n\ndef get_track_results(model_name, track_name):\n    \"\"\"Get the evaluation results for a specific track and model\"\"\"\n    combined_results = load_combined_results()\n    if model_name in combined_results and track_name in combined_results[model_name]:\n        return combined_results[model_name][track_name]\n    return None\n\n\ndef get_track_duration(track_path):\n    \"\"\"Get the duration of a track in minutes\"\"\"\n    try:\n        mixture_path = os.path.join(track_path, \"mixture.wav\")\n        info = sf.info(mixture_path)\n        return info.duration / 60.0  # Convert seconds to minutes\n    except Exception as e:\n        logger.error(f\"Error getting track duration: {str(e)}\")\n        return 0.0\n\n\ndef evaluate_track(track_name, track_path, test_model, 
mus_db):\n    \"\"\"Evaluate a single track using a specific model\"\"\"\n    logger.info(f\"Evaluating track: {track_name} with model: {test_model}\")\n\n    # Get track duration in minutes\n    track_duration_minutes = get_track_duration(track_path)\n    logger.info(f\"Track duration: {track_duration_minutes:.2f} minutes\")\n\n    # Initialize variables to track processing time\n    processing_time = 0\n    seconds_per_minute = 0\n\n    # Create a basic result structure that will be returned even if evaluation fails\n    basic_model_results = {\"track_name\": track_name, \"scores\": {}}\n\n    # Check if evaluation results already exist in combined file\n    museval_results = load_combined_results()\n    if test_model in museval_results and track_name in museval_results[test_model]:\n        logger.info(\"Found existing evaluation results in combined file...\")\n        track_data = museval_results[test_model][track_name]\n        scores = museval.TrackStore(track_name)\n        scores.scores = track_data\n\n        # Try to extract existing speed metrics if available\n        try:\n            if isinstance(track_data, dict) and \"targets\" in track_data:\n                for target in track_data[\"targets\"]:\n                    if \"metrics\" in target and \"seconds_per_minute_m3\" in target[\"metrics\"]:\n                        basic_model_results[\"scores\"][\"seconds_per_minute_m3\"] = target[\"metrics\"][\"seconds_per_minute_m3\"]\n                        break\n        except Exception:\n            pass  # Ignore errors in extracting existing speed metrics\n    else:\n        # Expanded stem mapping to include \"no-stem\" outputs and custom stem formats\n        stem_mapping = {\n            # Standard stems\n            \"Vocals\": \"vocals\",\n            \"Instrumental\": \"instrumental\",\n            \"Drums\": \"drums\",\n            \"Bass\": \"bass\",\n            \"Other\": \"other\",\n            # No-stem variants\n            \"No Drums\": 
\"nodrums\",\n            \"No Bass\": \"nobass\",\n            \"No Other\": \"noother\",\n            # Custom stem formats (with hyphens)\n            \"Drum-Bass\": \"drumbass\",\n            \"No Drum-Bass\": \"nodrumbass\",\n            \"Vocals-Other\": \"vocalsother\",\n            \"No Vocals-Other\": \"novocalsother\",\n        }\n\n        # Create a temporary directory for separation files\n        with tempfile.TemporaryDirectory() as temp_dir:\n            logger.info(f\"Using temporary directory: {temp_dir}\")\n\n            # Measure separation time\n            start_time = time.time()\n\n            # Perform separation\n            logger.info(\"Performing separation...\")\n            separator = Separator(output_dir=temp_dir)\n            separator.load_model(model_filename=test_model)\n            separator.separate(os.path.join(track_path, \"mixture.wav\"), custom_output_names=stem_mapping)\n\n            # Calculate processing time\n            processing_time = time.time() - start_time\n            seconds_per_minute = processing_time / track_duration_minutes if track_duration_minutes > 0 else 0\n            logger.info(f\"Separation completed in {processing_time:.2f} seconds\")\n            logger.info(f\"Processing speed: {seconds_per_minute:.2f} seconds per minute of audio\")\n\n            # Always add the speed metric to our basic results\n            basic_model_results[\"scores\"][\"seconds_per_minute_m3\"] = round(seconds_per_minute, 1)\n\n            # Check which stems were actually created\n            wav_files = [f for f in os.listdir(temp_dir) if f.endswith(\".wav\")]\n            logger.info(f\"Found WAV files: {wav_files}\")\n\n            # Determine if this is a standard vocal/instrumental model that can be evaluated with museval\n            standard_model = False\n            if len(wav_files) == 2:\n                # Check if one of the files is named vocals.wav or instrumental.wav\n                if \"vocals.wav\" in 
wav_files and \"instrumental.wav\" in wav_files:\n                    standard_model = True\n                    logger.info(\"Detected standard vocals/instrumental model, will run museval evaluation\")\n\n            # If not a standard model, skip museval evaluation and just return speed metrics\n            if not standard_model:\n                logger.info(f\"Non-standard stem configuration detected for model {test_model}, skipping museval evaluation\")\n\n                # Store the speed metric in the combined results\n                if test_model not in museval_results:\n                    museval_results[test_model] = {}\n\n                # Create a minimal structure for the speed metric\n                minimal_results = {\"targets\": [{\"name\": \"speed_metrics_only\", \"metrics\": {\"seconds_per_minute_m3\": round(seconds_per_minute, 1)}}]}\n\n                museval_results[test_model][track_name] = minimal_results\n                save_combined_results(museval_results)\n\n                return None, basic_model_results\n\n            # For standard models, proceed with museval evaluation\n            available_stems = {}\n            available_stems[\"vocals\"] = os.path.join(temp_dir, \"vocals.wav\")\n            available_stems[\"accompaniment\"] = os.path.join(temp_dir, \"instrumental.wav\")\n\n            # Get track from MUSDB\n            track = next((t for t in mus_db if t.name == track_name), None)\n            if track is None:\n                raise ValueError(f\"Track {track_name} not found in MUSDB18\")\n\n            # Load available stems\n            estimates = {}\n            for stem_name, stem_path in available_stems.items():\n                audio, _ = sf.read(stem_path)\n                if len(audio.shape) == 1:\n                    audio = np.expand_dims(audio, axis=1)\n                estimates[stem_name] = audio\n\n            # Evaluate using museval\n            logger.info(f\"Evaluating stems: 
{list(estimates.keys())}\")\n            try:\n                scores = museval.eval_mus_track(track, estimates, output_dir=temp_dir, mode=\"v4\")\n\n                # Add the speed metric to the scores\n                if not hasattr(scores, \"speed_metric_added\"):\n                    for target in scores.scores[\"targets\"]:\n                        if \"metrics\" not in target:\n                            target[\"metrics\"] = {}\n                        target[\"metrics\"][\"seconds_per_minute_m3\"] = round(seconds_per_minute, 1)\n                    scores.speed_metric_added = True\n\n                # Update the combined results file with the new evaluation\n                if test_model not in museval_results:\n                    museval_results[test_model] = {}\n                museval_results[test_model][track_name] = scores.scores\n                save_combined_results(museval_results)\n            except Exception as e:\n                logger.error(f\"Error during museval evaluation: {str(e)}\")\n                logger.exception(\"Evaluation exception details:\")\n                # Return basic results with just the speed metric\n                return None, basic_model_results\n\n    try:\n        # Only process museval results if we have them\n        if \"scores\" in locals() and scores is not None:\n            # Calculate aggregate scores for available stems\n            results_store = museval.EvalStore()\n            results_store.add_track(scores.df)\n            methods = museval.MethodStore()\n            methods.add_evalstore(results_store, name=test_model)\n            agg_scores = methods.agg_frames_tracks_scores()\n\n            # Return the aggregate scores in a structured format with 6 significant figures\n            model_results = {\"track_name\": track_name, \"scores\": {}}\n\n            for stem in [\"vocals\", \"drums\", \"bass\", \"other\", \"accompaniment\"]:\n                try:\n                    stem_scores = {metric: 
float(f\"{agg_scores.loc[(test_model, stem, metric)]:.6g}\") for metric in [\"SDR\", \"SIR\", \"SAR\", \"ISR\"]}\n                    # Rename 'accompaniment' to 'instrumental' in the output\n                    output_stem = \"instrumental\" if stem == \"accompaniment\" else stem\n                    model_results[\"scores\"][output_stem] = stem_scores\n                except KeyError:\n                    continue\n\n            # Add the seconds_per_minute_m3 metric if it was calculated\n            if processing_time > 0 and track_duration_minutes > 0:\n                model_results[\"scores\"][\"seconds_per_minute_m3\"] = round(seconds_per_minute, 1)\n\n            return scores, model_results if model_results[\"scores\"] else basic_model_results\n        else:\n            # If we don't have scores, just return the basic results with speed metrics\n            return None, basic_model_results\n\n    except Exception as e:\n        logger.error(f\"Error processing evaluation results: {str(e)}\")\n        logger.exception(\"Results processing exception details:\")\n        # Return basic results with just the speed metric\n        return None, basic_model_results\n\n\ndef convert_decimal_to_float(obj):\n    \"\"\"Recursively converts Decimal objects to floats in a nested structure.\"\"\"\n    if isinstance(obj, Decimal):\n        return float(obj)\n    elif isinstance(obj, dict):\n        return {k: convert_decimal_to_float(v) for k, v in obj.items()}\n    elif isinstance(obj, list):\n        return [convert_decimal_to_float(x) for x in obj]\n    return obj\n\n\ndef calculate_median_scores(track_scores):\n    \"\"\"Calculate median scores across all tracks for each stem and metric\"\"\"\n    # Initialize containers for each stem's metrics\n    stem_metrics = {\n        \"vocals\": {\"SDR\": [], \"SIR\": [], \"SAR\": [], \"ISR\": []},\n        \"drums\": {\"SDR\": [], \"SIR\": [], \"SAR\": [], \"ISR\": []},\n        \"bass\": {\"SDR\": [], \"SIR\": [], \"SAR\": 
[], \"ISR\": []},\n        \"instrumental\": {\"SDR\": [], \"SIR\": [], \"SAR\": [], \"ISR\": []},\n        \"seconds_per_minute_m3\": [],\n    }\n\n    # Collect all scores for each stem and metric\n    for track_score in track_scores:\n        if track_score is not None and \"scores\" in track_score:\n            # Process audio quality metrics\n            for stem, metrics in track_score[\"scores\"].items():\n                if stem in stem_metrics and stem != \"seconds_per_minute_m3\":\n                    for metric, value in metrics.items():\n                        stem_metrics[stem][metric].append(value)\n\n            # Process speed metric separately\n            if \"seconds_per_minute_m3\" in track_score[\"scores\"]:\n                stem_metrics[\"seconds_per_minute_m3\"].append(track_score[\"scores\"][\"seconds_per_minute_m3\"])\n\n    # Calculate medians for each stem and metric\n    median_scores = {}\n    for stem, metrics in stem_metrics.items():\n        if stem != \"seconds_per_minute_m3\" and any(metrics.values()):  # Only include stems that have scores\n            median_scores[stem] = {metric: float(f\"{np.median(values):.6g}\") for metric, values in metrics.items() if values}  # Only include metrics that have values\n\n    # Add median speed metric if available\n    if stem_metrics[\"seconds_per_minute_m3\"]:\n        median_scores[\"seconds_per_minute_m3\"] = round(np.median(stem_metrics[\"seconds_per_minute_m3\"]), 1)\n\n    return median_scores\n\n\ndef check_disk_usage(path):\n    \"\"\"Check inode usage and disk space on the filesystem containing path\"\"\"\n    import subprocess\n    import sys\n\n    # Check disk space first\n    result = subprocess.run([\"df\", \"-h\", path], capture_output=True, text=True)\n    output = result.stdout\n    logger.info(f\"Current disk usage:\\n{output}\")\n\n    # Parse the output to get disk usage percentage\n    lines = output.strip().split(\"\\n\")\n    if len(lines) >= 2:\n        parts = 
lines[1].split()\n        if len(parts) >= 5:\n            try:\n                # Extract disk usage percentage\n                disk_usage_str = parts[4].rstrip(\"%\")\n                disk_usage_pct = int(disk_usage_str)\n\n                logger.info(f\"Disk usage: {disk_usage_pct}%\")\n\n                if disk_usage_pct >= 99:\n                    logger.critical(\"CRITICAL: Disk is almost full (>99%)! Cannot continue processing.\")\n                    logger.critical(\"Please free up disk space before continuing.\")\n                    sys.exit(1)\n                elif disk_usage_pct > 95:\n                    logger.warning(f\"WARNING: High disk usage ({disk_usage_pct}%)!\")\n            except (ValueError, IndexError) as e:\n                logger.error(f\"Error parsing disk usage: {str(e)}\")\n\n    # Now check inode usage\n    result = subprocess.run([\"df\", \"-i\", path], capture_output=True, text=True)\n    output = result.stdout\n    logger.info(f\"Current inode usage:\\n{output}\")\n\n    # Parse the output to get inode usage percentage\n    lines = output.strip().split(\"\\n\")\n    if len(lines) >= 2:\n        # The second line contains the actual data\n        parts = lines[1].split()\n        if len(parts) >= 8:  # macOS df -i format has 8 columns\n            try:\n                # On macOS, inode usage is in the 8th column as a percentage\n                inode_usage_str = parts[7].rstrip(\"%\")\n                inode_usage_pct = int(inode_usage_str)\n\n                # Also extract the actual inode numbers for better reporting\n                iused = int(parts[5])\n                ifree = int(parts[6])\n                total_inodes = iused + ifree\n\n                # Skip inode check for exFAT or similar filesystems\n                if total_inodes <= 1:\n                    logger.info(\"Filesystem appears to be exFAT or similar (no real inode tracking). 
Skipping inode check.\")\n                    return None\n\n                logger.info(f\"Inode usage: {iused:,}/{total_inodes:,} ({inode_usage_pct}%)\")\n\n                if inode_usage_pct >= 100:\n                    logger.critical(\"CRITICAL: Inode usage is at 100%! Cannot continue processing.\")\n                    logger.critical(\"Please free up inodes before continuing.\")\n                    sys.exit(1)\n                elif inode_usage_pct > 90:\n                    logger.warning(f\"WARNING: High inode usage ({inode_usage_pct}%)!\")\n\n                return inode_usage_pct\n            except (ValueError, IndexError) as e:\n                logger.error(f\"Error parsing inode usage: {str(e)}\")\n\n    return None\n\n\ndef get_evaluated_track_count(model_name, museval_results):\n    \"\"\"Get the number of tracks evaluated for a specific model\"\"\"\n    if model_name in museval_results:\n        return len(museval_results[model_name])\n    return 0\n\n\ndef get_most_evaluated_tracks(museval_results, min_count=10):\n    \"\"\"Get tracks that have been evaluated for the most models\"\"\"\n    track_counts = {}\n\n    # Count how many models have evaluated each track\n    for model_name, tracks in museval_results.items():\n        for track_name in tracks:\n            if track_name not in track_counts:\n                track_counts[track_name] = 0\n            track_counts[track_name] += 1\n\n    # Sort tracks by evaluation count (descending)\n    sorted_tracks = sorted(track_counts.items(), key=lambda x: x[1], reverse=True)\n\n    # Return tracks that have been evaluated at least min_count times\n    return [track for track, count in sorted_tracks if count >= min_count]\n\n\ndef generate_summary_statistics(\n    start_time, models_processed, tracks_processed, models_with_new_data, tracks_evaluated, total_processing_time, fastest_model=None, slowest_model=None, combined_results_path=None, is_dry_run=False\n):\n    \"\"\"Generate a summary of the 
script's execution\"\"\"\n    end_time = time.time()\n    total_runtime = end_time - start_time\n\n    # Format the runtime\n    hours, remainder = divmod(total_runtime, 3600)\n    minutes, seconds = divmod(remainder, 60)\n    runtime_str = f\"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}\"\n\n    # Build the summary\n    summary = [\n        \"=\" * 80,\n        \"DRY RUN SUMMARY - PREVIEW ONLY\" if is_dry_run else \"EXECUTION SUMMARY\",\n        \"=\" * 80,\n        f\"Total runtime: {runtime_str}\",\n        f\"Models {'that would be' if is_dry_run else ''} processed: {models_processed}\",\n        f\"Models {'that would receive' if is_dry_run else 'with'} new data: {len(models_with_new_data)}\",\n        f\"Total tracks {'that would be' if is_dry_run else ''} evaluated: {tracks_evaluated}\",\n        f\"Average tracks per model: {tracks_evaluated / len(models_with_new_data) if models_with_new_data else 0:.2f}\",\n    ]\n\n    if fastest_model:\n        summary.append(f\"Fastest model: {fastest_model['name']} ({fastest_model['speed']:.2f} seconds per minute)\")\n\n    if slowest_model:\n        summary.append(f\"Slowest model: {slowest_model['name']} ({slowest_model['speed']:.2f} seconds per minute)\")\n\n    if total_processing_time > 0:\n        summary.append(f\"Total audio processing time: {total_processing_time:.2f} seconds\")\n\n    if combined_results_path and os.path.exists(combined_results_path):\n        file_size = os.path.getsize(combined_results_path) / (1024 * 1024)  # Size in MB\n        summary.append(f\"Results file size: {file_size:.2f} MB\")\n\n    # Add models with new data\n    if models_with_new_data:\n        summary.append(f\"\\nModels {'that would receive' if is_dry_run else 'with'} new evaluation data:\")\n        for model_name in models_with_new_data:\n            summary.append(f\"- {model_name}\")\n\n    # Add dry run disclaimer if needed\n    if is_dry_run:\n        summary.append(\"\\nNOTE: This is a dry run summary. 
No actual changes were made.\")\n        summary.append(\"Run without --dry-run to perform actual evaluations.\")\n\n    summary.append(\"=\" * 80)\n    return \"\\n\".join(summary)\n\n\ndef check_stop_signal():\n    \"\"\"Check if the stop signal file exists\"\"\"\n    if os.path.exists(STOP_SIGNAL_PATH):\n        logger.info(\"Stop signal detected at: \" + STOP_SIGNAL_PATH)\n        return True\n    return False\n\n\ndef main():\n    # Add command line argument parsing for dry run mode\n    parser = argparse.ArgumentParser(description=\"Run model evaluation on MUSDB18 dataset\")\n    parser.add_argument(\"--dry-run\", action=\"store_true\", help=\"Run in dry-run mode (no writes)\")\n    parser.add_argument(\"--max-tracks\", type=int, default=10, help=\"Maximum number of tracks to evaluate per model\")\n    parser.add_argument(\"--max-models\", type=int, default=None, help=\"Maximum number of models to evaluate\")\n    args = parser.parse_args()\n\n    # Remove any existing stop signal file at start\n    if os.path.exists(STOP_SIGNAL_PATH):\n        os.remove(STOP_SIGNAL_PATH)\n        logger.info(\"Removed existing stop signal file\")\n\n    # Track start time for progress reporting\n    start_time = time.time()\n\n    # Statistics tracking\n    models_processed = 0\n    tracks_processed = 0\n    models_with_new_data = set()\n    total_processing_time = 0\n    fastest_model = {\"name\": \"\", \"speed\": float(\"inf\")}  # Initialize with infinity for comparison\n    slowest_model = {\"name\": \"\", \"speed\": 0}  # Initialize with zero for comparison\n\n    # Create a results cache manager\n    class ResultsCache:\n        def __init__(self):\n            self.results = load_combined_results()\n            self.last_update_time = time.time()\n\n        def get_results(self, force=False):\n            current_time = time.time()\n            # Only reload from disk every 5 minutes unless forced\n            if force or (current_time - self.last_update_time) > 
300:\n                self.results = load_combined_results()\n                self.last_update_time = current_time\n            return self.results\n\n    results_cache = ResultsCache()\n\n    # Helper function for logging with elapsed time\n    def log_with_time(message, level=logging.INFO):\n        elapsed = time.time() - start_time\n        hours, remainder = divmod(elapsed, 3600)\n        minutes, seconds = divmod(remainder, 60)\n        time_str = f\"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}\"\n        logger.log(level, f\"[{time_str}] {message}\")\n\n    if args.dry_run:\n        log_with_time(\"*** RUNNING IN DRY-RUN MODE - NO DATA WILL BE MODIFIED ***\")\n\n    log_with_time(\"Starting model evaluation script...\")\n    os.makedirs(RESULTS_PATH, exist_ok=True)\n\n    # Check disk space and inode usage at start\n    check_disk_usage(RESULTS_PATH)\n\n    # Load existing results if available\n    combined_results = {}\n    if os.path.exists(COMBINED_RESULTS_PATH):\n        log_with_time(\"Loading existing combined results...\")\n        with open(COMBINED_RESULTS_PATH) as f:\n            combined_results = json.load(f)\n\n    # Get initial museval results\n    museval_results = results_cache.get_results()\n    log_with_time(f\"Loaded combined museval results with {len(museval_results)} models\")\n\n    # Get the most commonly evaluated tracks\n    common_tracks = get_most_evaluated_tracks(museval_results)\n    log_with_time(f\"Found {len(common_tracks)} commonly evaluated tracks\")\n\n    # Initialize MUSDB\n    log_with_time(\"Initializing MUSDB database...\")\n    mus = musdb.DB(root=MUSDB_PATH, is_wav=True)\n\n    # Create a prioritized list of tracks\n    all_tracks = []\n    for track in mus.tracks:\n        # Check if this is a commonly evaluated track\n        is_common = track.name in common_tracks\n        all_tracks.append({\"name\": track.name, \"path\": os.path.dirname(track.path), \"is_common\": is_common})\n\n    # Sort tracks by 
whether they're commonly evaluated\n    all_tracks.sort(key=lambda t: 0 if t[\"is_common\"] else 1)\n\n    # Get list of all available models\n    log_with_time(\"Getting list of available models...\")\n    separator = Separator()\n    models_by_type = separator.list_supported_model_files()\n\n    # Flatten the models list and prioritize them\n    all_models = []\n    for model_type, models in models_by_type.items():\n        for model_name, model_info in models.items():\n            filename = model_info.get(\"filename\")\n            if filename:\n                # Count how many tracks have been evaluated for this model\n                evaluated_count = get_evaluated_track_count(filename, museval_results)\n\n                # Determine if this is a roformer model\n                is_roformer = \"roformer\" in model_name.lower()\n\n                # Add to the list with priority information\n                all_models.append({\"name\": model_name, \"filename\": filename, \"type\": model_type, \"info\": model_info, \"evaluated_count\": evaluated_count, \"is_roformer\": is_roformer})\n\n    # Sort models by priority:\n    # 1. Roformer models with fewer than max_tracks evaluations\n    # 2. Other models with fewer than max_tracks evaluations\n    # 3. Roformer models with more evaluations\n    # 4. Other models with more evaluations\n    all_models.sort(\n        key=lambda m: (\n            0 if m[\"is_roformer\"] and m[\"evaluated_count\"] < args.max_tracks else 1 if not m[\"is_roformer\"] and m[\"evaluated_count\"] < args.max_tracks else 2 if m[\"is_roformer\"] else 3,\n            m[\"evaluated_count\"],  # Secondary sort by number of evaluations (ascending)\n        )\n    )\n\n    # Log the prioritized models\n    log_with_time(f\"Prioritized {len(all_models)} models for evaluation:\")\n    for i, model in enumerate(all_models[:10]):  # Show top 10\n        log_with_time(f\"{i+1}. 
{model['name']} ({model['filename']}) - {model['evaluated_count']} tracks evaluated, roformer: {model['is_roformer']}\")\n\n    if len(all_models) > 10:\n        log_with_time(f\"... and {len(all_models) - 10} more models\")\n\n    # Limit the number of models if specified\n    if args.max_models:\n        all_models = all_models[: args.max_models]\n        log_with_time(f\"Limited to {args.max_models} models for this run\")\n\n    # Process models according to priority\n    model_idx = 0\n    stop_requested = False\n    while model_idx < len(all_models):\n        # Check for stop signal before processing each model\n        if check_stop_signal():\n            log_with_time(\"Stop signal detected. Will finish current model's tracks and then exit.\")\n            stop_requested = True\n\n        model = all_models[model_idx]\n        model_name = model[\"name\"]\n        model_filename = model[\"filename\"]\n        model_type = model[\"type\"]\n\n        progress_pct = (model_idx + 1) / len(all_models) * 100\n        log_with_time(f\"\\n=== Processing model {model_idx+1}/{len(all_models)} ({progress_pct:.1f}%): {model_name} ({model_filename}) ===\")\n\n        # Initialize model entry if it doesn't exist\n        if model_filename not in combined_results:\n            log_with_time(f\"Initializing new entry for {model_filename}\")\n            combined_results[model_filename] = {\"model_name\": model_name, \"track_scores\": [], \"median_scores\": {}, \"stems\": [], \"target_stem\": None}\n\n        # Try to load the model to get stem information\n        try:\n            separator.load_model(model_filename=model_filename)\n            model_data = separator.model_instance.model_data\n\n            # Extract stem information (similar to your existing code)\n            # ... 
(keep your existing stem extraction logic here)\n\n        except Exception as e:\n            log_with_time(f\"Error loading model {model_filename}: {str(e)}\", logging.ERROR)\n            logger.exception(\"Full exception details:\")\n            model_idx += 1\n            continue\n\n        # Count how many tracks have been evaluated for this model\n        # Use the cached results\n        evaluated_count = get_evaluated_track_count(model_filename, results_cache.get_results())\n\n        # Determine how many more tracks to evaluate\n        tracks_to_evaluate = max(0, args.max_tracks - evaluated_count)\n\n        if tracks_to_evaluate == 0:\n            log_with_time(f\"Model {model_name} already has {evaluated_count} tracks evaluated (>= {args.max_tracks}). Skipping.\")\n            model_idx += 1\n            continue\n\n        log_with_time(f\"Will evaluate up to {tracks_to_evaluate} tracks for model {model_name}\")\n\n        # Process tracks for this model (per-model counter; tracks_processed keeps the running total for the summary)\n        model_tracks_processed = 0\n        for track in all_tracks:\n            # Check for stop signal before each track if we haven't already detected it\n            if not stop_requested and check_stop_signal():\n                log_with_time(\"Stop signal detected. 
Will finish current track and then exit.\")\n                stop_requested = True\n\n            # Skip if we've processed enough tracks for this model\n            if model_tracks_processed >= tracks_to_evaluate:\n                break\n\n            track_name = track[\"name\"]\n            track_path = track[\"path\"]\n\n            # Skip if track already evaluated for this model\n            # Use the cached results\n            if model_filename in results_cache.get_results() and track_name in results_cache.get_results()[model_filename]:\n                log_with_time(f\"Skipping already evaluated track {track_name} for model: {model_filename}\")\n                continue\n\n            log_with_time(f\"Processing track: {track_name} for model: {model_filename}\")\n\n            if args.dry_run:\n                log_with_time(f\"[DRY RUN] Would evaluate track {track_name} with model {model_filename}\")\n                model_tracks_processed += 1\n                tracks_processed += 1\n                models_with_new_data.add(model_filename)\n\n                # Estimate processing time based on model type for dry run\n                # This is a rough estimate - roformer models are typically slower\n                estimated_speed = 30.0  # Default estimate: 30 seconds per minute\n                if \"roformer\" in model_name.lower():\n                    estimated_speed = 45.0  # Roformer models are typically slower\n                elif \"umx\" in model_name.lower():\n                    estimated_speed = 20.0  # UMX models are typically faster\n\n                # Update statistics with estimated values\n                total_processing_time += estimated_speed\n\n                # Track fastest and slowest models based on estimates\n                if estimated_speed < fastest_model[\"speed\"]:\n                    fastest_model = {\"name\": model_name, \"speed\": estimated_speed}\n                if estimated_speed > slowest_model[\"speed\"]:\n                    slowest_model = {\"name\": model_name, 
\"speed\": estimated_speed}\n\n                continue\n\n            try:\n                result = evaluate_track(track_name, track_path, model_filename, mus)\n\n                # Unpack the result safely\n                if result and isinstance(result, tuple) and len(result) == 2:\n                    _, model_results = result\n                else:\n                    model_results = None\n\n                # Process the results if they exist and are valid\n                if model_results is not None and isinstance(model_results, dict):\n                    combined_results[model_filename][\"track_scores\"].append(model_results)\n                    model_tracks_processed += 1\n                    tracks_processed += 1\n                    models_with_new_data.add(model_filename)\n\n                    # Track processing time statistics - safely access nested dictionaries\n                    scores = model_results.get(\"scores\", {})\n                    if isinstance(scores, dict):\n                        speed = scores.get(\"seconds_per_minute_m3\")\n                        if speed is not None:\n                            total_processing_time += speed  # Accumulate total processing time\n\n                            # Track fastest and slowest models\n                            if speed < fastest_model[\"speed\"]:\n                                fastest_model = {\"name\": model_name, \"speed\": speed}\n                            if speed > slowest_model[\"speed\"]:\n                                slowest_model = {\"name\": model_name, \"speed\": speed}\n                else:\n                    log_with_time(f\"Skipping model {model_filename} for track {track_name} due to no evaluatable stems or invalid results\")\n            except Exception as e:\n                log_with_time(f\"Error evaluating model {model_filename} with track {track_name}: {str(e)}\", logging.ERROR)\n                logger.exception(\"Exception details:\")\n                continue\n\n            # Update 
and save results\n            if combined_results[model_filename][\"track_scores\"]:\n                median_scores = calculate_median_scores(combined_results[model_filename][\"track_scores\"])\n                combined_results[model_filename][\"median_scores\"] = median_scores\n\n            # Save results after each track\n            if not args.dry_run:\n                os.makedirs(os.path.dirname(COMBINED_RESULTS_PATH), exist_ok=True)\n                with open(COMBINED_RESULTS_PATH, \"w\", encoding=\"utf-8\") as f:\n                    json.dump(combined_results, f, indent=2)\n                log_with_time(f\"Updated combined results file with {model_filename} - {track_name}\")\n\n                # Force update the cache after saving\n                results_cache.get_results(force=True)\n            else:\n                log_with_time(f\"[DRY RUN] Would have updated combined results for {model_filename} - {track_name}\")\n\n            # Check disk space periodically\n            check_disk_usage(RESULTS_PATH)\n\n        log_with_time(f\"Completed processing {model_tracks_processed} tracks for model {model_name}\")\n\n        # If stop was requested, exit after completing the current model\n        if stop_requested:\n            log_with_time(\"Stop signal processed. 
Generating final summary before exit.\")\n            break\n\n        # If we're processing a non-roformer model, check if there are roformer models that need evaluation\n        if not model[\"is_roformer\"]:\n            # Find roformer models that still need more evaluations\n            # Use the cached results\n            roformer_models_needing_eval = []\n            for i, m in enumerate(all_models[model_idx + 1 :], start=model_idx + 1):\n                if m[\"is_roformer\"]:\n                    eval_count = get_evaluated_track_count(m[\"filename\"], results_cache.get_results())\n                    if eval_count < args.max_tracks:\n                        roformer_models_needing_eval.append((i, m))\n\n            if roformer_models_needing_eval:\n                log_with_time(f\"Found {len(roformer_models_needing_eval)} roformer models that still need evaluation. Reprioritizing...\")\n\n                # Move these models to the front of the remaining queue, keeping their relative order.\n                # Rebuild the tail in one step: popping and inserting one model at a time shifts the\n                # indices of the models not yet moved, so the wrong models would be relocated.\n                moved_indices = {i for i, _ in roformer_models_needing_eval}\n                moved_models = [m for _, m in roformer_models_needing_eval]\n                remaining_models = [m for j, m in enumerate(all_models[model_idx + 1 :], start=model_idx + 1) if j not in moved_indices]\n                all_models[model_idx + 1 :] = moved_models + remaining_models\n\n                log_with_time(\"Reprioritization complete. 
Continuing with highest priority model.\")\n\n        # Move to the next model\n        model_idx += 1\n        models_processed += 1\n\n    log_with_time(\"Evaluation complete\")\n    # Final disk space check\n    check_disk_usage(RESULTS_PATH)\n\n    # Generate and display summary statistics\n    # Reset fastest/slowest models if they weren't updated\n    if fastest_model[\"speed\"] == float(\"inf\"):\n        fastest_model = None\n    if slowest_model[\"speed\"] == 0:\n        slowest_model = None\n\n    summary = generate_summary_statistics(\n        start_time=start_time,\n        models_processed=models_processed,\n        tracks_processed=tracks_processed,\n        models_with_new_data=models_with_new_data,\n        tracks_evaluated=tracks_processed,\n        total_processing_time=total_processing_time,\n        fastest_model=fastest_model,\n        slowest_model=slowest_model,\n        combined_results_path=COMBINED_RESULTS_PATH,\n        is_dry_run=args.dry_run,\n    )\n\n    log_with_time(\"\\n\" + summary)\n\n    # Also write summary to a log file\n    summary_filename = \"dry_run_summary.log\" if args.dry_run else \"evaluation_summary.log\"\n    if stop_requested:\n        summary_filename = \"stopped_\" + summary_filename\n    summary_log_path = os.path.join(os.path.dirname(COMBINED_RESULTS_PATH), summary_filename)\n    with open(summary_log_path, \"w\") as f:\n        f.write(f\"{'Dry run' if args.dry_run else 'Evaluation'}{' (stopped early)' if stop_requested else ''} completed at: {time.strftime('%Y-%m-%d %H:%M:%S')}\\n\")\n        f.write(summary)\n\n    log_with_time(f\"Summary written to {summary_log_path}\")\n\n    # Clean up stop signal file if it exists\n    if os.path.exists(STOP_SIGNAL_PATH):\n        os.remove(STOP_SIGNAL_PATH)\n        log_with_time(\"Removed stop signal file\")\n\n    return 0 if not stop_requested else 2  # Return different exit code if stopped early\n\n\nif __name__ == \"__main__\":\n    exit(main())\n"
  },
  {
    "path": "tests/regression/test_all_models_stem_verification.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nOn-demand regression test: verify every supported model's output stem labels\nmatch their actual audio content.\n\nThis test runs ALL supported models on the 20-second test audio file and uses\ncorrelation-based analysis to verify each output stem contains what its label\nclaims (e.g., a stem labeled \"Vocals\" actually contains vocal content).\n\nUsage:\n    # Run all models (takes a long time — ~163 models):\n    pytest tests/regression/test_all_models_stem_verification.py -v -s\n\n    # Run a specific architecture:\n    pytest tests/regression/test_all_models_stem_verification.py -v -s -k \"VR\"\n    pytest tests/regression/test_all_models_stem_verification.py -v -s -k \"MDX\"\n    pytest tests/regression/test_all_models_stem_verification.py -v -s -k \"MDXC\"\n    pytest tests/regression/test_all_models_stem_verification.py -v -s -k \"Demucs\"\n\n    # Run a single model:\n    pytest tests/regression/test_all_models_stem_verification.py -v -s -k \"UVR_MDXNET_KARA\"\n\n    # Generate a report without failing on mismatches (dry run):\n    STEM_VERIFY_REPORT_ONLY=1 pytest tests/regression/test_all_models_stem_verification.py -v -s\n\nWhen to run:\n    - After changing stem naming logic in common_separator.py or separator.py\n    - After adding new models or YAML configs to models.json\n    - After modifying the MDXC/VR separator stem assignment code\n    - Periodically as a health check\n\nNOT run in CI — requires downloading all models.\n\"\"\"\n\nimport os\nimport sys\nimport re\nimport tempfile\nimport shutil\nimport json\nimport logging\nimport pytest\nimport numpy as np\nimport librosa\n\nsys.path.insert(0, os.path.join(os.path.dirname(__file__), \"..\"))\nfrom utils_audio_verification import load_references, classify_audio\n\n\nINPUT_FILE = \"tests/inputs/mardy20s.flac\"\n\n# Stems where we can verify content via correlation\nVOCAL_STEMS = {\"vocals\", \"vocal\", \"lead vocals\", \"backing vocals\", 
\"lead_only\", \"backing_only\"}\nINSTRUMENTAL_STEMS = {\"instrumental\", \"inst\", \"karaoke\", \"no_vocals\", \"no vocals\"}\n\n# Stems that are sub-components — we can only verify they're not silent/full-mix\nSUB_STEMS = {\"drums\", \"bass\", \"guitar\", \"piano\", \"other\", \"synthesizer\", \"strings\",\n             \"woodwinds\", \"brass\", \"wind inst\", \"no drums\", \"no bass\", \"no guitar\",\n             \"no piano\", \"no other\", \"no synthesizer\", \"no strings\", \"no woodwinds\",\n             \"no brass\", \"no wind inst\", \"drum-bass\", \"no drum-bass\",\n             # Drumsep sub-stems (individual drum kit parts)\n             \"kick\", \"snare\", \"toms\", \"hh\", \"ride\", \"crash\",\n             # Gender-split vocal sub-stems\n             \"male\", \"female\",\n             # Specialized extraction sub-stems\n             \"aspiration\", \"bleed\", \"no bleed\"}\n\n# Utility model stems (de-echo, de-noise, de-reverb) — these remove subtle artifacts,\n# not vocals/instruments, so the \"cleaned\" stem is expected to be ≈ the original mix\n# and the \"artifact\" stem may be near-silent or unclear on clean source audio.\nUTILITY_STEMS = {\"echo\", \"no echo\", \"reverb\", \"no reverb\", \"noreverb\",\n                 \"noise\", \"no noise\", \"dry\", \"no dry\", \"crowd\", \"no crowd\"}\n\n# Stems that extract a specific subset of vocals — won't match the full vocal reference\nPARTIAL_VOCAL_STEMS = {\"lead vocals\", \"backing vocals\", \"lead_only\", \"backing_only\",\n                       \"with_lead_vocals\", \"with_backing_vocals\"}\n\n# Report-only mode: print results but don't fail\nREPORT_ONLY = os.environ.get(\"STEM_VERIFY_REPORT_ONLY\", \"0\") == \"1\"\n\n\n@pytest.fixture(scope=\"session\")\ndef audio_references():\n    \"\"\"Load reference audio once for the entire test session.\"\"\"\n    return load_references(input_dir=\"tests/inputs\")\n\n\ndef get_all_models():\n    \"\"\"Get all supported models grouped by 
architecture.\"\"\"\n    from audio_separator.separator import Separator\n    sep = Separator(info_only=True, log_level=logging.WARNING)\n    return sep.list_supported_model_files()\n\n\ndef build_model_params():\n    \"\"\"Build pytest parametrize list: (arch, friendly_name, filename).\"\"\"\n    params = []\n    all_models = get_all_models()\n    for arch, models in all_models.items():\n        for friendly_name, info in models.items():\n            filename = info.get(\"filename\", \"\")\n            if not filename:\n                continue\n            test_id = f\"{arch}-{filename}\"\n            params.append(pytest.param(arch, friendly_name, filename, id=test_id))\n    return params\n\n\nMODEL_PARAMS = build_model_params()\n\n\ndef verify_stem_content(stem_path, stem_label, ref_vocal, ref_inst, ref_mix, min_len):\n    \"\"\"Verify a single stem's content matches its label.\n\n    Returns (passed, message) tuple.\n    \"\"\"\n    y, _ = librosa.load(stem_path, sr=44100, mono=True)\n    cv, ci, cm, rms, detected = classify_audio(y, ref_vocal, ref_inst, ref_mix, min_len)\n\n    label_lower = stem_label.lower()\n    issues = []\n\n    is_sub_stem = label_lower in SUB_STEMS\n    is_utility = label_lower in UTILITY_STEMS\n    is_partial_vocal = label_lower in PARTIAL_VOCAL_STEMS\n\n    # Utility stems (de-echo, de-noise, de-reverb) — the \"cleaned\" output is expected\n    # to be ≈ the original mix on clean source audio, and the \"artifact\" stem may be\n    # near-silent. These are not separation errors.\n    if is_utility:\n        return True, f\"OK utility stem (detected={detected}, corr_m={cm:.3f}, rms={rms:.4f})\"\n\n    # Sub-stems (drums, bass, guitar, piano, \"no X\" variants) — can be near-silent\n    # if the source doesn't contain that instrument, and \"No X\" stems can be ≈ the\n    # full mix if X isn't present. 
Both are legitimate, not errors.\n    if is_sub_stem:\n        return True, f\"OK sub-stem (detected={detected}, corr_m={cm:.3f}, rms={rms:.4f})\"\n\n    # Check for silent output\n    if rms < 0.001:\n        return False, f\"SILENT (rms={rms:.6f})\"\n\n    # Check for full mix leak (no stem should be the original mix)\n    if cm > 0.95:\n        return False, f\"FULL_MIX (corr_mix={cm:.3f}) — stem contains the original mix, not a separation\"\n\n    # Partial vocal stems (backing vocals, lead vocals) — won't match the full vocal\n    # reference well, so just verify they're not silent/full-mix (already checked above)\n    if is_partial_vocal:\n        return True, f\"OK partial vocal (detected={detected}, corr_v={cv:.3f})\"\n\n    # Verify vocal-labeled stems\n    if label_lower in VOCAL_STEMS or (\"vocal\" in label_lower and \"no\" not in label_lower):\n        if detected != \"VOCALS\":\n            issues.append(f\"labeled '{stem_label}' but detected {detected} (corr_v={cv:.3f}, corr_i={ci:.3f})\")\n        if cv < 0.7:\n            issues.append(f\"low vocal correlation ({cv:.3f}) for vocal-labeled stem\")\n\n    # Verify instrumental-labeled stems\n    elif label_lower in INSTRUMENTAL_STEMS:\n        if detected != \"INSTRUMENTAL\":\n            issues.append(f\"labeled '{stem_label}' but detected {detected} (corr_v={cv:.3f}, corr_i={ci:.3f})\")\n        if ci < 0.7:\n            issues.append(f\"low instrumental correlation ({ci:.3f}) for instrumental-labeled stem\")\n\n    # Unknown stem type — log but don't fail\n    else:\n        issues.append(f\"unknown stem type '{stem_label}' — cannot verify content (detected={detected})\")\n\n    if issues:\n        return False, \"; \".join(issues)\n    return True, f\"OK (detected={detected}, corr_v={cv:.3f}, corr_i={ci:.3f}, corr_m={cm:.3f})\"\n\n\n# Models that are known to extract a partial/specialized signal — their \"Vocals\"\n# or \"Instrumental\" stems won't match the standard 
references.\nSPECIALIZED_MODEL_PATTERNS = [\"BVE\", \"De-Echo\", \"DeEcho\", \"DeNoise\", \"De-Noise\", \"De-Reverb\", \"DeReverb\"]\n\n\n@pytest.mark.parametrize(\"arch,friendly_name,model_filename\", MODEL_PARAMS)\ndef test_model_stem_labels(arch, friendly_name, model_filename, audio_references, tmp_path):\n    \"\"\"Verify that a model's output stems contain what their labels claim.\"\"\"\n    from audio_separator.separator import Separator\n\n    ref_vocal, ref_inst, ref_mix, min_len = audio_references\n\n    print(f\"\\n  Model: {model_filename} ({arch})\")\n    print(f\"  Friendly name: {friendly_name}\")\n\n    # Check if this is a specialized model where standard verification doesn't apply\n    is_specialized = any(p.lower() in model_filename.lower() or p.lower() in friendly_name.lower()\n                         for p in SPECIALIZED_MODEL_PATTERNS)\n    if is_specialized:\n        print(f\"    (specialized model — relaxed verification)\")\n\n    # Skip Demucs on Python < 3.10\n    if arch == \"Demucs\":\n        import sys\n        if sys.version_info < (3, 10):\n            pytest.skip(\"Demucs requires Python 3.10+\")\n\n    temp_dir = str(tmp_path)\n\n    try:\n        sep = Separator(output_dir=temp_dir, output_format=\"WAV\", log_level=logging.WARNING)\n        sep.load_model(model_filename)\n        output_files = sep.separate(INPUT_FILE)\n    except Exception as e:\n        # Model download or separation failure — report but don't mask as stem issue\n        pytest.skip(f\"Model failed to run: {e}\")\n\n    all_passed = True\n    messages = []\n\n    for output_file in output_files:\n        full_path = output_file if os.path.isabs(output_file) else os.path.join(temp_dir, output_file)\n        if not os.path.exists(full_path):\n            full_path = os.path.join(temp_dir, os.path.basename(output_file))\n\n        fname = os.path.basename(full_path)\n        match = re.search(r'_\\(([^)]+)\\)', fname)\n        stem_label = match.group(1) if match 
else \"Unknown\"\n\n        passed, msg = verify_stem_content(full_path, stem_label, ref_vocal, ref_inst, ref_mix, min_len)\n\n        # Specialized models (BVE, de-echo, de-noise, de-reverb) get relaxed verification —\n        # report failures as warnings but don't count them as test failures\n        if not passed and is_specialized:\n            status = \"WARN\"\n            print(f\"    {stem_label:<20} {status}  {msg} (specialized model, not a failure)\")\n            messages.append((stem_label, True, f\"WARN: {msg}\"))\n        else:\n            status = \"PASS\" if passed else \"FAIL\"\n            print(f\"    {stem_label:<20} {status}  {msg}\")\n            messages.append((stem_label, passed, msg))\n            if not passed:\n                all_passed = False\n\n    if not all_passed and not REPORT_ONLY:\n        failures = [f\"  {label}: {msg}\" for label, passed, msg in messages if not passed]\n        pytest.fail(f\"Stem content mismatch for {model_filename}:\\n\" + \"\\n\".join(failures))\n"
  },
  {
    "path": "tests/regression/test_roformer_size_mismatch.py",
    "content": "\"\"\"\nRegression tests for Roformer size mismatch issues.\nTests the handling of shorter outputs and broadcast errors in overlap_add operations.\n\"\"\"\n\nimport pytest\nimport torch\nimport numpy as np\nimport unittest\nfrom unittest.mock import Mock, patch, MagicMock\n\n\nclass TestRoformerSizeMismatch(unittest.TestCase):\n    \"\"\"Regression tests for size mismatch issues in Roformer processing.\"\"\"\n\n    def setUp(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.batch_size = 2\n        self.channels = 2\n        self.sample_rate = 44100\n        \n        # Test cases with different output lengths that have caused issues\n        self.problematic_lengths = [1, 16, 32, 236, 512, 1024, 2048]\n\n    def test_overlap_add_safe_lengths(self):\n        \"\"\"T062: Reproduce shorter outputs (N∈{1,16,32,236}) and assert no broadcast errors, output length preserved.\"\"\"\n        \n        # Test each problematic length\n        for output_length in self.problematic_lengths:\n            with self.subTest(output_length=output_length):\n                self._test_single_output_length(output_length)\n\n    def _test_single_output_length(self, output_length):\n        \"\"\"Test overlap_add with a specific output length.\"\"\"\n        # Setup test parameters\n        target_length = 10000  # Longer target buffer\n        chunk_size = 8192\n        step_size = 1024\n        \n        # Create mock model output with specific length\n        model_output = torch.randn(self.batch_size, self.channels, output_length)\n        target_buffer = torch.zeros(self.batch_size, self.channels, target_length)\n        counter = torch.zeros(self.batch_size, self.channels, target_length)\n        \n        # Mock overlap_add function that handles size mismatches safely\n        def mock_overlap_add_safe(output, target, counter, start_idx, expected_len):\n            \"\"\"Safe overlap_add that handles shorter outputs.\"\"\"\n            # Get actual 
output length\n            actual_len = output.shape[-1]\n            \n            # Calculate safe length to add\n            remaining_target = target.shape[-1] - start_idx\n            safe_len = min(actual_len, expected_len, remaining_target)\n            \n            if safe_len <= 0:\n                return 0  # Nothing to add\n            \n            # Ensure no broadcast errors by explicit slicing\n            end_idx = start_idx + safe_len\n            \n            try:\n                # Add output to target buffer\n                target_slice = target[..., start_idx:end_idx]\n                output_slice = output[..., :safe_len]\n                \n                # Verify shapes match before adding\n                assert target_slice.shape == output_slice.shape, (\n                    f\"Shape mismatch: target_slice {target_slice.shape} != output_slice {output_slice.shape}\"\n                )\n                \n                target[..., start_idx:end_idx] += output_slice\n                counter[..., start_idx:end_idx] += 1.0\n                \n                return safe_len\n                \n            except Exception as e:\n                pytest.fail(f\"Overlap add failed for output_length {output_length}: {e}\")\n        \n        # Test overlap_add with different starting positions\n        start_positions = [0, 1000, 5000, 8000]\n        \n        for start_idx in start_positions:\n            if start_idx >= target_length:\n                continue\n                \n            # Test with expected length equal to chunk_size\n            added_len = mock_overlap_add_safe(\n                model_output, target_buffer, counter, start_idx, chunk_size\n            )\n            \n            # Verify no broadcast errors occurred (function completed successfully)\n            assert added_len >= 0, f\"overlap_add should not fail for output_length {output_length}\"\n            \n            # Verify length preservation - added length 
should not exceed actual output length\n            assert added_len <= output_length, (\n                f\"Added length {added_len} should not exceed actual output length {output_length}\"\n            )\n            \n            # Verify added length is reasonable\n            remaining_space = target_length - start_idx\n            max_possible_add = min(output_length, chunk_size, remaining_space)\n            assert added_len <= max_possible_add, (\n                f\"Added length {added_len} exceeds maximum possible {max_possible_add}\"\n            )\n\n    def test_overlap_add_edge_cases(self):\n        \"\"\"Test edge cases that have caused size mismatch issues.\"\"\"\n        target_length = 10000\n        \n        edge_cases = [\n            # (output_length, start_idx, expected_chunk_size, description)\n            (1, 0, 8192, \"Single sample output at start\"),\n            (1, 9999, 8192, \"Single sample output at end\"),\n            (16, 0, 8192, \"Very short output at start\"),\n            (32, 5000, 8192, \"Short output in middle\"),\n            (236, 9000, 8192, \"Medium output near end\"),\n            (8192, 2000, 8192, \"Full chunk size output\"),\n            (10000, 0, 8192, \"Output longer than chunk size\"),\n            (5000, 8000, 8192, \"Output extending beyond target\"),\n        ]\n        \n        for output_len, start_idx, chunk_size, description in edge_cases:\n            with self.subTest(case=description):\n                # Create test tensors\n                model_output = torch.randn(self.batch_size, self.channels, output_len)\n                target_buffer = torch.zeros(self.batch_size, self.channels, target_length)\n                \n                # Mock safe overlap_add\n                def safe_overlap_add(output, target, start, chunk_sz):\n                    actual_len = output.shape[-1]\n                    remaining = target.shape[-1] - start\n                    safe_len = min(actual_len, chunk_sz, 
remaining)\n                    \n                    if safe_len > 0:\n                        end = start + safe_len\n                        target[..., start:end] += output[..., :safe_len]\n                    \n                    return safe_len\n                \n                # Should not raise any exceptions\n                try:\n                    added_len = safe_overlap_add(model_output, target_buffer, start_idx, chunk_size)\n                    \n                    # Verify reasonable result\n                    assert added_len >= 0, f\"Should add non-negative length for {description}\"\n                    assert added_len <= output_len, f\"Should not add more than available for {description}\"\n                    \n                except Exception as e:\n                    pytest.fail(f\"Edge case failed - {description}: {e}\")\n\n    def test_counter_consistency_with_overlap_add(self):\n        \"\"\"Test that counter updates match overlap_add operations for problematic lengths.\"\"\"\n        target_length = 10000\n        \n        for output_length in self.problematic_lengths:\n            with self.subTest(output_length=output_length):\n                # Create test tensors\n                model_output = torch.randn(self.batch_size, self.channels, output_length)\n                target_buffer = torch.zeros(self.batch_size, self.channels, target_length)\n                counter = torch.zeros(self.batch_size, self.channels, target_length)\n                \n                start_idx = 1000\n                chunk_size = 8192\n                \n                # Mock consistent overlap_add and counter update\n                def consistent_update(output, target, counter, start, chunk_sz):\n                    actual_len = output.shape[-1]\n                    remaining = target.shape[-1] - start\n                    safe_len = min(actual_len, chunk_sz, remaining)\n                    \n                    if safe_len > 0:\n                  
      end = start + safe_len\n                        # Update both target and counter with same range\n                        target[..., start:end] += output[..., :safe_len]\n                        counter[..., start:end] += 1.0\n                    \n                    return safe_len\n                \n                added_len = consistent_update(\n                    model_output, target_buffer, counter, start_idx, chunk_size\n                )\n                \n                # Verify consistency\n                if added_len > 0:\n                    # Check that counter was updated in the same range as target\n                    end_idx = start_idx + added_len\n                    counter_slice = counter[0, 0, start_idx:end_idx]\n                    \n                    # Counter should be 1.0 where we added data\n                    assert torch.all(counter_slice == 1.0), (\n                        f\"Counter not consistent for output_length {output_length}\"\n                    )\n                    \n                    # Counter should be 0.0 outside the updated range\n                    if start_idx > 0:\n                        before_slice = counter[0, 0, :start_idx]\n                        assert torch.all(before_slice == 0.0), (\n                            f\"Counter corrupted before range for output_length {output_length}\"\n                        )\n                    \n                    if end_idx < target_length:\n                        after_slice = counter[0, 0, end_idx:]\n                        assert torch.all(after_slice == 0.0), (\n                            f\"Counter corrupted after range for output_length {output_length}\"\n                        )\n\n    def test_broadcast_error_prevention(self):\n        \"\"\"Test specific scenarios that previously caused broadcast errors.\"\"\"\n        # These are specific cases that have been observed to cause issues\n        broadcast_error_cases = [\n            # (batch, 
channels, output_len, target_len, start_idx)\n            (1, 2, 1, 8192, 0),      # Single batch, stereo, 1 sample\n            (2, 1, 16, 8192, 100),   # Dual batch, mono, 16 samples\n            (1, 1, 32, 8192, 8000),  # Single batch, mono, 32 samples near end\n            (2, 2, 236, 8192, 4000), # Dual batch, stereo, 236 samples\n            (1, 2, 512, 1024, 800),  # Output longer than remaining target space\n        ]\n        \n        for batch, channels, output_len, target_len, start_idx in broadcast_error_cases:\n            with self.subTest(batch=batch, channels=channels, output_len=output_len):\n                # Create tensors with specific problematic dimensions\n                model_output = torch.randn(batch, channels, output_len)\n                target_buffer = torch.zeros(batch, channels, target_len)\n                \n                # Mock overlap_add with explicit broadcast error prevention\n                def broadcast_safe_overlap_add(output, target, start):\n                    try:\n                        # Get dimensions\n                        out_batch, out_channels, out_len = output.shape\n                        tgt_batch, tgt_channels, tgt_len = target.shape\n                        \n                        # Verify batch and channel dimensions match\n                        assert out_batch == tgt_batch, f\"Batch size mismatch: {out_batch} != {tgt_batch}\"\n                        assert out_channels == tgt_channels, f\"Channel mismatch: {out_channels} != {tgt_channels}\"\n                        \n                        # Calculate safe range\n                        remaining = tgt_len - start\n                        safe_len = min(out_len, remaining)\n                        \n                        if safe_len > 0:\n                            end = start + safe_len\n                            \n                            # Explicit slicing to prevent broadcast errors\n                            target_slice = 
target[:, :, start:end]\n                            output_slice = output[:, :, :safe_len]\n                            \n                            # Verify shapes match exactly\n                            assert target_slice.shape == output_slice.shape, (\n                                f\"Shape mismatch: {target_slice.shape} != {output_slice.shape}\"\n                            )\n                            \n                            # Perform addition\n                            target[:, :, start:end] += output_slice\n                        \n                        return safe_len\n                        \n                    except Exception as e:\n                        pytest.fail(f\"Broadcast error not prevented: {e}\")\n                \n                # Should complete without broadcast errors\n                added_len = broadcast_safe_overlap_add(model_output, target_buffer, start_idx)\n                \n                # Verify result is reasonable\n                expected_max = min(output_len, target_len - start_idx)\n                assert 0 <= added_len <= expected_max, (\n                    f\"Added length {added_len} outside expected range [0, {expected_max}]\"\n                )\n\n    def test_output_length_preservation_regression(self):\n        \"\"\"Test that output length is preserved correctly in all problematic cases.\"\"\"\n        # This tests the regression where shorter outputs were not handled correctly\n        \n        for output_length in self.problematic_lengths:\n            with self.subTest(output_length=output_length):\n                # Simulate processing pipeline that should preserve length\n                original_length = output_length\n                \n                # Mock processing steps that might alter length\n                def mock_processing_pipeline(input_length):\n                    \"\"\"Mock processing that should preserve input length.\"\"\"\n                    # Step 1: Model 
inference (might produce shorter output)\n                    model_output_length = input_length  # Should match input\n                    \n                    # Step 2: Overlap-add processing\n                    processed_length = model_output_length  # Should preserve length\n                    \n                    # Step 3: Final output\n                    final_length = processed_length  # Should still match\n                    \n                    return {\n                        'original': input_length,\n                        'model_output': model_output_length,\n                        'processed': processed_length,\n                        'final': final_length\n                    }\n                \n                result = mock_processing_pipeline(original_length)\n                \n                # Verify length preservation throughout pipeline\n                assert result['model_output'] == original_length, (\n                    f\"Model output length {result['model_output']} != original {original_length}\"\n                )\n                assert result['processed'] == original_length, (\n                    f\"Processed length {result['processed']} != original {original_length}\"\n                )\n                assert result['final'] == original_length, (\n                    f\"Final length {result['final']} != original {original_length}\"\n                )\n\n    def test_memory_layout_consistency(self):\n        \"\"\"Test that memory layout is consistent for problematic tensor sizes.\"\"\"\n        # Some size mismatches can be caused by unexpected memory layouts\n        \n        for output_length in self.problematic_lengths:\n            with self.subTest(output_length=output_length):\n                # Create tensors with different memory layouts\n                contiguous_output = torch.randn(self.batch_size, self.channels, output_length)\n                non_contiguous_output = contiguous_output.transpose(1, 
2).contiguous().transpose(1, 2)  # copy in (B, L, C) order first; a bare double transpose would still be contiguous\n                \n                target = torch.zeros(self.batch_size, self.channels, 10000)\n                \n                # Mock overlap_add that handles memory layout\n                def layout_aware_overlap_add(output, target, start):\n                    # Ensure contiguous memory layout\n                    if not output.is_contiguous():\n                        output = output.contiguous()\n                    \n                    if not target.is_contiguous():\n                        target = target.contiguous()\n                    \n                    # Perform safe addition\n                    out_len = output.shape[-1]\n                    remaining = target.shape[-1] - start\n                    safe_len = min(out_len, remaining)\n                    \n                    if safe_len > 0:\n                        end = start + safe_len\n                        target[..., start:end] += output[..., :safe_len]\n                    \n                    return safe_len\n                \n                # Test with both contiguous and non-contiguous tensors\n                contiguous_result = layout_aware_overlap_add(contiguous_output, target.clone(), 1000)\n                non_contiguous_result = layout_aware_overlap_add(non_contiguous_output, target.clone(), 1000)\n                \n                # Results should be the same regardless of memory layout\n                assert contiguous_result == non_contiguous_result, (\n                    f\"Memory layout affected result for output_length {output_length}: \"\n                    f\"contiguous={contiguous_result}, non_contiguous={non_contiguous_result}\"\n                )\n"
  },
  {
    "path": "tests/remote/README.md",
    "content": "# Remote API Tests\n\nThis directory contains comprehensive tests for the remote API functionality of audio-separator, which allows running audio separation workloads on remote servers via an API.\n\n## Test Structure\n\n### Unit Tests (`tests/unit/`)\n\n- **`test_remote_api_client.py`** - Tests for the `AudioSeparatorAPIClient` class\n  - Mock-based tests for all client methods\n  - Tests HTTP request/response handling\n  - Tests error conditions and edge cases\n  - Tests the convenience method `separate_audio_and_wait()`\n\n- **`test_remote_cli.py`** - Tests for the remote CLI functionality\n  - Tests command-line argument parsing\n  - Tests all CLI commands (separate, status, models, download)\n  - Tests error handling and edge cases\n  - Mock-based tests without actual HTTP calls\n\n### Integration Tests (`tests/integration/`)\n\n- **`test_remote_api_integration.py`** - Full integration tests\n  - Tests with a mock HTTP server that simulates the real API\n  - End-to-end workflow testing\n  - Tests job submission, polling, and file download\n  - Tests multiple concurrent jobs\n  - Tests error handling with realistic scenarios\n\n## Remote API Architecture\n\nThe remote API system consists of three main components:\n\n1. **API Server** (`audio_separator/remote/deploy_modal.py`)\n   - FastAPI server deployed on Modal.com\n   - Handles audio upload, processing, and file serving\n   - Supports multiple models in single jobs\n   - Asynchronous processing with job status tracking\n\n2. **API Client** (`audio_separator/remote/api_client.py`)\n   - Python client for interacting with the remote API\n   - Handles file uploads, job polling, and downloads\n   - Supports all separator parameters and architectures\n\n3. 
**Remote CLI** (`audio_separator/remote/cli.py`)\n   - Command-line interface using the API client\n   - Mirror of local CLI functionality but for remote processing\n   - Supports all local CLI parameters and options\n\n## Key Features Tested\n\n### Multiple Model Support\n- Upload once, process with multiple models\n- Efficient workflow for comparing model quality\n- Progress tracking across multiple models\n\n### Full Parameter Compatibility\n- All MDX, VR, Demucs, and MDXC parameters supported\n- Custom output naming and format options\n- Same parameter validation as local processing\n\n### Asynchronous Processing\n- Job submission returns immediately with task ID\n- Status polling with progress updates\n- Background processing on remote server\n\n### Error Handling\n- Network connectivity issues\n- Invalid parameters and file formats\n- Server errors and timeouts\n- Job failures and cleanup\n\n## Running the Tests\n\n### Run All Remote API Tests\n```bash\n# Run all unit tests for remote functionality\npytest tests/unit/test_remote_api_client.py tests/unit/test_remote_cli.py -v\n\n# Run integration tests with mock server\npytest tests/integration/test_remote_api_integration.py -v\n\n# Run all remote tests\npytest tests/unit/test_remote*.py tests/integration/test_remote*.py -v\n```\n\n### Run Specific Test Categories\n```bash\n# Test only API client functionality\npytest tests/unit/test_remote_api_client.py::TestAudioSeparatorAPIClient -v\n\n# Test only CLI functionality  \npytest tests/unit/test_remote_cli.py::TestRemoteCLI -v\n\n# Test end-to-end workflows\npytest tests/integration/test_remote_api_integration.py::TestRemoteAPIEndToEnd -v\n```\n\n### Run with Coverage\n```bash\npytest tests/unit/test_remote*.py tests/integration/test_remote*.py --cov=audio_separator.remote --cov-report=html\n```\n\n## Test Environment Setup\n\nThe tests are designed to run without requiring a live API server:\n\n1. **Unit Tests** - Use mocked HTTP responses, no network calls\n2. 
**Integration Tests** - Use a mock HTTP server that simulates the real API\n3. **End-to-End Tests** - Full workflow testing with realistic data\n\n### Mock Server Features\n\nThe integration tests include a comprehensive mock HTTP server that simulates:\n- Job submission and processing\n- Status polling with progress updates\n- File upload and download\n- Model listing and filtering\n- Error conditions and edge cases\n\n## Testing Best Practices\n\n### Isolated Tests\n- Each test is independent and can run in isolation\n- Temporary files are properly cleaned up\n- Mock state is reset between tests\n\n### Realistic Scenarios\n- Tests use realistic audio file formats and sizes\n- Error conditions match real-world scenarios\n- Progress tracking mimics actual processing times\n\n### Comprehensive Coverage\n- All API endpoints are tested\n- All CLI commands and options are tested\n- Both success and error paths are tested\n- Parameter validation and edge cases are covered\n\n## Test Data\n\nTests use minimal synthetic data to avoid large file dependencies:\n- Fake audio content for upload testing\n- Simulated processing results\n- Mock model metadata\n\n## Debugging Tests\n\n### Enable Debug Logging\n```bash\npytest tests/integration/test_remote_api_integration.py -v -s --log-cli-level=DEBUG\n```\n\n### Run Individual Tests\n```bash\n# Test specific functionality\npytest tests/unit/test_remote_api_client.py::TestAudioSeparatorAPIClient::test_separate_audio_and_wait_success -v -s\n\n# Test with specific parameters\npytest tests/integration/test_remote_api_integration.py::TestRemoteAPIIntegration::test_job_status_polling -v -s\n```\n\n### Test Environment Variables\n```bash\n# Skip certain tests if needed\nSKIP_INTEGRATION_TESTS=1 pytest tests/unit/test_remote*.py\n\n# Enable additional debug output\nDEBUG_REMOTE_TESTS=1 pytest tests/integration/test_remote*.py -v -s\n```\n\n## Integration with CI/CD\n\nThese tests are designed to run in CI environments:\n- No external 
dependencies required\n- Fast execution (typically < 30 seconds)\n- Reliable mock server implementation\n- Clear pass/fail criteria\n\n## Contributing\n\nWhen adding new remote API features:\n\n1. **Add unit tests** for individual components\n2. **Add integration tests** for end-to-end workflows\n3. **Update mock server** to support new endpoints\n4. **Test error conditions** and edge cases\n5. **Update documentation** for new test scenarios\n\nThe test suite should maintain high coverage while remaining fast and reliable for continuous integration. "
  },
  {
    "path": "tests/reproduce_ensemble_bug.py",
    "content": "\"\"\"\nReproduce the ensemble + custom_output_names bug against the live API.\n\nThis script simulates exactly what karaoke-gen's audio_processor does:\n1. Call the API with preset=instrumental_clean and custom_output_names\n2. Download the results\n3. Check if the expected filenames exist\n\nExpected behavior (fixed): files named job123_mixed_vocals.flac and job123_mixed_instrumental.flac\nBug behavior (current prod): files named with original filename + _(Unknown)_ or _(Other)_\n\nUsage:\n    AUDIO_SEPARATOR_API_URL=<url> python tests/reproduce_ensemble_bug.py\n\"\"\"\nimport logging\nimport os\nimport sys\nimport tempfile\n\n# Add the repo to path so we can import the API client\nsys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\n\nfrom audio_separator.remote.api_client import AudioSeparatorAPIClient\n\n\ndef main():\n    api_url = os.environ.get(\"AUDIO_SEPARATOR_API_URL\")\n    if not api_url:\n        print(\"ERROR: Set AUDIO_SEPARATOR_API_URL environment variable\")\n        sys.exit(1)\n\n    test_audio = os.path.join(os.path.dirname(os.path.abspath(__file__)), \"inputs\", \"under_pressure_harmonies.flac\")\n    if not os.path.exists(test_audio):\n        print(f\"ERROR: Test audio file not found: {test_audio}\")\n        sys.exit(1)\n\n    with tempfile.TemporaryDirectory(prefix=\"ensemble_bug_test_\") as output_dir:\n        print(f\"API URL: {api_url}\")\n        print(f\"Output dir: {output_dir}\")\n        print()\n\n        logging.basicConfig(level=logging.INFO, format=\"%(asctime)s %(levelname)s %(name)s: %(message)s\")\n        logger = logging.getLogger(\"test\")\n\n        client = AudioSeparatorAPIClient(api_url, logger)\n\n        # This is exactly what karaoke-gen does in _process_audio_separation_remote\n        file_prefix = \"job123\"  # Simulates job_id-based prefix\n        custom_output_names = {\n            \"Vocals\": f\"{file_prefix}_mixed_vocals\",\n            \"Instrumental\": 
f\"{file_prefix}_mixed_instrumental\",\n        }\n\n        print(\"=\" * 60)\n        print(\"TEST: Preset + custom_output_names (reproduces karaoke-gen bug)\")\n        print(\"  preset: instrumental_clean\")\n        print(f\"  custom_output_names: {custom_output_names}\")\n        print(\"=\" * 60)\n        print()\n\n        result = client.separate_audio_and_wait(\n            test_audio,\n            preset=\"instrumental_clean\",\n            timeout=600,\n            poll_interval=10,\n            download=True,\n            output_dir=output_dir,\n            output_format=\"flac\",\n            custom_output_names=custom_output_names,\n        )\n\n        print()\n        print(\"=\" * 60)\n        print(\"RESULTS\")\n        print(\"=\" * 60)\n        print(f\"Status: {result.get('status')}\")\n        print(f\"Downloaded files: {result.get('downloaded_files', [])}\")\n        print()\n\n        # List what's actually in the output dir\n        actual_files = os.listdir(output_dir)\n        print(f\"Files in output dir: {actual_files}\")\n        print()\n\n        # Check for expected files\n        fmt = \"flac\"\n        expected_vocals = f\"{file_prefix}_mixed_vocals.{fmt}\"\n        expected_instrumental = f\"{file_prefix}_mixed_instrumental.{fmt}\"\n\n        vocals_exists = os.path.exists(os.path.join(output_dir, expected_vocals))\n        instrumental_exists = os.path.exists(os.path.join(output_dir, expected_instrumental))\n\n        print(\"EXPECTED FILE CHECK:\")\n        print(f\"  {expected_vocals}: {'FOUND' if vocals_exists else 'MISSING'}\")\n        print(f\"  {expected_instrumental}: {'FOUND' if instrumental_exists else 'MISSING'}\")\n        print()\n\n        if vocals_exists and instrumental_exists:\n            print(\"RESULT: PASS - custom_output_names working correctly\")\n            return 0\n        else:\n            print(\"RESULT: FAIL - custom_output_names NOT applied (bug reproduced)\")\n            print()\n            
print(\"Actual files downloaded:\")\n            for f in actual_files:\n                size = os.path.getsize(os.path.join(output_dir, f))\n                print(f\"  {f} ({size / 1024:.1f} KB)\")\n            return 1\n\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n"
  },
  {
    "path": "tests/unit/test_audio_chunking.py",
    "content": "\"\"\"\nUnit tests for audio chunking functionality.\nTests the AudioChunker class for splitting and merging audio files.\n\"\"\"\n\nimport pytest\nimport os\nimport tempfile\nimport logging\nfrom unittest.mock import Mock, patch, MagicMock\nfrom pydub import AudioSegment\n\nfrom audio_separator.separator.audio_chunking import AudioChunker\n\n\nclass TestAudioChunker:\n    \"\"\"Test cases for AudioChunker class.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.chunk_duration_seconds = 10.0  # 10 seconds\n        self.logger = logging.getLogger(__name__)\n        self.chunker = AudioChunker(self.chunk_duration_seconds, self.logger)\n\n    def test_initialization(self):\n        \"\"\"Test AudioChunker initialization.\"\"\"\n        assert self.chunker.chunk_duration_ms == 10000\n        assert self.chunker.logger is not None\n\n    def test_initialization_with_custom_logger(self):\n        \"\"\"Test AudioChunker initialization with custom logger.\"\"\"\n        custom_logger = Mock()\n        chunker = AudioChunker(5.0, custom_logger)\n        assert chunker.chunk_duration_ms == 5000\n        assert chunker.logger == custom_logger\n\n    def test_should_chunk_true(self):\n        \"\"\"Test should_chunk returns True for files longer than chunk duration.\"\"\"\n        # File duration 15 seconds, chunk duration 10 seconds\n        assert self.chunker.should_chunk(15.0) is True\n\n    def test_should_chunk_false(self):\n        \"\"\"Test should_chunk returns False for files shorter than chunk duration.\"\"\"\n        # File duration 5 seconds, chunk duration 10 seconds\n        assert self.chunker.should_chunk(5.0) is False\n\n    def test_should_chunk_exact_boundary(self):\n        \"\"\"Test should_chunk at exact boundary.\"\"\"\n        # File duration exactly 10 seconds, chunk duration 10 seconds\n        assert self.chunker.should_chunk(10.0) is False\n\n    def 
test_should_chunk_just_over_boundary(self):\n        \"\"\"Test should_chunk just over boundary.\"\"\"\n        # File duration 10.1 seconds, chunk duration 10 seconds\n        assert self.chunker.should_chunk(10.1) is True\n\n    @patch('audio_separator.separator.audio_chunking.AudioSegment.from_file')\n    @patch('audio_separator.separator.audio_chunking.os.path.exists')\n    @patch('audio_separator.separator.audio_chunking.os.makedirs')\n    def test_split_audio_basic(self, _mock_makedirs, mock_exists, mock_from_file):\n        \"\"\"Test basic audio splitting.\"\"\"\n        # Mock audio file (30 seconds)\n        mock_audio = Mock()\n        mock_audio.__len__ = Mock(return_value=30000)  # 30 seconds in ms\n        mock_audio.__getitem__ = Mock(side_effect=lambda _: mock_audio)\n        mock_audio.export = Mock()\n        mock_from_file.return_value = mock_audio\n        mock_exists.return_value = True\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            chunk_paths = self.chunker.split_audio(\"test.wav\", temp_dir)\n\n            # Should create 3 chunks (30s / 10s = 3)\n            assert len(chunk_paths) == 3\n            assert all(\"chunk_\" in path for path in chunk_paths)\n            assert mock_audio.export.call_count == 3\n\n        finally:\n            # Cleanup\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n\n    @patch('audio_separator.separator.audio_chunking.AudioSegment.from_file')\n    @patch('audio_separator.separator.audio_chunking.os.path.exists')\n    @patch('audio_separator.separator.audio_chunking.os.makedirs')\n    def test_split_audio_uneven_chunks(self, _mock_makedirs, mock_exists, mock_from_file):\n        \"\"\"Test splitting audio with uneven chunk sizes.\"\"\"\n        # Mock audio file (25 seconds)\n        mock_audio = Mock()\n        mock_audio.__len__ = Mock(return_value=25000)  # 25 seconds in ms\n        mock_audio.__getitem__ = 
Mock(side_effect=lambda _: mock_audio)\n        mock_audio.export = Mock()\n        mock_from_file.return_value = mock_audio\n        mock_exists.return_value = True\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            chunk_paths = self.chunker.split_audio(\"test.wav\", temp_dir)\n\n            # Should create 3 chunks (ceil(25s / 10s) = 3)\n            # First two chunks: 10s each, last chunk: 5s\n            assert len(chunk_paths) == 3\n\n        finally:\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n\n    def test_split_audio_file_not_found(self, tmp_path):\n        \"\"\"Test split_audio with non-existent file.\"\"\"\n        with pytest.raises(FileNotFoundError):\n            self.chunker.split_audio(\"nonexistent.wav\", str(tmp_path))\n\n    @patch('audio_separator.separator.audio_chunking.AudioSegment.from_file')\n    @patch('audio_separator.separator.audio_chunking.os.path.exists')\n    def test_merge_chunks_basic(self, mock_exists, mock_from_file, tmp_path):\n        \"\"\"Test basic chunk merging.\"\"\"\n        # Mock chunk files\n        mock_chunk1 = Mock()\n        mock_chunk2 = Mock()\n        mock_combined = Mock()\n        mock_combined.export = Mock()\n\n        # Setup mock to return chunks and allow addition\n        mock_from_file.side_effect = [mock_chunk1, mock_chunk2]\n        mock_exists.return_value = True\n\n        # Mock AudioSegment.empty() and addition\n        with patch('audio_separator.separator.audio_chunking.AudioSegment.empty') as mock_empty:\n            mock_empty.return_value = mock_combined\n            mock_combined.__add__ = Mock(side_effect=[mock_combined, mock_combined])\n            mock_combined.__len__ = Mock(return_value=20000)\n\n            chunk_paths = [\"chunk1.wav\", \"chunk2.wav\"]\n            output_path = self.chunker.merge_chunks(chunk_paths, str(tmp_path / \"output.wav\"))\n\n            assert output_path == str(tmp_path / 
\"output.wav\")\n            assert mock_combined.export.called\n\n    def test_merge_chunks_empty_list(self, tmp_path):\n        \"\"\"Test merge_chunks with empty chunk list.\"\"\"\n        with pytest.raises(ValueError, match=\"Cannot merge empty list\"):\n            self.chunker.merge_chunks([], str(tmp_path / \"output.wav\"))\n\n    @patch('audio_separator.separator.audio_chunking.os.path.exists')\n    def test_merge_chunks_missing_file(self, mock_exists, tmp_path):\n        \"\"\"Test merge_chunks with missing chunk file.\"\"\"\n        mock_exists.return_value = False\n\n        with pytest.raises(FileNotFoundError, match=\"Chunk file not found\"):\n            self.chunker.merge_chunks([\"missing.wav\"], str(tmp_path / \"output.wav\"))\n\n    def test_chunk_duration_calculation(self):\n        \"\"\"Test chunk duration calculation.\"\"\"\n        chunker_5s = AudioChunker(5.0, self.logger)\n        assert chunker_5s.chunk_duration_ms == 5000\n\n        chunker_60s = AudioChunker(60.0, self.logger)\n        assert chunker_60s.chunk_duration_ms == 60000\n\n        chunker_half = AudioChunker(0.5, self.logger)\n        assert chunker_half.chunk_duration_ms == 500\n\n\nclass TestAudioChunkerIntegration:\n    \"\"\"Integration tests with actual audio segment creation.\"\"\"\n\n    def test_split_and_merge_round_trip(self):\n        \"\"\"Test splitting and merging produces valid output.\"\"\"\n        # Create a simple test audio segment (silence)\n        audio = AudioSegment.silent(duration=15000)  # 15 seconds\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            # Save test audio\n            input_path = os.path.join(temp_dir, \"test_input.wav\")\n            audio.export(input_path, format=\"wav\")\n\n            # Split\n            chunker = AudioChunker(5.0)  # 5-second chunks\n            chunk_dir = os.path.join(temp_dir, \"chunks\")\n            chunk_paths = chunker.split_audio(input_path, chunk_dir)\n\n            # Should create 3 
chunks\n            assert len(chunk_paths) == 3\n            assert all(os.path.exists(path) for path in chunk_paths)\n\n            # Merge\n            output_path = os.path.join(temp_dir, \"test_output.wav\")\n            merged_path = chunker.merge_chunks(chunk_paths, output_path)\n\n            # Verify output exists and has similar duration\n            assert os.path.exists(merged_path)\n            merged_audio = AudioSegment.from_file(merged_path)\n\n            # Duration should be close (within 100ms due to encoding)\n            assert abs(len(merged_audio) - len(audio)) < 100\n\n        finally:\n            # Cleanup\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n\n    def test_split_with_different_formats(self):\n        \"\"\"Test splitting works with different audio formats.\"\"\"\n        audio = AudioSegment.silent(duration=10000)  # 10 seconds\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            # Test with .wav extension\n            input_wav = os.path.join(temp_dir, \"test.wav\")\n            audio.export(input_wav, format=\"wav\")\n\n            chunker = AudioChunker(5.0)\n            chunk_dir = os.path.join(temp_dir, \"chunks_wav\")\n            chunk_paths = chunker.split_audio(input_wav, chunk_dir)\n\n            assert len(chunk_paths) == 2\n            assert all(path.endswith(\".wav\") for path in chunk_paths)\n\n        finally:\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n\n\nclass TestAudioChunkerEdgeCases:\n    \"\"\"Test edge cases and error handling.\"\"\"\n\n    def test_very_short_file(self):\n        \"\"\"Test chunking very short file (shorter than chunk duration).\"\"\"\n        audio = AudioSegment.silent(duration=2000)  # 2 seconds\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            input_path = os.path.join(temp_dir, \"short.wav\")\n            
audio.export(input_path, format=\"wav\")\n\n            chunker = AudioChunker(10.0)  # 10-second chunks\n\n            # Should still work, creating just 1 chunk\n            chunk_dir = os.path.join(temp_dir, \"chunks\")\n            chunk_paths = chunker.split_audio(input_path, chunk_dir)\n\n            assert len(chunk_paths) == 1\n\n        finally:\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n\n    def test_exact_multiple_of_chunk_size(self):\n        \"\"\"Test file that's exact multiple of chunk size.\"\"\"\n        audio = AudioSegment.silent(duration=20000)  # 20 seconds\n\n        temp_dir = tempfile.mkdtemp()\n        try:\n            input_path = os.path.join(temp_dir, \"exact.wav\")\n            audio.export(input_path, format=\"wav\")\n\n            chunker = AudioChunker(10.0)  # 10-second chunks\n\n            chunk_dir = os.path.join(temp_dir, \"chunks\")\n            chunk_paths = chunker.split_audio(input_path, chunk_dir)\n\n            # Should create exactly 2 chunks\n            assert len(chunk_paths) == 2\n\n        finally:\n            import shutil\n            if os.path.exists(temp_dir):\n                shutil.rmtree(temp_dir)\n"
  },
  {
    "path": "tests/unit/test_bit_depth_detection.py",
    "content": "\"\"\"\nUnit tests for bit depth preservation functionality in CommonSeparator.\n\nTests the bit depth detection and storage logic without requiring full separation.\n\"\"\"\n\nimport os\nimport pytest\nimport tempfile\nimport shutil\nimport soundfile as sf\nimport numpy as np\nfrom unittest.mock import Mock, MagicMock, patch\n\nfrom audio_separator.separator.common_separator import CommonSeparator\n\n\ndef create_test_audio_file(output_path, sample_rate=44100, duration=0.5, bit_depth=16):\n    \"\"\"\n    Create a test audio file with a specific bit depth.\n    \n    Args:\n        output_path: Path to save the test audio file\n        sample_rate: Sample rate in Hz\n        duration: Duration in seconds\n        bit_depth: Bit depth (16, 24, or 32)\n    \n    Returns:\n        Path to the created audio file\n    \"\"\"\n    # Generate a simple test signal (440 Hz sine wave)\n    t = np.linspace(0, duration, int(sample_rate * duration))\n    frequency = 440  # A4 note\n    audio = 0.5 * np.sin(2 * np.pi * frequency * t)\n    \n    # Create stereo by duplicating the mono signal\n    stereo_audio = np.column_stack([audio, audio])\n    \n    # Determine the subtype based on bit depth\n    if bit_depth == 16:\n        subtype = 'PCM_16'\n    elif bit_depth == 24:\n        subtype = 'PCM_24'\n    elif bit_depth == 32:\n        subtype = 'PCM_32'\n    else:\n        raise ValueError(f\"Unsupported bit depth: {bit_depth}\")\n    \n    # Write the audio file\n    sf.write(output_path, stereo_audio, sample_rate, subtype=subtype)\n    \n    return output_path\n\n\n@pytest.fixture(name=\"temp_audio_dir\")\ndef fixture_temp_audio_dir():\n    \"\"\"Fixture providing a temporary directory for input audio files.\"\"\"\n    temp_dir = tempfile.mkdtemp()\n    yield temp_dir\n    # Clean up after test\n    shutil.rmtree(temp_dir)\n\n\n@pytest.fixture(name=\"mock_separator_config\")\ndef fixture_mock_separator_config():\n    \"\"\"Fixture providing a mock separator 
configuration.\"\"\"\n    return {\n        \"logger\": Mock(),\n        \"log_level\": 20,\n        \"torch_device\": Mock(),\n        \"torch_device_cpu\": Mock(),\n        \"torch_device_mps\": Mock(),\n        \"onnx_execution_provider\": Mock(),\n        \"model_name\": \"test_model\",\n        \"model_path\": \"/path/to/model\",\n        \"model_data\": {\"training\": {\"instruments\": [\"vocals\", \"other\"]}},\n        \"output_dir\": None,\n        \"output_format\": \"wav\",\n        \"output_bitrate\": None,\n        \"normalization_threshold\": 0.9,\n        \"amplification_threshold\": 0.0,\n        \"enable_denoise\": False,\n        \"output_single_stem\": None,\n        \"invert_using_spec\": False,\n        \"sample_rate\": 44100,\n        \"use_soundfile\": False,\n    }\n\n\ndef test_16bit_detection(temp_audio_dir, mock_separator_config):\n    \"\"\"Test that 16-bit audio files are correctly detected.\"\"\"\n    print(\"\\n>>> TEST: 16-bit detection\")\n    \n    # Create a 16-bit test audio file\n    input_file = os.path.join(temp_audio_dir, \"test_16bit.wav\")\n    create_test_audio_file(input_file, bit_depth=16)\n    \n    # Create CommonSeparator instance\n    separator = CommonSeparator(mock_separator_config)\n    \n    # Call prepare_mix to detect bit depth\n    mix = separator.prepare_mix(input_file)\n    \n    # Verify bit depth was detected correctly\n    print(f\"Detected bit depth: {separator.input_bit_depth}\")\n    print(f\"Detected subtype: {separator.input_subtype}\")\n    \n    assert separator.input_bit_depth == 16, f\"Expected 16-bit, got {separator.input_bit_depth}\"\n    assert 'PCM_16' in separator.input_subtype or separator.input_subtype == 'PCM_16', f\"Expected PCM_16, got {separator.input_subtype}\"\n    \n    print(\"✅ Test passed: 16-bit audio correctly detected\")\n\n\ndef test_24bit_detection(temp_audio_dir, mock_separator_config):\n    \"\"\"Test that 24-bit audio files are correctly detected.\"\"\"\n    
print(\"\\n>>> TEST: 24-bit detection\")\n    \n    # Create a 24-bit test audio file\n    input_file = os.path.join(temp_audio_dir, \"test_24bit.wav\")\n    create_test_audio_file(input_file, bit_depth=24)\n    \n    # Create CommonSeparator instance\n    separator = CommonSeparator(mock_separator_config)\n    \n    # Call prepare_mix to detect bit depth\n    mix = separator.prepare_mix(input_file)\n    \n    # Verify bit depth was detected correctly\n    print(f\"Detected bit depth: {separator.input_bit_depth}\")\n    print(f\"Detected subtype: {separator.input_subtype}\")\n    \n    assert separator.input_bit_depth == 24, f\"Expected 24-bit, got {separator.input_bit_depth}\"\n    assert 'PCM_24' in separator.input_subtype, f\"Expected PCM_24, got {separator.input_subtype}\"\n    \n    print(\"✅ Test passed: 24-bit audio correctly detected\")\n\n\ndef test_32bit_detection(temp_audio_dir, mock_separator_config):\n    \"\"\"Test that 32-bit audio files are correctly detected.\"\"\"\n    print(\"\\n>>> TEST: 32-bit detection\")\n    \n    # Create a 32-bit test audio file\n    input_file = os.path.join(temp_audio_dir, \"test_32bit.wav\")\n    create_test_audio_file(input_file, bit_depth=32)\n    \n    # Create CommonSeparator instance\n    separator = CommonSeparator(mock_separator_config)\n    \n    # Call prepare_mix to detect bit depth\n    mix = separator.prepare_mix(input_file)\n    \n    # Verify bit depth was detected correctly\n    print(f\"Detected bit depth: {separator.input_bit_depth}\")\n    print(f\"Detected subtype: {separator.input_subtype}\")\n    \n    assert separator.input_bit_depth == 32, f\"Expected 32-bit, got {separator.input_bit_depth}\"\n    assert 'PCM_32' in separator.input_subtype or 'FLOAT' in separator.input_subtype, f\"Expected PCM_32 or FLOAT, got {separator.input_subtype}\"\n    \n    print(\"✅ Test passed: 32-bit audio correctly detected\")\n\n\ndef test_numpy_array_input_defaults_to_16bit(mock_separator_config):\n    \"\"\"Test 
that numpy array input defaults to 16-bit.\"\"\"\n    print(\"\\n>>> TEST: Numpy array input defaults to 16-bit\")\n    \n    # Create a mock numpy array (stereo audio)\n    mock_audio = np.random.rand(1000, 2).astype(np.float32)\n    \n    # Create CommonSeparator instance\n    separator = CommonSeparator(mock_separator_config)\n    \n    # Call prepare_mix with numpy array\n    mix = separator.prepare_mix(mock_audio)\n    \n    # Verify bit depth defaults to 16-bit\n    print(f\"Bit depth for numpy input: {separator.input_bit_depth}\")\n    assert separator.input_bit_depth == 16, f\"Expected 16-bit default, got {separator.input_bit_depth}\"\n    \n    print(\"✅ Test passed: Numpy array input defaults to 16-bit\")\n\n\ndef test_bit_depth_preserved_across_multiple_files(temp_audio_dir, mock_separator_config):\n    \"\"\"Test that bit depth is correctly updated when processing multiple files.\"\"\"\n    print(\"\\n>>> TEST: Bit depth updated across multiple files\")\n    \n    # Create test files with different bit depths\n    input_16bit = os.path.join(temp_audio_dir, \"test_16bit.wav\")\n    input_24bit = os.path.join(temp_audio_dir, \"test_24bit.wav\")\n    \n    create_test_audio_file(input_16bit, bit_depth=16)\n    create_test_audio_file(input_24bit, bit_depth=24)\n    \n    # Create CommonSeparator instance\n    separator = CommonSeparator(mock_separator_config)\n    \n    # Process 16-bit file\n    mix1 = separator.prepare_mix(input_16bit)\n    assert separator.input_bit_depth == 16\n    print(f\"After 16-bit file: bit depth = {separator.input_bit_depth}\")\n    \n    # Process 24-bit file\n    mix2 = separator.prepare_mix(input_24bit)\n    assert separator.input_bit_depth == 24\n    print(f\"After 24-bit file: bit depth = {separator.input_bit_depth}\")\n    \n    # Process 16-bit file again\n    mix3 = separator.prepare_mix(input_16bit)\n    assert separator.input_bit_depth == 16\n    print(f\"After 16-bit file again: bit depth = 
{separator.input_bit_depth}\")\n    \n    print(\"✅ Test passed: Bit depth correctly updated across multiple files\")\n\n\nif __name__ == \"__main__\":\n    # Run tests with pytest\n    pytest.main([__file__, \"-v\", \"-s\"])\n\n"
  },
  {
    "path": "tests/unit/test_bit_depth_writing.py",
    "content": "\"\"\"\nUnit tests for bit depth preservation in audio writing functions.\n\nTests that the write_audio functions preserve the input bit depth.\n\"\"\"\n\nimport os\nimport pytest\nimport tempfile\nimport shutil\nimport soundfile as sf\nimport numpy as np\nfrom unittest.mock import Mock\n\nfrom audio_separator.separator.common_separator import CommonSeparator\n\n\n# Check if FFmpeg is available (required for pydub tests)\ndef is_ffmpeg_available():\n    \"\"\"Check if FFmpeg is available in the system.\"\"\"\n    import subprocess\n    try:\n        subprocess.run(\n            [\"ffmpeg\", \"-version\"],\n            stdout=subprocess.DEVNULL,\n            stderr=subprocess.DEVNULL,\n            check=True\n        )\n        return True\n    except (subprocess.CalledProcessError, FileNotFoundError):\n        return False\n\n\nFFMPEG_AVAILABLE = is_ffmpeg_available()\nrequires_ffmpeg = pytest.mark.skipif(\n    not FFMPEG_AVAILABLE,\n    reason=\"FFmpeg not available (required for pydub tests)\"\n)\n\n\ndef create_test_audio_file(output_path, sample_rate=44100, duration=0.5, bit_depth=16):\n    \"\"\"Create a test audio file with a specific bit depth.\"\"\"\n    t = np.linspace(0, duration, int(sample_rate * duration))\n    frequency = 440  # A4 note\n    audio = 0.5 * np.sin(2 * np.pi * frequency * t)\n    stereo_audio = np.column_stack([audio, audio])\n    \n    if bit_depth == 16:\n        subtype = 'PCM_16'\n    elif bit_depth == 24:\n        subtype = 'PCM_24'\n    elif bit_depth == 32:\n        subtype = 'PCM_32'\n    else:\n        raise ValueError(f\"Unsupported bit depth: {bit_depth}\")\n    \n    sf.write(output_path, stereo_audio, sample_rate, subtype=subtype)\n    return output_path\n\n\ndef get_audio_bit_depth(file_path):\n    \"\"\"Get the bit depth of an audio file.\"\"\"\n    info = sf.info(file_path)\n    subtype = info.subtype\n    \n    if 'PCM_16' in subtype or subtype == 'PCM_S8':\n        return 16\n    elif 'PCM_24' in 
subtype:\n        return 24\n    elif 'PCM_32' in subtype or 'FLOAT' in subtype or 'DOUBLE' in subtype:\n        return 32\n    else:\n        return None\n\n\n@pytest.fixture(name=\"temp_dir\")\ndef fixture_temp_dir():\n    \"\"\"Fixture providing a temporary directory.\"\"\"\n    temp_dir = tempfile.mkdtemp()\n    yield temp_dir\n    shutil.rmtree(temp_dir)\n\n\n@pytest.fixture(name=\"mock_separator_config\")\ndef fixture_mock_separator_config(temp_dir):\n    \"\"\"Fixture providing a mock separator configuration.\"\"\"\n    return {\n        \"logger\": Mock(),\n        \"log_level\": 20,\n        \"torch_device\": Mock(),\n        \"torch_device_cpu\": Mock(),\n        \"torch_device_mps\": Mock(),\n        \"onnx_execution_provider\": Mock(),\n        \"model_name\": \"test_model\",\n        \"model_path\": \"/path/to/model\",\n        \"model_data\": {\"training\": {\"instruments\": [\"vocals\", \"other\"]}},\n        \"output_dir\": temp_dir,\n        \"output_format\": \"wav\",\n        \"output_bitrate\": None,\n        \"normalization_threshold\": 0.9,\n        \"amplification_threshold\": 0.0,\n        \"enable_denoise\": False,\n        \"output_single_stem\": None,\n        \"invert_using_spec\": False,\n        \"sample_rate\": 44100,\n        \"use_soundfile\": False,\n    }\n\n\n@requires_ffmpeg\ndef test_write_16bit_with_pydub(temp_dir, mock_separator_config):\n    \"\"\"Test that 16-bit audio is written correctly with pydub.\"\"\"\n    print(\"\\n>>> TEST: Write 16-bit audio with pydub\")\n    \n    # Create a 16-bit test input file\n    input_file = os.path.join(temp_dir, \"input_16bit.wav\")\n    create_test_audio_file(input_file, bit_depth=16)\n    \n    # Create CommonSeparator and prepare mix\n    separator = CommonSeparator(mock_separator_config)\n    separator.audio_file_path = input_file\n    mix = separator.prepare_mix(input_file)\n    \n    print(f\"Input bit depth detected: {separator.input_bit_depth}\")\n    \n    # Create output audio 
data (simulated separation output)\n    # The mix is in format [channels, samples], we need [samples, channels] for writing\n    output_audio = mix.T\n    \n    # Write audio using pydub\n    output_file = \"test_output_16bit.wav\"\n    separator.write_audio_pydub(output_file, output_audio)\n    \n    # Check the output file bit depth\n    full_output_path = os.path.join(temp_dir, output_file)\n    assert os.path.exists(full_output_path), f\"Output file not created: {full_output_path}\"\n    \n    output_bit_depth = get_audio_bit_depth(full_output_path)\n    print(f\"Output bit depth: {output_bit_depth}\")\n    \n    assert output_bit_depth == 16, f\"Expected 16-bit output, got {output_bit_depth}\"\n    print(\"✅ Test passed: 16-bit audio written correctly with pydub\")\n\n\n@requires_ffmpeg\ndef test_write_24bit_with_pydub(temp_dir, mock_separator_config):\n    \"\"\"Test that 24-bit audio is written correctly with pydub.\"\"\"\n    print(\"\\n>>> TEST: Write 24-bit audio with pydub\")\n    \n    # Create a 24-bit test input file\n    input_file = os.path.join(temp_dir, \"input_24bit.wav\")\n    create_test_audio_file(input_file, bit_depth=24)\n    \n    # Create CommonSeparator and prepare mix\n    separator = CommonSeparator(mock_separator_config)\n    separator.audio_file_path = input_file\n    mix = separator.prepare_mix(input_file)\n    \n    print(f\"Input bit depth detected: {separator.input_bit_depth}\")\n    \n    # Create output audio data\n    output_audio = mix.T\n    \n    # Write audio using pydub\n    output_file = \"test_output_24bit.wav\"\n    separator.write_audio_pydub(output_file, output_audio)\n    \n    # Check the output file bit depth\n    full_output_path = os.path.join(temp_dir, output_file)\n    assert os.path.exists(full_output_path), f\"Output file not created: {full_output_path}\"\n    \n    output_bit_depth = get_audio_bit_depth(full_output_path)\n    print(f\"Output bit depth: {output_bit_depth}\")\n    \n    assert output_bit_depth 
== 24, f\"Expected 24-bit output, got {output_bit_depth}\"\n    print(\"✅ Test passed: 24-bit audio written correctly with pydub\")\n\n\n@requires_ffmpeg\ndef test_write_32bit_with_pydub(temp_dir, mock_separator_config):\n    \"\"\"Test that 32-bit audio is written correctly with pydub.\"\"\"\n    print(\"\\n>>> TEST: Write 32-bit audio with pydub\")\n    \n    # Create a 32-bit test input file\n    input_file = os.path.join(temp_dir, \"input_32bit.wav\")\n    create_test_audio_file(input_file, bit_depth=32)\n    \n    # Create CommonSeparator and prepare mix\n    separator = CommonSeparator(mock_separator_config)\n    separator.audio_file_path = input_file\n    mix = separator.prepare_mix(input_file)\n    \n    print(f\"Input bit depth detected: {separator.input_bit_depth}\")\n    \n    # Create output audio data\n    output_audio = mix.T\n    \n    # Write audio using pydub\n    output_file = \"test_output_32bit.wav\"\n    separator.write_audio_pydub(output_file, output_audio)\n    \n    # Check the output file bit depth\n    full_output_path = os.path.join(temp_dir, output_file)\n    assert os.path.exists(full_output_path), f\"Output file not created: {full_output_path}\"\n    \n    output_bit_depth = get_audio_bit_depth(full_output_path)\n    print(f\"Output bit depth: {output_bit_depth}\")\n    \n    assert output_bit_depth == 32, f\"Expected 32-bit output, got {output_bit_depth}\"\n    print(\"✅ Test passed: 32-bit audio written correctly with pydub\")\n\n\ndef test_write_24bit_with_soundfile(temp_dir, mock_separator_config):\n    \"\"\"Test that 24-bit audio is written correctly with soundfile.\"\"\"\n    print(\"\\n>>> TEST: Write 24-bit audio with soundfile\")\n    \n    # Update config to use soundfile\n    mock_separator_config[\"use_soundfile\"] = True\n    \n    # Create a 24-bit test input file\n    input_file = os.path.join(temp_dir, \"input_24bit.wav\")\n    create_test_audio_file(input_file, bit_depth=24)\n    \n    # Create CommonSeparator and 
prepare mix\n    separator = CommonSeparator(mock_separator_config)\n    separator.audio_file_path = input_file\n    mix = separator.prepare_mix(input_file)\n    \n    print(f\"Input bit depth detected: {separator.input_bit_depth}\")\n    \n    # Create output audio data\n    output_audio = mix.T\n    \n    # Write audio using soundfile\n    output_file = \"test_output_24bit_sf.wav\"\n    separator.write_audio_soundfile(output_file, output_audio)\n    \n    # Check the output file bit depth\n    full_output_path = os.path.join(temp_dir, output_file)\n    assert os.path.exists(full_output_path), f\"Output file not created: {full_output_path}\"\n    \n    output_bit_depth = get_audio_bit_depth(full_output_path)\n    print(f\"Output bit depth: {output_bit_depth}\")\n    \n    assert output_bit_depth == 24, f\"Expected 24-bit output, got {output_bit_depth}\"\n    print(\"✅ Test passed: 24-bit audio written correctly with soundfile\")\n\n\ndef test_write_16bit_with_soundfile(temp_dir, mock_separator_config):\n    \"\"\"Test that 16-bit audio is written correctly with soundfile.\"\"\"\n    print(\"\\n>>> TEST: Write 16-bit audio with soundfile\")\n    \n    # Update config to use soundfile\n    mock_separator_config[\"use_soundfile\"] = True\n    \n    # Create a 16-bit test input file\n    input_file = os.path.join(temp_dir, \"input_16bit.wav\")\n    create_test_audio_file(input_file, bit_depth=16)\n    \n    # Create CommonSeparator and prepare mix\n    separator = CommonSeparator(mock_separator_config)\n    separator.audio_file_path = input_file\n    mix = separator.prepare_mix(input_file)\n    \n    print(f\"Input bit depth detected: {separator.input_bit_depth}\")\n    \n    # Create output audio data\n    output_audio = mix.T\n    \n    # Write audio using soundfile\n    output_file = \"test_output_16bit_sf.wav\"\n    separator.write_audio_soundfile(output_file, output_audio)\n    \n    # Check the output file bit depth\n    full_output_path = os.path.join(temp_dir, 
output_file)\n    assert os.path.exists(full_output_path), f\"Output file not created: {full_output_path}\"\n    \n    output_bit_depth = get_audio_bit_depth(full_output_path)\n    print(f\"Output bit depth: {output_bit_depth}\")\n    \n    assert output_bit_depth == 16, f\"Expected 16-bit output, got {output_bit_depth}\"\n    print(\"✅ Test passed: 16-bit audio written correctly with soundfile\")\n\n\nif __name__ == \"__main__\":\n    # Run tests with pytest\n    pytest.main([__file__, \"-v\", \"-s\"])\n\n"
  },
  {
    "path": "tests/unit/test_cli.py",
    "content": "import json\nimport pytest\nimport logging\nfrom audio_separator.utils.cli import main\nimport subprocess\nimport importlib.metadata\nfrom unittest import mock\nfrom unittest.mock import patch, MagicMock, mock_open\n\n\n# Mock metadata.distribution for tests to avoid PackageNotFoundError in environment without installed package\n@pytest.fixture(autouse=True)\ndef mock_distribution():\n    original_distribution = importlib.metadata.distribution\n\n    def side_effect(package_name):\n        if package_name == \"audio-separator\":\n            mock_dist = MagicMock()\n            mock_dist.version = \"0.42.1\"\n            return mock_dist\n        return original_distribution(package_name)\n\n    with patch(\"importlib.metadata.distribution\", side_effect=side_effect):\n        yield\n\n\n# Common fixture for expected arguments\n@pytest.fixture\ndef common_expected_args():\n    return {\n        \"log_formatter\": mock.ANY,\n        \"log_level\": logging.INFO,\n        \"model_file_dir\": \"/tmp/audio-separator-models/\",\n        \"output_dir\": None,\n        \"output_format\": \"FLAC\",\n        \"output_bitrate\": None,\n        \"normalization_threshold\": 0.9,\n        \"amplification_threshold\": 0.0,\n        \"output_single_stem\": None,\n        \"invert_using_spec\": False,\n        \"sample_rate\": 44100,\n        \"use_soundfile\": False,\n        \"use_autocast\": False,\n        \"chunk_duration\": None,\n        \"ensemble_algorithm\": None,\n        \"ensemble_weights\": None,\n        \"ensemble_preset\": None,\n        \"mdx_params\": {\"hop_length\": 1024, \"segment_size\": 256, \"overlap\": 0.25, \"batch_size\": 1, \"enable_denoise\": False},\n        \"vr_params\": {\"batch_size\": 1, \"window_size\": 512, \"aggression\": 5, \"enable_tta\": False, \"enable_post_process\": False, \"post_process_threshold\": 0.2, \"high_end_process\": False},\n        \"demucs_params\": {\"segment_size\": \"Default\", \"shifts\": 2, \"overlap\": 
0.25, \"segments_enabled\": True},\n        \"mdxc_params\": {\"segment_size\": 256, \"batch_size\": 1, \"overlap\": 8, \"override_model_segment_size\": False, \"pitch_shift\": 0},\n    }\n\n\n# Test the CLI with version argument using subprocess\ndef test_cli_version_subprocess():\n    # Skip subprocess CLI tests - require proper CLI installation\n    pytest.skip(\"CLI subprocess tests require proper installation\")\n\n\n# Test the CLI with no arguments\ndef test_cli_no_args(capsys):\n    # Skip subprocess CLI tests - require proper CLI installation  \n    pytest.skip(\"CLI subprocess tests require proper installation\")\n\n\n# Test with multiple filename arguments\ndef test_cli_multiple_filenames():\n    test_args = [\"cli.py\", \"test1.mp3\", \"test2.mp3\"]\n\n    # Mock the open function to prevent actual file operations\n    mock_file = mock_open()\n\n    # Create a mock logger\n    mock_logger = MagicMock()\n\n    # Patch multiple functions to prevent actual file operations and separations\n    with patch(\"sys.argv\", test_args), patch(\"builtins.open\", mock_file), patch(\"audio_separator.separator.Separator.separate\") as mock_separate, patch(\n        \"audio_separator.separator.Separator.load_model\"\n    ), patch(\"logging.getLogger\", return_value=mock_logger):\n\n        # Mock the separate method to return some dummy output\n        mock_separate.return_value = [\"output_file1.mp3\", \"output_file2.mp3\"]\n\n        # Call the main function\n        main()\n\n        mock_separate.assert_called_once()\n        args, kwargs = mock_separate.call_args\n        assert args[0] == [\"test1.mp3\", \"test2.mp3\"]\n\n        # Check if the logger captured information about both files\n        log_messages = [call[0][0] for call in mock_logger.info.call_args_list]\n        assert any(\"test1.mp3\" in msg and \"test2.mp3\" in msg for msg in log_messages)\n        assert any(\"Separation complete\" in msg for msg in log_messages)\n\n\n# Test the CLI with a 
specific audio file\ndef test_cli_with_audio_file(capsys, common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--model_filename=UVR-MDX-NET-Inst_HQ_4.onnx\"]\n    with patch(\"audio_separator.separator.Separator.separate\") as mock_separate:\n        mock_separate.return_value = [\"output_file.mp3\"]\n        with patch(\"sys.argv\", test_args):\n            # Call the main function in cli.py\n            main()\n\n    # Update expected args for this specific test\n    common_expected_args[\"model_file_dir\"] = \"/tmp/audio-separator-models/\"\n\n    # Check if the separate method was called with the correct arguments\n    mock_separate.assert_called_once()\n\n    # Assertions\n    assert mock_separate.called\n\n\n# Test the CLI with invalid log level\ndef test_cli_invalid_log_level():\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--log_level=invalid\"]\n    with patch(\"sys.argv\", test_args):\n        # Assert an attribute error is raised due to the invalid LogLevel\n        with pytest.raises(AttributeError):\n            # Call the main function in cli.py\n            main()\n\n\n# Test using model name argument\ndef test_cli_model_filename_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--model_filename=Custom_Model.onnx\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Assertions\n            mock_separator.assert_called_once_with(**common_expected_args)\n            mock_separator_instance.load_model.assert_called_once_with(model_filename=\"Custom_Model.onnx\")\n\n\n# Test using output directory argument\ndef test_cli_output_dir_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", 
\"--output_dir=/custom/output/dir\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"output_dir\"] = \"/custom/output/dir\"\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using output format argument\ndef test_cli_output_format_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--output_format=MP3\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"output_format\"] = \"MP3\"\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using normalization_threshold argument\ndef test_cli_normalization_threshold_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--normalization=0.75\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            
expected_args[\"normalization_threshold\"] = 0.75\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using amplification_threshold argument\ndef test_cli_amplification_threshold_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--amplification=0.75\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"amplification_threshold\"] = 0.75\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using single stem argument\ndef test_cli_single_stem_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--single_stem=instrumental\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"output_single_stem\"] = \"instrumental\"\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using invert spectrogram argument\ndef test_cli_invert_spectrogram_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--invert_spect\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            
mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"invert_using_spec\"] = True\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using use_autocast argument\ndef test_cli_use_autocast_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--use_autocast\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Update expected args for this specific test\n            expected_args = common_expected_args.copy()\n            expected_args[\"use_autocast\"] = True\n\n            # Assertions\n            mock_separator.assert_called_once_with(**expected_args)\n\n\n# Test using custom_output_names arguments\ndef test_cli_custom_output_names_argument(common_expected_args):\n    custom_names = {\n        \"Vocals\": \"vocals_output\",\n        \"Instrumental\": \"instrumental_output\",\n    }\n    test_args = [\"cli.py\", \"test_audio.mp3\", f\"--custom_output_names={json.dumps(custom_names)}\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Assertions\n            mock_separator.assert_called_once_with(**common_expected_args)\n            mock_separator_instance.separate.assert_called_once_with([\"test_audio.mp3\"], 
custom_output_names=custom_names)\n\n\n# Test using custom_output_names with a 6-stem Demucs model\ndef test_cli_demucs_output_names_argument(common_expected_args):\n    demucs_output_names = {\n        \"Vocals\": \"vocals_output\",\n        \"Drums\": \"drums_output\",\n        \"Bass\": \"bass_output\",\n        \"Other\": \"other_output\",\n        \"Guitar\": \"guitar_output\",\n        \"Piano\": \"piano_output\"\n    }\n    test_args = [\"cli.py\", \"test_audio.mp3\", f\"--custom_output_names={json.dumps(demucs_output_names)}\", \"--model_filename=htdemucs_6s.yaml\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Assertions\n            mock_separator.assert_called_once_with(**common_expected_args)\n            mock_separator_instance.separate.assert_called_once_with([\"test_audio.mp3\"], custom_output_names=demucs_output_names)\n\n\n# Test using --extra_models for ensemble mode\ndef test_cli_extra_models_argument(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"-m\", \"model1.onnx\", \"--extra_models\", \"model2.onnx\", \"model3.onnx\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            # Assertions\n            mock_separator.assert_called_once_with(**common_expected_args)\n            mock_separator_instance.load_model.assert_called_once_with(model_filename=[\"model1.onnx\", \"model2.onnx\", \"model3.onnx\"])\n\n\n# Test that -m with single model still passes a string (backward compat)\ndef test_cli_single_model_passes_string(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"-m\", \"my_model.onnx\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            mock_separator_instance.load_model.assert_called_once_with(model_filename=\"my_model.onnx\")\n\n\n# Test old CLI syntax: -m model audio.wav (model before audio file)\ndef test_cli_old_syntax_model_before_audio(common_expected_args):\n    test_args = [\"cli.py\", \"-m\", \"my_model.onnx\", \"test_audio.mp3\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            mock_separator_instance.load_model.assert_called_once_with(model_filename=\"my_model.onnx\")\n            mock_separator_instance.separate.assert_called_once_with([\"test_audio.mp3\"], custom_output_names=None)\n\n\n# Test --ensemble_preset passes preset to Separator and calls load_model() with default\ndef test_cli_ensemble_preset(common_expected_args):\n    test_args = [\"cli.py\", \"test_audio.mp3\", \"--ensemble_preset\", \"vocal_balanced\"]\n    with patch(\"sys.argv\", test_args):\n        with patch(\"audio_separator.separator.Separator\") as mock_separator:\n            mock_separator_instance = mock_separator.return_value\n            mock_separator_instance.separate.return_value = [\"output_file.mp3\"]\n            main()\n\n            expected_args = common_expected_args.copy()\n            expected_args[\"ensemble_preset\"] = \"vocal_balanced\"\n            mock_separator.assert_called_once_with(**expected_args)\n            # With preset and no explicit models, load_model() called with default\n            mock_separator_instance.load_model.assert_called_once_with()\n\n\n# Test --list_presets exits cleanly\ndef test_cli_list_presets(capsys):\n    test_args = [\"cli.py\", \"--list_presets\"]\n    with patch(\"sys.argv\", test_args):\n        with pytest.raises(SystemExit) as exc_info:\n            main()\n        assert exc_info.value.code == 0\n    captured = capsys.readouterr()\n    assert \"vocal_balanced\" in captured.out\n    assert \"karaoke\" in captured.out\n"
  },
  {
    "path": "tests/unit/test_configuration_normalizer.py",
    "content": "\"\"\"\nUnit tests for ConfigurationNormalizer methods.\nTests the configuration normalization and validation logic.\n\"\"\"\n\nimport pytest\nfrom unittest.mock import Mock, patch\n\n# Add the roformer module to path for imports\nimport sys\nimport os\n# Find project root dynamically\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\n# Go up until we find the project root (contains audio_separator/ directory)\nwhile project_root and not os.path.exists(os.path.join(project_root, 'audio_separator')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\nif project_root:\n    sys.path.append(project_root)\n\nfrom audio_separator.separator.roformer.configuration_normalizer import ConfigurationNormalizer\nfrom audio_separator.separator.roformer.parameter_validation_error import ParameterValidationError\n\n\nclass TestConfigurationNormalizer:\n    \"\"\"Test cases for ConfigurationNormalizer class.\"\"\"\n    \n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.normalizer = ConfigurationNormalizer()\n    \n    def test_normalize_config_basic(self):\n        \"\"\"Test basic configuration normalization.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n        }\n        \n        result = self.normalizer.normalize_config(config, \"bs_roformer\", apply_defaults=False, validate=False)\n        \n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['freqs_per_bands'] == (2, 4, 8, 16, 32, 64)\n    \n    def test_normalize_config_with_defaults(self):\n        \"\"\"Test configuration normalization with defaults applied.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n        }\n        \n     
   result = self.normalizer.normalize_config(config, \"bs_roformer\", apply_defaults=True, validate=False)\n        \n        # Original values preserved\n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['freqs_per_bands'] == (2, 4, 8, 16, 32, 64)\n        \n        # Defaults applied\n        assert result['stereo'] is False\n        assert result['num_stems'] == 2\n        assert result['flash_attn'] is True\n        assert result['mlp_expansion_factor'] == 4\n    \n    def test_normalize_config_with_validation_valid(self):\n        \"\"\"Test configuration normalization with validation - valid config.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n        }\n        \n        # Should not raise any exception\n        result = self.normalizer.normalize_config(config, \"bs_roformer\", apply_defaults=True, validate=True)\n        assert result is not None\n    \n    def test_normalize_config_with_validation_invalid(self):\n        \"\"\"Test configuration normalization with validation - invalid config.\"\"\"\n        config = {\n            'dim': \"invalid\",  # Invalid type\n            'depth': 12\n            # Missing required 'freqs_per_bands'\n        }\n        \n        with pytest.raises(ParameterValidationError):\n            self.normalizer.normalize_config(config, \"bs_roformer\", apply_defaults=True, validate=True)\n    \n    def test_normalize_structure_flat_config(self):\n        \"\"\"Test structure normalization with flat configuration.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'sample_rate': 44100\n        }\n        \n        result = self.normalizer._normalize_structure(config, \"bs_roformer\")\n        \n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['sample_rate'] == 44100\n    \n    def 
test_normalize_structure_nested_config(self):\n        \"\"\"Test structure normalization with nested configuration.\"\"\"\n        config = {\n            'model': {\n                'dim': 512,\n                'depth': 12\n            },\n            'training': {\n                'sample_rate': 44100,\n                'hop_length': 512\n            },\n            'inference': {\n                'dim_t': 1024,\n                'n_fft': 2048\n            }\n        }\n        \n        result = self.normalizer._normalize_structure(config, \"bs_roformer\")\n        \n        # Flattened model parameters\n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        \n        # Extracted training/inference parameters\n        assert result['sample_rate'] == 44100\n        assert result['hop_length'] == 512\n        assert result['dim_t'] == 1024\n        assert result['n_fft'] == 2048\n    \n    def test_normalize_parameter_names_aliases(self):\n        \"\"\"Test parameter name normalization with aliases.\"\"\"\n        config = {\n            'n_fft': 2048,  # Should become 'stft_n_fft'\n            'hop_length': 512,  # Should become 'stft_hop_length'\n            'n_heads': 8,  # Should become 'heads'\n            'expansion_factor': 4,  # Should become 'mlp_expansion_factor'\n            'freq_bands': (2, 4, 8, 16),  # Should become 'freqs_per_bands'\n            'n_mels': 64  # Should become 'num_bands'\n        }\n        \n        result = self.normalizer._normalize_parameter_names(config)\n        \n        assert result['stft_n_fft'] == 2048\n        assert result['stft_hop_length'] == 512\n        assert result['heads'] == 8\n        assert result['mlp_expansion_factor'] == 4\n        assert result['freqs_per_bands'] == (2, 4, 8, 16)\n        assert result['num_bands'] == 64\n        \n        # Original names should not be present\n        assert 'n_fft' not in result\n        assert 'hop_length' not in result\n        assert 
'n_heads' not in result\n    \n    def test_normalize_parameter_values_booleans(self):\n        \"\"\"Test parameter value normalization for booleans.\"\"\"\n        config = {\n            'stereo': 'true',\n            'flash_attn': 'false',\n            'sage_attention': '1',\n            'zero_dc': 'yes',\n            'use_torch_checkpoint': 'on',\n            'skip_connection': '0'\n        }\n        \n        result = self.normalizer._normalize_parameter_values(config, \"bs_roformer\")\n        \n        assert result['stereo'] is True\n        assert result['flash_attn'] is False\n        assert result['sage_attention'] is True\n        assert result['zero_dc'] is True\n        assert result['use_torch_checkpoint'] is True\n        assert result['skip_connection'] is False\n    \n    def test_normalize_parameter_values_numbers(self):\n        \"\"\"Test parameter value normalization for numbers.\"\"\"\n        config = {\n            'dim': '512',\n            'depth': '12.0',  # Float string to int\n            'sample_rate': 44100.0,  # Float to int\n            'attn_dropout': '0.1',\n            'ff_dropout': 0.2\n        }\n        \n        result = self.normalizer._normalize_parameter_values(config, \"bs_roformer\")\n        \n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['sample_rate'] == 44100\n        assert result['attn_dropout'] == 0.1\n        assert result['ff_dropout'] == 0.2\n        \n        # Check types\n        assert isinstance(result['dim'], int)\n        assert isinstance(result['depth'], int)\n        assert isinstance(result['sample_rate'], int)\n        assert isinstance(result['attn_dropout'], float)\n        assert isinstance(result['ff_dropout'], float)\n    \n    def test_normalize_parameter_values_tuples(self):\n        \"\"\"Test parameter value normalization for tuples/lists.\"\"\"\n        config = {\n            'freqs_per_bands': '[2, 4, 8, 16]',  # String 
representation\n            'freqs_per_bands_2': [2, 4, 8, 16],  # List to tuple\n            'freqs_per_bands_3': '(2,4,8,16)'  # String tuple\n        }\n        \n        result = self.normalizer._normalize_parameter_values(config, \"bs_roformer\")\n        \n        assert result['freqs_per_bands'] == (2, 4, 8, 16)\n        assert result['freqs_per_bands_2'] == (2, 4, 8, 16)\n        assert result['freqs_per_bands_3'] == (2, 4, 8, 16)\n        \n        # Check types are tuples\n        assert isinstance(result['freqs_per_bands'], tuple)\n        assert isinstance(result['freqs_per_bands_2'], tuple)\n        assert isinstance(result['freqs_per_bands_3'], tuple)\n    \n    def test_normalize_parameter_values_strings(self):\n        \"\"\"Test parameter value normalization for strings.\"\"\"\n        config = {\n            'norm': 'LAYER_NORM',  # Should become lowercase\n            'act': 'GELU',  # Should become lowercase\n            'mel_scale': 'HTK'  # Should become lowercase\n        }\n        \n        result = self.normalizer._normalize_parameter_values(config, \"mel_band_roformer\")\n        \n        assert result['norm'] == 'layer_norm'\n        assert result['act'] == 'gelu'\n        assert result['mel_scale'] == 'htk'\n    \n    def test_detect_model_type_bs_roformer(self):\n        \"\"\"Test model type detection for BSRoformer.\"\"\"\n        configs = [\n            {'freqs_per_bands': (2, 4, 8, 16)},  # Direct indicator\n            {'model_type': 'bs_roformer'},  # Explicit type\n            {'type': 'BSRoformer'},  # Explicit type variant\n            {'architecture': 'bs-roformer'}  # Architecture field\n        ]\n        \n        for config in configs:\n            result = self.normalizer.detect_model_type(config)\n            assert result == \"bs_roformer\", f\"Failed for config: {config}\"\n    \n    def test_detect_model_type_mel_band_roformer(self):\n        \"\"\"Test model type detection for MelBandRoformer.\"\"\"\n        
configs = [\n            {'num_bands': 64},  # Direct indicator\n            {'n_mels': 64},  # Alias\n            {'mel_bands': 64},  # Alias\n            {'model_type': 'mel_band_roformer'},  # Explicit type\n            {'type': 'MelBandRoformer'},  # Explicit type variant\n            {'architecture': 'mel-roformer'}  # Architecture field\n        ]\n        \n        for config in configs:\n            result = self.normalizer.detect_model_type(config)\n            assert result == \"mel_band_roformer\", f\"Failed for config: {config}\"\n    \n    def test_detect_model_type_unknown(self):\n        \"\"\"Test model type detection for unknown configurations.\"\"\"\n        configs = [\n            {},  # Empty config\n            {'dim': 512, 'depth': 12},  # No specific indicators\n            {'model_type': 'unknown'}  # Unknown type\n        ]\n        \n        for config in configs:\n            result = self.normalizer.detect_model_type(config)\n            assert result is None, f\"Should return None for config: {config}\"\n    \n    def test_normalize_from_file_path_bs_roformer(self):\n        \"\"\"Test normalization with file path detection - BSRoformer.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12\n        }\n        \n        file_paths = [\n            '/path/to/bs_roformer_model.ckpt',\n            '/path/to/BS-Roformer-model.pth',\n            '/path/to/model_bs_roformer.bin'\n        ]\n        \n        for file_path in file_paths:\n            result = self.normalizer.normalize_from_file_path(\n                config, file_path, apply_defaults=True, validate=False\n            )\n            \n            # Should have BSRoformer defaults\n            assert 'freqs_per_bands' in result, f\"Failed for path: {file_path}\"\n            assert 'mask_estimator_depth' in result, f\"Failed for path: {file_path}\"\n    \n    def test_normalize_from_file_path_mel_band_roformer(self):\n        \"\"\"Test normalization with 
file path detection - MelBandRoformer.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12\n        }\n        \n        file_paths = [\n            '/path/to/mel_band_roformer_model.ckpt',\n            '/path/to/MelBand-Roformer-model.pth',\n            '/path/to/model_mel_roformer.bin'\n        ]\n        \n        for file_path in file_paths:\n            result = self.normalizer.normalize_from_file_path(\n                config, file_path, apply_defaults=True, validate=False\n            )\n            \n            # Should have MelBandRoformer defaults\n            assert result['num_bands'] == 64, f\"Failed for path: {file_path}\"\n    \n    def test_normalize_from_file_path_default_fallback(self):\n        \"\"\"Test normalization with file path detection - default fallback.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12\n        }\n        \n        file_path = '/path/to/unknown_model.ckpt'\n        \n        with patch.object(self.normalizer, 'detect_model_type', return_value=None):\n            result = self.normalizer.normalize_from_file_path(\n                config, file_path, apply_defaults=True, validate=False\n            )\n            \n            # Should default to BSRoformer\n            assert 'freqs_per_bands' in result\n    \n    def test_normalization_preserves_original_config(self):\n        \"\"\"Test that normalization doesn't modify the original configuration.\"\"\"\n        original_config = {\n            'dim': 512,\n            'depth': 12,\n            'n_fft': 2048  # Will be renamed to stft_n_fft\n        }\n        \n        # Keep a copy to compare\n        config_copy = original_config.copy()\n        \n        result = self.normalizer.normalize_config(original_config, \"bs_roformer\")\n        \n        # Original config should be unchanged\n        assert original_config == config_copy\n        assert 'n_fft' in original_config  # Original name preserved\n        assert 
'stft_n_fft' not in original_config\n        \n        # Result should have normalized names\n        assert 'stft_n_fft' in result\n        assert result['stft_n_fft'] == 2048\n    \n    def test_normalization_error_handling(self):\n        \"\"\"Test error handling during normalization.\"\"\"\n        # Test with invalid string values that can't be converted\n        config = {\n            'dim': 'not_a_number',\n            'depth': 12,\n            'freqs_per_bands': 'invalid_tuple_string'\n        }\n        \n        # Should not crash, invalid values should be passed through for validator to catch\n        result = self.normalizer.normalize_config(config, \"bs_roformer\", validate=False)\n        \n        assert result['dim'] == 'not_a_number'  # Passed through unchanged\n        assert result['depth'] == 12  # Valid value normalized\n        assert result['freqs_per_bands'] == 'invalid_tuple_string'  # Passed through unchanged\n    \n    def test_comprehensive_normalization_workflow(self):\n        \"\"\"Test complete normalization workflow with complex configuration.\"\"\"\n        config = {\n            'model': {\n                'dim': '512',\n                'depth': '12.0',\n                'n_heads': '8'\n            },\n            'training': {\n                'sample_rate': 44100,\n                'n_fft': 2048\n            },\n            'stereo': 'true',\n            'flash_attn': 'false',\n            'freq_bands': '[2, 4, 8, 16, 32, 64]',\n            'norm': 'LAYER_NORM'\n        }\n        \n        result = self.normalizer.normalize_config(config, \"bs_roformer\", apply_defaults=True, validate=False)\n        \n        # Structure flattened\n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['sample_rate'] == 44100\n        \n        # Names normalized\n        assert result['heads'] == 8\n        assert result['stft_n_fft'] == 2048\n        assert result['freqs_per_bands'] == (2, 4, 8, 16, 
32, 64)\n        \n        # Values normalized\n        assert result['stereo'] is True\n        assert result['flash_attn'] is False\n        assert result['norm'] == 'layer_norm'\n        \n        # Defaults applied\n        assert result['num_stems'] == 2\n        assert result['mlp_expansion_factor'] == 4\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__])\n"
  },
  {
    "path": "tests/unit/test_deploy_cloudrun_async.py",
    "content": "\"\"\"Tests for async job infrastructure in deploy_cloudrun.py.\"\"\"\n\nimport threading\nimport time\n\nimport pytest\n\n\nclass TestGPUSemaphore:\n    \"\"\"Test that GPU semaphore serializes separation work.\"\"\"\n\n    def test_semaphore_blocks_concurrent_jobs(self):\n        \"\"\"Second job waits while first job holds the semaphore.\"\"\"\n        semaphore = threading.Semaphore(1)\n        execution_order = []\n\n        def job(name, duration):\n            with semaphore:\n                execution_order.append(f\"{name}_start\")\n                time.sleep(duration)\n                execution_order.append(f\"{name}_end\")\n\n        t1 = threading.Thread(target=job, args=(\"job1\", 0.2))\n        t2 = threading.Thread(target=job, args=(\"job2\", 0.1))\n        t1.start()\n        time.sleep(0.05)\n        t2.start()\n        t1.join()\n        t2.join()\n\n        assert execution_order == [\"job1_start\", \"job1_end\", \"job2_start\", \"job2_end\"]\n\n    def test_semaphore_releases_on_exception(self):\n        \"\"\"Semaphore is released even when the job raises an exception.\"\"\"\n        semaphore = threading.Semaphore(1)\n        execution_order = []\n\n        def failing_job():\n            semaphore.acquire()\n            try:\n                execution_order.append(\"failing_start\")\n                # Simulate error without actually raising (avoids pytest thread warning)\n                execution_order.append(\"failing_error\")\n            finally:\n                semaphore.release()\n                execution_order.append(\"failing_released\")\n\n        def second_job():\n            with semaphore:\n                execution_order.append(\"second_start\")\n\n        t1 = threading.Thread(target=failing_job)\n        t1.start()\n        t1.join()\n\n        t2 = threading.Thread(target=second_job)\n        t2.start()\n        t2.join()\n\n        assert execution_order == [\"failing_start\", \"failing_error\", 
\"failing_released\", \"second_start\"]\n\n    def test_semaphore_allows_sequential_access(self):\n        \"\"\"Multiple jobs can run sequentially through the semaphore.\"\"\"\n        semaphore = threading.Semaphore(1)\n        completed = []\n\n        def job(name):\n            with semaphore:\n                completed.append(name)\n\n        for i in range(5):\n            t = threading.Thread(target=job, args=(f\"job{i}\",))\n            t.start()\n            t.join()\n\n        assert len(completed) == 5\n\n\nuvicorn = pytest.importorskip(\"uvicorn\", reason=\"uvicorn not installed (server-only dependency)\")\n\n\nclass TestLazyInit:\n    \"\"\"Test lazy initialization of stores.\"\"\"\n\n    def test_get_job_store_returns_same_instance(self):\n        \"\"\"get_job_store() returns the same instance on repeated calls.\"\"\"\n        import audio_separator.remote.deploy_cloudrun as module\n\n        # Reset global state\n        module._job_store = None\n\n        mock_store = object()\n        module._job_store = mock_store\n\n        result = module.get_job_store()\n        assert result is mock_store\n\n        # Calling again returns same instance\n        result2 = module.get_job_store()\n        assert result2 is mock_store\n\n        # Clean up\n        module._job_store = None\n\n    def test_get_output_store_returns_same_instance(self):\n        \"\"\"get_output_store() returns the same instance on repeated calls.\"\"\"\n        import audio_separator.remote.deploy_cloudrun as module\n\n        # Reset global state\n        module._output_store = None\n\n        mock_store = object()\n        module._output_store = mock_store\n\n        result = module.get_output_store()\n        assert result is mock_store\n\n        # Clean up\n        module._output_store = None\n\n\nclass TestFireAndForget:\n    \"\"\"Test that fire-and-forget pattern works correctly.\"\"\"\n\n    def test_run_in_executor_without_await_returns_immediately(self):\n        
\"\"\"Verify that not awaiting run_in_executor lets the caller proceed.\"\"\"\n        import asyncio\n\n        async def fire_and_forget():\n            started = threading.Event()\n            finished = threading.Event()\n\n            def slow_task():\n                started.set()\n                time.sleep(0.2)\n                finished.set()\n\n            # Inside a coroutine, get_running_loop() returns the loop driving it\n            loop = asyncio.get_running_loop()\n            # Fire-and-forget (no await)\n            loop.run_in_executor(None, slow_task)\n\n            # We should get here immediately, before the task finishes\n            assert not finished.is_set()\n\n            # Wait for task to actually start and finish\n            started.wait(timeout=1)\n            finished.wait(timeout=1)\n            assert finished.is_set()\n\n        loop = asyncio.new_event_loop()\n        try:\n            loop.run_until_complete(fire_and_forget())\n        finally:\n            loop.close()\n"
  },
  {
    "path": "tests/unit/test_ensemble_presets.py",
    "content": "import pytest\nimport json\nimport logging\nfrom unittest.mock import patch, MagicMock\nfrom io import StringIO\nfrom audio_separator.separator import Separator\n\n\n@pytest.fixture\ndef mock_separator_init():\n    \"\"\"Fixture that patches hardware setup so Separator can be instantiated without GPU.\"\"\"\n    with patch.object(Separator, \"setup_accelerated_inferencing_device\"):\n        yield\n\n\ndef test_load_preset_vocal_balanced(mock_separator_init):\n    sep = Separator(ensemble_preset=\"vocal_balanced\")\n    assert sep.ensemble_algorithm == \"avg_fft\"\n    assert sep._ensemble_preset_models == [\n        \"bs_roformer_vocals_resurrection_unwa.ckpt\",\n        \"melband_roformer_big_beta6x.ckpt\",\n    ]\n    assert sep.ensemble_weights is None\n\n\ndef test_load_preset_karaoke(mock_separator_init):\n    sep = Separator(ensemble_preset=\"karaoke\")\n    assert sep.ensemble_algorithm == \"avg_wave\"\n    assert len(sep._ensemble_preset_models) == 3\n\n\ndef test_load_preset_instrumental_clean(mock_separator_init):\n    sep = Separator(ensemble_preset=\"instrumental_clean\")\n    assert sep.ensemble_algorithm == \"uvr_max_spec\"\n    assert len(sep._ensemble_preset_models) == 2\n\n\ndef test_preset_algorithm_override(mock_separator_init):\n    \"\"\"User explicitly sets algorithm, which should override preset's default.\"\"\"\n    sep = Separator(ensemble_preset=\"vocal_clean\", ensemble_algorithm=\"avg_wave\")\n    # vocal_clean preset uses min_fft, but user overrode to avg_wave\n    assert sep.ensemble_algorithm == \"avg_wave\"\n    # Models still come from preset\n    assert sep._ensemble_preset_models == [\n        \"bs_roformer_vocals_revive_v2_unwa.ckpt\",\n        \"mel_band_roformer_kim_ft2_bleedless_unwa.ckpt\",\n    ]\n\n\ndef test_preset_no_algorithm_uses_preset_default(mock_separator_init):\n    \"\"\"When no algorithm is specified, preset's algorithm is used.\"\"\"\n    sep = Separator(ensemble_preset=\"vocal_clean\")\n    
assert sep.ensemble_algorithm == \"min_fft\"  # from preset\n\n\ndef test_preset_unknown_name(mock_separator_init):\n    with pytest.raises(ValueError, match=\"Unknown ensemble preset\"):\n        Separator(ensemble_preset=\"nonexistent_preset\")\n\n\ndef test_no_preset_defaults_to_avg_wave(mock_separator_init):\n    sep = Separator()\n    assert sep.ensemble_algorithm == \"avg_wave\"\n    assert sep._ensemble_preset_models is None\n\n\ndef test_list_ensemble_presets(mock_separator_init):\n    sep = Separator(info_only=True)\n    presets = sep.list_ensemble_presets()\n    assert isinstance(presets, dict)\n    assert \"vocal_balanced\" in presets\n    assert \"karaoke\" in presets\n    assert \"instrumental_clean\" in presets\n    assert len(presets) == 9\n\n\ndef test_preset_loads_models_on_load_model(mock_separator_init):\n    \"\"\"Calling load_model() with default arg should use preset models.\"\"\"\n    sep = Separator(ensemble_preset=\"karaoke\")\n    # Don't actually load the model, just check the preset models are set\n    assert sep._ensemble_preset_models is not None\n    assert len(sep._ensemble_preset_models) == 3\n\n\ndef test_preset_json_valid():\n    \"\"\"Validate that ensemble_presets.json is well-formed.\"\"\"\n    from importlib import resources\n    with resources.open_text(\"audio_separator\", \"ensemble_presets.json\") as f:\n        data = json.load(f)\n\n    assert \"version\" in data\n    assert data[\"version\"] == 1\n    assert \"presets\" in data\n\n    valid_algorithms = [\n        \"avg_wave\", \"median_wave\", \"min_wave\", \"max_wave\",\n        \"avg_fft\", \"median_fft\", \"min_fft\", \"max_fft\",\n        \"uvr_max_spec\", \"uvr_min_spec\", \"ensemble_wav\",\n    ]\n\n    for preset_id, preset in data[\"presets\"].items():\n        assert \"name\" in preset, f\"Preset {preset_id} missing 'name'\"\n        assert \"description\" in preset, f\"Preset {preset_id} missing 'description'\"\n        assert \"models\" in preset, f\"Preset 
{preset_id} missing 'models'\"\n        assert \"algorithm\" in preset, f\"Preset {preset_id} missing 'algorithm'\"\n        assert isinstance(preset[\"models\"], list), f\"Preset {preset_id} models must be a list\"\n        assert len(preset[\"models\"]) >= 2, f\"Preset {preset_id} must have at least 2 models\"\n        assert preset[\"algorithm\"] in valid_algorithms, f\"Preset {preset_id} has invalid algorithm: {preset['algorithm']}\"\n\n\ndef test_preset_validation_bad_weights_length(mock_separator_init):\n    \"\"\"Preset with weights length != models length should raise ValueError.\"\"\"\n    # We need to patch the JSON to inject a bad preset\n    import io\n    bad_json = json.dumps({\n        \"version\": 1,\n        \"presets\": {\n            \"bad_weights\": {\n                \"name\": \"Bad\",\n                \"description\": \"test\",\n                \"models\": [\"a.ckpt\", \"b.ckpt\"],\n                \"algorithm\": \"avg_wave\",\n                \"weights\": [1.0, 2.0, 3.0],  # 3 weights for 2 models\n            }\n        }\n    })\n    with patch(\"audio_separator.separator.separator.resources.open_text\", return_value=io.StringIO(bad_json)):\n        with pytest.raises(ValueError, match=\"weights length\"):\n            Separator(ensemble_preset=\"bad_weights\")\n\n\ndef test_preset_validation_bad_algorithm(mock_separator_init):\n    \"\"\"Preset with unknown algorithm should raise ValueError.\"\"\"\n    import io\n    bad_json = json.dumps({\n        \"version\": 1,\n        \"presets\": {\n            \"bad_algo\": {\n                \"name\": \"Bad\",\n                \"description\": \"test\",\n                \"models\": [\"a.ckpt\", \"b.ckpt\"],\n                \"algorithm\": \"nonexistent_algorithm\",\n                \"weights\": None,\n            }\n        }\n    })\n    with patch(\"audio_separator.separator.separator.resources.open_text\", return_value=io.StringIO(bad_json)):\n        with pytest.raises(ValueError, 
match=\"unknown algorithm\"):\n            Separator(ensemble_preset=\"bad_algo\")\n\n\ndef test_preset_validation_single_model(mock_separator_init):\n    \"\"\"Preset with only 1 model should raise ValueError.\"\"\"\n    import io\n    bad_json = json.dumps({\n        \"version\": 1,\n        \"presets\": {\n            \"one_model\": {\n                \"name\": \"Bad\",\n                \"description\": \"test\",\n                \"models\": [\"a.ckpt\"],\n                \"algorithm\": \"avg_wave\",\n                \"weights\": None,\n            }\n        }\n    })\n    with patch(\"audio_separator.separator.separator.resources.open_text\", return_value=io.StringIO(bad_json)):\n        with pytest.raises(ValueError, match=\"at least 2 models\"):\n            Separator(ensemble_preset=\"one_model\")\n\n\ndef test_preset_weights_applied(mock_separator_init):\n    \"\"\"Preset with explicit weights should apply them.\"\"\"\n    import io\n    preset_json = json.dumps({\n        \"version\": 1,\n        \"presets\": {\n            \"weighted\": {\n                \"name\": \"Weighted\",\n                \"description\": \"test\",\n                \"models\": [\"a.ckpt\", \"b.ckpt\"],\n                \"algorithm\": \"avg_wave\",\n                \"weights\": [2.0, 1.0],\n            }\n        }\n    })\n    with patch(\"audio_separator.separator.separator.resources.open_text\", return_value=io.StringIO(preset_json)):\n        sep = Separator(ensemble_preset=\"weighted\")\n        assert sep.ensemble_weights == [2.0, 1.0]\n\n\ndef test_preset_explicit_weights_override(mock_separator_init):\n    \"\"\"User-provided weights should override preset weights.\"\"\"\n    import io\n    preset_json = json.dumps({\n        \"version\": 1,\n        \"presets\": {\n            \"weighted\": {\n                \"name\": \"Weighted\",\n                \"description\": \"test\",\n                \"models\": [\"a.ckpt\", \"b.ckpt\"],\n                \"algorithm\": 
\"avg_wave\",\n                \"weights\": [2.0, 1.0],\n            }\n        }\n    })\n    with patch(\"audio_separator.separator.separator.resources.open_text\", return_value=io.StringIO(preset_json)):\n        sep = Separator(ensemble_preset=\"weighted\", ensemble_weights=[5.0, 3.0])\n        assert sep.ensemble_weights == [5.0, 3.0]\n"
  },
  {
    "path": "tests/unit/test_ensembler.py",
    "content": "import pytest\nimport numpy as np\nimport logging\nfrom audio_separator.separator.ensembler import Ensembler\n\n@pytest.fixture\ndef logger():\n    return logging.getLogger(\"test\")\n\ndef test_ensembler_avg_wave(logger):\n    # Test simple averaging\n    wav1 = np.ones((2, 100))\n    wav2 = np.zeros((2, 100))\n    ensembler = Ensembler(logger, algorithm=\"avg_wave\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert np.allclose(result, 0.5)\n\ndef test_ensembler_weighted_avg(logger):\n    # Test weighted averaging\n    wav1 = np.ones((2, 100))\n    wav2 = np.zeros((2, 100))\n    ensembler = Ensembler(logger, algorithm=\"avg_wave\", weights=[3.0, 1.0])\n    result = ensembler.ensemble([wav1, wav2])\n    assert np.allclose(result, 0.75)\n\ndef test_ensembler_different_lengths(logger):\n    # Test padding for different lengths\n    wav1 = np.ones((2, 100))\n    wav2 = np.zeros((2, 80))\n    ensembler = Ensembler(logger, algorithm=\"avg_wave\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.shape == (2, 100)\n    assert np.allclose(result[:, :80], 0.5)\n    assert np.allclose(result[:, 80:], 0.5) # 0.5 * 1 + 0.5 * 0\n\ndef test_ensembler_median_wave(logger):\n    wav1 = np.ones((2, 100))\n    wav2 = np.zeros((2, 100))\n    wav3 = np.ones((2, 100)) * 0.7\n    ensembler = Ensembler(logger, algorithm=\"median_wave\")\n    result = ensembler.ensemble([wav1, wav2, wav3])\n    assert np.allclose(result, 0.7)\n\ndef test_ensembler_max_wave(logger):\n    wav1 = np.array([[1.0, -2.0], [3.0, -4.0]])\n    wav2 = np.array([[0.5, -1.0], [4.0, -3.0]])\n    ensembler = Ensembler(logger, algorithm=\"max_wave\")\n    result = ensembler.ensemble([wav1, wav2])\n    # key=np.abs, so max of (1.0, 0.5) is 1.0, (-2.0, -1.0) is -2.0, (3.0, 4.0) is 4.0, (-4.0, -3.0) is -4.0\n    expected = np.array([[1.0, -2.0], [4.0, -4.0]])\n    assert np.allclose(result, expected)\n\ndef test_ensembler_min_wave(logger):\n    wav1 = np.array([[1.0, -2.0], [3.0, 
-4.0]])\n    wav2 = np.array([[0.5, -1.0], [4.0, -3.0]])\n    ensembler = Ensembler(logger, algorithm=\"min_wave\")\n    result = ensembler.ensemble([wav1, wav2])\n    # key=np.abs, so min of (1.0, 0.5) is 0.5, (-2.0, -1.0) is -1.0, (3.0, 4.0) is 3.0, (-4.0, -3.0) is -3.0\n    expected = np.array([[0.5, -1.0], [3.0, -3.0]])\n    assert np.allclose(result, expected)\n\ndef test_ensembler_avg_fft(logger):\n    # FFT algorithms involve STFT/ISTFT which are harder to test with simple constants\n    # but we can check if it returns a valid waveform of correct shape\n    wav1 = np.random.rand(2, 1024)\n    wav2 = np.random.rand(2, 1024)\n    ensembler = Ensembler(logger, algorithm=\"avg_fft\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.shape == (2, 1024)\n\ndef test_ensembler_ensemble_wav_uvr(logger):\n    # Linear Ensemble (least noisy chunk)\n    wav1 = np.ones((2, 1000))\n    wav2 = np.zeros((2, 1000))\n    ensembler = Ensembler(logger, algorithm=\"ensemble_wav\")\n    # It splits into 240 chunks by default. Each chunk in wav2 is less noisy (all 0s)\n    # so the result should be all 0s.\n    result = ensembler.ensemble([wav1, wav2])\n    assert np.allclose(result, 0.0)\n\ndef test_ensembler_empty_list(logger):\n    ensembler = Ensembler(logger)\n    assert ensembler.ensemble([]) is None\n\ndef test_ensembler_single_waveform(logger):\n    wav = np.random.rand(2, 100)\n    ensembler = Ensembler(logger)\n    result = ensembler.ensemble([wav])\n    assert np.array_equal(result, wav)\n\ndef test_ensembler_mismatched_channels(logger):\n    wav1 = np.random.rand(2, 100)\n    wav2 = np.random.rand(1, 100)\n    ensembler = Ensembler(logger)\n    # The ensembler pads mismatched lengths but not mismatched channel counts,\n    # so mixing stereo (2, N) and mono (1, N) inputs is expected to raise ValueError.\n    with pytest.raises(ValueError):\n        ensembler.ensemble([wav1, wav2])\n\ndef test_ensembler_mono_stft(logger):\n    wav_mono = np.random.rand(1024)\n    ensembler = Ensembler(logger)\n    spec = ensembler._stft(wav_mono)\n    assert spec.shape[0] == 2 # Should be converted to stereo\n\ndef test_ensembler_single_channel_stft(logger):\n    wav_mono = np.random.rand(1, 1024)\n    ensembler = Ensembler(logger)\n    spec = ensembler._stft(wav_mono)\n    assert spec.shape[0] == 2 # Should be converted to stereo\n\ndef test_ensembler_median_fft(logger):\n    wav1 = np.random.rand(2, 1024)\n    wav2 = np.random.rand(2, 1024)\n    wav3 = np.random.rand(2, 1024)\n    ensembler = Ensembler(logger, algorithm=\"median_fft\")\n    result = ensembler.ensemble([wav1, wav2, wav3])\n    assert result.shape == (2, 1024)\n    assert np.all(np.isfinite(result))\n\ndef test_ensembler_min_fft(logger):\n    wav1 = np.random.rand(2, 1024)\n    wav2 = np.random.rand(2, 1024)\n    ensembler = Ensembler(logger, algorithm=\"min_fft\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.shape == (2, 1024)\n    assert np.all(np.isfinite(result))\n\ndef test_ensembler_max_fft(logger):\n    wav1 = np.random.rand(2, 1024)\n    wav2 = np.random.rand(2, 1024)\n    ensembler = Ensembler(logger, algorithm=\"max_fft\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.shape == (2, 1024)\n    assert np.all(np.isfinite(result))\n\ndef test_ensembler_uvr_max_spec(logger):\n    wav1 = np.random.rand(2, 4096).astype(np.float32)\n    wav2 = np.random.rand(2, 4096).astype(np.float32)\n    ensembler = Ensembler(logger, algorithm=\"uvr_max_spec\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.ndim == 2\n    assert np.all(np.isfinite(result))\n\ndef test_ensembler_uvr_min_spec(logger):\n    wav1 = np.random.rand(2, 
4096).astype(np.float32)\n    wav2 = np.random.rand(2, 4096).astype(np.float32)\n    ensembler = Ensembler(logger, algorithm=\"uvr_min_spec\")\n    result = ensembler.ensemble([wav1, wav2])\n    assert result.ndim == 2\n    assert np.all(np.isfinite(result))\n\ndef test_ensembler_invalid_algorithm(logger):\n    wav1 = np.random.rand(2, 100)\n    ensembler = Ensembler(logger, algorithm=\"nonexistent\")\n    with pytest.raises(ValueError, match=\"Unknown ensemble algorithm\"):\n        ensembler.ensemble([wav1, wav1])\n\ndef test_ensembler_weight_mismatch_fallback(logger):\n    wav1 = np.ones((2, 100))\n    wav2 = np.zeros((2, 100))\n    # 3 weights for 2 waveforms should fall back to equal weights\n    ensembler = Ensembler(logger, algorithm=\"avg_wave\", weights=[1.0, 2.0, 3.0])\n    result = ensembler.ensemble([wav1, wav2])\n    assert np.allclose(result, 0.5)  # equal weights -> simple average\n"
  },
  {
    "path": "tests/unit/test_job_store.py",
    "content": "import pytest\nfrom unittest.mock import MagicMock, patch\nfrom audio_separator.remote.job_store import FirestoreJobStore\n\n\n@pytest.fixture\ndef mock_firestore_client():\n    with patch(\"google.cloud.firestore.Client\") as mock_cls:\n        mock_client = MagicMock()\n        mock_cls.return_value = mock_client\n        yield mock_client\n\n\n@pytest.fixture\ndef store(mock_firestore_client):\n    return FirestoreJobStore(project=\"test-project\")\n\n\nclass TestFirestoreJobStore:\n    def test_set_creates_document(self, store, mock_firestore_client):\n        \"\"\"Setting a task_id writes to Firestore with timestamps.\"\"\"\n        store.set(\"task-123\", {\n            \"task_id\": \"task-123\",\n            \"status\": \"submitted\",\n            \"progress\": 0,\n        })\n\n        collection = mock_firestore_client.collection\n        collection.assert_called_with(\"audio_separation_jobs\")\n        collection.return_value.document.assert_called_with(\"task-123\")\n        doc_ref = collection.return_value.document.return_value\n        doc_ref.set.assert_called_once()\n\n        written_data = doc_ref.set.call_args[0][0]\n        assert written_data[\"task_id\"] == \"task-123\"\n        assert written_data[\"status\"] == \"submitted\"\n        assert \"updated_at\" in written_data\n\n    def test_get_returns_document_data(self, store, mock_firestore_client):\n        \"\"\"Getting a task_id reads from Firestore.\"\"\"\n        doc_snapshot = MagicMock()\n        doc_snapshot.exists = True\n        doc_snapshot.to_dict.return_value = {\n            \"task_id\": \"task-123\",\n            \"status\": \"processing\",\n            \"progress\": 50,\n        }\n        collection = mock_firestore_client.collection\n        collection.return_value.document.return_value.get.return_value = doc_snapshot\n\n        result = store.get(\"task-123\")\n\n        assert result[\"status\"] == \"processing\"\n        assert result[\"progress\"] == 
50\n\n    def test_get_returns_none_for_missing_document(self, store, mock_firestore_client):\n        \"\"\"Getting a nonexistent task_id returns None.\"\"\"\n        doc_snapshot = MagicMock()\n        doc_snapshot.exists = False\n        collection = mock_firestore_client.collection\n        collection.return_value.document.return_value.get.return_value = doc_snapshot\n\n        result = store.get(\"nonexistent\")\n        assert result is None\n\n    def test_contains_checks_existence(self, store, mock_firestore_client):\n        \"\"\"__contains__ checks if document exists in Firestore.\"\"\"\n        doc_snapshot = MagicMock()\n        doc_snapshot.exists = True\n        collection = mock_firestore_client.collection\n        collection.return_value.document.return_value.get.return_value = doc_snapshot\n\n        assert \"task-123\" in store\n\n    def test_update_merges_fields(self, store, mock_firestore_client):\n        \"\"\"Updating a task merges fields without overwriting the whole doc.\"\"\"\n        store.update(\"task-123\", {\"status\": \"processing\", \"progress\": 25})\n\n        collection = mock_firestore_client.collection\n        doc_ref = collection.return_value.document.return_value\n        doc_ref.update.assert_called_once()\n        updated_data = doc_ref.update.call_args[0][0]\n        assert updated_data[\"status\"] == \"processing\"\n        assert updated_data[\"progress\"] == 25\n        assert \"updated_at\" in updated_data\n\n    def test_delete_removes_document(self, store, mock_firestore_client):\n        \"\"\"Deleting a task_id removes the Firestore document.\"\"\"\n        store.delete(\"task-123\")\n\n        collection = mock_firestore_client.collection\n        doc_ref = collection.return_value.document.return_value\n        doc_ref.delete.assert_called_once()\n\n    def test_cleanup_old_jobs(self, store, mock_firestore_client):\n        \"\"\"cleanup_old_jobs deletes documents older than max_age_seconds.\"\"\"\n        
old_doc = MagicMock()\n        old_doc.reference = MagicMock()\n        query = MagicMock()\n        query.stream.return_value = [old_doc]\n\n        collection = mock_firestore_client.collection\n        collection.return_value.where.return_value.where.return_value = query\n\n        deleted = store.cleanup_old_jobs(max_age_seconds=3600)\n\n        assert deleted == 1\n        old_doc.reference.delete.assert_called_once()\n"
  },
  {
    "path": "tests/unit/test_mdxc_roformer_chunking.py",
    "content": "\"\"\"\nUnit tests for MDXC Roformer chunking and overlap logic.\nTests the chunking mechanism, overlap handling, and edge cases.\n\"\"\"\n\nimport pytest\nimport numpy as np\nimport torch\nfrom unittest.mock import Mock, MagicMock, patch\nimport logging\n\n\nclass TestMDXCRoformerChunking:\n    \"\"\"Test cases for MDXC Roformer chunking and overlap functionality.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.sample_rate = 44100\n        self.audio_length = 132300  # 3 seconds\n        self.chunk_size = 8192\n        self.hop_length = 1024\n        \n        # Mock model with stft_hop_length\n        self.mock_model = Mock()\n        self.mock_model.stft_hop_length = self.hop_length\n        \n        # Mock audio object\n        self.mock_audio = Mock()\n        self.mock_audio.hop_length = 512  # Different from model for fallback test\n\n    def test_chunk_size_uses_model_stft_hop_length(self):\n        \"\"\"T051: Assert chunk_size uses model.stft_hop_length.\"\"\"\n        # Test implementation for chunking optimization - placeholder for future implementation\n        pytest.skip(\"Chunking optimization not yet implemented\")\n\n    def test_chunk_size_falls_back_to_audio_hop_length(self):\n        \"\"\"T052: Fallback to audio.hop_length if model.stft_hop_length missing.\"\"\"\n        # Test implementation for chunking optimization - placeholder for future implementation\n        pytest.skip(\"Chunking optimization not yet implemented\")\n\n    def test_step_clamped_to_chunk_size(self):\n        \"\"\"T053: Step clamped to chunk_size (desired_step > chunk_size or ≤ 0).\"\"\"\n        chunk_size = 8192\n        \n        # Test case 1: desired_step > chunk_size\n        desired_step_too_large = 10000\n        actual_step = min(desired_step_too_large, chunk_size)\n        assert actual_step == chunk_size\n        \n        # Test case 2: desired_step ≤ 0\n        desired_step_zero = 0\n        
actual_step = max(desired_step_zero, chunk_size // 4)  # Use quarter chunk as minimum\n        assert actual_step == chunk_size // 4\n        \n        # Test case 3: desired_step negative\n        desired_step_negative = -100\n        actual_step = max(desired_step_negative, chunk_size // 4)\n        assert actual_step == chunk_size // 4\n        \n        # Test case 4: valid desired_step\n        desired_step_valid = 4096\n        actual_step = min(max(desired_step_valid, 1), chunk_size)\n        assert actual_step == desired_step_valid\n\n    def test_overlap_add_short_output_safe(self):\n        \"\"\"T054: overlap_add handles shorter model output safely (safe_len).\"\"\"\n        # Create mock tensors\n        chunk_size = 8192\n        model_output_length = 6000  # Shorter than expected\n        \n        # Mock overlap_add logic\n        def mock_overlap_add_safe(output, target, start_idx, safe_len):\n            \"\"\"Mock overlap add with safe length handling.\"\"\"\n            actual_len = min(output.shape[-1], safe_len)\n            end_idx = start_idx + actual_len\n            \n            # Ensure we don't go beyond target bounds\n            if end_idx > target.shape[-1]:\n                end_idx = target.shape[-1]\n                actual_len = end_idx - start_idx\n            \n            if actual_len > 0:\n                target[..., start_idx:end_idx] += output[..., :actual_len]\n            \n            return actual_len\n        \n        # Test with shorter output\n        output = torch.randn(2, model_output_length)  # Shorter than chunk_size\n        target = torch.zeros(2, 20000)\n        start_idx = 1000\n        safe_len = chunk_size\n        \n        actual_added = mock_overlap_add_safe(output, target, start_idx, safe_len)\n        \n        # Should handle shorter output gracefully\n        assert actual_added == model_output_length\n        assert actual_added < safe_len\n        \n        # Verify no out-of-bounds access\n        
assert start_idx + actual_added <= target.shape[-1]\n\n    def test_counter_updates_safe_len(self):\n        \"\"\"T055: Counter increments match overlap_add safe span.\"\"\"\n        # Mock counter and overlap_add logic\n        counter = torch.zeros(2, 20000)\n        chunk_size = 8192\n        safe_len = 6000  # Shorter than chunk_size\n        start_idx = 1000\n        \n        def mock_update_counter_safe(counter, start_idx, safe_len):\n            \"\"\"Mock counter update that matches overlap_add safe span.\"\"\"\n            end_idx = start_idx + safe_len\n            if end_idx > counter.shape[-1]:\n                end_idx = counter.shape[-1]\n                safe_len = end_idx - start_idx\n            \n            if safe_len > 0:\n                counter[..., start_idx:end_idx] += 1.0\n            \n            return safe_len\n        \n        actual_updated = mock_update_counter_safe(counter, start_idx, safe_len)\n        \n        # Counter increment should match the safe length used\n        assert actual_updated == safe_len\n        \n        # Verify counter was updated correctly\n        expected_ones = counter[0, start_idx:start_idx + actual_updated]\n        assert torch.all(expected_ones == 1.0)\n        \n        # Verify areas outside weren't touched\n        before_area = counter[0, :start_idx]\n        after_area = counter[0, start_idx + actual_updated:]\n        assert torch.all(before_area == 0.0)\n        assert torch.all(after_area == 0.0)\n\n    def test_counter_clamp_no_nan(self):\n        \"\"\"T056: No NaN/inf on normalization (counter clamp).\"\"\"\n        # Create counter with some zero values (potential division by zero)\n        counter = torch.tensor([[0.0, 1.0, 2.0, 0.0, 3.0, 0.0]], dtype=torch.float32)\n        output = torch.tensor([[1.0, 2.0, 4.0, 8.0, 6.0, 16.0]], dtype=torch.float32)\n        \n        # Mock normalization with counter clamping\n        def mock_normalize_with_clamp(output, counter, min_clamp=1e-8):\n 
           \"\"\"Mock normalization that clamps counter to avoid NaN/inf.\"\"\"\n            clamped_counter = torch.clamp(counter, min=min_clamp)\n            normalized = output / clamped_counter\n            return normalized, clamped_counter\n        \n        normalized, clamped_counter = mock_normalize_with_clamp(output, counter)\n        \n        # Verify no NaN or inf values\n        assert not torch.any(torch.isnan(normalized))\n        assert not torch.any(torch.isinf(normalized))\n        \n        # Verify clamping worked\n        assert torch.all(clamped_counter >= 1e-8)\n        \n        # Verify normalization is reasonable\n        assert torch.all(normalized >= 0)  # Should be positive\n        \n        # Test with all-zero counter (extreme case)\n        zero_counter = torch.zeros_like(counter)\n        normalized_zero, clamped_zero = mock_normalize_with_clamp(output, zero_counter)\n        \n        assert not torch.any(torch.isnan(normalized_zero))\n        assert not torch.any(torch.isinf(normalized_zero))\n        assert torch.all(clamped_zero >= 1e-8)\n\n    def test_short_audio_last_block(self):\n        \"\"\"T057: Short-audio last-block path works and preserves length.\"\"\"\n        # Test with audio shorter than one chunk\n        short_audio_length = 4000  # Less than chunk_size (8192)\n        chunk_size = 8192\n        \n        # Mock processing of short audio\n        def mock_process_short_audio(audio_length, chunk_size):\n            \"\"\"Mock processing that handles short audio specially.\"\"\"\n            if audio_length < chunk_size:\n                # Last block path: process entire audio as one chunk\n                return {\n                    'processed_length': audio_length,\n                    'num_chunks': 1,\n                    'last_block': True,\n                    'preserved_length': audio_length\n                }\n            else:\n                # Normal chunking path\n                num_chunks = 
(audio_length + chunk_size - 1) // chunk_size\n                return {\n                    'processed_length': audio_length,\n                    'num_chunks': num_chunks,\n                    'last_block': False,\n                    'preserved_length': audio_length\n                }\n        \n        result = mock_process_short_audio(short_audio_length, chunk_size)\n        \n        # Verify last block path was taken\n        assert result['last_block'] is True\n        assert result['num_chunks'] == 1\n        \n        # Verify length preservation\n        assert result['processed_length'] == short_audio_length\n        assert result['preserved_length'] == short_audio_length\n        \n        # Test with normal-length audio for comparison\n        normal_audio_length = 20000\n        normal_result = mock_process_short_audio(normal_audio_length, chunk_size)\n        \n        assert normal_result['last_block'] is False\n        assert normal_result['num_chunks'] > 1\n        assert normal_result['preserved_length'] == normal_audio_length\n\n    @pytest.mark.parametrize(\"dim_t,hop_length\", [\n        (256, 512),\n        (512, 1024), \n        (1024, 2048),\n        (128, 256)\n    ])\n    def test_parametrized_shape_invariants(self, dim_t, hop_length):\n        \"\"\"T058: Parametrized invariants across dim_t and hop configs.\"\"\"\n        batch_size = 2\n        audio_length = 44100  # 1 second\n        \n        # Mock model with parametrized config\n        mock_model = Mock()\n        mock_model.dim_t = dim_t\n        mock_model.stft_hop_length = hop_length\n        \n        # Mock chunking calculation\n        def mock_calculate_chunks(audio_length, dim_t, hop_length):\n            \"\"\"Calculate number of chunks needed.\"\"\"\n            chunk_size = dim_t * 8  # Example chunk size calculation\n            step_size = hop_length\n            \n            if audio_length <= chunk_size:\n                return 1\n            \n            return 
(audio_length - chunk_size + step_size - 1) // step_size + 1\n        \n        num_chunks = mock_calculate_chunks(audio_length, dim_t, hop_length)\n        \n        # Invariants that should hold regardless of parameters\n        assert num_chunks >= 1, f\"Should always have at least 1 chunk for dim_t={dim_t}, hop={hop_length}\"\n        assert num_chunks <= audio_length // hop_length + 2, f\"Chunks should be reasonable for dim_t={dim_t}, hop={hop_length}\"\n        \n        # Test output shape consistency\n        mock_output_shape = (batch_size, 2, dim_t)  # (batch, channels, time)\n        assert mock_output_shape[0] == batch_size\n        assert mock_output_shape[2] == dim_t\n        \n        # Test that chunk_size scales with dim_t\n        chunk_size = dim_t * 8\n        assert chunk_size > 0\n        assert chunk_size >= dim_t\n\n    def test_logging_for_hop_and_step(self, caplog):\n        \"\"\"T063: Logs include hop/step sources (stft_hop_length, dim_t, desired vs actual step).\"\"\"\n        with caplog.at_level(logging.DEBUG):\n            # Mock logging during chunking setup\n            def mock_setup_chunking_with_logging(model, audio):\n                \"\"\"Mock chunking setup that logs parameter sources.\"\"\"\n                stft_hop = getattr(model, 'stft_hop_length', None)\n                dim_t = getattr(model, 'dim_t', None)\n                audio_hop = getattr(audio, 'hop_length', None)\n                \n                logging.debug(f\"Chunking setup: stft_hop_length={stft_hop}, dim_t={dim_t}\")\n                logging.debug(f\"Audio hop_length={audio_hop}\")\n                \n                desired_step = stft_hop if stft_hop else audio_hop\n                chunk_size = dim_t * 8 if dim_t else 8192\n                actual_step = min(desired_step, chunk_size) if desired_step else chunk_size // 4\n                \n                logging.debug(f\"Step calculation: desired={desired_step}, actual={actual_step}, 
chunk_size={chunk_size}\")\n                \n                return {\n                    'chunk_size': chunk_size,\n                    'step_size': actual_step,\n                    'stft_hop_source': stft_hop is not None\n                }\n            \n            # Test with model having stft_hop_length\n            model_with_stft = Mock()\n            model_with_stft.stft_hop_length = 1024\n            model_with_stft.dim_t = 256\n            \n            audio = Mock()\n            audio.hop_length = 512\n            \n            result = mock_setup_chunking_with_logging(model_with_stft, audio)\n            \n            # Verify logging occurred\n            assert \"stft_hop_length=1024\" in caplog.text\n            assert \"dim_t=256\" in caplog.text\n            assert \"desired=1024\" in caplog.text\n            assert \"actual=\" in caplog.text\n            assert \"chunk_size=\" in caplog.text\n\n    def test_iteration_count_reasonable(self):\n        \"\"\"T064: Iteration count reasonable (ceil calculation within ±1).\"\"\"\n        import math\n        \n        # Test various audio lengths and chunk configurations\n        test_cases = [\n            (44100, 8192, 1024),   # 1 second, normal chunk\n            (88200, 4096, 512),    # 2 seconds, smaller chunk  \n            (22050, 16384, 2048),  # 0.5 seconds, large chunk\n            (132300, 8192, 1024),  # 3 seconds, normal chunk\n        ]\n        \n        for audio_length, chunk_size, step_size in test_cases:\n            # Calculate expected iterations using ceiling division\n            if audio_length <= chunk_size:\n                expected_iterations = 1\n            else:\n                remaining_length = audio_length - chunk_size\n                expected_iterations = 1 + math.ceil(remaining_length / step_size)\n            \n            # Mock actual iteration calculation\n            def mock_calculate_iterations(audio_len, chunk_sz, step_sz):\n                if audio_len 
<= chunk_sz:\n                    return 1\n                \n                iterations = 0\n                pos = 0\n                while pos < audio_len:\n                    iterations += 1\n                    if pos + chunk_sz >= audio_len:\n                        break\n                    pos += step_sz\n                \n                return iterations\n            \n            actual_iterations = mock_calculate_iterations(audio_length, chunk_size, step_size)\n            \n            # Verify iteration count is reasonable (within ±1 of expected)\n            diff = abs(actual_iterations - expected_iterations)\n            assert diff <= 1, (\n                f\"Iteration count {actual_iterations} differs too much from expected {expected_iterations} \"\n                f\"for audio_len={audio_length}, chunk={chunk_size}, step={step_size}\"\n            )\n            \n            # Verify minimum iterations\n            assert actual_iterations >= 1, \"Should always have at least 1 iteration\"\n            \n            # Verify maximum reasonable iterations\n            max_reasonable = (audio_length // step_size) + 2\n            assert actual_iterations <= max_reasonable, (\n                f\"Too many iterations {actual_iterations} for audio_len={audio_length}\"\n            )\n"
  },
  {
    "path": "tests/unit/test_model_configuration.py",
    "content": "\"\"\"\nUnit tests for ModelConfiguration validation.\nTests the ModelConfiguration dataclass and its validation logic.\n\"\"\"\n\nimport pytest\nfrom dataclasses import FrozenInstanceError\n\n# Add the roformer module to path for imports\nimport sys\nimport os\n# Find project root dynamically\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\n# Go up until we find the project root (contains audio_separator/ directory)\nwhile project_root and not os.path.exists(os.path.join(project_root, 'audio_separator')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\nif project_root:\n    sys.path.append(project_root)\n\nfrom audio_separator.separator.roformer.model_configuration import ModelConfiguration\n\n\nclass TestModelConfiguration:\n    \"\"\"Test cases for ModelConfiguration dataclass.\"\"\"\n    \n    def test_model_configuration_creation_valid(self):\n        \"\"\"Test creating a valid ModelConfiguration.\"\"\"\n        config = ModelConfiguration(\n            dim=512,\n            depth=12,\n            stereo=False,\n            num_stems=2,\n            time_transformer_depth=2,\n            freq_transformer_depth=2,\n            dim_head=64,\n            heads=8,\n            attn_dropout=0.1,\n            ff_dropout=0.1,\n            flash_attn=True,\n            mlp_expansion_factor=4,\n            sage_attention=False,\n            zero_dc=True,\n            use_torch_checkpoint=False,\n            skip_connection=False\n        )\n        \n        assert config.dim == 512\n        assert config.depth == 12\n        assert config.stereo is False\n        assert config.num_stems == 2\n        assert config.mlp_expansion_factor == 4\n        assert config.sage_attention is False\n    \n    def test_model_configuration_defaults(self):\n        \"\"\"Test ModelConfiguration with minimal required 
parameters.\"\"\"\n        config = ModelConfiguration(\n            dim=256,\n            depth=6\n        )\n        \n        # Check defaults are applied\n        assert config.dim == 256\n        assert config.depth == 6\n        assert config.stereo is False  # Default\n        assert config.num_stems == 1  # Default\n        assert config.time_transformer_depth == 2  # Default\n        assert config.freq_transformer_depth == 2  # Default\n        assert config.dim_head == 64  # Default\n        assert config.heads == 8  # Default\n        assert config.attn_dropout == 0.0  # Default\n        assert config.ff_dropout == 0.0  # Default\n        assert config.flash_attn is True  # Default\n        assert config.mlp_expansion_factor == 4  # Default\n        assert config.sage_attention is False  # Default\n        assert config.zero_dc is True  # Default\n        assert config.use_torch_checkpoint is False  # Default\n        assert config.skip_connection is False  # Default\n    \n    def test_model_configuration_immutable(self):\n        \"\"\"Test that ModelConfiguration is immutable (frozen).\"\"\"\n        config = ModelConfiguration(dim=512, depth=12)\n        \n        with pytest.raises(FrozenInstanceError):\n            config.dim = 1024\n        \n        with pytest.raises(FrozenInstanceError):\n            config.depth = 24\n    \n    def test_model_configuration_type_validation(self):\n        \"\"\"Test type validation in ModelConfiguration.\"\"\"\n        # Valid types should work\n        config = ModelConfiguration(\n            dim=512,\n            depth=12,\n            stereo=True,\n            attn_dropout=0.1,\n            flash_attn=False\n        )\n        assert isinstance(config.dim, int)\n        assert isinstance(config.stereo, bool)\n        assert isinstance(config.attn_dropout, float)\n    \n    def test_model_configuration_edge_values(self):\n        \"\"\"Test edge values for ModelConfiguration.\"\"\"\n        # Test minimum 
values\n        config_min = ModelConfiguration(\n            dim=1,\n            depth=1,\n            num_stems=1,\n            heads=1,\n            attn_dropout=0.0,\n            ff_dropout=0.0\n        )\n        assert config_min.dim == 1\n        assert config_min.depth == 1\n        assert config_min.num_stems == 1\n        \n        # Test larger values\n        config_max = ModelConfiguration(\n            dim=8192,\n            depth=64,\n            num_stems=16,\n            heads=64,\n            attn_dropout=1.0,\n            ff_dropout=1.0,\n            mlp_expansion_factor=16\n        )\n        assert config_max.dim == 8192\n        assert config_max.depth == 64\n        assert config_max.mlp_expansion_factor == 16\n    \n    def test_model_configuration_boolean_parameters(self):\n        \"\"\"Test boolean parameter handling.\"\"\"\n        config = ModelConfiguration(\n            dim=512,\n            depth=12,\n            stereo=True,\n            flash_attn=False,\n            sage_attention=True,\n            zero_dc=False,\n            use_torch_checkpoint=True,\n            skip_connection=True\n        )\n        \n        assert config.stereo is True\n        assert config.flash_attn is False\n        assert config.sage_attention is True\n        assert config.zero_dc is False\n        assert config.use_torch_checkpoint is True\n        assert config.skip_connection is True\n    \n    def test_model_configuration_new_parameters(self):\n        \"\"\"Test the new parameters added for updated Roformer implementation.\"\"\"\n        config = ModelConfiguration(\n            dim=512,\n            depth=12,\n            mlp_expansion_factor=8,\n            sage_attention=True,\n            zero_dc=False,\n            use_torch_checkpoint=True,\n            skip_connection=True\n        )\n        \n        # Verify new parameters are stored correctly\n        assert config.mlp_expansion_factor == 8\n        assert config.sage_attention is 
True\n        assert config.zero_dc is False\n        assert config.use_torch_checkpoint is True\n        assert config.skip_connection is True\n    \n    def test_model_configuration_repr(self):\n        \"\"\"Test string representation of ModelConfiguration.\"\"\"\n        config = ModelConfiguration(dim=512, depth=12)\n        repr_str = repr(config)\n        \n        assert \"ModelConfiguration\" in repr_str\n        assert \"dim=512\" in repr_str\n        assert \"depth=12\" in repr_str\n    \n    def test_model_configuration_equality(self):\n        \"\"\"Test equality comparison of ModelConfiguration instances.\"\"\"\n        config1 = ModelConfiguration(dim=512, depth=12, stereo=True)\n        config2 = ModelConfiguration(dim=512, depth=12, stereo=True)\n        config3 = ModelConfiguration(dim=512, depth=12, stereo=False)\n        \n        assert config1 == config2\n        assert config1 != config3\n    \n    def test_model_configuration_hash(self):\n        \"\"\"Test that ModelConfiguration is hashable.\"\"\"\n        config1 = ModelConfiguration(dim=512, depth=12)\n        config2 = ModelConfiguration(dim=512, depth=12)\n        config3 = ModelConfiguration(dim=256, depth=6)\n        \n        # Same configurations should have same hash\n        assert hash(config1) == hash(config2)\n        \n        # Different configurations should have different hashes\n        assert hash(config1) != hash(config3)\n        \n        # Should be usable as dict keys\n        config_dict = {config1: \"first\", config3: \"second\"}\n        assert config_dict[config2] == \"first\"  # config2 == config1\n    \n    def test_model_configuration_from_dict(self):\n        \"\"\"Test creating ModelConfiguration from dictionary-like data.\"\"\"\n        data = {\n            'dim': 512,\n            'depth': 12,\n            'stereo': True,\n            'mlp_expansion_factor': 8,\n            'sage_attention': True\n        }\n        \n        config = 
ModelConfiguration(**data)\n        \n        assert config.dim == 512\n        assert config.depth == 12\n        assert config.stereo is True\n        assert config.mlp_expansion_factor == 8\n        assert config.sage_attention is True\n    \n    def test_model_configuration_with_extra_kwargs(self):\n        \"\"\"Test that ModelConfiguration rejects unknown parameters.\"\"\"\n        # The dataclass-generated __init__ does not accept unknown keyword arguments,\n        # so passing one raises TypeError naming the offending parameter\n        with pytest.raises(TypeError, match=\"unknown_param\"):\n            ModelConfiguration(\n                dim=512,\n                depth=12,\n                unknown_param=\"should_be_rejected\"\n            )\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__])\n"
  },
  {
    "path": "tests/unit/test_output_store.py",
    "content": "import pytest\nfrom unittest.mock import MagicMock, patch\nfrom audio_separator.remote.output_store import GCSOutputStore\n\n\n@pytest.fixture\ndef mock_storage_client():\n    with patch(\"google.cloud.storage.Client\") as mock_cls:\n        mock_client = MagicMock()\n        mock_cls.return_value = mock_client\n        yield mock_client\n\n\n@pytest.fixture\ndef store(mock_storage_client):\n    return GCSOutputStore(bucket_name=\"test-bucket\", project=\"test-project\")\n\n\nclass TestGCSOutputStore:\n    def test_upload_directory(self, store, mock_storage_client):\n        \"\"\"Uploads all files from a local directory to GCS under task_id prefix.\"\"\"\n        with patch(\"os.listdir\", return_value=[\"vocals.flac\", \"instrumental.flac\"]):\n            with patch(\"os.path.isfile\", return_value=True):\n                store.upload_task_outputs(\"task-123\", \"/tmp/outputs/task-123\")\n\n        bucket = mock_storage_client.bucket.return_value\n        assert bucket.blob.call_count == 2\n        blob = bucket.blob.return_value\n        assert blob.upload_from_filename.call_count == 2\n\n    def test_upload_builds_correct_gcs_paths(self, store, mock_storage_client):\n        \"\"\"GCS paths are {task_id}/{filename}.\"\"\"\n        with patch(\"os.listdir\", return_value=[\"output.flac\"]):\n            with patch(\"os.path.isfile\", return_value=True):\n                store.upload_task_outputs(\"task-123\", \"/tmp/outputs/task-123\")\n\n        bucket = mock_storage_client.bucket.return_value\n        bucket.blob.assert_called_with(\"task-123/output.flac\")\n\n    def test_download_file(self, store, mock_storage_client):\n        \"\"\"Downloads a specific file from GCS to a local path.\"\"\"\n        store.download_file(\"task-123\", \"vocals.flac\", \"/tmp/local/vocals.flac\")\n\n        bucket = mock_storage_client.bucket.return_value\n        bucket.blob.assert_called_with(\"task-123/vocals.flac\")\n        blob = 
bucket.blob.return_value\n        blob.download_to_filename.assert_called_with(\"/tmp/local/vocals.flac\")\n\n    def test_get_file_bytes(self, store, mock_storage_client):\n        \"\"\"Gets file content as bytes for streaming download responses.\"\"\"\n        bucket = mock_storage_client.bucket.return_value\n        blob = bucket.blob.return_value\n        blob.download_as_bytes.return_value = b\"audio data\"\n\n        result = store.get_file_bytes(\"task-123\", \"vocals.flac\")\n\n        assert result == b\"audio data\"\n        bucket.blob.assert_called_with(\"task-123/vocals.flac\")\n\n    def test_delete_task_outputs(self, store, mock_storage_client):\n        \"\"\"Deletes all files for a task from GCS.\"\"\"\n        bucket = mock_storage_client.bucket.return_value\n        blob1 = MagicMock()\n        blob2 = MagicMock()\n        bucket.list_blobs.return_value = [blob1, blob2]\n\n        deleted = store.delete_task_outputs(\"task-123\")\n\n        bucket.list_blobs.assert_called_with(prefix=\"task-123/\")\n        blob1.delete.assert_called_once()\n        blob2.delete.assert_called_once()\n        assert deleted == 2\n"
  },
  {
    "path": "tests/unit/test_parameter_validator.py",
    "content": "\"\"\"\nUnit tests for ParameterValidator methods.\nTests the core parameter validation logic for Roformer models.\n\"\"\"\n\nimport pytest\nfrom unittest.mock import Mock, patch\n\n# Add the roformer module to path for imports\nimport sys\nimport os\n# Find project root dynamically\ncurrent_dir = os.path.dirname(os.path.abspath(__file__))\nproject_root = current_dir\n# Go up until we find the project root (contains audio_separator/ directory)\nwhile project_root and not os.path.exists(os.path.join(project_root, 'audio_separator')):\n    parent = os.path.dirname(project_root)\n    if parent == project_root:  # Reached filesystem root\n        break\n    project_root = parent\n\nif project_root:\n    sys.path.append(project_root)\n\nfrom audio_separator.separator.roformer.parameter_validator import ParameterValidator, ValidationSeverity\nfrom audio_separator.separator.roformer.parameter_validation_error import ParameterValidationError\n\n\nclass TestParameterValidator:\n    \"\"\"Test cases for ParameterValidator class.\"\"\"\n    \n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.validator = ParameterValidator()\n    \n    def test_validate_required_parameters_bs_roformer_valid(self):\n        \"\"\"Test validation of required parameters for BSRoformer - valid case.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n        }\n        \n        issues = self.validator.validate_required_parameters(config, \"bs_roformer\")\n        assert len(issues) == 0\n    \n    def test_validate_required_parameters_bs_roformer_missing(self):\n        \"\"\"Test validation of required parameters for BSRoformer - missing parameters.\"\"\"\n        config = {\n            'dim': 512\n            # Missing 'depth' and 'freqs_per_bands'\n        }\n        \n        issues = self.validator.validate_required_parameters(config, \"bs_roformer\")\n        
assert len(issues) == 2\n        \n        # Check that both missing parameters are reported\n        missing_params = [issue.parameter_name for issue in issues]\n        assert 'depth' in missing_params\n        assert 'freqs_per_bands' in missing_params\n        \n        # Check error severity\n        for issue in issues:\n            assert issue.severity == ValidationSeverity.ERROR\n    \n    def test_validate_required_parameters_mel_band_roformer_valid(self):\n        \"\"\"Test validation of required parameters for MelBandRoformer - valid case.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'num_bands': 64\n        }\n        \n        issues = self.validator.validate_required_parameters(config, \"mel_band_roformer\")\n        assert len(issues) == 0\n    \n    def test_validate_required_parameters_mel_band_roformer_missing(self):\n        \"\"\"Test validation of required parameters for MelBandRoformer - missing parameters.\"\"\"\n        config = {\n            'dim': 512\n            # Missing 'depth' and 'num_bands'\n        }\n        \n        issues = self.validator.validate_required_parameters(config, \"mel_band_roformer\")\n        assert len(issues) == 2\n        \n        missing_params = [issue.parameter_name for issue in issues]\n        assert 'depth' in missing_params\n        assert 'num_bands' in missing_params\n    \n    def test_validate_parameter_types_valid(self):\n        \"\"\"Test parameter type validation - valid types.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'stereo': False,\n            'attn_dropout': 0.1,\n            'freqs_per_bands': (2, 4, 8, 16),\n            'sample_rate': 44100\n        }\n        \n        issues = self.validator.validate_parameter_types(config)\n        assert len(issues) == 0\n    \n    def test_validate_parameter_types_invalid(self):\n        \"\"\"Test parameter type validation - invalid types.\"\"\"\n        
config = {\n            'dim': \"512\",  # Should be int\n            'stereo': \"false\",  # Should be bool\n            'attn_dropout': \"0.1\",  # Should be float\n            'sample_rate': 44100.5  # Should be int\n        }\n        \n        issues = self.validator.validate_parameter_types(config)\n        assert len(issues) == 4\n        \n        # Check that all type errors are reported\n        invalid_params = [issue.parameter_name for issue in issues]\n        assert 'dim' in invalid_params\n        assert 'stereo' in invalid_params\n        assert 'attn_dropout' in invalid_params\n        assert 'sample_rate' in invalid_params\n    \n    def test_validate_parameter_ranges_valid(self):\n        \"\"\"Test parameter range validation - valid ranges.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'heads': 8,\n            'attn_dropout': 0.1,\n            'sample_rate': 44100\n        }\n        \n        issues = self.validator.validate_parameter_ranges(config)\n        assert len(issues) == 0\n    \n    def test_validate_parameter_ranges_invalid(self):\n        \"\"\"Test parameter range validation - invalid ranges.\"\"\"\n        config = {\n            'dim': 0,  # Below minimum (1)\n            'depth': 100,  # Above maximum (64)\n            'attn_dropout': 1.5,  # Above maximum (1.0)\n            'sample_rate': 5000  # Below minimum (8000)\n        }\n        \n        issues = self.validator.validate_parameter_ranges(config)\n        assert len(issues) == 4\n        \n        # Check that all range errors are reported\n        invalid_params = [issue.parameter_name for issue in issues]\n        assert 'dim' in invalid_params\n        assert 'depth' in invalid_params\n        assert 'attn_dropout' in invalid_params\n        assert 'sample_rate' in invalid_params\n    \n    def test_validate_parameter_compatibility_sage_flash_conflict(self):\n        \"\"\"Test parameter compatibility - sage_attention and 
flash_attn conflict.\"\"\"\n        config = {\n            'sage_attention': True,\n            'flash_attn': True\n        }\n        \n        issues = self.validator.validate_parameter_compatibility(config)\n        assert len(issues) == 1\n        assert issues[0].severity == ValidationSeverity.WARNING\n        assert \"sage_attention, flash_attn\" in issues[0].parameter_name\n    \n    def test_validate_parameter_compatibility_freqs_per_bands_warning(self):\n        \"\"\"Test parameter compatibility - freqs_per_bands sum warning.\"\"\"\n        config = {\n            'freqs_per_bands': (1, 2, 3)  # Sum = 6, which is very low\n        }\n        \n        issues = self.validator.validate_parameter_compatibility(config)\n        assert len(issues) == 1\n        assert issues[0].severity == ValidationSeverity.WARNING\n        assert issues[0].parameter_name == \"freqs_per_bands\"\n    \n    def test_validate_normalization_config_valid(self):\n        \"\"\"Test normalization configuration validation - valid cases.\"\"\"\n        valid_configs = [\n            None,\n            'layer_norm',\n            'batch_norm',\n            'rms_norm',\n            {'type': 'layer_norm', 'eps': 1e-5}\n        ]\n        \n        for norm_config in valid_configs:\n            issues = self.validator.validate_normalization_config(norm_config)\n            assert len(issues) == 0, f\"Failed for config: {norm_config}\"\n    \n    def test_validate_normalization_config_invalid(self):\n        \"\"\"Test normalization configuration validation - invalid cases.\"\"\"\n        invalid_configs = [\n            'invalid_norm',  # Unsupported string\n            123,  # Invalid type\n            ['layer_norm']  # Invalid type (list)\n        ]\n        \n        for norm_config in invalid_configs:\n            issues = self.validator.validate_normalization_config(norm_config)\n            assert len(issues) >= 1, f\"Should have failed for config: {norm_config}\"\n            
assert issues[0].severity == ValidationSeverity.ERROR\n    \n    def test_get_parameter_defaults_bs_roformer(self):\n        \"\"\"Test getting parameter defaults for BSRoformer.\"\"\"\n        defaults = self.validator.get_parameter_defaults(\"bs_roformer\")\n        \n        # Check some expected defaults\n        assert defaults['stereo'] is False\n        assert defaults['num_stems'] == 2\n        assert defaults['flash_attn'] is True\n        assert defaults['mlp_expansion_factor'] == 4\n        assert defaults['sage_attention'] is False\n        assert 'freqs_per_bands' in defaults\n        assert 'mask_estimator_depth' in defaults\n    \n    def test_get_parameter_defaults_mel_band_roformer(self):\n        \"\"\"Test getting parameter defaults for MelBandRoformer.\"\"\"\n        defaults = self.validator.get_parameter_defaults(\"mel_band_roformer\")\n        \n        # Check some expected defaults\n        assert defaults['stereo'] is False\n        assert defaults['num_stems'] == 2\n        assert defaults['flash_attn'] is True\n        assert defaults['mlp_expansion_factor'] == 4\n        assert defaults['sage_attention'] is False\n        assert defaults['num_bands'] == 64\n    \n    def test_apply_parameter_defaults(self):\n        \"\"\"Test applying parameter defaults to configuration.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'stereo': True  # Override default\n        }\n        \n        result = self.validator.apply_parameter_defaults(config, \"bs_roformer\")\n        \n        # Should have original values\n        assert result['dim'] == 512\n        assert result['depth'] == 12\n        assert result['stereo'] is True  # Override preserved\n        \n        # Should have applied defaults\n        assert result['num_stems'] == 2\n        assert result['flash_attn'] is True\n        assert result['mlp_expansion_factor'] == 4\n    \n    def test_validate_all_comprehensive(self):\n        \"\"\"Test 
comprehensive validation with all checks.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64),\n            'stereo': False,\n            'attn_dropout': 0.1,\n            'sage_attention': False,\n            'norm': 'layer_norm'\n        }\n        \n        issues = self.validator.validate_all(config, \"bs_roformer\")\n        assert len(issues) == 0\n    \n    def test_validate_all_with_errors(self):\n        \"\"\"Test comprehensive validation with multiple error types.\"\"\"\n        config = {\n            'dim': \"invalid\",  # Type error\n            'depth': 100,  # Range error\n            # Missing 'freqs_per_bands' - required parameter error\n            'sage_attention': True,\n            'flash_attn': True,  # Compatibility warning\n            'norm': 'invalid_norm'  # Normalization error\n        }\n        \n        issues = self.validator.validate_all(config, \"bs_roformer\")\n        assert len(issues) >= 5  # At least 5 different types of issues\n        \n        # Check we have different types of issues\n        severities = [issue.severity for issue in issues]\n        assert ValidationSeverity.ERROR in severities\n        assert ValidationSeverity.WARNING in severities\n    \n    def test_validate_and_raise_success(self):\n        \"\"\"Test validate_and_raise with valid configuration.\"\"\"\n        config = {\n            'dim': 512,\n            'depth': 12,\n            'freqs_per_bands': (2, 4, 8, 16, 32, 64)\n        }\n        \n        # Should not raise any exception\n        self.validator.validate_and_raise(config, \"bs_roformer\")\n    \n    def test_validate_and_raise_error(self):\n        \"\"\"Test validate_and_raise with invalid configuration.\"\"\"\n        config = {\n            'dim': \"invalid\",  # Type error\n            'depth': 12\n            # Missing 'freqs_per_bands'\n        }\n        \n        with 
pytest.raises(ParameterValidationError) as exc_info:\n            self.validator.validate_and_raise(config, \"bs_roformer\")\n        \n        # Check the exception contains useful information\n        exception = exc_info.value\n        assert exception.parameter_name is not None\n        assert exception.suggested_fix is not None\n    \n    def test_private_helper_methods(self):\n        \"\"\"Test private helper methods.\"\"\"\n        # Test _is_correct_type\n        assert self.validator._is_correct_type(123, int) is True\n        assert self.validator._is_correct_type(\"123\", int) is False\n        assert self.validator._is_correct_type([1, 2, 3], (list, tuple)) is True\n        assert self.validator._is_correct_type((1, 2, 3), (list, tuple)) is True\n        \n        # Test _get_type_name\n        assert self.validator._get_type_name(int) == \"int\"\n        assert \"int\" in self.validator._get_type_name((int, float))\n        assert \"float\" in self.validator._get_type_name((int, float))\n        \n        # Test _get_expected_type_description\n        assert \"int\" in self.validator._get_expected_type_description('dim')\n        assert \"appropriate type\" in self.validator._get_expected_type_description('unknown_param')\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__])\n"
  },
  {
    "path": "tests/unit/test_remote_api_client.py",
    "content": "import json\nimport pytest\nimport logging\nimport tempfile\nimport os\nfrom unittest.mock import Mock, patch, mock_open\nfrom unittest.mock import MagicMock\nimport requests\n\nfrom audio_separator.remote import AudioSeparatorAPIClient\n\n\n@pytest.fixture\ndef logger():\n    \"\"\"Create a mock logger for testing.\"\"\"\n    return logging.getLogger(\"test\")\n\n\n@pytest.fixture\ndef api_client(logger):\n    \"\"\"Create an API client instance for testing.\"\"\"\n    return AudioSeparatorAPIClient(\"https://test-api.example.com\", logger)\n\n\n@pytest.fixture\ndef mock_audio_file():\n    \"\"\"Create a temporary audio file for testing.\"\"\"\n    with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as f:\n        f.write(b\"fake audio content\")\n        yield f.name\n    os.unlink(f.name)\n\n\nclass TestAudioSeparatorAPIClient:\n    \"\"\"Test the AudioSeparatorAPIClient class.\"\"\"\n\n    def test_init(self, logger):\n        \"\"\"Test client initialization.\"\"\"\n        api_url = \"https://test-api.example.com\"\n        client = AudioSeparatorAPIClient(api_url, logger)\n\n        assert client.api_url == api_url\n        assert client.logger == logger\n        assert client.session is not None\n\n    @patch(\"requests.Session.post\")\n    @patch(\"builtins.open\", new_callable=mock_open, read_data=b\"fake audio content\")\n    def test_separate_audio_success(self, mock_file, mock_post, api_client, mock_audio_file):\n        \"\"\"Test successful audio separation submission.\"\"\"\n        # Mock successful response\n        mock_response = Mock()\n        mock_response.json.return_value = {\n            \"task_id\": \"test-task-123\",\n            \"status\": \"submitted\",\n            \"message\": \"Job submitted for processing\",\n            \"models_used\": [\"default\"],\n            \"total_models\": 1,\n            \"original_filename\": \"test.wav\",\n        }\n        mock_response.raise_for_status.return_value = 
None\n        mock_post.return_value = mock_response\n\n        result = api_client.separate_audio(mock_audio_file)\n\n        # Verify the result\n        assert result[\"task_id\"] == \"test-task-123\"\n        assert result[\"status\"] == \"submitted\"\n        assert result[\"models_used\"] == [\"default\"]\n\n        # Verify the request was made correctly\n        mock_post.assert_called_once()\n        call_args = mock_post.call_args\n        assert call_args[0][0] == \"https://test-api.example.com/separate\"\n        assert \"files\" in call_args[1]\n        assert \"data\" in call_args[1]\n\n    @patch(\"requests.Session.post\")\n    @patch(\"builtins.open\", new_callable=mock_open, read_data=b\"fake audio content\")\n    def test_separate_audio_with_multiple_models(self, mock_file, mock_post, api_client, mock_audio_file):\n        \"\"\"Test audio separation with multiple models.\"\"\"\n        mock_response = Mock()\n        mock_response.json.return_value = {\"task_id\": \"test-task-456\", \"status\": \"submitted\", \"models_used\": [\"model1.ckpt\", \"model2.onnx\"], \"total_models\": 2}\n        mock_response.raise_for_status.return_value = None\n        mock_post.return_value = mock_response\n\n        models = [\"model1.ckpt\", \"model2.onnx\"]\n        result = api_client.separate_audio(mock_audio_file, models=models)\n\n        assert result[\"models_used\"] == models\n        assert result[\"total_models\"] == 2\n\n        # Check that models were serialized correctly in the request\n        call_args = mock_post.call_args\n        data = call_args[1][\"data\"]\n        assert json.loads(data[\"models\"]) == models\n\n    @patch(\"requests.Session.post\")\n    @patch(\"builtins.open\", new_callable=mock_open, read_data=b\"fake audio content\")\n    def test_separate_audio_with_custom_parameters(self, mock_file, mock_post, api_client, mock_audio_file):\n        \"\"\"Test audio separation with custom parameters.\"\"\"\n        mock_response = 
Mock()\n        mock_response.json.return_value = {\"task_id\": \"test-task-789\", \"status\": \"submitted\"}\n        mock_response.raise_for_status.return_value = None\n        mock_post.return_value = mock_response\n\n        custom_output_names = {\"Vocals\": \"lead_vocals\", \"Instrumental\": \"backing_track\"}\n        result = api_client.separate_audio(\n            mock_audio_file, model=\"test_model.ckpt\", output_format=\"wav\", normalization_threshold=0.8, mdx_segment_size=512, vr_aggression=10, custom_output_names=custom_output_names\n        )\n\n        # Verify the parameters were passed correctly\n        call_args = mock_post.call_args\n        data = call_args[1][\"data\"]\n        assert data[\"model\"] == \"test_model.ckpt\"\n        assert data[\"output_format\"] == \"wav\"\n        assert data[\"normalization_threshold\"] == 0.8\n        assert data[\"mdx_segment_size\"] == 512\n        assert data[\"vr_aggression\"] == 10\n        assert json.loads(data[\"custom_output_names\"]) == custom_output_names\n\n    @patch(\"requests.Session.post\")\n    def test_separate_audio_file_not_found(self, mock_post, api_client):\n        \"\"\"Test audio separation with non-existent file.\"\"\"\n        with pytest.raises(FileNotFoundError):\n            api_client.separate_audio(\"/nonexistent/file.wav\")\n\n    @patch(\"requests.Session.post\")\n    @patch(\"builtins.open\", new_callable=mock_open, read_data=b\"fake audio content\")\n    def test_separate_audio_request_error(self, mock_file, mock_post, api_client, mock_audio_file):\n        \"\"\"Test audio separation with request error.\"\"\"\n        mock_post.side_effect = requests.RequestException(\"Connection error\")\n\n        with pytest.raises(requests.RequestException):\n            api_client.separate_audio(mock_audio_file)\n\n    @patch(\"requests.Session.get\")\n    def test_get_job_status_success(self, mock_get, api_client):\n        \"\"\"Test successful job status retrieval.\"\"\"\n        
mock_response = Mock()\n        mock_response.json.return_value = {\"task_id\": \"test-task-123\", \"status\": \"processing\", \"progress\": 50, \"current_model_index\": 0, \"total_models\": 1}\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.get_job_status(\"test-task-123\")\n\n        assert result[\"status\"] == \"processing\"\n        assert result[\"progress\"] == 50\n        mock_get.assert_called_once_with(\"https://test-api.example.com/status/test-task-123\", timeout=10)\n\n    @patch(\"requests.Session.get\")\n    def test_get_job_status_error(self, mock_get, api_client):\n        \"\"\"Test job status retrieval with error.\"\"\"\n        mock_get.side_effect = requests.RequestException(\"API error\")\n\n        with pytest.raises(requests.RequestException):\n            api_client.get_job_status(\"test-task-123\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_success(self, mock_file, mock_get, api_client):\n        \"\"\"Test successful file download.\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.download_file(\"test-task-123\", \"output.wav\", \"local_output.wav\")\n\n        assert result == \"local_output.wav\"\n        mock_get.assert_called_once_with(\"https://test-api.example.com/download/test-task-123/output.wav\", timeout=60)\n        mock_file.assert_called_once_with(\"local_output.wav\", \"wb\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_default_output_path(self, mock_file, mock_get, api_client):\n        \"\"\"Test file download with default output path.\"\"\"\n        
mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.download_file(\"test-task-123\", \"output.wav\")\n\n        assert result == \"output.wav\"\n        mock_file.assert_called_once_with(\"output.wav\", \"wb\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_with_spaces_in_filename(self, mock_file, mock_get, api_client):\n        \"\"\"Test file download with spaces in filename (URL encoding).\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        filename_with_spaces = \"My Song (Vocals) Output.wav\"\n        result = api_client.download_file(\"test-task-123\", filename_with_spaces)\n\n        # Verify URL was properly encoded\n        expected_url = \"https://test-api.example.com/download/test-task-123/My%20Song%20%28Vocals%29%20Output.wav\"\n        mock_get.assert_called_once_with(expected_url, timeout=60)\n        assert result == filename_with_spaces\n        mock_file.assert_called_once_with(filename_with_spaces, \"wb\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_with_special_characters(self, mock_file, mock_get, api_client):\n        \"\"\"Test file download with special characters in filename.\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        filename_with_special_chars = \"Song & Band - 
Title (Vocals 50% Mix).flac\"\n        result = api_client.download_file(\"test-task-456\", filename_with_special_chars)\n\n        # Verify URL was properly encoded\n        expected_url = \"https://test-api.example.com/download/test-task-456/Song%20%26%20Band%20-%20Title%20%28Vocals%2050%25%20Mix%29.flac\"\n        mock_get.assert_called_once_with(expected_url, timeout=60)\n        assert result == filename_with_special_chars\n        mock_file.assert_called_once_with(filename_with_special_chars, \"wb\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_with_unicode_characters(self, mock_file, mock_get, api_client):\n        \"\"\"Test file download with unicode characters in filename.\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        unicode_filename = \"Café - Naïve Song (Résumé).mp3\"\n        result = api_client.download_file(\"test-task-789\", unicode_filename)\n\n        # Verify URL was properly encoded (UTF-8 encoded then percent-encoded)\n        expected_url = \"https://test-api.example.com/download/test-task-789/Caf%C3%A9%20-%20Na%C3%AFve%20Song%20%28R%C3%A9sum%C3%A9%29.mp3\"\n        mock_get.assert_called_once_with(expected_url, timeout=60)\n        assert result == unicode_filename\n        mock_file.assert_called_once_with(unicode_filename, \"wb\")\n\n    @patch(\"requests.Session.get\")\n    def test_download_file_error(self, mock_get, api_client):\n        \"\"\"Test file download with error.\"\"\"\n        mock_get.side_effect = requests.RequestException(\"Download error\")\n\n        with pytest.raises(requests.RequestException):\n            api_client.download_file(\"test-task-123\", \"output.wav\")\n\n    @patch(\"requests.Session.get\")\n    def 
test_list_models_pretty_format(self, mock_get, api_client):\n        \"\"\"Test listing models in pretty format.\"\"\"\n        mock_response = Mock()\n        mock_response.text = \"Model list in pretty format\"\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.list_models(format_type=\"pretty\")\n\n        assert result == {\"text\": \"Model list in pretty format\"}\n        mock_get.assert_called_once_with(\"https://test-api.example.com/models\", timeout=10)\n\n    @patch(\"requests.Session.get\")\n    def test_list_models_json_format(self, mock_get, api_client):\n        \"\"\"Test listing models in JSON format.\"\"\"\n        mock_response = Mock()\n        models_data = {\"models\": [{\"name\": \"model1\", \"type\": \"MDX\"}]}\n        mock_response.json.return_value = models_data\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.list_models(format_type=\"json\")\n\n        assert result == models_data\n        mock_get.assert_called_once_with(\"https://test-api.example.com/models-json\", timeout=10)\n\n    @patch(\"requests.Session.get\")\n    def test_list_models_with_filter(self, mock_get, api_client):\n        \"\"\"Test listing models with filter.\"\"\"\n        mock_response = Mock()\n        mock_response.text = \"Filtered model list\"\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.list_models(filter_by=\"vocals\")\n\n        assert result == {\"text\": \"Filtered model list\"}\n        mock_get.assert_called_once_with(\"https://test-api.example.com/models?filter_sort_by=vocals\", timeout=10)\n\n    @patch(\"requests.Session.get\")\n    def test_get_server_version_success(self, mock_get, api_client):\n        \"\"\"Test successful server version retrieval.\"\"\"\n        mock_response = 
Mock()\n        mock_response.json.return_value = {\"version\": \"1.2.3\", \"status\": \"healthy\"}\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.get_server_version()\n\n        assert result == \"1.2.3\"\n        mock_get.assert_called_once_with(\"https://test-api.example.com/health\", timeout=10)\n\n    @patch(\"requests.Session.get\")\n    def test_get_server_version_no_version(self, mock_get, api_client):\n        \"\"\"Test server version retrieval when version is not in response.\"\"\"\n        mock_response = Mock()\n        mock_response.json.return_value = {\"status\": \"healthy\"}\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        result = api_client.get_server_version()\n\n        assert result == \"unknown\"\n\n    @patch(\"requests.Session.get\")\n    def test_get_server_version_error(self, mock_get, api_client):\n        \"\"\"Test server version retrieval with error.\"\"\"\n        mock_get.side_effect = requests.RequestException(\"Health check failed\")\n\n        with pytest.raises(requests.RequestException):\n            api_client.get_server_version()\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch.object(AudioSeparatorAPIClient, \"download_file\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_success(self, mock_sleep, mock_download, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method for separating audio and waiting for completion.\"\"\"\n        # Mock separation submission\n        mock_separate.return_value = {\"task_id\": \"test-task-123\"}\n\n        # Mock status polling - first processing, then completed\n        mock_status.side_effect = [{\"status\": \"processing\", \"progress\": 25}, {\"status\": \"processing\", 
\"progress\": 50}, {\"status\": \"completed\", \"files\": [\"output1.wav\", \"output2.wav\"]}]\n\n        # Mock file downloads\n        mock_download.side_effect = [\"output1.wav\", \"output2.wav\"]\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, model=\"test_model.ckpt\", timeout=60, poll_interval=5, download=True)\n\n        # Verify the result\n        assert result[\"status\"] == \"completed\"\n        assert result[\"task_id\"] == \"test-task-123\"\n        assert result[\"files\"] == [\"output1.wav\", \"output2.wav\"]\n        assert result[\"downloaded_files\"] == [\"output1.wav\", \"output2.wav\"]\n\n        # Verify method calls\n        mock_separate.assert_called_once()\n        assert mock_status.call_count == 3\n        assert mock_download.call_count == 2\n        assert mock_sleep.call_count == 2  # Two polling iterations\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_error(self, mock_sleep, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method when job fails.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-456\"}\n        mock_status.side_effect = [{\"status\": \"processing\", \"progress\": 25}, {\"status\": \"error\", \"error\": \"Processing failed\"}]\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, timeout=60, poll_interval=5)\n\n        assert result[\"status\"] == \"error\"\n        assert result[\"error\"] == \"Processing failed\"\n        assert result[\"files\"] == []\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_timeout(self, mock_sleep, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience 
method when polling times out.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-789\"}\n        mock_status.return_value = {\"status\": \"processing\", \"progress\": 25}\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, timeout=1, poll_interval=0.1)\n\n        assert result[\"status\"] == \"timeout\"\n        assert \"timed out\" in result[\"error\"]\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch.object(AudioSeparatorAPIClient, \"download_file\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_no_download(self, mock_sleep, mock_download, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method without downloading files.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-123\"}\n        mock_status.side_effect = [{\"status\": \"completed\", \"files\": [\"output1.wav\", \"output2.wav\"]}]\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, download=False)\n\n        assert result[\"status\"] == \"completed\"\n        assert \"downloaded_files\" not in result\n        mock_download.assert_not_called()\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch.object(AudioSeparatorAPIClient, \"download_file\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_with_output_dir(self, mock_sleep, mock_download, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method with custom output directory.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-123\"}\n        mock_status.side_effect = [{\"status\": \"completed\", \"files\": [\"output1.wav\"]}]\n        mock_download.return_value = \"custom_dir/output1.wav\"\n\n        result = 
api_client.separate_audio_and_wait(mock_audio_file, download=True, output_dir=\"custom_dir\")\n\n        # Verify download was called with custom output path\n        mock_download.assert_called_once_with(\"test-task-123\", \"output1.wav\", \"custom_dir/output1.wav\")\n        assert result[\"downloaded_files\"] == [\"custom_dir/output1.wav\"]\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch.object(AudioSeparatorAPIClient, \"download_file\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_with_special_character_filenames(self, mock_sleep, mock_download, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method with filenames containing special characters.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-456\"}\n        \n        # Simulate files with special characters like those in the bug report\n        files_with_special_chars = [\n            \"Song (Vocals model_bs_roformer_ep_317_sdr_12.9755.ckpt)_(Instrumental)_mel_band_roformer.flac\",\n            \"Song (Vocals model_bs_roformer_ep_317_sdr_12.9755.ckpt)_(Vocals)_mel_band_roformer.flac\"\n        ]\n        mock_status.side_effect = [{\"status\": \"completed\", \"files\": files_with_special_chars}]\n        \n        # Mock successful downloads\n        mock_download.side_effect = files_with_special_chars\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, download=True)\n\n        # Verify both files were downloaded despite having special characters\n        assert result[\"status\"] == \"completed\"\n        assert result[\"downloaded_files\"] == files_with_special_chars\n        assert mock_download.call_count == 2\n        \n        # Verify download was called with the correct filenames\n        expected_calls = [\n            (\"test-task-456\", files_with_special_chars[0], files_with_special_chars[0]),\n    
        (\"test-task-456\", files_with_special_chars[1], files_with_special_chars[1])\n        ]\n        actual_calls = [call.args for call in mock_download.call_args_list]\n        assert actual_calls == expected_calls\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_server_side_url_decoding_scenario(self, mock_file, mock_get, api_client):\n        \"\"\"Test that the client properly URL-encodes filenames that require server-side decoding.\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n        mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        # Test the exact problematic filename from the bug report\n        problematic_filename = \"Bloc Party - The Prayer (Vocals model_bs_roformer_ep_317_sdr_12.9755.ckpt)_(Instrumental)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\"\n        result = api_client.download_file(\"test-task-bug\", problematic_filename)\n\n        # Verify URL was properly encoded - this is the exact URL that should be sent\n        expected_url = \"https://test-api.example.com/download/test-task-bug/Bloc%20Party%20-%20The%20Prayer%20%28Vocals%20model_bs_roformer_ep_317_sdr_12.9755.ckpt%29_%28Instrumental%29_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\"\n        mock_get.assert_called_once_with(expected_url, timeout=60)\n        assert result == problematic_filename\n        mock_file.assert_called_once_with(problematic_filename, \"wb\")\n\n    @patch(\"requests.Session.get\")\n    @patch(\"builtins.open\", new_callable=mock_open)\n    def test_download_file_by_hash(self, mock_file, mock_get, api_client):\n        \"\"\"Test downloading a file using hash identifier.\"\"\"\n        mock_response = Mock()\n        mock_response.content = b\"fake audio file content\"\n        mock_response.status_code = 200\n       
 mock_response.raise_for_status.return_value = None\n        mock_get.return_value = mock_response\n\n        # Test the new hash-based download method\n        filename = \"Long Filename With Spaces (Vocals).flac\"\n        file_hash = \"abc123def456\"\n        result = api_client.download_file_by_hash(\"test-task-hash\", file_hash, filename)\n\n        # Verify hash was used in URL instead of filename\n        expected_url = \"https://test-api.example.com/download/test-task-hash/abc123def456\"\n        mock_get.assert_called_once_with(expected_url, timeout=60)\n        assert result == filename\n        mock_file.assert_called_once_with(filename, \"wb\")\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch.object(AudioSeparatorAPIClient, \"download_file_by_hash\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_with_hash_format(self, mock_sleep, mock_download_hash, mock_status, mock_separate, api_client, mock_audio_file):\n        \"\"\"Test the convenience method with hash-based file format.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-hash\"}\n        \n        # Simulate new hash-based format\n        files_with_hashes = {\n            \"abc123def456\": \"Song (Vocals model_bs_roformer_ep_317_sdr_12.9755.ckpt)_(Instrumental)_mel_band_roformer.flac\",\n            \"def456ghi789\": \"Song (Vocals model_bs_roformer_ep_317_sdr_12.9755.ckpt)_(Vocals)_mel_band_roformer.flac\"\n        }\n        mock_status.side_effect = [{\"status\": \"completed\", \"files\": files_with_hashes}]\n        \n        # Mock successful downloads\n        mock_download_hash.side_effect = list(files_with_hashes.values())\n\n        result = api_client.separate_audio_and_wait(mock_audio_file, download=True)\n\n        # Verify both files were downloaded using hash method\n        assert result[\"status\"] == \"completed\"\n        assert 
result[\"downloaded_files\"] == list(files_with_hashes.values())\n        assert mock_download_hash.call_count == 2\n        \n        # Verify download was called with the correct hashes and filenames\n        expected_calls = [\n            (\"test-task-hash\", \"abc123def456\", list(files_with_hashes.values())[0], list(files_with_hashes.values())[0]),\n            (\"test-task-hash\", \"def456ghi789\", list(files_with_hashes.values())[1], list(files_with_hashes.values())[1])\n        ]\n        actual_calls = [call.args for call in mock_download_hash.call_args_list]\n        assert actual_calls == expected_calls\n\n    @patch(\"requests.Session.post\")\n    def test_separate_audio_with_gcs_uri(self, mock_post, api_client):\n        \"\"\"Test audio separation using GCS URI instead of file upload.\"\"\"\n        mock_response = Mock()\n        mock_response.json.return_value = {\n            \"task_id\": \"test-task-gcs\",\n            \"status\": \"submitted\",\n        }\n        mock_response.raise_for_status.return_value = None\n        mock_post.return_value = mock_response\n\n        result = api_client.separate_audio(\n            gcs_uri=\"gs://my-bucket/path/to/audio.flac\",\n            preset=\"instrumental_clean\",\n        )\n\n        assert result[\"task_id\"] == \"test-task-gcs\"\n\n        # Verify gcs_uri was sent in form data\n        call_args = mock_post.call_args\n        # A dummy empty file is sent to force multipart encoding (FastAPI requires it)\n        assert call_args[1][\"files\"][\"file\"][0] == \"\"  # empty filename\n        assert call_args[1][\"data\"][\"gcs_uri\"] == \"gs://my-bucket/path/to/audio.flac\"\n\n    def test_separate_audio_requires_file_or_gcs_uri(self, api_client):\n        \"\"\"Test that either file_path or gcs_uri must be provided.\"\"\"\n        with pytest.raises(ValueError, match=\"Must provide either\"):\n            api_client.separate_audio()\n\n    def 
test_separate_audio_rejects_both_file_and_gcs_uri(self, api_client, mock_audio_file):\n        \"\"\"Test that providing both file_path and gcs_uri raises an error.\"\"\"\n        with pytest.raises(ValueError, match=\"not both\"):\n            api_client.separate_audio(\n                file_path=mock_audio_file,\n                gcs_uri=\"gs://bucket/file.flac\",\n            )\n\n    @patch.object(AudioSeparatorAPIClient, \"separate_audio\")\n    @patch.object(AudioSeparatorAPIClient, \"get_job_status\")\n    @patch(\"time.sleep\")\n    def test_separate_audio_and_wait_with_gcs_uri(self, mock_sleep, mock_status, mock_separate, api_client):\n        \"\"\"Test separate_audio_and_wait with GCS URI.\"\"\"\n        mock_separate.return_value = {\"task_id\": \"test-task-gcs\"}\n        mock_status.side_effect = [\n            {\"status\": \"completed\", \"files\": {\"hash1\": \"output.flac\"}},\n        ]\n\n        result = api_client.separate_audio_and_wait(\n            gcs_uri=\"gs://my-bucket/audio.flac\",\n            preset=\"instrumental_clean\",\n            download=False,\n        )\n\n        assert result[\"status\"] == \"completed\"\n        # Verify gcs_uri was passed through to separate_audio\n        call_args = mock_separate.call_args\n        assert call_args[0][4] == \"gs://my-bucket/audio.flac\"  # positional arg for gcs_uri\n"
  },
  {
    "path": "tests/unit/test_remote_cli.py",
    "content": "import json\nimport pytest\nimport argparse\nimport logging\nimport os\nimport sys\nfrom unittest.mock import Mock, patch, MagicMock\nfrom unittest.mock import call\nimport tempfile\n\nfrom audio_separator.remote.cli import main, handle_separate_command, handle_status_command, handle_models_command, handle_download_command\nfrom audio_separator.remote import AudioSeparatorAPIClient\n\n\n@pytest.fixture\ndef mock_api_client():\n    \"\"\"Create a mock API client for testing.\"\"\"\n    return Mock(spec=AudioSeparatorAPIClient)\n\n\n@pytest.fixture\ndef mock_logger():\n    \"\"\"Create a mock logger for testing.\"\"\"\n    return Mock(spec=logging.Logger)\n\n\n@pytest.fixture\ndef mock_audio_file():\n    \"\"\"Create a temporary audio file for testing.\"\"\"\n    with tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False) as f:\n        f.write(b\"fake audio content\")\n        yield f.name\n    os.unlink(f.name)\n\n\nclass TestRemoteCLI:\n    \"\"\"Test the remote CLI functionality.\"\"\"\n\n    @patch('sys.argv', ['audio-separator-remote', '--version'])\n    @patch('audio_separator.remote.cli.metadata')\n    @patch('builtins.print')\n    def test_version_command_no_api_url(self, mock_print, mock_metadata, mock_api_client):\n        \"\"\"Test version command without API URL.\"\"\"\n        mock_metadata.distribution.return_value.version = \"1.2.3\"\n        \n        with pytest.raises(SystemExit) as exc_info:\n            main()\n        \n        assert exc_info.value.code == 0\n        mock_print.assert_any_call(\"Client version: 1.2.3\")\n\n    @patch('sys.argv', ['audio-separator-remote', '--version'])\n    @patch('audio_separator.remote.cli.metadata')\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('builtins.print')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_version_command_with_api_url(self, mock_print, mock_client_class, mock_metadata):\n        \"\"\"Test 
version command with API URL.\"\"\"\n        mock_metadata.distribution.return_value.version = \"1.2.3\"\n        mock_client = Mock()\n        mock_client.get_server_version.return_value = \"1.2.4\"\n        mock_client_class.return_value = mock_client\n        \n        with pytest.raises(SystemExit) as exc_info:\n            main()\n        \n        assert exc_info.value.code == 0\n        mock_print.assert_any_call(\"Client version: 1.2.3\")\n        mock_print.assert_any_call(\"Server version: 1.2.4\")\n\n    @patch('sys.argv', ['audio-separator-remote', '--version'])\n    @patch('audio_separator.remote.cli.metadata')\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('builtins.print')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_version_command_server_error(self, mock_print, mock_client_class, mock_metadata):\n        \"\"\"Test version command when server version retrieval fails.\"\"\"\n        mock_metadata.distribution.return_value.version = \"1.2.3\"\n        mock_client = Mock()\n        mock_client.get_server_version.side_effect = Exception(\"Connection error\")\n        mock_client_class.return_value = mock_client\n        \n        with pytest.raises(SystemExit) as exc_info:\n            main()\n        \n        assert exc_info.value.code == 0\n        mock_print.assert_any_call(\"Client version: 1.2.3\")\n\n    @patch('sys.argv', ['audio-separator-remote', 'separate', 'test.wav'])\n    @patch('builtins.print')\n    @patch.dict(os.environ, {}, clear=True)  # Clear all environment variables\n    def test_no_api_url_error(self, mock_print):\n        \"\"\"Test error when no API URL is provided.\"\"\"\n        with pytest.raises(SystemExit) as exc_info:\n            main()\n        \n        assert exc_info.value.code == 1\n\n    @patch('sys.argv', ['audio-separator-remote', 'separate', 'test.wav'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    
@patch('audio_separator.remote.cli.handle_separate_command')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_separate_command(self, mock_handle_separate, mock_client_class):\n        \"\"\"Test separate command execution.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        mock_handle_separate.assert_called_once()\n        # Verify the API client was called with the correct URL (don't check logger instance)\n        assert mock_client_class.call_count == 1\n        call_args = mock_client_class.call_args\n        assert call_args[0][0] == 'https://test-api.com'  # First argument should be the API URL\n\n    @patch('sys.argv', ['audio-separator-remote', 'status', 'task-123'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('audio_separator.remote.cli.handle_status_command')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_status_command(self, mock_handle_status, mock_client_class):\n        \"\"\"Test status command execution.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        mock_handle_status.assert_called_once()\n\n    @patch('sys.argv', ['audio-separator-remote', 'models'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('audio_separator.remote.cli.handle_models_command')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_models_command(self, mock_handle_models, mock_client_class):\n        \"\"\"Test models command execution.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        mock_handle_models.assert_called_once()\n\n    @patch('sys.argv', ['audio-separator-remote', 'download', 'task-123', 'file1.wav', 
'file2.wav'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('audio_separator.remote.cli.handle_download_command')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_download_command(self, mock_handle_download, mock_client_class):\n        \"\"\"Test download command execution.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        mock_handle_download.assert_called_once()\n\n    @patch('sys.argv', ['audio-separator-remote', '--api_url', 'https://custom-api.com', 'models'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('audio_separator.remote.cli.handle_models_command')\n    def test_custom_api_url(self, mock_handle_models, mock_client_class):\n        \"\"\"Test using custom API URL parameter.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        # Verify the API client was called with the correct custom URL\n        assert mock_client_class.call_count == 1\n        call_args = mock_client_class.call_args\n        assert call_args[0][0] == 'https://custom-api.com'  # First argument should be the custom API URL\n\n    @patch('sys.argv', ['audio-separator-remote', '--debug', 'models'])\n    @patch('audio_separator.remote.cli.AudioSeparatorAPIClient')\n    @patch('audio_separator.remote.cli.handle_models_command')\n    @patch.dict(os.environ, {'AUDIO_SEPARATOR_API_URL': 'https://test-api.com'})\n    def test_debug_logging(self, mock_handle_models, mock_client_class):\n        \"\"\"Test debug logging flag.\"\"\"\n        mock_client = Mock()\n        mock_client_class.return_value = mock_client\n        \n        main()\n        \n        # Verify debug logging is enabled by checking if logger level was set\n        mock_handle_models.assert_called_once()\n\n    def 
test_handle_separate_command_success(self, mock_api_client, mock_logger, mock_audio_file):\n        \"\"\"Test successful separate command handling.\"\"\"\n        # Mock arguments\n        args = Mock()\n        args.audio_files = [mock_audio_file]\n        args.model = \"test_model.ckpt\"\n        args.models = None\n        args.timeout = 600\n        args.poll_interval = 10\n        args.output_format = \"flac\"\n        args.output_bitrate = None\n        args.normalization = 0.9\n        args.amplification = 0.0\n        args.single_stem = None\n        args.invert_spect = False\n        args.sample_rate = 44100\n        args.use_soundfile = False\n        args.use_autocast = False\n        args.custom_output_names = None\n        # MDX parameters\n        args.mdx_segment_size = 256\n        args.mdx_overlap = 0.25\n        args.mdx_batch_size = 1\n        args.mdx_hop_length = 1024\n        args.mdx_enable_denoise = False\n        # VR parameters\n        args.vr_batch_size = 1\n        args.vr_window_size = 512\n        args.vr_aggression = 5\n        args.vr_enable_tta = False\n        args.vr_high_end_process = False\n        args.vr_enable_post_process = False\n        args.vr_post_process_threshold = 0.2\n        # Demucs parameters\n        args.demucs_segment_size = \"Default\"\n        args.demucs_shifts = 2\n        args.demucs_overlap = 0.25\n        args.demucs_segments_enabled = True\n        # MDXC parameters\n        args.mdxc_segment_size = 256\n        args.mdxc_override_model_segment_size = False\n        args.mdxc_overlap = 8\n        args.mdxc_batch_size = 1\n        args.mdxc_pitch_shift = 0\n\n        # Mock successful API response\n        mock_api_client.separate_audio_and_wait.return_value = {\n            \"status\": \"completed\",\n            \"downloaded_files\": [\"output1.wav\", \"output2.wav\"]\n        }\n\n        handle_separate_command(args, mock_api_client, mock_logger)\n\n        # Verify API client was called with 
correct parameters\n        mock_api_client.separate_audio_and_wait.assert_called_once()\n        call_args = mock_api_client.separate_audio_and_wait.call_args\n        assert call_args[0][0] == mock_audio_file  # First positional argument should be the audio file\n        kwargs = call_args[1]\n        assert kwargs[\"model\"] == \"test_model.ckpt\"\n        assert kwargs[\"timeout\"] == 600\n        assert kwargs[\"download\"] is True\n\n    def test_handle_separate_command_with_multiple_models(self, mock_api_client, mock_logger, mock_audio_file):\n        \"\"\"Test separate command with multiple models.\"\"\"\n        args = Mock()\n        args.audio_files = [mock_audio_file]\n        args.model = None\n        args.models = [\"model1.ckpt\", \"model2.onnx\"]\n        args.timeout = 600\n        args.poll_interval = 10\n        # Set other required attributes\n        for attr in ['output_format', 'output_bitrate', 'normalization', 'amplification', 'single_stem',\n                     'invert_spect', 'sample_rate', 'use_soundfile', 'use_autocast', 'custom_output_names',\n                     'mdx_segment_size', 'mdx_overlap', 'mdx_batch_size', 'mdx_hop_length', 'mdx_enable_denoise',\n                     'vr_batch_size', 'vr_window_size', 'vr_aggression', 'vr_enable_tta', 'vr_high_end_process',\n                     'vr_enable_post_process', 'vr_post_process_threshold', 'demucs_segment_size', 'demucs_shifts',\n                     'demucs_overlap', 'demucs_segments_enabled', 'mdxc_segment_size', 'mdxc_override_model_segment_size',\n                     'mdxc_overlap', 'mdxc_batch_size', 'mdxc_pitch_shift']:\n            setattr(args, attr, None)\n\n        mock_api_client.separate_audio_and_wait.return_value = {\n            \"status\": \"completed\",\n            \"downloaded_files\": [\"output1.wav\", \"output2.wav\"]\n        }\n\n        handle_separate_command(args, mock_api_client, mock_logger)\n\n        call_args = 
mock_api_client.separate_audio_and_wait.call_args\n        kwargs = call_args[1]\n        assert kwargs[\"models\"] == [\"model1.ckpt\", \"model2.onnx\"]\n\n    def test_handle_separate_command_error(self, mock_api_client, mock_logger, mock_audio_file):\n        \"\"\"Test separate command with error.\"\"\"\n        args = Mock()\n        args.audio_files = [mock_audio_file]\n        args.model = \"test_model.ckpt\"\n        args.models = None\n        # Set other required attributes\n        for attr in ['timeout', 'poll_interval', 'output_format', 'output_bitrate', 'normalization', 'amplification',\n                     'single_stem', 'invert_spect', 'sample_rate', 'use_soundfile', 'use_autocast', 'custom_output_names',\n                     'mdx_segment_size', 'mdx_overlap', 'mdx_batch_size', 'mdx_hop_length', 'mdx_enable_denoise',\n                     'vr_batch_size', 'vr_window_size', 'vr_aggression', 'vr_enable_tta', 'vr_high_end_process',\n                     'vr_enable_post_process', 'vr_post_process_threshold', 'demucs_segment_size', 'demucs_shifts',\n                     'demucs_overlap', 'demucs_segments_enabled', 'mdxc_segment_size', 'mdxc_override_model_segment_size',\n                     'mdxc_overlap', 'mdxc_batch_size', 'mdxc_pitch_shift']:\n            setattr(args, attr, 0 if 'size' in attr or 'shift' in attr or 'batch' in attr or 'hop' in attr or 'aggression' in attr or 'overlap' in attr else False if 'enable' in attr or 'tta' in attr or 'process' in attr or 'spect' in attr or 'soundfile' in attr or 'autocast' in attr else None if 'output' in attr or 'single' in attr or 'custom' in attr else 600 if 'timeout' in attr else 10 if 'poll' in attr else 'Default' if 'demucs_segment' in attr else 0.25 if 'overlap' in attr and 'mdxc' not in attr else 512 if 'window' in attr else 44100 if 'sample' in attr else 'flac' if 'format' in attr else 0.9 if 'normalization' in attr else 0.0)\n\n        mock_api_client.separate_audio_and_wait.return_value = {\n    
        \"status\": \"error\",\n            \"error\": \"Processing failed\"\n        }\n\n        handle_separate_command(args, mock_api_client, mock_logger)\n\n        # Verify error was logged\n        mock_logger.error.assert_called()\n\n    def test_handle_separate_command_exception(self, mock_api_client, mock_logger, mock_audio_file):\n        \"\"\"Test separate command with exception.\"\"\"\n        args = Mock()\n        args.audio_files = [mock_audio_file]\n        # Set required attributes \n        for attr in ['model', 'models', 'timeout', 'poll_interval', 'output_format', 'output_bitrate', 'normalization',\n                     'amplification', 'single_stem', 'invert_spect', 'sample_rate', 'use_soundfile', 'use_autocast',\n                     'custom_output_names', 'mdx_segment_size', 'mdx_overlap', 'mdx_batch_size', 'mdx_hop_length',\n                     'mdx_enable_denoise', 'vr_batch_size', 'vr_window_size', 'vr_aggression', 'vr_enable_tta',\n                     'vr_high_end_process', 'vr_enable_post_process', 'vr_post_process_threshold', 'demucs_segment_size',\n                     'demucs_shifts', 'demucs_overlap', 'demucs_segments_enabled', 'mdxc_segment_size',\n                     'mdxc_override_model_segment_size', 'mdxc_overlap', 'mdxc_batch_size', 'mdxc_pitch_shift']:\n            setattr(args, attr, None)\n\n        mock_api_client.separate_audio_and_wait.side_effect = Exception(\"API error\")\n\n        handle_separate_command(args, mock_api_client, mock_logger)\n\n        # Verify error was logged\n        mock_logger.error.assert_called()\n\n    def test_handle_status_command_success(self, mock_api_client, mock_logger):\n        \"\"\"Test successful status command handling.\"\"\"\n        args = Mock()\n        args.task_id = \"test-task-123\"\n\n        mock_api_client.get_job_status.return_value = {\n            \"status\": \"completed\",\n            \"progress\": 100,\n            \"current_model_index\": 0,\n            
\"total_models\": 1,\n            \"original_filename\": \"test.wav\",\n            \"models_used\": [\"test_model.ckpt\"],\n            \"files\": [\"output1.wav\", \"output2.wav\"]\n        }\n\n        handle_status_command(args, mock_api_client, mock_logger)\n\n        mock_api_client.get_job_status.assert_called_once_with(\"test-task-123\")\n        # Verify status information was logged\n        mock_logger.info.assert_called()\n\n    def test_handle_status_command_error_status(self, mock_api_client, mock_logger):\n        \"\"\"Test status command with error status.\"\"\"\n        args = Mock()\n        args.task_id = \"test-task-456\"\n\n        mock_api_client.get_job_status.return_value = {\n            \"status\": \"error\",\n            \"error\": \"Processing failed\"\n        }\n\n        handle_status_command(args, mock_api_client, mock_logger)\n\n        mock_api_client.get_job_status.assert_called_once_with(\"test-task-456\")\n        mock_logger.error.assert_called()\n\n    def test_handle_status_command_exception(self, mock_api_client, mock_logger):\n        \"\"\"Test status command with exception.\"\"\"\n        args = Mock()\n        args.task_id = \"test-task-789\"\n\n        mock_api_client.get_job_status.side_effect = Exception(\"API error\")\n\n        handle_status_command(args, mock_api_client, mock_logger)\n\n        mock_logger.error.assert_called()\n\n    def test_handle_models_command_pretty_format(self, mock_api_client, mock_logger):\n        \"\"\"Test models command with pretty format.\"\"\"\n        args = Mock()\n        args.format = \"pretty\"\n        args.filter = None\n\n        mock_api_client.list_models.return_value = {\n            \"text\": \"Model list in pretty format\"\n        }\n\n        with patch('builtins.print') as mock_print:\n            handle_models_command(args, mock_api_client, mock_logger)\n\n        mock_api_client.list_models.assert_called_once_with(\"pretty\", None)\n        
mock_print.assert_called_once_with(\"Model list in pretty format\")\n\n    def test_handle_models_command_json_format(self, mock_api_client, mock_logger):\n        \"\"\"Test models command with JSON format.\"\"\"\n        args = Mock()\n        args.format = \"json\"\n        args.filter = \"vocals\"\n\n        models_data = {\"models\": [{\"name\": \"vocal_model\", \"type\": \"MDX\"}]}\n        mock_api_client.list_models.return_value = models_data\n\n        with patch('json.dumps') as mock_json_dumps:\n            with patch('builtins.print') as mock_print:\n                mock_json_dumps.return_value = '{\"models\": [{\"name\": \"vocal_model\", \"type\": \"MDX\"}]}'\n                handle_models_command(args, mock_api_client, mock_logger)\n\n        mock_api_client.list_models.assert_called_once_with(\"json\", \"vocals\")\n        mock_json_dumps.assert_called_once_with(models_data, indent=2)\n\n    def test_handle_models_command_exception(self, mock_api_client, mock_logger):\n        \"\"\"Test models command with exception.\"\"\"\n        args = Mock()\n        args.format = \"pretty\"\n        args.filter = None\n\n        mock_api_client.list_models.side_effect = Exception(\"API error\")\n\n        handle_models_command(args, mock_api_client, mock_logger)\n\n        mock_logger.error.assert_called()\n\n    def test_handle_download_command_success(self, mock_api_client, mock_logger):\n        \"\"\"Test successful download command handling.\"\"\"\n        args = Mock()\n        args.task_id = \"test-task-123\"\n        args.filenames = [\"output1.wav\", \"output2.wav\"]\n\n        mock_api_client.download_file.side_effect = [\"output1.wav\", \"output2.wav\"]\n\n        handle_download_command(args, mock_api_client, mock_logger)\n\n        # Verify download was called for each file\n        expected_calls = [\n            call(\"test-task-123\", \"output1.wav\"),\n            call(\"test-task-123\", \"output2.wav\")\n        ]\n        
mock_api_client.download_file.assert_has_calls(expected_calls)\n\n        # Verify success messages were logged\n        assert mock_logger.info.call_count >= 4  # At least 2 downloading + 2 downloaded messages\n\n    def test_handle_download_command_exception(self, mock_api_client, mock_logger):\n        \"\"\"Test download command with exception.\"\"\"\n        args = Mock()\n        args.task_id = \"test-task-456\"\n        args.filenames = [\"output1.wav\"]\n\n        mock_api_client.download_file.side_effect = Exception(\"Download error\")\n\n        handle_download_command(args, mock_api_client, mock_logger)\n\n        mock_logger.error.assert_called()\n"
  },
  {
    "path": "tests/unit/test_separator_chunking.py",
    "content": "\"\"\"\nUnit tests for Separator class chunking functionality with multi-stem support.\nTests the _process_with_chunking() method for 2, 4, and 6-stem models.\n\"\"\"\n\nimport pytest\nimport os\nimport re\nimport tempfile\nimport logging\nfrom unittest.mock import Mock, patch\n\nfrom audio_separator.separator.separator import Separator\n\n\nclass TestSeparatorChunking:\n    \"\"\"Test cases for Separator chunking with multi-stem models.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.logger = logging.getLogger(__name__)\n        self.temp_dir = tempfile.mkdtemp()\n\n    def teardown_method(self):\n        \"\"\"Clean up test fixtures.\"\"\"\n        import shutil\n        if os.path.exists(self.temp_dir):\n            shutil.rmtree(self.temp_dir)\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_process_with_chunking_2_stems(self, mock_chunker_class):\n        \"\"\"Test chunking with 2-stem model (Vocals, Instrumental).\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0001.wav\"),\n        ]\n\n        # Mock _separate_file to return 2 stems per chunk\n        separator._separate_file = Mock(side_effect=[\n            # Chunk 1 output\n            [\n                os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n                
os.path.join(self.temp_dir, \"chunk_0000_(Instrumental).wav\"),\n            ],\n            # Chunk 2 output\n            [\n                os.path.join(self.temp_dir, \"chunk_0001_(Vocals).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0001_(Instrumental).wav\"),\n            ],\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify merge_chunks was called twice (once per stem)\n        assert mock_chunker.merge_chunks.call_count == 2\n\n        # Verify output contains 2 files\n        assert len(result) == 2\n\n        # Verify stem names in output\n        output_stems = [os.path.basename(path) for path in result]\n        assert any(\"Instrumental\" in name for name in output_stems)\n        assert any(\"Vocals\" in name for name in output_stems)\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_process_with_chunking_4_stems(self, mock_chunker_class):\n        \"\"\"Test chunking with 4-stem Demucs model (Drums, Bass, Other, Vocals).\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0001.wav\"),\n        ]\n\n        # Mock _separate_file to 
return 4 stems per chunk\n        separator._separate_file = Mock(side_effect=[\n            # Chunk 1 output (4 stems)\n            [\n                os.path.join(self.temp_dir, \"chunk_0000_(Drums).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0000_(Bass).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0000_(Other).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n            ],\n            # Chunk 2 output (4 stems)\n            [\n                os.path.join(self.temp_dir, \"chunk_0001_(Drums).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0001_(Bass).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0001_(Other).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0001_(Vocals).wav\"),\n            ],\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify merge_chunks was called 4 times (once per stem)\n        assert mock_chunker.merge_chunks.call_count == 4\n\n        # Verify output contains 4 files\n        assert len(result) == 4\n\n        # Verify all 4 stem names in output\n        output_stems = [os.path.basename(path) for path in result]\n        assert any(\"Drums\" in name for name in output_stems), f\"Drums not found in {output_stems}\"\n        assert any(\"Bass\" in name for name in output_stems), f\"Bass not found in {output_stems}\"\n        assert any(\"Other\" in name for name in output_stems), f\"Other not found in {output_stems}\"\n        assert any(\"Vocals\" in name for name in output_stems), f\"Vocals not found in {output_stems}\"\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_process_with_chunking_6_stems(self, 
mock_chunker_class):\n        \"\"\"Test chunking with 6-stem Demucs model.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Mock _separate_file to return 6 stems\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"chunk_0000_(Bass).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Drums).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Other).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Guitar).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Piano).wav\"),\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify merge_chunks was called 6 times (once per stem)\n        assert mock_chunker.merge_chunks.call_count == 6\n\n        # Verify output contains 6 files\n        assert len(result) == 6\n\n        # Verify all 6 stem names in output\n        output_stems = [os.path.basename(path) for path in result]\n        assert any(\"Bass\" in name for name in output_stems)\n        assert any(\"Drums\" in name for name in output_stems)\n        assert 
any(\"Other\" in name for name in output_stems)\n        assert any(\"Vocals\" in name for name in output_stems)\n        assert any(\"Guitar\" in name for name in output_stems)\n        assert any(\"Piano\" in name for name in output_stems)\n\n    def test_stem_name_extraction_from_filename(self):\n        \"\"\"Test regex extraction of stem names from chunk filenames.\"\"\"\n        import re\n\n        # Test standard patterns\n        test_cases = [\n            (\"chunk_0000_(Vocals).wav\", \"Vocals\"),\n            (\"chunk_0001_(Instrumental).wav\", \"Instrumental\"),\n            (\"chunk_0000_(Drums).wav\", \"Drums\"),\n            (\"chunk_0000_(Bass).wav\", \"Bass\"),\n            (\"chunk_0000_(Other).wav\", \"Other\"),\n            (\"chunk_0000_(Guitar).wav\", \"Guitar\"),\n            (\"chunk_0000_(Piano).wav\", \"Piano\"),\n            (\"test_audio_(Vocals).flac\", \"Vocals\"),\n            (\"long_filename_with_spaces_(Backing Vocals).mp3\", \"Backing Vocals\"),\n        ]\n\n        pattern = r'_\\(([^)]+)\\)'\n\n        for filename, expected_stem in test_cases:\n            match = re.search(pattern, filename)\n            assert match is not None, f\"Pattern did not match for {filename}\"\n            assert match.group(1) == expected_stem, f\"Expected {expected_stem}, got {match.group(1)}\"\n\n    def test_stem_name_extraction_fallback(self):\n        \"\"\"Test fallback behavior when filename pattern doesn't match.\"\"\"\n        import re\n\n        # Test filenames without the _(StemName) pattern\n        test_cases = [\n            \"chunk_0000.wav\",\n            \"output.mp3\",\n            \"vocals_only.flac\",\n        ]\n\n        pattern = r'_\\(([^)]+)\\)'\n\n        for filename in test_cases:\n            match = re.search(pattern, filename)\n            # Should not match - fallback logic should kick in\n            assert match is None, f\"Pattern should not match for {filename}\"\n\n    
@patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_chunking_preserves_stem_order(self, mock_chunker_class):\n        \"\"\"Test that stems are merged in sorted order.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Mock _separate_file with intentionally unsorted output\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Drums).wav\"),\n            os.path.join(self.temp_dir, \"chunk_0000_(Bass).wav\"),\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify stems are processed in sorted order (Bass, Drums, Vocals)\n        output_stems = [os.path.basename(path) for path in result]\n\n        # Extract stem names from output\n        stem_names = []\n        pattern = r'_\\(([^)]+)\\)'\n        for name in output_stems:\n            match = re.search(pattern, name)\n            if match:\n                stem_names.append(match.group(1))\n\n        # Verify sorted order\n        assert stem_names == sorted(stem_names), f\"Stems not in sorted order: 
{stem_names}\"\n\n\nclass TestSeparatorChunkingLogic:\n    \"\"\"Test internal logic and state management of chunking.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.logger = logging.getLogger(__name__)\n        self.temp_dir = tempfile.mkdtemp()\n\n    def teardown_method(self):\n        \"\"\"Clean up test fixtures.\"\"\"\n        import shutil\n        if os.path.exists(self.temp_dir):\n            shutil.rmtree(self.temp_dir)\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_state_restoration_after_chunking(self, mock_chunker_class):\n        \"\"\"Test that separator state is restored after chunking.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = \"/original/output/dir\"\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = \"/original/model/output\"\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Track state changes during _separate_file call\n        state_during_processing = {}\n        def track_state(chunk_path, custom_names=None):\n            state_during_processing['chunk_duration'] = separator.chunk_duration\n            state_during_processing['output_dir'] = separator.output_dir\n            state_during_processing['model_output_dir'] = separator.model_instance.output_dir\n            return [os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\")]\n\n        separator._separate_file = Mock(side_effect=track_state)\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as 
RealSeparator\n        RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify state was modified during processing\n        assert state_during_processing['chunk_duration'] is None\n        assert state_during_processing['output_dir'] != \"/original/output/dir\"\n        assert state_during_processing['model_output_dir'] != \"/original/model/output\"\n\n        # Verify state was restored after processing\n        assert separator.chunk_duration == 10.0\n        assert separator.output_dir == \"/original/output/dir\"\n        assert separator.model_instance.output_dir == \"/original/model/output\"\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_gpu_cache_cleared_between_chunks(self, mock_chunker_class):\n        \"\"\"Test that GPU cache is cleared after each chunk.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n        separator.model_instance.clear_gpu_cache = Mock()\n\n        # Mock chunker behavior - 3 chunks\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0001.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0002.wav\"),\n        ]\n\n        # Mock _separate_file\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"chunk_(Vocals).wav\"),\n        ])\n\n        # Import and call the actual method\n        from 
audio_separator.separator.separator import Separator as RealSeparator\n        RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify clear_gpu_cache was called 3 times (once per chunk)\n        assert separator.model_instance.clear_gpu_cache.call_count == 3\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_temp_directory_cleanup(self, mock_chunker_class):\n        \"\"\"Test that temporary directory is cleaned up.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"chunk_(Vocals).wav\"),\n        ])\n\n        # Track temporary directory creation\n        temp_dirs_created = []\n        original_mkdtemp = tempfile.mkdtemp\n        def track_mkdtemp(*args, **kwargs):\n            # Forward all arguments so calls that pass suffix/prefix/dir still work\n            temp_dir = original_mkdtemp(*args, **kwargs)\n            temp_dirs_created.append(temp_dir)\n            return temp_dir\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        with patch('tempfile.mkdtemp', side_effect=track_mkdtemp):\n            RealSeparator._process_with_chunking(\n                separator,\n                os.path.join(self.temp_dir, \"test.wav\"),\n                
custom_output_names=None\n            )\n\n        # Verify temporary directories were cleaned up\n        for temp_dir in temp_dirs_created:\n            if temp_dir.startswith('/') or temp_dir.startswith('C:'):  # Real temp dir\n                assert not os.path.exists(temp_dir), f\"Temp directory {temp_dir} was not cleaned up\"\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_error_handling_with_state_restoration(self, mock_chunker_class):\n        \"\"\"Test that state is restored even when error occurs.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = \"/original/output/dir\"\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = \"/original/model/output\"\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Mock _separate_file to raise exception\n        separator._separate_file = Mock(side_effect=Exception(\"Processing error\"))\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n\n        # Should raise exception\n        with pytest.raises(Exception, match=\"Processing error\"):\n            RealSeparator._process_with_chunking(\n                separator,\n                os.path.join(self.temp_dir, \"test.wav\"),\n                custom_output_names=None\n            )\n\n        # Verify state was restored despite error\n        assert separator.chunk_duration == 10.0\n        assert separator.output_dir == \"/original/output/dir\"\n        assert separator.model_instance.output_dir == \"/original/model/output\"\n\n    
@patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_audio_chunker_initialization(self, mock_chunker_class):\n        \"\"\"Test that AudioChunker is initialized with correct parameters.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 15.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"chunk_(Vocals).wav\"),\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Verify AudioChunker was initialized with correct parameters\n        mock_chunker_class.assert_called_once_with(15.0, separator.logger)\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_custom_output_names_applied_to_merged_output(self, mock_chunker_class):\n        \"\"\"Test that custom_output_names are applied to final merged output, not per-chunk.\n\n        Regression test for issue #259: when custom_output_names were passed to\n        _separate_file for each chunk, chunks would get the same custom name and\n        overwrite each other, producing corrupted output.\n        \"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = 
self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior - multiple chunks to reproduce the overwrite bug\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0001.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0002.wav\"),\n        ]\n\n        separator._separate_file = Mock(side_effect=[\n            [os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n             os.path.join(self.temp_dir, \"chunk_0000_(Instrumental).wav\")],\n            [os.path.join(self.temp_dir, \"chunk_0001_(Vocals).wav\"),\n             os.path.join(self.temp_dir, \"chunk_0001_(Instrumental).wav\")],\n            [os.path.join(self.temp_dir, \"chunk_0002_(Vocals).wav\"),\n             os.path.join(self.temp_dir, \"chunk_0002_(Instrumental).wav\")],\n        ])\n\n        custom_names = {\"Vocals\": \"my_custom_vocals\", \"Instrumental\": \"my_custom_instrumental\"}\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=custom_names\n        )\n\n        # Verify _separate_file was called WITHOUT custom_output_names for each chunk\n        # This is the core of the fix: chunks must use default naming to avoid overwrites\n        for call_args in separator._separate_file.call_args_list:\n            args, kwargs = call_args\n            assert len(args) == 1, f\"_separate_file should be called with only chunk_path, got 
args: {args}\"\n            assert \"custom_output_names\" not in kwargs, \\\n                f\"_separate_file should not receive custom_output_names, got kwargs: {kwargs}\"\n\n        # Verify custom names were applied to the final merged output\n        assert len(result) == 2\n        output_basenames = [os.path.basename(path) for path in result]\n        assert any(\"my_custom_instrumental\" in name for name in output_basenames)\n        assert any(\"my_custom_vocals\" in name for name in output_basenames)\n\n        # Verify all 3 chunks were processed for each stem\n        assert mock_chunker.merge_chunks.call_count == 2\n        for merge_call in mock_chunker.merge_chunks.call_args_list:\n            chunk_list = merge_call[0][0]\n            assert len(chunk_list) == 3, f\"Each stem should merge 3 chunks, got {len(chunk_list)}\"\n\n\nclass TestSeparatorChunkingEdgeCases:\n    \"\"\"Test edge cases for multi-stem chunking.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.logger = logging.getLogger(__name__)\n        self.temp_dir = tempfile.mkdtemp()\n\n    def teardown_method(self):\n        \"\"\"Clean up test fixtures.\"\"\"\n        import shutil\n        if os.path.exists(self.temp_dir):\n            shutil.rmtree(self.temp_dir)\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_empty_output_handling(self, mock_chunker_class):\n        \"\"\"Test handling when a chunk produces no output files.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        
mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Mock _separate_file to return empty list\n        separator._separate_file = Mock(return_value=[])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Should return empty list\n        assert len(result) == 0\n        # merge_chunks should not be called\n        assert mock_chunker.merge_chunks.call_count == 0\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_inconsistent_stem_count_across_chunks(self, mock_chunker_class):\n        \"\"\"Test handling when different chunks produce different stem counts.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n            os.path.join(self.temp_dir, \"chunk_0001.wav\"),\n        ]\n\n        # Mock _separate_file - first chunk has 2 stems, second has 1 stem\n        separator._separate_file = Mock(side_effect=[\n            [\n                os.path.join(self.temp_dir, \"chunk_0000_(Vocals).wav\"),\n                os.path.join(self.temp_dir, \"chunk_0000_(Instrumental).wav\"),\n            ],\n            [\n                os.path.join(self.temp_dir, 
\"chunk_0001_(Vocals).wav\"),\n                # Missing Instrumental stem\n            ],\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Should handle inconsistency gracefully\n        # Vocals should have 2 chunks, Instrumental should have 1 chunk\n        assert len(result) == 2\n\n    @patch('audio_separator.separator.audio_chunking.AudioChunker')\n    def test_filename_without_stem_pattern(self, mock_chunker_class):\n        \"\"\"Test fallback when filename doesn't match expected pattern.\"\"\"\n        # Setup mock separator\n        separator = Mock(spec=Separator)\n        separator.logger = self.logger\n        separator.output_dir = self.temp_dir\n        separator.output_format = \"WAV\"\n        separator.chunk_duration = 10.0\n        separator.model_instance = Mock()\n        separator.model_instance.output_dir = self.temp_dir\n\n        # Mock chunker behavior\n        mock_chunker = Mock()\n        mock_chunker_class.return_value = mock_chunker\n        mock_chunker.split_audio.return_value = [\n            os.path.join(self.temp_dir, \"chunk_0000.wav\"),\n        ]\n\n        # Mock _separate_file - return file without standard naming pattern\n        separator._separate_file = Mock(return_value=[\n            os.path.join(self.temp_dir, \"output_file_without_pattern.wav\"),\n        ])\n\n        # Import and call the actual method\n        from audio_separator.separator.separator import Separator as RealSeparator\n        result = RealSeparator._process_with_chunking(\n            separator,\n            os.path.join(self.temp_dir, \"test.wav\"),\n            custom_output_names=None\n        )\n\n        # Should still produce output using fallback 
naming\n        assert len(result) == 1\n        # Should use fallback name like \"stem_0\"\n        assert \"stem_0\" in os.path.basename(result[0])\n\n\nif __name__ == \"__main__\":\n    pytest.main([__file__, \"-v\"])\n"
  },
  {
    "path": "tests/unit/test_separator_detection.py",
    "content": "\"\"\"\nUnit tests for separator detection and routing logic.\nTests the detection of Roformer models and proper routing.\n\"\"\"\n\nimport pytest\nfrom unittest.mock import Mock, patch, MagicMock\nimport os\nimport tempfile\n\n\nclass TestSeparatorDetection:\n    \"\"\"Test cases for separator detection and model routing.\"\"\"\n\n    def setup_method(self):\n        \"\"\"Set up test fixtures.\"\"\"\n        self.temp_dir = tempfile.mkdtemp()\n\n    def teardown_method(self):\n        \"\"\"Clean up test fixtures.\"\"\"\n        import shutil\n        shutil.rmtree(self.temp_dir, ignore_errors=True)\n\n    def test_is_roformer_set_from_yaml_path(self):\n        \"\"\"T059: YAML path containing 'roformer' sets is_roformer and routes Roformer path.\"\"\"\n        # Test cases for different YAML paths\n        test_cases = [\n            # (yaml_path, expected_is_roformer, description)\n            (\"model_bs_roformer_ep_317_sdr_12.yaml\", True, \"BS Roformer model\"),\n            (\"mel_band_roformer_vocals.yaml\", True, \"Mel Band Roformer model\"),\n            (\"roformer_large_model.yaml\", True, \"Generic Roformer model\"),\n            (\"BS-Roformer-Viperx-1297.yaml\", True, \"BS-Roformer with uppercase\"),\n            (\"model_mdx_extra_vocals.yaml\", False, \"MDX model (not Roformer)\"),\n            (\"vr_model_vocals.yaml\", False, \"VR model (not Roformer)\"),\n            (\"demucs_model.yaml\", False, \"Demucs model (not Roformer)\"),\n            (\"some_other_model.yaml\", False, \"Generic other model\"),\n        ]\n\n        for yaml_path, expected_is_roformer, description in test_cases:\n            # Mock the detection logic\n            def mock_detect_roformer_from_path(path):\n                \"\"\"Mock detection that checks if path contains 'roformer'.\"\"\"\n                path_lower = path.lower()\n                return 'roformer' in path_lower\n\n            is_roformer = mock_detect_roformer_from_path(yaml_path)\n    
        \n            assert is_roformer == expected_is_roformer, (\n                f\"Failed for {description}: path '{yaml_path}' should \"\n                f\"{'be' if expected_is_roformer else 'not be'} detected as Roformer\"\n            )\n\n        # Test routing logic based on detection\n        def mock_route_separator(yaml_path):\n            \"\"\"Mock routing logic that selects separator based on detection.\"\"\"\n            is_roformer = mock_detect_roformer_from_path(yaml_path)\n            \n            if is_roformer:\n                # In current routing, Roformer models are handled via MDXCSeparator path\n                return \"MDXCSeparator\"\n            elif 'mdx' in yaml_path.lower():\n                return \"MDXSeparator\"\n            elif 'vr' in yaml_path.lower():\n                return \"VRSeparator\"\n            elif 'demucs' in yaml_path.lower():\n                return \"DemucsSeparator\"\n            else:\n                return \"DefaultSeparator\"\n\n        # Test routing for Roformer models\n        roformer_paths = [\n            \"model_bs_roformer_ep_317_sdr_12.yaml\",\n            \"mel_band_roformer_vocals.yaml\",\n            \"BS-Roformer-Viperx-1297.yaml\"\n        ]\n        \n        for path in roformer_paths:\n            separator_type = mock_route_separator(path)\n            assert separator_type == \"MDXCSeparator\", (\n                f\"Roformer model '{path}' should route to MDXCSeparator, got {separator_type}\"\n            )\n\n        # Test routing for non-Roformer models\n        non_roformer_cases = [\n            (\"model_mdx_extra_vocals.yaml\", \"MDXSeparator\"),\n            (\"vr_model_vocals.yaml\", \"VRSeparator\"),\n            (\"demucs_model.yaml\", \"DemucsSeparator\"),\n            (\"some_other_model.yaml\", \"DefaultSeparator\"),\n        ]\n        \n        for path, expected_separator in non_roformer_cases:\n            separator_type = mock_route_separator(path)\n            
assert separator_type == expected_separator, (\n                f\"Non-Roformer model '{path}' should route to {expected_separator}, got {separator_type}\"\n            )\n\n    def test_roformer_detection_case_insensitive(self):\n        \"\"\"Test that Roformer detection is case-insensitive.\"\"\"\n        case_variations = [\n            \"ROFORMER_MODEL.yaml\",\n            \"roformer_model.yaml\", \n            \"RoFormer_Model.yaml\",\n            \"BS_ROFORMER.yaml\",\n            \"bs_roformer.yaml\",\n            \"Bs_Roformer.yaml\",\n            \"MEL_BAND_ROFORMER.yaml\",\n            \"mel_band_roformer.yaml\",\n            \"Mel_Band_Roformer.yaml\"\n        ]\n\n        def mock_detect_roformer_case_insensitive(path):\n            \"\"\"Mock detection with case-insensitive matching.\"\"\"\n            return 'roformer' in path.lower()\n\n        for path in case_variations:\n            is_roformer = mock_detect_roformer_case_insensitive(path)\n            assert is_roformer, f\"Case variation '{path}' should be detected as Roformer\"\n\n    def test_roformer_detection_with_full_paths(self):\n        \"\"\"Test Roformer detection works with full file paths.\"\"\"\n        full_path_cases = [\n            (\"/models/roformer/model_bs_roformer_ep_317.yaml\", True),\n            (\"/path/to/models/mel_band_roformer_vocals.yaml\", True),\n            (\"/home/user/models/mdx_extra_vocals.yaml\", False),\n            (\"C:\\\\Models\\\\BS-Roformer-Viperx-1297.yaml\", True),\n            (\"./local/models/vr_model_vocals.yaml\", False),\n            (\"../models/roformer_large_model.yaml\", True),\n        ]\n\n        def mock_detect_roformer_full_path(full_path):\n            \"\"\"Mock detection that works with full paths.\"\"\"\n            filename = os.path.basename(full_path)\n            return 'roformer' in filename.lower()\n\n        for full_path, expected_result in full_path_cases:\n            is_roformer = 
mock_detect_roformer_full_path(full_path)\n            assert is_roformer == expected_result, (\n                f\"Full path '{full_path}' detection failed: expected {expected_result}, got {is_roformer}\"\n            )\n\n    def test_roformer_detection_with_config_content(self):\n        \"\"\"Test Roformer detection based on YAML configuration content.\"\"\"\n        # Mock YAML configurations\n        roformer_configs = [\n            {\"model_type\": \"bs_roformer\", \"architecture\": \"BSRoformer\"},\n            {\"model_type\": \"mel_band_roformer\", \"architecture\": \"MelBandRoformer\"},\n            {\"architecture\": \"roformer\", \"variant\": \"large\"},\n            {\"separator_class\": \"RoformerSeparator\", \"model\": \"bs_roformer\"},\n        ]\n\n        non_roformer_configs = [\n            {\"model_type\": \"mdx\", \"architecture\": \"MDX\"},\n            {\"model_type\": \"vr\", \"architecture\": \"VR\"},\n            {\"architecture\": \"demucs\", \"variant\": \"v4\"},\n            {\"separator_class\": \"MDXSeparator\", \"model\": \"mdx_extra\"},\n        ]\n\n        def mock_detect_roformer_from_config(config):\n            \"\"\"Mock detection based on configuration content.\"\"\"\n            config_str = str(config).lower()\n            return 'roformer' in config_str\n\n        # Test Roformer configs\n        for config in roformer_configs:\n            is_roformer = mock_detect_roformer_from_config(config)\n            assert is_roformer, f\"Roformer config should be detected: {config}\"\n\n        # Test non-Roformer configs  \n        for config in non_roformer_configs:\n            is_roformer = mock_detect_roformer_from_config(config)\n            assert not is_roformer, f\"Non-Roformer config should not be detected: {config}\"\n\n    def test_roformer_routing_integration(self):\n        \"\"\"Test integration between detection and routing.\"\"\"\n        with patch('audio_separator.separator.separator.Separator') as 
mock_separator_class:\n            mock_instance = Mock()\n            mock_separator_class.return_value = mock_instance\n            \n            # Mock the routing logic\n            def mock_load_model_with_routing(model_path):\n                \"\"\"Mock model loading that routes based on detection.\"\"\"\n                filename = os.path.basename(model_path)\n                is_roformer = 'roformer' in filename.lower()\n                \n                if is_roformer:\n                    # Should use Roformer-specific loading\n                    return {\n                        'separator_type': 'RoformerSeparator',\n                        'is_roformer': True,\n                        'routing_path': 'roformer_path'\n                    }\n                else:\n                    # Should use default loading\n                    return {\n                        'separator_type': 'DefaultSeparator', \n                        'is_roformer': False,\n                        'routing_path': 'default_path'\n                    }\n\n            # Test Roformer model routing\n            roformer_result = mock_load_model_with_routing(\"model_bs_roformer_ep_317.ckpt\")\n            assert roformer_result['is_roformer'] is True\n            assert roformer_result['separator_type'] == 'RoformerSeparator'\n            assert roformer_result['routing_path'] == 'roformer_path'\n\n            # Test non-Roformer model routing\n            mdx_result = mock_load_model_with_routing(\"model_mdx_extra_vocals.ckpt\")\n            assert mdx_result['is_roformer'] is False\n            assert mdx_result['separator_type'] == 'DefaultSeparator'\n            assert mdx_result['routing_path'] == 'default_path'\n\n    def test_roformer_detection_edge_cases(self):\n        \"\"\"Test edge cases in Roformer detection.\"\"\"\n        edge_cases = [\n            # (path, expected_result, description)\n            (\"\", False, \"Empty string\"),\n            (\"model.yaml\", 
False, \"No roformer in name\"),\n            (\"roformer\", True, \"Just 'roformer'\"),\n            (\"notroformer.yaml\", True, \"Contains 'roformer' as substring\"),\n            (\"roformer.txt\", True, \"Different extension\"),\n            (\"ROFORMER.YAML\", True, \"All uppercase\"),\n            (\"r0f0rmer.yaml\", False, \"Similar but not exact\"),\n            (\"roformermodel.ckpt\", True, \"No separator between roformer and model\"),\n            (\"model_roformer_v2.pth\", True, \"Roformer in middle of filename\"),\n        ]\n\n        def mock_detect_roformer_edge_cases(path):\n            \"\"\"Mock detection handling edge cases.\"\"\"\n            if not path:\n                return False\n            return 'roformer' in path.lower()\n\n        for path, expected_result, description in edge_cases:\n            is_roformer = mock_detect_roformer_edge_cases(path)\n            assert is_roformer == expected_result, (\n                f\"Edge case failed - {description}: path '{path}' should \"\n                f\"{'be' if expected_result else 'not be'} detected as Roformer\"\n            )\n\n    def test_roformer_detection_with_model_extensions(self):\n        \"\"\"Test Roformer detection works with various model file extensions.\"\"\"\n        model_extensions = ['.ckpt', '.pth', '.pt', '.onnx', '.yaml', '.yml']\n        base_names = ['bs_roformer_model', 'mel_band_roformer', 'roformer_large']\n\n        def mock_detect_roformer_with_extension(path):\n            \"\"\"Mock detection that ignores file extension.\"\"\"\n            filename = os.path.basename(path)\n            name_without_ext = os.path.splitext(filename)[0]\n            return 'roformer' in name_without_ext.lower()\n\n        for base_name in base_names:\n            for ext in model_extensions:\n                full_filename = f\"{base_name}{ext}\"\n                is_roformer = mock_detect_roformer_with_extension(full_filename)\n                assert is_roformer, (\n         
           f\"Roformer model '{full_filename}' should be detected regardless of extension\"\n                )\n\n        # Test non-Roformer models with same extensions\n        non_roformer_bases = ['mdx_model', 'vr_vocals', 'demucs_v4']\n        for base_name in non_roformer_bases:\n            for ext in model_extensions:\n                full_filename = f\"{base_name}{ext}\"\n                is_roformer = mock_detect_roformer_with_extension(full_filename)\n                assert not is_roformer, (\n                    f\"Non-Roformer model '{full_filename}' should not be detected\"\n                )\n"
  },
  {
    "path": "tests/unit/test_stem_naming.py",
    "content": "\"\"\"\nUnit tests for stem name assignment logic in CommonSeparator and\nstem normalization in _separate_ensemble.\n\"\"\"\n\nimport pytest\nimport logging\nfrom unittest.mock import patch, MagicMock, PropertyMock\n\n\nclass TestCommonSeparatorStemSwap:\n    \"\"\"Test the target_instrument / instruments[0] swap logic in CommonSeparator.__init__.\"\"\"\n\n    def _make_config(self, model_data):\n        \"\"\"Create a minimal config dict for CommonSeparator.\"\"\"\n        return {\n            \"logger\": logging.getLogger(\"test\"),\n            \"log_level\": logging.DEBUG,\n            \"torch_device\": None,\n            \"torch_device_cpu\": None,\n            \"torch_device_mps\": None,\n            \"onnx_execution_provider\": None,\n            \"model_name\": \"test_model\",\n            \"model_path\": \"/tmp/test.ckpt\",\n            \"model_data\": model_data,\n            \"output_dir\": \"/tmp\",\n            \"output_format\": \"WAV\",\n            \"output_bitrate\": None,\n            \"normalization_threshold\": 0.9,\n            \"amplification_threshold\": 0.0,\n            \"enable_denoise\": False,\n            \"output_single_stem\": None,\n            \"invert_using_spec\": False,\n            \"sample_rate\": 44100,\n            \"use_soundfile\": False,\n        }\n\n    @patch(\"audio_separator.separator.common_separator.CommonSeparator._detect_roformer_model\", return_value=False)\n    def test_no_swap_when_target_matches_instruments0(self, mock_detect):\n        \"\"\"Normal case: target_instrument == instruments[0], no swap needed.\"\"\"\n        from audio_separator.separator.common_separator import CommonSeparator\n        model_data = {\n            \"training\": {\n                \"instruments\": [\"vocals\", \"other\"],\n                \"target_instrument\": \"vocals\",\n            }\n        }\n        sep = CommonSeparator(self._make_config(model_data))\n        assert sep.primary_stem_name == \"vocals\"\n   
     assert sep.secondary_stem_name == \"other\"\n\n    @patch(\"audio_separator.separator.common_separator.CommonSeparator._detect_roformer_model\", return_value=False)\n    def test_swap_when_target_mismatches_instruments0(self, mock_detect):\n        \"\"\"Bug fix case: target_instrument == instruments[1], should swap.\"\"\"\n        from audio_separator.separator.common_separator import CommonSeparator\n        model_data = {\n            \"training\": {\n                \"instruments\": [\"vocals\", \"other\"],\n                \"target_instrument\": \"other\",\n            }\n        }\n        sep = CommonSeparator(self._make_config(model_data))\n        # Primary should be \"other\" (the target), secondary should be \"vocals\"\n        assert sep.primary_stem_name == \"other\"\n        assert sep.secondary_stem_name == \"vocals\"\n\n    @patch(\"audio_separator.separator.common_separator.CommonSeparator._detect_roformer_model\", return_value=False)\n    def test_no_swap_when_no_target_instrument(self, mock_detect):\n        \"\"\"No target_instrument set — use instruments[0] as primary.\"\"\"\n        from audio_separator.separator.common_separator import CommonSeparator\n        model_data = {\n            \"training\": {\n                \"instruments\": [\"vocals\", \"other\"],\n            }\n        }\n        sep = CommonSeparator(self._make_config(model_data))\n        assert sep.primary_stem_name == \"vocals\"\n        assert sep.secondary_stem_name == \"other\"\n\n    @patch(\"audio_separator.separator.common_separator.CommonSeparator._detect_roformer_model\", return_value=False)\n    def test_no_swap_when_target_not_in_instruments(self, mock_detect):\n        \"\"\"Edge case: target_instrument not in instruments list — no swap, use default order.\"\"\"\n        from audio_separator.separator.common_separator import CommonSeparator\n        model_data = {\n            \"training\": {\n                \"instruments\": [\"vocals\", \"other\"],\n      
          \"target_instrument\": \"drums\",\n            }\n        }\n        sep = CommonSeparator(self._make_config(model_data))\n        assert sep.primary_stem_name == \"vocals\"\n        assert sep.secondary_stem_name == \"other\"\n\n    @patch(\"audio_separator.separator.common_separator.CommonSeparator._detect_roformer_model\", return_value=False)\n    def test_single_instrument_no_swap(self, mock_detect):\n        \"\"\"Single instrument — no swap possible.\"\"\"\n        from audio_separator.separator.common_separator import CommonSeparator\n        model_data = {\n            \"training\": {\n                \"instruments\": [\"vocals\"],\n                \"target_instrument\": \"vocals\",\n            }\n        }\n        sep = CommonSeparator(self._make_config(model_data))\n        assert sep.primary_stem_name == \"vocals\"\n\n\nclass TestStemNameMap:\n    \"\"\"Test the STEM_NAME_MAP constant covers all expected mappings.\"\"\"\n\n    def test_stem_name_map_has_expected_entries(self):\n        from audio_separator.separator.separator import STEM_NAME_MAP\n        # All known 2-stem secondary names should map\n        assert STEM_NAME_MAP[\"vocals\"] == \"Vocals\"\n        assert STEM_NAME_MAP[\"instrumental\"] == \"Instrumental\"\n        assert STEM_NAME_MAP[\"inst\"] == \"Instrumental\"\n        assert STEM_NAME_MAP[\"karaoke\"] == \"Instrumental\"\n        assert STEM_NAME_MAP[\"no_vocals\"] == \"Instrumental\"\n        assert STEM_NAME_MAP[\"other\"] == \"Other\"  # For multi-stem; 2-stem override happens in ensemble\n        assert STEM_NAME_MAP[\"drums\"] == \"Drums\"\n        assert STEM_NAME_MAP[\"bass\"] == \"Bass\"\n\n    def test_stem_name_map_keys_are_lowercase(self):\n        from audio_separator.separator.separator import STEM_NAME_MAP\n        for key in STEM_NAME_MAP:\n            assert key == key.lower(), f\"Key '{key}' is not lowercase\"\n\n\nclass TestEnsembleOutputFilenames:\n    \"\"\"Test the output filename logic for preset 
and custom ensembles.\"\"\"\n\n    def test_preset_filename_format(self):\n        \"\"\"Preset ensemble should use 'preset_<name>' in filename.\"\"\"\n        # This is tested indirectly via integration, but let's verify the format\n        import os\n        base_name = \"mardy20s\"\n        stem_name = \"Vocals\"\n        preset = \"vocal_balanced\"\n        expected = f\"{base_name}_({stem_name})_preset_{preset}\"\n        assert expected == \"mardy20s_(Vocals)_preset_vocal_balanced\"\n\n    def test_custom_ensemble_slug_generation(self):\n        \"\"\"Custom ensemble should generate model slugs for the filename.\"\"\"\n        import os\n        # Simulate the slug logic from separator.py\n        model_filenames = [\"UVR-MDX-NET-Inst_HQ_5.onnx\", \"mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt\"]\n        prefixes = [\"mel_band_roformer_\", \"melband_roformer_\", \"bs_roformer_\", \"model_bs_roformer_\", \"UVR-MDX-NET-\", \"UVR_MDXNET_\"]\n\n        model_slugs = []\n        for mf in model_filenames:\n            name = os.path.splitext(mf)[0]\n            for prefix in prefixes:\n                if name.startswith(prefix):\n                    name = name[len(prefix):]\n                    break\n            model_slugs.append(name[:12])\n\n        slugs_str = \"_\".join(model_slugs)\n        filename = f\"mardy20s_(Vocals)_custom_ensemble_{slugs_str}\"\n\n        assert \"Inst_HQ_5\" in filename\n        assert \"karaoke_aufr\" in filename\n        assert filename.startswith(\"mardy20s_(Vocals)_custom_ensemble_\")\n\n\nclass TestEnsembleCustomOutputNames:\n    \"\"\"Test that custom_output_names works correctly with ensemble separation.\"\"\"\n\n    def test_custom_output_names_not_passed_to_intermediate_separation(self):\n        \"\"\"Intermediate per-model separations must NOT receive custom_output_names.\n\n        custom_output_names replaces the default '_(StemType)_model' naming, which\n        removes the _(StemType)_ markers needed by 
_separate_ensemble to classify\n        stems. custom_output_names should only be applied to the final ensembled output.\n        \"\"\"\n        import re\n        from unittest.mock import patch, MagicMock, call\n        from audio_separator.separator.separator import Separator\n\n        sep = Separator(\n            log_level=logging.WARNING,\n            model_file_dir=\"/tmp/models\",\n            output_dir=\"/tmp/output\",\n            output_format=\"flac\",\n        )\n        sep.model_filenames = [\"model_a.ckpt\", \"model_b.ckpt\"]\n        sep.model_filename = [\"model_a.ckpt\", \"model_b.ckpt\"]\n        sep.ensemble_algorithm = \"uvr_max_spec\"\n        sep.ensemble_weights = None\n        sep.ensemble_preset = \"test_preset\"\n        sep.sample_rate = 44100\n\n        custom_names = {\"Vocals\": \"job123_mixed_vocals\", \"Instrumental\": \"job123_mixed_instrumental\"}\n\n        with patch.object(sep, '_separate_file') as mock_separate, \\\n             patch.object(sep, 'load_model'), \\\n             patch('audio_separator.separator.separator.Ensembler') as MockEnsembler, \\\n             patch('audio_separator.separator.separator.librosa') as mock_librosa, \\\n             patch('audio_separator.separator.separator.np') as mock_np:\n\n            # Mock _separate_file to return files with proper _(StemType)_ naming\n            mock_separate.side_effect = [\n                [\"/tmp/ensemble/song_(Vocals)_model_a.flac\", \"/tmp/ensemble/song_(Instrumental)_model_a.flac\"],\n                [\"/tmp/ensemble/song_(Vocals)_model_b.flac\", \"/tmp/ensemble/song_(Instrumental)_model_b.flac\"],\n            ]\n\n            # Mock librosa and numpy for ensembling\n            mock_wav = MagicMock()\n            mock_wav.ndim = 2\n            mock_wav.shape = (2, 44100)\n            mock_librosa.load.return_value = (mock_wav, 44100)\n            mock_np.asfortranarray.return_value = mock_wav\n\n            mock_ensembler = MagicMock()\n            
mock_ensembler.ensemble.return_value = mock_wav\n            MockEnsembler.return_value = mock_ensembler\n\n            # Mock model_instance for write_audio\n            sep.model_instance = MagicMock()\n            sep.model_instance.output_dir = \"/tmp/output\"\n\n            sep._separate_ensemble(\"/tmp/song.flac\", custom_output_names=custom_names)\n\n            # Key assertion: _separate_file must be called with None, not custom_names\n            for call_args in mock_separate.call_args_list:\n                assert call_args[0][1] is None, (\n                    f\"_separate_file was called with custom_output_names={call_args[0][1]!r} \"\n                    f\"but should be None for intermediate ensemble files\"\n                )\n"
  },
  {
    "path": "tests/unit/test_stft.py",
    "content": "import unittest\nimport numpy as np\nimport torch\nfrom unittest.mock import Mock\nfrom audio_separator.separator.uvr_lib_v5.stft import STFT\n\n# Short-Time Fourier Transform (STFT) Process Overview:\n#\n# STFT transforms a time-domain signal into a frequency-domain representation.\n#   This transformation is achieved by dividing the signal into short frames (or segments) and applying the Fourier Transform to each frame.\n#\n# n_fft: The number of points used in the Fourier Transform, which determines the resolution of the frequency domain representation.\n#   Essentially, it dictates how many frequency bins we get in our STFT.\n#\n# hop_length: The number of samples by which we shift each frame of the signal.\n#   It affects the overlap between consecutive frames. If the hop_length is less than n_fft, we get overlapping frames.\n#\n# Windowing: Each frame of the signal is multiplied by a window function (e.g. Hann window) before applying the Fourier Transform.\n#   This is done to minimize discontinuities at the borders of each frame.\n\n\nclass TestSTFT(unittest.TestCase):\n    def setUp(self):\n        self.n_fft = 2048\n        self.hop_length = 512\n        self.dim_f = 1025\n        self.device = torch.device(\"cpu\")\n        self.stft = STFT(logger=Mock(), n_fft=self.n_fft, hop_length=self.hop_length, dim_f=self.dim_f, device=self.device)\n\n    def create_mock_tensor(self, shape, device=None):\n        tensor = torch.rand(shape)\n        if device:\n            tensor = tensor.to(device)\n        return tensor\n\n    def test_stft_initialization(self):\n        self.assertEqual(self.stft.n_fft, self.n_fft)\n        self.assertEqual(self.stft.hop_length, self.hop_length)\n        self.assertEqual(self.stft.dim_f, self.dim_f)\n        self.assertEqual(self.stft.device.type, \"cpu\")\n        self.assertIsInstance(self.stft.hann_window, torch.Tensor)\n\n    def test_stft_call(self):\n        input_tensor = self.create_mock_tensor((1, 
16000))\n\n        # Apply STFT\n        stft_result = self.stft(input_tensor)\n\n        # Test conditions\n        self.assertIsNotNone(stft_result)\n        self.assertIsInstance(stft_result, torch.Tensor)\n\n        # Calculate the expected shape based on input parameters:\n\n        # Frequency Dimension (dim_f): This corresponds to the number of frequency bins in the STFT output.\n        #   For a real-valued input signal (like audio), the Fourier Transform is conjugate-symmetric, so only the non-negative frequencies carry unique information.\n        #   Hence, for an n_fft of 2048, the one-sided spectrum has n_fft // 2 + 1 = 1025 frequency bins (from 0 Hz up to and including the Nyquist frequency).\n        #   dim_f specifies how many of these frequency bins we keep.\n        #   In this test, it's set to 1025, i.e. exactly n_fft // 2 + 1, so no bins are discarded.\n\n        # Time Dimension: This corresponds to how many frames (or segments) the input signal has been divided into.\n        #   It depends on the length of the input signal and the hop_length.\n        #   The formula for calculating the number of frames is derived from how we stride the window across the signal:\n        #     Length of Input Signal: Let's denote it as L. In this test, the input tensor has a shape of [1, 16000], so L is 16000 (ignoring the batch dimension for simplicity).\n        #     Number of Frames: The number of frames depends on how we stride the window across the signal. For each frame, we move the window by hop_length samples.\n        #     Therefore, the number of frames N_frames can be roughly estimated by dividing the length of the signal by the hop_length.\n        #     However, because the frames are centered (torch.stft with center=True pads n_fft // 2 samples on each side), frame centers fall at samples 0, hop_length, 2 * hop_length, ..., up to the last multiple of hop_length not exceeding L, i.e. one more frame than L // hop_length. 
This gives us N_frames = (L // hop_length) + 1.\n\n        # Putting It All Together\n        #   expected_shape thus becomes (dim_f, N_frames), which is (1025, (16000 // 512) + 1) in this test case.\n\n        expected_shape = (self.dim_f, (input_tensor.shape[1] // self.hop_length) + 1)\n\n        self.assertEqual(stft_result.shape[-2:], expected_shape)\n\n    def test_calculate_inverse_dimensions(self):\n        # Create a sample input tensor\n        sample_input = torch.randn(1, 2, 500, 32)  # Batch, Channel, Frequency, Time dimensions\n        batch_dims, channel_dim, freq_dim, time_dim, num_freq_bins = self.stft.calculate_inverse_dimensions(sample_input)\n\n        # Expected values\n        expected_num_freq_bins = self.n_fft // 2 + 1\n\n        # Assertions\n        self.assertEqual(batch_dims, sample_input.shape[:-3])\n        self.assertEqual(channel_dim, 2)\n        self.assertEqual(freq_dim, 500)\n        self.assertEqual(time_dim, 32)\n        self.assertEqual(num_freq_bins, expected_num_freq_bins)\n\n    def test_pad_frequency_dimension(self):\n        # Create a sample input tensor\n        sample_input = torch.randn(1, 2, 500, 32)  # Batch, Channel, Frequency, Time dimensions\n        batch_dims, channel_dim, freq_dim, time_dim, num_freq_bins = self.stft.calculate_inverse_dimensions(sample_input)\n\n        # Apply padding\n        padded_output = self.stft.pad_frequency_dimension(sample_input, batch_dims, channel_dim, freq_dim, time_dim, num_freq_bins)\n\n        # Expected frequency dimension after padding\n        expected_freq_dim = num_freq_bins\n\n        # Assertions\n        self.assertEqual(padded_output.shape[-2], expected_freq_dim)\n\n    def test_prepare_for_istft(self):\n        # Create a sample input tensor\n        sample_input = torch.randn(1, 2, 500, 32)  # Batch, Channel, Frequency, Time dimensions\n        batch_dims, channel_dim, freq_dim, time_dim, num_freq_bins = self.stft.calculate_inverse_dimensions(sample_input)\n        
padded_output = self.stft.pad_frequency_dimension(sample_input, batch_dims, channel_dim, freq_dim, time_dim, num_freq_bins)\n\n        # Apply prepare_for_istft\n        complex_tensor = self.stft.prepare_for_istft(padded_output, batch_dims, channel_dim, num_freq_bins, time_dim)\n\n        # Calculate the expected flattened batch size (flattening batch and channel dimensions)\n        expected_flattened_batch_size = batch_dims[0] * (channel_dim // 2)\n\n        # Expected shape of the complex tensor\n        expected_shape = (expected_flattened_batch_size, num_freq_bins, time_dim)\n\n        # Assertions\n        self.assertEqual(complex_tensor.shape, expected_shape)\n\n    def test_inverse_stft(self):\n        # Create a mock tensor with the correct input shape\n        input_tensor = torch.rand(1, 2, 1025, 32)  # shape matching output of STFT\n\n        # Apply inverse STFT\n        output_tensor = self.stft.inverse(input_tensor)\n\n        # Check if the output tensor is on the CPU\n        self.assertEqual(output_tensor.device.type, \"cpu\")\n\n        # Expected output shape: (Batch size, Channel dimension, Time dimension)\n        expected_shape = (1, 2, 7936)  # Calculated based on STFT parameters\n\n        # Check if the output tensor has the expected shape\n        self.assertEqual(output_tensor.shape, expected_shape)\n\n    @unittest.skipIf(not torch.backends.mps.is_available(), \"MPS not available\")\n    def test_stft_with_mps_device(self):\n        mps_device = torch.device(\"mps\")\n        self.stft.device = mps_device\n        input_tensor = self.create_mock_tensor((1, 16000), device=mps_device)\n        stft_result = self.stft(input_tensor)\n        self.assertIsNotNone(stft_result)\n        self.assertIsInstance(stft_result, torch.Tensor)\n\n    @unittest.skipIf(not torch.backends.mps.is_available(), \"MPS not available\")\n    def test_inverse_with_mps_device(self):\n        mps_device = torch.device(\"mps\")\n        self.stft.device = 
mps_device\n        input_tensor = self.create_mock_tensor((1, 2, 1025, 32), device=mps_device)\n        istft_result = self.stft.inverse(input_tensor)\n        self.assertIsNotNone(istft_result)\n        self.assertIsInstance(istft_result, torch.Tensor)\n\n\n# Mock logger to use in tests\nclass MockLogger:\n    def debug(self, message):\n        pass\n\n\nif __name__ == \"__main__\":\n    unittest.main()\n"
  },
  {
    "path": "tests/utils.py",
    "content": "import os\nimport numpy as np\nimport librosa\nimport librosa.display\nimport matplotlib.pyplot as plt\nfrom PIL import Image\nfrom io import BytesIO\nimport soundfile as sf\nfrom pathlib import Path\nfrom skimage.metrics import structural_similarity as ssim\n\n\ndef generate_waveform_image(audio_path, output_path=None, fig_size=(10, 4)):\n    \"\"\"Generate a waveform image from an audio file.\n    \n    Args:\n        audio_path: Path to the audio file\n        output_path: Path to save the generated image (optional)\n        fig_size: Size of the figure (width, height)\n        \n    Returns:\n        BytesIO object containing the image if output_path is None, otherwise saves to output_path\n    \"\"\"\n    # Load audio file\n    y, sr = librosa.load(audio_path, sr=None, mono=False)\n    \n    # If mono, convert to stereo-like format for consistent plotting\n    if y.ndim == 1:\n        y = np.array([y, y])\n    \n    plt.figure(figsize=fig_size)\n    \n    # Plot waveform for each channel with fixed Y-axis scale\n    plt.subplot(2, 1, 1)\n    plt.plot(y[0])\n    plt.title('Channel 1')\n    plt.ylim([-1.0, 1.0])  # Fixed Y-axis scale for all waveforms\n    \n    plt.subplot(2, 1, 2)\n    plt.plot(y[1])\n    plt.title('Channel 2')\n    plt.ylim([-1.0, 1.0])  # Fixed Y-axis scale for all waveforms\n    \n    plt.tight_layout()\n    \n    if output_path:\n        plt.savefig(output_path)\n        plt.close()\n        return output_path\n    else:\n        buf = BytesIO()\n        plt.savefig(buf, format='png')\n        plt.close()\n        buf.seek(0)\n        return buf\n\n\ndef generate_spectrogram_image(audio_path, output_path=None, fig_size=(10, 8)):\n    \"\"\"Generate a spectrogram image from an audio file.\n    \n    Args:\n        audio_path: Path to the audio file\n        output_path: Path to save the generated image (optional)\n        fig_size: Size of the figure (width, height)\n        \n    Returns:\n        BytesIO object containing 
the image if output_path is None, otherwise saves to output_path\n    \"\"\"\n    # Load audio file\n    y, sr = librosa.load(audio_path, sr=None, mono=False)\n    \n    # If mono, convert to stereo-like format for consistent plotting\n    if y.ndim == 1:\n        y = np.array([y, y])\n    \n    plt.figure(figsize=fig_size)\n    \n    # Set fixed min and max values for spectrogram color scale\n    vmin = -80  # dB\n    vmax = 0    # dB\n    \n    # Generate spectrograms for each channel\n    for i in range(2):\n        # Compute spectrogram\n        S = librosa.amplitude_to_db(np.abs(librosa.stft(y[i])), ref=np.max)\n        \n        plt.subplot(2, 1, i+1)\n        # Use fixed frequency range and consistent color scaling\n        librosa.display.specshow(\n            S, \n            sr=sr, \n            x_axis='time', \n            y_axis='log',\n            vmin=vmin,\n            vmax=vmax\n        )\n        plt.colorbar(format='%+2.0f dB')\n        plt.title(f'Channel {i+1} Spectrogram')\n        \n        # Set frequency range (y-axis) - typically up to Nyquist frequency (sr/2)\n        plt.ylim([20, sr/2])  # From 20Hz to Nyquist frequency\n    \n    plt.tight_layout()\n    \n    if output_path:\n        plt.savefig(output_path)\n        plt.close()\n        return output_path\n    else:\n        buf = BytesIO()\n        plt.savefig(buf, format='png')\n        plt.close()\n        buf.seek(0)\n        return buf\n\n\ndef compare_images(image1_path, image2_path, min_similarity_threshold=0.999):\n    \"\"\"Compare two images using Structural Similarity Index (SSIM) which is robust to small shifts.\n    \n    Args:\n        image1_path: Path to the first image\n        image2_path: Path to the second image\n        min_similarity_threshold: Minimum similarity required for images to be considered matching (0.0-1.0)\n            - Higher values (closer to 1.0) require images to be more similar\n            - Lower values (closer to 0.0) are more permissive\n    
        - A value of 0.99 requires 99% similarity between images\n            - A value of 0.0 would consider any images to match\n        \n    Returns:\n        Tuple of (similarity_score, is_match)\n        - similarity_score: Value between 0.0 and 1.0, where 1.0 means identical images\n        - is_match: Boolean indicating if similarity_score >= min_similarity_threshold\n    \"\"\"\n    # Open images\n    img1 = Image.open(image1_path).convert('RGB')\n    img2 = Image.open(image2_path).convert('RGB')\n    \n    # Ensure same size for comparison\n    if img1.size != img2.size:\n        img2 = img2.resize(img1.size)\n    \n    # Convert to numpy arrays\n    arr1 = np.array(img1)\n    arr2 = np.array(img2)\n    \n    # Calculate SSIM for each color channel\n    similarity_scores = []\n    for channel in range(3):  # RGB channels\n        score = ssim(arr1[:,:,channel], arr2[:,:,channel], data_range=255)\n        similarity_scores.append(score)\n    \n    # Calculate average SSIM across channels\n    similarity_score = np.mean(similarity_scores)\n    \n    # Determine if images match by comparing similarity to threshold\n    is_match = similarity_score >= min_similarity_threshold\n    \n    return (similarity_score, is_match)\n\n\ndef generate_reference_images(input_path, output_dir=None, prefix=\"\"):\n    \"\"\"Generate reference waveform and spectrogram images for an audio file.\n    \n    Args:\n        input_path: Path to the audio file\n        output_dir: Directory to save the generated images (optional)\n        prefix: Prefix to add to the output image filenames\n        \n    Returns:\n        Tuple of (waveform_path, spectrogram_path)\n    \"\"\"\n    if output_dir is None:\n        output_dir = os.path.dirname(input_path)\n    \n    # Create output directory if it doesn't exist\n    os.makedirs(output_dir, exist_ok=True)\n    \n    input_filename = os.path.basename(input_path)\n    name_without_ext = os.path.splitext(input_filename)[0]\n    \n    # 
Generate waveform image\n    waveform_path = os.path.join(output_dir, f\"{prefix}{name_without_ext}_waveform.png\")\n    generate_waveform_image(input_path, waveform_path)\n    \n    # Generate spectrogram image\n    spectrogram_path = os.path.join(output_dir, f\"{prefix}{name_without_ext}_spectrogram.png\")\n    generate_spectrogram_image(input_path, spectrogram_path)\n    \n    return (waveform_path, spectrogram_path) "
  },
  {
    "path": "tests/utils_audio_verification.py",
    "content": "\"\"\"\nAudio content verification utility for testing.\n\nVerifies that separated audio stems actually contain what their labels claim\nby correlating against known-good reference separations.\n\"\"\"\n\nimport numpy as np\nimport librosa\nimport os\nfrom dataclasses import dataclass\nfrom typing import Optional\n\n\n@dataclass\nclass StemVerification:\n    \"\"\"Result of verifying a single stem's content.\"\"\"\n    file_path: str\n    label: str\n    corr_vocal: float\n    corr_instrumental: float\n    corr_mix: float\n    rms: float\n    detected_content: str\n    label_matches: bool\n\n\ndef load_references(input_dir=\"tests/inputs\", sr=44100):\n    \"\"\"Load known-good reference stems and the original mix.\n\n    Returns (ref_vocal, ref_instrumental, ref_mix, min_len) as mono numpy arrays.\n    \"\"\"\n    ref_vocal, _ = librosa.load(\n        os.path.join(input_dir, \"mardy20s_(Vocals)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\"),\n        sr=sr, mono=True,\n    )\n    ref_inst, _ = librosa.load(\n        os.path.join(input_dir, \"mardy20s_(Instrumental)_mel_band_roformer_karaoke_aufr33_viperx_sdr_10.flac\"),\n        sr=sr, mono=True,\n    )\n    ref_mix, _ = librosa.load(\n        os.path.join(input_dir, \"mardy20s.flac\"),\n        sr=sr, mono=True,\n    )\n    min_len = min(len(ref_vocal), len(ref_inst), len(ref_mix))\n    return ref_vocal[:min_len], ref_inst[:min_len], ref_mix[:min_len], min_len\n\n\ndef classify_audio(audio_mono, ref_vocal, ref_instrumental, ref_mix, min_len):\n    \"\"\"Classify audio content by correlation against references.\n\n    Returns (corr_vocal, corr_instrumental, corr_mix, rms, detected_content).\n    \"\"\"\n    y = audio_mono[:min_len]\n    if len(y) < min_len:\n        y = np.pad(y, (0, min_len - len(y)))\n\n    corr_vocal = np.corrcoef(y, ref_vocal)[0, 1]\n    corr_inst = np.corrcoef(y, ref_instrumental)[0, 1]\n    corr_mix = np.corrcoef(y, ref_mix)[0, 1]\n    rms = float(np.sqrt(np.mean(y ** 
2)))\n\n    if corr_mix > 0.95:\n        detected = \"FULL_MIX\"\n    elif rms < 0.005:\n        detected = \"SILENT\"\n    elif corr_vocal > corr_inst and corr_vocal > 0.5:\n        detected = \"VOCALS\"\n    elif corr_inst > corr_vocal and corr_inst > 0.5:\n        detected = \"INSTRUMENTAL\"\n    else:\n        detected = \"UNCLEAR\"\n\n    return corr_vocal, corr_inst, corr_mix, rms, detected\n\n\ndef verify_stem(file_path, label, ref_vocal, ref_instrumental, ref_mix, min_len, sr=44100):\n    \"\"\"Verify a single stem file's content matches its label.\n\n    Args:\n        file_path: Path to the audio file.\n        label: The stem label (e.g., \"Vocals\", \"Instrumental\").\n        ref_vocal, ref_instrumental, ref_mix: Reference arrays from load_references().\n        min_len: Minimum length for alignment.\n        sr: Sample rate.\n\n    Returns:\n        StemVerification dataclass with results.\n    \"\"\"\n    y, _ = librosa.load(file_path, sr=sr, mono=True)\n    cv, ci, cm, rms, detected = classify_audio(y, ref_vocal, ref_instrumental, ref_mix, min_len)\n\n    # Determine if label matches detected content\n    label_lower = label.lower()\n    if detected == \"VOCALS\":\n        # Any vocal-type label (e.g. \"Vocals\", \"Lead Vocals\") matches, except \"no_vocals\",\n        # which names an instrumental stem (see the INSTRUMENTAL branch below)\n        label_matches = \"vocal\" in label_lower and \"no_vocal\" not in label_lower\n    elif detected == \"INSTRUMENTAL\":\n        label_matches = label_lower in (\"instrumental\", \"karaoke\", \"inst\", \"other\", \"no_vocals\")\n    elif detected == \"FULL_MIX\":\n        label_matches = False  # A stem should never be the full mix\n    elif detected == \"SILENT\":\n        label_matches = False\n    else:\n        label_matches = False\n\n    return StemVerification(\n        file_path=file_path,\n        label=label,\n        corr_vocal=cv,\n        corr_instrumental=ci,\n        corr_mix=cm,\n        rms=rms,\n        detected_content=detected,\n        label_matches=label_matches,\n    )\n\n\ndef verify_separation_outputs(output_files, ref_vocal, 
ref_instrumental, ref_mix, min_len, sr=44100):\n    \"\"\"Verify all output files from a separation.\n\n    Args:\n        output_files: List of output file paths (with stem names in parentheses).\n        ref_vocal, ref_instrumental, ref_mix: Reference arrays.\n        min_len: Minimum length for alignment.\n        sr: Sample rate.\n\n    Returns:\n        List of StemVerification results.\n    \"\"\"\n    import re\n\n    results = []\n    for fp in output_files:\n        fname = os.path.basename(fp)\n        match = re.search(r'_\\(([^)]+)\\)', fname)\n        label = match.group(1) if match else \"Unknown\"\n        result = verify_stem(fp, label, ref_vocal, ref_instrumental, ref_mix, min_len, sr)\n        results.append(result)\n\n    return results\n\n\ndef print_verification_report(results):\n    \"\"\"Print a formatted verification report.\"\"\"\n    print(f\"\\n{'File':<60} {'Label':<15} {'Corr-Voc':>8} {'Corr-Inst':>9} {'Corr-Mix':>8} {'Content':<15} {'Match'}\")\n    print(\"-\" * 130)\n    for r in results:\n        short = os.path.basename(r.file_path)[:57]\n        status = \"OK\" if r.label_matches else \"MISMATCH\"\n        print(f\"{short:<60} {r.label:<15} {r.corr_vocal:>8.3f} {r.corr_instrumental:>9.3f} {r.corr_mix:>8.3f} {r.detected_content:<15} {status}\")\n"
  },
  {
    "path": "tools/calculate-model-hashes.py",
    "content": "#!/usr/bin/env python3\n\nimport os\nimport sys\nimport json\nimport hashlib\nimport requests\n\nMODEL_CACHE_PATH = \"/tmp/audio-separator-models\"\nVR_MODEL_DATA_LOCAL_PATH = f\"{MODEL_CACHE_PATH}/vr_model_data.json\"\nMDX_MODEL_DATA_LOCAL_PATH = f\"{MODEL_CACHE_PATH}/mdx_model_data.json\"\n\nMODEL_DATA_URL_PREFIX = \"https://raw.githubusercontent.com/TRvlvr/application_data/main\"\nVR_MODEL_DATA_URL = f\"{MODEL_DATA_URL_PREFIX}/vr_model_data/model_data_new.json\"\nMDX_MODEL_DATA_URL = f\"{MODEL_DATA_URL_PREFIX}/mdx_model_data/model_data_new.json\"\n\nOUTPUT_PATH = f\"{MODEL_CACHE_PATH}/model_hashes.json\"\n\n\ndef get_model_hash(model_path):\n    \"\"\"\n    Get the hash of a model file\n    \"\"\"\n    # print(f\"Getting hash for model at {model_path}\")\n    try:\n        with open(model_path, \"rb\") as f:\n            f.seek(-10000 * 1024, 2)  # Move the file pointer 10MB before the end of the file\n            hash_result = hashlib.md5(f.read()).hexdigest()\n            # print(f\"Hash for {model_path}: {hash_result}\")\n            return hash_result\n    except IOError:\n        with open(model_path, \"rb\") as f:\n            hash_result = hashlib.md5(f.read()).hexdigest()\n            # print(f\"IOError encountered, hash for {model_path}: {hash_result}\")\n            return hash_result\n\n\ndef download_file_if_missing(url, local_path):\n    \"\"\"\n    Download a file from a URL if it doesn't exist locally\n    \"\"\"\n    print(f\"Checking if {local_path} needs to be downloaded from {url}\")\n    if not os.path.exists(local_path):\n        print(f\"Downloading {url} to {local_path}\")\n        with requests.get(url, stream=True, timeout=10) as r:\n            r.raise_for_status()\n            with open(local_path, \"wb\") as f:\n                for chunk in r.iter_content(chunk_size=8192):\n                    f.write(chunk)\n        print(f\"Downloaded {url} to {local_path}\")\n    else:\n        print(f\"{local_path} already exists. 
Skipping download.\")\n\n\ndef load_json_data(file_path):\n    \"\"\"\n    Load JSON data from a file\n    \"\"\"\n    print(f\"Loading JSON data from {file_path}\")\n    try:\n        with open(file_path, \"r\", encoding=\"utf-8\") as file:\n            data = json.load(file)\n            print(f\"Loaded JSON data successfully from {file_path}\")\n            return data\n    except FileNotFoundError:\n        print(f\"{file_path} not found.\")\n        sys.exit(1)\n\n\ndef iterate_and_hash(directory):\n    \"\"\"\n    Iterate through a directory and hash all model files\n    \"\"\"\n    print(f\"Iterating through directory {directory} to hash model files\")\n    model_files = [(file, os.path.join(root, file)) for root, _, files in os.walk(directory) for file in files if file.endswith((\".pth\", \".onnx\"))]\n\n    download_file_if_missing(VR_MODEL_DATA_URL, VR_MODEL_DATA_LOCAL_PATH)\n    download_file_if_missing(MDX_MODEL_DATA_URL, MDX_MODEL_DATA_LOCAL_PATH)\n\n    vr_model_data = load_json_data(VR_MODEL_DATA_LOCAL_PATH)\n    mdx_model_data = load_json_data(MDX_MODEL_DATA_LOCAL_PATH)\n\n    combined_model_params = {\n        **vr_model_data,\n        **mdx_model_data,\n    }\n\n    model_info_list = []\n    for file, file_path in sorted(model_files):\n        file_hash = get_model_hash(file_path)\n        model_info = {\n            \"file\": file,\n            \"hash\": file_hash,\n            \"params\": combined_model_params.get(file_hash, \"Parameters not found\"),\n        }\n        model_info_list.append(model_info)\n\n    print(f\"Writing model info list to {OUTPUT_PATH}\")\n    with open(OUTPUT_PATH, \"w\", encoding=\"utf-8\") as json_file:\n        json.dump(model_info_list, json_file, indent=4)\n        print(f\"Successfully wrote model info list to {OUTPUT_PATH}\")\n\n\nif __name__ == \"__main__\":\n    iterate_and_hash(MODEL_CACHE_PATH)\n"
  },
  {
    "path": "tools/sync-to-github.py",
    "content": "#! /usr/bin/env python3\nimport os\nimport requests\nimport hashlib\nfrom typing import List, Dict\nimport sys\n\n# Configuration\nGITHUB_TOKEN = os.getenv(\"GITHUB_TOKEN\", \"\").strip()  # strip() removes stray whitespace such as a trailing newline\nREPO_OWNER = \"nomadkaraoke\"\nREPO_NAME = \"python-audio-separator\"\nRELEASE_TAG = \"model-configs\"\n\nHEADERS = {\"Authorization\": f\"Bearer {GITHUB_TOKEN}\", \"Accept\": \"application/vnd.github.v3+json\"}\n\n\ndef debug_request(url: str, headers: dict, response: requests.Response):\n    \"\"\"Debug helper to print request and response details.\"\"\"\n    print(\"\\n=== Debug Information ===\")\n    print(f\"Request URL: {url}\")\n    print(f\"Request Header names: {list(headers)}\")  # Values omitted so the token is never logged\n    print(f\"Response Status: {response.status_code}\")\n    print(f\"Response Headers: {dict(response.headers)}\")\n    print(f\"Response Body: {response.text[:500]}...\")  # First 500 chars of response\n    print(\"=======================\\n\")\n\n\ndef get_release_assets() -> List[Dict]:\n    \"\"\"Get all assets from the specified release.\"\"\"\n    url = f\"https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/releases/tags/{RELEASE_TAG}\"\n    print(f\"\\nDebug: Using token: {GITHUB_TOKEN[:4]}...{GITHUB_TOKEN[-4:]}\")  # Show first/last 4 chars\n    print(f\"Debug: Requesting URL: {url}\")\n    print(f\"Debug: Header names: {list(HEADERS)}\")  # Values omitted so the token is never logged\n\n    response = requests.get(url, headers=HEADERS)\n    debug_request(url, HEADERS, response)\n\n    if response.status_code != 200:\n        print(f\"Error getting release: {response.status_code}\")\n        return []\n\n    release_data = response.json()\n    return release_data.get(\"assets\", [])\n\n\ndef list_local_files() -> List[str]:\n    \"\"\"List all files in the current directory (excluding directories).\"\"\"\n    return [f for f in os.listdir(\".\") if os.path.isfile(f)]\n\n\ndef calculate_file_hash(filepath: str) -> str:\n    \"\"\"Calculate SHA256 hash of a file.\"\"\"\n    sha256_hash = 
hashlib.sha256()\n    with open(filepath, \"rb\") as f:\n        for byte_block in iter(lambda: f.read(4096), b\"\"):\n            sha256_hash.update(byte_block)\n    return sha256_hash.hexdigest()\n\n\ndef upload_asset(release_id: int, filepath: str):\n    \"\"\"Upload a file as a release asset.\"\"\"\n    upload_url = f\"https://uploads.github.com/repos/{REPO_OWNER}/{REPO_NAME}/releases/{release_id}/assets\"\n\n    filename = os.path.basename(filepath)\n    headers = {\"Authorization\": f\"Bearer {GITHUB_TOKEN}\", \"Content-Type\": \"application/octet-stream\"}\n\n    params = {\"name\": filename}\n\n    with open(filepath, \"rb\") as f:\n        response = requests.post(upload_url, headers=headers, params=params, data=f)\n\n    if response.status_code == 201:\n        print(f\"Successfully uploaded {filename}\")\n    else:\n        print(f\"Failed to upload {filename}: {response.status_code}\")\n        print(response.text)\n\n\ndef download_asset(asset: Dict):\n    \"\"\"Download a release asset to the local directory.\"\"\"\n    filename = asset[\"name\"]\n    download_url = asset[\"browser_download_url\"]\n\n    print(f\"Downloading {filename}...\")\n    response = requests.get(download_url, headers=HEADERS, stream=True)\n\n    if response.status_code == 200:\n        with open(filename, \"wb\") as f:\n            for chunk in response.iter_content(chunk_size=8192):\n                f.write(chunk)\n        print(f\"Successfully downloaded {filename}\")\n    else:\n        print(f\"Failed to download {filename}: {response.status_code}\")\n\n\ndef main():\n    if not GITHUB_TOKEN:\n        print(\"Please set GITHUB_TOKEN environment variable\")\n        sys.exit(1)\n\n    print(f\"Debug: Script starting with token length: {len(GITHUB_TOKEN)}\")\n    # Report only whether the token contains suspicious characters; printing every character would leak the whole token\n    print(f\"Debug: Token contains whitespace or non-printable characters: {any(c.isspace() or not c.isprintable() for c in GITHUB_TOKEN)}\")\n    print(f\"Debug: Token first/last chars: {GITHUB_TOKEN[:4]}...{GITHUB_TOKEN[-4:]}\")\n\n    # Get release ID first\n    url = 
f\"https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/releases/tags/{RELEASE_TAG}\"\n    response = requests.get(url, headers=HEADERS)\n    if response.status_code != 200:\n        print(f\"Error getting release: {response.status_code}\")\n        return\n\n    release_id = response.json()[\"id\"]\n\n    # Get existing assets\n    existing_assets = get_release_assets()\n    existing_filenames = {asset[\"name\"] for asset in existing_assets}\n\n    # Get local files\n    local_files = list_local_files()\n\n    print(\"\\nExisting release assets:\")\n    for asset in existing_assets:\n        print(f\"- {asset['name']} ({asset['size']} bytes)\")\n\n    # Add download option\n    print(\"\\nOptions:\")\n    print(\"1. Upload new files\")\n    print(\"2. Download all missing files\")\n    print(\"3. Exit\")\n\n    choice = input(\"\\nEnter your choice (1-3): \")\n\n    if choice == \"1\":\n        # Original upload logic\n        files_to_upload = []\n        for local_file in local_files:\n            if local_file not in existing_filenames:\n                print(f\"- {local_file}\")\n                files_to_upload.append(local_file)\n\n        if files_to_upload:\n            files_with_size = [(f, os.path.getsize(f)) for f in files_to_upload]\n            files_with_size.sort(key=lambda x: x[1])\n\n            print(\"\\nFiles to upload (in order):\")\n            for file, size in files_with_size:\n                print(f\"- {file} ({size / 1024 / 1024:.2f} MB)\")\n\n            response = input(\"\\nDo you want to upload these files? 
(y/n): \")\n            if response.lower() == \"y\":\n                for file, _ in files_with_size:\n                    upload_asset(release_id, file)\n        else:\n            print(\"\\nNo new files to upload.\")\n\n    elif choice == \"2\":\n        # Download missing files\n        files_to_download = []\n        for asset in existing_assets:\n            if asset[\"name\"] not in local_files:\n                files_to_download.append(asset)\n\n        if files_to_download:\n            print(\"\\nFiles to download:\")\n            for asset in files_to_download:\n                print(f\"- {asset['name']} ({asset['size'] / 1024 / 1024:.2f} MB)\")\n\n            response = input(\"\\nDo you want to download these files? (y/n): \")\n            if response.lower() == \"y\":\n                for asset in files_to_download:\n                    download_asset(asset)\n        else:\n            print(\"\\nNo files to download. Local directory is in sync.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  }
]