Repository: ace-step/ACE-Step-1.5 Branch: main Commit: 00f514af36a9 Files: 1148 Total size: 10.3 MB Directory structure: gitextract_6lqx4nhl/ ├── .claude/ │ └── skills/ │ ├── acestep/ │ │ ├── SKILL.md │ │ ├── api-reference.md │ │ └── scripts/ │ │ ├── acestep.sh │ │ └── config.example.json │ ├── acestep-docs/ │ │ ├── SKILL.md │ │ ├── api/ │ │ │ ├── API.md │ │ │ └── Openrouter_API.md │ │ ├── getting-started/ │ │ │ ├── ABOUT.md │ │ │ ├── README.md │ │ │ └── Tutorial.md │ │ └── guides/ │ │ ├── ENVIRONMENT_SETUP.md │ │ ├── GPU_COMPATIBILITY.md │ │ ├── GRADIO_GUIDE.md │ │ ├── INFERENCE.md │ │ ├── SCRIPT_CONFIGURATION.md │ │ └── UPDATE_AND_BACKUP.md │ ├── acestep-lyrics-transcription/ │ │ ├── SKILL.md │ │ └── scripts/ │ │ ├── acestep-lyrics-transcription.sh │ │ └── config.example.json │ ├── acestep-simplemv/ │ │ ├── SKILL.md │ │ └── scripts/ │ │ ├── package.json │ │ ├── remotion.config.ts │ │ ├── render-mv.sh │ │ ├── render.mjs │ │ ├── render.sh │ │ ├── src/ │ │ │ ├── AudioVisualization.tsx │ │ │ ├── Root.tsx │ │ │ ├── index.ts │ │ │ ├── parseLrc.ts │ │ │ └── types.ts │ │ └── tsconfig.json │ ├── acestep-songwriting/ │ │ └── SKILL.md │ └── acestep-thumbnail/ │ ├── SKILL.md │ └── scripts/ │ ├── acestep-thumbnail.sh │ └── config.example.json ├── .dockerignore ├── .editorconfig ├── .githooks/ │ └── pre-push ├── .github/ │ ├── ISSUE_TEMPLATE/ │ │ ├── bug_report.md │ │ └── feature_request.md │ ├── codeql-config.yml │ ├── copilot-instructions.md │ └── workflows/ │ ├── close-inactive-issues.yml │ ├── codeql.yml │ └── docs.yml ├── .gitignore ├── AGENTS.md ├── CONTRIBUTING.md ├── Dockerfile.jetson ├── LICENSE ├── README-XPU.md ├── README.md ├── SECURITY.md ├── acestep/ │ ├── __init__.py │ ├── acestep_v15_pipeline.py │ ├── api/ │ │ ├── __init__.py │ │ ├── http/ │ │ │ ├── __init__.py │ │ │ ├── audio_route.py │ │ │ ├── audio_route_http_test.py │ │ │ ├── audio_route_test.py │ │ │ ├── auth.py │ │ │ ├── auth_test.py │ │ │ ├── lora_routes.py │ │ │ ├── lora_routes_http_test.py │ │ │ ├── 
lora_routes_test.py │ │ │ ├── model_init_service.py │ │ │ ├── model_init_service_test.py │ │ │ ├── model_service_routes.py │ │ │ ├── model_service_routes_http_test.py │ │ │ ├── model_service_routes_test.py │ │ │ ├── query_result_route.py │ │ │ ├── query_result_route_http_test.py │ │ │ ├── query_result_route_test.py │ │ │ ├── query_result_service.py │ │ │ ├── reinitialize_route.py │ │ │ ├── reinitialize_route_http_test.py │ │ │ ├── reinitialize_route_test.py │ │ │ ├── release_task_audio_paths.py │ │ │ ├── release_task_audio_paths_test.py │ │ │ ├── release_task_models.py │ │ │ ├── release_task_models_test.py │ │ │ ├── release_task_param_parser.py │ │ │ ├── release_task_param_parser_test.py │ │ │ ├── release_task_request_builder.py │ │ │ ├── release_task_request_builder_test.py │ │ │ ├── release_task_request_parser.py │ │ │ ├── release_task_request_parser_test.py │ │ │ ├── release_task_route.py │ │ │ ├── release_task_route_http_test.py │ │ │ ├── sample_format_routes.py │ │ │ ├── sample_format_routes_http_test.py │ │ │ └── sample_format_routes_test.py │ │ ├── job_analysis_runtime.py │ │ ├── job_analysis_runtime_test.py │ │ ├── job_blocking_generation.py │ │ ├── job_blocking_generation_test.py │ │ ├── job_execution_runtime.py │ │ ├── job_execution_runtime_test.py │ │ ├── job_generation_runtime.py │ │ ├── job_generation_runtime_test.py │ │ ├── job_generation_setup.py │ │ ├── job_generation_setup_test.py │ │ ├── job_llm_preparation.py │ │ ├── job_llm_preparation_test.py │ │ ├── job_model_selection.py │ │ ├── job_model_selection_test.py │ │ ├── job_result_payload.py │ │ ├── job_result_payload_test.py │ │ ├── job_runtime_state.py │ │ ├── job_runtime_state_test.py │ │ ├── jobs/ │ │ │ ├── __init__.py │ │ │ ├── local_cache_updates.py │ │ │ ├── local_cache_updates_test.py │ │ │ ├── models.py │ │ │ ├── store.py │ │ │ ├── store_test.py │ │ │ ├── test_fakes.py │ │ │ ├── worker_loops.py │ │ │ └── worker_loops_test.py │ │ ├── lifespan_runtime.py │ │ ├── lifespan_runtime_test.py │ │ 
├── llm_generation_inputs.py │ │ ├── llm_readiness.py │ │ ├── log_capture.py │ │ ├── log_capture_test.py │ │ ├── model_download.py │ │ ├── model_download_test.py │ │ ├── route_setup.py │ │ ├── route_setup_test.py │ │ ├── runtime_helpers.py │ │ ├── runtime_helpers_test.py │ │ ├── server_cli.py │ │ ├── server_cli_test.py │ │ ├── server_utils.py │ │ ├── server_utils_test.py │ │ ├── startup_llm_init.py │ │ ├── startup_llm_init_test.py │ │ ├── startup_model_init.py │ │ ├── startup_model_init_test.py │ │ ├── train_api_dataset_auto_label_async_route.py │ │ ├── train_api_dataset_auto_label_routes.py │ │ ├── train_api_dataset_auto_label_routes_http_test.py │ │ ├── train_api_dataset_auto_label_status_route.py │ │ ├── train_api_dataset_auto_label_sync_route.py │ │ ├── train_api_dataset_models.py │ │ ├── train_api_dataset_models_test.py │ │ ├── train_api_dataset_preprocess_routes.py │ │ ├── train_api_dataset_preprocess_routes_http_test.py │ │ ├── train_api_dataset_sample_routes.py │ │ ├── train_api_dataset_sample_routes_http_test.py │ │ ├── train_api_dataset_scan_load_routes.py │ │ ├── train_api_dataset_scan_load_routes_http_test.py │ │ ├── train_api_dataset_service.py │ │ ├── train_api_dataset_service_http_test.py │ │ ├── train_api_dataset_status_routes.py │ │ ├── train_api_dataset_status_routes_http_test.py │ │ ├── train_api_lokr_start_route.py │ │ ├── train_api_lora_start_route.py │ │ ├── train_api_models.py │ │ ├── train_api_runtime.py │ │ ├── train_api_service.py │ │ ├── train_api_service_http_test.py │ │ ├── worker_runtime.py │ │ └── worker_runtime_test.py │ ├── api_server.py │ ├── audio_utils.py │ ├── audio_utils_test.py │ ├── audio_utils_uuid_test.py │ ├── cli_args.py │ ├── cli_args_test.py │ ├── constants.py │ ├── constrained_logits_processor.py │ ├── core/ │ │ ├── __init__.py │ │ ├── audio/ │ │ │ └── __init__.py │ │ ├── generation/ │ │ │ ├── __init__.py │ │ │ └── handler/ │ │ │ ├── __init__.py │ │ │ ├── audio_codes.py │ │ │ ├── batch_prep.py │ │ │ ├── 
conditioning_batch.py │ │ │ ├── conditioning_batch_test.py │ │ │ ├── conditioning_embed.py │ │ │ ├── conditioning_embed_test.py │ │ │ ├── conditioning_masks.py │ │ │ ├── conditioning_masks_test.py │ │ │ ├── conditioning_target.py │ │ │ ├── conditioning_text.py │ │ │ ├── cover_noise_strength_forwarding_test.py │ │ │ ├── diffusion.py │ │ │ ├── diffusion_test.py │ │ │ ├── generate_music.py │ │ │ ├── generate_music_decode.py │ │ │ ├── generate_music_decode_test.py │ │ │ ├── generate_music_execute.py │ │ │ ├── generate_music_execute_test.py │ │ │ ├── generate_music_payload.py │ │ │ ├── generate_music_payload_test.py │ │ │ ├── generate_music_request.py │ │ │ ├── generate_music_request_test.py │ │ │ ├── generate_music_test.py │ │ │ ├── init_service.py │ │ │ ├── init_service_catalog.py │ │ │ ├── init_service_downloads.py │ │ │ ├── init_service_loader.py │ │ │ ├── init_service_loader_components.py │ │ │ ├── init_service_memory_basic.py │ │ │ ├── init_service_memory_transfer.py │ │ │ ├── init_service_offload_context.py │ │ │ ├── init_service_orchestrator.py │ │ │ ├── init_service_setup.py │ │ │ ├── init_service_test.py │ │ │ ├── io_audio.py │ │ │ ├── io_audio_test.py │ │ │ ├── lora/ │ │ │ │ ├── __init__.py │ │ │ │ ├── adapter_discovery.py │ │ │ │ ├── controls.py │ │ │ │ ├── controls_test.py │ │ │ │ ├── lifecycle.py │ │ │ │ ├── lifecycle_test.py │ │ │ │ ├── registry_builder.py │ │ │ │ ├── registry_state.py │ │ │ │ └── scale_apply.py │ │ │ ├── lora_integration_test.py │ │ │ ├── lora_manager.py │ │ │ ├── lyric_alignment_common.py │ │ │ ├── lyric_alignment_test.py │ │ │ ├── lyric_score.py │ │ │ ├── lyric_timestamp.py │ │ │ ├── memory_utils.py │ │ │ ├── metadata_utils.py │ │ │ ├── mlx_dit_init.py │ │ │ ├── mlx_dit_init_test.py │ │ │ ├── mlx_vae_decode_native.py │ │ │ ├── mlx_vae_encode_native.py │ │ │ ├── mlx_vae_init.py │ │ │ ├── mlx_vae_init_test.py │ │ │ ├── mlx_vae_native_test.py │ │ │ ├── padding_utils.py │ │ │ ├── progress.py │ │ │ ├── progress_project_root_test.py │ │ │ 
├── prompt_utils.py │ │ │ ├── reference_audio_validation_test.py │ │ │ ├── repaint_step_injection.py │ │ │ ├── repaint_step_injection_test.py │ │ │ ├── repaint_waveform_splice.py │ │ │ ├── repaint_waveform_splice_test.py │ │ │ ├── resolve_repaint_config_test.py │ │ │ ├── service_generate.py │ │ │ ├── service_generate_execute.py │ │ │ ├── service_generate_execute_test.py │ │ │ ├── service_generate_outputs.py │ │ │ ├── service_generate_request.py │ │ │ ├── service_generate_request_test.py │ │ │ ├── service_generate_test.py │ │ │ ├── task_utils.py │ │ │ ├── task_utils_test.py │ │ │ ├── training_preset.py │ │ │ ├── training_preset_test.py │ │ │ ├── vae_decode.py │ │ │ ├── vae_decode_chunks.py │ │ │ ├── vae_decode_chunks_test.py │ │ │ ├── vae_decode_mixin_test.py │ │ │ ├── vae_decode_test_helpers.py │ │ │ ├── vae_encode.py │ │ │ ├── vae_encode_chunks.py │ │ │ └── vae_encode_test.py │ │ ├── llm/ │ │ │ └── __init__.py │ │ ├── lora/ │ │ │ ├── __init__.py │ │ │ ├── introspection.py │ │ │ ├── registry.py │ │ │ ├── scaling.py │ │ │ ├── service.py │ │ │ └── service_test.py │ │ ├── scoring/ │ │ │ ├── __init__.py │ │ │ ├── _dtw.py │ │ │ ├── dit_alignment.py │ │ │ ├── dit_score.py │ │ │ ├── lm_score.py │ │ │ └── scoring_test.py │ │ └── system/ │ │ └── __init__.py │ ├── dataset/ │ │ ├── __init__.py │ │ ├── builder/ │ │ │ └── __init__.py │ │ └── runtime/ │ │ └── __init__.py │ ├── dataset_handler.py │ ├── debug_utils.py │ ├── genres_vocab.txt │ ├── gpu_config.py │ ├── gpu_config_effective_free_vram_test.py │ ├── handler.py │ ├── inference.py │ ├── launcher_compat.py │ ├── launcher_compat_test.py │ ├── launcher_legacy_torch_fix_test.py │ ├── llm_backend_compat.py │ ├── llm_backend_compat_test.py │ ├── llm_inference.py │ ├── llm_inference_cache_cleanup_test.py │ ├── llm_inference_cfg_fixes_test.py │ ├── llm_inference_dist_cleanup_test.py │ ├── llm_inference_enforce_eager_test.py │ ├── local_cache.py │ ├── local_cache_thread_safety_test.py │ ├── model_downloader.py │ ├── 
model_downloader_test.py │ ├── models/ │ │ ├── __init__.py │ │ ├── base/ │ │ │ ├── __init__.py │ │ │ ├── apg_guidance.py │ │ │ ├── configuration_acestep_v15.py │ │ │ └── modeling_acestep_v15_base.py │ │ ├── mlx/ │ │ │ ├── __init__.py │ │ │ ├── dit_convert.py │ │ │ ├── dit_generate.py │ │ │ ├── dit_model.py │ │ │ ├── vae_convert.py │ │ │ └── vae_model.py │ │ ├── sft/ │ │ │ ├── __init__.py │ │ │ ├── apg_guidance.py │ │ │ ├── configuration_acestep_v15.py │ │ │ └── modeling_acestep_v15_base.py │ │ └── turbo/ │ │ ├── __init__.py │ │ ├── configuration_acestep_v15.py │ │ └── modeling_acestep_v15_turbo.py │ ├── null_duration_fixes_test.py │ ├── openrouter_adapter.py │ ├── openrouter_models.py │ ├── text2music_src_audio_test.py │ ├── third_parts/ │ │ └── nano-vllm/ │ │ ├── LICENSE │ │ ├── README.md │ │ ├── bench.py │ │ ├── example.py │ │ ├── nanovllm/ │ │ │ ├── __init__.py │ │ │ ├── config.py │ │ │ ├── distributed.py │ │ │ ├── engine/ │ │ │ │ ├── block_manager.py │ │ │ │ ├── llm_engine.py │ │ │ │ ├── model_runner.py │ │ │ │ ├── scheduler.py │ │ │ │ └── sequence.py │ │ │ ├── layers/ │ │ │ │ ├── activation.py │ │ │ │ ├── attention.py │ │ │ │ ├── embed_head.py │ │ │ │ ├── layernorm.py │ │ │ │ ├── linear.py │ │ │ │ ├── rotary_embedding.py │ │ │ │ └── sampler.py │ │ │ ├── llm.py │ │ │ ├── models/ │ │ │ │ └── qwen3.py │ │ │ ├── sampling_params.py │ │ │ └── utils/ │ │ │ ├── compat.py │ │ │ ├── compat_test.py │ │ │ ├── context.py │ │ │ └── loader.py │ │ └── pyproject.toml │ ├── training/ │ │ ├── __init__.py │ │ ├── configs.py │ │ ├── data_module.py │ │ ├── data_module_test.py │ │ ├── dataset_builder.py │ │ ├── dataset_builder_modules/ │ │ │ ├── __init__.py │ │ │ ├── audio_io.py │ │ │ ├── builder.py │ │ │ ├── core.py │ │ │ ├── csv_metadata.py │ │ │ ├── dataframe.py │ │ │ ├── label_all.py │ │ │ ├── label_single.py │ │ │ ├── label_utils.py │ │ │ ├── metadata.py │ │ │ ├── models.py │ │ │ ├── preprocess.py │ │ │ ├── preprocess_audio.py │ │ │ ├── preprocess_context.py │ │ │ ├── 
preprocess_encoder.py │ │ │ ├── preprocess_lyrics.py │ │ │ ├── preprocess_manifest.py │ │ │ ├── preprocess_text.py │ │ │ ├── preprocess_utils.py │ │ │ ├── preprocess_vae.py │ │ │ ├── scan.py │ │ │ ├── serialization.py │ │ │ └── update_sample.py │ │ ├── lokr_utils.py │ │ ├── lora_checkpoint.py │ │ ├── lora_injection.py │ │ ├── lora_utils.py │ │ ├── path_safety.py │ │ ├── test_lora_utils.py │ │ └── trainer.py │ ├── training_v2/ │ │ ├── __init__.py │ │ ├── cli/ │ │ │ ├── __init__.py │ │ │ ├── args.py │ │ │ ├── common.py │ │ │ ├── config_builder.py │ │ │ ├── train_fixed.py │ │ │ ├── train_vanilla.py │ │ │ └── validation.py │ │ ├── configs.py │ │ ├── estimate.py │ │ ├── fixed_lora_module.py │ │ ├── gpu_utils.py │ │ ├── make_test_fixtures.py │ │ ├── model_discovery.py │ │ ├── model_loader.py │ │ ├── optim.py │ │ ├── preprocess.py │ │ ├── preprocess_discovery.py │ │ ├── preprocess_prompt.py │ │ ├── preprocess_vae.py │ │ ├── presets/ │ │ │ ├── high_quality.json │ │ │ ├── quick_test.json │ │ │ ├── recommended.json │ │ │ ├── vram_12gb.json │ │ │ ├── vram_16gb.json │ │ │ ├── vram_24gb_plus.json │ │ │ └── vram_8gb.json │ │ ├── settings.py │ │ ├── tensorboard_utils.py │ │ ├── timestep_sampling.py │ │ ├── trainer_basic_loop.py │ │ ├── trainer_fixed.py │ │ ├── trainer_helpers.py │ │ ├── trainer_helpers_test.py │ │ ├── trainer_vanilla.py │ │ └── ui/ │ │ ├── __init__.py │ │ ├── banner.py │ │ ├── config_panel.py │ │ ├── errors.py │ │ ├── flows.py │ │ ├── flows_common.py │ │ ├── flows_estimate.py │ │ ├── flows_preprocess.py │ │ ├── flows_setup.py │ │ ├── flows_train.py │ │ ├── flows_train_steps.py │ │ ├── gpu_monitor.py │ │ ├── help_formatter.py │ │ ├── presets.py │ │ ├── progress.py │ │ ├── prompt_helpers.py │ │ ├── summary.py │ │ ├── wizard.py │ │ └── wizard_menus.py │ └── ui/ │ ├── __init__.py │ ├── gradio/ │ │ ├── __init__.py │ │ ├── api/ │ │ │ ├── __init__.py │ │ │ ├── api_routes.py │ │ │ ├── api_routes_resource_test.py │ │ │ └── api_routes_thread_safety_test.py │ │ ├── events/ 
│ │ │ ├── __init__.py │ │ │ ├── generation/ │ │ │ │ ├── __init__.py │ │ │ │ ├── llm_action_params.py │ │ │ │ ├── llm_actions.py │ │ │ │ ├── llm_analysis_actions.py │ │ │ │ ├── llm_format_actions.py │ │ │ │ ├── llm_sample_actions.py │ │ │ │ ├── metadata_loading.py │ │ │ │ ├── mode_ui.py │ │ │ │ ├── mode_ui_helpers.py │ │ │ │ ├── mode_ui_test.py │ │ │ │ ├── model_config.py │ │ │ │ ├── model_config_test.py │ │ │ │ ├── service_init.py │ │ │ │ ├── service_init_test.py │ │ │ │ ├── ui_helpers.py │ │ │ │ └── validation.py │ │ │ ├── generation_handlers.py │ │ │ ├── generation_handlers_test.py │ │ │ ├── results/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _batch_management_test_support.py │ │ │ │ ├── audio_playback_updates.py │ │ │ │ ├── audio_playback_updates_test.py │ │ │ │ ├── audio_transfer.py │ │ │ │ ├── batch_management.py │ │ │ │ ├── batch_management_background.py │ │ │ │ ├── batch_management_background_test.py │ │ │ │ ├── batch_management_helpers.py │ │ │ │ ├── batch_management_helpers_test.py │ │ │ │ ├── batch_management_test.py │ │ │ │ ├── batch_management_wrapper.py │ │ │ │ ├── batch_navigation.py │ │ │ │ ├── batch_navigation_test.py │ │ │ │ ├── batch_queue.py │ │ │ │ ├── batch_queue_test.py │ │ │ │ ├── generation_info.py │ │ │ │ ├── generation_info_test.py │ │ │ │ ├── generation_progress.py │ │ │ │ ├── lrc_utils.py │ │ │ │ ├── lrc_utils_test.py │ │ │ │ └── scoring.py │ │ │ ├── results_handlers.py │ │ │ ├── results_handlers_facade_test.py │ │ │ ├── training/ │ │ │ │ ├── __init__.py │ │ │ │ ├── dataset_ops.py │ │ │ │ ├── dataset_ops_test.py │ │ │ │ ├── lokr_training.py │ │ │ │ ├── lora_training.py │ │ │ │ ├── preprocess.py │ │ │ │ ├── preprocess_test.py │ │ │ │ ├── training_facade_test.py │ │ │ │ ├── training_utils.py │ │ │ │ └── training_utils_test.py │ │ │ ├── training_handlers.py │ │ │ └── wiring/ │ │ │ ├── __init__.py │ │ │ ├── ast_test_utils.py │ │ │ ├── context.py │ │ │ ├── context_test.py │ │ │ ├── decomposition_contract_generation_test.py │ │ │ ├── 
decomposition_contract_helpers.py │ │ │ ├── decomposition_contract_training_test.py │ │ │ ├── docstring_coverage_test.py │ │ │ ├── generation_batch_navigation_wiring.py │ │ │ ├── generation_metadata_file_wiring.py │ │ │ ├── generation_metadata_file_wiring_test.py │ │ │ ├── generation_metadata_wiring.py │ │ │ ├── generation_mode_wiring.py │ │ │ ├── generation_run_wiring.py │ │ │ ├── generation_service_wiring.py │ │ │ ├── generation_service_wiring_test.py │ │ │ ├── generation_text_format_wiring.py │ │ │ ├── results_aux_wiring.py │ │ │ ├── results_display_wiring.py │ │ │ ├── results_display_wiring_test.py │ │ │ ├── training_dataset_builder_wiring.py │ │ │ ├── training_dataset_preprocess_wiring.py │ │ │ ├── training_lokr_wiring.py │ │ │ └── training_run_wiring.py │ │ ├── help_content.py │ │ ├── help_content_i18n_test.py │ │ ├── help_content_md_test.py │ │ ├── help_content_misc_test.py │ │ ├── help_content_test.py │ │ ├── help_content_test_helpers.py │ │ ├── i18n/ │ │ │ ├── __init__.py │ │ │ ├── en.json │ │ │ ├── he.json │ │ │ ├── i18n.py │ │ │ ├── i18n_thread_safety_test.py │ │ │ ├── ja.json │ │ │ └── zh.json │ │ └── interfaces/ │ │ ├── __init__.py │ │ ├── audio_player_preferences.js │ │ ├── audio_player_preferences.py │ │ ├── audio_player_preferences_test.py │ │ ├── dataset.py │ │ ├── generation.py │ │ ├── generation_advanced_dit_controls.py │ │ ├── generation_advanced_output_controls.py │ │ ├── generation_advanced_primary_controls.py │ │ ├── generation_advanced_settings.py │ │ ├── generation_contract_ast_utils.py │ │ ├── generation_decomposition_contract_test.py │ │ ├── generation_defaults.py │ │ ├── generation_service_config.py │ │ ├── generation_service_config_rows.py │ │ ├── generation_service_config_rows_test.py │ │ ├── generation_service_config_toggles.py │ │ ├── generation_tab_generate_controls.py │ │ ├── generation_tab_optional_controls.py │ │ ├── generation_tab_optional_controls_test.py │ │ ├── generation_tab_primary_controls.py │ │ ├── 
generation_tab_runtime_controls.py │ │ ├── generation_tab_secondary_controls.py │ │ ├── generation_tab_section.py │ │ ├── generation_tab_simple_controls.py │ │ ├── generation_tab_source_controls.py │ │ ├── result.py │ │ ├── training.py │ │ ├── training_contract_ast_utils.py │ │ ├── training_dataset_builder_tab.py │ │ ├── training_dataset_tab_label_preview.py │ │ ├── training_dataset_tab_save_preprocess.py │ │ ├── training_dataset_tab_scan_settings.py │ │ ├── training_decomposition_contract_test.py │ │ ├── training_lokr_tab.py │ │ ├── training_lokr_tab_dataset.py │ │ ├── training_lokr_tab_run_export.py │ │ ├── training_lora_tab.py │ │ ├── training_lora_tab_dataset.py │ │ └── training_lora_tab_run_export.py │ └── streamlit/ │ ├── .gitignore │ ├── .streamlit/ │ │ ├── config.toml │ │ └── secrets.toml │ ├── INSTALL.md │ ├── PROJECT_SUMMARY.md │ ├── QUICKSTART.md │ ├── README.md │ ├── components/ │ │ ├── __init__.py │ │ ├── audio_player.py │ │ ├── batch_generator.py │ │ ├── dashboard.py │ │ ├── editor.py │ │ ├── editor_audio_picker.py │ │ ├── editor_runner.py │ │ ├── editor_tasks.py │ │ ├── editor_waveform.py │ │ ├── generation_wizard.py │ │ └── settings_panel.py │ ├── config.py │ ├── main.py │ ├── requirements.txt │ ├── run.bat │ ├── run.sh │ └── utils/ │ ├── __init__.py │ ├── audio_utils.py │ ├── cache.py │ └── project_manager.py ├── check_update.bat ├── check_update.sh ├── cli.py ├── close_api_server.sh ├── docker-compose.jetson.yml ├── docs/ │ ├── .vitepress/ │ │ ├── config.mts │ │ └── theme/ │ │ ├── custom.css │ │ └── index.ts │ ├── en/ │ │ ├── ACE-Step1.5-Rocm-Manual-Linux.md │ │ ├── API.md │ │ ├── BENCHMARK.md │ │ ├── CLI.md │ │ ├── GPU_COMPATIBILITY.md │ │ ├── GPU_TROUBLESHOOTING.md │ │ ├── GRADIO_GUIDE.md │ │ ├── INFERENCE.md │ │ ├── INSTALL.md │ │ ├── LoRA_Training_Tutorial.md │ │ ├── Openrouter_API_DOC.md │ │ ├── Tutorial.md │ │ ├── VST3_BACKEND_CONTRACT.md │ │ ├── VST3_MVP.md │ │ ├── VST3_SETUP.md │ │ ├── ace_step_musicians_guide.md │ │ ├── index.md │ │ └── 
studio.md │ ├── index.md │ ├── ja/ │ │ ├── API.md │ │ ├── GPU_COMPATIBILITY.md │ │ ├── GRADIO_GUIDE.md │ │ ├── INFERENCE.md │ │ ├── INSTALL.md │ │ ├── LoRA_Training_Tutorial.md │ │ ├── Openrouter_API_DOC.md │ │ ├── Tutorial.md │ │ └── index.md │ ├── ko/ │ │ ├── API.md │ │ ├── GPU_COMPATIBILITY.md │ │ ├── GRADIO_GUIDE.md │ │ ├── INFERENCE.md │ │ ├── LoRA_Training_Tutorial.md │ │ ├── Openrouter_API_DOC.md │ │ ├── Tutorial.md │ │ └── index.md │ ├── sidestep/ │ │ ├── Dataset Preparation.md │ │ ├── End-to-End Tutorial.md │ │ ├── Estimation Guide.md │ │ ├── Getting Started.md │ │ ├── Model Management.md │ │ ├── ObsidianREADME.md │ │ ├── Preset Management.md │ │ ├── RepositoryREADME.md │ │ ├── Shift and Timestep Sampling.md │ │ ├── The Settings Wizard.md │ │ ├── Training Guide.md │ │ ├── Using Your Adapter.md │ │ ├── VRAM Optimization Guide.md │ │ └── Windows Notes.md │ └── zh/ │ ├── API.md │ ├── BENCHMARK.md │ ├── GPU_COMPATIBILITY.md │ ├── GRADIO_GUIDE.md │ ├── INFERENCE.md │ ├── INSTALL.md │ ├── LoRA_Training_Tutorial.md │ ├── Openrouter_API_DOC.md │ ├── Tutorial.md │ └── index.md ├── examples/ │ ├── simple_mode/ │ │ ├── example_01.json │ │ ├── example_02.json │ │ ├── example_03.json │ │ ├── example_04.json │ │ ├── example_05.json │ │ ├── example_06.json │ │ ├── example_07.json │ │ ├── example_08.json │ │ ├── example_09.json │ │ ├── example_10.json │ │ ├── example_100.json │ │ ├── example_101.json │ │ ├── example_102.json │ │ ├── example_103.json │ │ ├── example_104.json │ │ ├── example_105.json │ │ ├── example_106.json │ │ ├── example_107.json │ │ ├── example_108.json │ │ ├── example_109.json │ │ ├── example_11.json │ │ ├── example_110.json │ │ ├── example_111.json │ │ ├── example_112.json │ │ ├── example_113.json │ │ ├── example_114.json │ │ ├── example_115.json │ │ ├── example_116.json │ │ ├── example_117.json │ │ ├── example_118.json │ │ ├── example_119.json │ │ ├── example_12.json │ │ ├── example_120.json │ │ ├── example_121.json │ │ ├── example_122.json │ │ ├── 
example_123.json │ │ ├── example_124.json │ │ ├── example_125.json │ │ ├── example_126.json │ │ ├── example_127.json │ │ ├── example_128.json │ │ ├── example_129.json │ │ ├── example_13.json │ │ ├── example_130.json │ │ ├── example_131.json │ │ ├── example_132.json │ │ ├── example_133.json │ │ ├── example_134.json │ │ ├── example_135.json │ │ ├── example_136.json │ │ ├── example_137.json │ │ ├── example_138.json │ │ ├── example_139.json │ │ ├── example_14.json │ │ ├── example_140.json │ │ ├── example_141.json │ │ ├── example_142.json │ │ ├── example_143.json │ │ ├── example_144.json │ │ ├── example_145.json │ │ ├── example_146.json │ │ ├── example_147.json │ │ ├── example_148.json │ │ ├── example_149.json │ │ ├── example_15.json │ │ ├── example_150.json │ │ ├── example_151.json │ │ ├── example_152.json │ │ ├── example_153.json │ │ ├── example_154.json │ │ ├── example_155.json │ │ ├── example_156.json │ │ ├── example_157.json │ │ ├── example_158.json │ │ ├── example_159.json │ │ ├── example_16.json │ │ ├── example_160.json │ │ ├── example_161.json │ │ ├── example_162.json │ │ ├── example_163.json │ │ ├── example_164.json │ │ ├── example_165.json │ │ ├── example_166.json │ │ ├── example_167.json │ │ ├── example_168.json │ │ ├── example_169.json │ │ ├── example_17.json │ │ ├── example_170.json │ │ ├── example_171.json │ │ ├── example_172.json │ │ ├── example_173.json │ │ ├── example_174.json │ │ ├── example_175.json │ │ ├── example_176.json │ │ ├── example_177.json │ │ ├── example_178.json │ │ ├── example_179.json │ │ ├── example_18.json │ │ ├── example_180.json │ │ ├── example_181.json │ │ ├── example_182.json │ │ ├── example_183.json │ │ ├── example_184.json │ │ ├── example_185.json │ │ ├── example_186.json │ │ ├── example_187.json │ │ ├── example_188.json │ │ ├── example_189.json │ │ ├── example_19.json │ │ ├── example_190.json │ │ ├── example_191.json │ │ ├── example_192.json │ │ ├── example_193.json │ │ ├── example_194.json │ │ ├── example_195.json │ │ ├── 
example_196.json │ │ ├── example_197.json │ │ ├── example_198.json │ │ ├── example_199.json │ │ ├── example_20.json │ │ ├── example_200.json │ │ ├── example_21.json │ │ ├── example_22.json │ │ ├── example_23.json │ │ ├── example_24.json │ │ ├── example_25.json │ │ ├── example_26.json │ │ ├── example_27.json │ │ ├── example_28.json │ │ ├── example_29.json │ │ ├── example_30.json │ │ ├── example_31.json │ │ ├── example_32.json │ │ ├── example_33.json │ │ ├── example_34.json │ │ ├── example_35.json │ │ ├── example_36.json │ │ ├── example_37.json │ │ ├── example_38.json │ │ ├── example_39.json │ │ ├── example_40.json │ │ ├── example_41.json │ │ ├── example_42.json │ │ ├── example_43.json │ │ ├── example_44.json │ │ ├── example_45.json │ │ ├── example_46.json │ │ ├── example_47.json │ │ ├── example_48.json │ │ ├── example_49.json │ │ ├── example_50.json │ │ ├── example_51.json │ │ ├── example_52.json │ │ ├── example_53.json │ │ ├── example_54.json │ │ ├── example_55.json │ │ ├── example_56.json │ │ ├── example_57.json │ │ ├── example_58.json │ │ ├── example_59.json │ │ ├── example_60.json │ │ ├── example_61.json │ │ ├── example_62.json │ │ ├── example_63.json │ │ ├── example_64.json │ │ ├── example_65.json │ │ ├── example_66.json │ │ ├── example_67.json │ │ ├── example_68.json │ │ ├── example_69.json │ │ ├── example_70.json │ │ ├── example_71.json │ │ ├── example_72.json │ │ ├── example_73.json │ │ ├── example_74.json │ │ ├── example_75.json │ │ ├── example_76.json │ │ ├── example_77.json │ │ ├── example_78.json │ │ ├── example_79.json │ │ ├── example_80.json │ │ ├── example_81.json │ │ ├── example_82.json │ │ ├── example_83.json │ │ ├── example_84.json │ │ ├── example_85.json │ │ ├── example_86.json │ │ ├── example_87.json │ │ ├── example_88.json │ │ ├── example_89.json │ │ ├── example_90.json │ │ ├── example_91.json │ │ ├── example_92.json │ │ ├── example_93.json │ │ ├── example_94.json │ │ ├── example_95.json │ │ ├── example_96.json │ │ ├── example_97.json │ │ ├── 
example_98.json │ │ └── example_99.json │ └── text2music/ │ ├── example_01.json │ ├── example_02.json │ ├── example_03.json │ ├── example_04.json │ ├── example_05.json │ ├── example_06.json │ ├── example_07.json │ ├── example_08.json │ ├── example_09.json │ ├── example_10.json │ ├── example_100.json │ ├── example_101.json │ ├── example_102.json │ ├── example_103.json │ ├── example_104.json │ ├── example_105.json │ ├── example_106.json │ ├── example_107.json │ ├── example_108.json │ ├── example_109.json │ ├── example_11.json │ ├── example_110.json │ ├── example_111.json │ ├── example_112.json │ ├── example_113.json │ ├── example_114.json │ ├── example_115.json │ ├── example_116.json │ ├── example_117.json │ ├── example_118.json │ ├── example_119.json │ ├── example_12.json │ ├── example_120.json │ ├── example_121.json │ ├── example_122.json │ ├── example_123.json │ ├── example_124.json │ ├── example_125.json │ ├── example_126.json │ ├── example_127.json │ ├── example_128.json │ ├── example_129.json │ ├── example_13.json │ ├── example_130.json │ ├── example_131.json │ ├── example_132.json │ ├── example_133.json │ ├── example_134.json │ ├── example_135.json │ ├── example_136.json │ ├── example_137.json │ ├── example_138.json │ ├── example_139.json │ ├── example_14.json │ ├── example_140.json │ ├── example_141.json │ ├── example_142.json │ ├── example_143.json │ ├── example_144.json │ ├── example_145.json │ ├── example_146.json │ ├── example_147.json │ ├── example_148.json │ ├── example_149.json │ ├── example_15.json │ ├── example_150.json │ ├── example_151.json │ ├── example_152.json │ ├── example_153.json │ ├── example_154.json │ ├── example_155.json │ ├── example_156.json │ ├── example_157.json │ ├── example_158.json │ ├── example_159.json │ ├── example_16.json │ ├── example_160.json │ ├── example_161.json │ ├── example_162.json │ ├── example_163.json │ ├── example_164.json │ ├── example_165.json │ ├── example_166.json │ ├── example_167.json │ ├── example_168.json │ 
├── example_169.json │ ├── example_17.json │ ├── example_170.json │ ├── example_171.json │ ├── example_172.json │ ├── example_173.json │ ├── example_174.json │ ├── example_175.json │ ├── example_176.json │ ├── example_177.json │ ├── example_178.json │ ├── example_179.json │ ├── example_18.json │ ├── example_180.json │ ├── example_181.json │ ├── example_182.json │ ├── example_183.json │ ├── example_184.json │ ├── example_185.json │ ├── example_186.json │ ├── example_187.json │ ├── example_188.json │ ├── example_189.json │ ├── example_19.json │ ├── example_190.json │ ├── example_191.json │ ├── example_192.json │ ├── example_193.json │ ├── example_194.json │ ├── example_195.json │ ├── example_196.json │ ├── example_197.json │ ├── example_198.json │ ├── example_199.json │ ├── example_20.json │ ├── example_200.json │ ├── example_21.json │ ├── example_22.json │ ├── example_23.json │ ├── example_24.json │ ├── example_25.json │ ├── example_26.json │ ├── example_27.json │ ├── example_28.json │ ├── example_29.json │ ├── example_30.json │ ├── example_31.json │ ├── example_32.json │ ├── example_33.json │ ├── example_34.json │ ├── example_35.json │ ├── example_36.json │ ├── example_37.json │ ├── example_38.json │ ├── example_39.json │ ├── example_40.json │ ├── example_41.json │ ├── example_42.json │ ├── example_43.json │ ├── example_44.json │ ├── example_45.json │ ├── example_46.json │ ├── example_47.json │ ├── example_48.json │ ├── example_49.json │ ├── example_50.json │ ├── example_51.json │ ├── example_52.json │ ├── example_53.json │ ├── example_54.json │ ├── example_55.json │ ├── example_56.json │ ├── example_57.json │ ├── example_58.json │ ├── example_59.json │ ├── example_60.json │ ├── example_61.json │ ├── example_62.json │ ├── example_63.json │ ├── example_64.json │ ├── example_65.json │ ├── example_66.json │ ├── example_67.json │ ├── example_68.json │ ├── example_69.json │ ├── example_70.json │ ├── example_71.json │ ├── example_72.json │ ├── example_73.json │ ├── 
example_74.json │ ├── example_75.json │ ├── example_76.json │ ├── example_77.json │ ├── example_78.json │ ├── example_79.json │ ├── example_80.json │ ├── example_81.json │ ├── example_82.json │ ├── example_83.json │ ├── example_84.json │ ├── example_85.json │ ├── example_86.json │ ├── example_87.json │ ├── example_88.json │ ├── example_89.json │ ├── example_90.json │ ├── example_91.json │ ├── example_92.json │ ├── example_93.json │ ├── example_94.json │ ├── example_95.json │ ├── example_96.json │ ├── example_97.json │ ├── example_98.json │ └── example_99.json ├── generate_examples.py ├── install_uv.bat ├── install_uv.sh ├── merge_config.bat ├── merge_config.sh ├── openrouter/ │ ├── __init__.py │ ├── client_test.py │ ├── openrouter_api_server.py │ └── stress_test.py ├── package.json ├── plugins/ │ └── acestep_vst3/ │ ├── CMakeLists.txt │ ├── README.md │ └── src/ │ ├── PluginBackendClient.cpp │ ├── PluginBackendClient.h │ ├── PluginConfig.h │ ├── PluginEditor.cpp │ ├── PluginEditor.h │ ├── PluginEditorPreview.cpp │ ├── PluginEditorState.cpp │ ├── PluginEnums.cpp │ ├── PluginEnums.h │ ├── PluginMockGeneration.cpp │ ├── PluginMockGeneration.h │ ├── PluginPreview.cpp │ ├── PluginPreview.h │ ├── PluginProcessor.cpp │ ├── PluginProcessor.h │ ├── PluginState.cpp │ └── PluginState.h ├── profile_inference.py ├── proxy_config.txt.example ├── pyproject.toml ├── quick_test.bat ├── quick_test.sh ├── requirements-rocm-linux.txt ├── requirements-rocm.txt ├── requirements-sidestep.txt ├── requirements-xpu.txt ├── requirements.txt ├── run_api_server.sh ├── run_openrouter_api_server.sh ├── scripts/ │ ├── check_gpu.py │ ├── fetch-awesome.mjs │ ├── lora_data_prepare/ │ │ ├── elevenlabs_transcription.py │ │ ├── gemini_caption.py │ │ └── whisper_transcription.py │ ├── new_pr_branch.ps1 │ ├── prepare_vae_calibration_data.py │ └── profile_vram.py ├── setup_xpu.bat ├── start_api_server.bat ├── start_api_server.sh ├── start_api_server_macos.sh ├── start_api_server_rocm.bat ├── 
start_api_server_rocm.sh ├── start_api_server_xpu.bat ├── start_gradio_ui.bat ├── start_gradio_ui.sh ├── start_gradio_ui_macos.sh ├── start_gradio_ui_macos_manual.sh ├── start_gradio_ui_manual.bat ├── start_gradio_ui_manual.sh ├── start_gradio_ui_rocm.bat ├── start_gradio_ui_rocm.sh ├── start_gradio_ui_rocm_manual.bat ├── start_gradio_ui_rocm_manual.sh ├── start_gradio_ui_xpu.bat ├── start_gradio_ui_xpu_manual.bat ├── test_env_detection.bat ├── test_env_detection.sh ├── test_git_update.bat ├── test_git_update.sh ├── train.py └── ui/ ├── studio.html └── studio_html_test.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .claude/skills/acestep/SKILL.md ================================================ --- name: acestep description: Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use this skill when users mention generating music, creating songs, music production, remix, or audio continuation. allowed-tools: Read, Write, Bash, Skill --- # ACE-Step Music Generation Skill Use ACE-Step V1.5 API for music generation. **Always use `scripts/acestep.sh` script** — do NOT call API endpoints directly. ## Quick Start ```bash # 1. cd to this skill's directory cd {project_root}/{.claude or .codex}/skills/acestep/ # 2. Check API service health ./scripts/acestep.sh health # 3. Generate with lyrics (recommended) ./scripts/acestep.sh generate -c "pop, female vocal, piano" -l "[Verse] Your lyrics here..." --duration 120 --language zh # 4. Output saved to: {project_root}/acestep_output/ ``` ## Workflow For user requests requiring vocals: 1. Use the **acestep-songwriting** skill for lyrics writing, caption creation, duration/BPM/key selection 2. Write complete, well-structured lyrics yourself based on the songwriting guide 3. 
Generate using Caption mode with `-c` and `-l` parameters

Only use Simple/Random mode (`-d` or `random`) for quick inspiration or instrumental exploration.

If the user needs a simple music video, use the **acestep-simplemv** skill to render one with waveform visualization and synced lyrics.

**MV Production Requirements**: Making a simple MV requires three additional skills to be installed:
- **acestep-songwriting**: for writing lyrics and planning song structure
- **acestep-lyrics-transcription**: for transcribing audio to timestamped lyrics (LRC)
- **acestep-simplemv**: for rendering the final music video
- **acestep-thumbnail** (optional): for generating cover art / MV background images via Gemini API

**MV Background Image**: When the user requests MV production, ask whether they want a background image for the video:
1. **Generate via Gemini**: use the **acestep-thumbnail** skill (requires Gemini API key configuration)
2. **Provide an existing image**: user supplies a local image path
3. **Skip**: use the default animated gradient background (no image needed)

Use `AskUserQuestion` to let the user choose before proceeding with MV rendering.

**Parallel Processing**: Lyrics transcription and thumbnail generation are independent tasks. When the user chooses to generate a background image, run **acestep-lyrics-transcription** and **acestep-thumbnail** in parallel (e.g. via two concurrent Agent calls) to save time, then use both outputs for the final MV render.
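
The parallel step described above can be illustrated with plain shell background jobs. This is only a sketch of the pattern, not how the skills are actually invoked: `transcribe_lyrics` and `generate_thumbnail` are hypothetical stand-ins for the **acestep-lyrics-transcription** and **acestep-thumbnail** skills, and the file names are made up for the example.

```bash
#!/usr/bin/env bash
# Sketch: run two independent tasks concurrently, then continue once both finish.
# transcribe_lyrics and generate_thumbnail are hypothetical placeholders.

work_dir="$(mktemp -d)"

transcribe_lyrics() {
    # Placeholder for the lyrics transcription step (would produce an LRC file).
    echo "[00:00.00] placeholder lyric line" > "$1"
}

generate_thumbnail() {
    # Placeholder for the thumbnail generation step (would produce an image).
    echo "placeholder image data" > "$1"
}

# Start both tasks in the background and remember their process IDs.
transcribe_lyrics "$work_dir/song.lrc" &
lrc_pid=$!
generate_thumbnail "$work_dir/cover.png" &
thumb_pid=$!

# Wait for each job separately so a failure in either one is detected.
parallel_ok=true
wait "$lrc_pid" || parallel_ok=false
wait "$thumb_pid" || parallel_ok=false

if [ "$parallel_ok" = true ]; then
    echo "Both inputs ready; the MV render can start."
else
    echo "One of the parallel tasks failed." >&2
fi
```

Waiting on each PID individually, rather than a bare `wait`, keeps the exit status of each task visible, so the render is not started with a missing LRC file or image.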
## Script Commands **CRITICAL - Complete Lyrics Input**: When providing lyrics via the `-l` parameter, you MUST pass ALL lyrics content WITHOUT any omission: - If user provides lyrics, pass the ENTIRE text they give you - If you generate lyrics yourself, pass the COMPLETE lyrics you created - NEVER truncate, shorten, or pass only partial lyrics - Missing lyrics will result in incomplete or incoherent songs **Music Parameters**: Use the **acestep-songwriting** skill for guidance on duration, BPM, key scale, and time signature. ```bash # need to cd to this skill's directory first cd {project_root}/{.claude or .codex}/skills/acestep/ # Caption mode - RECOMMENDED: Write lyrics first, then generate ./scripts/acestep.sh generate -c "Electronic pop, energetic synths" -l "[Verse] Your complete lyrics [Chorus] Full chorus here..." --duration 120 --bpm 128 # Instrumental only ./scripts/acestep.sh generate "Jazz with saxophone" # Quick exploration (Simple/Random mode) ./scripts/acestep.sh generate -d "A cheerful song about spring" ./scripts/acestep.sh random # Cover / Repainting from source audio ./scripts/acestep.sh cover song.mp3 -c "Rock cover style" -l "[Verse] Lyrics..." --duration 120 --bpm 128 ./scripts/acestep.sh generate --src-audio song.mp3 --task-type repaint -c "Pop" --repaint-start 30 --repaint-end 60 # Music attribute options ./scripts/acestep.sh generate "Rock" --duration 60 --bpm 120 --key-scale "C major" --time-sig "4/4" ./scripts/acestep.sh generate "Rock" --duration 60 --batch 2 ./scripts/acestep.sh generate "EDM" --no-thinking # Faster # Other commands ./scripts/acestep.sh status ./scripts/acestep.sh health ./scripts/acestep.sh models ``` ### Cover / Audio Repainting The `cover` command generates music based on a source audio file. The audio is base64-encoded and sent to the API. ```bash # Cover: regenerate with new style/lyrics, preserving melody structure ./scripts/acestep.sh cover input.mp3 -c "Jazz cover" -l "[Verse] New lyrics..." 
--duration 120 # Repainting: modify a specific region of the audio ./scripts/acestep.sh generate --src-audio input.mp3 --task-type repaint -c "Pop ballad" --repaint-start 30 --repaint-end 90 # Cover options # --src-audio Source audio file path # --task-type cover (default with --src-audio), repaint, text2music # --cover-strength 0.0-1.0 (default: 1.0, higher = closer to source) # --repaint-start Repainting start position (seconds) # --repaint-end Repainting end position (seconds) # --key-scale Musical key (e.g. "E minor") # --time-signature Time signature (e.g. "4/4") ``` **Note**: For cloud API usage, large audio files may be rejected by Cloudflare. Compress audio before uploading if needed (e.g. using ffmpeg: `ffmpeg -i input.mp3 -b:a 64k -ar 24000 -ac 1 compressed.mp3`). ## Output Files After generation, the script automatically saves results to the `acestep_output` folder in the project root (same level as `.claude`): ``` project_root/ ├── .claude/ │ └── skills/acestep/... ├── acestep_output/ # Output directory │ ├── .json # Complete task result (JSON) │ ├── _1.mp3 # First audio file │ ├── _2.mp3 # Second audio file (if batch_size > 1) │ └── ... └── ... ``` ### JSON Result Structure **Important**: When LM enhancement is enabled (`use_format=true`), the final synthesized content may differ from your input. 
Check the JSON file for actual values:

| Field | Description |
|-------|-------------|
| `prompt` | **Actual caption** used for synthesis (may be LM-enhanced) |
| `lyrics` | **Actual lyrics** used for synthesis (may be LM-enhanced) |
| `metas.prompt` | Original input caption |
| `metas.lyrics` | Original input lyrics |
| `metas.bpm` | BPM used |
| `metas.keyscale` | Key scale used |
| `metas.duration` | Duration in seconds |
| `generation_info` | Detailed timing and model info |
| `seed_value` | Seeds used (for reproducibility) |
| `lm_model` | LM model name |
| `dit_model` | DiT model name |

To get the actual synthesized lyrics, parse the JSON and read the top-level `lyrics` field, not `metas.lyrics`.

## Configuration

**Important**: Configuration follows this priority (high to low):

1. **Command line arguments** > **config.json defaults**
2. User-specified parameters **temporarily override** defaults but **do not modify** config.json
3. Only `config --set` command **permanently modifies** config.json

### Default Config File (`scripts/config.json`)

```json
{
  "api_url": "http://127.0.0.1:8001",
  "api_key": "",
  "api_mode": "completion",
  "generation": {
    "thinking": true,
    "use_format": false,
    "use_cot_caption": true,
    "use_cot_language": false,
    "batch_size": 1,
    "audio_format": "mp3",
    "vocal_language": "en"
  }
}
```

| Option | Default | Description |
|--------|---------|-------------|
| `api_url` | `http://127.0.0.1:8001` | API server address |
| `api_key` | `""` | API authentication key (optional) |
| `api_mode` | `completion` | API mode: `completion` (OpenRouter, default) or `native` (polling) |
| `generation.thinking` | `true` | Enable 5Hz LM (higher quality, slower) |
| `generation.audio_format` | `mp3` | Output format (mp3/wav/flac) |
| `generation.vocal_language` | `en` | Vocal language |

## Prerequisites - ACE-Step API Service

**IMPORTANT**: This skill requires the ACE-Step API server to be running.
### Required Dependencies

The `scripts/acestep.sh` script requires: **curl** and **jq**.

```bash
# Check dependencies
curl --version
jq --version
```

If jq is not installed, the script will attempt to install it automatically. If automatic installation fails:
- **Windows**: `choco install jq` or download from https://jqlang.github.io/jq/download/
- **macOS**: `brew install jq`
- **Linux**: `sudo apt-get install jq` (Debian/Ubuntu) or `sudo dnf install jq` (Fedora)

### Before First Use

**You MUST check the API key and URL status before proceeding.** Run:

```bash
cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --check-key
cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --get api_url
```

#### Case 1: Using Official Cloud API (`https://api.acemusic.ai`) without API key

If `api_url` is `https://api.acemusic.ai` and `api_key` is `empty`, you MUST stop and guide the user to configure their key:

1. Tell the user: "You're using the ACE-Step official cloud API, but no API key is configured. An API key is required to use this service."
2. Explain how to get a key: API keys are currently available through [acemusic.ai](https://acemusic.ai/api-key) for free.
3. Use `AskUserQuestion` to ask the user to provide their API key.
4. Once provided, configure it:
   ```bash
   cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --set api_key
   ```
5. Additionally, inform the user: "If you also want to render music videos (MV), it's recommended to configure a lyrics transcription API key as well (OpenAI Whisper or ElevenLabs Scribe), so that lyrics can be automatically transcribed with accurate timestamps. You can configure it later via the `acestep-lyrics-transcription` skill."

#### Case 2: API key is configured

Verify the API endpoint: `./scripts/acestep.sh health` and proceed with music generation.
#### Case 3: Using local/custom API without key

Local services (`http://127.0.0.1:*`) typically don't require a key. Verify with `./scripts/acestep.sh health` and proceed.

If health check fails:
- Ask: "Do you have ACE-Step installed?"
- **If installed but not running**: Use the acestep-docs skill to help them start the service
- **If not installed**: Use acestep-docs skill to guide through installation

### Service Configuration

**Official Cloud API:** ACE-Step provides an official API endpoint at `https://api.acemusic.ai`. To use it:

```bash
./scripts/acestep.sh config --set api_url "https://api.acemusic.ai"
./scripts/acestep.sh config --set api_key "your-key"
./scripts/acestep.sh config --set api_mode completion
```

API keys are currently available through [acemusic.ai](https://acemusic.ai/api-key) for free.

**Local Service (Default):** No configuration needed; connects to `http://127.0.0.1:8001`.

**Custom Remote Service:** Update `scripts/config.json` or use:

```bash
./scripts/acestep.sh config --set api_url "http://remote-server:8001"
./scripts/acestep.sh config --set api_key "your-key"
```

**API Key Handling**: When checking whether an API key is configured, use `config --check-key` which only reports `configured` or `empty` without printing the actual key. **NEVER use `config --get api_key`** or read `config.json` directly, as these would expose the user's API key. The `config --list` command is safe: it automatically masks API keys as `***` in output.

### API Mode

The skill supports two API modes.
Switch via `api_mode` in `scripts/config.json`:

| Mode | Endpoint | Description |
|------|----------|-------------|
| `completion` (default) | `/v1/chat/completions` | OpenRouter-compatible, sync request, audio returned as base64 |
| `native` | `/release_task` + `/query_result` | Async polling mode, supports all parameters |

**Switch mode:**

```bash
./scripts/acestep.sh config --set api_mode completion
./scripts/acestep.sh config --set api_mode native
```

**Completion mode notes:**
- No polling needed: a single request returns the result directly
- Audio is base64-encoded inline in the response (auto-decoded and saved)
- `inference_steps`, `infer_method`, `shift` are not configurable (server defaults)
- `--no-wait` and `status` commands are not applicable in completion mode
- Requires `model` field, auto-detected from `/v1/models` if not specified

### Using acestep-docs Skill for Setup Help

**IMPORTANT**: For installation and startup, always use the acestep-docs skill to get complete and accurate guidance.

**DO NOT provide simplified startup commands** - each user's environment may be different. Always guide them to use acestep-docs for proper setup.

---

For API debugging, see [API Reference](./api-reference.md).

================================================
FILE: .claude/skills/acestep/api-reference.md
================================================
# ACE-Step API Reference

> For debugging and advanced usage only. Normal operations should use `scripts/acestep.sh`.
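
When debugging, a quick reachability probe against the `/health` endpoint (listed under Native Mode Endpoints) may be a useful first step. The sketch below mirrors the `check_health` helper in `scripts/acestep.sh`. The function name `probe_health` and the `API_URL` variable are illustrative assumptions, not part of the script.

```bash
#!/usr/bin/env bash
# Sketch: probe the /health endpoint and report whether the server answered HTTP 200.
# API_URL defaults to the local address used elsewhere in this document.

API_URL="${API_URL:-http://127.0.0.1:8001}"

probe_health() {
    local url="$1"
    local status
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code.
    # "|| true" keeps a connection failure from aborting the script under "set -e".
    status=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "${url}/health" 2>/dev/null) || true
    [ "$status" = "200" ]
}

if probe_health "$API_URL"; then
    echo "API reachable at $API_URL"
else
    echo "API not reachable at $API_URL"
fi
```

Only the status code is inspected here; the response body is discarded, so this does not validate the wrapped response format.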
## Native Mode Endpoints All responses wrapped: `{"data": , "code": 200, "error": null, "timestamp": ...}` | Endpoint | Method | Description | |----------|--------|-------------| | `/health` | GET | Health check | | `/release_task` | POST | Create generation task | | `/query_result` | POST | Query task status, body: `{"task_id_list": ["id"]}` | | `/v1/models` | GET | List available models | | `/v1/audio?path={path}` | GET | Download audio file | ## Completion Mode Endpoints | Endpoint | Method | Description | |----------|--------|-------------| | `/v1/chat/completions` | POST | Generate music (OpenRouter-compatible) | | `/v1/models` | GET | List available models (OpenRouter format) | ## Query Result Response ```json { "data": [{ "task_id": "xxx", "status": 1, "result": "[{\"file\":\"/v1/audio?path=...\",\"metas\":{\"bpm\":120,\"duration\":60,\"keyscale\":\"C Major\"}}]" }] } ``` Status codes: `0` = processing, `1` = success, `2` = failed ## Completion Mode Request (`/v1/chat/completions`) **Caption mode** — prompt and lyrics wrapped in XML tags inside message content: ```json { "model": "acestep/ACE-Step-v1.5", "messages": [{"role": "user", "content": "Jazz with saxophone[Verse] Hello..."}], "stream": false, "thinking": true, "use_format": false, "audio_config": {"duration": 90, "bpm": 110, "format": "mp3", "vocal_language": "en"} } ``` **Simple mode** — plain text message, set `sample_mode: true`: ```json { "model": "acestep/ACE-Step-v1.5", "messages": [{"role": "user", "content": "A cheerful pop song about spring"}], "stream": false, "sample_mode": true, "thinking": true } ``` ## Completion Mode Response ```json { "id": "chatcmpl-abc123", "choices": [{ "message": { "role": "assistant", "content": "## Metadata\n**Caption:** ...\n**BPM:** 128\n\n## Lyrics\n...", "audio": [{"type": "audio_url", "audio_url": {"url": "data:audio/mpeg;base64,..."}}] }, "finish_reason": "stop" }] } ``` Audio is base64-encoded inline — the script auto-decodes and saves to 
`acestep_output/`. ## Request Parameters (`/release_task`) Parameters can be placed in `param_obj` object. ### Generation Modes | Mode | Usage | When to Use | |------|-------|-------------| | **Caption** (Recommended) | `generate -c "style" -l "lyrics"` | For vocal songs - write lyrics yourself first | | **Simple** | `generate -d "description"` | Quick exploration, LM generates everything | | **Random** | `random` | Random generation for inspiration | ### Core Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `prompt` | string | "" | Music style description (Caption mode) | | `lyrics` | string | "" | **Full lyrics content** - Pass ALL lyrics without omission. Use `[inst]` for instrumental. Partial/truncated lyrics = incomplete songs | | `sample_mode` | bool | false | Enable Simple/Random mode | | `sample_query` | string | "" | Description for Simple mode | | `thinking` | bool | false | Enable 5Hz LM for audio code generation | | `use_format` | bool | false | Use LM to enhance caption/lyrics | | `model` | string | - | DiT model name | | `batch_size` | int | 1 | Number of audio files to generate | ### Music Attributes | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `audio_duration` | float | - | Duration in seconds | | `bpm` | int | - | Tempo (beats per minute) | | `key_scale` | string | "" | Key (e.g. "C Major") | | `time_signature` | string | "" | Time signature (e.g. "4/4") | | `vocal_language` | string | "en" | Language code (en, zh, ja, etc.) 
| | `audio_format` | string | "mp3" | Output format (mp3/wav/flac) | ### Generation Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `inference_steps` | int | 8 | Diffusion steps | | `guidance_scale` | float | 7.0 | CFG scale | | `seed` | int | -1 | Random seed (-1 for random) | | `infer_method` | string | "ode" | Diffusion method (ode/sde) | ### Audio Task Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `task_type` | string | "text2music" | text2music / continuation / repainting | | `src_audio_path` | string | - | Source audio for continuation | | `repainting_start` | float | 0.0 | Repainting start position (seconds) | | `repainting_end` | float | - | Repainting end position (seconds) | ### Example Request (Simple Mode) ```json { "sample_mode": true, "sample_query": "A cheerful pop song about spring", "thinking": true, "param_obj": { "duration": 60, "bpm": 120, "language": "en" }, "batch_size": 2 } ``` ================================================ FILE: .claude/skills/acestep/scripts/acestep.sh ================================================ #!/bin/bash # # ACE-Step Music Generation CLI (Bash + Curl + jq) # # Requirements: curl, jq # # Usage: # ./acestep.sh generate "Music description" [options] # ./acestep.sh random [--no-thinking] # ./acestep.sh status # ./acestep.sh models # ./acestep.sh health # ./acestep.sh config [--get|--set|--reset] # # Output: # - Results saved to output/.json # - Audio files downloaded to output/_1.mp3, output/_2.mp3, ... set -e # Ensure UTF-8 encoding for non-ASCII characters (Japanese, Chinese, etc.) export LANG="${LANG:-en_US.UTF-8}" export LC_ALL="${LC_ALL:-en_US.UTF-8}" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" CONFIG_FILE="${SCRIPT_DIR}/config.json" # Output dir at same level as .claude (go up 4 levels from scripts/) OUTPUT_DIR="$(cd "${SCRIPT_DIR}/../../../.." 
&& pwd)/acestep_output" DEFAULT_API_URL="http://127.0.0.1:8001" STAR_MARKER_FILE="${SCRIPT_DIR}/.first_gen_done" # Colors RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[1;33m' CYAN='\033[0;36m' NC='\033[0m' BOLD='\033[1m' # Show GitHub star prompt on first successful generation show_star_prompt() { if [ ! -f "$STAR_MARKER_FILE" ]; then touch "$STAR_MARKER_FILE" echo "" echo -e "${YELLOW}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" echo -e "${BOLD} ACE-Step is free and open-source.${NC}" echo -e " If you enjoyed this, a ${YELLOW}★ Star${NC} on GitHub means a lot to us!" echo -e " ${CYAN}→ https://github.com/ace-step/ACE-Step-1.5${NC}" echo -e "${YELLOW}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" fi } # Check dependencies check_deps() { if ! command -v curl &> /dev/null; then echo -e "${RED}Error: curl is required but not installed.${NC}" exit 1 fi if ! command -v jq &> /dev/null; then echo -e "${RED}Error: jq is required but not installed.${NC}" echo "Install: apt install jq / brew install jq / choco install jq" exit 1 fi } # JSON value extractor using jq # Usage: json_get "$json" ".key" or json_get "$json" ".nested.key" json_get() { local json="$1" local path="$2" echo "$json" | jq -r "$path // empty" 2>/dev/null } # Extract array values using jq json_get_array() { local json="$1" local path="$2" echo "$json" | jq -r "$path[]? // empty" 2>/dev/null } # Ensure output directory exists ensure_output_dir() { mkdir -p "$OUTPUT_DIR" } # Default config DEFAULT_CONFIG='{ "api_url": "http://127.0.0.1:8001", "api_key": "", "api_mode": "native", "generation": { "thinking": true, "use_format": true, "use_cot_caption": true, "use_cot_language": true, "audio_format": "mp3", "vocal_language": "en" } }' # Ensure config file exists ensure_config() { if [ ! 
-f "$CONFIG_FILE" ]; then local example="${SCRIPT_DIR}/config.example.json" if [ -f "$example" ]; then cp "$example" "$CONFIG_FILE" echo -e "${YELLOW}Config file created from config.example.json. Please configure your settings:${NC}" echo -e " ${CYAN}./scripts/acestep.sh config --set api_url ${NC}" echo -e " ${CYAN}./scripts/acestep.sh config --set api_key ${NC}" else echo "$DEFAULT_CONFIG" > "$CONFIG_FILE" fi fi } # Get config value using jq get_config() { local key="$1" ensure_config # Convert dot notation to jq path: "generation.thinking" -> ".generation.thinking" local jq_path=".${key}" local value # Don't use // operator as it treats boolean false as falsy value=$(jq -r "$jq_path" "$CONFIG_FILE" 2>/dev/null) # Remove any trailing whitespace/newlines (Windows compatibility) # Return empty string if value is "null" (key doesn't exist) if [ "$value" = "null" ]; then echo "" else echo "$value" | tr -d '\r\n' fi } # Normalize boolean value for jq --argjson normalize_bool() { local val="$1" local default="${2:-false}" case "$val" in true|True|TRUE|1) echo "true" ;; false|False|FALSE|0) echo "false" ;; *) echo "$default" ;; esac } # Set config value using jq set_config() { local key="$1" local value="$2" ensure_config local tmp_file="${CONFIG_FILE}.tmp" local jq_path=".${key}" # Determine value type and set accordingly if [ "$value" = "true" ] || [ "$value" = "false" ]; then jq "$jq_path = $value" "$CONFIG_FILE" > "$tmp_file" elif [[ "$value" =~ ^-?[0-9]+$ ]] || [[ "$value" =~ ^-?[0-9]+\.[0-9]+$ ]]; then jq "$jq_path = $value" "$CONFIG_FILE" > "$tmp_file" else jq "$jq_path = \"$value\"" "$CONFIG_FILE" > "$tmp_file" fi mv "$tmp_file" "$CONFIG_FILE" echo "Set $key = $value" } # Load API URL load_api_url() { local url=$(get_config "api_url") echo "${url:-$DEFAULT_API_URL}" } # Load API Key load_api_key() { local key=$(get_config "api_key") echo "${key:-}" } # Check API health check_health() { local url="$1" local status status=$(curl -s -o /dev/null -w "%{http_code}" 
--connect-timeout 5 "${url}/health" 2>/dev/null) || true [ "$status" = "200" ] } # Build auth header build_auth_header() { local api_key=$(load_api_key) if [ -n "$api_key" ]; then echo "-H \"Authorization: Bearer ${api_key}\"" fi } # Prompt for URL prompt_for_url() { echo "" echo -e "${YELLOW}API server is not responding.${NC}" echo "Please enter the API URL (or press Enter for default):" read -p "API URL [$DEFAULT_API_URL]: " user_input echo "${user_input:-$DEFAULT_API_URL}" } # Ensure API connection ensure_connection() { ensure_config local api_url=$(load_api_url) if check_health "$api_url"; then echo "$api_url" return 0 fi echo -e "${YELLOW}Cannot connect to: $api_url${NC}" >&2 local new_url=$(prompt_for_url) if check_health "$new_url"; then set_config "api_url" "$new_url" > /dev/null echo -e "${GREEN}Saved API URL: $new_url${NC}" >&2 echo "$new_url" return 0 fi echo -e "${RED}Error: Cannot connect to $new_url${NC}" >&2 exit 1 } # Save result to JSON file save_result() { local job_id="$1" local result_json="$2" ensure_output_dir local output_file="${OUTPUT_DIR}/${job_id}.json" echo "$result_json" > "$output_file" echo -e "${GREEN}Result saved: $output_file${NC}" } # Health command cmd_health() { check_deps ensure_config local api_url=$(load_api_url) echo "Checking API at: $api_url" if check_health "$api_url"; then echo -e "${GREEN}Status: OK${NC}" curl -s "${api_url}/health" echo "" else echo -e "${RED}Status: FAILED${NC}" exit 1 fi } # Config command cmd_config() { check_deps ensure_config local action="" local key="" local value="" while [[ $# -gt 0 ]]; do case $1 in --get) action="get"; key="$2"; shift 2 ;; --set) action="set"; key="$2"; value="$3"; shift 3 ;; --reset) action="reset"; shift ;; --list) action="list"; shift ;; --check-key) action="check-key"; shift ;; *) shift ;; esac done case "$action" in "check-key") local api_key=$(get_config "api_key") if [ -n "$api_key" ]; then echo "api_key: configured" else echo "api_key: empty" fi ;; "get") [ -z "$key" 
] && { echo -e "${RED}Error: --get requires KEY${NC}"; exit 1; } local result=$(get_config "$key") [ -n "$result" ] && echo "$key = $result" || echo "Key not found: $key" ;; "set") [ -z "$key" ] || [ -z "$value" ] && { echo -e "${RED}Error: --set requires KEY VALUE${NC}"; exit 1; } set_config "$key" "$value" ;; "reset") echo "$DEFAULT_CONFIG" > "$CONFIG_FILE" echo -e "${GREEN}Configuration reset to defaults.${NC}" jq 'walk(if type == "object" and has("api_key") and (.api_key | length) > 0 then .api_key = "***" else . end)' "$CONFIG_FILE" ;; "list") echo "Current configuration:" jq 'walk(if type == "object" and has("api_key") and (.api_key | length) > 0 then .api_key = "***" else . end)' "$CONFIG_FILE" ;; *) echo "Config file: $CONFIG_FILE" echo "Output dir: $OUTPUT_DIR" echo "----------------------------------------" cat "$CONFIG_FILE" echo "----------------------------------------" echo "" echo "Usage:" echo " config --list Show config" echo " config --get Get value" echo " config --set Set value" echo " config --reset Reset to defaults" ;; esac } # Models command cmd_models() { check_deps local api_url=$(ensure_connection) local api_key=$(load_api_key) echo "Available Models:" echo "----------------------------------------" if [ -n "$api_key" ]; then curl -s -H "Authorization: Bearer ${api_key}" "${api_url}/v1/models" else curl -s "${api_url}/v1/models" fi echo "" } # Query job result via /query_result endpoint query_job_result() { local api_url="$1" local job_id="$2" local api_key=$(load_api_key) local payload=$(jq -n --arg id "$job_id" '{"task_id_list": [$id]}') if [ -n "$api_key" ]; then curl -s -X POST "${api_url}/query_result" \ -H "Content-Type: application/json; charset=utf-8" \ -H "Authorization: Bearer ${api_key}" \ -d "$payload" else curl -s -X POST "${api_url}/query_result" \ -H "Content-Type: application/json; charset=utf-8" \ -d "$payload" fi } # Parse query_result response to extract status (0=processing, 1=success, 2=failed) # Response is wrapped: 
{"data": [...], "code": 200, ...} # Uses temp file to avoid jq pipe issues with special characters on Windows parse_query_status() { local response="$1" local tmp_file=$(mktemp) printf '%s' "$response" > "$tmp_file" jq -r '.data[0].status // .[0].status // 0' "$tmp_file" rm -f "$tmp_file" } # Parse result JSON string from query_result response # The result field is a JSON string that needs to be parsed # Uses temp file to avoid jq pipe issues with special characters on Windows parse_query_result() { local response="$1" local tmp_file=$(mktemp) printf '%s' "$response" > "$tmp_file" jq -r '.data[0].result // .[0].result // "[]"' "$tmp_file" rm -f "$tmp_file" } # Extract audio file paths from result (returns newline-separated paths) # Uses temp file to avoid jq pipe issues with special characters on Windows parse_audio_files() { local result="$1" local tmp_file=$(mktemp) printf '%s' "$result" > "$tmp_file" jq -r '.[].file // empty' "$tmp_file" 2>/dev/null rm -f "$tmp_file" } # Extract metas value from result # Uses temp file to avoid jq pipe issues with special characters on Windows parse_metas_value() { local result="$1" local key="$2" local tmp_file=$(mktemp) printf '%s' "$result" > "$tmp_file" jq -r ".[0].metas.$key // .[0].$key // empty" "$tmp_file" 2>/dev/null rm -f "$tmp_file" } # Status command cmd_status() { check_deps local job_id="$1" [ -z "$job_id" ] && { echo -e "${RED}Error: job_id required${NC}"; echo "Usage: $0 status "; exit 1; } local api_url=$(ensure_connection) local response=$(query_job_result "$api_url" "$job_id") local status=$(parse_query_status "$response") echo "Job ID: $job_id" case "$status" in 0) echo "Status: processing" ;; 1) echo "Status: succeeded" echo "" local result_file=$(mktemp) parse_query_result "$response" > "$result_file" local bpm=$(jq -r '.[0].metas.bpm // .[0].bpm // empty' "$result_file" 2>/dev/null) local keyscale=$(jq -r '.[0].metas.keyscale // .[0].keyscale // empty' "$result_file" 2>/dev/null) local duration=$(jq -r 
'.[0].metas.duration // .[0].duration // empty' "$result_file" 2>/dev/null) echo "Result:" [ -n "$bpm" ] && echo " BPM: $bpm" [ -n "$keyscale" ] && echo " Key: $keyscale" [ -n "$duration" ] && echo " Duration: ${duration}s" # Save and download save_result "$job_id" "$response" download_audios "$api_url" "$job_id" "$result_file" rm -f "$result_file" ;; 2) echo "Status: failed" echo "" echo -e "${RED}Task failed${NC}" ;; *) echo "Status: unknown ($status)" ;; esac } # Download audio files from result file # Usage: download_audios download_audios() { local api_url="$1" local job_id="$2" local result_file="$3" local api_key=$(load_api_key) ensure_output_dir local audio_format=$(get_config "generation.audio_format") [ -z "$audio_format" ] && audio_format="mp3" # Read result file content and extract audio paths using pipe (avoid temp file path issues on Windows) local result_content result_content=$(cat "$result_file" 2>/dev/null) if [ -z "$result_content" ]; then echo -e " ${RED}Error: Result file is empty or cannot be read${NC}" return 1 fi # Extract audio paths using pipe instead of file (better Windows compatibility) local audio_paths audio_paths=$(echo "$result_content" | jq -r '.[].file // empty' 2>&1) local jq_exit_code=$? 
if [ $jq_exit_code -ne 0 ]; then echo -e " ${RED}Error: Failed to parse result JSON${NC}" echo -e " ${RED}jq error: $audio_paths${NC}" return 1 fi if [ -z "$audio_paths" ]; then echo -e " ${YELLOW}No audio files found in result${NC}" return 0 fi local count=1 while IFS= read -r audio_path; do # Skip empty lines and remove potential Windows carriage return audio_path=$(echo "$audio_path" | tr -d '\r') if [ -n "$audio_path" ]; then local output_file="${OUTPUT_DIR}/${job_id}_${count}.${audio_format}" local download_url="${api_url}${audio_path}" echo -e " ${CYAN}Downloading audio $count...${NC}" local curl_output local curl_exit_code if [ -n "$api_key" ]; then curl_output=$(curl -s --connect-timeout 10 --max-time 300 \ -w "%{http_code}" \ -o "$output_file" \ -H "Authorization: Bearer ${api_key}" \ "$download_url" 2>&1) curl_exit_code=$? else curl_output=$(curl -s --connect-timeout 10 --max-time 300 \ -w "%{http_code}" \ -o "$output_file" \ "$download_url" 2>&1) curl_exit_code=$? fi if [ $curl_exit_code -ne 0 ]; then echo -e " ${RED}Failed to download (curl error $curl_exit_code): $download_url${NC}" rm -f "$output_file" 2>/dev/null elif [ -f "$output_file" ] && [ -s "$output_file" ]; then echo -e " ${GREEN}Saved: $output_file${NC}" else echo -e " ${RED}Failed to download (HTTP $curl_output): $download_url${NC}" rm -f "$output_file" 2>/dev/null fi count=$((count + 1)) fi done <<< "$audio_paths" } # ============================================================================= # Completion Mode (OpenRouter /v1/chat/completions) # ============================================================================= # Load api_mode from config (default: native) load_api_mode() { local mode=$(get_config "api_mode") echo "${mode:-native}" } # Get model ID from /v1/models endpoint for completion mode get_completion_model() { local api_url="$1" local user_model="$2" local api_key=$(load_api_key) # If user specified a model, prefix with acemusic/ if needed if [ -n "$user_model" ]; then 
if [[ "$user_model" == */* ]]; then echo "$user_model" else echo "acemusic/${user_model}" fi return fi # Query /v1/models for the first available model local response if [ -n "$api_key" ]; then response=$(curl -s -H "Authorization: Bearer ${api_key}" "${api_url}/v1/models" 2>/dev/null) else response=$(curl -s "${api_url}/v1/models" 2>/dev/null) fi local model_id model_id=$(echo "$response" | jq -r '.data[0].id // empty' 2>/dev/null) echo "${model_id:-acemusic/acestep-v15-turbo}" } # Decode base64 audio data URL and save to file # Handles cross-platform compatibility (Linux/macOS/Windows MSYS) decode_base64_audio() { local data_url="$1" local output_file="$2" # Strip data URL prefix: data:audio/mpeg;base64,... local b64_data="${data_url#data:*;base64,}" local tmp_b64=$(mktemp) printf '%s' "$b64_data" > "$tmp_b64" if command -v base64 &> /dev/null; then # Linux / macOS / MSYS2 base64 -d < "$tmp_b64" > "$output_file" 2>/dev/null || \ base64 -D < "$tmp_b64" > "$output_file" 2>/dev/null || \ python3 -c "import base64,sys; sys.stdout.buffer.write(base64.b64decode(sys.stdin.read()))" < "$tmp_b64" > "$output_file" 2>/dev/null || \ python -c "import base64,sys; sys.stdout.buffer.write(base64.b64decode(sys.stdin.read()))" < "$tmp_b64" > "$output_file" 2>/dev/null else # Fallback to python python3 -c "import base64,sys; sys.stdout.buffer.write(base64.b64decode(sys.stdin.read()))" < "$tmp_b64" > "$output_file" 2>/dev/null || \ python -c "import base64,sys; sys.stdout.buffer.write(base64.b64decode(sys.stdin.read()))" < "$tmp_b64" > "$output_file" 2>/dev/null fi local decode_ok=$? 
rm -f "$tmp_b64" return $decode_ok } # Parse completion response: extract metadata, save audio files # Usage: parse_completion_response parse_completion_response() { local resp_file="$1" local job_id="$2" ensure_output_dir local audio_format=$(get_config "generation.audio_format") [ -z "$audio_format" ] && audio_format="mp3" # Check for error local finish_reason finish_reason=$(jq -r '.choices[0].finish_reason // "stop"' "$resp_file" 2>/dev/null) if [ "$finish_reason" = "error" ]; then local err_content err_content=$(jq -r '.choices[0].message.content // "Unknown error"' "$resp_file" 2>/dev/null) echo -e "${RED}Generation failed: $err_content${NC}" return 1 fi # Extract and display text content (metadata + lyrics) local content content=$(jq -r '.choices[0].message.content // empty' "$resp_file" 2>/dev/null) if [ -n "$content" ]; then echo "$content" echo "" fi # Extract and save audio files local audio_count audio_count=$(jq -r '.choices[0].message.audio | length // 0' "$resp_file" 2>/dev/null) if [ "$audio_count" -gt 0 ] 2>/dev/null; then local i=0 while [ "$i" -lt "$audio_count" ]; do local audio_url audio_url=$(jq -r ".choices[0].message.audio[$i].audio_url.url // empty" "$resp_file" 2>/dev/null) if [ -n "$audio_url" ]; then local output_file="${OUTPUT_DIR}/${job_id}_$((i+1)).${audio_format}" echo -e " ${CYAN}Decoding audio $((i+1))...${NC}" if decode_base64_audio "$audio_url" "$output_file"; then if [ -f "$output_file" ] && [ -s "$output_file" ]; then echo -e " ${GREEN}Saved: $output_file${NC}" else echo -e " ${RED}Failed to decode audio $((i+1))${NC}" rm -f "$output_file" 2>/dev/null fi else echo -e " ${RED}Failed to decode audio $((i+1))${NC}" rm -f "$output_file" 2>/dev/null fi fi i=$((i+1)) done else echo -e " ${YELLOW}No audio files in response${NC}" fi # Save full response JSON (strip base64 audio to keep file small) local clean_resp clean_resp=$(jq 'del(.choices[].message.audio[].audio_url.url)' "$resp_file" 2>/dev/null) if [ -n "$clean_resp" ]; then 
        save_result "$job_id" "$clean_resp"
    else
        save_result "$job_id" "$(cat "$resp_file")"
    fi
}

# Send request to /v1/chat/completions and handle response
# Usage: send_completion_request <api_url> <payload_file>
send_completion_request() {
    local api_url="$1"
    local payload_file="$2"
    local api_key=$(load_api_key)

    local resp_file=$(mktemp)

    local http_code
    if [ -n "$api_key" ]; then
        http_code=$(curl -s -w "%{http_code}" --connect-timeout 10 --max-time 660 \
            -o "$resp_file" \
            -X POST "${api_url}/v1/chat/completions" \
            -H "Content-Type: application/json; charset=utf-8" \
            -H "Authorization: Bearer ${api_key}" \
            -A "curl/8.7.1" \
            --data-binary "@${payload_file}")
    else
        http_code=$(curl -s -w "%{http_code}" --connect-timeout 10 --max-time 660 \
            -o "$resp_file" \
            -X POST "${api_url}/v1/chat/completions" \
            -H "Content-Type: application/json; charset=utf-8" \
            -A "curl/8.7.1" \
            --data-binary "@${payload_file}")
    fi

    rm -f "$payload_file"

    if [ "$http_code" != "200" ]; then
        local err_detail
        err_detail=$(jq -r '.detail // .error.message // empty' "$resp_file" 2>/dev/null)
        echo -e "${RED}Error: HTTP $http_code${NC}"
        [ -n "$err_detail" ] && echo -e "${RED}$err_detail${NC}"
        rm -f "$resp_file"
        return 1
    fi

    # Generate a job_id from the completion id
    local job_id
    job_id=$(jq -r '.id // empty' "$resp_file" 2>/dev/null)
    [ -z "$job_id" ] && job_id="completion-$(date +%s)"

    echo ""
    echo -e "${GREEN}Generation completed!${NC}"
    echo ""

    parse_completion_response "$resp_file" "$job_id"
    rm -f "$resp_file"

    echo ""
    echo -e "${GREEN}Done! Files saved to: $OUTPUT_DIR${NC}"
    show_star_prompt
}

# Wait for job and download results
wait_for_job() {
    local api_url="$1"
    local job_id="$2"

    echo "Job created: $job_id"
    echo "Output: $OUTPUT_DIR"
    echo ""

    while true; do
        local response=$(query_job_result "$api_url" "$job_id")
        local status=$(parse_query_status "$response")

        case "$status" in
            1)
                echo ""
                echo -e "${GREEN}Generation completed!${NC}"
                echo ""

                local result_file=$(mktemp)
                parse_query_result "$response" > "$result_file"

                local bpm=$(jq -r '.[0].metas.bpm // .[0].bpm // empty' "$result_file" 2>/dev/null)
                local keyscale=$(jq -r '.[0].metas.keyscale // .[0].keyscale // empty' "$result_file" 2>/dev/null)
                local duration=$(jq -r '.[0].metas.duration // .[0].duration // empty' "$result_file" 2>/dev/null)

                echo "Metadata:"
                [ -n "$bpm" ] && echo " BPM: $bpm"
                [ -n "$keyscale" ] && echo " Key: $keyscale"
                [ -n "$duration" ] && echo " Duration: ${duration}s"
                echo ""

                # Save result JSON
                save_result "$job_id" "$response"

                # Download audio files
                echo "Downloading audio files..."
                download_audios "$api_url" "$job_id" "$result_file"
                rm -f "$result_file"

                echo ""
                echo -e "${GREEN}Done! Files saved to: $OUTPUT_DIR${NC}"
                show_star_prompt
                return 0
                ;;
            2)
                echo ""
                echo -e "${RED}Generation failed!${NC}"

                # Save error result
                save_result "$job_id" "$response"
                return 1
                ;;
            0)
                printf "\rProcessing... "
                ;;
            *)
                printf "\rWaiting... "
                ;;
        esac
        sleep 5
    done
}

# Generate command
cmd_generate() {
    check_deps
    ensure_config

    local caption="" lyrics="" description="" thinking="" use_format=""
    local no_thinking=false no_format=false no_wait=false
    local model="" language="" steps="" guidance="" seed="" duration="" bpm="" batch=""
    local task_type="" src_audio="" cover_strength="" repaint_start="" repaint_end=""
    local key_scale="" time_signature=""

    while [[ $# -gt 0 ]]; do
        case $1 in
            --caption|-c) caption="$2"; shift 2 ;;
            --lyrics|-l) lyrics="$2"; shift 2 ;;
            --description|-d) description="$2"; shift 2 ;;
            --thinking|-t) thinking="true"; shift ;;
            --no-thinking) no_thinking=true; shift ;;
            --use-format) use_format="true"; shift ;;
            --no-format) no_format=true; shift ;;
            --model|-m) model="$2"; shift 2 ;;
            --language|--vocal-language) language="$2"; shift 2 ;;
            --steps) steps="$2"; shift 2 ;;
            --guidance) guidance="$2"; shift 2 ;;
            --seed) seed="$2"; shift 2 ;;
            --duration) duration="$2"; shift 2 ;;
            --bpm) bpm="$2"; shift 2 ;;
            --batch) batch="$2"; shift 2 ;;
            --no-wait) no_wait=true; shift ;;
            --task-type) task_type="$2"; shift 2 ;;
            --src-audio) src_audio="$2"; shift 2 ;;
            --cover-strength) cover_strength="$2"; shift 2 ;;
            --repaint-start) repaint_start="$2"; shift 2 ;;
            --repaint-end) repaint_end="$2"; shift 2 ;;
            --key-scale|--key) key_scale="$2"; shift 2 ;;
            --time-signature|--time-sig) time_signature="$2"; shift 2 ;;
            *) [ -z "$caption" ] && caption="$1"; shift ;;
        esac
    done

    # Either a caption or a description is required
    # (a description alone selects simple mode)
    if [ -z "$caption" ] && [ -z "$description" ]; then
        echo -e "${RED}Error: caption or description required${NC}"
        echo "Usage: $0 generate \"Music description\" [options]"
        echo " $0 generate -d \"Simple description\" [options]"
        exit 1
    fi

    local api_url=$(ensure_connection)

    # Get defaults
    local def_thinking=$(get_config "generation.thinking")
    local def_format=$(get_config "generation.use_format")
    local def_cot_caption=$(get_config "generation.use_cot_caption")
    local def_cot_language=$(get_config "generation.use_cot_language")
    local def_language=$(get_config "generation.vocal_language")
    local def_audio_format=$(get_config "generation.audio_format")

    [ -z "$thinking" ] && thinking="${def_thinking:-true}"
    [ -z "$use_format" ] && use_format="${def_format:-true}"
    [ -z "$language" ] && language="${def_language:-en}"

    [ "$no_thinking" = true ] && thinking="false"
    [ "$no_format" = true ] && use_format="false"

    # Normalize boolean values for jq --argjson
    thinking=$(normalize_bool "$thinking" "true")
    use_format=$(normalize_bool "$use_format" "true")
    local cot_caption=$(normalize_bool "$def_cot_caption" "true")
    local cot_language=$(normalize_bool "$def_cot_language" "true")

    # Build payload using jq for proper escaping
    local payload=$(jq -n \
        --arg prompt "$caption" \
        --arg lyrics "${lyrics:-}" \
        --arg sample_query "${description:-}" \
        --argjson thinking "$thinking" \
        --argjson use_format "$use_format" \
        --argjson use_cot_caption "$cot_caption" \
        --argjson use_cot_language "$cot_language" \
        --arg vocal_language "$language" \
        --arg audio_format "${def_audio_format:-mp3}" \
        '{
            prompt: $prompt,
            lyrics: $lyrics,
            sample_query: $sample_query,
            thinking: $thinking,
            use_format: $use_format,
            use_cot_caption: $use_cot_caption,
            use_cot_language: $use_cot_language,
            vocal_language: $vocal_language,
            audio_format: $audio_format,
            use_random_seed: true
        }')

    # Validate src_audio file exists if provided
    if [ -n "$src_audio" ]; then
        if [ ! -f "$src_audio" ]; then
            echo -e "${RED}Error: Source audio file not found: $src_audio${NC}"
            exit 1
        fi
        # Default task_type to "cover" when src_audio is provided
        [ -z "$task_type" ] && task_type="cover"
    fi

    # Add optional parameters
    [ -n "$model" ] && payload=$(echo "$payload" | jq --arg v "$model" '. + {model: $v}')
    [ -n "$steps" ] && payload=$(echo "$payload" | jq --argjson v "$steps" '. + {inference_steps: $v}')
    [ -n "$guidance" ] && payload=$(echo "$payload" | jq --argjson v "$guidance" '. + {guidance_scale: $v}')
    [ -n "$seed" ] && payload=$(echo "$payload" | jq --argjson v "$seed" '. + {seed: $v, use_random_seed: false}')
    [ -n "$duration" ] && payload=$(echo "$payload" | jq --argjson v "$duration" '. + {audio_duration: $v}')
    [ -n "$bpm" ] && payload=$(echo "$payload" | jq --argjson v "$bpm" '. + {bpm: $v}')
    [ -n "$batch" ] && payload=$(echo "$payload" | jq --argjson v "$batch" '. + {batch_size: $v}')
    [ -n "$task_type" ] && payload=$(echo "$payload" | jq --arg v "$task_type" '. + {task_type: $v}')
    [ -n "$src_audio" ] && payload=$(echo "$payload" | jq --arg v "$src_audio" '. + {src_audio_path: $v}')
    [ -n "$cover_strength" ] && payload=$(echo "$payload" | jq --argjson v "$cover_strength" '. + {audio_cover_strength: $v}')
    [ -n "$repaint_start" ] && payload=$(echo "$payload" | jq --argjson v "$repaint_start" '. + {repainting_start: $v}')
    [ -n "$repaint_end" ] && payload=$(echo "$payload" | jq --argjson v "$repaint_end" '. + {repainting_end: $v}')
    [ -n "$key_scale" ] && payload=$(echo "$payload" | jq --arg v "$key_scale" '. + {key_scale: $v}')
    [ -n "$time_signature" ] && payload=$(echo "$payload" | jq --arg v "$time_signature" '. + {time_signature: $v}')

    local api_mode=$(load_api_mode)

    echo "Generating music..."
    if [ -n "$task_type" ] && [ "$task_type" != "text2music" ]; then
        echo " Mode: $(echo "$task_type" | awk '{print toupper(substr($0,1,1)) substr($0,2)}') (${task_type})"
        [ -n "$src_audio" ] && echo " Source audio: $src_audio"
    elif [ -n "$description" ]; then
        echo " Mode: Simple (description)"
        echo " Description: ${description:0:50}..."
    else
        echo " Mode: Caption"
        echo " Caption: ${caption:0:50}..."
    fi
    echo " Thinking: $thinking, Format: $use_format"
    echo " API: $api_mode"
    echo " Output: $OUTPUT_DIR"
    echo ""

    if [ "$api_mode" = "completion" ]; then
        # --- Completion mode: /v1/chat/completions ---
        local model_id=$(get_completion_model "$api_url" "$model")

        # Build message content parts
        local message_content=""
        local sample_mode=false
        if [ -n "$description" ]; then
            message_content="$description"
            sample_mode=true
        else
            message_content="${caption}"
            [ -n "$lyrics" ] && message_content="${message_content}${lyrics}"
        fi

        # Build completion payload
        local payload_c
        if [ -n "$src_audio" ]; then
            # Audio input mode: use multipart content array with text + input_audio
            # Encode audio to base64 using python to avoid shell argument limits
            local audio_b64_file=$(mktemp)
            python3 -c "
import base64, sys
with open(sys.argv[1], 'rb') as f:
    sys.stdout.write(base64.b64encode(f.read()).decode('ascii'))
" "$src_audio" > "$audio_b64_file"

            local audio_ext="${src_audio##*.}"
            [ -z "$audio_ext" ] && audio_ext="mp3"

            # Build payload with audio using jq --rawfile to read base64 from file
            payload_c=$(jq -n \
                --arg model "$model_id" \
                --arg text_content "$message_content" \
                --rawfile audio_b64 "$audio_b64_file" \
                --arg audio_format "$audio_ext" \
                --argjson thinking "$thinking" \
                --argjson use_format "$use_format" \
                --argjson sample_mode "$sample_mode" \
                --argjson use_cot_caption "$cot_caption" \
                --argjson use_cot_language "$cot_language" \
                --arg vocal_language "$language" \
                --arg format "${def_audio_format:-mp3}" \
                --arg task_type "${task_type:-text2music}" \
                '{
                    model: $model,
                    messages: [{
                        "role": "user",
                        "content": [
                            {"type": "text", "text": $text_content},
                            {"type": "input_audio", "input_audio": {"data": $audio_b64, "format": $audio_format}}
                        ]
                    }],
                    stream: false,
                    thinking: $thinking,
                    use_format: $use_format,
                    sample_mode: $sample_mode,
                    use_cot_caption: $use_cot_caption,
                    use_cot_language: $use_cot_language,
                    task_type: $task_type,
                    audio_config: {
                        format: $format,
                        vocal_language: $vocal_language
                    }
                }')

            rm -f "$audio_b64_file"

            # Add cover/repainting parameters
            [ -n "$cover_strength" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$cover_strength" '. + {audio_cover_strength: $v}')
            [ -n "$repaint_start" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$repaint_start" '. + {repainting_start: $v}')
            [ -n "$repaint_end" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$repaint_end" '. + {repainting_end: $v}')
        else
            # Text-only mode: use string content
            payload_c=$(jq -n \
                --arg model "$model_id" \
                --arg content "$message_content" \
                --argjson thinking "$thinking" \
                --argjson use_format "$use_format" \
                --argjson sample_mode "$sample_mode" \
                --argjson use_cot_caption "$cot_caption" \
                --argjson use_cot_language "$cot_language" \
                --arg vocal_language "$language" \
                --arg format "${def_audio_format:-mp3}" \
                '{
                    model: $model,
                    messages: [{"role": "user", "content": $content}],
                    stream: false,
                    thinking: $thinking,
                    use_format: $use_format,
                    sample_mode: $sample_mode,
                    use_cot_caption: $use_cot_caption,
                    use_cot_language: $use_cot_language,
                    audio_config: {
                        format: $format,
                        vocal_language: $vocal_language
                    }
                }')
        fi

        # Add optional parameters to completion payload
        [ -n "$guidance" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$guidance" '. + {guidance_scale: $v}')
        [ -n "$seed" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$seed" '. + {seed: $v}')
        [ -n "$batch" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$batch" '. + {batch_size: $v}')
        [ -n "$duration" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$duration" '.audio_config.duration = $v')
        [ -n "$bpm" ] && payload_c=$(echo "$payload_c" | jq --argjson v "$bpm" '.audio_config.bpm = $v')
        [ -n "$key_scale" ] && payload_c=$(echo "$payload_c" | jq --arg v "$key_scale" '.audio_config.key_scale = $v')
        [ -n "$time_signature" ] && payload_c=$(echo "$payload_c" | jq --arg v "$time_signature" '.audio_config.time_signature = $v')

        local temp_payload=$(mktemp)
        printf '%s' "$payload_c" > "$temp_payload"

        send_completion_request "$api_url" "$temp_payload"
    else
        # --- Native mode: /release_task + polling ---
        local temp_payload=$(mktemp)
        printf '%s' "$payload" > "$temp_payload"

        local api_key=$(load_api_key)
        local response
        if [ -n "$api_key" ]; then
            response=$(curl -s -X POST "${api_url}/release_task" \
                -H "Content-Type: application/json; charset=utf-8" \
                -H "Authorization: Bearer ${api_key}" \
                --data-binary "@${temp_payload}")
        else
            response=$(curl -s -X POST "${api_url}/release_task" \
                -H "Content-Type: application/json; charset=utf-8" \
                --data-binary "@${temp_payload}")
        fi

        rm -f "$temp_payload"

        local job_id=$(echo "$response" | jq -r '.data.task_id // .task_id // empty')
        [ -z "$job_id" ] && { echo -e "${RED}Error: Failed to create job${NC}"; echo "$response"; exit 1; }

        if [ "$no_wait" = true ]; then
            echo "Job ID: $job_id"
            echo "Use '$0 status $job_id' to check progress and download"
        else
            wait_for_job "$api_url" "$job_id"
        fi
    fi
}

# Random command
cmd_random() {
    check_deps
    ensure_config

    local thinking="" no_thinking=false no_wait=false

    while [[ $# -gt 0 ]]; do
        case $1 in
            --thinking|-t) thinking="true"; shift ;;
            --no-thinking) no_thinking=true; shift ;;
            --no-wait) no_wait=true; shift ;;
            *) shift ;;
        esac
    done

    local api_url=$(ensure_connection)

    local def_thinking=$(get_config "generation.thinking")
    [ -z "$thinking" ] && thinking="${def_thinking:-true}"
    [ "$no_thinking" = true ] && thinking="false"

    # Normalize boolean for jq --argjson
    thinking=$(normalize_bool "$thinking" "true")

    local api_mode=$(load_api_mode)

    echo "Generating random music..."
    echo " Thinking: $thinking"
    echo " API: $api_mode"
    echo " Output: $OUTPUT_DIR"
    echo ""

    if [ "$api_mode" = "completion" ]; then
        # --- Completion mode ---
        local model_id=$(get_completion_model "$api_url" "")
        local def_audio_format=$(get_config "generation.audio_format")

        local payload_c=$(jq -n \
            --arg model "$model_id" \
            --argjson thinking "$thinking" \
            --arg format "${def_audio_format:-mp3}" \
            '{
                model: $model,
                messages: [{"role": "user", "content": "Generate a random song"}],
                stream: false,
                sample_mode: true,
                thinking: $thinking,
                audio_config: {
                    format: $format
                }
            }')

        local temp_payload=$(mktemp)
        printf '%s' "$payload_c" > "$temp_payload"

        send_completion_request "$api_url" "$temp_payload"
    else
        # --- Native mode ---
        local payload=$(jq -n --argjson thinking "$thinking" '{sample_mode: true, thinking: $thinking}')

        local temp_payload=$(mktemp)
        printf '%s' "$payload" > "$temp_payload"

        local api_key=$(load_api_key)
        local response
        if [ -n "$api_key" ]; then
            response=$(curl -s -X POST "${api_url}/release_task" \
                -H "Content-Type: application/json; charset=utf-8" \
                -H "Authorization: Bearer ${api_key}" \
                --data-binary "@${temp_payload}")
        else
            response=$(curl -s -X POST "${api_url}/release_task" \
                -H "Content-Type: application/json; charset=utf-8" \
                --data-binary "@${temp_payload}")
        fi

        rm -f "$temp_payload"

        local job_id=$(echo "$response" | jq -r '.data.task_id // .task_id // empty')
        [ -z "$job_id" ] && { echo -e "${RED}Error: Failed to create job${NC}"; echo "$response"; exit 1; }

        if [ "$no_wait" = true ]; then
            echo "Job ID: $job_id"
            echo "Use '$0 status $job_id' to check progress and download"
        else
            wait_for_job "$api_url" "$job_id"
        fi
    fi
}

# Cover command (shortcut for generate --task-type cover --src-audio)
cmd_cover() {
    check_deps
    ensure_config

    local src_audio=""
    local args=()

    # Extract src_audio as first positional arg, pass rest to generate
    while [[ $# -gt 0 ]]; do
        case $1 in
            --src-audio) src_audio="$2"; shift 2 ;;
            -*) args+=("$1"); shift ;;
            *)
                if [ -z "$src_audio" ]; then
                    src_audio="$1"; shift
                else
                    args+=("$1"); shift
                fi
                ;;
        esac
    done

    if [ -z "$src_audio" ]; then
        echo -e "${RED}Error: source audio file required${NC}"
        echo "Usage: $0 cover <src_audio> -c \"caption\" -l \"lyrics\" [options]"
        exit 1
    fi

    cmd_generate --src-audio "$src_audio" --task-type cover "${args[@]}"
}

# Help
show_help() {
    echo "ACE-Step Music Generation CLI"
    echo ""
    echo "Requirements: curl, jq"
    echo ""
    echo "Usage: $0 <command> [options]"
    echo ""
    echo "Commands:"
    echo "  generate   Generate music from text"
    echo "  cover      Cover/repainting from source audio"
    echo "  random     Generate random music"
    echo "  status     Check job status and download results"
    echo "  models     List available models"
    echo "  health     Check API health"
    echo "  config     Manage configuration"
    echo ""
    echo "Output:"
    echo "  Results saved to: $OUTPUT_DIR/<job_id>.json"
    echo "  Audio files: $OUTPUT_DIR/<job_id>_1.mp3, ..."
    echo ""
    echo "Generate Options:"
    echo "  -c, --caption        Music style/genre description (caption mode)"
    echo "  -d, --description    Simple description, LM auto-generates caption/lyrics"
    echo "  -l, --lyrics         Lyrics text"
    echo "  -t, --thinking       Enable thinking mode (default: true)"
    echo "  --no-thinking        Disable thinking mode"
    echo "  --no-format          Disable format enhancement"
    echo "  --duration           Duration in seconds"
    echo "  --bpm                Beats per minute"
    echo "  --key-scale          Musical key (e.g. \"E minor\")"
    echo "  --time-signature     Time signature (e.g. \"4/4\")"
    echo ""
    echo "Cover/Repainting Options:"
    echo "  --src-audio          Source audio file path"
    echo "  --task-type          Task type: cover, repaint, text2music (default: auto)"
    echo "  --cover-strength     Cover strength 0.0-1.0 (default: 1.0)"
    echo "  --repaint-start      Repainting start position in seconds"
    echo "  --repaint-end        Repainting end position in seconds"
    echo ""
    echo "Examples:"
    echo "  $0 generate \"Pop music with guitar\"  # Caption mode"
    echo "  $0 generate -d \"A February love song\"  # Simple mode (LM generates)"
    echo "  $0 generate -c \"Jazz\" -l \"[Verse] Hello\"  # With lyrics"
    echo "  $0 cover song.mp3 -c \"Rock cover\" -l \"[Verse] ...\" --duration 120"
    echo "  $0 generate --src-audio song.mp3 --task-type repaint -c \"Pop\" --repaint-start 30 --repaint-end 60"
    echo "  $0 random"
    echo "  $0 status <job_id>"
    echo "  $0 config --set generation.thinking false"
}

# Main
case "$1" in
    generate) shift; cmd_generate "$@" ;;
    cover) shift; cmd_cover "$@" ;;
    random) shift; cmd_random "$@" ;;
    status) shift; cmd_status "$@" ;;
    models) cmd_models ;;
    health) cmd_health ;;
    config) shift; cmd_config "$@" ;;
    help|--help|-h) show_help ;;
    *) show_help; exit 1 ;;
esac

================================================
FILE: .claude/skills/acestep/scripts/config.example.json
================================================
{
  "api_url": "https://api.acemusic.ai",
  "api_key": "",
  "api_mode": "completion",
  "generation": {
    "thinking": true,
    "use_format": false,
    "use_cot_caption": true,
    "use_cot_language": false,
    "audio_format": "mp3",
    "batch_size": 1,
    "vocal_language": "en"
  }
}

================================================
FILE: .claude/skills/acestep-docs/SKILL.md
================================================
---
name: acestep-docs
description: ACE-Step documentation and troubleshooting. Use when users ask about installing ACE-Step, GPU configuration, model download, Gradio UI usage, API integration, or troubleshooting issues like VRAM problems, CUDA errors, or model loading failures.
allowed-tools: Read, Glob, Grep
---

# ACE-Step Documentation

Documentation skill for the ACE-Step music generation system.

## Quick Reference

### Getting Started

| Document | Description |
|----------|-------------|
| [README.md](getting-started/README.md) | Installation, model download, startup commands |
| [Tutorial.md](getting-started/Tutorial.md) | Getting started tutorial, best practices |
| [ABOUT.md](getting-started/ABOUT.md) | Project overview, architecture, model zoo |

### Guides

| Document | Description |
|----------|-------------|
| [GRADIO_GUIDE.md](guides/GRADIO_GUIDE.md) | Web UI usage guide |
| [INFERENCE.md](guides/INFERENCE.md) | Inference parameter tuning |
| [GPU_COMPATIBILITY.md](guides/GPU_COMPATIBILITY.md) | GPU/VRAM configuration, hardware recommendations |
| [ENVIRONMENT_SETUP.md](guides/ENVIRONMENT_SETUP.md) | Environment detection, uv installation, python_embeded setup (Windows/Linux/macOS) |
| [SCRIPT_CONFIGURATION.md](guides/SCRIPT_CONFIGURATION.md) | Configuring launch scripts: .bat (Windows) and .sh (Linux/macOS) |
| [UPDATE_AND_BACKUP.md](guides/UPDATE_AND_BACKUP.md) | Git updates, file backup, conflict resolution (all platforms) |

### API (for developers)

| Document | Description |
|----------|-------------|
| [API.md](api/API.md) | REST API documentation |
| [Openrouter_API.md](api/Openrouter_API.md) | OpenRouter API integration |

## Instructions

1. Installation questions → read [getting-started/README.md](getting-started/README.md)
2. General usage / best practices → read [getting-started/Tutorial.md](getting-started/Tutorial.md)
3. Project overview / architecture → read [getting-started/ABOUT.md](getting-started/ABOUT.md)
4. Web UI questions → read [guides/GRADIO_GUIDE.md](guides/GRADIO_GUIDE.md)
5. Inference parameter tuning → read [guides/INFERENCE.md](guides/INFERENCE.md)
6. GPU/VRAM issues → read [guides/GPU_COMPATIBILITY.md](guides/GPU_COMPATIBILITY.md)
7. Environment setup (uv, python_embeded) → read [guides/ENVIRONMENT_SETUP.md](guides/ENVIRONMENT_SETUP.md)
8. Launch script configuration (.bat/.sh) → read [guides/SCRIPT_CONFIGURATION.md](guides/SCRIPT_CONFIGURATION.md)
9. Updates and backup → read [guides/UPDATE_AND_BACKUP.md](guides/UPDATE_AND_BACKUP.md)
10. API development → read [api/API.md](api/API.md) or [api/Openrouter_API.md](api/Openrouter_API.md)

## Common Issues

- **Installation problems**: See getting-started/README.md
- **VRAM insufficient**: See guides/GPU_COMPATIBILITY.md
- **Model download failed**: See getting-started/README.md or guides/SCRIPT_CONFIGURATION.md
- **uv not found**: See guides/ENVIRONMENT_SETUP.md
- **Environment detection issues**: See guides/ENVIRONMENT_SETUP.md
- **BAT/SH script configuration**: See guides/SCRIPT_CONFIGURATION.md
- **Update and backup**: See guides/UPDATE_AND_BACKUP.md
- **Update conflicts**: See guides/UPDATE_AND_BACKUP.md
- **Inference quality issues**: See guides/INFERENCE.md
- **Gradio UI not starting**: See guides/GRADIO_GUIDE.md

================================================
FILE: .claude/skills/acestep-docs/api/API.md
================================================
# ACE-Step API Client Documentation

---

This service provides an HTTP-based asynchronous music generation API.

**Basic Workflow**:

1. Call `POST /release_task` to submit a task and obtain a `task_id`.
2. Call `POST /query_result` to batch query task status until `status` is `1` (succeeded) or `2` (failed).
3. Download audio files via `GET /v1/audio?path=...` URLs returned in the result.
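
The three workflow steps can be sketched as a small client. This is a minimal illustration using only the Python standard library, not the project's own client: the helper names `post_json`, `extract_task_state`, and `generate` are hypothetical, and the sketch assumes the `file` entries in the result are server-relative `/v1/audio?path=...` URLs that can be prefixed with the base URL, and that `/query_result` accepts `task_id_list` as a JSON array.

```python
import json
import time
import urllib.request


def post_json(url, payload, api_key=None):
    """POST a JSON body and return the decoded JSON response (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_task_state(query_response, task_id):
    """Return (status, [file URLs]) for task_id from a /query_result response.

    Unknown tasks are reported as status 0 (still queued/running).
    """
    for item in query_response.get("data") or []:
        if item.get("task_id") != task_id:
            continue
        raw = item.get("result") or "[]"
        # `result` is documented as a JSON string; tolerate an already-parsed list.
        results = json.loads(raw) if isinstance(raw, str) else raw
        files = [r["file"] for r in results if r.get("file")]
        return item.get("status", 0), files
    return 0, []


def generate(base_url, payload, api_key=None, poll_seconds=5):
    """Submit a task, poll until it finishes, and return absolute audio URLs."""
    # Step 1: submit the task and read task_id from the response wrapper.
    submitted = post_json(f"{base_url}/release_task", payload, api_key)
    task_id = submitted["data"]["task_id"]
    while True:
        # Step 2: poll /query_result until status is 1 (succeeded) or 2 (failed).
        polled = post_json(f"{base_url}/query_result", {"task_id_list": [task_id]}, api_key)
        status, files = extract_task_state(polled, task_id)
        if status == 1:
            # Step 3: the caller downloads these URLs with GET.
            return [f"{base_url}{path}" for path in files]
        if status == 2:
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_seconds)
```

Under these assumptions, `generate("http://localhost:8001", {"prompt": "upbeat pop song"})` would return a list of audio URLs once the task succeeds; each can then be fetched with any HTTP client, for example `urllib.request.urlretrieve`.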

---

## Table of Contents

- [Authentication](#1-authentication)
- [Response Format](#2-response-format)
- [Task Status Description](#3-task-status-description)
- [Create Generation Task](#4-create-generation-task)
- [Batch Query Task Results](#5-batch-query-task-results)
- [Format Input](#6-format-input)
- [Get Random Sample](#7-get-random-sample)
- [List Available Models](#8-list-available-models)
- [Server Statistics](#9-server-statistics)
- [Download Audio Files](#10-download-audio-files)
- [Health Check](#11-health-check)
- [Environment Variables](#12-environment-variables)

---

## 1. Authentication

The API supports optional API key authentication. When enabled, a valid key must be provided in requests.

### Authentication Methods

Two authentication methods are supported:

**Method A: ai_token in request body**

```json
{
  "ai_token": "your-api-key",
  "prompt": "upbeat pop song",
  ...
}
```

**Method B: Authorization header**

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Authorization: Bearer your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "upbeat pop song"}'
```

### Configuring API Key

Set via environment variable or command-line argument:

```bash
# Environment variable
export ACESTEP_API_KEY=your-secret-key

# Or command-line argument
python -m acestep.api_server --api-key your-secret-key
```

---

## 2. Response Format

All API responses use a unified wrapper format:

```json
{
  "data": { ... },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

| Field | Type | Description |
| :--- | :--- | :--- |
| `data` | any | Actual response data |
| `code` | int | Status code (200=success) |
| `error` | string | Error message (null on success) |
| `timestamp` | int | Response timestamp (milliseconds) |
| `extra` | any | Extra information (usually null) |

---

## 3. Task Status Description

Task status (`status`) is represented as integers:

| Status Code | Status Name | Description |
| :--- | :--- | :--- |
| `0` | queued/running | Task is queued or in progress |
| `1` | succeeded | Generation succeeded, result is ready |
| `2` | failed | Generation failed |

---

## 4. Create Generation Task

### 4.1 API Definition

- **URL**: `/release_task`
- **Method**: `POST`
- **Content-Type**: `application/json`, `multipart/form-data`, or `application/x-www-form-urlencoded`

### 4.2 Request Parameters

#### Parameter Naming Convention

The API supports both **snake_case** and **camelCase** naming for most parameters. For example:

- `audio_duration` / `duration` / `audioDuration`
- `key_scale` / `keyscale` / `keyScale`
- `time_signature` / `timesignature` / `timeSignature`
- `sample_query` / `sampleQuery` / `description` / `desc`
- `use_format` / `useFormat` / `format`

Additionally, metadata can be passed in a nested object (`metas`, `metadata`, or `user_metadata`).

#### Method A: JSON Request (application/json)

Suitable for passing only text parameters, or referencing audio file paths that already exist on the server.

**Basic Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `prompt` | string | `""` | Music description prompt (alias: `caption`) |
| `lyrics` | string | `""` | Lyrics content |
| `thinking` | bool | `false` | Whether to use 5Hz LM to generate audio codes (lm-dit behavior) |
| `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
| `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |

**Sample/Description Mode Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `sample_mode` | bool | `false` | Enable random sample generation mode (auto-generates caption/lyrics/metas via LM) |
| `sample_query` | string | `""` | Natural language description for sample generation (e.g., "a soft Bengali love song"). Aliases: `description`, `desc` |
| `use_format` | bool | `false` | Use LM to enhance/format the provided caption and lyrics. Alias: `format` |

**Multi-Model Support**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `model` | string | null | Select which DiT model to use (e.g., `"acestep-v15-turbo"`, `"acestep-v15-turbo-shift3"`). Use `/v1/models` to list available models. If not specified, uses the default model. |

**thinking Semantics (Important)**:

- `thinking=false`:
  - The server will **NOT** use 5Hz LM to generate `audio_code_string`.
  - DiT runs in **text2music** mode and **ignores** any provided `audio_code_string`.
- `thinking=true`:
  - The server will use 5Hz LM to generate `audio_code_string` (lm-dit behavior).
  - DiT runs with LM-generated codes for enhanced music quality.

**Metadata Auto-Completion (Conditional)**:

When `use_cot_caption=true` or `use_cot_language=true` or metadata fields are missing, the server may call 5Hz LM to fill the missing fields based on `caption`/`lyrics`:

- `bpm`
- `key_scale`
- `time_signature`
- `audio_duration`

User-provided values always win; LM only fills the fields that are empty/missing.

**Music Attribute Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `bpm` | int | null | Specify tempo (BPM), range 30-300 |
| `key_scale` | string | `""` | Key/scale (e.g., "C Major", "Am"). Aliases: `keyscale`, `keyScale` |
| `time_signature` | string | `""` | Time signature (2, 3, 4, 6 for 2/4, 3/4, 4/4, 6/8). Aliases: `timesignature`, `timeSignature` |
| `audio_duration` | float | null | Generation duration (seconds), range 10-600. Aliases: `duration`, `target_duration` |

**Audio Codes (Optional)**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `audio_code_string` | string or string[] | `""` | Audio semantic tokens (5Hz) for `llm_dit`. Alias: `audioCodeString` |

**Generation Control Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `inference_steps` | int | `8` | Number of inference steps. Turbo model: 1-20 (recommended 8). Base model: 1-200 (recommended 32-64). |
| `guidance_scale` | float | `7.0` | Prompt guidance coefficient. Only effective for base model. |
| `use_random_seed` | bool | `true` | Whether to use random seed |
| `seed` | int | `-1` | Specify seed (when use_random_seed=false) |
| `batch_size` | int | `2` | Batch generation count (max 8) |

**Advanced DiT Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `shift` | float | `3.0` | Timestep shift factor (range 1.0-5.0). Only effective for base models, not turbo models. |
| `infer_method` | string | `"ode"` | Diffusion inference method: `"ode"` (Euler, faster) or `"sde"` (stochastic). |
| `timesteps` | string | null | Custom timesteps as comma-separated values (e.g., `"0.97,0.76,0.615,0.5,0.395,0.28,0.18,0.085,0"`). Overrides `inference_steps` and `shift`. |
| `use_adg` | bool | `false` | Use Adaptive Dual Guidance (base model only) |
| `cfg_interval_start` | float | `0.0` | CFG application start ratio (0.0-1.0) |
| `cfg_interval_end` | float | `1.0` | CFG application end ratio (0.0-1.0) |

**5Hz LM Parameters (Optional, server-side)**:

These parameters control 5Hz LM sampling, used for metadata auto-completion and (when `thinking=true`) codes generation.

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `lm_model_path` | string | null | 5Hz LM checkpoint dir name (e.g. `acestep-5Hz-lm-0.6B`) |
| `lm_backend` | string | `"vllm"` | `vllm` or `pt` |
| `lm_temperature` | float | `0.85` | Sampling temperature |
| `lm_cfg_scale` | float | `2.5` | CFG scale (>1 enables CFG) |
| `lm_negative_prompt` | string | `"NO USER INPUT"` | Negative prompt used by CFG |
| `lm_top_k` | int | null | Top-k (0/null disables) |
| `lm_top_p` | float | `0.9` | Top-p (>=1 will be treated as disabled) |
| `lm_repetition_penalty` | float | `1.0` | Repetition penalty |

**LM CoT (Chain-of-Thought) Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `use_cot_caption` | bool | `true` | Let LM rewrite/enhance the input caption via CoT reasoning. Aliases: `cot_caption`, `cot-caption` |
| `use_cot_language` | bool | `true` | Let LM detect vocal language via CoT. Aliases: `cot_language`, `cot-language` |
| `constrained_decoding` | bool | `true` | Enable FSM-based constrained decoding for structured LM output. Aliases: `constrainedDecoding`, `constrained` |
| `constrained_decoding_debug` | bool | `false` | Enable debug logging for constrained decoding |
| `allow_lm_batch` | bool | `true` | Allow LM batch processing for efficiency |

**Edit/Reference Audio Parameters** (requires absolute path on server):

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `reference_audio_path` | string | null | Reference audio path (Style Transfer) |
| `src_audio_path` | string | null | Source audio path (Repainting/Cover) |
| `task_type` | string | `"text2music"` | Task type: `text2music`, `cover`, `repaint`, `lego`, `extract`, `complete` |
| `instruction` | string | auto | Edit instruction (auto-generated based on task_type if not provided) |
| `repainting_start` | float | `0.0` | Repainting start time (seconds) |
| `repainting_end` | float | null | Repainting end time (seconds), -1 for end of audio |
| `audio_cover_strength` | float | `1.0` | Cover strength (0.0-1.0). Lower values (0.2) for style transfer. |

#### Method B: File Upload (multipart/form-data)

Use this when you need to upload local audio files as reference or source audio.

In addition to supporting all the above fields as Form Fields, the following file fields are also supported:

- `reference_audio` or `ref_audio`: (File) Upload reference audio file
- `src_audio` or `ctx_audio`: (File) Upload source audio file

> **Note**: After uploading files, the corresponding `_path` parameters will be automatically ignored, and the system will use the temporary file path after upload.

### 4.3 Response Example

```json
{
  "data": {
    "task_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "queue_position": 1
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

### 4.4 Usage Examples (cURL)

**Basic JSON Method**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "upbeat pop song",
    "lyrics": "Hello world",
    "inference_steps": 8
  }'
```

**With thinking=true (LM generates codes + fills missing metas)**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "upbeat pop song",
    "lyrics": "Hello world",
    "thinking": true,
    "lm_temperature": 0.85,
    "lm_cfg_scale": 2.5
  }'
```

**Description-driven generation (sample_query)**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "sample_query": "a soft Bengali love song for a quiet evening",
    "thinking": true
  }'
```

**With format enhancement (use_format=true)**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "pop rock",
    "lyrics": "[Verse 1]\nWalking down the street...",
    "use_format": true,
    "thinking": true
  }'
```

**Select specific model**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "electronic dance music",
    "model": "acestep-v15-turbo",
    "thinking": true
  }'
```

**With custom timesteps**:

```bash
curl -X POST http://localhost:8001/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "jazz piano trio",
    "timesteps": "0.97,0.76,0.615,0.5,0.395,0.28,0.18,0.085,0",
    "thinking": true
  }'
```

**File Upload Method**:

```bash
curl -X POST http://localhost:8001/release_task \
  -F "prompt=remix this song" \
  -F "src_audio=@/path/to/local/song.mp3" \
  -F "task_type=repaint"
```

---

## 5. Batch Query Task Results

### 5.1 API Definition

- **URL**: `/query_result`
- **Method**: `POST`
- **Content-Type**: `application/json` or `application/x-www-form-urlencoded`

### 5.2 Request Parameters

| Parameter Name | Type | Description |
| :--- | :--- | :--- |
| `task_id_list` | string (JSON array) or array | List of task IDs to query |

### 5.3 Response Example

```json
{
  "data": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": 1,
      "result": "[{\"file\": \"/v1/audio?path=...\", \"wave\": \"\", \"status\": 1, \"create_time\": 1700000000, \"env\": \"development\", \"prompt\": \"upbeat pop song\", \"lyrics\": \"Hello world\", \"metas\": {\"bpm\": 120, \"duration\": 30, \"genres\": \"\", \"keyscale\": \"C Major\", \"timesignature\": \"4\"}, \"generation_info\": \"...\", \"seed_value\": \"12345,67890\", \"lm_model\": \"acestep-5Hz-lm-0.6B\", \"dit_model\": \"acestep-v15-turbo\"}]"
    }
  ],
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

**Result Field Description** (result is a JSON string, after parsing contains):

| Field | Type | Description |
| :--- | :--- | :--- |
| `file` | string | Audio file URL (use with `/v1/audio` endpoint) |
| `wave` | string | Waveform data (usually empty) |
| `status` | int | Status code (0=in progress, 1=success, 2=failed) |
| `create_time` | int | Creation time (Unix timestamp) |
| `env` | string | Environment identifier |
| `prompt` | string | Prompt used |
| `lyrics` | string | Lyrics used |
| `metas` | object | Metadata (bpm, duration, genres, keyscale,
timesignature) | | `generation_info` | string | Generation info summary | | `seed_value` | string | Seed values used (comma-separated) | | `lm_model` | string | LM model name used | | `dit_model` | string | DiT model name used | ### 5.4 Usage Example ```bash curl -X POST http://localhost:8001/query_result \ -H 'Content-Type: application/json' \ -d '{ "task_id_list": ["550e8400-e29b-41d4-a716-446655440000"] }' ``` --- ## 6. Format Input ### 6.1 API Definition - **URL**: `/format_input` - **Method**: `POST` This endpoint uses LLM to enhance and format user-provided caption and lyrics. ### 6.2 Request Parameters | Parameter Name | Type | Default | Description | | :--- | :--- | :--- | :--- | | `prompt` | string | `""` | Music description prompt | | `lyrics` | string | `""` | Lyrics content | | `temperature` | float | `0.85` | LM sampling temperature | | `param_obj` | string (JSON) | `"{}"` | JSON object containing metadata (duration, bpm, key, time_signature, language) | ### 6.3 Response Example ```json { "data": { "caption": "Enhanced music description", "lyrics": "Formatted lyrics...", "bpm": 120, "key_scale": "C Major", "time_signature": "4", "duration": 180, "vocal_language": "en" }, "code": 200, "error": null, "timestamp": 1700000000000, "extra": null } ``` ### 6.4 Usage Example ```bash curl -X POST http://localhost:8001/format_input \ -H 'Content-Type: application/json' \ -d '{ "prompt": "pop rock", "lyrics": "Walking down the street", "param_obj": "{\"duration\": 180, \"language\": \"en\"}" }' ``` --- ## 7. Get Random Sample ### 7.1 API Definition - **URL**: `/create_random_sample` - **Method**: `POST` This endpoint returns random sample parameters from pre-loaded example data for form filling. 
### 7.2 Request Parameters

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `sample_type` | string | `"simple_mode"` | Sample type: `"simple_mode"` or `"custom_mode"` |

### 7.3 Response Example

```json
{
  "data": {
    "caption": "Upbeat pop song with guitar accompaniment",
    "lyrics": "[Verse 1]\nSunshine on my face...",
    "bpm": 120,
    "key_scale": "G Major",
    "time_signature": "4",
    "duration": 180,
    "vocal_language": "en"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

### 7.4 Usage Example

```bash
curl -X POST http://localhost:8001/create_random_sample \
  -H 'Content-Type: application/json' \
  -d '{"sample_type": "simple_mode"}'
```

---

## 8. List Available Models

### 8.1 API Definition

- **URL**: `/v1/models`
- **Method**: `GET`

Returns a list of available DiT models loaded on the server.

### 8.2 Response Example

```json
{
  "data": {
    "models": [
      { "name": "acestep-v15-turbo", "is_default": true },
      { "name": "acestep-v15-turbo-shift3", "is_default": false }
    ],
    "default_model": "acestep-v15-turbo"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

### 8.3 Usage Example

```bash
curl http://localhost:8001/v1/models
```

---

## 9. Server Statistics

### 9.1 API Definition

- **URL**: `/v1/stats`
- **Method**: `GET`

Returns server runtime statistics.

### 9.2 Response Example

```json
{
  "data": {
    "jobs": {
      "total": 100,
      "queued": 5,
      "running": 1,
      "succeeded": 90,
      "failed": 4
    },
    "queue_size": 5,
    "queue_maxsize": 200,
    "avg_job_seconds": 8.5
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

### 9.3 Usage Example

```bash
curl http://localhost:8001/v1/stats
```

---

## 10. Download Audio Files

### 10.1 API Definition

- **URL**: `/v1/audio`
- **Method**: `GET`

Download generated audio files by path.
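
The `path` query parameter must be URL-encoded, while the `file` value returned by `/query_result` is already a `/v1/audio?path=...` URL relative to the server. A minimal Python sketch of building the download URL in both cases follows; the helper name `audio_download_url` and the base URL are illustrative assumptions, not part of the API.

```python
from urllib.parse import quote, urljoin


def audio_download_url(base_url: str, file_field: str) -> str:
    """Build a full download URL (hypothetical helper, not part of the server).

    `file_field` is either the `file` value from a task result
    (already of the form "/v1/audio?path=...") or a raw server-side path.
    """
    if file_field.startswith("/v1/audio"):
        # Already endpoint-relative: just resolve it against the server root.
        return urljoin(base_url, file_field)
    # Raw path: percent-encode every reserved character, including "/".
    encoded = quote(file_field, safe="")
    return f"{base_url.rstrip('/')}/v1/audio?path={encoded}"


if __name__ == "__main__":
    print(audio_download_url("http://localhost:8001", "/tmp/api_audio/abc123.mp3"))
```

Passing `safe=""` to `quote` matters here: by default `quote` leaves `/` unescaped, which would not produce the `%2F` form this endpoint's documentation uses.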
### 10.2 Request Parameters

| Parameter Name | Type | Description |
| :--- | :--- | :--- |
| `path` | string | URL-encoded path to the audio file |

### 10.3 Usage Example

```bash
# Download using the URL from task result
curl "http://localhost:8001/v1/audio?path=%2Ftmp%2Fapi_audio%2Fabc123.mp3" -o output.mp3
```

---

## 11. Health Check

### 11.1 API Definition

- **URL**: `/health`
- **Method**: `GET`

Returns service health status.

### 11.2 Response Example

```json
{
  "data": {
    "status": "ok",
    "service": "ACE-Step API",
    "version": "1.0"
  },
  "code": 200,
  "error": null,
  "timestamp": 1700000000000,
  "extra": null
}
```

---

## 12. Environment Variables

The API server can be configured using environment variables:

### Server Configuration

| Variable | Default | Description |
| :--- | :--- | :--- |
| `ACESTEP_API_HOST` | `127.0.0.1` | Server bind host |
| `ACESTEP_API_PORT` | `8001` | Server bind port |
| `ACESTEP_API_KEY` | (empty) | API authentication key (empty disables auth) |
| `ACESTEP_API_WORKERS` | `1` | API worker thread count |

### Model Configuration

| Variable | Default | Description |
| :--- | :--- | :--- |
| `ACESTEP_CONFIG_PATH` | `acestep-v15-turbo` | Primary DiT model path |
| `ACESTEP_CONFIG_PATH2` | (empty) | Secondary DiT model path (optional) |
| `ACESTEP_CONFIG_PATH3` | (empty) | Third DiT model path (optional) |
| `ACESTEP_DEVICE` | `auto` | Device for model loading |
| `ACESTEP_USE_FLASH_ATTENTION` | `true` | Enable flash attention |
| `ACESTEP_OFFLOAD_TO_CPU` | `false` | Offload models to CPU when idle |
| `ACESTEP_OFFLOAD_DIT_TO_CPU` | `false` | Offload DiT specifically to CPU |

### LM Configuration

| Variable | Default | Description |
| :--- | :--- | :--- |
| `ACESTEP_INIT_LLM` | auto | Whether to initialize LM at startup (auto determines based on GPU) |
| `ACESTEP_LM_MODEL_PATH` | `acestep-5Hz-lm-0.6B` | Default 5Hz LM model |
| `ACESTEP_LM_BACKEND` | `vllm` | LM backend (vllm or pt) |
| `ACESTEP_LM_DEVICE` | (same as ACESTEP_DEVICE) | Device for LM |
| `ACESTEP_LM_OFFLOAD_TO_CPU` | `false` | Offload LM to CPU |

### Queue Configuration

| Variable | Default | Description |
| :--- | :--- | :--- |
| `ACESTEP_QUEUE_MAXSIZE` | `200` | Maximum queue size |
| `ACESTEP_QUEUE_WORKERS` | `1` | Number of queue workers |
| `ACESTEP_AVG_JOB_SECONDS` | `5.0` | Initial average job duration estimate |
| `ACESTEP_AVG_WINDOW` | `50` | Window for averaging job duration |

### Cache Configuration

| Variable | Default | Description |
| :--- | :--- | :--- |
| `ACESTEP_TMPDIR` | `.cache/acestep/tmp` | Temporary file directory |
| `TRITON_CACHE_DIR` | `.cache/acestep/triton` | Triton cache directory |
| `TORCHINDUCTOR_CACHE_DIR` | `.cache/acestep/torchinductor` | TorchInductor cache directory |

---

## Error Handling

**HTTP Status Codes**:

- `200`: Success
- `400`: Invalid request (bad JSON, missing fields)
- `401`: Unauthorized (missing or invalid API key)
- `404`: Resource not found
- `415`: Unsupported Content-Type
- `429`: Server busy (queue is full)
- `500`: Internal server error

**Error Response Format**:

```json
{
  "detail": "Error message describing the issue"
}
```

---

## Best Practices

1. **Use `thinking=true`** for best quality results with LM-enhanced generation.
2. **Use `sample_query`/`description`** for quick generation from natural language descriptions.
3. **Use `use_format=true`** when you have caption/lyrics but want LM to enhance them.
4. **Batch query task status** using the `/query_result` endpoint to query multiple tasks at once.
5. **Check `/v1/stats`** to understand server load and average job time.
6. **Use multi-model support** by setting `ACESTEP_CONFIG_PATH2` and `ACESTEP_CONFIG_PATH3` environment variables, then select with the `model` parameter.
7. **For production**, set `ACESTEP_API_KEY` to enable authentication and secure your API.
8. **For low VRAM environments**, enable `ACESTEP_OFFLOAD_TO_CPU=true` to support longer audio generation.
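
The batch-query practice above can be sketched in Python as a submit-then-poll loop. This is a minimal sketch under stated assumptions, not a reference client: the helper names (`post_json`, `unwrap`, `parse_task`, `wait_for_tasks`) are hypothetical, the server is assumed to run unauthenticated on the default `http://localhost:8001`, and the shape of `result` for tasks still in progress is not documented here, so it is only decoded once a task reports success.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8001"  # assumption: default host and ACESTEP_API_PORT


def post_json(path, payload):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def unwrap(envelope):
    """Return `data` from the common response envelope, or raise on error."""
    if envelope.get("code") != 200 or envelope.get("error"):
        raise RuntimeError(f"API error: {envelope.get('error')}")
    return envelope["data"]


def parse_task(entry):
    """Summarize one /query_result entry; `result` is a JSON-encoded string."""
    files = []
    if entry["status"] == 1:  # 1 = success; decode the nested JSON only then
        items = json.loads(entry.get("result") or "[]")
        files = [item["file"] for item in items if item.get("file")]
    return {"task_id": entry["task_id"], "status": entry["status"], "files": files}


def wait_for_tasks(task_ids, interval=2.0, timeout=600.0):
    """Poll all pending tasks in one request per round until done or timed out."""
    deadline = time.monotonic() + timeout
    pending, finished = set(task_ids), {}
    while pending and time.monotonic() < deadline:
        data = unwrap(post_json("/query_result", {"task_id_list": sorted(pending)}))
        for task in map(parse_task, data):
            if task["status"] != 0:  # 0 = in progress, 2 = failed
                finished[task["task_id"]] = task
                pending.discard(task["task_id"])
        if pending:
            time.sleep(interval)
    return finished
```

A typical flow would submit with `unwrap(post_json("/release_task", {...}))["task_id"]`, then call `wait_for_tasks([task_id])` and download each entry in `files` through `/v1/audio`. Non-2xx responses such as `429` surface as `urllib.error.HTTPError`, which callers may want to catch and retry.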
================================================
FILE: .claude/skills/acestep-docs/api/Openrouter_API.md
================================================

# ACE-Step OpenRouter API Documentation

> OpenAI Chat Completions-compatible API for AI music generation

**Base URL:** `http://{host}:{port}` (default `http://127.0.0.1:8002`)

---

## Table of Contents

- [Authentication](#authentication)
- [Endpoints](#endpoints)
  - [POST /v1/chat/completions - Generate Music](#1-generate-music)
  - [GET /api/v1/models - List Models](#2-list-models)
  - [GET /health - Health Check](#3-health-check)
- [Input Modes](#input-modes)
- [Streaming Responses](#streaming-responses)
- [Examples](#examples)
- [Error Codes](#error-codes)

---

## Authentication

If the server is configured with an API key (via the `OPENROUTER_API_KEY` environment variable or `--api-key` CLI flag), all requests must include the following header:

```
Authorization: Bearer <api-key>
```

No authentication is required when no API key is configured.

---

## Endpoints

### 1. Generate Music

**POST** `/v1/chat/completions`

Generates music from chat messages and returns audio data along with LM-generated metadata.

#### Request Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | No | `"acemusic/acestep-v1.5-turbo"` | Model ID |
| `messages` | array | **Yes** | - | Chat message list. See [Input Modes](#input-modes) |
| `stream` | boolean | No | `false` | Enable streaming response. See [Streaming Responses](#streaming-responses) |
| `temperature` | float | No | `0.85` | LM sampling temperature |
| `top_p` | float | No | `0.9` | LM nucleus sampling parameter |
| `lyrics` | string | No | `""` | Lyrics passed directly (takes priority over lyrics parsed from messages) |
| `duration` | float | No | `null` | Audio duration in seconds. If omitted, determined automatically by the LM |
| `bpm` | integer | No | `null` | Beats per minute. If omitted, determined automatically by the LM |
| `vocal_language` | string | No | `"en"` | Vocal language code (e.g. `"zh"`, `"en"`, `"ja"`) |
| `instrumental` | boolean | No | `false` | Whether to generate instrumental-only music (no vocals) |
| `thinking` | boolean | No | `false` | Enable LLM thinking mode for deeper reasoning |
| `use_cot_metas` | boolean | No | `true` | Auto-generate BPM, duration, key, time signature via Chain-of-Thought |
| `use_cot_caption` | boolean | No | `true` | Rewrite/enhance the music description via Chain-of-Thought |
| `use_cot_language` | boolean | No | `true` | Auto-detect vocal language via Chain-of-Thought |
| `use_format` | boolean | No | `true` | When prompt/lyrics are provided directly, enhance them via LLM formatting |

> **Note on LM parameters:** `use_format` applies when the user provides explicit prompt/lyrics (tagged or lyrics mode) and enhances the description and lyrics formatting via LLM. The `use_cot_*` parameters control Phase 1 CoT reasoning during the audio generation stage. When `use_format` or sample mode has already generated a duration, `use_cot_metas` is automatically skipped to avoid redundancy.

#### messages Format

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Your input content"
    }
  ]
}
```

Set `role` to `"user"` and `content` to the text input. The system automatically determines the input mode based on the content. See [Input Modes](#input-modes) for details.

---

#### Non-Streaming Response (`stream: false`)

```json
{
  "id": "chatcmpl-a1b2c3d4e5f6g7h8",
  "object": "chat.completion",
  "created": 1706688000,
  "model": "acemusic/acestep-v1.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "## Metadata\n**Caption:** Upbeat pop song...\n**BPM:** 120\n**Duration:** 30s\n**Key:** C major\n\n## Lyrics\n[Verse 1]\nHello world...",
        "audio": [
          {
            "type": "audio_url",
            "audio_url": {
              "url": "data:audio/mpeg;base64,SUQzBAAAAAAAI1RTU0UAAAA..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 100,
    "total_tokens": 110
  }
}
```

**Response Fields:**

| Field | Description |
|---|---|
| `choices[0].message.content` | Text information generated by the LM, including Metadata (Caption, BPM, Duration, Key, Time Signature, Language) and Lyrics. Returns `"Music generated successfully."` if LM was not involved |
| `choices[0].message.audio` | Audio data array. Each item contains `type` (`"audio_url"`) and `audio_url.url` (Base64 Data URL in format `data:audio/mpeg;base64,...`) |
| `choices[0].finish_reason` | `"stop"` indicates normal completion |

**Decoding Audio:**

The `audio_url.url` value is a Data URL: `data:audio/mpeg;base64,<base64_data>`

Extract the base64 portion after the comma and decode it to get the MP3 file:

```python
import base64

url = response["choices"][0]["message"]["audio"][0]["audio_url"]["url"]
# Strip the "data:audio/mpeg;base64," prefix
b64_data = url.split(",", 1)[1]
audio_bytes = base64.b64decode(b64_data)

with open("output.mp3", "wb") as f:
    f.write(audio_bytes)
```

```javascript const url = response.choices[0].message.audio[0].audio_url.url; const b64Data = url.split(",")[1]; const audioBytes = atob(b64Data); // Or use the Data URL directly in an