<h1 align="center">ACE-Step 1.5</h1>
<h1 align="center">Pushing the Boundaries of Open-Source Music Generation</h1>
<p align="center">
    <a href="https://ace-step.github.io/ace-step-v1.5.github.io/">Project</a> |
    <a href="https://huggingface.co/collections/ACE-Step/ace-step-15">Hugging Face</a> |
    <a href="https://modelscope.cn/models/ACE-Step/ACE-Step-v1-5">ModelScope</a> |
    <a href="https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5">Space Demo</a> |
    <a href="https://discord.gg/PeWDxrkdj7">Discord</a> |
    <a href="https://arxiv.org/abs/2506.00045">Technical Report</a>
</p>

<p align="center">
    <img src="./assets/orgnization_logos.png" width="100%" alt="StepFun Logo">
</p>

## Table of Contents

- [✨ Features](#-features)
- [📦 Installation](#-installation)
- [🚀 Usage](#-usage)
- [🔨 Train](#-train)
- [🏗️ Architecture](#️-architecture)
- [🦁 Model Zoo](#-model-zoo)

## 📝 Abstract
🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast: under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities (such as cover generation, repainting, and vocal-to-BGM conversion) while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸


## ✨ Features

<p align="center">
    <img src="./assets/application_map.png" width="100%" alt="ACE-Step Framework">
</p>

### ⚡ Performance
- ✅ **Ultra-Fast Generation**: Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)
- ✅ **Flexible Duration**: Supports 10 seconds to 10 minutes (600s) audio generation
- ✅ **Batch Generation**: Generate up to 8 songs simultaneously
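
The duration and batch limits above can be checked before a request is submitted. The following is a minimal Python sketch, not part of the ACE-Step codebase: `GenerationRequest` and `validate_request` are hypothetical names, and only the numeric limits (10 to 600 seconds, up to 8 songs per batch) come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the feature list; the constant names are illustrative.
MIN_DURATION_S = 10
MAX_DURATION_S = 600
MAX_BATCH_SIZE = 8


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the two limits discussed above."""
    duration_s: float
    batch_size: int = 1


def validate_request(req: GenerationRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be between {MIN_DURATION_S}s and {MAX_DURATION_S}s"
        )
    if not 1 <= req.batch_size <= MAX_BATCH_SIZE:
        problems.append(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    return problems


if __name__ == "__main__":
    print(validate_request(GenerationRequest(duration_s=180, batch_size=4)))
    print(validate_request(GenerationRequest(duration_s=5, batch_size=9)))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once.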

### 🎵 Generation Quality
- ✅ **Commercial-Grade Output**: Quality beyond most commercial music models (between Suno v4.5 and Suno v5)
- ✅ **Rich Style Support**: 1000+ instruments and styles with fine-grained timbre description
- ✅ **Multi-Language Lyrics**: Supports 50+ languages with lyrics prompt for structure & style control

### 🎛️ Versatility & Control

| Feature | Description |
|---------|-------------|
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation & training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |



## 📦 Installation

> **Requirements:** Python 3.11, CUDA GPU recommended (works on CPU/MPS but slower)

### 1. Install uv (Package Manager)

```bash
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

### 2. Clone & Install

```bash
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync
```

### 3. Launch

#### 🖥️ Gradio Web UI (Recommended)

```bash
uv run acestep
```

Open http://localhost:7860 in your browser. Models will be downloaded automatically on first run.

#### 🌐 REST API Server

```bash
uv run acestep-api
```

API runs at http://localhost:8001. See [API Documentation](./docs/en/API.md) for endpoints.
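
Before sending requests, it can help to confirm that the server is accepting connections. The sketch below only tests TCP reachability with the Python standard library and assumes nothing about the API's endpoints; `is_port_open` is a hypothetical helper, and the host and port are the defaults quoted above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8001, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"ACE-Step API port 8001 is {state}")
```

A successful connection only shows that something is listening on the port; see the API documentation for the actual endpoints.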

### Command Line Options

**Gradio UI (`acestep`):**

| Option | Default | Description |
|--------|---------|-------------|
| `--port` | 7860 | Server port |
| `--server-name` | 127.0.0.1 | Server address (use `0.0.0.0` for network access) |
| `--share` | false | Create public Gradio link |
| `--language` | en | UI language: `en`, `zh`, `ja` |
| `--init_service` | false | Auto-initialize models on startup |
| `--config_path` | auto | DiT model (e.g., `acestep-v15-turbo`, `acestep-v15-turbo-shift3`) |
| `--lm_model_path` | auto | LM model (e.g., `acestep-5Hz-lm-0.6B`, `acestep-5Hz-lm-1.7B`) |
| `--offload_to_cpu` | auto | CPU offload (auto-enabled if VRAM < 16GB) |

**Examples:**

```bash
# Public access with Chinese UI
uv run acestep --server-name 0.0.0.0 --share --language zh

# Pre-initialize models on startup
uv run acestep --init_service true --config_path acestep-v15-turbo
```
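
When the UI is launched from a script, the same command lines can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_acestep_command` is a hypothetical helper, it only uses flags and defaults listed in the table above, and it follows the examples in passing `--share` as a bare flag and `--init_service` with the value `true`.

```python
import shlex
from typing import List, Optional

SUPPORTED_LANGUAGES = ("en", "zh", "ja")  # values listed for --language


def build_acestep_command(
    *,
    port: int = 7860,
    server_name: str = "127.0.0.1",
    share: bool = False,
    language: str = "en",
    init_service: bool = False,
    config_path: Optional[str] = None,
    lm_model_path: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `uv run acestep`, omitting options left at their defaults."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported UI language: {language!r}")
    cmd = ["uv", "run", "acestep"]
    if port != 7860:
        cmd += ["--port", str(port)]
    if server_name != "127.0.0.1":
        cmd += ["--server-name", server_name]
    if share:
        cmd.append("--share")
    if language != "en":
        cmd += ["--language", language]
    if init_service:
        cmd += ["--init_service", "true"]
    if config_path:
        cmd += ["--config_path", config_path]
    if lm_model_path:
        cmd += ["--lm_model_path", lm_model_path]
    return cmd


if __name__ == "__main__":
    cmd = build_acestep_command(server_name="0.0.0.0", share=True, language="zh")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd) to actually launch
```

Returning a list instead of a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.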

### Development

```bash
# Add dependencies
uv add package-name
uv add --dev package-name

# Update all dependencies
uv sync --upgrade
```

## 🚀 Usage

We provide multiple ways to use ACE-Step:

| Method | Description | Documentation |
|--------|-------------|---------------|
| 🖥️ **Gradio Web UI** | Interactive web interface for music generation | [Gradio Guide](./docs/en/GRADIO_GUIDE.md) |
| 🐍 **Python API** | Programmatic access for integration | [Inference API](./docs/en/INFERENCE.md) |
| 🌐 **REST API** | HTTP-based async API for services | [REST API](./docs/en/API.md) |

**📚 Documentation available in:** [English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/)


## 🔨 Train

See the **LoRA Training** tab in the Gradio UI for one-click training, or check [Gradio Guide - LoRA Training](./docs/en/GRADIO_GUIDE.md#lora-training) for details.

## 🏗️ Architecture

<p align="center">
    <img src="./assets/ACE-Step_framework.png" width="100%" alt="ACE-Step Framework">
</p>

## 🦁 Model Zoo

<p align="center">
    <img src="./assets/model_zoo.png" width="100%" alt="Model Zoo">
</p>

### DiT Models

| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-base) |
| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-sft) |
| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
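
The table can also be read as a lookup problem: given a task and a step budget, which checkpoints qualify? The sketch below transcribes the Step, CFG, task, and release columns into a plain Python mapping. `DiTVariant`, `DIT_VARIANTS`, and `candidates` are hypothetical names for illustration and do not come from the ACE-Step package.

```python
from typing import List, NamedTuple, Optional

# Task columns of the table, lower-cased for use as keys.
BASIC_TASKS = frozenset({"text2music", "cover", "repaint"})
EXTRA_TASKS = frozenset({"extract", "lego", "complete"})


class DiTVariant(NamedTuple):
    steps: int        # "Step" column
    uses_cfg: bool    # "CFG" column
    tasks: frozenset  # task columns marked as supported
    released: bool    # False where the table says "To be released"


DIT_VARIANTS = {
    "acestep-v15-base": DiTVariant(50, True, BASIC_TASKS | EXTRA_TASKS, True),
    "acestep-v15-sft": DiTVariant(50, True, BASIC_TASKS, True),
    "acestep-v15-turbo": DiTVariant(8, False, BASIC_TASKS, True),
    "acestep-v15-turbo-rl": DiTVariant(8, False, BASIC_TASKS, False),
}


def candidates(task: str, max_steps: Optional[int] = None) -> List[str]:
    """Names of released variants that support `task` within an optional step budget."""
    return [
        name
        for name, variant in DIT_VARIANTS.items()
        if variant.released
        and task in variant.tasks
        and (max_steps is None or variant.steps <= max_steps)
    ]


if __name__ == "__main__":
    print(candidates("lego"))
    print(candidates("text2music", max_steps=8))
```

Under this reading, only the base model covers the Extract, Lego, and Complete tasks, while the turbo model is the only released option at 8 steps.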

### LM Models

| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|
| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |

## 📜 License & Disclaimer

This project is licensed under the [MIT License](./LICENSE).

ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.

🔔 Important Notice  
The only official website for the ACE-Step project is our GitHub Pages site.  
We do not operate any other websites.  
🚫 Fake domains include but are not limited to:
ac\*\*p.com, a\*\*p.org, a\*\*\*c.org  
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.

## 🙏 Acknowledgements

This project is co-led by ACE Studio and StepFun.


## 📖 Citation

If you find this project useful for your research, please consider citing:

```BibTeX
@misc{gong2026acestep,
	title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
	author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
	howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
	year={2026},
	note={GitHub repository}
}
```