Duplicated from tencent/SongGeneration

	---
	language:
	- en
	- zh
	pipeline_tag: text-to-audio
	library_name: tencent-song-generation
	---

	# SongGeneration

	<p align="center"><img src="img/logo.jpg" width="40%"></p>
	<p align="center">
	<a href="https://levo-demo.github.io/">Demo</a>  |  <a href="https://arxiv.org/abs/2506.07520">Paper</a>  |  <a href="https://github.com/tencent-ailab/songgeneration">Code</a>  |  <a href="https://huggingface.co/spaces/tencent/SongGeneration">Space Demo</a>
	</p>


	This is the official weight repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment. It provides the SongGeneration model, inference scripts, and a checkpoint trained on the Million Song Dataset.

	## Model Versions

	| Model | Max Length | Language | GPU Memory | RTF(H20) | Download Link |
	| ------------------------ | :--------: | :------------------: | :--------: | :------: | ------------------------------------------------------------ |
	| SongGeneration-base | 2m30s | zh | 10G/16G | 0.67 | [Huggingface](https://huggingface.co/tencent/SongGeneration/tree/main/ckpt/songgeneration_base) |
	| SongGeneration-base-new | 2m30s | zh, en | 10G/16G | 0.67 | [Huggingface](https://huggingface.co/lglg666/SongGeneration-base-new) |
	| SongGeneration-base-full | 4m30s | zh, en | 12G/18G | 0.69 | [Huggingface](https://huggingface.co/lglg666/SongGeneration-base-full) |
	| SongGeneration-large | 4m30s | zh, en | 22G/28G | 0.82 | [Huggingface](https://huggingface.co/lglg666/SongGeneration-large) |
	| SongGeneration-v2-large | 4m30s | zh, en, es, ja, etc. | 22G/28G | 0.82 | [Huggingface](https://huggingface.co/lglg666/SongGeneration-v2-large) |
	| SongGeneration-v2-medium | 4m30s | zh, en, es, ja, etc. | 12G/18G | 0.69 | Coming soon |
	| SongGeneration-v2-fast | 4m30s | zh, en, es, ja, etc. | - | - | Coming soon |
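
To make the table easier to act on, here is a minimal Python sketch that filters the listed models by available GPU memory and estimates generation time from the RTF column. It rests on several assumptions that the table does not confirm: the two "GPU Memory" figures are not explained, so the sketch conservatively uses the larger one; RTF is assumed to mean generation time divided by audio duration on an H20 GPU; and the names `ModelInfo`, `downloadable_models`, `estimated_seconds`, and `download` are hypothetical helpers, not part of this repository. The download step uses `snapshot_download` from the `huggingface_hub` package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    max_seconds: int        # "Max Length" column converted to seconds
    gpu_gb: Optional[int]   # larger of the two "GPU Memory" figures (assumption)
    rtf: Optional[float]    # "RTF(H20)" column; None when the table shows "-"
    repo_id: Optional[str]  # Hugging Face repo; None when "Coming soon"


# Values transcribed from the Model Versions table.
MODELS = [
    ModelInfo("SongGeneration-base", 150, 16, 0.67, "tencent/SongGeneration"),
    ModelInfo("SongGeneration-base-new", 150, 16, 0.67, "lglg666/SongGeneration-base-new"),
    ModelInfo("SongGeneration-base-full", 270, 18, 0.69, "lglg666/SongGeneration-base-full"),
    ModelInfo("SongGeneration-large", 270, 28, 0.82, "lglg666/SongGeneration-large"),
    ModelInfo("SongGeneration-v2-large", 270, 28, 0.82, "lglg666/SongGeneration-v2-large"),
    ModelInfo("SongGeneration-v2-medium", 270, 18, 0.69, None),
    ModelInfo("SongGeneration-v2-fast", 270, None, None, None),
]


def downloadable_models(available_gb: float) -> List[ModelInfo]:
    """Return released models whose larger memory figure fits in available_gb."""
    return [
        m for m in MODELS
        if m.repo_id is not None and m.gpu_gb is not None and m.gpu_gb <= available_gb
    ]


def estimated_seconds(model: ModelInfo, song_seconds: float) -> float:
    """Estimate generation time, assuming RTF = generation time / audio duration."""
    if model.rtf is None:
        raise ValueError(f"No RTF listed for {model.name}")
    if song_seconds > model.max_seconds:
        raise ValueError(f"{model.name} supports at most {model.max_seconds} s")
    return model.rtf * song_seconds


def download(model: ModelInfo, local_dir: str) -> str:
    """Download a released model repo; requires the huggingface_hub package."""
    if model.repo_id is None:
        raise ValueError(f"{model.name} has not been released yet")
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    # Note: SongGeneration-base lives in a subfolder (ckpt/songgeneration_base)
    # of tencent/SongGeneration, so this call fetches the whole repository.
    return snapshot_download(repo_id=model.repo_id, local_dir=local_dir)


if __name__ == "__main__":
    for m in downloadable_models(24):
        print(m.name, f"~{estimated_seconds(m, m.max_seconds):.0f} s for a full-length song")
```

Under these assumptions, a 24 GB card admits the three base variants, and a full 4m30s song on SongGeneration-v2-large would take roughly 221 seconds on an H20.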

	## Overview

	🚀 We introduce LeVo 2 (SongGeneration 2), an open-source music foundation model designed to shatter the ceiling of open-source AI music by achieving true commercial-grade generation.

	In a large-scale expert evaluation (20 industry professionals, 6 core dimensions, 100 songs per model), LeVo 2 shows the following strengths:

	- 🏆 Commercial-Grade Musicality: Comprehensively outperforms all open-source baselines across Overall Quality, Melody, Arrangement, Sound Quality, and Structure. Its subjective generation quality successfully rivals top-tier closed-source commercial systems (e.g., MiniMax 2.5).
	- 🎯 Precise Lyric Accuracy: Achieves a Phoneme Error Rate (PER) of 8.55% (lower is better), substantially reducing lyric hallucination. This is lower than top commercial models such as Suno v5 (12.4%) and Mureka v8 (9.96%).
	- 🎛️ Exceptional Controllability: Highly responsive to multi-modal instructions, including text descriptions and audio prompts, allowing for precise control over the generated music.
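
To put the PER figures above in perspective, the following sketch computes the relative error reduction against each commercial baseline. The PER values are taken from the list above; the helper name `relative_reduction` is hypothetical, and relative reduction is just one illustrative way to compare error rates, not a metric reported by the authors.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Relative reduction in error rate versus a baseline, in percent."""
    return (baseline - ours) / baseline * 100.0


LEVO2_PER = 8.55  # Phoneme Error Rate reported for LeVo 2, in percent
BASELINE_PER = {"Suno v5": 12.4, "Mureka v8": 9.96}  # reported baseline PERs, in percent

if __name__ == "__main__":
    for name, per in BASELINE_PER.items():
        print(f"{name}: {relative_reduction(LEVO2_PER, per):.1f}% relative PER reduction")
```

By this measure, the reported PER is about 31% lower than Suno v5 and about 14% lower than Mureka v8.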

	📊 For detailed experimental setups and comprehensive metrics, please refer to the [Evaluation Performance](#Evaluation-Performance) section below or our upcoming technical report.

	<img src="img/output.png" alt="img" style="zoom:100%;" />

	## License

	The code and weights in this repository are released under the terms in the [LICENSE](LICENSE) file.