APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
Abstract
A large-scale multi-task learning framework for AI-generated music predicts both popularity and aesthetic quality using frozen audio embeddings from a self-supervised music understanding model, demonstrating strong generalization across different generative architectures.
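As a rough illustration of the multi-task setup summarised above, the sketch below trains one linear head per task on top of a fixed (frozen) embedding vector. It is a minimal sketch under stated assumptions, not the paper's architecture: the head design, the squared-error loss, the equal task weights, and the `aesthetic_1` … `aesthetic_5` names are all hypothetical placeholders, and the embedding here is a hand-written toy vector rather than a real MERT output.

```python
import random

# Hypothetical task names: two popularity targets plus five aesthetic
# dimensions. The aesthetic names are placeholders (assumption); the
# abstract does not list them.
TASKS = ["streams", "likes"] + [f"aesthetic_{i}" for i in range(1, 6)]


def make_heads(embed_dim, seed=0):
    """Create one linear head (weights, bias) per task."""
    rng = random.Random(seed)
    return {
        t: ([rng.gauss(0.0, 0.01) for _ in range(embed_dim)], 0.0)
        for t in TASKS
    }


def predict(heads, embedding):
    """Apply every task head to the same frozen embedding."""
    return {
        t: sum(w * x for w, x in zip(ws, embedding)) + b
        for t, (ws, b) in heads.items()
    }


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of per-task squared errors (equal weights by default)."""
    weights = weights or {t: 1.0 for t in TASKS}
    return sum(weights[t] * (preds[t] - targets[t]) ** 2 for t in TASKS)


def sgd_step(heads, embedding, targets, lr=0.01):
    """One gradient step on the heads only; the embedding is never updated."""
    preds = predict(heads, embedding)
    for t, (ws, b) in heads.items():
        grad = 2.0 * (preds[t] - targets[t])
        heads[t] = (
            [w - lr * grad * x for w, x in zip(ws, embedding)],
            b - lr * grad,
        )
    # Loss is reported for the predictions made before this update.
    return multitask_loss(preds, targets)


if __name__ == "__main__":
    toy_embedding = [0.5, -0.25, 1.0, 0.1]   # stand-in for a frozen embedding
    toy_targets = {t: 1.0 for t in TASKS}    # made-up regression targets
    heads = make_heads(len(toy_embedding))
    for step in range(200):
        loss = sgd_step(heads, toy_embedding, toy_targets)
    print(f"loss after training: {loss:.6f}")
```

Keeping the embedding fixed means only the small per-task heads are trained, which is what makes a frozen-encoder setup cheap to fit on a large corpus.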
Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. Aesthetic quality is a key yet unexplored factor in this setting. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio. APEX jointly predicts engagement-based popularity signals (streams and likes scores) alongside five perceptual aesthetic quality dimensions, using frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, adding aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
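The pairwise preference evaluation described in the abstract can be pictured as follows: each battle pairs two songs with a human-chosen winner, and a scoring function is judged by how often it ranks the winner higher. This is a minimal sketch under assumptions; the field names `pop` and `aes`, the additive combination of the two scores, and all numbers are invented toy values that only show the mechanics. They are not data or results from the paper or from Music Arena.

```python
def pairwise_accuracy(battles, score):
    """Fraction of battles where `score` ranks the human-preferred song higher.

    battles: list of (song_a, song_b, winner) with winner in {"a", "b"}.
    score:   function mapping a song record to a single number.
    """
    correct = 0
    for song_a, song_b, winner in battles:
        predicted = "a" if score(song_a) > score(song_b) else "b"
        correct += predicted == winner
    return correct / len(battles)


def popularity_only(song):
    """Score a song by its (hypothetical) predicted popularity alone."""
    return song["pop"]


def popularity_plus_aesthetics(song):
    """Score a song by popularity plus a (hypothetical) aesthetic score."""
    return song["pop"] + song["aes"]


# Toy battles with made-up scores, for illustration only.
TOY_BATTLES = [
    ({"pop": 0.9, "aes": 0.2}, {"pop": 0.5, "aes": 0.9}, "b"),
    ({"pop": 0.8, "aes": 0.7}, {"pop": 0.3, "aes": 0.4}, "a"),
    ({"pop": 0.4, "aes": 0.6}, {"pop": 0.6, "aes": 0.1}, "a"),
]

if __name__ == "__main__":
    print(pairwise_accuracy(TOY_BATTLES, popularity_only))
    print(pairwise_accuracy(TOY_BATTLES, popularity_plus_aesthetics))
```

Comparing the two scoring functions on the same battles is one simple way to check whether aesthetic features add information beyond popularity; how the paper actually combines the features is not stated in the abstract.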
Community
A large-scale, aesthetics-informed hit prediction model for AI-generated music that predicts streams and likes scores.