Audio-to-Audio

PianoKontext: Expressive Performance Rendering from Deadpan Context

PianoKontext is a flow-matching rendering model for classical piano music. It generates variable-length performances in the latent space of a pretrained Music2Latent model.
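A minimal sketch of a flow-matching training objective and Euler sampler. It assumes the common linear (rectified-flow) path from Gaussian noise to data; the exact path, time sampling, and solver used by PianoKontext are not stated here. `velocity_fn` is a hypothetical stand-in for the DiT velocity network and `cond` for its conditioning. Latents are shaped (frames, channels), so the frame count can differ between examples.

```python
import numpy as np

def interpolate(x0, x1, t):
    # Linear path between noise x0 (t = 0) and data x1 (t = 1); assumed, not from the paper
    return (1.0 - t) * x0 + t * x1

def fm_target(x0, x1):
    # Velocity of the linear path, constant in t
    return x1 - x0

def fm_loss(velocity_fn, x1, cond, rng):
    # One Monte Carlo sample of the flow-matching regression loss
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t, cond)
    return np.mean((pred - fm_target(x0, x1)) ** 2)

def euler_sample(velocity_fn, shape, cond, steps, rng):
    # Integrate dx/dt = v(x, t, cond) from noise at t = 0 to t = 1
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x
```

As a sanity check, the oracle velocity `(cond - x) / (1 - t)` transports every sample to a single target `cond`: the loss is zero up to rounding when the data equals `cond`, and Euler sampling lands on `cond` at the last step. A trained network would instead predict the velocity from the conditioning.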

Description

Expressive performance rendering (EPR) aims to generate realistic performances conditioned on a sequence of notes. PianoKontext addresses the limitations of existing flow-matching models with respect to expressive timing by generating variable-length performances. To construct paired training data, it synthesizes MIDI scores into deadpan audio and applies Dynamic Time Warping (DTW) in the latent space to align each deadpan rendering with its performance. The aligned embeddings are concatenated in DiT blocks, allowing the model to learn the dependencies between the score and the expressive performance.

Citation

If you use this model in your research, please cite the following paper:

@article{gavrilev2026pianokontext,
  title={PianoKontext: Expressive Performance Rendering from Deadpan Context},
  author={Dmitrii Gavrilev},
  journal={arXiv preprint arXiv:2606.12282},
  year={2026}
}
Paper for realfolkcode/PianoKontext