	---
	license: apache-2.0
	pipeline_tag: feature-extraction
	---

	# Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

	<div align="center">

	<img width="640" height="320" alt="soprano-github" src="https://github.com/user-attachments/assets/4d612eac-23b8-44e6-8c59-d7ac14ebafd1" />

	[![GitHub Repo](https://img.shields.io/badge/Github-Repo-black?logo=github)](https://github.com/ekwek1/soprano)
	[![Hugging Face Demo](https://img.shields.io/badge/HuggingFace-Demo-yellow?logo=huggingface)](https://huggingface.co/spaces/ekwek/Soprano-TTS)
	</div>

	### 📰 News
	- 2026.01.13 - [Soprano-Factory](https://github.com/ekwek1/soprano-factory) released! You can now train/fine-tune your own Soprano models.
	- 2025.12.22 - Soprano-80M released! [Code](https://github.com/ekwek1/soprano) \| [Demo](https://huggingface.co/spaces/ekwek/Soprano-TTS)

	---

	This repository contains Soprano-Encoder, which converts raw audio into audio tokens that the LLM backbone can recognize.

	## Overview

	Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Its key features are:
	- Up to 2000x real-time generation on GPU and 20x real-time on CPU
	- Lossless streaming with <15 ms latency on GPU, <250 ms on CPU
	- <1 GB memory usage with a compact 80M parameter architecture
	- Infinite generation length with automatic text splitting
	- Highly expressive, crystal-clear audio generation at 32 kHz
	- Widespread support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
	- WebUI, CLI, and an OpenAI-compatible endpoint for easy, production-ready inference
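
To make the real-time figures in the list concrete, here is a small arithmetic sketch. It assumes "Nx real-time" means N seconds of audio are produced per second of compute. The function name `synthesis_time_s` is hypothetical and not part of the Soprano API, and the results are best-case estimates derived from the "up to" figures, not measurements.

```python
def synthesis_time_s(audio_duration_s: float, realtime_multiple: float) -> float:
    """Estimate compute time needed to generate `audio_duration_s` seconds of audio.

    Assumes `realtime_multiple` means seconds of audio produced per second
    of compute (e.g. 2000 for "2000x real-time").
    """
    if realtime_multiple <= 0:
        raise ValueError("realtime_multiple must be positive")
    return audio_duration_s / realtime_multiple


# One minute of speech at the quoted peak speeds (best case, per the list above).
gpu_time = synthesis_time_s(60.0, 2000.0)  # GPU: up to 2000x real-time
cpu_time = synthesis_time_s(60.0, 20.0)    # CPU: 20x real-time
print(f"GPU: {gpu_time * 1000:.0f} ms, CPU: {cpu_time:.1f} s")
```

Under these assumptions, a minute of audio takes about 30 ms on GPU and about 3 s on CPU. Actual throughput depends on hardware, batch size, and text length.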

	---