Sitong CHENG

cmots

11 17

AI & ML interests

None yet

Recent Activity

upvoted a paper 13 days ago

STEB: A Speech-to-Speech Translation Expressiveness Benchmark for Evaluating Beyond Translation Fidelity

updated a dataset 15 days ago

cmots/STEB

published a dataset 15 days ago

cmots/STEB


upvoted a paper 13 days ago

STEB: A Speech-to-Speech Translation Expressiveness Benchmark for Evaluating Beyond Translation Fidelity

Paper • 2606.25529 • Published 14 days ago • 1

upvoted a collection about 1 month ago

Cosmos3

Collection

Omnimodal World Models for Physical AI • 20 items • Updated 1 day ago • 139

upvoted an article about 1 month ago

Article

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

nvidia • Jun 1 • 85

upvoted a paper about 1 month ago

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Paper • 2606.02482 • Published Jun 1 • 36

upvoted a paper 7 months ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 68

upvoted a paper 8 months ago

Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published Nov 19, 2025 • 60

upvoted 2 papers 9 months ago

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Paper • 2510.09606 • Published Oct 10, 2025 • 18

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

Paper • 2509.21144 • Published Sep 25, 2025 • 1

upvoted a paper over 1 year ago

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published Feb 23, 2025 • 36

upvoted 2 papers about 2 years ago

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8, 2024 • 17

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published Apr 30, 2024 • 17
