Yifan Peng

pyf98

https://pyf98.github.io

AI & ML interests

Multimodal LLMs, Speech-to-Speech, Speech Recognition

Recent Activity

authored 2 papers about 14 hours ago

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Paper • 2505.24200 • Published May 30, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Paper • 2502.15218 • Published Feb 21, 2025

authored a paper about 15 hours ago

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Paper • 2604.24954 • Published 3 days ago • 6

liked a model 1 day ago

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

Any-to-Any • 33B • Updated about 20 hours ago • 9.82k • 142

liked a dataset 2 months ago

inclusionAI/AudioMCQ

Viewer • Updated 17 days ago • 571k • 1.17k • 16

liked a model 3 months ago

espnet/owsm_ctc_v3.2_ft_1B

Automatic Speech Recognition • Updated 22 days ago • 24 • 5

liked a Space 6 months ago

The Smol Training Playbook

📚

3.13k

The secrets to building world-class LLMs

New activity in espnet/yodas_owsmv4 6 months ago

When data will be published?

#2 opened 11 months ago by Yehor

updated a dataset 8 months ago

espnet/yodas_owsmv4

Viewer • Updated Sep 1, 2025 • 4 • 977 • 17

updated 9 models 8 months ago

updated a collection 8 months ago

Open Whisper-style Speech Models (OWSM)

Collection

Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 22 items • Updated Aug 30, 2025 • 6

liked a Space 8 months ago

OWSM V4 Demo

🌍

This is a demo for OWSM-V4 CTC and medium model.