Zuhao Yang

mwxely

https://mwxely.github.io/

AI & ML interests

Large Multimodal Models

Recent Activity

upvoted a paper 3 days ago

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

authored a paper 5 days ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

commented on a paper 8 days ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

upvoted a paper 3 days ago

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

Paper • 2604.28123 • Published 8 days ago • 42

upvoted a paper 8 days ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Paper • 2604.28185 • Published 9 days ago • 86

upvoted a paper about 1 month ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

upvoted 3 papers about 2 months ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published Mar 16 • 186

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 371

DVD: Deterministic Video Depth Estimation with Generative Priors

Paper • 2603.12250 • Published Mar 12 • 26

upvoted 2 papers 2 months ago

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Paper • 2603.03756 • Published Mar 4 • 89

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

upvoted 2 papers 3 months ago

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published Feb 4 • 50

upvoted 6 papers 4 months ago

XR: Cross-Modal Agents for Composed Image Retrieval

Paper • 2601.14245 • Published Jan 20 • 8

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Paper • 2601.09688 • Published Jan 14 • 127

On the Role of Discreteness in Diffusion LLMs

Paper • 2512.22630 • Published Dec 27, 2025 • 18

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 324

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published Dec 9, 2025 • 124

upvoted 4 papers 5 months ago

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 68

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 68

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 276

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 233