GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks Paper • 2509.23738 • Published Sep 28, 2025 • 2
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation Paper • 2509.23736 • Published Sep 28, 2025 • 2
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert Paper • 2510.03896 • Published Oct 4, 2025
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published 16 days ago • 25
Exploring Spatial Intelligence from a Generative Perspective Paper • 2604.20570 • Published 3 days ago • 21
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published 22 days ago • 232
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 109
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention Paper • 2602.05847 • Published Feb 5 • 12
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO Paper • 2602.06422 • Published Feb 6 • 47
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models Paper • 2601.07351 • Published Jan 12 • 26
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51