Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration Paper • 2605.28184 • Published 3 days ago • 4
Taming Modality Entanglement in Continual Audio-Visual Segmentation Paper • 2510.17234 • Published Oct 20, 2025 • 5
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering Paper • 2510.14605 • Published Oct 16, 2025 • 5
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28, 2025 • 110
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28, 2025 • 46
Faster and Better LLMs via Latency-Aware Test-Time Scaling Paper • 2505.19634 • Published May 26, 2025
Vision Language Models (Better, faster, stronger) Article • merve, sergiopaniego, ariG23498, pcuenq, andito +3 • May 12, 2025 • 613
Continuous Speculative Decoding for Autoregressive Image Generation Paper • 2411.11925 • Published Nov 18, 2024 • 16
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 10, 2024 • 16
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation Paper • 2408.01708 • Published Aug 3, 2024 • 4
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26, 2024 • 17
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 53