NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 15 days ago • 34
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published 11 days ago • 32
Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models nvidia • Jul 18, 2025 • 51
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published Apr 9 • 101
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published Mar 12 • 23
Qwen3.5 Collection Qwen3.5 is Qwen's new model family including Qwen3.5 Small: 0.8B, 2B, 4B, 9B and Qwen3.5 Medium: 35B-A3B, 27B, 122B-A10B and 397B-A17B. • 25 items • Updated 2 days ago • 156
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device Paper • 2602.20161 • Published Feb 23 • 23
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models Paper • 2602.18993 • Published Feb 22 • 4
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 12 • 83
timm DINOv3 Collection Meta AI's DINOv3 weights in timm. ViTs with `qkvb` have a zero QV bias present; otherwise bias is disabled. QKV biases are all 0 in the original weights. • 18 items • Updated Sep 19, 2025 • 37
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 80
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space Paper • 2602.02092 • Published Feb 2 • 18
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 228
TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization Paper • 2512.09196 • Published Dec 9, 2025 • 1