LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published 5 days ago
GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF Image-Text-to-Text • 35B • Updated 2 days ago
WebWorld: A Large-Scale World Model for Web Agent Training Paper • 2602.14721 • Published Feb 16
Self-Improving Pretraining: Using Post-Trained Models to Pretrain Better Models Paper • 2601.21343 • Published Jan 29
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 15 days ago
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data Paper • 2604.19859 • Published 23 days ago