LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 5 days ago • 124
Audio-Visual Intelligence in Large Foundation Models • Paper • 2605.04045 • Published 26 days ago • 35
Qwen Image Multiple Angles 3D Camera • Space • 2.51k • Transform image viewpoint with adjustable camera angles
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization • Paper • 2511.23002 • Published Nov 28, 2025 • 26
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • Paper • 2507.01006 • Published Jul 1, 2025 • 256
JarvisArt Preview • Space • 105 • Generate Lightroom presets from images and prompts