Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment Paper • 2604.00913 • Published 2 days ago • 3
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 5 days ago • 127
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval Paper • 2603.12824 • Published 21 days ago • 5
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction Paper • 2603.09930 • Published 24 days ago
NanoVDR: A 70M Text-Only Model That Retrieves Visual Documents as Well as a 2B VLM Article • Published 18 days ago • 3