facebook/dinov3-vitl16-pretrain-lvd1689m Image Feature Extraction • 0.3B • Updated Aug 19, 2025 • 765k • 205
LLM2Vec-Gen: Generative Embeddings from Large Language Models Paper • 2603.10913 • Published 21 days ago • 43
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 243
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads Paper • 2505.15865 • Published May 21, 2025 • 5 • 2
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL Paper • 2502.11438 • Published Feb 17, 2025 • 8 • 2
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10, 2025 • 75