LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs Paper • 2602.00462 • Published Jan 31 • 18
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22, 2025 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13, 2025
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14, 2025
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 106
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks Paper • 2210.12786 • Published Oct 23, 2022
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory Paper • 2307.10768 • Published Jul 20, 2023
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Paper • 2210.07179 • Published Oct 13, 2022 • 3
Learning to Learn: How to Continuously Teach Humans and Machines Paper • 2211.15470 • Published Nov 28, 2022