When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels Paper • 2605.06652 • Published May 7 • 5
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy Paper • 2506.09958 • Published Jun 11, 2025 • 1
Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models Paper • 2505.16647 • Published May 22, 2025 • 1