A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily Paper • 2311.08268 • Published Nov 14, 2023 • 1
FRABench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities Paper • 2505.12795 • Published May 19, 2025
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model Paper • 2602.23622 • Published Feb 27 • 3
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 20 days ago • 143
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published 5 days ago • 11
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels Paper • 2510.18915 • Published Oct 21, 2025 • 7
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 88 items • Updated Mar 2 • 117