A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily Paper • 2311.08268 • Published Nov 14, 2023 • 1
FRABench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities Paper • 2505.12795 • Published May 19, 2025
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model Paper • 2602.23622 • Published Feb 27 • 3
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 20 days ago • 143
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published 5 days ago • 11