Learning Human-Human Interactions in Images from Weak Textual Supervision • Paper • 2304.14104 • Published Apr 27, 2023
MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations • Paper • 2312.03631 • Published Dec 6, 2023
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation • Paper • 2403.01306 • Published Mar 2, 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations • Paper • 2407.08521 • Published Jul 11, 2024
ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline • Paper • 2508.06094 • Published Aug 8, 2025
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech • Paper • 2506.12311 • Published Jun 14, 2025
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding • Paper • 2303.12513 • Published Mar 21, 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models • Paper • 2310.16781 • Published Oct 25, 2023