Model weights for the paper "Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination"
Pritam Sarkar
pritamqu
AI & ML interests
multimodal learning with vision, language, and audio; generative modeling; large multimodal models (LMMs); multimodal LLMs (MLLMs); AI agents; alignments; representation learning; self-supervised and unsupervised learning; vision-language models; audio-visual models; foundation models; computer vision
Recent Activity
liked a dataset, 1 day ago: WHB139426/Grounded-VideoLLM
commented on a paper, 9 months ago: VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
updated a dataset, 9 months ago: pritamqu/VCRBench