Path to Multimodal Generalist

community

https://generalist.top/

path2generalist

AI & ML interests

Multimodal Generalist

Recent Activity

BradNLP submitted a paper 9 days ago

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

BradNLP authored a paper 10 days ago

Reasoning Implicit Sentiment with Chain-of-Thought Prompting

BradNLP authored a paper 10 days ago

CMNER: A Chinese Multimodal NER Dataset based on Social Media

submitted a paper to Daily Papers 9 days ago

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Paper • 2604.19548 • Published 16 days ago • 16

authored 11 papers 10 days ago

Reasoning Implicit Sentiment with Chain-of-Thought Prompting

Paper • 2305.11255 • Published May 18, 2023 • 2

CMNER: A Chinese Multimodal NER Dataset based on Social Media

Paper • 2402.13693 • Published Feb 21, 2024

PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Paper • 2408.09481 • Published Aug 18, 2024 • 1

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

Paper • 2304.06248 • Published Apr 13, 2023

NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations

Paper • 2501.17261 • Published Aug 22, 2024

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 83

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39

FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents

Paper • 2506.01520 • Published Jun 2, 2025

Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning

Paper • 2602.00971 • Published Feb 28 • 1

Zero-Shot Conversational Stance Detection: Dataset and Approaches

Paper • 2506.17693 • Published Jun 21, 2025

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Paper • 2604.19548 • Published 16 days ago • 16

submitted a paper to Daily Papers about 1 month ago

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

Paper • 2604.01569 • Published Apr 2 • 13

authored a paper about 1 month ago

OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

Paper • 2603.24458 • Published Mar 25 • 9

authored 6 papers about 2 months ago

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Paper • 2505.18660 • Published May 24, 2025 • 2

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30, 2025

SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control

Paper • 2505.19463 • Published May 26, 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Paper • 2510.00647 • Published Oct 1, 2025

DragNeXt: Rethinking Drag-Based Image Editing

Paper • 2506.07611 • Published Jun 9, 2025 • 1