From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
upvoted a paper 3 days ago
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
upvoted a paper 14 days ago
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
updated a collection 16 days ago
NEO1_5