A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities.
Ray Yang
rayruiyang
AI & ML interests
None yet
Recent Activity
upvoted a paper 1 day ago
MolmoPoint: Better Pointing for VLMs with Grounding Tokens
upvoted a paper 6 days ago
ProAct: Agentic Lookahead in Interactive Environments
updated a dataset 19 days ago
rayruiyang/vst_500k
Organizations
None yet