ONE-Lab/GUI-Vid

	---
	datasets:
	- shuaishuaicdp/GUI-World
	language:
	- en
	license: cc-by-4.0
	metrics:
	- bertscore
	- LLM-as-a-Judge
	tags:
	- gui
	- agent
	pipeline_tag: video-text-to-text
	---

	GUI-Vid is the first VideoLLM with strong GUI-oriented capabilities, retrained on the [GUI-World](https://gui-world.github.io) dataset.

	It was presented in [GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents](https://huggingface.co/papers/2406.10819).

	See the [GitHub repository](https://github.com/Dongping-Chen/GUI-World) for how to use GUI-Vid for GUI understanding tasks.
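
Before following the repository instructions, the checkpoint files can be fetched locally. The following is a minimal sketch, not the authors' documented workflow: it assumes the model repo id is `ONE-Lab/GUI-Vid` (taken from this page's title) and uses the dataset id from the `datasets:` field above. The helper names `download_kwargs` and `fetch` are hypothetical; `snapshot_download` is the standard `huggingface_hub` call for downloading a whole repo.

```python
# Assumed repo ids: the model id comes from this card's title, the dataset id
# from its `datasets:` metadata field. Adjust if either has moved.
MODEL_REPO = "ONE-Lab/GUI-Vid"
DATASET_REPO = "shuaishuaicdp/GUI-World"


def download_kwargs(repo_id: str, repo_type: str = "model") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download (hypothetical helper)."""
    if repo_type not in {"model", "dataset"}:
        raise ValueError(f"unsupported repo_type: {repo_type!r}")
    return {"repo_id": repo_id, "repo_type": repo_type}


def fetch(repo_id: str, repo_type: str = "model") -> str:
    """Download a full repo snapshot and return the local directory path."""
    # Imported lazily so download_kwargs works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, repo_type))


# Example usage (requires network access and `pip install huggingface_hub`):
# model_dir = fetch(MODEL_REPO)
# data_dir = fetch(DATASET_REPO, repo_type="dataset")
```

Loading the weights and running inference depends on the model's own code, so the downloaded directory is meant to be used with the scripts in the GitHub repository linked above.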