---
datasets:
- shuaishuaicdp/GUI-World
language:
- en
license: cc-by-4.0
metrics:
- bertscore
- LLM-as-a-Judge
tags:
- gui
- agent
pipeline_tag: video-text-to-text
---

GUI-Vid is the first VideoLLM with powerful GUI-oriented capabilities, retrained on [GUI-World](https://gui-world.github.io).

It was presented in [GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents](https://huggingface.co/papers/2406.10819).

See the [GitHub repository](https://github.com/Dongping-Chen/GUI-World) for how to use GUI-Vid for GUI understanding tasks.