OSU-NLP-Group/UGround

[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents

Quality score: 46 / 100 (Emerging)

UGround helps AI agents understand and interact with any graphical user interface (GUI) by precisely locating specific elements on a screen. It takes a screenshot of a mobile app or website and a natural language instruction, then outputs the exact location (bounding box) of the UI element the AI needs to interact with. This is for AI developers and researchers building intelligent agents that can navigate and perform tasks in digital environments just like a human.


Use this if you are building an AI agent that needs to accurately identify and click on specific buttons, text fields, or icons within a mobile app or website.

Not ideal if your primary goal is general image recognition or object detection outside of the context of graphical user interfaces.
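As a sketch of what a grounding-based agent loop looks like: the model returns a box for the instructed element, and the agent clicks its center. The function name and normalized-coordinate format below are illustrative assumptions, not UGround's actual API.

```python
# Hypothetical helper for a GUI-grounding agent loop. The coordinate
# convention (normalized x1, y1, x2, y2 in [0, 1]) is an assumption
# for illustration, not UGround's documented output format.

def scale_to_pixels(norm_box, width, height):
    """Convert a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = norm_box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))

# An agent would: take a screenshot, ask the grounding model for the
# element matching a natural-language instruction, then click the
# center of the returned box.
box = scale_to_pixels((0.10, 0.20, 0.30, 0.25), width=1920, height=1080)
center = ((box[0] + box[2]) // 2, (box[1] + box[3]) // 2)
```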

AI-agents GUI-automation digital-assistant web-automation mobile-automation
No package published · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 10 / 25

How are scores calculated?
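The four subscores above appear to sum to the headline score (10 + 10 + 16 + 10 = 46, out of 4 × 25 = 100). This is an observation from the numbers shown, not a documented formula:

```python
# Each dimension is scored out of 25; the headline score appears to be
# their simple sum (inferred from the displayed values, not documented).
subscores = {"Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 10}
total = sum(subscores.values())  # 46 out of a possible 100
```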

Stars: 302
Forks: 15
Language: Python
License: MIT
Last pushed: Mar 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/OSU-NLP-Group/UGround"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
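The curl call above can also be made from Python. The endpoint URL comes from the example; the response JSON schema and the API-key header name are assumptions, so check the service's docs before relying on them:

```python
# Minimal sketch of calling the quality API from Python. Only the URL is
# taken from the curl example above; the key header name ("X-API-Key")
# and response fields are guesses, not documented facts.
import json
from urllib.request import Request, urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_request(category, owner, repo, api_key=None):
    """Build a request for a repo's quality data; pass api_key for the
    1,000-requests/day tier (header name is an assumption)."""
    url = f"{BASE}/{category}/{owner}/{repo}"
    headers = {"X-API-Key": api_key} if api_key else {}
    return Request(url, headers=headers)

req = quality_request("computer-vision", "OSU-NLP-Group", "UGround")
# data = json.load(urlopen(req))  # uncomment to hit the live endpoint
```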