OSU-NLP-Group/UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
UGround helps AI agents understand and interact with any graphical user interface (GUI) by precisely locating specific elements on a screen. Given a screenshot of a mobile app or website and a natural language instruction, it outputs the on-screen location of the UI element the agent needs to interact with (see the usage sketch below). It is aimed at AI developers and researchers building agents that navigate and perform tasks in digital environments much as a human would.
Use this if you are building an AI agent that needs to accurately identify and click on specific buttons, text fields, or icons within a mobile app or website.
Not ideal if your primary goal is general image recognition or object detection outside of the context of graphical user interfaces.
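To make the workflow above concrete, here is a minimal sketch of how an agent might query a UGround checkpoint served behind an OpenAI-compatible endpoint (for example via vLLM). The server URL, model name, and prompt wording are illustrative assumptions, not the project's documented interface; consult the repository README for the exact deployment and prompt format.

# Hypothetical sketch: send a screenshot plus an instruction to a locally
# served UGround checkpoint through an OpenAI-compatible API.
# The base_url, model name, and prompt text are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

with open("screenshot.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="osunlp/UGround-V1-7B",  # assumed checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
            {"type": "text",
             "text": 'Locate the element described by: "the Search button"'},
        ],
    }],
)

# The model replies with the element's location as text, which the agent
# then maps back onto the original screenshot before acting.
print(response.choices[0].message.content)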
Stars
302
Forks
15
Language
Python
License
MIT
Category
Computer Vision
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/OSU-NLP-Group/UGround"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
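The same endpoint can be called from Python. The sketch below fetches it with the requests library and prints the raw JSON; the exact response schema is not documented here, so inspect the output to see which fields the API returns.

# Minimal sketch: fetch the repository's quality data from the API above.
# The response field names are not documented here, so the JSON is printed as-is.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/computer-vision/OSU-NLP-Group/UGround"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

data = resp.json()
print(data)  # e.g. stars, forks, license, and quality metrics for the repo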
Higher-rated alternatives
andyzeng/apc-vision-toolbox
MIT-Princeton Vision Toolbox for the Amazon Picking Challenge 2016 - RGB-D ConvNet-based object...
Ewenwan/MVision
Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep-learning object detection (yolov3), action detection, opencv, PCL, machine learning, autonomous driving
leggedrobotics/wild_visual_navigation
Wild Visual Navigation: A system for fast traversability learning via pre-trained models and...
microsoft/event-vae-rl
Visuomotor policies from event-based cameras through representation learning and reinforcement...
RizwanMunawar/trajectory-forcast
Forecast object trajectory based on history of tracks. Provides a stable and computationally...