OSU-NLP-Group/UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
UGround helps AI agents understand and interact with any graphical user interface (GUI) by precisely locating specific elements on a screen. Given a screenshot of a mobile app or website and a natural language instruction, it outputs the on-screen location of the UI element the agent needs to interact with (see the usage sketch below). It is aimed at AI developers and researchers building agents that navigate and perform tasks in digital environments much as a human would.
Use this if you are building an AI agent that needs to accurately identify and click on specific buttons, text fields, or icons within a mobile app or website.
Not ideal if your primary goal is general image recognition or object detection outside of the context of graphical user interfaces.
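To make the workflow above concrete, here is a minimal sketch of how an agent might query a UGround checkpoint served behind an OpenAI-compatible endpoint (for example via vLLM). The server URL, model name, and prompt wording are illustrative assumptions, not the project's documented interface; consult the repository README for the exact deployment and prompt format.

# Hypothetical sketch: send a screenshot plus an instruction to a locally
# served UGround checkpoint through an OpenAI-compatible API.
# The base_url, model name, and prompt text are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

with open("screenshot.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="osunlp/UGround-V1-7B",  # assumed checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
            {"type": "text",
             "text": 'Locate the element described by: "the Search button"'},
        ],
    }],
)

# The model replies with the element's location as text, which the agent
# then maps back onto the original screenshot before acting.
print(response.choices[0].message.content)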
Stars
302
Forks
15
Language
Python
License
MIT
Category
Computer Vision
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/OSU-NLP-Group/UGround"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
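The same endpoint can be called from Python. The sketch below fetches it with the requests library and prints the raw JSON; the exact response schema is not documented here, so inspect the output to see which fields the API returns.

# Minimal sketch: fetch the repository's quality data from the API above.
# The response field names are not documented here, so the JSON is printed as-is.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/computer-vision/OSU-NLP-Group/UGround"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

data = resp.json()
print(data)  # e.g. stars, forks, license, and quality metrics for the repo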
Higher-rated alternatives
andyzeng/apc-vision-toolbox
MIT-Princeton Vision Toolbox for the Amazon Picking Challenge 2016 - RGB-D ConvNet-based object...
Ewenwan/MVision
Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep-learning object detection (yolov3), action detection, opencv, PCL, machine learning, autonomous driving
leggedrobotics/wild_visual_navigation
Wild Visual Navigation: A system for fast traversability learning via pre-trained models and...
microsoft/event-vae-rl
Visuomotor policies from event-based cameras through representation learning and reinforcement...
RizwanMunawar/trajectory-forcast
Forecast object trajectory based on history of tracks. Provides a stable and computationally...