eric-ai-lab/Screen-Point-and-Read
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
This project helps anyone who struggles to understand on-screen information, especially people who rely on screen readers. When a user points to an area on a digital screen, the tool describes the content at that spot, how it is organized, and how it relates to other elements. It is designed for end-users who need to accurately interpret complex or unfamiliar graphical interfaces, improving accessibility and comprehension.
No commits in the last 6 months.
Use this if you need to understand specific content on a GUI screen, particularly its layout and spatial relationships, just by pointing to it.
Not ideal if you're looking for a general-purpose screen reader that reads aloud all elements sequentially without specific point-and-read functionality.
Stars: 29
Forks: 4
Language: Python
License: (none listed)
Category:
Last pushed: Jul 31, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/eric-ai-lab/Screen-Point-and-Read"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
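The same request can be made from Python. The sketch below is a minimal, unofficial example using only the standard library. The function names `quality_url` and `fetch_quality` are hypothetical, the idea that other repositories follow the same `owner/repo` path layout is an assumption drawn from the single curl example, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: other repositories use the same owner/repo path layout.
    """
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON; the page does not document the format.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("eric-ai-lab", "Screen-Point-and-Read")` performs the network request. With the keyless limit of 100 requests/day, it may be worth caching the result rather than calling it in a loop.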
Higher-rated alternatives
GetStream/Vision-Agents
Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses...
video-db/videodb-capture-quickstart
Give your agents real time desktop perception. Stream screen, microphone, and system audio for...
sijeeshmiziha/visionagent
Multi-provider AI agent framework with vision capabilities and tool calling. Supports OpenAI,...
grctest/g3n-fastapi-webcam-docker
Utilizing multiple Gemma 3n agents to analyze webcam footage
leukaemiamedtech/hias-tassai-facial-recognition
HIAS TassAI Facial Recognition Agent processes streams from local or remote cameras to identify...