eric-ai-lab/Screen-Point-and-Read

Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"

Experimental

This project helps people who have difficulty understanding on-screen information, especially those who rely on screen readers. When the user points to an area of a digital screen, it describes the content at that spot, how that content is organized, and how it relates to surrounding elements. It is designed for end users who need to interpret complex or unfamiliar graphical interfaces accurately, improving accessibility and comprehension.

No commits in the last 6 months.

Use this if you need to understand specific content on a GUI screen, particularly its layout and spatial relationships, just by pointing to it.

Not ideal if you're looking for a general-purpose screen reader that reads aloud all elements sequentially without specific point-and-read functionality.

digital-accessibility GUI-comprehension screen-reading user-interface-navigation visual-impairment-support

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 12 / 25

Language: Python

License: none

Higher-rated alternatives

GetStream/Vision-Agents

Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses...

video-db/videodb-capture-quickstart

Give your agents real time desktop perception. Stream screen, microphone, and system audio for...

sijeeshmiziha/visionagent

Multi-provider AI agent framework with vision capabilities and tool calling. Supports OpenAI,...

grctest/g3n-fastapi-webcam-docker

Utilizing multiple Gemma 3n agents to analyze webcam footage

leukaemiamedtech/hias-tassai-facial-recognition

HIAS TassAI Facial Recognition Agent processes streams from local or remote cameras to identify...
