addy999/omniparser-api
Self-hosted version of Microsoft's OmniParser image-to-text model
This tool helps software developers integrate the OmniParser image-to-text model directly into their applications. It takes a screenshot or UI image as input and outputs structured data about the UI elements, including their text, descriptions, and clickable regions. This is ideal for developers building AI agents that interact with user interfaces or automate web workflows.
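For an agent, the useful part of that output is usually the set of clickable regions. The sketch below shows one way to turn parsed elements into pixel click targets. The `UIElement` fields (`text`, `description`, a normalized `bbox`, and an `interactable` flag) and the `click_targets` helper are hypothetical and chosen for illustration. They are not the documented response schema of omniparser-api, so map them to whatever fields your deployment actually returns.

```python
from dataclasses import dataclass


# Hypothetical element schema for illustration only; the real service's
# field names and coordinate convention may differ.
@dataclass
class UIElement:
    text: str                                  # visible text or caption
    description: str                           # model-generated description
    bbox: tuple[float, float, float, float]    # (x1, y1, x2, y2), assumed normalized to [0, 1]
    interactable: bool                         # whether the region is considered clickable


def click_targets(elements, width, height):
    """Return (label, (x, y)) pixel click points at the centre of each interactable bbox."""
    targets = []
    for el in elements:
        if not el.interactable:
            continue  # skip static text, headings, decorations
        x1, y1, x2, y2 = el.bbox
        # Centre of the box, scaled from normalized coords to screenshot pixels.
        cx = round((x1 + x2) / 2 * width)
        cy = round((y1 + y2) / 2 * height)
        # Fall back to the description when the element has no visible text (e.g. icons).
        targets.append((el.text or el.description, (cx, cy)))
    return targets
```

For a 1920x1080 screenshot, a "Sign in" button with normalized box `(0.25, 0.5, 0.75, 0.75)` yields the click point `(960, 675)`, while a non-interactable heading is skipped.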
No commits in the last 6 months.
Use this if you are a developer building an application that needs to programmatically understand and interact with UI elements from screenshots, and you require fast, self-hosted processing without rate limits.
Not ideal if you are looking for a simple web-based tool for one-off image-to-text conversions or if you do not have the technical expertise to deploy and manage a Dockerized application.
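Once the Docker container is running, an application talks to it over HTTP. The following is a minimal standard-library client sketch. The URL `http://localhost:7860/process_image`, the multipart field name `image_file`, the `image/png` content type, and a JSON response are all assumptions for illustration; confirm the real route, port, field name, and response format in the repository before relying on them.

```python
import json
import os
import urllib.request
import uuid

# Hypothetical endpoint and field name; check the repository's README for the real ones.
ENDPOINT = "http://localhost:7860/process_image"


def encode_multipart(field, filename, data, boundary):
    """Build a multipart/form-data body containing a single file field."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: image/png\r\n\r\n"  # assumes PNG screenshots
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def parse_screenshot(path, endpoint=ENDPOINT, field="image_file"):
    """POST a screenshot to the self-hosted service and return the decoded JSON (assumed)."""
    with open(path, "rb") as f:
        data = f.read()
    boundary = uuid.uuid4().hex  # random boundary unlikely to appear in the image bytes
    body = encode_multipart(field, os.path.basename(path), data, boundary)
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # Parsing a screenshot can be slow on CPU, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

Because the service is self-hosted, throughput is bounded by your own hardware rather than by a third-party rate limit.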
Stars: 83
Forks: 23
Language: Python
License: none listed
Category: not listed
Last pushed: May 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/addy999/omniparser-api"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
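The same request can be made from Python with only the standard library. This is a sketch that assumes the endpoint returns JSON; the `llm-tools` path segment appears to be a category slug, and the helper names `quality_url` and `fetch_quality` are hypothetical. No API-key handling is shown because the key's parameter or header name is not given here.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the quality-data URL for a repo given as 'owner/name'."""
    owner, name = repo.split("/", 1)
    parts = (category, owner, name)
    # Percent-encode each path segment defensively; plain slugs pass through unchanged.
    return BASE + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(category, repo):
    """GET the record without a key (keyless tier) and decode it as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=30) as resp:
        return json.loads(resp.read())
```

Calling `fetch_quality("llm-tools", "addy999/omniparser-api")` requests the same URL as the `curl` command; each call counts against the daily quota.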
Higher-rated alternatives
ParisNeo/lollms-webui
Lord of Large Language and Multi modal Systems Web User Interface
ggozad/oterm
the terminal client for Ollama
owndev/Open-WebUI-Functions
Open-WebUI-Functions is a collection of custom pipelines, filters, and integrations designed to...
hand-e-fr/OpenHosta
A lightweight library integrating LLM natively into Python
lmg-anon/mikupad
LLM Frontend in a single html file