addy999/omniparser-api

Self-hosted version of Microsoft's OmniParser image-to-text model

/ 100

Emerging

This tool helps software developers integrate the OmniParser image-to-text model directly into their applications. It takes a screenshot or UI image as input and outputs structured data about the UI elements, including their text, descriptions, and clickable regions. This is ideal for developers building AI agents that interact with user interfaces or automate web workflows.
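The input and output described above can be sketched as a small client. This is only an illustration under stated assumptions: the endpoint URL (`http://localhost:8000/parse`), the request field `image_base64`, and the response keys `elements`, `text`, `bbox`, and `clickable` are all hypothetical and not taken from the project. Check the repository for the actual API before relying on any of them.

```python
import base64
import json
from dataclasses import dataclass
from urllib import request


@dataclass
class UIElement:
    text: str
    bbox: tuple  # assumed layout: (x_min, y_min, x_max, y_max) in pixels
    clickable: bool


def build_payload(image_bytes: bytes) -> bytes:
    """Wrap a screenshot in a JSON body (hypothetical request schema)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded}).encode("utf-8")


def parse_elements(response_body: bytes) -> list:
    """Turn a JSON response (hypothetical schema) into UIElement records."""
    data = json.loads(response_body)
    return [
        UIElement(
            text=item.get("text", ""),
            bbox=tuple(item["bbox"]),
            clickable=bool(item.get("clickable", False)),
        )
        for item in data.get("elements", [])
    ]


def click_point(element: UIElement) -> tuple:
    """Return the center of an element's bounding box."""
    x_min, y_min, x_max, y_max = element.bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def parse_screenshot(image_bytes: bytes,
                     url: str = "http://localhost:8000/parse") -> list:
    """POST a screenshot to the self-hosted service (URL is an assumption)."""
    req = request.Request(
        url,
        data=build_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        return parse_elements(resp.read())
```

An agent would typically call `parse_screenshot` on a fresh screenshot, filter for elements where `clickable` is true, and pass `click_point` of the chosen element to its input driver.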

No commits in the last 6 months.

Use this if you are a developer building an application that needs to programmatically understand and interact with UI elements from screenshots, and you require fast, self-hosted processing without rate limits.

Not ideal if you are looking for a simple web-based tool for one-off image-to-text conversions or if you do not have the technical expertise to deploy and manage a Dockerized application.

AI agent development, UI automation, web scraping, computer vision, application integration

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 20 / 25

Stars

Forks

Language

Python

License

—

Higher-rated alternatives

ParisNeo/lollms-webui

Lord of Large Language and Multi modal Systems Web User Interface

ggozad/oterm

the terminal client for Ollama

owndev/Open-WebUI-Functions

Open-WebUI-Functions is a collection of custom pipelines, filters, and integrations designed to...

hand-e-fr/OpenHosta

A lightweight library integrating LLM natively into Python

lmg-anon/mikupad

LLM Frontend in a single html file
