kyegomez/ScreenAI
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
This project lets developers integrate a vision-language model that understands user interfaces and infographics. It takes an image (such as a screenshot or a chart) and associated text, then processes them together to produce output describing their content and how they relate. It is designed for engineers building applications that need to interpret visual and textual data from screens or complex diagrams.
380 stars. Available on PyPI.
Use this if you are a developer creating applications that need to programmatically understand and extract information from screenshots, app interfaces, or detailed infographics by combining visual and textual input.
Not ideal if you are an end-user looking for a ready-to-use application to analyze UIs or infographics without programming.
Stars
380
Forks
36
Language
Python
License
MIT
Category
Last pushed
Feb 06, 2026
Commits (30d)
0
Dependencies
5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kyegomez/ScreenAI"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
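The curl call above can also be made from Python using only the standard library. This is a minimal sketch: the endpoint URL and rate limits come from this page, but the shape of the JSON the API returns is not documented here, so treat the parsed result as an untyped dict.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch quality data for one repository.

    The free tier allows 100 requests/day with no key; the
    response schema is an assumption (whatever JSON the API returns).
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("ml-frameworks", "kyegomez", "ScreenAI")
```

Calling `fetch_quality("ml-frameworks", "kyegomez", "ScreenAI")` would perform the same request as the curl command.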
Related frameworks
microsoft/art
Exploring the connections between artworks with deep "Visual Analogies"
bluet/everypixel-js
JavaScript support for EveryPixel API
xingbpshen/artifact-impact
A [Genshin Impact] artifact enhancement predictor.
codetorex/spritex
A simple tool for extracting sprites from full frames. Useful for AI projects.
alexandrevl/pyscreen
PyScreen is an AI-powered tool that extracts, analyzes, and visualizes data from screen...