SALT-NLP/LLaVAR

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Score: 37 / 100 (Emerging)

This project helps anyone working with images that contain significant amounts of text, such as charts, diagrams, product labels, or infographics. It allows you to feed in an image and ask questions about both its visual content and the text embedded within it. The output is a clear, natural language answer, making it useful for researchers, data analysts, or content creators who need to extract specific information from complex visual documents.

269 stars. No commits in the last 6 months.

Use this if you need to understand and extract information from images where text is a crucial component, beyond just recognizing objects.

Not ideal if your primary need is general image recognition without a strong emphasis on understanding embedded text.

document-analysis information-extraction data-visualization-interpretation content-understanding visual-question-answering
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 11 / 25
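
The four sub-scores above add up to the overall 37 / 100. The page does not state how the overall score is derived, so treating it as a plain sum of four categories capped at 25 each is an assumption based only on the numbers shown here. A minimal check of that arithmetic:

```python
# Sub-scores as listed on this page; each category is shown out of 25.
subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 11}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is a plain sum of the sub-scores.
total = sum(subscores.values())
max_total = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {max_total}")  # prints "37 / 100"
```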


Stars: 269
Forks: 14
Language: Python
License: Apache-2.0
Last pushed: Jun 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SALT-NLP/LLaVAR"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
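
The same request can be made from Python using only the standard library. This is a sketch, not a documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single curl example above, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example; the segments after it are assumed
# to be a category ("transformers") followed by "owner/repo".
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as 'SALT-NLP/LLaVAR'."""
    # Keep the slash between owner and repo name; percent-escape anything else.
    return f"{BASE_URL}/{quote(category)}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch the quality data. Assumes a JSON body (not confirmed by the page)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL only; call fetch_quality(...) to make the request.
    print(build_quality_url("transformers", "SALT-NLP/LLaVAR"))
```

Without a key, each call counts against the 100 requests/day limit, so cache the result if you poll more than one repository.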