mlpc-ucsd/BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

41 / 100 (Emerging)

Need to extract information or answer questions from images containing a lot of text, such as charts, documents, or social media posters? BLIVA takes an image and a text question and returns an answer, and it is designed to handle images that are dense with text. This suits anyone working with visual data that includes complex textual elements, such as researchers analyzing charts or marketers reviewing ad creatives.

260 stars. No commits in the last 6 months.

Use this if you need to reliably get answers to specific questions by 'reading' both the visual and textual content within an image.

Not ideal if your primary need is general image description without any text-based querying, or if your images contain little or no text.

document-analysis chart-interpretation visual-content-analysis market-research data-extraction
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 260
Forks: 25
Language: Python
License: BSD-3-Clause
Last pushed: Apr 14, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mlpc-ucsd/BLIVA"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
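
The same request as the curl command above can be made from Python. This is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which this page does not document, and it treats the `transformers` path segment as a category name. The helper names `quality_url` and `fetch_quality` and the parameter name `category` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is
# "<category>/<owner>/<repo>" (the "category" reading is an assumption).
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as 'owner/name'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed format)."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "mlpc-ucsd/BLIVA")` would issue the request shown in the curl example. Each call counts toward the daily request limit, so cache the result if you poll several repositories.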