haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Quality score: 47 / 100 (Emerging)

LLaVA helps you understand and interact with images using natural language. You provide an image and ask questions or give instructions about its content, and it generates descriptions, answers, or other visual reasoning output in conversational text. This is useful for anyone who needs to extract insights from visuals, such as researchers analyzing images, content creators generating descriptions, or operations teams monitoring visual data.

24,554 stars. No commits in the last 6 months.

Use this if you need to ask questions about images, describe their content, or perform visual tasks through conversational prompts, much as you would with a human assistant.

Not ideal if your primary need is purely textual analysis, or highly specialized image processing that doesn't benefit from natural-language interaction.

image-analysis visual-intelligence content-description multimodal-interaction visual-question-answering
Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25

The four components sum to the overall score: 0 + 10 + 16 + 21 = 47 / 100.


Stars: 24,554
Forks: 2,745
Language: Python
License: Apache-2.0
Last pushed: Aug 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"

Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000/day.
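The same endpoint can be called from any HTTP client. Here is a minimal Python sketch; only the URL pattern comes from the curl example above, while the `X-API-Key` header name and the JSON response shape are assumptions not documented on this page.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, owner: str, repo: str) -> str:
    # Builds the endpoint URL, mirroring the curl example above.
    return f"{API_BASE}/{registry}/{owner}/{repo}"


def fetch_quality(registry: str, owner: str, repo: str,
                  api_key: Optional[str] = None) -> dict:
    # Anonymous calls get 100 requests/day; a free key raises that to 1,000/day.
    # The "X-API-Key" header name is a guess; check the API docs for the real one.
    req = urllib.request.Request(quality_url(registry, owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Assumes the endpoint returns a JSON object.
        return json.load(resp)
```

For the repository on this page, `fetch_quality("transformers", "haotian-liu", "LLaVA")` issues the same GET request as the curl command shown above.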