haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaVA helps you understand and interact with images using natural language. You provide an image and ask questions or give instructions about its content, and it responds with descriptions, answers, or reasoning grounded in what it sees. This is useful for anyone who needs to extract insight from visuals, such as researchers analyzing images, content creators generating descriptions, or operations teams monitoring visual data.
24,554 stars. No commits in the last 6 months.
Use this if you need to ask questions about images, describe their content, or drive visual tasks with conversational prompts, much as you would with a human assistant.
Not ideal if your primary need is purely textual analysis, or specialized image-processing tasks (such as pixel-level segmentation) that don't benefit from natural-language interaction.
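For a concrete sense of that conversational workflow, here is a minimal sketch of asking LLaVA a question about an image. It goes through the community llava-hf conversion on Hugging Face transformers rather than this repo's own serving scripts, so the model ID, prompt template, and example image URL are assumptions following the llava-1.5 convention:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Community conversion of llava-1.5 for transformers (assumption: not this
# repo's own checkpoint format). device_map="auto" requires `accelerate`.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any image works; this URL is just an illustrative example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# llava-1.5 prompt convention: the <image> token marks where the image goes.
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The repo itself also documents a CLI (`python -m llava.serve.cli`) and a Gradio web UI for the original checkpoints; the transformers route above is simply the shortest path to a self-contained example.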
Stars: 24,554
Forks: 2,745
Language: Python
License: Apache-2.0
Category: transformers
Last pushed: Aug 12, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
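The same data can be fetched programmatically. A minimal Python sketch, assuming only that the endpoint returns JSON (the response schema isn't documented on this page, so it just pretty-prints whatever comes back):

```python
import json
import requests

# Same endpoint as the curl example above; the free tier needs no key.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on rate limits or server errors

print(json.dumps(resp.json(), indent=2))
```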
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding