haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaVA helps you understand and interact with images using natural language. You provide an image and ask questions or give instructions about its content, and it responds with descriptions, answers, or reasoning grounded in what it sees. This is useful for anyone who needs to extract insight from visuals, such as researchers analyzing images, content creators generating descriptions, or operations teams monitoring visual data.
24,554 stars. No commits in the last 6 months.
Use this if you need to ask questions about images, describe their content, or drive visual tasks with conversational prompts, much as you would with a human assistant.
Not ideal if your primary need is purely textual analysis, or specialized image-processing tasks (such as pixel-level segmentation) that don't benefit from natural-language interaction.
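For a concrete sense of that conversational workflow, here is a minimal sketch of asking LLaVA a question about an image. It goes through the community llava-hf conversion on Hugging Face transformers rather than this repo's own serving scripts, so the model ID, prompt template, and example image URL are assumptions following the llava-1.5 convention:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Community conversion of llava-1.5 for transformers (assumption: not this
# repo's own checkpoint format). device_map="auto" requires `accelerate`.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any image works; this URL is just an illustrative example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# llava-1.5 prompt convention: the <image> token marks where the image goes.
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The repo itself also documents a CLI (`python -m llava.serve.cli`) and a Gradio web UI for the original checkpoints; the transformers route above is simply the shortest path to a self-contained example.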
Stars: 24,554
Forks: 2,745
Language: Python
License: Apache-2.0
Category: transformers
Last pushed: Aug 12, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
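The same data can be fetched programmatically. A minimal Python sketch, assuming only that the endpoint returns JSON (the response schema isn't documented on this page, so it just pretty-prints whatever comes back):

```python
import json
import requests

# Same endpoint as the curl example above; the free tier needs no key.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/haotian-liu/LLaVA"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on rate limits or server errors

print(json.dumps(resp.json(), indent=2))
```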
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding