LLaVA and Video-LLaMA
The two projects are complementary: LLaVA provides the foundational vision-language instruction-tuning methodology for static images, and Video-LLaMA extends it to the temporal and audio-visual domain for video understanding.
About LLaVA
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaVA lets you understand and interact with images using natural language. You provide an image and ask questions or give instructions about its content, and the model responds with text: descriptions, answers, or step-by-step visual reasoning. This is useful for anyone who needs to extract insights from visuals, such as researchers analyzing images, content creators generating descriptions, or operations teams monitoring visual data.
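LLaVA checkpoints are also distributed through the Hugging Face transformers integration; the sketch below shows single-image Q&A, assuming the community llava-hf/llava-1.5-7b-hf conversion of the LLaVA-1.5 weights and a GPU with enough memory.

```python
# Minimal LLaVA image Q&A via Hugging Face transformers.
# Assumes the llava-hf/llava-1.5-7b-hf conversion of LLaVA-1.5.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is just a placeholder example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 expects its chat template with an <image> token marking
# where the vision features are spliced into the token sequence.
prompt = "USER: <image>\nWhat is unusual about this scene? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

Because the `<image>` token is just a placeholder in an ordinary prompt, the same template works for any question about the image.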
About Video-LLaMA
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Video-LLaMA helps you understand the content of videos and images by answering your questions about them. You input a video or an image, and the model produces detailed descriptions or answers grounded in both the visual frames and the audio track. This is ideal for content analysts, researchers, or anyone who needs to extract insights from multimedia quickly.
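Video-LLaMA ships as a research repository rather than a packaged API, so inference typically goes through its Gradio demo. Below is a minimal sketch of launching it from Python, assuming the repository is cloned with checkpoints downloaded and that the entry point, config, and flag names (demo_audiovideo.py, eval_configs/video_llama_eval_withaudio.yaml) match its README.

```python
# Sketch: launch Video-LLaMA's audio-visual Gradio demo from Python.
# Assumptions: the DAMO-NLP-SG/Video-LLaMA repo is cloned, its
# dependencies and checkpoints are installed, and the script/flag
# names below match the repository README; adjust for your checkout.
import subprocess

subprocess.run(
    [
        "python", "demo_audiovideo.py",
        "--cfg-path", "eval_configs/video_llama_eval_withaudio.yaml",
        "--model_type", "llama_v2",  # assumed LLaMA-2-based checkpoint
        "--gpu-id", "0",
    ],
    cwd="Video-LLaMA",  # assumed path to the cloned repository
    check=True,
)
```

Once the demo is running, you upload a clip and chat with it; because the audio branch is loaded, questions about speech and ambient sound are answered alongside visual ones.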