LLaVA and ViP-LLaVA
ViP-LLaVA builds on LLaVA's architecture, extending its visual instruction tuning to handle arbitrary visual prompts (such as spatial markers and annotations drawn on the image) rather than plain image-text pairs alone, which makes the two complementary advances in the same multimodal instruction-tuning lineage.
About LLaVA
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaVA helps you understand and interact with images using natural language. You provide an image and ask questions or give instructions about its content, and it generates descriptive text, answers, or grounded reasoning about what it sees. This is ideal for anyone needing to extract insights from visuals, such as researchers analyzing images, content creators generating descriptions, or operations teams monitoring visual data.
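As a rough sketch of that workflow, the snippet below queries a LLaVA checkpoint through the Hugging Face transformers integration. The llava-hf/llava-1.5-7b-hf checkpoint name, the prompt template, and the file path are illustrative assumptions, not part of this repo's own CLI (which also ships its own serving and Gradio demo scripts).

```python
# Minimal sketch: image question answering with a LLaVA checkpoint via
# Hugging Face transformers. Checkpoint name and prompt template are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("photo.jpg")  # any local image
prompt = "USER: <image>\nDescribe what is happening in this picture. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```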
About ViP-LLaVA
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
This tool helps researchers and developers make large multimodal models (LMMs) understand specific regions or objects within an image. You provide an image with a region visually marked (a 'visual prompt' such as an arrow, box, circle, or scribble drawn directly onto the pixels), and the model outputs a detailed text description or answers questions about that specific area. It's designed for those working on computer vision, AI research, and multimodal applications.
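The sketch below illustrates that idea under similar assumptions: the llava-hf/vip-llava-7b-hf checkpoint, the prompt template, and the circled coordinates are placeholders, and the visual prompt is simply drawn onto the image before it is handed to the model.

```python
# Minimal sketch: asking about a marked region with a ViP-LLaVA checkpoint via
# Hugging Face transformers. Checkpoint, prompt template, and coordinates are
# illustrative assumptions.
import torch
from PIL import Image, ImageDraw
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The visual prompt is drawn directly onto the pixels: here, a red ellipse
# around the region of interest (coordinates are placeholders).
image = Image.open("scene.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.ellipse((120, 80, 260, 220), outline=(255, 0, 0), width=6)

question = "What is the object inside the red circle, and what is it used for?"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's "
    f"questions.###Human: <image>\n{question}###Assistant:"
)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```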