TinyLLaVA_Factory and LLaVA-Mini
These are competitors: both projects target the same use case, deploying LLaVA-style vision-language models with minimal computational overhead, but they attack the cost from different angles. TinyLLaVA_Factory shrinks the model itself by training around small language-model backbones, while LLaVA-Mini keeps a full-size backbone and cuts inference cost by compressing the vision tokens fed into it.
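To see why both levers matter, here is a back-of-envelope sketch. The 576-vision-token figure is the commonly cited count for LLaVA-1.5 and the single vision token is LLaVA-Mini's headline design; the parameter counts and prompt length below are illustrative assumptions, not measurements.

```python
# Back-of-envelope comparison of the two efficiency levers. Parameter
# counts and prompt length are illustrative assumptions, not benchmarks.

def decoder_flops(params: float, tokens: int) -> float:
    """Rough transformer forward cost: ~2 * params FLOPs per token."""
    return 2 * params * tokens

TEXT_TOKENS = 64  # assumed text prompt length

# Baseline LLaVA-1.5-style setup: 7B LLM, 576 vision tokens per image.
base = decoder_flops(7e9, 576 + TEXT_TOKENS)

# LLaVA-Mini's lever: same-size LLM, vision tokens compressed to 1.
mini = decoder_flops(7e9, 1 + TEXT_TOKENS)

# TinyLLaVA's lever: ~3B small backbone, vision tokens unchanged.
tiny = decoder_flops(3e9, 576 + TEXT_TOKENS)

print(f"token compression saves ~{1 - mini / base:.0%} of prefill FLOPs")
print(f"small backbone saves   ~{1 - tiny / base:.0%} of prefill FLOPs")
```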
About TinyLLaVA_Factory
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
This project offers a specialized framework for building and customizing small-scale Large Multimodal Models (LMMs). It takes raw image and text data, along with configuration choices for the language model, vision model, and training recipe, and produces a fine-tuned LMM. It is aimed at machine learning researchers and practitioners who want to build efficient LMMs without extensive coding.
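To make those configuration choices concrete, below is a minimal sketch of the kind of recipe such a factory assembles. The class and field names are hypothetical, not TinyLLaVA_Factory's actual API; Phi-2 and SigLIP are real backbones the project supports (its flagship TinyLLaVA-Phi-2-SigLIP-3.1B pairs them).

```python
# Hypothetical recipe object illustrating the knobs a small-scale LMM
# factory exposes; names are illustrative, not TinyLLaVA_Factory's API.
from dataclasses import dataclass

@dataclass
class LMMRecipe:
    llm_backbone: str        # small language model, e.g. a 1-3B checkpoint
    vision_tower: str        # image encoder producing patch features
    connector: str           # module projecting vision features into the LLM
    tune_connector: bool = True   # connector is typically trained first
    tune_llm: bool = False        # LLM often stays frozen during pretraining

recipe = LMMRecipe(
    llm_backbone="microsoft/phi-2",                   # ~2.7B parameters
    vision_tower="google/siglip-so400m-patch14-384",  # SigLIP encoder
    connector="mlp2x_gelu",                           # two-layer MLP projector
)
print(recipe)
```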
About LLaVA-Mini
ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.
This project offers a unified large multimodal model that efficiently processes and understands both images and videos. It takes visual inputs (still images or video clips) and returns detailed descriptions or answers to questions about their content. Researchers and developers who use large language models to analyze visual data will find it useful where inference cost and latency matter.
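LLaVA-Mini's repository ships its own inference scripts; as a stand-in for the flow described above (image in, answer out), the sketch below uses the stock Hugging Face transformers LLaVA API. The checkpoint name and prompt template belong to llava-1.5, not LLaVA-Mini, and are assumptions chosen for illustration.

```python
# Illustrative image question answering with a stock LLaVA checkpoint via
# Hugging Face transformers; this shows the generic LLaVA-style flow,
# not LLaVA-Mini's own API.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # stand-in checkpoint, not LLaVA-Mini
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Any still image works; this one is the LLaVA project's demo photo.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# llava-1.5 prompt template: the <image> token marks where vision
# features are spliced into the text sequence.
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```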