TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
This project offers a specialized framework for creating and customizing small-scale Large Multimodal Models (LMMs). It takes raw image and text data, along with configuration choices for language models, vision models, and training methods, and produces a fine-tuned LMM. It is aimed at machine learning researchers and practitioners who want to build efficient LMMs without extensive coding.
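For illustration only, a minimal sketch of the kind of configuration such a framework consumes; every name below is a placeholder assumption, not TinyLLaVA_Factory's actual API or option names:

import json

# Hypothetical illustration of the inputs described above: a small LMM is
# assembled from a language-model backbone, a vision encoder, and a training
# recipe. None of these keys or values are taken from TinyLLaVA_Factory itself.
config = {
    "language_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small LLM backbone
    "vision_model": "openai/clip-vit-large-patch14",          # image encoder
    "connector": "mlp_projector",                             # maps image features into the LLM
    "training": {
        "stage": "finetune",                                  # e.g. pretrain connector, then finetune
        "data_path": "path/to/image_text_pairs.json",
        "learning_rate": 2e-5,
        "epochs": 1,
    },
}

# Persist the configuration so a training script could pick it up.
with open("lmm_config.json", "w") as f:
    json.dump(config, f, indent=2)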
962 stars. Maintained, with 1 commit in the last 30 days.
Use this if you are a machine learning researcher or engineer looking to develop or experiment with custom, compact multimodal AI models that can understand both images and text.
Not ideal if you are an end-user simply looking to use an existing multimodal AI model off-the-shelf without any customization or training.
Stars: 962
Forks: 96
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 11, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TinyLLaVA/TinyLLaVA_Factory"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
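A minimal Python sketch of the same request, assuming the endpoint returns a JSON object; the exact response schema is not documented here, so the code simply prints whatever fields come back:

import requests

# Endpoint taken from the curl command above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/TinyLLaVA/TinyLLaVA_Factory"

resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names are not assumed; iterate over whatever the API returns.
for key, value in data.items():
    print(f"{key}: {value}")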
Related models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding