mu-cai/matryoshka-mm
Matryoshka Multimodal Models
Matryoshka Multimodal Models (M3) represent an image as nested sets of visual tokens, from coarse to fine, so a vision-language model can trade visual detail against token count at inference time. It processes images and text together for tasks such as image captioning and visual question answering. Researchers, AI developers, and data scientists building multimodal applications would find this useful.
122 stars. No commits in the last 6 months.
Use this if you are developing or evaluating AI models that must understand images and text together, and you want control over how much visual detail (and how many visual tokens) each image contributes.
Not ideal if you are looking for a simple, off-the-shelf image recognition tool without multimodal capabilities or if your tasks are purely text-based.
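The nesting idea behind the project can be sketched in a few lines: visual tokens laid out on a grid are average-pooled into progressively coarser grids, so the same image yields token sets of several sizes. This is a simplified illustration, not the repository's actual API; the 24×24 grid, the pooling scales, and the function name are assumptions.

```python
import numpy as np

def nested_tokens(tokens, scales=(24, 12, 6, 3, 1)):
    """Pool a flat (576, d) grid of visual tokens into nested coarser sets.

    Illustrative sketch only: assumes a 24x24 token grid and scales that
    divide 24; returns {num_tokens: (num_tokens, d) array} from fine to coarse.
    """
    d = tokens.shape[1]
    grid = tokens.reshape(24, 24, d)
    out = {}
    for s in scales:
        k = 24 // s
        # Average-pool each k x k patch of the grid into one coarser token.
        pooled = grid.reshape(s, k, s, k, d).mean(axis=(1, 3))
        out[s * s] = pooled.reshape(s * s, d)
    return out
```

At inference one would then feed the language model only the token set matching the desired detail/cost budget, e.g. `nested_tokens(t)[36]` for a mid-granularity view.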
Stars
122
Forks
9
Language
Python
License
Apache-2.0
Category
Last pushed
Jan 22, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mu-cai/matryoshka-mm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
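The curl example above can also be issued from Python with the standard library. A minimal sketch, assuming only the URL pattern shown in the curl example; the response schema and any API-key header are undocumented here, so only the request construction is shown.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-data URL for a repo given as "owner/name".

    Path segments mirror the curl example: /quality/<ecosystem>/<owner>/<name>.
    """
    return f"{BASE}/{quote(ecosystem)}/{repo}"

url = quality_url("transformers", "mu-cai/matryoshka-mm")
# body = urlopen(url).read()  # uncomment to fetch; the JSON schema is not documented here
```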
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies