open-mmlab/Multimodal-GPT
This project helps you build chatbots that understand both text and images, enabling richer, more natural conversations. You supply visual and language instruction datasets, and the output is a fine-tuned chatbot capable of answering complex queries that involve visual information. It is well suited to researchers and developers building advanced AI assistants for diverse applications.
1,517 stars. No commits in the last 6 months.
Use this if you want to develop a conversational AI that can interpret and respond to queries involving both visual content and text instructions.
Not ideal if you are looking for a pre-built, ready-to-deploy multimodal chatbot without any fine-tuning or development work.
Stars: 1,517
Forks: 130
Language: Python
License: Apache-2.0
Category:
Last pushed: Jun 04, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/open-mmlab/Multimodal-GPT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
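The same data can also be fetched from a script. Below is a minimal Python sketch that mirrors the curl call above; it assumes the endpoint returns JSON, and the field names used in the print statement (e.g. "stars", "forks") are assumptions about the response schema, not documented fields.

import requests

# Quality endpoint for this repository (same URL as the curl example above).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/open-mmlab/Multimodal-GPT"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions; inspect `data` to see the actual schema.
print(data.get("stars"), data.get("forks"), data.get("last_pushed"))

Within the free tier (100 requests/day without a key), no authentication setup is needed for a call like this.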
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies