JIA-Lab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
This project offers a sophisticated tool for advanced image understanding, reasoning, and text generation. It processes visual inputs like images and accompanying text to produce detailed descriptions, answer complex questions, or generate new text based on visual content. It's designed for researchers and practitioners working with multimodal AI, particularly those developing or evaluating large vision-language models.
3,334 stars. No commits in the last 6 months.
Use this if you need to develop, fine-tune, or evaluate large multimodal models that can perform complex visual reasoning and generate human-like text from images.
Not ideal if you're looking for a simple, out-of-the-box image captioning tool or don't have experience with model training and evaluation.
Stars: 3,334
Forks: 276
Language: Python
License: Apache-2.0
Category:
Last pushed: May 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/JIA-Lab-research/MGM"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
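As a sketch, the same endpoint can also be queried from Python. The response schema is not documented on this page, so the example only builds the endpoint URL and fetches the raw JSON; the `build_url` and `fetch_quality` helper names are illustrative assumptions, not part of the API:

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Construct the quality-data endpoint URL for a given GitHub repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON (schema not shown here)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Unauthenticated requests are limited to 100/day per the note above.
    print(build_url("JIA-Lab-research", "MGM"))
```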
Higher-rated alternatives
jingyaogong/minimind-v
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model