shikiw/Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
This project helps AI researchers and practitioners evaluate and improve Large Vision-Language Models (LVLMs). Given a pre-trained LVLM together with text and image data, it computes a 'Modality Integration Rate' (MIR) score. This score indicates how well the model fuses visual and textual information, guiding developers in refining their model's cross-modal alignment.
111 stars. No commits in the last 6 months.
Use this if you are developing or fine-tuning Large Vision-Language Models and need a quantitative metric to assess how effectively your model integrates visual and textual information during pre-training.
Not ideal if you are an end-user simply looking to apply an existing Vision-Language Model for tasks like image captioning or visual question answering, as this is a developer tool for model analysis.
Stars: 111
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Jul 09, 2025
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/shikiw/Modality-Integration-Rate"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
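If you prefer to fetch the data programmatically, here is a minimal Python sketch of the same request using the requests library. The endpoint URL is taken from the curl example above; the response format is assumed to be JSON and its fields are not documented here, so the sketch simply prints the raw payload.

import requests

# Same endpoint as the curl example above
URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/shikiw/Modality-Integration-Rate"
)

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors (e.g. rate limiting)
data = resp.json()       # assumed JSON payload; inspect it to see the actual fields
print(data)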
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies