Jorffy/NoteMR
[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".
This project helps researchers and developers working with multimodal large language models (MLLMs) improve the accuracy of visual question answering (VQA) systems that require external knowledge. Given an image, a question, and retrieved external knowledge, it generates 'knowledge notes' and 'visual notes' to guide the MLLM, producing a more accurate answer while reducing common failure modes such as misdirected knowledge and visual hallucinations.
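The two-stage notes-then-answer flow described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual code: the function names, prompt templates, and the `mllm(image, prompt)` callable are all assumptions, and the paper's real prompts and pipeline will differ.

```python
def distill_knowledge_note(mllm, image, question, passages):
    """Hypothetical step 1a: condense retrieved passages into a short
    knowledge note that keeps only question-relevant facts."""
    prompt = (
        "From the passages below, extract only the facts needed to "
        f"answer the question.\nQuestion: {question}\n"
        f"Passages: {passages}"
    )
    return mllm(image, prompt)


def distill_visual_note(mllm, image, question):
    """Hypothetical step 1b: describe the image regions relevant to
    the question, to ground the answer and curb hallucination."""
    prompt = (
        "Describe only the visual details of the image that are "
        f"relevant to this question: {question}"
    )
    return mllm(image, prompt)


def answer_with_notes(mllm, image, question, passages):
    """Hypothetical step 2: answer the question conditioned on both notes."""
    k_note = distill_knowledge_note(mllm, image, question, passages)
    v_note = distill_visual_note(mllm, image, question)
    prompt = (
        f"Knowledge notes: {k_note}\n"
        f"Visual notes: {v_note}\n"
        f"Question: {question}\n"
        "Answer using only facts supported by the notes and the image."
    )
    return mllm(image, prompt)
```

Any MLLM wrapper exposing an `(image, prompt) -> text` interface could be dropped in for `mllm`.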
No commits in the last 6 months.
Use this if you are a researcher or developer working with multimodal large language models and want to strengthen their reasoning for visual question answering through structured guidance from external knowledge and visual cues.
Not ideal if you want a plug-and-play solution for general visual question answering without detailed knowledge integration or fine-grained visual perception, or if you lack access to an NVIDIA RTX A6000 GPU.
Stars
21
Forks
2
Language
Python
License
—
Category
Last pushed
Jun 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Jorffy/NoteMR"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
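The same endpoint can be called from Python. A minimal sketch using only the standard library; the URL pattern is taken from the curl example above, but the JSON response schema is not documented here, so the fetch returns the raw parsed payload.

```python
import json
from urllib.request import urlopen

# URL pattern inferred from the curl example; treat it as an assumption.
API = "https://pt-edge.onrender.com/api/v1/quality/transformers/{owner}/{repo}"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return API.format(owner=owner, repo=repo)


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality data for one repository.

    Performs a network call; the response schema is undocumented,
    so the raw decoded JSON is returned as-is.
    """
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("Jorffy", "NoteMR")` hits the same URL as the curl command shown above.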