Jorffy/NoteMR
[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".
This project helps researchers and developers working with multimodal large language models (MLLMs) improve the accuracy of visual question answering (VQA) systems that require external knowledge. Given an image, a question, and retrieved external knowledge, it generates 'knowledge notes' and 'visual notes' to guide the MLLM, producing a more accurate answer while reducing common failure modes such as misdirected knowledge and visual hallucinations.
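The two-stage notes-then-answer flow described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual code: the function names, prompt templates, and the `mllm(image, prompt)` callable are all assumptions, and the paper's real prompts and pipeline will differ.

```python
def distill_knowledge_note(mllm, image, question, passages):
    """Hypothetical step 1a: condense retrieved passages into a short
    knowledge note that keeps only question-relevant facts."""
    prompt = (
        "From the passages below, extract only the facts needed to "
        f"answer the question.\nQuestion: {question}\n"
        f"Passages: {passages}"
    )
    return mllm(image, prompt)


def distill_visual_note(mllm, image, question):
    """Hypothetical step 1b: describe the image regions relevant to
    the question, to ground the answer and curb hallucination."""
    prompt = (
        "Describe only the visual details of the image that are "
        f"relevant to this question: {question}"
    )
    return mllm(image, prompt)


def answer_with_notes(mllm, image, question, passages):
    """Hypothetical step 2: answer the question conditioned on both notes."""
    k_note = distill_knowledge_note(mllm, image, question, passages)
    v_note = distill_visual_note(mllm, image, question)
    prompt = (
        f"Knowledge notes: {k_note}\n"
        f"Visual notes: {v_note}\n"
        f"Question: {question}\n"
        "Answer using only facts supported by the notes and the image."
    )
    return mllm(image, prompt)
```

Any MLLM wrapper exposing an `(image, prompt) -> text` interface could be dropped in for `mllm`.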
No commits in the last 6 months.
Use this if you are a researcher or developer working with multimodal large language models and want to strengthen their reasoning for visual question answering through structured guidance from external knowledge and visual cues.
Not ideal if you want a plug-and-play solution for general visual question answering without detailed knowledge integration or fine-grained visual perception, or if you lack access to an NVIDIA RTX A6000 GPU.
Stars
21
Forks
2
Language
Python
License
—
Category
Last pushed
Jun 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Jorffy/NoteMR"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
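The same endpoint can be called from Python. A minimal sketch using only the standard library; the URL pattern is taken from the curl example above, but the JSON response schema is not documented here, so the fetch returns the raw parsed payload.

```python
import json
from urllib.request import urlopen

# URL pattern inferred from the curl example; treat it as an assumption.
API = "https://pt-edge.onrender.com/api/v1/quality/transformers/{owner}/{repo}"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return API.format(owner=owner, repo=repo)


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality data for one repository.

    Performs a network call; the response schema is undocumented,
    so the raw decoded JSON is returned as-is.
    """
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("Jorffy", "NoteMR")` hits the same URL as the curl command shown above.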