kyopark2014/llm-multimodal-and-rag
It shows how to use multimodal models and RAG based on multi-region LLMs.
This project helps developers build intelligent applications that can understand both text and images. It takes raw data, including visual content, and uses large language models to provide enriched, context-aware responses. Developers can use this to create robust chatbots and AI assistants capable of handling diverse information.
No commits in the last 6 months.
Use this if you are a developer looking to build a generative AI application that needs to process both images and text, leveraging multiple LLMs for higher performance and reliability.
Not ideal if you are an end-user without programming experience, as this is a toolkit for developers to build applications, not a ready-to-use solution.
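The repository itself is the authoritative reference. Purely as a rough sketch of the multimodal, multi-region pattern described above, the following assumes Amazon Bedrock with the Claude 3 messages format; the model ID, region list, and simple failover loop are illustrative assumptions, not code taken from this project.

import json
import base64
import boto3

REGIONS = ["us-east-1", "us-west-2"]  # assumed regions; fail over across them
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # illustrative model ID

def invoke_multimodal(prompt, image_bytes):
    # Claude 3 messages request with one image block and one text block.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": "image/png",
                    "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": prompt},
            ],
        }],
    })
    # Try each region in turn; naive failover, not this project's actual logic.
    for region in REGIONS:
        try:
            client = boto3.client("bedrock-runtime", region_name=region)
            resp = client.invoke_model(modelId=MODEL_ID, body=body)
            return json.loads(resp["body"].read())["content"][0]["text"]
        except Exception:
            continue
    raise RuntimeError("all regions failed")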
Stars: 27
Forks: 4
Language: Python
License: —
Category: —
Last pushed: Oct 18, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/kyopark2014/llm-multimodal-and-rag"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
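The same data can be fetched from a script. A minimal Python sketch using requests follows; the response is assumed to be JSON, and its exact schema is not documented here, so inspect the payload before relying on specific keys.

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/rag/"
       "kyopark2014/llm-multimodal-and-rag")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors
data = resp.json()       # assumed JSON body; schema not documented here
print(data)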
Higher-rated alternatives
illuin-tech/colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
AnswerDotAI/byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
jolibrain/colette
Multimodal RAG to search and interact locally with technical documents of any kind
nannib/nbmultirag
A framework in Italian and English that lets you chat with your own documents using RAG, ...
OpenBMB/VisRAG
Parsing-free RAG supported by VLMs