lizhaoliu-Lec/CG-VLM
This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.
This project offers a method for training efficient instruction-following AI models that understand both images and text. It takes image-text pairs and user instructions as input, then produces a model capable of generating relevant responses or actions based on those instructions and visual information. This is useful for AI researchers and developers working on advanced multimodal AI systems.
No commits in the last 6 months.
Use this if you are developing AI models that need to interpret complex visual information alongside natural language instructions efficiently.
Not ideal if you are looking for an off-the-shelf application or a solution that doesn't require deep AI development expertise.
Stars: 20
Forks: 1
Language: —
License: MIT
Category: —
Last pushed: Dec 01, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lizhaoliu-Lec/CG-VLM"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
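A minimal Python sketch of calling the endpoint above, using only the standard library. The "transformers" path segment is copied from the example curl URL; the response schema is not documented here, so the fetched JSON is returned as-is rather than assuming any field names.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the endpoint URL; "transformers" is the category segment
    # seen in the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Anonymous access is limited to 100 requests/day.
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the URL that the curl example hits; call fetch_quality()
    # to actually retrieve the JSON payload.
    print(quality_url("transformers", "lizhaoliu-Lec", "CG-VLM"))
```

With an API key, you would typically add it as a header or query parameter, but the exact mechanism isn't specified on this page.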
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice