lizhaoliu-Lec/CG-VLM
This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.
This project offers a method for training efficient instruction-following AI models that understand both images and text. It takes image-text pairs and user instructions as input, then produces a model capable of generating relevant responses or actions based on those instructions and visual information. This is useful for AI researchers and developers working on advanced multimodal AI systems.
No commits in the last 6 months.
Use this if you are developing AI models that need to interpret complex visual information alongside natural language instructions efficiently.
Not ideal if you are looking for an off-the-shelf application or a solution that doesn't require deep AI development expertise.
Stars: 20
Forks: 1
Language: —
License: MIT
Category: —
Last pushed: Dec 01, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lizhaoliu-Lec/CG-VLM"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
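A minimal Python sketch of calling the endpoint above, using only the standard library. The "transformers" path segment is copied from the example curl URL; the response schema is not documented here, so the fetched JSON is returned as-is rather than assuming any field names.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the endpoint URL; "transformers" is the category segment
    # seen in the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Anonymous access is limited to 100 requests/day.
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the URL that the curl example hits; call fetch_quality()
    # to actually retrieve the JSON payload.
    print(quality_url("transformers", "lizhaoliu-Lec", "CG-VLM"))
```

With an API key, you would typically add it as a header or query parameter, but the exact mechanism isn't specified on this page.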
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice