baaivision/EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
EVE advances research in Vision-Language Models (VLMs) by exploring encoder-free designs that remove the traditional vision encoder. It accepts images and text as input and aims to produce capable multimodal models. The project is aimed at AI researchers and practitioners developing or studying next-generation vision-language AI.
368 stars. No commits in the last 6 months.
Use this if you are an AI researcher investigating novel architectures for multimodal large language models, specifically interested in encoder-free designs.
Not ideal if you are a business user looking for a ready-to-deploy tool or an application developer seeking a stable API for immediate integration into a product.
Stars: 368
Forks: 12
Language: Python
License: MIT
Category:
Last pushed: Jul 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/baaivision/EVE"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
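If you prefer calling the endpoint from code rather than curl, here is a minimal Python sketch using only the standard library. The JSON response shape is not documented on this page, so the record is returned as-is; `build_url` and `fetch_quality` are illustrative helper names, not part of any official client.

```python
import json
import urllib.request

# Base endpoint for the pt-edge quality API (taken from the curl example above).
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for one repository."""
    return f"{API_BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and parse one repository's quality record.

    No API key is needed for up to 100 requests/day; the response
    schema is assumed to be JSON and is returned unmodified.
    """
    with urllib.request.urlopen(build_url(ecosystem, repo), timeout=10) as resp:
        return json.load(resp)

# Example usage (performs a live network request):
#   data = fetch_quality("transformers", "baaivision/EVE")
#   print(json.dumps(data, indent=2))
```

For higher request volumes, obtain a free key as noted above; how the key is passed (header or query parameter) is not specified on this page.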
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice