NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Eagle targets AI/ML engineers and researchers building models that need to understand long videos and high-resolution images. It accepts raw video footage, high-resolution images, multi-page documents, and long text, and reasons over them, so you can build applications that summarize long videos, answer questions about complex visual data, or analyze detailed documents.
Use this if you are a machine learning engineer or researcher developing advanced vision-language AI models that require processing very long videos or extremely detailed high-resolution images.
Not ideal if you are a non-technical user looking for an off-the-shelf application to analyze images or videos without deep technical integration.
Stars: 931
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVlabs/Eagle"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding