ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
This project is for AI researchers and developers working with multimodal large language models (LMMs) that process high-resolution images. It takes raw image data and text as input, specifically for LLaVA-UHD models, and produces a more robust, accurately trained LMM, free of the errors present in the original implementation. It is best suited to those actively training and refining LMMs for image-understanding tasks.
No commits in the last 6 months.
Use this if you are a researcher or developer actively training LLaVA-UHD models and need a reliable, bug-fixed implementation for better performance and accurate results.
Not ideal if you want a pre-trained, ready-to-use LMM for inference and do not plan to train models or modify the architecture.
Stars: 35
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 12, 2024
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ParadoxZW/LLaVA-UHD-Better"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
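For scripted access, here is a minimal Python sketch of the same call. The endpoint URL is taken from the curl example above; the use of the requests library and the assumption that the response body is JSON are mine, since the response schema is not documented here.

import requests

# Endpoint from the curl example above; returns quality data for this repo.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ParadoxZW/LLaVA-UHD-Better"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. exceeding the 100 requests/day keyless limit
print(resp.json())       # assumed JSON response; schema not documented here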
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice