ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
This project is for AI researchers and developers working with multimodal large language models (LMMs) that process high-resolution images. It takes raw image data and text as input, specifically for LLaVA-UHD models, and produces a more robust, accurately trained LMM, free of the errors present in the original implementation. It is best suited to those actively training and refining LMMs for image-understanding tasks.
No commits in the last 6 months.
Use this if you are a researcher or developer actively training LLaVA-UHD models and need a reliable, bug-fixed implementation for better performance and accurate results.
Not ideal if you want a pre-trained, ready-to-use LMM for inference and do not plan to train models or modify the architecture.
Stars: 35
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 12, 2024
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ParadoxZW/LLaVA-UHD-Better"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
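For scripted access, here is a minimal Python sketch of the same call. The endpoint URL is taken from the curl example above; the use of the requests library and the assumption that the response body is JSON are mine, since the response schema is not documented here.

import requests

# Endpoint from the curl example above; returns quality data for this repo.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ParadoxZW/LLaVA-UHD-Better"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. exceeding the 100 requests/day keyless limit
print(resp.json())       # assumed JSON response; schema not documented here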
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice