deepglint/unicom
Large-Scale Visual Representation Model
This project provides large-scale visual representation models for building multimodal AI systems. It encodes raw images into information-rich visual features that can be fused with text, making it a strong visual backbone for next-generation vision-language models and embodied AI (a minimal usage sketch follows below).
Use this if you are a machine learning engineer or researcher developing multimodal large language models and need state-of-the-art visual feature extraction.
Not ideal if you are an end-user looking for a ready-to-use application, as this is a foundational model for AI developers.
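To give a concrete sense of the workflow, here is a minimal feature-extraction sketch. The CLIP-style unicom.load() call, the "ViT-B/32" checkpoint name, and the example.jpg path are assumptions based on the loading pattern the repo's README suggests, not a verified API; check the project before relying on them.

import torch
from PIL import Image
import unicom

# ASSUMPTION: a CLIP-style loader returning (model, preprocessing transform).
model, preprocess = unicom.load("ViT-B/32")
model.eval()

# Preprocess one image into a (1, 3, H, W) tensor and encode it.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
with torch.no_grad():
    features = model(image)  # dense visual embedding, e.g. shape (1, 512)

# L2-normalize so embeddings can be compared with cosine similarity.
features = features / features.norm(dim=-1, keepdim=True)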
Stars: 704
Forks: 34
Language: Python
License: MIT
Category: transformers
Last pushed: Dec 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/deepglint/unicom"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
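The same endpoint can be queried from Python, as sketched below. The X-API-Key header name is an assumption used to illustrate keyed access; consult the API documentation for the actual authentication scheme.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/deepglint/unicom"

# ASSUMPTION: the free key travels in an "X-API-Key" header; the keyless
# 100 requests/day tier needs no headers at all.
headers = {"X-API-Key": "YOUR_KEY"}  # drop for keyless access
resp = requests.get(URL, headers=headers, timeout=10)
resp.raise_for_status()
print(resp.json())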
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice