deepglint/unicom
Large-Scale Visual Representation Model
This project provides large-scale visual representation models for building multimodal AI systems. It encodes raw images into information-rich visual features that can be fused with text, making it a strong visual backbone for next-generation vision-language models and embodied AI (a minimal usage sketch follows below).
Use this if you are a machine learning engineer or researcher developing multimodal large language models and need state-of-the-art visual feature extraction.
Not ideal if you are an end-user looking for a ready-to-use application, as this is a foundational model for AI developers.
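To give a concrete sense of the workflow, here is a minimal feature-extraction sketch. The CLIP-style unicom.load() call, the "ViT-B/32" checkpoint name, and the example.jpg path are assumptions based on the loading pattern the repo's README suggests, not a verified API; check the project before relying on them.

import torch
from PIL import Image
import unicom

# ASSUMPTION: a CLIP-style loader returning (model, preprocessing transform).
model, preprocess = unicom.load("ViT-B/32")
model.eval()

# Preprocess one image into a (1, 3, H, W) tensor and encode it.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
with torch.no_grad():
    features = model(image)  # dense visual embedding, e.g. shape (1, 512)

# L2-normalize so embeddings can be compared with cosine similarity.
features = features / features.norm(dim=-1, keepdim=True)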
Stars: 704
Forks: 34
Language: Python
License: MIT
Category: transformers
Last pushed: Dec 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/deepglint/unicom"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
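The same endpoint can be queried from Python, as sketched below. The X-API-Key header name is an assumption used to illustrate keyed access; consult the API documentation for the actual authentication scheme.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/deepglint/unicom"

# ASSUMPTION: the free key travels in an "X-API-Key" header; the keyless
# 100 requests/day tier needs no headers at all.
headers = {"X-API-Key": "YOUR_KEY"}  # drop for keyless access
resp = requests.get(URL, headers=headers, timeout=10)
resp.raise_for_status()
print(resp.json())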
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice