FoundationVision/UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
UniTok is a unified visual tokenizer: it encodes images into a shared sequence of discrete tokens that multimodal models can consume for both understanding and generation, and decodes those tokens back into pixels. Using one tokenizer for both tasks avoids maintaining separate visual encoders for generation and understanding pipelines, which benefits developers building multimodal AI applications.
Use this if you are developing AI models that need to both generate images from descriptions and understand what's in an image, and you want a unified approach for processing visual data.
Not ideal if your primary goal is simple image processing or if you are not working with advanced multimodal AI systems.
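To make "unified tokenizer" concrete, here is a minimal sketch of the general idea behind discrete visual tokenization (VQ-style nearest-codebook lookup). This is not UniTok's actual API or architecture, just an illustration: the codebook, patch size, and helper names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 16, 12                       # codebook size, code dimension (illustrative)
codebook = rng.normal(size=(K, D))  # shared codebook for encode and decode

def encode(image, codebook, patch=2):
    """Map each patch of the image to the id of its nearest codebook entry."""
    H, W, C = image.shape
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))
    # squared distance from every patch to every code vector
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)         # discrete token ids

def decode(tokens, codebook, grid, patch=2, channels=3):
    """Look up code vectors for each token and reassemble the patch grid."""
    h, w = grid
    x = codebook[tokens].reshape(h, w, patch, patch, channels)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * patch, w * patch, channels)

image = rng.normal(size=(8, 8, 3))      # a toy "image"
tokens = encode(image, codebook)        # 16 discrete tokens
recon = decode(tokens, codebook, grid=(4, 4))
```

Because both directions share one codebook, the same token sequence can be fed to an understanding model or decoded back into an image, which is the property a unified tokenizer provides.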
Stars
517
Forks
11
Language
Python
License
MIT
Category
Last pushed
Nov 14, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FoundationVision/UniTok"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
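The same endpoint can be called from Python with only the standard library. A minimal sketch: only the example URL and rate limits come from this listing; the idea that "transformers" is a category slug and the `Authorization` header name are assumptions.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # "transformers" in the example curl appears to be a category slug
    # (an assumption); owner/repo identify the project.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None):
    # Anonymous access is limited to 100 requests/day; a free key raises
    # that to 1,000/day. The auth header name here is an assumption.
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

print(quality_url("transformers", "FoundationVision", "UniTok"))
```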
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice