FoundationVision/UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
UniTok is a unified visual tokenizer: it encodes images into a shared sequence of discrete tokens that multimodal models can consume for both understanding and generation, and decodes those tokens back into pixels. Using one tokenizer for both tasks avoids maintaining separate visual encoders for generation and understanding pipelines, which benefits developers building multimodal AI applications.
Use this if you are developing AI models that need to both generate images from descriptions and understand what's in an image, and you want a unified approach for processing visual data.
Not ideal if your primary goal is simple image processing or if you are not working with advanced multimodal AI systems.
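To make "unified tokenizer" concrete, here is a minimal sketch of the general idea behind discrete visual tokenization (VQ-style nearest-codebook lookup). This is not UniTok's actual API or architecture, just an illustration: the codebook, patch size, and helper names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 16, 12                       # codebook size, code dimension (illustrative)
codebook = rng.normal(size=(K, D))  # shared codebook for encode and decode

def encode(image, codebook, patch=2):
    """Map each patch of the image to the id of its nearest codebook entry."""
    H, W, C = image.shape
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))
    # squared distance from every patch to every code vector
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)         # discrete token ids

def decode(tokens, codebook, grid, patch=2, channels=3):
    """Look up code vectors for each token and reassemble the patch grid."""
    h, w = grid
    x = codebook[tokens].reshape(h, w, patch, patch, channels)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * patch, w * patch, channels)

image = rng.normal(size=(8, 8, 3))      # a toy "image"
tokens = encode(image, codebook)        # 16 discrete tokens
recon = decode(tokens, codebook, grid=(4, 4))
```

Because both directions share one codebook, the same token sequence can be fed to an understanding model or decoded back into an image, which is the property a unified tokenizer provides.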
Stars
517
Forks
11
Language
Python
License
MIT
Category
Last pushed
Nov 14, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FoundationVision/UniTok"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
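The same endpoint can be called from Python with only the standard library. A minimal sketch: only the example URL and rate limits come from this listing; the idea that "transformers" is a category slug and the `Authorization` header name are assumptions.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # "transformers" in the example curl appears to be a category slug
    # (an assumption); owner/repo identify the project.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None):
    # Anonymous access is limited to 100 requests/day; a free key raises
    # that to 1,000/day. The auth header name here is an assumption.
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

print(quality_url("transformers", "FoundationVision", "UniTok"))
```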
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice