ByteVisionLab/TokenFlow

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

/ 100

Emerging

This project provides a unified image tokenizer for AI developers working on both image understanding and image generation. It converts images into discrete image tokens that a large multimodal model (LMM) can read for understanding tasks such as visual question answering, and that the same model can predict from a text prompt for text-to-image generation. Because one set of tokens serves both tasks, it suits machine learning engineers and researchers building or improving models that both interpret and create visual content.
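The TokenFlow paper describes a dual-codebook design: semantic features and pixel-level features are quantized against two aligned codebooks that share one index, so a single token id carries both kinds of information. The toy sketch below illustrates only that shared-index idea in pure Python. It rests on stated assumptions: the function names (`shared_index`, `tokenize`), the squared-Euclidean distance, and the `w_pix` weighting are hypothetical choices for illustration, not the repository's API or its exact selection rule.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shared_index(
    sem_feat: Vector,
    pix_feat: Vector,
    sem_codebook: Sequence[Vector],
    pix_codebook: Sequence[Vector],
    w_pix: float = 1.0,
) -> int:
    """Pick one index i that is jointly close in BOTH codebooks.

    Hypothetical rule: cost_i = d(sem, sem_code_i) + w_pix * d(pix, pix_code_i).
    Entry i of each codebook is assumed to describe the same token.
    """
    if len(sem_codebook) != len(pix_codebook):
        raise ValueError("codebooks must have the same number of entries")
    best_i, best_cost = 0, float("inf")
    for i, (s_code, p_code) in enumerate(zip(sem_codebook, pix_codebook)):
        cost = sq_dist(sem_feat, s_code) + w_pix * sq_dist(pix_feat, p_code)
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


def tokenize(
    sem_feats: Sequence[Vector],
    pix_feats: Sequence[Vector],
    sem_codebook: Sequence[Vector],
    pix_codebook: Sequence[Vector],
    w_pix: float = 1.0,
) -> List[int]:
    """Map each spatial position's (semantic, pixel) feature pair to one token id."""
    return [
        shared_index(s, p, sem_codebook, pix_codebook, w_pix)
        for s, p in zip(sem_feats, pix_feats)
    ]


if __name__ == "__main__":
    # Tiny made-up codebooks: 3 entries, 2-D semantic codes, 1-D pixel codes.
    sem_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    pix_codebook = [[0.0], [2.0], [-1.0]]
    sem_feats = [[0.9, 1.1], [4.0, 4.0]]
    pix_feats = [[2.1], [0.0]]
    print(tokenize(sem_feats, pix_feats, sem_codebook, pix_codebook))               # [1, 2]
    print(tokenize(sem_feats, pix_feats, sem_codebook, pix_codebook, w_pix=100.0))  # [1, 0]
```

With equal weighting the second position maps to token 2 (closest semantically), while a large `w_pix` pulls it to token 0 (closest in pixel space). This shows why a single shared index has to balance high-level meaning against reconstruction detail.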

449 stars. No commits in the last 6 months.

Use this if you are developing AI models that need to both understand and generate images, and you want a single tokenizer to process visual data for both tasks.

Not ideal if you are looking for an end-user application or an image-editing tool; this is a foundational component for AI model development.

AI model development · computer vision engineering · multimodal AI · generative AI · image recognition

Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 8 / 25


Stars

449

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

zai-org/CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

zhaorw02/DeepMesh

[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

YangLing0818/RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with...

thu-nics/FrameFusion

[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token...

Yushi-Hu/tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
