FoundationVision/OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
OmniTokenizer lets researchers and developers working on generative AI represent both images and videos consistently with a single model. It takes raw images or video clips as input and outputs discrete visual tokens, which language models or diffusion models can then consume to generate new, realistic visual content. It is well suited to building advanced AI systems for image and video generation.
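To make "discrete visual tokens" concrete, here is a minimal sketch of vector-quantized tokenization in the abstract. This is an illustrative example only, not OmniTokenizer's actual API; the codebook size, embedding dimension, and patch count below are hypothetical.

```python
import numpy as np

# Illustrative sketch (NOT OmniTokenizer's API): map each patch embedding
# to the index of its nearest codebook vector, yielding discrete tokens.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8192, 16))   # hypothetical codebook: 8192 codes, dim 16
patches = rng.normal(size=(64, 16))      # hypothetical patch embeddings for one image

# Nearest-codebook-entry lookup via squared Euclidean distance.
d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = d.argmin(axis=1)                # one integer token per patch

print(tokens.shape)                      # → (64,)
```

The resulting integer sequence is what a downstream language model or diffusion model consumes in place of raw pixels.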
323 stars. No commits in the last 6 months.
Use this if you need a high-performance, unified way to tokenize both image and video data for downstream generative AI tasks, especially when dealing with high-resolution or long video inputs.
Not ideal if your primary focus is on basic image or video analysis (like classification or object detection) rather than complex content generation.
Stars: 323
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Jul 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/OmniTokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
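The same endpoint can be called from Python. The sketch below builds the per-repo URL shown in the curl command above; the `quality_url` helper and the assumption that owner/repo are plain path segments are mine, not part of the documented API.

```python
from urllib.parse import quote

# Base path taken from the curl example above; the path layout for other
# owner/repo pairs is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def quality_url(owner: str, repo: str) -> str:
    # URL-escape each path segment before joining.
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

print(quality_url("FoundationVision", "OmniTokenizer"))
# → https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/OmniTokenizer
```

Fetching the URL (e.g. with `urllib.request.urlopen` or `requests.get`) returns the repo's quality data; stay under 100 requests/day without a key.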
Higher-rated alternatives
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
EndlessSora/focal-frequency-loss
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis
JIA-Lab-research/DreamOmni2
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing...
SkyworkAI/UniPic
Open-source SOTA multi-image editing model