FoundationVision/OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for joint image-video tokenization.

Score: 34 / 100 (Emerging)

OmniTokenizer helps researchers and developers working on generative AI to represent both images and videos consistently using a single model. It takes in raw images or video clips and outputs discrete visual tokens, which can then be used by language models or diffusion models to create new, realistic visual content. This is ideal for those building advanced AI systems for image and video generation.
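To illustrate what "discrete visual tokens" means, here is a minimal, generic vector-quantization sketch: patch features are mapped to the index of their nearest codebook entry, producing integer tokens a language model could consume. This is an assumption-laden illustration of the general technique, not OmniTokenizer's actual API (function names and shapes here are hypothetical).

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    # features: (N, D) patch embeddings; codebook: (K, D) learned entries.
    # Squared Euclidean distance from every feature to every codebook row.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)  # (N,) integer token ids in [0, K)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 entries, D=4 dims (toy sizes)
patches = rng.normal(size=(6, 4))    # 6 image patches
tokens = quantize(patches, codebook)
print(tokens.shape)  # one discrete token per patch
```

A real tokenizer learns the codebook jointly with an encoder/decoder so the tokens can be decoded back into pixels; the lookup step above is the part that makes the representation discrete.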

323 stars. No commits in the last 6 months.

Use this if you need a high-performance, unified way to tokenize both image and video data for downstream generative AI tasks, especially when dealing with high-resolution or long video inputs.

Not ideal if your primary focus is on basic image or video analysis (like classification or object detection) rather than complex content generation.

Generative AI · Image Synthesis · Video Generation · Deep Learning Research · Computer Vision
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 8 / 25


Stars: 323
Forks: 8
Language: Python
License: MIT
Last pushed: Jul 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/OmniTokenizer"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
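The same endpoint can be called from Python. A minimal sketch, assuming only the URL pattern shown in the curl example above (the `diffusion` category segment and the response schema are taken on faith from that example; how an API key is passed is not documented here, so it is omitted):

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("diffusion", "FoundationVision", "OmniTokenizer")
print(url)

# To actually fetch the JSON payload (network call, schema not shown here):
# with urllib.request.urlopen(url) as resp:
#     data = resp.read()
```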