FoundationVision/OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
OmniTokenizer lets researchers and developers working on generative AI represent both images and videos consistently with a single model. It takes raw images or video clips as input and outputs discrete visual tokens, which language models or diffusion models can then consume to generate new, realistic visual content. It is well suited to building advanced AI systems for image and video generation.
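To make "discrete visual tokens" concrete, here is a minimal sketch of vector-quantized tokenization in the abstract. This is an illustrative example only, not OmniTokenizer's actual API; the codebook size, embedding dimension, and patch count below are hypothetical.

```python
import numpy as np

# Illustrative sketch (NOT OmniTokenizer's API): map each patch embedding
# to the index of its nearest codebook vector, yielding discrete tokens.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8192, 16))   # hypothetical codebook: 8192 codes, dim 16
patches = rng.normal(size=(64, 16))      # hypothetical patch embeddings for one image

# Nearest-codebook-entry lookup via squared Euclidean distance.
d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = d.argmin(axis=1)                # one integer token per patch

print(tokens.shape)                      # → (64,)
```

The resulting integer sequence is what a downstream language model or diffusion model consumes in place of raw pixels.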
323 stars. No commits in the last 6 months.
Use this if you need a high-performance, unified way to tokenize both image and video data for downstream generative AI tasks, especially when dealing with high-resolution or long video inputs.
Not ideal if your primary focus is on basic image or video analysis (like classification or object detection) rather than complex content generation.
Stars: 323
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Jul 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/OmniTokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
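The same endpoint can be called from Python. The sketch below builds the per-repo URL shown in the curl command above; the `quality_url` helper and the assumption that owner/repo are plain path segments are mine, not part of the documented API.

```python
from urllib.parse import quote

# Base path taken from the curl example above; the path layout for other
# owner/repo pairs is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def quality_url(owner: str, repo: str) -> str:
    # URL-escape each path segment before joining.
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

print(quality_url("FoundationVision", "OmniTokenizer"))
# → https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/OmniTokenizer
```

Fetching the URL (e.g. with `urllib.request.urlopen` or `requests.get`) returns the repo's quality data; stay under 100 requests/day without a key.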
Higher-rated alternatives
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
EndlessSora/focal-frequency-loss
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis
JIA-Lab-research/DreamOmni2
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing...
SkyworkAI/UniPic
Open-source SOTA multi-image editing model