ivonajdenkoska/tulip
[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"
TULIP upgrades existing CLIP-like models so they can process long, descriptive captions, improving performance on tasks such as matching images to lengthy descriptions and generating images from detailed text prompts. It is aimed at researchers and developers building vision-language models for multimodal understanding, especially those working with complex visual scenes and rich textual narratives.
Use this if you want better performance from CLIP-like models on long, descriptive image captions or detailed text-to-image prompts.
Not ideal if you only need short, simple image-text matching, or if you are not training or fine-tuning deep vision-language models.
Stars: 33
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ivonajdenkoska/tulip"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
- filipstrand/mflux: MLX native implementations of state-of-the-art generative image models
- potamides/DeTikZify: Synthesizing graphics programs for scientific figures and sketches with TikZ
- FoundationVision/Infinity: [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
- zai-org/CogView: Text-to-image generation. The repo for the NeurIPS 2021 paper "CogView: Mastering Text-to-Image...
- EleutherAI/DALLE-mtf: OpenAI's DALL-E for large-scale training in mesh-tensorflow