ivonajdenkoska/tulip
[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"
TULIP upgrades existing CLIP-like models so they can process long, descriptive captions, improving performance on tasks such as matching images to lengthy descriptions and generating images from detailed text prompts. It is aimed at researchers and developers building vision-language models for multimodal understanding, especially those working with complex visual scenes and rich textual narratives.
Use this if you want better performance from CLIP-like models on long, descriptive image captions or detailed text-to-image prompts.
Not ideal if you only need short, simple image-text matching, or if you are not training or fine-tuning deep vision-language models.
Stars: 33
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ivonajdenkoska/tulip"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
- filipstrand/mflux: MLX native implementations of state-of-the-art generative image models
- potamides/DeTikZify: Synthesizing graphics programs for scientific figures and sketches with TikZ
- FoundationVision/Infinity: [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
- zai-org/CogView: Text-to-image generation. The repo for the NeurIPS 2021 paper "CogView: Mastering Text-to-Image...
- EleutherAI/DALLE-mtf: OpenAI's DALL-E for large-scale training in mesh-tensorflow