tripletclip/TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
TripletCLIP improves the training of models that jointly understand images and text, making them better at compositional descriptions like "a red car next to a blue house." It takes existing image-text datasets and pre-trained models as input and outputs more robust models capable of nuanced visual-linguistic understanding. It is aimed at researchers and AI practitioners building advanced vision-language applications.
No commits in the last 6 months.
Use this if you are an AI researcher or developer looking to enhance the compositional reasoning capabilities of your vision-language models for tasks like complex image search or description generation.
Not ideal if you are looking for an off-the-shelf application to directly analyze images and text without involving model training or fine-tuning.
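To make the training idea concrete, here is a minimal, hypothetical sketch of a triplet loss over (image, caption, synthetic negative caption) embeddings. The function name, margin value, and embedding shapes are illustrative assumptions, not the repo's API; per the paper title, the actual objective trains against synthetic vision-language negatives (including negative images) and may combine this term with the standard CLIP loss.

import torch
import torch.nn.functional as F

def triplet_text_loss(image_emb, pos_text_emb, neg_text_emb, margin=0.2):
    # Hypothetical margin-based triplet loss: pull the image toward its true
    # caption and push it away from a synthetically perturbed caption.
    image_emb = F.normalize(image_emb, dim=-1)
    pos_text_emb = F.normalize(pos_text_emb, dim=-1)
    neg_text_emb = F.normalize(neg_text_emb, dim=-1)
    sim_pos = (image_emb * pos_text_emb).sum(dim=-1)  # cosine sim, true caption
    sim_neg = (image_emb * neg_text_emb).sum(dim=-1)  # cosine sim, negative caption
    # True caption must beat the hard negative by at least `margin`.
    return F.relu(margin - sim_pos + sim_neg).mean()

# Toy usage: random tensors standing in for CLIP encoder outputs.
img, pos, neg = torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512)
loss = triplet_text_loss(img, pos, neg)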
Stars: 46
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/tripletclip/TripletCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
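If you prefer Python over curl, a stdlib-only sketch of the same unauthenticated call (assuming the endpoint returns JSON; the response schema is not documented here):

import json
import urllib.request

# Unauthenticated access: 100 requests/day per the note above.
url = "https://pt-edge.onrender.com/api/v1/quality/diffusion/tripletclip/TripletCLIP"
with urllib.request.urlopen(url, timeout=10) as resp:
    payload = resp.read().decode("utf-8")

# Assuming a JSON body, pretty-print whatever the API returns.
print(json.dumps(json.loads(payload), indent=2))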
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model