tripletclip/TripletCLIP

[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"

Quality score: 21 / 100 (Experimental)

This project offers an improved way to train AI models that understand both images and text, making them better at grasping complex descriptions like "a red car next to a blue house." It takes existing image-text datasets and pre-trained models, then outputs more robust models capable of nuanced visual-linguistic understanding. Researchers and AI practitioners developing advanced vision-language applications would use this.

No commits in the last 6 months.

Use this if you are an AI researcher or developer looking to enhance the compositional reasoning capabilities of your vision-language models for tasks like complex image search or description generation.

Not ideal if you are looking for an off-the-shelf application to directly analyze images and text without involving model training or fine-tuning.

Tags: AI-research · computer-vision · natural-language-processing · model-training · multimodal-AI
Badges: No License · Stale (6m) · No Package · No Dependents
Score breakdown:

- Maintenance: 0 / 25
- Adoption: 8 / 25
- Maturity: 8 / 25
- Community: 5 / 25


- Stars: 46
- Forks: 2
- Language: Python
- License: none
- Last pushed: Dec 01, 2024
- Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/tripletclip/TripletCLIP"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
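The same endpoint can be called programmatically. A minimal Python sketch, assuming only the URL pattern shown in the curl command above (the path segment `diffusion` is copied verbatim from it, and the response schema is not documented here, so only URL construction and a fetch call are shown):

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(platform: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository.

    The path layout ({platform}/{owner}/{repo}) mirrors the curl
    example above; it is an assumption, not documented behavior.
    """
    return f"{BASE}/{platform}/{owner}/{repo}"

url = quality_url("diffusion", "tripletclip", "TripletCLIP")
print(url)

# To actually fetch the data (requires network access):
# with urllib.request.urlopen(url) as resp:
#     body = resp.read().decode("utf-8")
```

Without an API key this is rate-limited to 100 requests/day, so cache responses rather than refetching per page load.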