tripletclip/TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
TripletCLIP improves the training of models that jointly understand images and text, making them better at compositional descriptions like "a red car next to a blue house." It takes existing image-text datasets and pre-trained models as input and outputs more robust models capable of nuanced visual-linguistic understanding. It is aimed at researchers and AI practitioners building advanced vision-language applications.
No commits in the last 6 months.
Use this if you are an AI researcher or developer looking to enhance the compositional reasoning capabilities of your vision-language models for tasks like complex image search or description generation.
Not ideal if you are looking for an off-the-shelf application to directly analyze images and text without involving model training or fine-tuning.
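To make the training idea concrete, here is a minimal, hypothetical sketch of a triplet loss over (image, caption, synthetic negative caption) embeddings. The function name, margin value, and embedding shapes are illustrative assumptions, not the repo's API; per the paper title, the actual objective trains against synthetic vision-language negatives (including negative images) and may combine this term with the standard CLIP loss.

import torch
import torch.nn.functional as F

def triplet_text_loss(image_emb, pos_text_emb, neg_text_emb, margin=0.2):
    # Hypothetical margin-based triplet loss: pull the image toward its true
    # caption and push it away from a synthetically perturbed caption.
    image_emb = F.normalize(image_emb, dim=-1)
    pos_text_emb = F.normalize(pos_text_emb, dim=-1)
    neg_text_emb = F.normalize(neg_text_emb, dim=-1)
    sim_pos = (image_emb * pos_text_emb).sum(dim=-1)  # cosine sim, true caption
    sim_neg = (image_emb * neg_text_emb).sum(dim=-1)  # cosine sim, negative caption
    # True caption must beat the hard negative by at least `margin`.
    return F.relu(margin - sim_pos + sim_neg).mean()

# Toy usage: random tensors standing in for CLIP encoder outputs.
img, pos, neg = torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512)
loss = triplet_text_loss(img, pos, neg)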
Stars: 46
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Dec 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/tripletclip/TripletCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
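If you prefer Python over curl, a stdlib-only sketch of the same unauthenticated call (assuming the endpoint returns JSON; the response schema is not documented here):

import json
import urllib.request

# Unauthenticated access: 100 requests/day per the note above.
url = "https://pt-edge.onrender.com/api/v1/quality/diffusion/tripletclip/TripletCLIP"
with urllib.request.urlopen(url, timeout=10) as resp:
    payload = resp.read().decode("utf-8")

# Assuming a JSON body, pretty-print whatever the API returns.
print(json.dumps(json.loads(payload), indent=2))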
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model