Victorwz/MLM_Filter

Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".

Quality score: 29 / 100 (Experimental)

This project helps machine learning engineers and researchers select the best image-text pairs from vast, web-crawled datasets. By inputting an image and its corresponding text caption, the tool generates a quality score indicating how well they match and the overall caption quality. This allows users to filter out low-quality data and create cleaner datasets for training large-scale multimodal models.
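To make the filtering workflow concrete, here is a minimal sketch of threshold-based selection. The `score_pair` callable and `toy_scorer` are hypothetical stand-ins for whatever scoring interface the MLM_Filter repository actually exposes; only the overall pattern (score each image-caption pair, keep pairs above a cutoff) comes from the description above.

```python
# Hypothetical sketch: filter image-text pairs by a quality-score threshold.
# score_pair is a stand-in for the repo's actual scoring call.

def filter_pairs(pairs, score_pair, threshold=80):
    """Keep only (image, caption) pairs whose quality score meets the threshold."""
    return [p for p in pairs if score_pair(*p) >= threshold]

# Toy scorer standing in for the finetuned MLM: here, longer captions score higher.
def toy_scorer(image_path, caption):
    return min(100, 10 * len(caption.split()))

pairs = [
    ("img1.jpg", "a dog"),
    ("img2.jpg", "a golden retriever catching a frisbee in a sunny park"),
]
kept = filter_pairs(pairs, toy_scorer, threshold=50)
```

In practice the scorer would run the finetuned multimodal model over each pair; the list comprehension above is just the downstream selection step.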

No commits in the last 6 months.

Use this if you need to efficiently filter large collections of image-text data to improve the quality of your training datasets for computer vision or multimodal AI models.

Not ideal if you are looking for a tool to generate image captions or to evaluate the quality of images or text independently, as its primary function is assessing image-text pair coherence.

Tags: data-curation, dataset-quality, multimodal-ai, computer-vision, machine-learning-engineering

Status: Stale (6 months) · No package · No dependents

Score breakdown:
Maintenance: 2 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 3 / 25


Repository stats:
Stars: 69
Forks: 1
Language: Python
License: MIT
Last pushed: Apr 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Victorwz/MLM_Filter"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
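For callers who prefer Python over curl, the endpoint can be reached with the standard library. The URL pattern below is taken from the example command above; the response schema is not documented here, so this sketch only builds the URL and leaves the (rate-limited) fetch commented out.

```python
# Minimal sketch for querying the public quality endpoint shown above.
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry, repo):
    """Build the API URL for a repo, e.g. registry='transformers',
    repo='Victorwz/MLM_Filter'."""
    return f"{API_BASE}/{registry}/{repo}"

url = quality_url("transformers", "Victorwz/MLM_Filter")
# Uncomment to fetch (counts against the 100 requests/day no-key limit):
# with urllib.request.urlopen(url) as resp:
#     print(resp.read().decode())
```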