facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Quality score: 58/100 (Established)

This framework helps AI researchers quickly set up projects that combine visual inputs (images or videos) with text (captions or questions). It takes datasets containing paired images and text and produces trained models that can understand and reason over the combined data. It is aimed at researchers and machine learning engineers working on multimodal AI problems.

5,622 stars. Actively maintained with 3 commits in the last 30 days.

Use this if you are an AI researcher starting a new project that involves analyzing or generating content from both images and text, and you need a robust, scalable foundation.

Not ideal if you are a practitioner looking for a ready-to-use application or a developer working on a non-AI project.

Tags: AI research, computer vision, natural language processing, multimodal learning, machine learning engineering
No package · No dependents
Maintenance 9 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 23 / 25


Stars: 5,622
Forks: 944
Language: Python
License:
Last pushed: Jan 12, 2026
Commits (30d): 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/facebookresearch/mmf"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
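The same endpoint can also be called from Python. A minimal sketch using only the standard library, assuming the response body is JSON (the response schema is not documented here, and the `build_url`/`fetch_quality` helper names are illustrative, not part of the API):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(collection: str, owner: str, repo: str) -> str:
    # Endpoint shape inferred from the curl example:
    # /api/v1/quality/<collection>/<owner>/<repo>
    return f"{API_BASE}/{collection}/{owner}/{repo}"


def fetch_quality(collection: str, owner: str, repo: str) -> dict:
    # Anonymous access is limited to 100 requests/day per the note above;
    # assumes the endpoint returns a JSON object.
    with urllib.request.urlopen(build_url(collection, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("ml-frameworks", "facebookresearch", "mmf")
    print(data)
```

How a free API key is attached (header vs. query parameter) is not specified on this page, so the sketch uses only anonymous access.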