facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
This framework helps AI researchers quickly set up projects that combine visual inputs (images or video) with text (captions or questions). Given datasets that pair images with related text, it produces trained models that can reason over and generate insights from the combined data. It is aimed at researchers and machine learning engineers working on cutting-edge multimodal AI problems.
5,622 stars. Actively maintained with 3 commits in the last 30 days.
Use this if you are an AI researcher starting a new project that involves analyzing or generating content from both images and text, and you need a robust, scalable foundation.
Not ideal if you are a practitioner looking for a ready-to-use application or a developer working on a non-AI project.
Stars: 5,622
Forks: 944
Language: Python
License: —
Category:
Last pushed: Jan 12, 2026
Commits (30d): 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/facebookresearch/mmf"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000/day.
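The same endpoint can be called from Python. A minimal sketch, assuming only the URL pattern shown in the curl command above; the JSON field names in the commented parsing step (e.g. "stars") are assumptions about the response shape, not documented API output:

```python
# Sketch of fetching repository quality data from the API above.
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("facebookresearch", "mmf")
# data = json.load(urlopen(url))  # 100 requests/day without a key
# print(data.get("stars"))        # field name is an assumption
```

Swap in your own owner/repo pair to query other tracked frameworks.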
Related frameworks
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)