young-geng/m3ae_public

Multimodal Masked Autoencoders (M3AE): A JAX/Flax Implementation

Score: 39 / 100 (Emerging)

This project implements Multimodal Masked Autoencoders (M3AE) in JAX/Flax for pre-training models that understand both images and text. It consumes datasets containing images, text, or paired image-text data and produces pre-trained models that can then be fine-tuned for downstream tasks such as image classification.
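The core idea behind masked-autoencoder pretraining is to combine image-patch and text tokens into one sequence, hide a large fraction of it, and train the model to reconstruct the hidden positions. A minimal NumPy sketch of the masking step (this is not the repository's actual code; the shapes and the 75% mask ratio are illustrative assumptions):

```python
# Illustrative sketch of the masked-autoencoding idea behind M3AE:
# concatenate image-patch and text tokens, randomly mask most of
# them, and let an encoder see only the visible tokens.
import numpy as np

rng = np.random.default_rng(0)

image_tokens = rng.normal(size=(196, 768))  # e.g. 14x14 patches, dim 768
text_tokens = rng.normal(size=(32, 768))    # e.g. 32 text tokens
tokens = np.concatenate([image_tokens, text_tokens], axis=0)  # (228, 768)

mask_ratio = 0.75                     # assumed; high ratios are typical for MAEs
n = tokens.shape[0]
n_keep = int(n * (1 - mask_ratio))    # number of visible tokens
perm = rng.permutation(n)
keep_idx, masked_idx = perm[:n_keep], perm[n_keep:]

visible = tokens[keep_idx]            # the encoder sees only these
# a decoder would then reconstruct tokens[masked_idx] from `visible`
print(visible.shape)
```

During training, the reconstruction loss is computed only at the masked positions, which is what makes the task non-trivial for the encoder.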

107 stars. No commits in the last 6 months.

Use this if you are a machine learning researcher or engineer looking to train state-of-the-art multimodal AI models efficiently, especially on large datasets using cloud GPUs or TPUs.

Not ideal if you are a business user looking for a ready-to-use application or a developer who needs a simple API for common image/text processing tasks.

Topics: deep-learning · computer-vision · natural-language-processing · multimodal-ai · model-pretraining
Badges: Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 107
Forks: 12
Language: Python
License: Apache-2.0
Last pushed: Feb 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/young-geng/m3ae_public"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
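The same endpoint can be queried from Python with the standard library. A minimal sketch (the helper names below are hypothetical, and the JSON schema of the response is not documented here):

```python
# Sketch: query the quality API shown in the curl example above.
# `quality_url` and `fetch_quality` are hypothetical helpers; the
# path segment "transformers" mirrors the curl command as-is.
import json
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(platform: str, repo: str) -> str:
    """Build the API endpoint URL for one repository."""
    return f"{BASE_URL}/{platform}/{repo}"

def fetch_quality(platform: str, repo: str) -> dict:
    """Fetch and decode the JSON report (requires network access)."""
    with urlopen(quality_url(platform, repo)) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(quality_url("transformers", "young-geng/m3ae_public"))
```

Without an API key the anonymous rate limit of 100 requests/day applies, so batch lookups should be throttled client-side.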