young-geng/m3ae_public

Multimodal Masked Autoencoders (M3AE): A JAX/Flax Implementation

Score: 39 / 100 (Emerging)

This project implements Multimodal Masked Autoencoders (M3AE) in JAX/Flax for pre-training models that understand both images and text. It consumes datasets containing images, text, or paired image-text data and produces pre-trained models that can then be fine-tuned for downstream tasks such as image classification.
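The core idea behind masked-autoencoder pretraining is to combine image-patch and text tokens into one sequence, hide a large fraction of it, and train the model to reconstruct the hidden positions. A minimal NumPy sketch of the masking step (this is not the repository's actual code; the shapes and the 75% mask ratio are illustrative assumptions):

```python
# Illustrative sketch of the masked-autoencoding idea behind M3AE:
# concatenate image-patch and text tokens, randomly mask most of
# them, and let an encoder see only the visible tokens.
import numpy as np

rng = np.random.default_rng(0)

image_tokens = rng.normal(size=(196, 768))  # e.g. 14x14 patches, dim 768
text_tokens = rng.normal(size=(32, 768))    # e.g. 32 text tokens
tokens = np.concatenate([image_tokens, text_tokens], axis=0)  # (228, 768)

mask_ratio = 0.75                     # assumed; high ratios are typical for MAEs
n = tokens.shape[0]
n_keep = int(n * (1 - mask_ratio))    # number of visible tokens
perm = rng.permutation(n)
keep_idx, masked_idx = perm[:n_keep], perm[n_keep:]

visible = tokens[keep_idx]            # the encoder sees only these
# a decoder would then reconstruct tokens[masked_idx] from `visible`
print(visible.shape)
```

During training, the reconstruction loss is computed only at the masked positions, which is what makes the task non-trivial for the encoder.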

107 stars. No commits in the last 6 months.

Use this if you are a machine learning researcher or engineer looking to train state-of-the-art multimodal AI models efficiently, especially on large datasets using cloud GPUs or TPUs.

Not ideal if you are a business user looking for a ready-to-use application or a developer who needs a simple API for common image/text processing tasks.

Topics: deep-learning · computer-vision · natural-language-processing · multimodal-ai · model-pretraining
Badges: Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 107
Forks: 12
Language: Python
License: Apache-2.0
Last pushed: Feb 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/young-geng/m3ae_public"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
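The same endpoint can be queried from Python with the standard library. A minimal sketch (the helper names below are hypothetical, and the JSON schema of the response is not documented here):

```python
# Sketch: query the quality API shown in the curl example above.
# `quality_url` and `fetch_quality` are hypothetical helpers; the
# path segment "transformers" mirrors the curl command as-is.
import json
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(platform: str, repo: str) -> str:
    """Build the API endpoint URL for one repository."""
    return f"{BASE_URL}/{platform}/{repo}"

def fetch_quality(platform: str, repo: str) -> dict:
    """Fetch and decode the JSON report (requires network access)."""
    with urlopen(quality_url(platform, repo)) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(quality_url("transformers", "young-geng/m3ae_public"))
```

Without an API key the anonymous rate limit of 100 requests/day applies, so batch lookups should be throttled client-side.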