kyegomez/MM1

PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"

Quality score: 37 / 100 (Emerging)

This project provides a foundational PyTorch implementation for exploring how large language models can understand and generate content based on both text and images. It takes an image and a sequence of text as input, processes them through a multimodal architecture, and outputs a refined set of tokens for further text generation or analysis. This is primarily for researchers and AI practitioners who are building or experimenting with advanced AI models that interpret and respond to visual and textual information.

Use this if you are an AI researcher or developer focusing on multimodal AI architectures and want to experiment with the core mechanisms of integrating image and text data into a unified model.

Not ideal if you are looking for a ready-to-use application or a fully trained model for immediate deployment in a specific business context.
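The flow described above (image features plus text tokens fused into one sequence, then refined by a transformer) can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the MM1 implementation: the class name `TinyMultimodalFusion`, the dimensions, and the simple concatenation-based fusion are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class TinyMultimodalFusion(nn.Module):
    """Illustrative sketch (NOT the actual MM1 architecture): project
    image patch features and text token embeddings into a shared width,
    concatenate them, and refine with a small Transformer encoder."""

    def __init__(self, img_feat_dim=64, text_vocab=1000, d_model=32):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, d_model)   # image connector
        self.text_emb = nn.Embedding(text_vocab, d_model)  # text embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, img_feats, text_ids):
        # img_feats: (B, num_patches, img_feat_dim); text_ids: (B, seq_len)
        tokens = torch.cat(
            [self.img_proj(img_feats), self.text_emb(text_ids)], dim=1)
        return self.encoder(tokens)  # refined multimodal token sequence

model = TinyMultimodalFusion()
img = torch.randn(2, 16, 64)          # 16 patch features per image
txt = torch.randint(0, 1000, (2, 8))  # 8 text token ids per sample
out = model(img, txt)
print(out.shape)  # 16 image + 8 text tokens, width 32
```

The output sequence has one position per input token (image patches first, then text), which is the "refined set of tokens" the description mentions; a real model would feed these into a language-model head for generation.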

Tags: Multimodal AI research, Large Language Models, Computer Vision, Natural Language Processing, AI model development

No package · No dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 4 / 25


Stars: 26
Forks: 1
Language: Python
License: MIT
Last pushed: Mar 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MM1"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.