lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Macaw-LLM helps AI researchers and developers integrate multiple types of unstructured data into large language models. It takes images, videos, audio clips, and text as input, then processes and aligns them into representations a language model can consume. The result is a multi-modal language model capable of processing and generating responses based on diverse data types.
1,593 stars. No commits in the last 6 months.
Use this if you are developing advanced AI models and need to combine information from images, videos, audio, and text for a unified language understanding system.
Not ideal if you are looking for a ready-to-use application or API for end-user tasks, as this is a foundational model for further AI development.
Stars
1,593
Forks
132
Language
Python
License
Apache-2.0
Category
Last pushed
Jan 01, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyuchenyang/Macaw-LLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
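The curl command above can also be issued programmatically. The sketch below builds the same endpoint URL and fetches it with the Python standard library; the `quality_url` helper and the response-handling are assumptions, since the API's response schema is not documented on this page.

```python
# Sketch only: the endpoint path comes from the curl example above;
# the JSON response fields are not documented here, so we just print the payload.
import json
import urllib.request

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the pt-edge quality API URL for a repository (helper is hypothetical)."""
    return f"https://pt-edge.onrender.com/api/v1/quality/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "lyuchenyang", "Macaw-LLM")
print(url)

# Uncomment to fetch (rate-limited to 100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```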
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"