lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

45 / 100 (Emerging)

This project helps AI researchers and developers who work with large language models integrate multiple kinds of unstructured data. It takes images, videos, audio clips, and text as input, then processes and aligns them so a language model can understand them. The output is a multi-modal language model that can process these diverse data types and generate responses based on them.

1,593 stars. No commits in the last 6 months.

Use this if you are developing advanced AI models and need to combine information from images, videos, audio, and text for a unified language understanding system.

Not ideal if you are looking for a ready-to-use application or API for end-user tasks, as this is a foundational model for further AI development.

AI research, multi-modal learning, large language models, computer vision, natural language processing
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25

How are scores calculated?
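Judging by the breakdown above, the overall score appears to be the sum of the four category scores: 0 + 10 + 16 + 19 = 45, out of a maximum of 4 × 25 = 100.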

Stars: 1,593
Forks: 132
Language: Python
License: Apache-2.0
Last pushed: Jan 01, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyuchenyang/Macaw-LLM"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
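
For scripted access, the sketch below calls the same endpoint from Python using only the standard library. It assumes the response body is JSON and does not rely on any particular field names, since the response schema is not documented here.

import json
import urllib.request

# Same endpoint as the curl command above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/lyuchenyang/Macaw-LLM"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)          # parse the JSON body of the response

print(json.dumps(data, indent=2))   # pretty-print whatever fields come back

How an API key is passed (header or query parameter) is not shown above either, so this sketch sticks to unauthenticated requests within the 100/day limit.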