kyegomez/MMCA
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
This is a low-level software component for developers building advanced AI models. It helps process interleaved data such as images and text together, applying causal attention so the model learns how the modalities relate sequentially. Developers working on multimodal AI applications, such as image captioning or visual question answering, would use this component when building their models.
No commits in the last 6 months. Available on PyPI.
Use this if you are an AI model developer creating systems that need to process and understand combinations of different data types, like text and images, in a sequential or conversational context.
Not ideal if you are an end-user looking for a ready-to-use application or a high-level tool; this is a foundational component for building such applications.
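To illustrate the idea behind Multi-Modal Causal Attention, here is a minimal NumPy sketch of one plausible masking rule based on our reading of the DeepSpeed-VisualChat paper: text tokens attend causally to all earlier tokens (text and image), while image tokens attend only to image tokens. The function name `mmca_mask` and the `"img"`/`"txt"` labels are hypothetical and not taken from this repo's API; the actual implementation in kyegomez/MMCA may differ.

```python
import numpy as np

def mmca_mask(modalities):
    """Build a boolean attention mask for an interleaved token sequence.

    modalities: list of "img" / "txt" labels, one per token position.
    Returns an (n, n) array where mask[i, j] is True if token i may
    attend to token j. This is an illustrative sketch, not the repo's API.
    """
    n = len(modalities)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: only positions j <= i are visible
            if modalities[i] == "txt":
                # Text tokens attend to every earlier token, any modality.
                mask[i, j] = True
            else:
                # Image tokens attend only to (earlier) image tokens.
                mask[i, j] = modalities[j] == "img"
    return mask

# Example: two images interleaved with two text tokens.
m = mmca_mask(["img", "txt", "img", "txt"])
```

In this sketch, the final text token sees the full history, while each image token's row masks out all text positions, which is the asymmetry that distinguishes this scheme from a plain causal mask.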
Stars: 11
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Mar 11, 2024
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MMCA"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle