kyegomez/MMCA
The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"
This is a low-level software component for developers building advanced AI models. It helps process interleaved data such as images and text together, applying causal attention so the model learns how the modalities relate sequentially. Developers working on multimodal AI applications, such as image captioning or visual question answering, would use this component when building their models.
No commits in the last 6 months. Available on PyPI.
Use this if you are an AI model developer creating systems that need to process and understand combinations of different data types, like text and images, in a sequential or conversational context.
Not ideal if you are an end-user looking for a ready-to-use application or a high-level tool; this is a foundational component for building such applications.
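To illustrate the idea behind Multi-Modal Causal Attention, here is a minimal NumPy sketch of one plausible masking rule based on our reading of the DeepSpeed-VisualChat paper: text tokens attend causally to all earlier tokens (text and image), while image tokens attend only to image tokens. The function name `mmca_mask` and the `"img"`/`"txt"` labels are hypothetical and not taken from this repo's API; the actual implementation in kyegomez/MMCA may differ.

```python
import numpy as np

def mmca_mask(modalities):
    """Build a boolean attention mask for an interleaved token sequence.

    modalities: list of "img" / "txt" labels, one per token position.
    Returns an (n, n) array where mask[i, j] is True if token i may
    attend to token j. This is an illustrative sketch, not the repo's API.
    """
    n = len(modalities)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: only positions j <= i are visible
            if modalities[i] == "txt":
                # Text tokens attend to every earlier token, any modality.
                mask[i, j] = True
            else:
                # Image tokens attend only to (earlier) image tokens.
                mask[i, j] = modalities[j] == "img"
    return mask

# Example: two images interleaved with two text tokens.
m = mmca_mask(["img", "txt", "img", "txt"])
```

In this sketch, the final text token sees the full history, while each image token's row masks out all text positions, which is the asymmetry that distinguishes this scheme from a plain causal mask.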
Stars: 11
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Mar 11, 2024
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MMCA"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle