Jacksonlark/open-mllms
open llm for multimodal
This project helps AI developers and researchers find and compare open-source large language models and datasets that handle multiple types of data, such as images, text, and audio. It provides a curated list of models that accept various inputs and generate outputs for tasks such as image captioning, story generation from images, and cross-modal retrieval. It is aimed at data scientists and machine learning engineers building advanced multimodal AI applications.
No commits in the last 6 months.
Use this if you are an AI developer or researcher looking for readily available, open-source multimodal large language models and corresponding datasets to integrate into your projects.
Not ideal if you are an end-user seeking a ready-to-use application, or a non-technical user without a working understanding of AI models and datasets.
Stars: 20
Forks: 2
Language: —
License: Apache-2.0
Category:
Last pushed: May 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Jacksonlark/open-mllms"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
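If you prefer to call the endpoint from code rather than curl, the short Python sketch below makes the same request with the requests library and prints whatever fields come back. The JSON response schema and the X-API-Key header used to pass an optional key are assumptions; the listing above only documents the unauthenticated curl call.

# Minimal sketch: query the quality endpoint for one repository.
# Assumptions: the endpoint returns JSON, and an optional API key can be
# sent as an "X-API-Key" header (the header name is a guess, not documented here).
import requests

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def fetch_quality(ecosystem, repo, api_key=None):
    """Fetch the quality record for one repository and return the parsed JSON."""
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = requests.get(f"{BASE_URL}/{ecosystem}/{repo}", headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    data = fetch_quality("transformers", "Jacksonlark/open-mllms")
    # The exact schema is unknown, so just print every top-level field.
    for field, value in data.items():
        print(f"{field}: {value}")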
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice