The Multimodal Directory

Quality-scored directory of 39 multimodal AI tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

Vision-language models, cross-modal retrieval, and multimodal learning tools that combine text, image, audio, and video understanding in unified systems.

Tier          Tools  Score
Verified      1      70–100
Established   9      50–69
Emerging      16     30–49
Experimental  13     10–29
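The tier bands listed above amount to a simple lookup from score to tier. The following is a minimal sketch of that mapping, not the directory's actual implementation: the names `TIERS` and `tier_for` are hypothetical, and returning `None` for scores outside 10–100 is an assumption, since the page does not say how such scores are handled.

```python
from typing import Optional

# Tier bands as listed on the page: (tier name, lowest score, highest score).
TIERS = [
    ("Verified", 70, 100),
    ("Established", 50, 69),
    ("Emerging", 30, 49),
    ("Experimental", 10, 29),
]


def tier_for(score: int) -> Optional[str]:
    """Return the tier whose band contains the score, or None if no band matches."""
    for name, low, high in TIERS:
        if low <= score <= high:
            return name
    # Assumption: the page defines no tier for scores below 10 or above 100.
    return None


if __name__ == "__main__":
    # Scores taken from the ranking on this page.
    for tool, score in [("starVLA/starVLA", 71), ("vortex-data/vortex", 69)]:
        print(tool, score, tier_for(score))
```

Under this reading, a score of 69 falls just short of Verified, which matches the single Verified tool reported in the summary.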

Top tools by quality score

# Tool Score
1 starVLA/starVLA

StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

71
2 vortex-data/vortex

An extensible, state-of-the-art framework for columnar compression, and the...

69
3 motis-project/motis

multimodal routing, geocoding, and map tiles

64
4 zai-org/GLM-V

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with...

64
5 neka-nat/cad3dify

2D to 3D CAD Conversion Using VLM

61
6 batmanlab/Mammo-CLIP

[MICCAI 2024, top 11%] Official PyTorch implementation of Mammo-CLIP: A...

59
7 opendatalab/mineru-vl-utils

A Python package for interacting with the MinerU Vision-Language Model.

57
8 EMob-Lab/MnMS

Agent-based Multimodal Urban Mobility Simulator resulting from the ERC MAGnUM project

51
9 GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized...

51
10 withceleste/celeste-python

Open source, type-safe primitives for multi-modal AI. All modalities, all...

50
11 cloudglue/cloudglue-js

Official JavaScript / TypeScript SDK for Cloudglue API

48
12 EvolvingLMMs-Lab/LongVT

[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

47
13 om-ai-lab/GroundVLP

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language...

46
14 Jinfeng-Xu/Awesome-Multimodal-Recommender-Systems

[TMM'26] Continuously Updated Awesome Multimodal Recommendation Paper List

45
15 anam-org/metaxy

Pluggable sample-level metadata versioning for incremental multimodal pipelines.

45
16 eduardosanzb/escribano

AI-powered session intelligence tool - transcribes Cap recordings with Whisper

44
17 yunncheng/MMRL

[CVPR 2025 & IJCV2026] Official PyTorch Code for "MMRL: Multi-Modal...

43
18 Mellow-Artificial-Intelligence/open-xtract

Extract structured data from documents, images, audio, and video using LLMs.

43
19 ComfyUI-Kelin/ComfyUI-LLMs-Toolkit

ComfyUI custom nodes for DeepSeek, Qwen, GPT, and other OpenAI-compatible...

43
20 MING-ZCH/CII-Bench

[ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?

38

Browse by category