The Multimodal Directory

Quality-scored directory of 39 multimodal AI tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

Vision-language models, cross-modal retrieval, and multimodal learning tools that combine text, image, audio, and video understanding in unified systems.

Verified: 70–100
Established: 50–69
Emerging: 30–49
Experimental: 10–29
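The tier bands above amount to a simple lookup from score to label. A minimal sketch, assuming scores are integers; the function name `tier_for_score` and the `"Unrated"` fallback for scores below 10 are hypothetical, since the page does not name a tier for that range:

```python
def tier_for_score(score: int) -> str:
    """Map a quality score to the tier bands listed on this page."""
    if 70 <= score <= 100:
        return "Verified"
    if 50 <= score <= 69:
        return "Established"
    if 30 <= score <= 49:
        return "Emerging"
    if 10 <= score <= 29:
        return "Experimental"
    # Assumption: the page lists no tier below 10 (or above 100).
    return "Unrated"


# Scores taken from the table on this page.
examples = [
    ("starVLA/starVLA", 71),
    ("vortex-data/vortex", 69),
    ("cloudglue/cloudglue-js", 48),
]
for name, score in examples:
    print(f"{name}: {tier_for_score(score)}")
```

Under this reading, only the top-ranked tool in the table falls in the Verified band; ranks 2 through 10 are Established and ranks 11 through 20 are Emerging.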

Top tools by quality score

#	Tool	Score	Stars	Language
1	starVLA/starVLA StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing	71	1,702	Python
2	vortex-data/vortex An extensible, state-of-the-art framework for columnar compression, and the...	69	2,853	Rust
3	motis-project/motis multimodal routing, geocoding, and map tiles	64	491	C++
4	zai-org/GLM-V GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with...	64	2,266	Python
5	neka-nat/cad3dify 2D to 3D CAD Conversion Using VLM	61	247	Python
6	batmanlab/Mammo-CLIP [MICCAI 2024, top 11%] Official Pytorch implementation of Mammo-CLIP: A...	59	90	Python
7	opendatalab/mineru-vl-utils A Python package for interacting with the MinerU Vision-Language Model.	57	109	Python
8	EMob-Lab/MnMS Agent-based Multimodal Urban Mobility Simulator resulting from the ERC MAGnUM project	51	20	Python
9	GerrySant/multimodalhugs MultimodalHugs is an extension of Hugging Face that offers a generalized...	51	15	Python
10	withceleste/celeste-python Open source, type-safe primitives for multi-modal AI. All modalities, all...	50	219	Python
11	cloudglue/cloudglue-js Official JavaScript / TypeScript SDK for Cloudglue API	48	5	TypeScript
12	EvolvingLMMs-Lab/LongVT [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling	47	217	Python
13	om-ai-lab/GroundVLP GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language...	46	74	Jupyter Notebook
14	Jinfeng-Xu/Awesome-Multimodal-Recommender-Systems [TMM'26] Continuously Updated Awesome Multimodal Recommendation Paper List	45	97	—
15	anam-org/metaxy Pluggable sample-level metadata versioning for incremental multimodal pipelines.	45	89	Python
16	eduardosanzb/escribano AI-powered session intelligence tool - transcribes Cap recordings with Whisper	44	5	TypeScript
17	yunncheng/MMRL [CVPR 2025 & IJCV2026] Official PyTorch Code for "MMRL: Multi-Modal...	43	102	Python
18	Mellow-Artificial-Intelligence/open-xtract Extract structured data from documents, images, audio, and video using LLMs.	43	16	Python
19	ComfyUI-Kelin/ComfyUI-LLMs-Toolkit ComfyUI custom nodes for DeepSeek, Qwen, GPT, and other OpenAI-compatible...	43	19	Python
20	MING-ZCH/CII-Bench [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?	38	21	Python

Browse by category

Uncategorized: 39 tools