Chenyu-Wang567/MLLM-Tool

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Score: 34 / 100 (Emerging)

This project offers a multimodal AI assistant that understands requests involving images, audio, and text, then recommends the most suitable software tools for the task. It accepts various forms of input, such as a picture, a sound clip, or written instructions, and suggests specific tools or functions. It is designed for AI researchers and developers who are building agents that need to interpret complex, real-world user intentions.

140 stars. No commits in the last 6 months.

Use this if you are developing AI agents that need to perceive visual and auditory information alongside text to accurately choose the right tools for a given task.

Not ideal if you are an end-user looking for a ready-to-use application, as this project provides the underlying code and models for development, not a finished product.

Tags: AI agent development, multimodal AI, tool recommendation, machine learning research, AI system design
Flags: Stale (6 months), No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 6 / 25

Stars: 140
Forks: 4
Language: Python
License: MIT
Last pushed: Oct 10, 2025
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"

Open to everyone: 100 requests/day with no API key required. Get a free key to raise the limit to 1,000 requests/day.
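
If you prefer to call the endpoint from code, here is a minimal Python sketch using the requests library. It assumes the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

import requests

# Quality-score endpoint for this repository (same URL as the curl example above).
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on errors, e.g. hitting the 100 requests/day limit

data = resp.json()  # assumed to be a JSON object; exact fields are not documented here
print(data)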