Chenyu-Wang567/MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
This project offers a multimodal AI assistant that understands requests combining images, audio, and text, then recommends the most suitable software tools for the task. Given a picture, a sound clip, or written instructions, it suggests specific tools or functions to invoke. It's designed for AI researchers and developers building agents that need to interpret complex, real-world user intentions.
140 stars. No commits in the last 6 months.
Use this if you are developing AI agents that need to perceive visual and auditory information alongside text to accurately choose the right tools for a given task.
Not ideal if you are an end-user looking for a ready-to-use application, as this project provides the underlying code and models for development, not a finished product.
Stars: 140
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Oct 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
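If you'd rather consume the endpoint from code, here is a minimal Python sketch of the same request. It uses only the URL shown above; the response schema is not documented on this page, so the example simply pretty-prints whatever JSON comes back.

import json
import urllib.request

# Same endpoint as the curl example above; the free tier needs no key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The payload's field names aren't documented here, so inspect the
# full response before relying on any particular key.
print(json.dumps(data, indent=2))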
Higher-rated alternatives
langfengQ/verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is...
sotopia-lab/sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
zhudotexe/redel
ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive...
TIGER-AI-Lab/verl-tool
A version of verl to support diverse tool use
AMAP-ML/Tree-GRPO
[ICLR 2026] Tree Search for LLM Agent Reinforcement Learning