Chenyu-Wang567/MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
This project offers a multimodal AI assistant that understands requests combining images, audio, and text, then recommends the most suitable software tools for the task. Given a picture, a sound clip, or written instructions, it suggests specific tools or functions to invoke. It's designed for AI researchers and developers building agents that need to interpret complex, real-world user intentions.
140 stars. No commits in the last 6 months.
Use this if you are developing AI agents that need to perceive visual and auditory information alongside text to accurately choose the right tools for a given task.
Not ideal if you are an end-user looking for a ready-to-use application, as this project provides the underlying code and models for development, not a finished product.
Stars: 140
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Oct 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
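If you'd rather consume the endpoint from code, here is a minimal Python sketch of the same request. It uses only the URL shown above; the response schema is not documented on this page, so the example simply pretty-prints whatever JSON comes back.

import json
import urllib.request

# Same endpoint as the curl example above; the free tier needs no key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chenyu-Wang567/MLLM-Tool"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The payload's field names aren't documented here, so inspect the
# full response before relying on any particular key.
print(json.dumps(data, indent=2))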
Higher-rated alternatives
langfengQ/verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is...
sotopia-lab/sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
zhudotexe/redel
ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive...
TIGER-AI-Lab/verl-tool
A version of verl to support diverse tool use
AMAP-ML/Tree-GRPO
[ICLR 2026] Tree Search for LLM Agent Reinforcement Learning