fengyuli-dev/multimedia-gpt
Empowering your ChatGPT with vision and audio inputs.
This project helps developers integrate vision and audio capabilities into their OpenAI GPT applications. It takes images, audio recordings, or PDF documents as input, processes them, and returns responses that can include both text and generated images. This allows a developer to build more versatile AI assistants that can understand and respond to multimedia content.
180 stars. No commits in the last 6 months.
Use this if you are a developer looking to extend your OpenAI GPT applications to process and generate multimedia content like images and audio.
Not ideal if you are an end-user seeking a ready-to-use application; this is a toolkit for developers, not a consumer product.
Stars: 180
Forks: 12
Language: Python
License: MIT
Category
Last pushed: Jul 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/fengyuli-dev/multimedia-gpt"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
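For scripted access, the same request can be made from Python. The snippet below is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which the page does not state, and the helper names `quality_url` and `fetch_quality` are hypothetical, chosen for illustration only.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The response schema is not
    documented on this page, so the parsed value is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("fengyuli-dev", "multimedia-gpt")` would issue the same GET request as the curl command. The unauthenticated limit of 100 requests/day still applies, so results are worth caching locally.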
Higher-rated alternatives
2noise/ChatTTS
A generative speech model for daily dialogue.
yihong0618/xiaogpt
Play ChatGPT and other LLMs with Xiaomi AI Speaker
judahpaul16/gpt-home
ChatGPT at home! A better alternative to commercial smart home assistants, built on the...
paulovcmedeiros/pyRobBot
Chat with GPT LLMs over voice, UI & terminal, all with access to the internet. Powered by OpenAI.
Jdka1/SpeechGPT
Free ChatGPT voice interaction and integration into Python workflows.