fengyuli-dev/multimedia-gpt

Empowering your ChatGPT with vision and audio inputs.

Emerging

This project helps developers integrate vision and audio capabilities into their OpenAI GPT applications. It takes images, audio recordings, or PDF documents as input, processes them, and returns responses that can include both text and generated images. This allows a developer to build more versatile AI assistants that can understand and respond to multimedia content.

180 stars. No commits in the last 6 months.

Use this if you are a developer looking to extend your OpenAI GPT applications to accept image, audio, and PDF inputs and to return text or generated images.

Not ideal if you are an end-user seeking a ready-to-use application; this is a toolkit for developers, not a consumer product.

AI-application-development LLM-tooling multimodal-AI OpenAI-integration AI-agent-frameworks

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 11 / 25

Stars: 180

Forks:

Language: Python

License: MIT

Higher-rated alternatives

2noise/ChatTTS

A generative speech model for daily dialogue.

yihong0618/xiaogpt

Play ChatGPT and other LLMs with a Xiaomi AI Speaker

judahpaul16/gpt-home

ChatGPT at home! A better alternative to commercial smart home assistants, built on the...

paulovcmedeiros/pyRobBot

Chat with GPT LLMs over voice, UI & terminal, all with access to the internet. Powered by OpenAI.

Jdka1/SpeechGPT

Free ChatGPT voice interaction and integration into Python workflows.
