saharmor/gemini-multimodal-playground
Build realtime voice and video agents with Google's new Gemini 2.0 (API is free for now)
This tool allows you to have real-time spoken conversations with an AI assistant that can also understand what it sees from your webcam or screen. You provide voice, video, or screen sharing as input, and the AI responds with spoken audio. This is ideal for anyone who wants to interact with an AI naturally, as if they were talking to another person.
323 stars. No commits in the last 6 months.
Use this if you need an AI that can understand both spoken language and visual information from your camera or screen, and respond to you audibly in real-time.
Not ideal if you're looking for a simple text-based AI chat or a tool that doesn't require live multimedia interaction.
Stars: 323
Forks: 71
Language: TypeScript
License: Apache-2.0
Category:
Last pushed: Sep 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/saharmor/gemini-multimodal-playground"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
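The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl command above, using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the listing does not state. How a free key is attached to a request is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("saharmor", "gemini-multimodal-playground")` issues one request against the keyless quota of 100 requests/day, so a script that polls many repositories would need to cache results or throttle itself.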
Related tools
HanaokaYuzu/Gemini-API
✨ Reverse-engineered Python API for Google Gemini web app
faetalize/zodiac
A ⚡lightweight⚡ frontend for Google's Gemini Pro.
hihumanzone/Gemini-Discord-Bot
A Discord bot leveraging Google Gemini. Has image/video/audio recognition, conversation...
Amm1rr/WebAI-to-API
Gemini to API (Don't need API KEY) (ChatGPT, Claude, DeepSeek, Grok and more)
GewoonJaap/gemini-cli-openai
Expose Gemini CLI endpoints as OpenAI API with Cloudflare Workers