IDEA-CCNL/Real-Gemini
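Real-time video understanding and interaction through text, audio, image, and video with a large multi-modal model. A framework for real-time video understanding and interaction that uses a large multi-modal model to ask questions about, and communicate with, the world through text, speech, images, and video.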
This project helps you build interactive applications that can understand and respond to real-time video content. It takes live video streams, along with your text or audio input, and provides intelligent answers or generates new content like music. This is designed for application developers who want to integrate advanced multi-modal AI into their projects.
No commits in the last 6 months.
Use this if you are a developer building real-time applications that require understanding live video and interacting with it using various modalities like text, audio, and images.
Not ideal if you are an end-user looking for a ready-to-use application, as this project provides a framework and requires technical setup.
Stars: 26
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IDEA-CCNL/Real-Gemini"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
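For developers who would rather call the endpoint from Python than from curl, the request above can be sketched roughly as follows. This is a minimal sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the `owner/repo` path layout is inferred only from the single curl example, and the response is assumed to be JSON, which the listing does not confirm. How a key is passed is not described here, so the sketch makes keyless requests only.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; that other repositories follow
# the same owner/repo layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Decoding as JSON is an assumption; the listing does not document
    the response format.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("IDEA-CCNL", "Real-Gemini")` requests the same URL as the curl command. Keyless use is capped at 100 requests/day, so a script that polls many repositories may want to cache results locally.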
Higher-rated alternatives
macrocosm-os/apex
SN1: An incentive mechanism for internet-scale conversational intelligence
Lin-jun-xiang/agent-line-bot
🤖Free Agent Line Bot with Google Image Search, Image Generator, Video Generator...
uezo/chatmemory
The simple yet powerful long-term memory manager between AI and you💕
jgravelle/pocketgroq
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering...
CORE-Labet/CORE
CORE is a plug-and-play conversational agent for any recommender system.