seehiong/voicedoc-agent
🎙️ Voice-native document intelligence using Gemini, ElevenLabs STT/TTS, and Datadog observability — turning text documents into spoken conversations.
This project helps professionals understand a single text document in depth through natural voice conversation. You upload a document (such as a legal contract, financial report, or academic paper) and ask questions aloud; the system answers verbally, adapting its tone and pace to the document's subject matter. It is designed for anyone who needs to grasp complex information from a document quickly, without extensive typing or reading.
Use this if you need to quickly and thoroughly comprehend a single complex document through spoken conversation, receiving nuanced, context-aware verbal explanations.
Not ideal if you need to compare or summarize information across many documents at once, as its focus is on deep interaction with one file.
Stars: 25
Forks: 2
Language: TypeScript
License: MIT
Category:
Last pushed: Dec 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/seehiong/voicedoc-agent"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
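The same request can be made from TypeScript. The sketch below is a minimal illustration, not the service's official client: the base URL is copied from the curl command above, while the helper names `buildQualityUrl` and `fetchQuality` are hypothetical. It assumes the endpoint returns JSON and that a global `fetch` is available (for example Node 18+ or a browser). The response schema is not described here, so the result is left untyped.

```typescript
// Base URL taken from the curl command above; everything else is illustrative.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag";

// Hypothetical helper: build the endpoint URL for an owner/repo pair.
function buildQualityUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper: fetch the record, assuming the body is JSON.
// The schema is unknown, so the result is typed as `unknown`
// rather than guessing at field names.
async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(buildQualityUrl(owner, repo));
  if (!res.ok) {
    // Covers any non-2xx status, including responses sent when the daily limit is exceeded.
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

Calling `fetchQuality("seehiong", "voicedoc-agent")` would request the same URL as the curl example. Keeping URL construction in its own function makes it easy to check without touching the network.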
Higher-rated alternatives
labring/FastGPT
FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of...
RunanywhereAI/RCLI
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
ragpi/ragpi
🤖 An open-source AI assistant answering questions using your docs
theaiautomators/insights-lm-local-package
Open-source, fully private and local alternative to NotebookLM. Chat with your documents,...
AstraBert/PapersChat
An agentic AI application that allows you to chat with your papers and gather also information...