Gr122lyBr/voicetag
Speaker identification powered by pyannote and resemblyzer
This tool helps you automatically figure out who spoke when in any audio recording, such as meetings, podcasts, or interviews. You provide examples of each person's voice once, and it outputs a timeline showing who spoke and for how long. It can even generate a full transcript, telling you "who said what." This is ideal for anyone needing to analyze spoken content, like researchers, journalists, or content creators.
Available on PyPI.
Use this if you need to quickly identify specific speakers and potentially transcribe their words from audio files, without manually listening through everything.
Not ideal if you only need a basic transcription without speaker identification or if you're working with extremely low-quality audio where voices are difficult to distinguish.
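The enroll-once, then-label workflow described above can be illustrated with a small, self-contained sketch. This is not voicetag's API or its actual implementation: the function names (`label_segments`, `talk_time`), the 0.75 threshold, and the toy 3-dimensional vectors are all assumptions made for illustration. It assumes that each enrolled speaker and each audio segment has already been turned into a fixed-length voice embedding (the kind of vector a library such as resemblyzer produces), and that segments are matched to speakers by cosine similarity.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(enrolled, segments, threshold=0.75):
    """Assign each segment to the most similar enrolled speaker.

    enrolled: non-empty dict of {speaker_name: embedding} (hypothetical input format)
    segments: list of (start_sec, end_sec, embedding) tuples
    threshold: assumed cut-off below which a segment is labeled "unknown"
    """
    timeline = []
    for start, end, emb in segments:
        name, score = max(
            ((n, cosine(emb, ref)) for n, ref in enrolled.items()),
            key=lambda pair: pair[1],
        )
        timeline.append((start, end, name if score >= threshold else "unknown"))
    return timeline


def talk_time(timeline):
    """Sum the speaking duration in seconds for each label in the timeline."""
    totals = defaultdict(float)
    for start, end, name in timeline:
        totals[name] += end - start
    return dict(totals)


# Toy embeddings; real voice embeddings have hundreds of dimensions.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
segments = [
    (0.0, 4.0, [0.9, 0.1, 0.0]),   # close to alice
    (4.0, 6.5, [0.1, 0.95, 0.0]),  # close to bob
    (6.5, 8.0, [0.0, 0.0, 1.0]),   # matches nobody
]
timeline = label_segments(enrolled, segments)
print(timeline)
print(talk_time(timeline))
```

The "unknown" label reflects the low-quality-audio caveat above: when no enrolled voice is a close match, it is usually safer to leave a segment unattributed than to force it onto the nearest speaker.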
Stars: 32
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Mar 16, 2026
Commits (30d): 0
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Gr122lyBr/voicetag"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
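The same request can be made from Python using only the standard library. The sketch below rests on two assumptions not confirmed by this page: that the URL path follows a category/owner/repo pattern (inferred from the single example above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is inferred from the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository and parse it, assuming a JSON response body."""
    request = Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily quota):
# data = fetch_quality("voice-ai", "Gr122lyBr", "voicetag")
```

Because the response fields are not documented here, inspect the returned object before relying on any particular key.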
Higher-rated alternatives
AbdullahHendy/live-translation
Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client...
i4Ds/whisper-finetune
This repository contains code for fine-tuning the Whisper speech-to-text model.
512z/podlens
Free Podwise: AI Podcast & Youtube Transcription & Understanding Agent | AI tool for podcast and YouTube transcription, learning, and visualization
aws-solutions/content-localization-on-aws
Automatically generate multi-language subtitles using AWS AI/ML services. Machine generated...
fizamusthafa/whisper-app
This repository contains a web application for multi-lingual transcription using OpenAI's...