juanmc2005/diart
A Python package to build AI-powered real-time audio applications
This tool helps you automatically identify who is speaking in real-time audio, whether from a recorded conversation or a live microphone feed. It takes an audio input and outputs a detailed record of who spoke when, useful for generating speaker-labeled transcripts or analyzing conversational dynamics. It's designed for anyone needing to pinpoint individual speakers in multi-person audio.
1,944 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to accurately track and label different speakers in an audio stream as it happens, for tasks like live transcription or meeting analysis.
Not ideal if you only need to detect speech presence without differentiating between speakers, or if you're not comfortable with some command-line interaction.
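To make the "who spoke when" record concrete, here is a minimal sketch of how such output could be summarized for conversational analysis. The `Segment` type, the speaker labels, and the timestamps are hypothetical illustrations, not diart's actual output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One hypothetical 'who spoke when' entry (not diart's own type)."""
    speaker: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def speaking_time(segments):
    """Sum the duration of all segments for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up example data: two speakers taking turns.
segments = [
    Segment("speaker0", 0.0, 2.5),
    Segment("speaker1", 2.5, 4.0),
    Segment("speaker0", 4.0, 5.0),
]
totals = speaking_time(segments)
print(totals)
```

Per-speaker totals like these are one simple starting point for meeting analysis; a real pipeline would read the segments from the tool's output rather than hard-coding them.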
Stars: 1,944
Forks: 159
Language: Python
License: MIT
Category:
Last pushed: Feb 12, 2025
Commits (30d): 0
Dependencies: 20
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/juanmc2005/diart"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
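The same request can be made from Python. The sketch below uses only the standard library and assumes the URL follows a category/owner/repo pattern, as the curl example suggests. The helper names are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(url: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming (unverified) a JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("ml-frameworks", "juanmc2005", "diart")
print(url)
```

Calling `fetch_quality(url)` performs a real network request and counts against the daily quota, so it is left out of the snippet's top level.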
Related frameworks
felixbur/nkululeko
Machine learning speaker characteristics
claritychallenge/clarity
Clarity Challenge toolkit - software for building Clarity Challenge systems
astorfi/3D-convolutional-speaker-recognition
Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
wq2012/awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
hitachi-speech/EEND
End-to-End Neural Diarization