jtkim-kaist/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
This toolkit helps signal processing researchers and audio engineers accurately identify when speech is present in an audio recording, distinguishing it from background noise or silence. It takes raw audio recordings as input and outputs precise timestamps or labels indicating speech segments. This is ideal for anyone working with spoken language data where precise speech detection is crucial for further analysis or processing.
869 stars. No commits in the last 6 months.
Use this if you need to reliably separate speech from non-speech in noisy real-world audio, especially for research or advanced audio processing applications.
Not ideal if you want a simple, off-the-shelf voice recorder or transcription service and have no need to work with the underlying speech detection models.
Stars: 869
Forks: 233
Language: MATLAB
License: none listed
Category
Last pushed: Jun 09, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jtkim-kaist/VAD"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
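The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The base URL comes from the curl example; the helper names `quality_url` and `fetch_quality`, the reading of the path as category/owner/repo, and the assumption that the response body is JSON are illustrative guesses, not documented behavior of the API.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl example.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path segments are category/owner/repo, as the
    curl example (voice-ai/jtkim-kaist/VAD) suggests.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON; this is not confirmed
    by the page, so adjust the parsing if the API returns another format.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "jtkim-kaist", "VAD")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching results locally rather than re-fetching on every run.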
Higher-rated alternatives
FluidInference/FluidAudio
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity...
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn...
phuc-nt/my-translator
Real-time speech translation — macOS & Windows, free TTS, no server, your API keys only
pot-app/pot-desktop
🌈 A cross-platform app for selected-text translation and OCR.
Blaizzy/mlx-audio-swift
A modular Swift SDK for audio processing with MLX on Apple Silicon