abinashmeher999/voice-data-extract
A command-line interface that combines subtitle text with the voice data in a video, providing a convenient way to generate training data for speech recognition.
This tool helps speech recognition engineers create high-quality audio datasets for training machine learning models. It takes a video file and its corresponding subtitle file as input, and outputs precisely clipped audio files for each subtitle line. Each audio clip has the subtitle text embedded within it, making it easy to build datasets for training new speech recognition systems.
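The clipping workflow described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: it assumes SubRip (.srt) subtitles and an ffmpeg binary on the PATH, and the names Cue, parse_srt, and ffmpeg_clip_command are hypothetical. Embedding the subtitle text as a title metadata tag is likewise only one plausible reading of "embedded".

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One subtitle line: start/end in seconds plus its text (hypothetical type)."""
    start: float
    end: float
    text: str


def _seconds(h, m, s, ms):
    # Convert the four captured timestamp fields into seconds.
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is subtitle text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def ffmpeg_clip_command(video_path, cue, out_path):
    """Build (but do not run) an ffmpeg command that cuts one audio clip.

    Assumption: the subtitle text is stored as a 'title' metadata tag.
    """
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-ss", f"{cue.start:.3f}",   # clip start, in seconds
        "-to", f"{cue.end:.3f}",     # clip end, in seconds
        "-vn",                       # drop the video stream, keep audio only
        "-metadata", f"title={cue.text}",
        out_path,
    ]
```

Each command list could then be passed to subprocess.run, one call per cue, to write files such as clip_0001.wav. Building the argument list separately from running it keeps the parsing step testable without ffmpeg installed.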
No commits in the last 6 months.
Use this if you need to quickly generate labeled audio training data for speech recognition models from existing videos with subtitles.
Not ideal if you're looking for a solution that automatically handles complex audio cleaning or speaker diarization.
Stars: 19
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Oct 04, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/abinashmeher999/voice-data-extract"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
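The same request can be made from Python using only the standard library. This is a minimal sketch: the path pattern (a domain segment, then owner, then repository) is inferred from the single URL shown above, the function names quality_url and fetch_quality are hypothetical, and the response is returned as raw text because the listing does not document its format.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <domain>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(domain, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (domain, owner, repo)]
    return "/".join([BASE_URL] + segments)


def fetch_quality(domain, owner, repo, timeout=10.0):
    """Fetch the record and return the response body as text.

    The body is left undecoded beyond UTF-8 because its structure is
    not specified in the listing.
    """
    url = quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For this repository the call would be fetch_quality("voice-ai", "abinashmeher999", "voice-data-extract"), subject to the daily request limit.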
Higher-rated alternatives
speechmatics/speechmatics-python: Python library and CLI for Speechmatics
gooofy/py-nltools: A collection of basic python modules for spoken natural language processing
IBM/MAX-Speech-to-Text-Converter: Converts spoken words into text form.
ictnlp/StreamSpeech: StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition,...
snakers4/open_stt: Open STT