m3hrdadfi/soxan
Wav2Vec for speech recognition, classification, and audio classification
This project helps researchers analyze speech and other audio by training Wav2Vec-based models to classify recordings. Given audio files such as speech recordings, it outputs labels such as the emotion expressed (e.g., anger, happiness) or the type of sound (e.g., eating sounds). It is aimed at scientists, linguists, and audio researchers working with large datasets of speech or environmental sounds.
273 stars. No commits in the last 6 months.
Use this if you need to train or use pre-trained models for tasks like recognizing emotions in speech or classifying specific audio events from sound recordings.
Not ideal if you are looking for an out-of-the-box application for real-time speech-to-text transcription or simple audio editing.
Stars: 273
Forks: 38
Language: Jupyter Notebook
License: Apache-2.0
Category
Last pushed: Apr 02, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/m3hrdadfi/soxan"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
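The same request can be made from a script. The following is a minimal Python sketch using only the standard library, not an official client. It assumes the URL path follows the pattern of the curl example (a category segment such as `voice-ai`, then owner and repository) and that the response body is JSON; neither is confirmed here. The names `quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and parse the body as JSON.

    Assumption: the endpoint returns JSON. If it does not, json.loads
    raises a ValueError. HTTP errors (for example, when the daily request
    limit is exceeded) surface as urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "m3hrdadfi", "soxan")` would request the same URL as the curl command. Since keyless access is limited to 100 requests/day, it may be worth caching the result locally when polling several repositories.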
Higher-rated alternatives
liangstein/Chinese-speech-to-text: Chinese speech-to-text using WaveNet
louiskirsch/speechT: Open-source speech-to-text software written in TensorFlow
Open-Speech-EkStep/vakyansh-models: Open-source speech-to-text models for Indic languages
oliverguhr/wav2vec2-live: Live speech recognition using Facebook's wav2vec 2.0 model
Open-Speech-EkStep/vakyansh-wav2vec2-experimentation: Experimentation platform for training wav2vec2 models and running inference with them