ddlBoJack/MT4SSL
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
This project helps machine learning engineers and researchers accelerate the training of speech recognition models. It takes raw audio data and various pre-training targets as input and outputs a fine-tuned model capable of transcribing speech efficiently. It is designed for people who develop or enhance speech AI systems.
No commits in the last 6 months.
Use this if you are developing new speech recognition models and want to achieve strong performance with fewer pre-training steps and faster convergence.
Not ideal if you are looking for an off-the-shelf speech recognition application rather than a framework for model development.
Stars: 45
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Mar 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/ddlBoJack/MT4SSL"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
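The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions, not a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single curl example above, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; adjust the parsing if it
    does not. No API key is sent, so the keyless daily limit applies.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request.
    print(quality_url("voice-ai", "ddlBoJack", "MT4SSL"))
```

Calling `fetch_quality("voice-ai", "ddlBoJack", "MT4SSL")` issues one request, which counts against the 100 requests/day keyless allowance.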
Higher-rated alternatives
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System