kkoutini/PaSST
Efficient Training of Audio Transformers with Patchout
This project helps machine learning engineers and researchers train audio transformers efficiently. It takes audio spectrograms as input and produces trained transformer models while significantly reducing training time and GPU memory use. It suits work on audio classification, sound event detection, and other audio understanding tasks.
370 stars. No commits in the last 6 months.
Use this if you are developing transformer models for audio processing and need to cut training time and GPU memory usage substantially while maintaining or improving model performance.
Not ideal if you are looking for a pre-built solution for immediate audio inference without needing to train or fine-tune models yourself.
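The Patchout idea named in the title can be illustrated with a small sketch. It assumes the simplest variant, in which a fixed number of spectrogram patches is dropped at random from the input sequence during training; the function name `unstructured_patchout` and its parameters are hypothetical and are not taken from the PaSST code base. Because self-attention cost grows roughly quadratically with sequence length, a shorter patch sequence is where the time and memory savings would come from.

```python
import random


def unstructured_patchout(patches, num_drop, rng=None):
    """Return `patches` with `num_drop` randomly chosen entries removed.

    Hypothetical helper for illustration only; it is not the PaSST API.
    The relative order of the surviving patches is preserved.
    """
    rng = rng or random.Random()
    if not 0 <= num_drop < len(patches):
        raise ValueError("num_drop must be in [0, len(patches))")
    # Sample the indices to keep, then sort them so order is unchanged.
    keep = sorted(rng.sample(range(len(patches)), len(patches) - num_drop))
    return [patches[i] for i in keep]


# Toy stand-ins for patch embeddings cut from one spectrogram.
patches = [f"patch_{i}" for i in range(12)]
shorter = unstructured_patchout(patches, num_drop=4, rng=random.Random(0))
print(len(patches), "->", len(shorter))
```

In a real training loop this step would apply only during training; at inference the full patch sequence would normally be kept.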
Stars: 370
Forks: 58
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 12, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/kkoutini/PaSST"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
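The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions: that the path follows a `category/owner/repo` pattern (inferred from the single URL shown above) and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumption: the endpoint returns a JSON body. The response schema is
    not documented here, so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("voice-ai", "kkoutini", "PaSST")
print(url)
```

Calling `fetch_quality("voice-ai", "kkoutini", "PaSST")` performs the network request; with no key, the 100 requests/day limit applies.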
Higher-rated alternatives
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz...
drethage/speech-denoising-wavenet
A neural network for end-to-end speech denoising
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
iver56/torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
lmnt-com/wavegrad
A fast, high-quality neural vocoder.