YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
This project classifies audio recordings, whether the task is identifying environmental sounds, spoken commands, or musical genres. You provide audio recordings, and it outputs labels describing what each recording contains. It is aimed at researchers and developers building systems that need to understand and react to different kinds of audio.
1,432 stars. No commits in the last 6 months.
Use this if you need to classify diverse audio inputs, such as environmental sounds, speech commands, or musical snippets, and want to use an attention-based model.
Not ideal if your primary goal is real-time audio synthesis, voice manipulation, or purely acoustic signal processing without a classification objective.
Stars: 1,432
Forks: 244
Language: Jupyter Notebook
License: BSD-3-Clause
Category:
Last pushed: May 21, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/YuanGongND/ast"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
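For scripted access, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not documented client code: the function names `build_quality_url` and `fetch_quality` are hypothetical, reading the `voice-ai` path segment as a category is an assumption, and so is the guess that the endpoint returns JSON, since the response format is not stated here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build a URL shaped like the curl example (hypothetical helper)."""
    # Percent-encode each path segment so unusual names stay valid in a URL.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository.

    Assumption: the response body is UTF-8 encoded JSON. If it is not,
    json.loads raises an error and the raw body should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "YuanGongND", "ast")` issues the same request as the curl command. Keyless access is limited to 100 requests/day, so a script that polls many repositories should cache results rather than refetch them.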
Related tools
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz...
drethage/speech-denoising-wavenet
A neural network for end-to-end speech denoising
iver56/torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
lmnt-com/wavegrad
A fast, high-quality neural vocoder.
madhavmk/Noise2Noise-audio_denoising_without_clean_training_data
Source code for the paper titled "Speech Denoising without Clean Training Data: a Noise2Noise...