Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
This tool helps researchers and developers working on audio captioning projects to easily access and prepare large datasets. It takes raw audio and associated text descriptions, providing them in a structured format suitable for machine learning models. The primary users are machine learning engineers and AI researchers focused on multimodal audio-language tasks.
127 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly set up and load standard audio captioning datasets like AudioCaps, Clotho, MACS, or WavCaps directly into your PyTorch machine learning workflows.
Not ideal if you are looking for an off-the-shelf solution to generate audio captions without any programming or machine learning development.
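As a rough illustration of the PyTorch workflow described above, the sketch below wires one dataset into a DataLoader. It assumes the package is installed with `pip install aac-datasets`, that `Clotho(root=..., download=True)` behaves as in the project's README, and that each item is a dict containing at least an "audio" tensor and a "captions" list of strings. The helper names `collate_captioned_audio` and `build_clotho_loader` are hypothetical and not part of the library. A custom collate function is used because clips differ in length, so PyTorch's default collation cannot stack them.

```python
from typing import Any, Dict, List


def collate_captioned_audio(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group per-item dicts into one dict of lists.

    Audio clips have different lengths, so they are kept as a list
    instead of being stacked into a single padded tensor.
    """
    keys = items[0].keys()
    return {key: [item[key] for item in items] for key in keys}


def build_clotho_loader(root: str = ".", batch_size: int = 4):
    """Hypothetical helper: download Clotho into `root` and wrap it in a DataLoader.

    Assumption: the constructor arguments follow the aac-datasets README.
    The first call downloads the dataset, which can take a while.
    """
    # Imported lazily so the collate helper above works without these packages.
    from torch.utils.data import DataLoader
    from aac_datasets import Clotho

    dataset = Clotho(root=root, download=True)
    return DataLoader(dataset, batch_size=batch_size, collate_fn=collate_captioned_audio)
```

Calling `build_clotho_loader()` and iterating over the result should yield batches where `batch["audio"]` and `batch["captions"]` are lists with one entry per clip, under the assumptions stated above.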
Stars: 127
Forks: 10
Language: Python
License: MIT
Category:
Last pushed: Jul 18, 2025
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Labbeti/aac-datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
iver56/audiomentations
A Python library for audio data augmentation. Useful for making audio ML models work well in the...
Rikorose/DeepFilterNet
Noise suppression using deep filtering
torchsynth/torchsynth
A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.
marl/openl3
OpenL3: Open-source deep audio and image embeddings
archinetai/audio-data-pytorch
A collection of useful audio datasets and transforms for PyTorch.