Labbeti/aac-datasets

Audio Captioning datasets for PyTorch.

48 / 100

Emerging

This tool helps researchers and developers working on audio captioning projects to easily access and prepare large datasets. It takes raw audio and associated text descriptions, providing them in a structured format suitable for machine learning models. The primary users are machine learning engineers and AI researchers focused on multimodal audio-language tasks.

127 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly set up and load standard audio captioning datasets like AudioCaps, Clotho, MACS, or WavCaps directly into your PyTorch machine learning workflows.
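To make "load directly into your PyTorch workflows" more concrete, here is a minimal, dependency-free sketch of the batching step such a workflow typically needs: clips have different lengths, so they are padded to a common length before being stacked. This is not the aac-datasets API. The function name `pad_collate`, the item keys `audio` and `captions`, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def pad_collate(items: Sequence[Dict[str, list]], pad_value: float = 0.0) -> Dict[str, list]:
    """Batch hypothetical items shaped like {"audio": [...], "captions": [...]}.

    Waveforms are right-padded with `pad_value` to the longest clip in the batch.
    The original lengths are kept so a model can ignore the padding later.
    """
    if not items:
        raise ValueError("cannot collate an empty batch")

    lengths: List[int] = [len(item["audio"]) for item in items]
    max_len = max(lengths)

    padded = [
        list(item["audio"]) + [pad_value] * (max_len - len(item["audio"]))
        for item in items
    ]
    # Each clip may have several reference captions, so captions stay nested.
    captions = [list(item["captions"]) for item in items]

    return {"audio": padded, "audio_lens": lengths, "captions": captions}


# Toy items standing in for (waveform, captions) pairs; not real dataset content.
toy_items = [
    {"audio": [0.1, 0.2, 0.3], "captions": ["a dog barks"]},
    {"audio": [0.5], "captions": ["rain falls", "water drips"]},
]
batch = pad_collate(toy_items)
```

In a PyTorch pipeline, a function of this shape would usually be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`, with tensors in place of the plain lists used here.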

Not ideal if you are looking for an off-the-shelf solution to generate audio captions without any programming or machine learning development.

audio-analysis machine-learning-datasets natural-language-processing speech-technologies multimodal-AI

Stale 6m

Maintenance 2 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 11 / 25

Stars

127

Forks

Language

Python

License

MIT

Higher-rated alternatives

iver56/audiomentations

A Python library for audio data augmentation. Useful for making audio ML models work well in the...

Rikorose/DeepFilterNet

Noise suppression using deep filtering

torchsynth/torchsynth

A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

marl/openl3

OpenL3: Open-source deep audio and image embeddings

archinetai/audio-data-pytorch

A collection of useful audio datasets and transforms for PyTorch.
