Labbeti/aac-datasets

Audio Captioning datasets for PyTorch.

48 / 100 (Emerging)

This library helps researchers and developers working on audio captioning access and prepare large datasets. It pairs raw audio with its associated text descriptions and exposes both in a structured format suitable for machine learning models. Its primary users are machine learning engineers and AI researchers working on multimodal audio-language tasks.

127 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly set up and load standard audio captioning datasets like AudioCaps, Clotho, MACS, or WavCaps directly into your PyTorch machine learning workflows.
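
As a rough illustration of that workflow, the sketch below wraps one Clotho subset in a PyTorch `DataLoader`. It assumes the `Clotho` class, its `root`, `subset`, and `download` arguments, and the `BasicCollate` helper as described in the project's README; the function name `load_clotho_batches` is hypothetical, and all names should be checked against the installed version.

```python
def load_clotho_batches(root=".", subset="dev", batch_size=4, download=False):
    """Return a DataLoader over one Clotho subset.

    Sketch only: class, argument, and module names are assumed from the
    aac-datasets README and may differ between versions.
    """
    # Imported lazily so this function can be defined without the
    # packages installed; calling it requires torch and aac-datasets.
    from torch.utils.data import DataLoader
    from aac_datasets import Clotho
    from aac_datasets.utils.collate import BasicCollate

    # With download=True the subset is fetched into `root` if missing.
    dataset = Clotho(root=root, subset=subset, download=download)

    # Clips differ in length; BasicCollate is assumed to gather them into
    # per-key lists instead of stacking them into a single tensor.
    return DataLoader(dataset, batch_size=batch_size, collate_fn=BasicCollate())


# Example use (downloads data on first call):
#   loader = load_clotho_batches(download=True)
#   batch = next(iter(loader))
#   audio, captions = batch["audio"], batch["captions"]
```

The other datasets named above would presumably follow the same pattern with a different dataset class.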

Not ideal if you are looking for an off-the-shelf solution to generate audio captions without any programming or machine learning development.

Tags: audio-analysis, machine-learning-datasets, natural-language-processing, speech-technologies, multimodal-AI
Stale 6m
Maintenance 2 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 11 / 25

Stars: 127
Forks: 10
Language: Python
License: MIT
Last pushed: Jul 18, 2025
Commits (30d): 0
Dependencies: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Labbeti/aac-datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
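
The same request can be made from Python with the standard library. This is a minimal sketch: the `category/owner/repo` path layout is inferred from the single URL above, the helper names are hypothetical, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (path layout assumed)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and decode it, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example use (performs a network request):
#   data = fetch_quality("ml-frameworks", "Labbeti", "aac-datasets")
```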