ExplainableML/ZerAuCap

[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords

Score: 19 / 100 (Experimental)

This project helps audio content creators and analysts generate descriptive text captions for sound events, such as ambient noise or human actions, without manually labeling a large training dataset. It works zero-shot, guided by an audio-language model and audio context keywords. It takes raw audio files as input and outputs short descriptive captions, which makes it useful for quickly understanding or cataloging large collections of audio recordings.

No commits in the last 6 months.

Use this if you need to automatically generate short, descriptive captions for non-speech audio clips, to reduce the manual effort involved in audio annotation or content understanding.

Not ideal if your primary goal is transcribing spoken language, as this tool is specifically designed for environmental sounds and actions, not speech-to-text conversion.

audio-analysis sound-event-detection content-cataloging multimedia-annotation audio-indexing
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25
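The overall score of 19 / 100 appears to be the sum of the four sub-scores above, each out of 25. The snippet below is a minimal sketch of that arithmetic. Treating the total as a plain, unweighted sum is an assumption inferred from the listed numbers, not the site's documented formula.

```shell
#!/bin/sh
# Sub-scores as listed on this page (each out of 25)
maintenance=0
adoption=6
maturity=8
community=5

# Assumption: the overall score is the unweighted sum of the four sub-scores
total=$((maintenance + adoption + maturity + community))
max=$((4 * 25))

echo "Overall: ${total} / ${max}"
```

Running it prints `Overall: 19 / 100`, which matches the score shown at the top of the page.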


Stars: 18
Forks: 1
Language: Python
License: None
Last pushed: Nov 30, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/ExplainableML/ZerAuCap"

The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.