slSeanWU/beats-conformer-bart-audio-captioner

PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"

Quality score: 26 / 100 (Experimental)

This project automatically generates descriptive text captions for audio recordings. Given raw audio files, it produces human-readable sentences that describe what is happening in the sound, such as 'A dog barks' or 'Someone is playing a guitar in a park.' It suits audio researchers, content creators, or anyone who needs to summarize sound events quickly without listening to each recording manually.

No commits in the last 6 months.

Use this if you need to automatically generate clear, concise text descriptions for a large collection of audio files, saving significant time and effort compared to manual annotation.

Not ideal if you need to transcribe speech content from audio or require highly nuanced, subjective interpretations of sound that only a human could provide.

audio-analysis sound-event-description audio-content-labeling media-asset-management
Status: stale (6 months), no package, no dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 3 / 25
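The four subscores match the overall rating of 26 / 100. Assuming the overall score is simply the sum of the four subscores, each out of 25 (an inference from the numbers, not something this page states), the arithmetic is:

$$
\underbrace{0}_{\text{Maintenance}} + \underbrace{7}_{\text{Adoption}} + \underbrace{16}_{\text{Maturity}} + \underbrace{3}_{\text{Community}} = 26,
\qquad 4 \times 25 = 100
$$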


Stars: 39
Forks: 1
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jan 06, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/slSeanWU/beats-conformer-bart-audio-captioner"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.