slSeanWU/beats-conformer-bart-audio-captioner
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"
This project automatically generates descriptive text captions for audio recordings. You feed it raw audio files, and it produces human-readable sentences describing what is happening in the sound, such as 'A dog barks' or 'Someone is playing a guitar in a park.' It is aimed at audio researchers, content creators, and anyone who needs to summarize sound events quickly without listening to every recording.
No commits in the last 6 months.
Use this if you need to automatically generate clear, concise text descriptions for a large collection of audio files, saving significant time and effort compared to manual annotation.
Not ideal if you need to transcribe speech content from audio or require highly nuanced, subjective interpretations of sound that only a human could provide.
Stars: 39
Forks: 1
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jan 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/slSeanWU/beats-conformer-bart-audio-captioner"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
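For scripted use, the same request can be made from Python. The following is a minimal sketch that uses only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the meaning of the first path segment (`transformers` in the curl command) is a guess, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the three path segments are category/owner/repo, inferred
    only from the single example URL shown in the listing.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the endpoint returns a JSON body; its schema is not
    documented in the listing, so the parsed object is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "slSeanWU", "beats-conformer-bart-audio-captioner")` would issue the same request as the curl command above. With the keyless limit of 100 requests/day, a script that polls many repositories may want to cache results locally.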
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
PyTorch implementation of image captioning using a transformer-based model.
rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tojiboyevf/image_captioning
Deep Learning Final project 2022
Hamtech-ai/Persian-Image-Captioning
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.