slSeanWU/beats-conformer-bart-audio-captioner

PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"

Experimental

This project automatically generates descriptive text captions for audio recordings. You feed it raw audio files, and it produces human-readable sentences that describe what is happening in the sound, such as 'A dog barks' or 'Someone is playing a guitar in a park.' It is aimed at audio researchers, content creators, and anyone who needs to summarize sound events without listening to each recording.

No commits in the last 6 months.

Use this if you need to automatically generate clear, concise text descriptions for a large collection of audio files, saving significant time and effort compared to manual captioning.

Not ideal if you need to transcribe speech content from audio or require highly nuanced, subjective interpretations of sound that only a human could provide.

audio-analysis sound-event-description audio-content-labeling media-asset-management

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 3 / 25

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

zarzouram/image_captioning_with_transformers

PyTorch implementation of image captioning using a transformer-based model.

rese1f/aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

senadkurtisi/pytorch-image-captioning

Transformer & CNN Image Captioning model in PyTorch.

tojiboyevf/image_captioning

Deep Learning Final project 2022

Hamtech-ai/Persian-Image-Captioning

A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.
