RustedBytes/audios-to-dataset
Convert your audio files into DuckDB or Parquet files
This tool helps researchers and data scientists prepare large collections of audio files for machine learning tasks. It takes a folder of audio files in common formats (such as WAV, MP3, or FLAC) plus optional text metadata (such as transcriptions), then organizes them into structured Parquet or DuckDB files. This is ideal for anyone building speech-to-text models or other audio-based AI applications.
Use this if you need to quickly and efficiently transform raw audio files and their associated metadata into a machine-learning-ready dataset.
Not ideal if you only have a few audio files or primarily work with specialized audio formats not commonly used in machine learning.
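To make the folder-to-dataset idea concrete, here is a minimal Python sketch of the row-gathering step only. It is not the tool's own implementation (the tool is written in Rust). The function name `collect_rows`, the column names, and the convention that a transcription sits in a `.txt` file next to the audio file are all assumptions made for illustration.

```python
from pathlib import Path

# Extensions treated as audio here; an assumption for this sketch.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def collect_rows(folder):
    """Return one dict per audio file found under `folder`, sorted by path."""
    rows = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Hypothetical convention: clip.wav is described by clip.txt beside it.
        sidecar = path.with_suffix(".txt")
        transcription = None
        if sidecar.is_file():
            transcription = sidecar.read_text(encoding="utf-8").strip()
        rows.append(
            {
                "file_name": path.name,
                "audio_bytes": path.read_bytes(),
                "transcription": transcription,
            }
        )
    return rows
```

Each resulting dict corresponds to one dataset row. Writing those rows out as Parquet or DuckDB would need an additional library and is left out of this sketch.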
Stars: 8
Forks: 1
Language: Rust
License: MIT
Category:
Last pushed: Mar 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/RustedBytes/audios-to-dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
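The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/name` path pattern is inferred from the single URL shown above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response schema is not documented here, so the body is returned as raw text.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL for a category and an owner/name repository."""
    # quote() leaves "/" intact by default, so "owner/name" stays a two-part path.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10.0):
    """Return the raw response body as text (schema is an unknown here)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("data-engineering", "RustedBytes/audios-to-dataset")` would issue the same GET request as the curl command and counts against the daily request limit.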
Higher-rated alternatives
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.