The-Data-Dilemma/ParquetToHuggingFace

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

/ 100

Emerging

This tool helps researchers and data scientists prepare and share audio datasets for machine learning. It takes raw audio recordings and their transcriptions, processes them into a standardized Parquet format, and then uploads them to Hugging Face, making your dataset easily accessible and shareable with the broader ML community. It's ideal for those working with audio for speech-to-text or translation tasks.
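The workflow described above (pair recordings with transcriptions, write Parquet, upload to Hugging Face) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. The function names, the `file_name`/`audio`/`transcription` column names, the `.wav` extension, and the `data/` upload path are all assumptions made for the example. Writing Parquet assumes `pandas` with `pyarrow` installed, and uploading assumes `huggingface_hub` and a logged-in account.

```python
from pathlib import Path


def collect_records(audio_dir, transcripts):
    """Pair each .wav file in audio_dir with its transcription.

    `transcripts` maps a file stem (e.g. "clip_001") to its text.
    Files without a transcription are skipped. (Hypothetical helper.)
    """
    records = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        text = transcripts.get(wav.stem)
        if text is None:
            continue
        records.append(
            {
                "file_name": wav.name,
                "audio": wav.read_bytes(),  # raw file bytes, stored as-is
                "transcription": text,
            }
        )
    return records


def write_parquet(records, out_path):
    """Write the records to one Parquet file (needs pandas and pyarrow)."""
    import pandas as pd

    pd.DataFrame(records).to_parquet(out_path, index=False)


def upload_parquet(out_path, repo_id):
    """Upload the Parquet file to a Hugging Face dataset repository."""
    from huggingface_hub import HfApi

    api = HfApi()
    # Create the dataset repo if it does not exist yet.
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_file(
        path_or_fileobj=str(out_path),
        path_in_repo=f"data/{Path(out_path).name}",  # assumed layout
        repo_id=repo_id,
        repo_type="dataset",
    )
```

A typical run would call `collect_records`, then `write_parquet`, then `upload_parquet` with a repository id such as `your-username/your-dataset` (a placeholder). The project's README remains the reference for the real script names and path configuration.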

No commits in the last 6 months.

Use this if you need to convert your raw audio data and its corresponding text into a structured Parquet format and then publish it as a dataset on Hugging Face.

Not ideal if you are looking to analyze audio directly without preparing it for a machine learning dataset, or if you prefer not to use Hugging Face for data sharing.

audio-processing speech-recognition dataset-creation natural-language-processing machine-learning-research

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 5 / 25

Maturity 15 / 25

Community 15 / 25

Language: Python

License: MIT

Higher-rated alternatives

Picovoice/rhino

On-device Speech-to-Intent engine powered by deep learning

yandexdataschool/speech_course

YSDA course in Speech Processing.

MycroftAI/adapt

Adapt Intent Parser

Picovoice/speech-to-intent-benchmark

benchmark for Speech-to-Intent engines

IBM/BigLittleNet

Official repository for Big-Little Net
