lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

Score: 64 / 100 (Established)

This project helps machine learning engineers and researchers prepare multimodal data (speech, audio, video, images, and text) for training AI models. It takes raw data files and their metadata as input, then organizes and preprocesses them into structured datasets optimized for model training. It produces ready-to-use data loaders for anyone building or training AI systems that involve multiple data types.
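The flow described above (raw files plus metadata in, structured examples and batches out) can be illustrated with a small conceptual sketch. It uses only the Python standard library and is not Lhotse's API: `Example`, `build_manifest`, and `batches` are hypothetical names, and batching by total duration is an assumed strategy chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Example:
    """One training example: a media file joined with its metadata (hypothetical type)."""
    item_id: str
    path: str
    duration: float  # seconds
    text: str


def build_manifest(files: Dict[str, str], metadata: Dict[str, dict]) -> List[Example]:
    """Join file paths with metadata by ID, skipping files that have no metadata."""
    return [
        Example(item_id, path, metadata[item_id]["duration"], metadata[item_id]["text"])
        for item_id, path in sorted(files.items())
        if item_id in metadata
    ]


def batches(manifest: List[Example], max_duration: float) -> Iterator[List[Example]]:
    """Yield batches whose summed duration stays within max_duration.

    A single example longer than max_duration still gets a batch of its own.
    """
    batch: List[Example] = []
    total = 0.0
    # Sorting by duration keeps similarly sized examples together.
    for example in sorted(manifest, key=lambda e: e.duration):
        if batch and total + example.duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(example)
        total += example.duration
    if batch:
        yield batch
```

In a sketch like this, the manifest is the structured dataset and the `batches` generator stands in for a data loader; a real pipeline would also decode the media and convert it to tensors.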

1,120 stars. Actively maintained with 4 commits in the last 30 days.

Use this if you are a machine learning engineer or researcher working with large, diverse datasets involving audio, speech, video, images, or text, and need a flexible, Python-centric way to prepare and load this data for model training.

Not ideal if you primarily work with tabular data or single-modality datasets that do not require complex preprocessing or integration with deep learning frameworks like PyTorch.

AI-model-training multimodal-data speech-recognition computer-vision natural-language-processing
No Package, No Dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

Stars: 1,120
Forks: 266
Language: Python
License: Apache-2.0
Last pushed: Mar 11, 2026
Commits (30d): 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lhotse-speech/lhotse"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
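The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the URL path segments mean category, owner, and repository, and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the meaning of each path segment is an assumption."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("ml-frameworks", "lhotse-speech", "lhotse")` requests the same URL as the curl command above. Unauthenticated use counts against the 100 requests/day limit, so caching the result locally is worth considering.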