lhotse-speech/lhotse
Tools for handling multimodal data in machine learning projects.
This project helps machine learning engineers and researchers efficiently prepare multimodal data, such as speech, audio, video, images, and text, for training AI models. It takes raw data files and their metadata as input, then organizes and preprocesses them into structured datasets optimized for model training. The result is a set of ready-to-use data loaders for anyone building or training AI systems that involve multiple data types.
1,120 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you are a machine learning engineer or researcher working with large, diverse datasets involving audio, speech, video, images, or text, and need a flexible, Python-centric way to prepare and load this data for model training.
Not ideal if you primarily work with tabular data or single-modality datasets that do not require complex preprocessing or integration with deep learning frameworks like PyTorch.
Stars: 1,120
Forks: 266
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 11, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lhotse-speech/lhotse"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
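The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred from the curl example, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the three path segments are category, owner, and repo.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON; the schema is not documented here,
    so the decoded object is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("ml-frameworks", "lhotse-speech", "lhotse"))
```

Since unauthenticated access is limited to 100 requests/day, it may be worth caching the result locally instead of calling `fetch_quality` repeatedly.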
Related frameworks
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
google/sequence-layers
A neural network layer API and library for sequence modeling, designed for easy creation of...
awslabs/sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
OpenNMT/OpenNMT-tf
Neural machine translation and sequence learning using TensorFlow
mozilla/translations
The code, training pipeline, and models that power Firefox Translations