lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

/ 100

Established

This project helps machine learning engineers and researchers efficiently prepare multimodal data (speech, audio, video, images, and text) for training AI models. It takes raw data files and their metadata as input, then organizes and preprocesses them into structured datasets optimized for model training. The result is a set of ready-to-use data loaders that streamline the workflow for anyone building or training AI systems that involve multiple data types.
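The workflow described above (raw files plus metadata in, structured and batched examples out) can be illustrated with a minimal plain-Python sketch. This is not Lhotse's API: the `Example` schema and the `build_manifest` and `batches_by_duration` helpers are hypothetical names invented here, and the duration-capped batching is only one assumed way a loader might group examples.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Example:
    """One training example: a raw file plus its metadata (hypothetical schema)."""
    path: str        # location of the raw audio/video/image file
    duration: float  # length in seconds, taken from the metadata
    text: str        # transcript or caption


def build_manifest(rows: List[dict]) -> List[Example]:
    """Turn raw metadata rows into structured examples, skipping unusable rows."""
    manifest = []
    for row in rows:
        # Drop rows with no file path or a non-positive duration.
        if not row.get("path") or row.get("duration", 0) <= 0:
            continue
        manifest.append(
            Example(row["path"], float(row["duration"]), row.get("text", ""))
        )
    return manifest


def batches_by_duration(
    manifest: List[Example], max_duration: float
) -> Iterator[List[Example]]:
    """Group examples, shortest first, into batches capped by total duration.

    An example longer than max_duration still forms a batch of its own.
    """
    batch: List[Example] = []
    total = 0.0
    for ex in sorted(manifest, key=lambda e: e.duration):
        # Close the current batch when adding this example would exceed the cap.
        if batch and total + ex.duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(ex)
        total += ex.duration
    if batch:
        yield batch
```

Calling `build_manifest` on a list of metadata dictionaries and passing the result to `batches_by_duration` gives an iterator of batches that a training loop could consume. A real pipeline would also load and decode the files themselves, which this sketch leaves out.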

1,120 stars. Actively maintained with 4 commits in the last 30 days.

Use this if you are a machine learning engineer or researcher working with large, diverse datasets involving audio, speech, video, images, or text, and need a flexible, Python-centric way to prepare and load this data for model training.

Not ideal if you primarily work with tabular data or single-modality datasets that do not require complex preprocessing or integration with deep learning frameworks like PyTorch.

AI-model-training multimodal-data speech-recognition computer-vision natural-language-processing

No Package No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25

Stars 1,120

Forks 266

Language Python

License Apache-2.0

Related frameworks

facebookresearch/fairseq2

FAIR Sequence Modeling Toolkit 2

google/sequence-layers

A neural network layer API and library for sequence modeling, designed for easy creation of...

awslabs/sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch

OpenNMT/OpenNMT-tf

Neural machine translation and sequence learning using TensorFlow

mozilla/translations

The code, training pipeline, and models that power Firefox Translations
