proger/haloop

Agent toolkit for 100 hours of speech and 10 GiB of text

Score: 46 / 100 (Emerging)

This toolkit is aimed at researchers and developers working with large volumes of speech and text data, particularly in less-resourced languages such as Ukrainian. It trains acoustic, language, and attention models from raw audio and text. Its outputs are trained models, per-sentence language model scores, and comparative analyses of datasets, intended for natural language processing (NLP) and speech technology specialists.

No commits in the last 6 months. Available on PyPI.

Use this if you need to train speech and language models or evaluate text likelihood for large datasets (hundreds of hours of speech, tens of gigabytes of text) in specialized domains or languages where pre-built solutions are scarce.

Not ideal if you are looking for an off-the-shelf application for speech-to-text or text generation without needing to train custom models or work directly with model components.

speech-recognition natural-language-processing language-modeling acoustic-modeling computational-linguistics
Stale 6m
Maintenance 2 / 25
Adoption 5 / 25
Maturity 25 / 25
Community 14 / 25

Stars: 14
Forks: 3
Language: Python
License: GPL-3.0
Last pushed: Jul 15, 2025
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/proger/haloop"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
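
The curl request above can also be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON and that other repositories follow the same owner/repo path pattern; neither is confirmed by this page. The helper names `build_url` and `fetch_quality` are hypothetical, chosen for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; assumed to generalize to other repositories.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the request URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository.

    Assumes the response body is JSON. urlopen raises
    urllib.error.HTTPError when the server answers with an error status.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a network request and counts against the daily limit):
# data = fetch_quality("proger", "haloop")
# print(json.dumps(data, indent=2))
```

Because the unauthenticated limit is 100 requests/day, it may be worth caching the result locally rather than calling `fetch_quality` on every run.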