proger/haloop
Agent toolkit for 100 hours of speech and 10 GiB of text
This toolkit helps researchers and developers work with large volumes of speech and text data, particularly in less-resourced languages such as Ukrainian. It trains acoustic, language, and attention models from raw audio and text. Its outputs include trained models, language model scores for sentences, and comparative analyses of datasets. It is aimed at natural language processing (NLP) and speech technology specialists.
No commits in the last 6 months. Available on PyPI.
Use this if you need to train speech and language models or evaluate text likelihood on large datasets (on the order of 100 hours of speech and 10 GiB of text) in specialized domains or languages where pre-built solutions are scarce.
Not ideal if you are looking for an off-the-shelf application for speech-to-text or text generation without needing to train custom models or work directly with model components.
Stars: 14
Forks: 3
Language: Python
License: GPL-3.0
Last pushed: Jul 15, 2025
Commits (30d): 0
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/proger/haloop"
Open to everyone: up to 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
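The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. The helper names quality_url and fetch_quality are hypothetical, the llm-tools path segment is copied from the curl example and treated as fixed, and decoding the response as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; treating "llm-tools" as fixed is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the endpoint returns JSON. If it does not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("proger", "haloop")
```

Since keyless access is limited to 100 requests/day, it may be worth caching the result locally rather than calling fetch_quality repeatedly for the same repository.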
Higher-rated alternatives
ai4co/reevo
[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution
SALT-NLP/collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together...
Gen-Verse/LatentMAS
Latent Collaboration in Multi-Agent Systems
lean-dojo/LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
WooooDyy/AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon...