proger/haloop

Agent toolkit for 100 hours of speech and 10 GiB of text


Emerging

This toolkit is aimed at researchers and developers, particularly natural language processing (NLP) and speech technology specialists, who work with large volumes of speech and text data in less-resourced languages such as Ukrainian. It trains acoustic, language, and attention models from raw audio and text. Its outputs include the trained models, language model scores for individual sentences, and comparative analyses of datasets.

No commits in the last 6 months. Available on PyPI.

Use this if you need to train speech and language models, or to evaluate text likelihood, on large datasets (on the order of 100 hours of speech and 10 GiB of text) in specialized domains or languages where pre-built solutions are scarce.

Not ideal if you want an off-the-shelf speech-to-text or text generation application and do not plan to train custom models or work directly with model components.

speech-recognition natural-language-processing language-modeling acoustic-modeling computational-linguistics

Stale 6m

Maintenance 2 / 25

Adoption 5 / 25

Maturity 25 / 25

Community 14 / 25


Language: Python

License: GPL-3.0

Higher-rated alternatives

ai4co/reevo

[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution

SALT-NLP/collaborative-gym

Framework and toolkits for building and evaluating collaborative agents that can work together...

Gen-Verse/LatentMAS

Latent Collaboration in Multi-Agent Systems

lean-dojo/LeanCopilot

LLMs as Copilots for Theorem Proving in Lean

WooooDyy/AgentGym-RL

Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon...
