AI4Bharat/Anudesh

An open-source platform for annotating data for large language models at scale

/ 100

Emerging

This platform helps data annotation teams efficiently prepare large datasets for training large language models (LLMs). You supply raw text or audio data, and human annotators use the platform to label and structure it. The output is high-quality labeled data ready for LLM development, which is particularly useful for organizations working on under-represented languages.

Use this if you need a scalable, collaborative platform to annotate diverse datasets for large language model development, especially for less common languages.

Not ideal if you're looking for a simple, single-user annotation tool or don't work with large-scale language model data.

data-annotation language-model-training NLP-data-preparation text-labeling AI-data-pipeline

No Package No Dependents

Maintenance 6 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 14 / 25

License

MIT

Higher-rated alternatives

NVIDIA-NeMo/Curator

Scalable data preprocessing and curation toolkit for LLMs

MigoXLab/dingo

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

data-prep-kit/data-prep-kit

Open source project for data preparation for GenAI applications

TheDataStation/pneuma

LLM-Powered Data Discovery System for Tabular Data

cleanlab/cleanlab-studio

Client interface to Cleanlab Studio