AI4Bharat/Anudesh
An open-source platform for annotating data for large language models at scale
This platform helps data annotation teams efficiently prepare large datasets for training large language models (LLMs). You supply raw text or audio data, and the platform helps human annotators label and structure it. The output is high-quality, labeled data ready for LLM development, which is particularly useful for organizations focusing on under-represented languages.
Use this if you need a scalable, collaborative platform to annotate diverse datasets for large language model development, especially for less common languages.
Not ideal if you're looking for a simple, single-user annotation tool or don't work with large-scale language model data.
Stars: 12
Forks: 3
Language: —
License: MIT
Category:
Last pushed: Oct 31, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AI4Bharat/Anudesh"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
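For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which this page does not state. The response fields are likewise unknown here, so the sketch returns the parsed body without interpreting it.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the body is JSON. Returns the parsed body, or None when the
    request fails (for example, if the daily request limit has been reached).
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server answered with an error status; which status signals
        # rate limiting is not documented on this page.
        print(f"request failed with HTTP {err.code}")
    except urllib.error.URLError as err:
        # Network-level failure (DNS, connection refused, timeout).
        print(f"could not reach the server: {err.reason}")
    return None
```

Calling `fetch_quality("AI4Bharat", "Anudesh")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, a script that polls many repositories may want to cache responses locally.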
Higher-rated alternatives
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MigoXLab/dingo
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
TheDataStation/pneuma
LLM-Powered Data Discovery System for Tabular Data
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio