refuel-ai/autolabel
Label, clean and enrich text datasets with LLMs.
Autolabel helps data scientists, machine learning engineers, and researchers prepare text data for machine learning models. You provide raw text and guidelines describing what to extract or categorize; the tool processes the data automatically and outputs a cleaned, labeled dataset ready for model training, reducing manual effort.
2,304 stars. No commits in the last 6 months.
Use this if you need to label, clean, or enrich large text datasets for machine learning quickly, with Large Language Models (LLMs) automating the process.
Not ideal if your data is not text-based, or if you require human-in-the-loop validation for every single label without any automation.
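The workflow described above (raw text plus labeling guidelines in, labeled dataset out) can be sketched in a few lines. This is a minimal illustration only and does not use Autolabel's own API: `build_prompt`, `label_rows`, and `keyword_stub` are hypothetical names, and the stub stands in for a real LLM call so the example runs offline.

```python
from typing import Callable, Dict, List, Optional


def build_prompt(guidelines: str, labels: List[str], text: str) -> str:
    """Combine the user's guidelines, the allowed labels, and one input row."""
    return (
        f"{guidelines}\n"
        f"Allowed labels: {', '.join(labels)}\n"
        f"Text: {text}\n"
        "Label:"
    )


def label_rows(
    rows: List[str],
    guidelines: str,
    labels: List[str],
    llm: Callable[[str], str],
) -> List[Dict[str, Optional[str]]]:
    """Label each row with the supplied model callable.

    Any answer outside the allowed label set is stored as None so it can
    be reviewed by a person later.
    """
    labeled = []
    for text in rows:
        raw = llm(build_prompt(guidelines, labels, text)).strip()
        labeled.append({"text": text, "label": raw if raw in labels else None})
    return labeled


def keyword_stub(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: a keyword rule, for demonstration only."""
    text = prompt.rsplit("Text: ", 1)[1]
    return "positive" if "great" in text.lower() else "negative"


if __name__ == "__main__":
    reviews = ["Great battery life", "Stopped working after a week"]
    for row in label_rows(
        reviews,
        "Classify the sentiment of the review.",
        ["positive", "negative"],
        keyword_stub,
    ):
        print(row)
```

Passing the model in as a callable keeps the labeling loop independent of any particular LLM provider; replacing `keyword_stub` with a real client is the only change needed to label with an actual model.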
Stars: 2,304
Forks: 159
Language: Python
License: MIT
Category:
Last pushed: Mar 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/refuel-ai/autolabel"
Open to everyone: 100 requests/day with no key, or 1,000 requests/day with a free key.
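The same request can be made from Python with only the standard library. This is a sketch under assumptions: the listing does not document the response format, so the code assumes the endpoint returns JSON and does not rely on any particular field. `build_url` and `fetch_quality` are hypothetical helper names, and the `transformers` path segment is copied from the curl example as-is.

```python
import json
import urllib.request

# Base path copied from the curl example; the meaning of the "transformers"
# segment is not documented in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as refuel-ai/autolabel."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data for one repository.

    Assumption: the response body is JSON. Network and HTTP errors are
    left to propagate to the caller.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Each call counts against the keyless quota of 100 requests/day.
    print(fetch_quality("refuel-ai", "autolabel"))
```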
Higher-rated alternatives
allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
waikato-llm/llm-dataset-converter
For converting LLM datasets from one format into another.
niclasgriesshaber/llm_patent_pipeline
LLMs for Historical Dataset Construction from Archival Image Scans
cgxjdzz/FeatureForge-LLM
FeatureForge LLM is a Python package that leverages large language models (LLMs) to automate and...
codeastra2/llm-feat
Automated feature engineering using Large Language Models (LLMs) for tabular data