code-kern-ai/refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

42 / 100

Emerging

This tool helps data scientists prepare, manage, and improve the quality of text-based training data for natural language processing (NLP) models. You feed in raw, unstructured text (such as customer feedback or articles) and get back structured, labeled datasets. It is aimed at data scientists building and refining NLP models who need their training data to stay high quality and consistently maintained.

1,470 stars. No commits in the last 6 months.

Use this if you need to efficiently label, assess, and maintain natural language training data to build or improve your NLP models, especially if your current data is unstructured or its quality is uncertain.

Not ideal if your project doesn't involve natural language processing, or if you already have fully cleaned and labeled datasets that need no further management.

natural-language-processing data-labeling text-analytics machine-learning-engineering data-quality-management

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25


Stars: 1,470
Forks:
Language: Python
License: Apache-2.0

Related tools

nus-cs3244-ml-singapore-7/sg-parliament-hansard-nlp-demo

Singapore Hansard NLP Demo

jaychampaneri14/ai-essay-grader

Automated essay scoring with BERT and linguistic features

ininando/AI-Answer-Evaluation-System

Evaluate and grade student answers in text and audio formats using advanced NLP for meaningful...
