code-kern-ai/refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Score: 42 / 100 (Emerging)

This tool helps data scientists prepare, manage, and improve the quality of text-based training data for Natural Language Processing (NLP) models. You input raw, unstructured text data (like customer feedback or articles) and get out clean, structured, and expertly labeled datasets. It's designed for data scientists building and refining NLP models who need to ensure their training data is high quality and consistently maintained.

1,470 stars. No commits in the last 6 months.

Use this if you need to efficiently label, assess, and maintain natural language training data to build or improve your NLP models, especially if your current data is unstructured or its quality is uncertain.

Not ideal if your project doesn't involve natural language processing or if you already have perfectly clean, perfectly labeled datasets that require no further management.

natural-language-processing data-labeling text-analytics machine-learning-engineering data-quality-management
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25
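
The four category scores above are each out of 25, and for this repository they add up to the overall 42 / 100. The short check below reproduces that arithmetic. It assumes the overall score is an unweighted sum of the four categories, which matches the numbers shown here but is not stated as the formula.

```python
# Category scores as listed above; each is out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 16,
}

# Assumption: the overall score is the unweighted sum of the categories.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")  # prints "42 / 100"
```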


Stars: 1,470
Forks: 74
Language: Python
License: Apache-2.0
Last pushed: Dec 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/code-kern-ai/refinery"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
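
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. `build_quality_url` and `fetch_quality` are hypothetical helper names. Treating the path as category/owner/repo is a reading of the example URL, and the JSON response body is an assumption rather than something the page confirms.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the path segments are <category>/<owner>/<repo>.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body (assumes a JSON response)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "code-kern-ai", "refinery")
# print(data)
```

With the keyless limit of 100 requests/day, it is probably worth caching the result locally instead of fetching it on every run.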