HazyResearch/reef

Automatically labeling training data

Score: 45 / 100 (Emerging)

This project helps data scientists and machine learning engineers automatically label large amounts of training data for binary classification tasks. You provide a small dataset that has already been labeled and a much larger dataset without labels. The system then generates a set of simple rules from the labeled examples and applies them to the unlabeled data, producing a fully labeled dataset ready for training your machine learning models.
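To make the workflow described above concrete, here is a minimal sketch in Python. It is not Reef's actual API or algorithm: the function names (`fit_stump`, `generate_rules`, `apply_rules`, `majority_vote`), the single-feature threshold rules, the `min_accuracy` cutoff, and the majority-vote aggregation are all assumptions chosen for illustration. It only shows the general shape of the idea: learn simple rules from a small labeled set, then use them to label a larger unlabeled set.

```python
import numpy as np

# Hypothetical illustration only; none of these names come from Reef itself.
# Labels are -1 / +1. An aggregated label of 0 means "no decision" (tied vote).


def fit_stump(x, y):
    """Find the threshold and polarity on one feature with the best labeled-set accuracy."""
    best = (0.0, 1, -1.0)  # (threshold, polarity, accuracy)
    for t in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(x > t, polarity, -polarity)
            acc = float(np.mean(pred == y))
            if acc > best[2]:
                best = (float(t), polarity, acc)
    return best


def generate_rules(X_labeled, y_labeled, min_accuracy=0.7):
    """Keep one threshold rule per feature, if it is accurate enough on the labeled set."""
    rules = []
    for j in range(X_labeled.shape[1]):
        t, polarity, acc = fit_stump(X_labeled[:, j], y_labeled)
        if acc >= min_accuracy:
            rules.append((j, t, polarity))
    return rules


def apply_rules(rules, X):
    """Return an (n_samples, n_rules) matrix of -1 / +1 votes."""
    votes = np.zeros((X.shape[0], len(rules)), dtype=int)
    for k, (j, t, polarity) in enumerate(rules):
        votes[:, k] = np.where(X[:, j] > t, polarity, -polarity)
    return votes


def majority_vote(votes):
    """Combine rule votes into one label per row; ties (or no rules) give 0."""
    return np.sign(votes.sum(axis=1)).astype(int)


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X_small = np.array([[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]])
y_small = np.array([-1, -1, 1, 1])
X_large = np.array([[0.05, 3.0], [0.95, 3.0], [0.5, 3.0]])

rules = generate_rules(X_small, y_small, min_accuracy=0.9)
labels = majority_vote(apply_rules(rules, X_large))
print(rules, labels)
```

A real system would typically use richer rules and a more careful way of weighting them than a plain majority vote; the sketch keeps both deliberately simple.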

108 stars. No commits in the last 6 months.

Use this if you have a significant amount of unlabeled data and a smaller, high-quality labeled dataset for a binary classification problem, and you want to efficiently expand your labeled data without extensive manual effort.

Not ideal if your problem is multi-class, needs complex or nuanced labeling rules, or involves non-numerical data that requires specialized pre-processing.

data-labeling machine-learning-engineering binary-classification training-data-generation data-preparation
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 20 / 25

Stars: 108
Forks: 27
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jan 08, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/HazyResearch/reef"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
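
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and two things are assumptions rather than documented behavior: that the endpoint returns a JSON body, and that other repositories follow the same `owner/repo` URL pattern as the one shown in the curl command.

```python
import json
import urllib.request

# Base path copied from the curl example; reusing it for other repos is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner, repo):
    """Build the quality-score URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the quality data, assuming the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request; counts toward the daily request limit.
    print(fetch_quality("HazyResearch", "reef"))
```

Since unauthenticated access is limited to 100 requests/day, a script that polls many repositories would likely want to cache responses locally.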