HazyResearch/reef
Automatically labeling training data
This project helps data scientists and machine learning engineers automatically label large amounts of training data for binary classification tasks. You provide a small dataset that has already been labeled and a much larger dataset without labels. The system then generates a set of simple rules from the labeled data and applies them to the unlabeled data, producing a fully labeled dataset ready for training your machine learning models.
108 stars. No commits in the last 6 months.
Use this if you have a significant amount of unlabeled data and a smaller, high-quality labeled dataset for a binary classification problem, and you want to efficiently expand your labeled data without extensive manual effort.
Not ideal if your problem involves multi-class classification or requires very complex, nuanced labeling rules, or if you are working with non-numerical data types that need specialized pre-processing.
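The workflow described above (a small labeled set, a large unlabeled set, simple generated rules) can be illustrated with a minimal sketch. This does not use the reef code base or its API. The names `Rule`, `fit_rules`, and `label_rows` are hypothetical, and the single-feature threshold rules and majority vote are simplified stand-ins chosen for illustration, assuming numeric features and labels encoded as -1 and +1.

```python
from typing import List, NamedTuple, Sequence


class Rule(NamedTuple):
    """A single-feature threshold rule (hypothetical, for illustration only)."""
    feature: int      # column index the rule inspects
    threshold: float  # split point on that column
    polarity: int     # +1: vote +1 when value > threshold; -1: the reverse
    accuracy: float   # accuracy measured on the small labeled set

    def vote(self, row: Sequence[float]) -> int:
        above = row[self.feature] > self.threshold
        return 1 if above == (self.polarity == 1) else -1


def fit_rules(X: Sequence[Sequence[float]], y: Sequence[int],
              min_accuracy: float = 0.8) -> List[Rule]:
    """Keep the best threshold rule per feature if it is accurate enough."""
    rules = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
        best = None
        for t in candidates:
            for polarity in (1, -1):
                rule = Rule(j, t, polarity, 0.0)
                correct = sum(rule.vote(r) == label for r, label in zip(X, y))
                acc = correct / len(X)
                if best is None or acc > best.accuracy:
                    best = rule._replace(accuracy=acc)
        if best is not None and best.accuracy >= min_accuracy:
            rules.append(best)
    return rules


def label_rows(rules: Sequence[Rule],
               X_unlabeled: Sequence[Sequence[float]]) -> List[int]:
    """Majority vote over rules; 0 means the rules tie or none exist."""
    labels = []
    for row in X_unlabeled:
        total = sum(rule.vote(row) for rule in rules)
        labels.append(1 if total > 0 else -1 if total < 0 else 0)
    return labels


# Toy data: four labeled rows and four unlabeled rows (made up for the example).
X_small = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y_small = [-1, -1, 1, 1]
X_big = [[0.5, 3.0], [7.5, 3.0], [3.0, 9.0], [10.0, 0.0]]

rules = fit_rules(X_small, y_small)
labels = label_rows(rules, X_big)
print(labels)  # [-1, 1, -1, 1]
```

On this toy data only the first feature yields a rule that clears the 0.8 accuracy bar, so every label comes from that one rule. With more features and rules, the vote combines them, and a 0 marks rows that the rules cannot decide.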
Stars: 108
Forks: 27
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jan 08, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/HazyResearch/reef"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
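The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern (category, then owner/repo) is inferred from the single example above, and the response format is not documented here, so the body is returned as raw text.

```python
import urllib.request

# Base path taken from the curl example; the general pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body; no key is sent (keyless tier)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("ml-frameworks", "HazyResearch/reef")` would issue the same GET request as the curl command and counts against the daily request limit.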
Higher-rated alternatives
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and...
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
wkentaro/labelme
Image annotation with Python. Supports polygon, rectangle, circle, line, point, and AI-assisted...
CVHub520/X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
doccano/doccano
Open source annotation tool for machine learning practitioners.