HazyResearch/reef

Automatically labeling training data


Emerging

This project helps data scientists and machine learning engineers automatically label large amounts of training data for binary classification tasks. You provide a small dataset that has already been labeled and a much larger dataset without labels. The system then generates a set of simple rules to apply these labels to your unlabeled data, producing a fully labeled dataset ready for training your machine learning models.
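The workflow described above can be illustrated with a minimal sketch. This is not the project's actual code or API: the function names (`make_threshold_rule`, `fit_rules`, `label_with_rules`), the one-threshold-per-feature rule form, the abstain margin, and the majority vote are all assumptions chosen for illustration. The project itself may generate and combine its rules differently.

```python
from typing import Callable, List, Sequence

# A rule maps a feature vector to +1, -1, or 0 (abstain).
Rule = Callable[[Sequence[float]], int]


def make_threshold_rule(feature: int, threshold: float, margin: float, sign: int) -> Rule:
    """Build a rule that votes on one feature and abstains near the threshold."""
    def rule(x: Sequence[float]) -> int:
        if abs(x[feature] - threshold) < margin:
            return 0  # too close to the boundary: abstain
        return sign if x[feature] > threshold else -sign
    return rule


def fit_rules(X: List[Sequence[float]], y: List[int], margin: float = 0.5) -> List[Rule]:
    """Hypothetical rule generator: one threshold rule per feature,
    placed at the midpoint between the two class means."""
    rules: List[Rule] = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == -1]
        if not pos or not neg:
            continue  # need both classes to place a threshold
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        if mean_pos == mean_neg:
            continue  # feature does not separate the classes
        threshold = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos > mean_neg else -1
        rules.append(make_threshold_rule(j, threshold, margin, sign))
    return rules


def label_with_rules(rules: List[Rule], X: List[Sequence[float]]) -> List[int]:
    """Combine rule votes by simple majority; 0 means no label assigned."""
    labels = []
    for x in X:
        total = sum(rule(x) for rule in rules)
        labels.append(1 if total > 0 else -1 if total < 0 else 0)
    return labels


# Small labeled set (binary labels encoded as +1 / -1) and a larger unlabeled set.
X_labeled = [[1.0, 5.0], [2.0, 6.0], [8.0, 1.0], [9.0, 0.0]]
y_labeled = [-1, -1, 1, 1]
X_unlabeled = [[7.5, 0.5], [1.5, 5.5], [5.1, 3.2]]

rules = fit_rules(X_labeled, y_labeled)
labels = label_with_rules(rules, X_unlabeled)
print(labels)  # [1, -1, 0]
```

In this sketch the third unlabeled point sits close to both thresholds, so every rule abstains and the point stays unlabeled. Letting rules abstain rather than guess is one way to trade coverage for label quality.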

108 stars. No commits in the last 6 months.

Use this if you have a significant amount of unlabeled data and a smaller, high-quality labeled dataset for a binary classification problem, and you want to efficiently expand your labeled data without extensive manual effort.

Not ideal if your problem involves multi-class classification or requires very complex, nuanced labeling rules, or if you are working with non-numerical data types that need specialized pre-processing.

data-labeling machine-learning-engineering binary-classification training-data-generation data-preparation

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 20 / 25


Stars: 108

Forks

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

cvat-ai/cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and...

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

wkentaro/labelme

Image annotation with Python. Supports polygon, rectangle, circle, line, point, and AI-assisted...

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

doccano/doccano

Open source annotation tool for machine learning practitioners.
