snorkel-team/snorkel

A system for quickly generating training data with weak supervision

/ 100

Established

Snorkel helps machine learning practitioners create labeled training data efficiently, a step that is often the bottleneck in developing AI applications. You write labeling functions, small programs that each inspect a raw, unlabeled data point and either vote for a label or abstain, and Snorkel combines their noisy, possibly conflicting votes into training labels. This suits data scientists and ML engineers who need large training sets quickly and cannot annotate them by hand.
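To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It deliberately does not use Snorkel's own API: the spam/ham task, the three heuristics, and the helper names (`lf_contains_link`, `majority_vote`, `label_dataset`) are all hypothetical, and the votes are combined with a simple majority vote as a stand-in for the statistical label model that Snorkel provides.

```python
from collections import Counter

# Label constants for a hypothetical spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM when the text contains a URL; otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_asks_to_subscribe(text):
    """Vote SPAM when the text asks the reader to subscribe; otherwise abstain."""
    return SPAM if "subscribe" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Vote HAM for very short comments (fewer than 5 words); otherwise abstain."""
    return HAM if len(text.split()) < 5 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_asks_to_subscribe, lf_short_message]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the top two labels have equal support
    return ranked[0][0]


def label_dataset(texts):
    """Return one weak label per input text."""
    return [majority_vote(text) for text in texts]
```

Calling `label_dataset` on a list of raw comments returns one weak label per comment, with `ABSTAIN` marking points that no heuristic covered or where the heuristics disagreed evenly. Points left as `ABSTAIN` would normally be dropped before training a downstream model.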

5,940 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to rapidly label a large dataset for machine learning model training, especially when manual labeling is too time-consuming or expensive.

Not ideal if you have a small dataset that can be easily labeled manually or if you are looking for an end-to-end ML platform that includes model development and deployment.

data-labeling machine-learning-engineering dataset-generation AI-development

Stale 6m

Maintenance 0 / 25

Adoption 11 / 25

Maturity 25 / 25

Community 22 / 25


Stars 5,940

Forks 855

Language Python

License Apache-2.0

Related frameworks

cvat-ai/cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and...

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

wkentaro/labelme

Image annotation with Python. Supports polygon, rectangle, circle, line, point, and AI-assisted...

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

doccano/doccano

Open source annotation tool for machine learning practitioners.
