snorkel-team/snorkel

A system for quickly generating training data with weak supervision

58 / 100
Established

Snorkel helps machine learning practitioners create labeled training data efficiently, a step that is often the bottleneck in developing AI applications. You define labeling functions programmatically; they take raw, unlabeled data as input, and their outputs are turned into training data with predicted labels. It suits data scientists and ML engineers who need to generate large training sets quickly without manual annotation.
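The idea of a labeling function can be illustrated with a minimal, library-free sketch. It deliberately does not use Snorkel's API: the function names, the spam/ham task, and the sample messages are all hypothetical, and a simple majority vote stands in for the learned label model that Snorkel uses to combine labeling-function outputs.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL, otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Vote HAM for very short messages, otherwise abstain."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes.

    Assumption: majority vote is a stand-in here, not Snorkel's label model.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between labels
    return counts[0][0]


unlabeled = [
    "Claim your prize now at http://example.com",
    "See you at lunch",
    "The quarterly report is attached for your review today",
]
labels = [majority_label(text) for text in unlabeled]
print(labels)
```

Each heuristic is allowed to abstain, so coverage grows as more labeling functions are added, and examples that no function covers stay unlabeled rather than receiving a guess.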

5,940 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to rapidly label a large dataset for machine learning model training, especially when manual labeling is too time-consuming or expensive.

Not ideal if you have a small dataset that can be easily labeled manually or if you are looking for an end-to-end ML platform that includes model development and deployment.

data-labeling machine-learning-engineering dataset-generation AI-development
Stale 6m
Maintenance 0 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 22 / 25


Stars: 5,940
Forks: 855
Language: Python
License: Apache-2.0
Last pushed: May 02, 2024
Commits (30d): 0
Dependencies: 10
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/snorkel-team/snorkel"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
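The same request can be sketched in Python using only the standard library. Two things here are assumptions rather than documented behavior: that the path segments after `/quality/` are a category, an owner, and a repository name, and that the endpoint returns JSON. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single example URL in the listing.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("ml-frameworks", "snorkel-team", "snorkel")
print(url)
# To perform the request (needs network access and counts against the daily limit):
# record = fetch_quality("ml-frameworks", "snorkel-team", "snorkel")
```

Keeping URL construction separate from the network call makes it easy to check the target address without spending one of the 100 daily requests.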