NorskRegnesentral/skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

52 / 100 (Established)

This toolkit automatically labels large amounts of text data for training machine learning models, avoiding the time and cost of manual annotation. You provide raw, unlabelled text documents and a set of simple rules or existing dictionaries; the system applies them and outputs a labelled text corpus, ready for use in machine learning. It is aimed at NLP practitioners, data scientists, and researchers working with text in specialized or under-resourced domains.
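The idea of combining simple rules and dictionaries into labels can be sketched in a few lines of plain Python. This is only an illustration of the general weak-supervision pattern, not skweak's own API: the function names, the `CITY_GAZETTEER` dictionary, the label set, and the majority-vote aggregation are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labelling function returns one label per token, or None to abstain.
LabellingFunction = Callable[[List[str]], List[Optional[str]]]

# Hypothetical dictionary of known city names (illustrative only).
CITY_GAZETTEER = {"Oslo", "Bergen"}


def gazetteer_lf(tokens: List[str]) -> List[Optional[str]]:
    """Dictionary lookup: tokens found in the gazetteer are labelled GPE."""
    return ["GPE" if tok in CITY_GAZETTEER else None for tok in tokens]


def year_lf(tokens: List[str]) -> List[Optional[str]]:
    """Simple rule: a four-digit number is labelled DATE."""
    return ["DATE" if tok.isdigit() and len(tok) == 4 else None for tok in tokens]


def after_preposition_lf(tokens: List[str]) -> List[Optional[str]]:
    """Noisy rule: a capitalised token after 'to', 'in' or 'from' is labelled GPE."""
    labels: List[Optional[str]] = []
    for i, tok in enumerate(tokens):
        follows_prep = i > 0 and tokens[i - 1].lower() in {"to", "in", "from"}
        labels.append("GPE" if follows_prep and tok[:1].isupper() else None)
    return labels


def aggregate(tokens: List[str], lfs: List[LabellingFunction]) -> List[str]:
    """Merge the labelling functions by majority vote per token.

    Abstentions (None) are ignored; a token with no votes gets "O".
    On a tie, the label that was voted first wins.
    """
    votes_per_lf = [lf(tokens) for lf in lfs]
    merged: List[str] = []
    for token_votes in zip(*votes_per_lf):
        counts = Counter(v for v in token_votes if v is not None)
        merged.append(counts.most_common(1)[0][0] if counts else "O")
    return merged
```

Calling `aggregate("Anna moved to Oslo in 2019".split(), [gazetteer_lf, year_lf, after_preposition_lf])` labels "Oslo" as GPE and "2019" as DATE, and leaves the remaining tokens as "O". Majority voting is used here only because it is the simplest way to combine noisy sources; a real toolkit may weight or model the sources differently.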

926 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to create labelled text datasets for natural language processing tasks but lack the resources for extensive manual annotation.

Not ideal if you already have access to large, high-quality manually labelled datasets or if your task does not involve text processing.

Tags: Text Annotation, Natural Language Processing, Data Labeling, Information Extraction, Machine Learning
Status: Stale (6 months), No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 17 / 25


Stars: 926
Forks: 77
Language: Python
License: MIT
Last pushed: Sep 02, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/NorskRegnesentral/skweak"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
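The curl request above can also be issued from Python using only the standard library. The sketch below makes two assumptions that the page does not confirm: that the path segments after `quality/` are a category, a repository owner, and a repository name, and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment.

    Assumption: the path layout is <category>/<owner>/<repo>, inferred
    from the single example URL shown on the page.
    """
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(segments)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the response body is JSON; the page does not document
    the response format, so adjust the parsing if it differs.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_quality("nlp", "NorskRegnesentral", "skweak")` requests the same URL as the curl command. Without a key, calls count against the 100 requests/day limit, so it is worth caching the result when scripting.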