NorskRegnesentral/skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

Established

This toolkit helps you label large amounts of text automatically, avoiding the time and cost of manual annotation when preparing training data for machine learning models. You provide raw, unlabelled text documents and a set of labelling sources, such as simple rules or existing dictionaries. The toolkit applies these sources to the documents and combines their outputs into a labelled corpus ready for use in machine learning. It is aimed at NLP practitioners, data scientists, and researchers working with text in specialized or under-resourced domains.
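The workflow described above (rules and dictionaries in, labelled tokens out) can be illustrated with a minimal, library-free sketch. This is not skweak's API: the function names `money_rule`, `make_gazetteer`, and `aggregate` are hypothetical, and the plain per-token majority vote used here is an assumption chosen for brevity, not the toolkit's own aggregation method.

```python
import re
from collections import Counter

# Hypothetical labelling functions (not skweak's API): each takes a list of
# tokens and yields (start, end, label) spans, with `end` exclusive.

def money_rule(tokens):
    """Heuristic rule: a '$' token followed by a number is MONEY."""
    for i in range(len(tokens) - 1):
        if tokens[i] == "$" and re.fullmatch(r"\d+(\.\d+)?", tokens[i + 1]):
            yield i, i + 2, "MONEY"

def make_gazetteer(label, entries):
    """Dictionary lookup: tag every exact (multi-token) match with `label`."""
    entries = [tuple(e.split()) for e in entries]

    def annotate(tokens):
        for entry in entries:
            n = len(entry)
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == entry:
                    yield i, i + n, label

    return annotate

def aggregate(tokens, labelling_functions):
    """Combine all sources by per-token majority vote; 'O' if none fires."""
    votes = [Counter() for _ in tokens]
    for lf in labelling_functions:
        for start, end, label in lf(tokens):
            for i in range(start, end):
                votes[i][label] += 1
    return [v.most_common(1)[0][0] if v else "O" for v in votes]

tokens = "Oslo paid $ 40 to Ada Lovelace".split()
lfs = [
    money_rule,
    make_gazetteer("PERSON", ["Ada Lovelace"]),
    make_gazetteer("GPE", ["Oslo"]),
]
labels = aggregate(tokens, lfs)
print(list(zip(tokens, labels)))
```

A majority vote treats every source as equally reliable. Weak supervision toolkits generally replace this step with a model that estimates how trustworthy each source is, which matters once the sources start to disagree.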

926 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to create labelled text datasets for natural language processing tasks but lack the resources for extensive manual annotation.

Not ideal if you already have access to large, high-quality manually labelled datasets or if your task does not involve text processing.

Text Annotation, Natural Language Processing, Data Labeling, Information Extraction, Machine Learning

Stale 6m, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 17 / 25

Stars: 926

Language: Python

License: MIT

Related tools

hellohaptik/chatbot_ner

chatbot_ner: Named Entity Recognition for chatbots.

openeventdata/mordecai

Full text geoparsing as a Python library

Rostlab/nalaf

NLP framework in python for entity recognition and relationship extraction

mpuig/spacy-lookup

Named Entity Recognition based on dictionaries

juand-r/entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These...
