NorskRegnesentral/skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
This toolkit helps you automatically label large amounts of text data for training NLP models, without the time and cost of manual annotation. You provide raw, unlabelled text documents and a set of labelling functions, such as simple rules or existing dictionaries (gazetteers). The system applies these functions and combines their often noisy and conflicting outputs into a single labelled text corpus, ready for use in machine learning. This is ideal for NLP practitioners, data scientists, or researchers working with text in specialized or under-resourced domains.
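As a rough illustration of this workflow, the sketch below labels raw text with three labelling functions and then merges their token-level votes. The functions are a rule, a dictionary (gazetteer) and a deliberately noisy heuristic. This is plain Python and does not use skweak's API. The names `money_rule`, `city_gazetteer`, `capitalised_rule`, `majority_vote` and `label_corpus` are hypothetical, and tokenisation is naive whitespace splitting. Simple majority voting is swapped in for clarity. skweak itself works on spaCy documents and aggregates labelling functions with a statistical model (a hidden Markov model for sequence labelling).

```python
import re
from collections import Counter

# Hypothetical labelling functions (not skweak's API). Each maps a list of
# tokens to one label per token and uses "O" to abstain.

def money_rule(tokens):
    """Rule-based labeller: tags tokens that look like dollar amounts."""
    return ["MONEY" if re.fullmatch(r"\$\d+(\.\d+)?", tok) else "O" for tok in tokens]

CITY_GAZETTEER = {"Oslo", "Bergen", "Trondheim"}  # toy dictionary

def city_gazetteer(tokens):
    """Dictionary-based labeller: tags tokens found in the gazetteer."""
    return ["LOC" if tok in CITY_GAZETTEER else "O" for tok in tokens]

def capitalised_rule(tokens):
    """Noisy heuristic: guesses that capitalised non-initial tokens are locations."""
    return ["LOC" if i > 0 and tok[:1].isupper() else "O" for i, tok in enumerate(tokens)]

LABELLING_FUNCTIONS = [money_rule, city_gazetteer, capitalised_rule]

def majority_vote(tokens, functions):
    """Merge per-token votes. Abstentions are ignored and ties fall back to "O"."""
    outputs = [fn(tokens) for fn in functions]
    merged = []
    for i in range(len(tokens)):
        votes = Counter(out[i] for out in outputs if out[i] != "O")
        if not votes:
            merged.append("O")  # every function abstained
            continue
        ranked = votes.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            merged.append("O")  # unresolved conflict between labels
        else:
            merged.append(ranked[0][0])
    return merged

def label_corpus(texts, functions=LABELLING_FUNCTIONS):
    """Turn raw texts into a list of (token, label) sequences."""
    corpus = []
    for text in texts:
        tokens = text.split()  # naive whitespace tokenisation (assumption)
        corpus.append(list(zip(tokens, majority_vote(tokens, functions))))
    return corpus

if __name__ == "__main__":
    print(label_corpus(["Tickets from Oslo to Bergen cost $120"])[0])
    # [('Tickets', 'O'), ('from', 'O'), ('Oslo', 'LOC'), ('to', 'O'),
    #  ('Bergen', 'LOC'), ('cost', 'O'), ('$120', 'MONEY')]
```

Majority voting treats every labelling function as equally reliable. As a result, the noisy capitalisation rule alone would tag a name such as "Kari" in "Meet Kari in Trondheim" as a location. Estimating how reliable each function is explains why weak supervision toolkits use a learned aggregation model rather than a plain vote.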
926 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to create labelled text datasets for natural language processing tasks but lack the resources for extensive manual annotation.
Not ideal if you already have access to large, high-quality manually labelled datasets or if your task does not involve text processing.
Stars: 926
Forks: 77
Language: Python
License: MIT
Category: (not listed)
Last pushed: Sep 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/NorskRegnesentral/skweak"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
Related tools
hellohaptik/chatbot_ner
chatbot_ner: Named Entity Recognition for chatbots.
openeventdata/mordecai
Full text geoparsing as a Python library
Rostlab/nalaf
NLP framework in python for entity recognition and relationship extraction
mpuig/spacy-lookup
Named Entity Recognition based on dictionaries
juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These...