ningchaoar/UnsupervisedTextClassification
Keyword-based unsupervised text classification; implementation of the paper "Text Classification by Bootstrapping with Keywords, EM and Shrinkage" http://www.cs.cmu.edu/~knigam/papers/keywordcat-aclws99.pdf
This project helps content managers, researchers, or anyone dealing with large collections of unlabeled text quickly sort the texts into predefined categories. You provide a list of texts and a few initial keywords for each category, and it outputs a category for each text. It is designed for users who need to understand the distribution of their text data or to set up rules for more precise labeling later.
No commits in the last 6 months.
Use this if you have a massive amount of unclassified text and want to quickly organize it using a few descriptive keywords per category.
Not ideal if your categories are ambiguous or overlap significantly, as it relies on distinct keywords for effective classification.
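The workflow described above (keywords in, one category per text out) can be sketched roughly as follows. This is a minimal illustration of keyword bootstrapping followed by Naive Bayes EM, not the repository's actual code: the function names (`keyword_label`, `train_nb`, `posterior`, `classify`) are hypothetical, the shrinkage step from the paper is omitted, and whitespace tokenization is an assumption that would not work for Chinese text without a word segmenter.

```python
import math
from collections import Counter


def keyword_label(tokens, keywords):
    """Return the category with the most keyword hits, or None if no hit or a tie."""
    scores = {c: sum(t in kws for t in tokens) for c, kws in keywords.items()}
    top = max(scores.values())
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return max(scores, key=scores.get)


def train_nb(docs, weights, classes, vocab):
    """M-step: Laplace-smoothed log priors and log word likelihoods from soft labels."""
    total = sum(w[c] for w in weights for c in classes)
    prior, word_lp = {}, {}
    for c in classes:
        mass = sum(w[c] for w in weights)
        prior[c] = math.log((mass + 1.0) / (total + len(classes)))
        counts = Counter()
        for tokens, w in zip(docs, weights):
            for t in tokens:
                counts[t] += w[c]
        denom = sum(counts.values()) + len(vocab)
        word_lp[c] = {t: math.log((counts[t] + 1.0) / denom) for t in vocab}
    return prior, word_lp


def posterior(tokens, prior, word_lp, classes):
    """E-step for one document: normalized class posteriors."""
    logp = {c: prior[c] + sum(word_lp[c][t] for t in tokens) for c in classes}
    m = max(logp.values())
    exp = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def classify(texts, keywords, n_iter=5):
    """Bootstrap labels from keywords, then refine them with a few EM rounds."""
    docs = [t.lower().split()  # assumption: whitespace-separated tokens
            for t in texts]
    classes = sorted(keywords)
    vocab = {t for d in docs for t in d}
    # Documents without a unique keyword match start with zero weight,
    # so the first classifier is trained on keyword-labeled documents only.
    weights = []
    for d in docs:
        label = keyword_label(d, keywords)
        weights.append({c: 1.0 if c == label else 0.0 for c in classes})
    for _ in range(n_iter):
        prior, word_lp = train_nb(docs, weights, classes, vocab)
        weights = [posterior(d, prior, word_lp, classes) for d in docs]
    return [max(w, key=w.get) for w in weights]


texts = [
    "the team won the football match",
    "the bank raised interest rates",
    "the striker scored in the match",
    "investors fear higher rates",
]
keywords = {"sports": {"football"}, "finance": {"bank"}}
labels = classify(texts, keywords)
```

In this toy run the last two texts contain no keyword at all; they are assigned a category only because they share words ("match", "rates") with keyword-labeled texts. That is also why overlapping categories are a poor fit: shared vocabulary pulls documents toward the wrong class.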
Stars: 28
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Jan 28, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ningchaoar/UnsupervisedTextClassification"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
hankcs/text-classification-svm
The missing SVM-based text classification module implementing HanLP's interface
derhuerst/nbayes
A Naive Bayes classifier written in JavaScript.
samitha9125/SinhalaTextClassification
Sinhala Text Classification based on n-grams
qyfang/TextClassification
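Text classification of Sina News articles implemented with scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 between training and test sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.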
fullstackyang/article-classifier
A classifier for WeChat official account articles, based on Naive Bayes