qyfang/TextClassification
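Text classification of Sina News articles implemented with scikit-learn. The dataset contains 1 million documents in 10 classes, split 1:1 into training and test sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.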
This project helps anyone who needs to automatically sort large volumes of Chinese news articles into predefined categories. You provide raw Chinese news text, and it classifies each article into one of 10 categories, like 'sports' or 'finance'. This is ideal for content managers, data analysts, or researchers dealing with news aggregation and topic organization.
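The approach in the summary above (scikit-learn, an SVM classifier, a Bayes baseline, and a 1:1 train/test split) can be sketched as follows. This is a minimal sketch, not the project's actual code. It assumes TF-IDF features over character unigrams and bigrams (so no Chinese word segmenter is needed), `LinearSVC` for the SVM and `MultinomialNB` for the Bayes baseline. The functions `build_models` and `evaluate` and the toy sentences are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Return a Naive Bayes baseline and a linear SVM (hypothetical helper)."""
    def vectorizer():
        # Assumption: character 1-2 grams stand in for word segmentation,
        # which the original project may perform differently.
        return TfidfVectorizer(analyzer="char", ngram_range=(1, 2))

    return {
        "bayes_baseline": make_pipeline(vectorizer(), MultinomialNB()),
        "svm": make_pipeline(vectorizer(), LinearSVC()),
    }


def evaluate(texts, labels, seed=0):
    """Split 1:1 into train/test, fit both models, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.5, random_state=seed, stratify=labels
    )
    scores = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Tiny illustrative corpus (hypothetical), two classes only.
    texts = ["球队在比赛中获胜", "足球联赛今晚开赛", "篮球运动员得分", "比赛结束球迷欢呼",
             "股市今日上涨", "银行利率调整", "投资者关注股票", "基金收益增长"]
    labels = ["sports"] * 4 + ["finance"] * 4
    print(evaluate(texts, labels))
```

Both models share the same feature pipeline, so any accuracy difference between them reflects the classifier rather than the features, which is what a baseline comparison is meant to isolate.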
110 stars. No commits in the last 6 months.
Use this if you need to categorize large collections of Chinese news articles by content; the project was built on a corpus of 1 million documents.
Not ideal if your text data is in a language other than Chinese or if you need to classify documents into custom categories not related to general news topics.
Stars: 110
Forks: 21
Language: Python
License: —
Category: —
Last pushed: Dec 24, 2018
Commits (30d): 0
Higher-rated alternatives
hankcs/text-classification-svm
The missing SVM-based text classification module implementing HanLP's interface
derhuerst/nbayes
A Naive Bayes classifier written in JavaScript.
ningchaoar/UnsupervisedTextClassification
Keyword-based unsupervised text classification; Implementation for paper "Text Classification by Bootstrapping with Keywords, EM...
samitha9125/SinhalaTextClassification
Sinhala Text Classification based on n-grams
fullstackyang/article-classifier
A WeChat Official Account article classifier implemented with Naive Bayes