qyfang/TextClassification

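Text classification of Sina News articles implemented with scikit-learn. The dataset contains 1,000,000 documents across 10 categories, split 1:1 between the training and test sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.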
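The setup described above (scikit-learn, an SVM compared against a Bayes baseline, and a 1:1 train/test split) might look roughly like the following. This is a minimal sketch, not the repository's actual code: the toy corpus, the two category names, the TF-IDF features, the whitespace pre-segmentation, and the choice of `MultinomialNB` and `LinearSVC` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the Sina News data.
# Assumption: documents are already word-segmented and joined with spaces.
docs = [
    "球队 赢得 比赛 冠军", "球员 进球 比赛 胜利",
    "教练 球队 训练 联赛", "比赛 球迷 进球 冠军",
    "股市 上涨 投资 基金", "银行 利率 投资 股票",
    "基金 股市 下跌 市场", "市场 银行 股票 利率",
]
labels = ["sports"] * 4 + ["finance"] * 4

# 1:1 split between training and test sets, stratified by category.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.5, stratify=labels, random_state=0
)


def build(classifier):
    """TF-IDF features over whitespace-separated tokens, then a classifier."""
    return make_pipeline(TfidfVectorizer(token_pattern=r"(?u)\S+"), classifier)


# Bayes is the baseline; the SVM is the model compared against it.
models = {
    "bayes_baseline": build(MultinomialNB()),
    "svm": build(LinearSVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out half
    print(name, scores[name])
```

On real data the segmentation step would need a Chinese tokenizer in front of the vectorizer, and the accuracy printed for this toy corpus says nothing about results on the full dataset.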

Score: 36 / 100 (Emerging)

This project helps anyone who needs to automatically sort large volumes of Chinese news articles into predefined categories. You provide raw Chinese news text, and it classifies each article into one of 10 categories, like 'sports' or 'finance'. This is ideal for content managers, data analysts, or researchers dealing with news aggregation and topic organization.

110 stars. No commits in the last 6 months.

Use this if you need to efficiently categorize millions of Chinese news articles based on their content.

Not ideal if your text data is in a language other than Chinese or if you need to classify documents into custom categories not related to general news topics.

news-classification content-categorization chinese-text-analysis media-monitoring information-organization
No license, stale for 6 months, no package, no dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 19 / 25


Stars: 110
Forks: 21
Language: Python
License: none
Last pushed: Dec 24, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/qyfang/TextClassification"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.