qyfang/TextClassification

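Text classification of Sina news articles, implemented with scikit-learn. The dataset contains 1 million documents across 10 categories, split 1:1 between training and test sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.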
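The setup described above (a 1:1 train/test split, a Bayes baseline, and an SVM) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the tiny corpus and its labels are made up, and character n-gram TF-IDF features are an assumption chosen so the example runs without a Chinese word segmenter. The project may well use different features and preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus standing in for the news dataset (hypothetical data).
texts = [
    "球队在决赛中赢得冠军", "球员打进关键进球",
    "比赛在主场进行", "教练宣布球队首发阵容",
    "股市今日大幅上涨", "银行宣布下调利率",
    "公司发布季度财报", "投资者关注股市和利率",
]
labels = ["sports"] * 4 + ["finance"] * 4

# 1:1 train/test split, stratified so both halves contain every class.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)


def build(classifier):
    # Assumption: character 1-2 grams replace word segmentation for this sketch.
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    return make_pipeline(vectorizer, classifier)


# Bayes is the baseline; the SVM is the model compared against it.
models = {"bayes_baseline": build(MultinomialNB()), "svm": build(LinearSVC())}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out half
    print(name, scores[name])
```

With only eight toy sentences the accuracies are not meaningful; the point is the shape of the comparison, where both classifiers share the same features and the same held-out half.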


Emerging

This project helps anyone who needs to automatically sort large volumes of Chinese news articles into predefined categories. You provide raw Chinese news text, and it classifies each article into one of 10 categories, like 'sports' or 'finance'. This is ideal for content managers, data analysts, or researchers dealing with news aggregation and topic organization.

110 stars. No commits in the last 6 months.

Use this if you need to efficiently categorize millions of Chinese news articles based on their content.

Not ideal if your text data is in a language other than Chinese or if you need to classify documents into custom categories not related to general news topics.

news-classification content-categorization chinese-text-analysis media-monitoring information-organization

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 19 / 25


Stars: 110

Forks

Language: Python

License: none

Higher-rated alternatives

hankcs/text-classification-svm

The missing SVM-based text classification module implementing HanLP's interface

derhuerst/nbayes

A Naive Bayes classifier written in JavaScript.

ningchaoar/UnsupervisedTextClassification

Keyword-based unsupervised text classification; implementation for the paper "Text Classification by Bootstrapping with Keywords, EM...

samitha9125/SinhalaTextClassification

Sinhala Text Classification based on n-grams

fullstackyang/article-classifier

A classifier for WeChat official account articles, based on Naive Bayes
