yuzhimanhua/HIMECat

Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)

29 / 100
Experimental

This tool helps researchers, content managers, and product managers automatically categorize documents into a multi-level category hierarchy, even when only a few labeled examples are available for each category. It takes documents consisting of text plus associated metadata (such as authors, tags, or product IDs) and outputs a hierarchical category assignment for each document. It suits anyone who needs to impose a structured classification on a large collection of text.
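To make the input and output concrete, here is a minimal Python sketch of one document record (text plus metadata) and of a hierarchical assignment expressed as a root-to-leaf category path. The `Document` class, the `category_path` helper, and the small taxonomy are hypothetical illustrations, not HIMECat's actual data format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record: raw text plus metadata such as authors or tags."""
    text: str
    metadata: dict[str, list[str]] = field(default_factory=dict)


def category_path(leaf: str, parent: dict[str, str]) -> list[str]:
    """Walk from a category up to the root and return the root-to-leaf path.

    `parent` maps each non-root category to its parent; root categories
    have no entry, so the walk stops when it reaches one.
    """
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


# Hypothetical two-level taxonomy (child -> parent), for illustration only.
parent = {"nlp": "cs", "databases": "cs", "genetics": "biology"}

doc = Document(
    text="Weakly supervised hierarchical text classification ...",
    metadata={"authors": ["a1", "a2"], "tags": ["classification"]},
)

if __name__ == "__main__":
    print(category_path("nlp", parent))  # ['cs', 'nlp']
```

Representing an assignment as a full path makes the hierarchy explicit: labeling a document with a fine-grained category also places it under every ancestor of that category.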

No commits in the last 6 months.

Use this if you need to organize a large collection of documents into a hierarchical category system but have limited manually labeled data to train a classifier.

Not ideal if your documents lack any structured metadata or if you only need a flat (non-hierarchical) categorization.

document-classification content-management taxonomy-creation information-architecture academic-research-organization
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 5 / 25
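The page does not spell out the scoring formula, but the figures shown are consistent with the overall score being the sum of the four components (0 + 8 + 16 + 5 = 29, out of 4 × 25 = 100). A minimal Python sketch of that assumed aggregation:

```python
# Sub-scores as listed on this page, each out of 25.
# Assumption: the overall score is their plain sum; this matches the
# numbers shown but is not documented here.
components = {"Maintenance": 0, "Adoption": 8, "Maturity": 16, "Community": 5}

total = sum(components.values())
max_total = 25 * len(components)

if __name__ == "__main__":
    print(f"{total} / {max_total}")  # 29 / 100
```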


Stars: 45
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Apr 02, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/yuzhimanhua/HIMECat"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
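The same request can be made from Python with only the standard library. This is a minimal sketch: the URL pattern (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, the JSON response format is an assumption, and the `quality_url` and `fetch_quality` helpers are hypothetical. How an API key is passed is not described on this page, so keyed access is not shown.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed URL pattern, inferred from the curl example (category "nlp").
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the quality record and decode it, assuming the endpoint returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(quality_url("nlp", "yuzhimanhua", "HIMECat"))
    # data = fetch_quality("nlp", "yuzhimanhua", "HIMECat")  # requires network access
```

Keeping URL construction separate from the network call makes the request easy to inspect or reuse before spending any of the daily request quota.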