yzhan238/TELEClass
Source code for the paper "TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision", published at WWW 2025.
This project helps researchers and data scientists classify large collections of text documents into complex, multi-level categories with minimal human-labeled examples. It takes an existing category structure and a body of unlabeled text, and assigns each document to its appropriate categories in the hierarchy. It is designed for anyone who needs to organize a lot of text without spending excessive time labeling data by hand.
No commits in the last 6 months.
Use this if you need to automatically sort a large volume of text into a predefined, hierarchical category system with very little initial manual labeling.
Not ideal if you have a flat, simple categorization task or if you have a large, high-quality dataset of already-labeled documents.
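To make "hierarchical categories" concrete, here is a toy sketch of assigning a document to a path in a category tree. It is not the TELEClass method: the taxonomy, the keyword sets, and the `classify` function are all hypothetical, and plain keyword overlap stands in for whatever scoring a real system would use.

```python
from typing import Dict, List, Set

# Hypothetical two-level taxonomy: each parent maps to its child categories.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["sports", "science"],
    "sports": ["soccer", "tennis"],
    "science": ["physics", "biology"],
}

# Hypothetical keyword sets per category (illustration only).
KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "player", "team", "score"},
    "science": {"experiment", "theory", "research"},
    "soccer": {"goal", "striker", "penalty"},
    "tennis": {"serve", "racket", "set"},
    "physics": {"quantum", "particle", "energy"},
    "biology": {"cell", "gene", "species"},
}


def classify(text: str) -> List[str]:
    """Walk the taxonomy top-down, picking the best-matching child at each level.

    Returns the list of categories from the top level down to the deepest
    level with at least one keyword match (empty if nothing matches).
    """
    tokens = set(text.lower().split())
    path: List[str] = []
    node = "root"
    while node in TAXONOMY:
        children = TAXONOMY[node]
        # Score each child by how many of its keywords appear in the text.
        scores = {child: len(tokens & KEYWORDS[child]) for child in children}
        best = max(children, key=lambda child: scores[child])
        if scores[best] == 0:
            break  # no evidence for any child; stop at the current level
        path.append(best)
        node = best
    return path
```

For instance, `classify("the striker scored a goal for the team in the match")` walks from `sports` down to `soccer`, while a text with no matching keywords gets an empty path.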
Stars: 25
Forks: 2
Language: Python
License: —
Category: (none listed)
Last pushed: Apr 06, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yzhan238/TELEClass"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
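The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint path is taken from the curl command above, while the helper names `build_url` and `fetch_quality` are hypothetical. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text.

    The body is left unparsed because its format is an unknown here.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("yzhan238", "TELEClass")` issues the same GET request as the curl command. Keep the 100 requests/day keyless limit in mind when calling it in a loop.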
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...