dcarpintero/taxonomy-completion
Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline: A Case Study in Computational Linguistics
This project helps researchers and librarians organize academic literature. It takes a collection of research papers, such as arXiv publications with titles and abstracts, and automatically groups them into related topics. The output is a hierarchical classification scheme, or taxonomy, of these papers, which makes large volumes of scientific literature easier to navigate.
No commits in the last 6 months.
Use this if you need to automatically structure a large, unorganized collection of academic papers into a hierarchical topic map.
Not ideal if you already have a well-defined taxonomy and only need to classify new papers into existing categories, rather than discovering new ones.
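As a rough illustration of the grouping idea described above, the sketch below applies binary quantization (one common form of embedding quantization) to a few made-up vectors and groups papers whose bit codes are close in Hamming distance. The vectors, paper names, threshold, and helper functions are all hypothetical, and the repository's actual pipeline may work differently.

```python
from itertools import combinations

def binary_quantize(vec):
    """Map each float dimension to one bit: 1 if positive, else 0."""
    return tuple(1 if x > 0 else 0 for x in vec)

def hamming(a, b):
    """Count the positions at which two bit codes differ."""
    return sum(x != y for x, y in zip(a, b))

def group_by_hamming(codes, max_dist):
    """Single-link grouping: merge items whose codes differ in <= max_dist bits."""
    parent = list(range(len(codes)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(codes)), 2):
        if hamming(codes[i], codes[j]) <= max_dist:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(codes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy 4-dimensional "embeddings", invented for illustration only.
embeddings = {
    "paper_a": [0.9, -0.2, 0.4, -0.7],
    "paper_b": [0.8, -0.1, 0.5, -0.6],
    "paper_c": [-0.5, 0.7, -0.3, 0.2],
    "paper_d": [-0.4, 0.9, -0.1, 0.3],
}

titles = list(embeddings)
codes = [binary_quantize(embeddings[t]) for t in titles]
clusters = [[titles[i] for i in g] for g in group_by_hamming(codes, max_dist=1)]
print(clusters)
```

Real embeddings have hundreds of dimensions, and a full taxonomy would also need the clusters arranged into levels and labeled; this sketch covers only the quantize-and-group step.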
Stars: 8
Forks: 2
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jul 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/dcarpintero/taxonomy-completion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
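The same request can be made from Python using only the standard library. This is a minimal sketch: the base URL comes from the curl command above, but the helper names are hypothetical, and both the JSON response format and the idea that other owner/repo pairs follow the same path pattern are assumptions.

```python
import json
import urllib.request

# Base path taken from the curl example; treating the owner/repo suffix
# as a general pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def quality_url(owner, repo):
    """Build the endpoint URL for a given repository."""
    return f"{BASE_URL}/{owner}/{repo}"

def fetch_quality(owner, repo, timeout=10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("dcarpintero", "taxonomy-completion")
# print(data)
```

With the keyless limit of 100 requests/day, it is worth caching responses locally rather than re-fetching the same repository.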
Higher-rated alternatives
TorchDR/TorchDR
TorchDR - PyTorch Dimensionality Reduction
derrickburns/generalized-kmeans-clustering
Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL,...
abhilash1910/ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity...
md-experiments/picture_text
Interactive tree-maps with SBERT & Hierarchical Clustering (HAC)
mainlp/semantic_components
Finding semantic components in your neural representations.