vgupta123/P-SIF
Source code for our AAAI 2020 paper P-SIF: Document Embeddings using Partition Averaging
This project helps researchers and data scientists represent text documents as numerical vectors for machine learning tasks. It takes raw text documents (such as news articles or tweets) and converts each one into a fixed-dimension vector. It does this by averaging word vectors within topic-based partitions and concatenating the per-partition averages. The resulting vectors can serve as input to classification, information retrieval, or semantic similarity models, so the project suits anyone who needs to prepare large text datasets for computational analysis.
No commits in the last 6 months.
Use this if you are a researcher or data scientist who needs to turn diverse text documents into numerical representations for tasks such as categorizing content or finding semantically similar texts.
Not ideal if you want a plug-and-play solution that requires no programming or deep learning expertise.
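The partition-averaging idea in the description can be illustrated with a minimal NumPy sketch. This is not the repository's code. It assumes that P-SIF reuses SIF's frequency weighting a / (a + p(w)) and SIF's removal of the first principal component. The names `psif_document_vector`, `sif_weight`, `remove_common_component`, the per-word partition memberships in `topic_weights`, and the toy data are all hypothetical. The paper derives memberships from the word vectors themselves, whereas here they are supplied by hand.

```python
import numpy as np

def sif_weight(word, word_prob, a=1e-3):
    # SIF-style down-weighting of frequent words: a / (a + p(w))
    return a / (a + word_prob.get(word, 0.0))

def psif_document_vector(tokens, word_vecs, topic_weights, word_prob, a=1e-3):
    """Partition-averaged document vector (illustrative sketch).

    word_vecs:     dict word -> (d,) array
    topic_weights: dict word -> (K,) array of partition memberships
                   (hypothetical input, hand-written here)
    Returns a (K*d,) vector: one SIF-weighted average per partition, concatenated.
    """
    known = [w for w in tokens if w in word_vecs and w in topic_weights]
    d = next(iter(word_vecs.values())).shape[0]
    K = next(iter(topic_weights.values())).shape[0]
    doc = np.zeros((K, d))
    if not known:
        return doc.ravel()  # empty or out-of-vocabulary document
    for w in known:
        # Outer product places the word vector into every partition,
        # scaled by its membership in that partition and by its SIF weight
        doc += sif_weight(w, word_prob, a) * np.outer(topic_weights[w], word_vecs[w])
    # Average over known words, then flatten partition by partition
    return (doc / len(known)).ravel()

def remove_common_component(doc_matrix):
    # Project every document vector off the first right singular vector
    u = np.linalg.svd(doc_matrix, full_matrices=False)[2][0]
    return doc_matrix - np.outer(doc_matrix @ u, u)

# Toy usage with random 4-d "word vectors" and two hand-made partitions
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "stock", "market", "the"]
word_vecs = {w: rng.normal(size=4) for w in vocab}
topic_weights = {
    "cat": np.array([1.0, 0.0]), "dog": np.array([1.0, 0.0]),
    "stock": np.array([0.0, 1.0]), "market": np.array([0.0, 1.0]),
    "the": np.array([0.5, 0.5]),  # function word split across partitions
}
word_prob = {"cat": 0.01, "dog": 0.01, "stock": 0.01, "market": 0.01, "the": 0.2}
docs = [["the", "cat", "and", "dog"], ["the", "stock", "market"]]
X = np.vstack([psif_document_vector(t, word_vecs, topic_weights, word_prob) for t in docs])
X = remove_common_component(X)
print(X.shape)  # (2, 8): 2 documents, K=2 partitions x d=4 dimensions
```

Concatenating rather than summing the partition averages keeps words from different topics in separate slots of the vector. The cost is that the document dimension grows to K times the word-vector dimension.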
Stars: 35
Forks: 10
Language: Python
License: not specified
Category: not specified
Last pushed: May 02, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/vgupta123/P-SIF"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
MilaNLProc/contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings...
vinid/cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora
spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics
criteo-research/CausE
Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.
vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support...