thunlp/paragraph2vec
Paragraph Vector Implementation
This tool helps researchers analyze and compare documents by converting them into numerical representations called paragraph vectors. You provide collections of training and testing texts, and it outputs the corresponding vectors as files. Researchers and anyone else working with large text corpora can use these vectors to study the semantic content of their documents.
No commits in the last 6 months.
Use this if you need to transform whole paragraphs or documents into numerical data for tasks like similarity comparisons or clustering.
Not ideal if you're looking for an off-the-shelf solution for sentiment analysis or named entity recognition, as this focuses solely on generating document embeddings.
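As a rough illustration of the similarity comparisons mentioned above, the sketch below ranks documents by the cosine similarity of their vectors. It does not use this tool's code or read its output files: the vectors are made-up 3-dimensional values, and the names `cosine_similarity`, `most_similar`, and `doc_vectors` are hypothetical helpers introduced only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors standing in for real paragraph vectors (assumption:
# real vectors would be loaded from the tool's output files instead).
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

ranking = most_similar(doc_vectors["doc_a"], doc_vectors)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

Cosine similarity is a common choice for embeddings because it ignores vector length and compares direction only; other distance measures may suit clustering better, depending on the data.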
Stars: 56
Forks: 24
Language: Python
License: MIT
Category:
Last pushed: May 27, 2017
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/thunlp/paragraph2vec"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
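For scripted use, the same request could be made from Python with the standard library. This is a minimal sketch under stated assumptions: the URL is copied verbatim from the curl command, the helper names `build_request` and `fetch_body` are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request

# URL copied verbatim from the curl example on this page.
QUALITY_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/thunlp/paragraph2vec"


def build_request(url=QUALITY_URL):
    """Build a plain GET request; no API key is attached."""
    return urllib.request.Request(url, method="GET")


def fetch_body(url=QUALITY_URL, timeout=10.0):
    """Send the request and return the response body as text.

    Assumption: the response schema is unknown, so nothing is parsed here.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_body()` performs the network request and counts against the daily limit, so caching the result locally is a sensible precaution when scripting repeated lookups.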
Higher-rated alternatives
Planeshifter/node-word2vec
Node.js interface to the Google word2vec tool.
nathanrooy/word2vec-from-scratch-with-python
A very simple, bare-bones, inefficient implementation of skip-gram word2vec from scratch with Python
akoksal/Turkish-Word2Vec
Pre-trained Word2Vec Model for Turkish
RichDavis1/PHPW2V
A PHP implementation of Word2Vec, a popular word embedding algorithm created by Tomas Mikolov...
YuyuZha0/word2vec
A word2vec implementation for the Chinese language, based on deeplearning4j and ansj