BUTSpeechFIT/BaySMM
A Bayesian Multilingual Document Model
This project helps researchers and data analysts working with large volumes of text in multiple languages to automatically identify and discover common topics across those documents. You input a collection of text documents, potentially in different languages, and it outputs an understanding of the overarching themes present, without needing pre-labeled examples for every language. It's ideal for anyone analyzing multilingual content for shared insights.
No commits in the last 6 months.
Use this if you need to find common themes and topics in a collection of documents written in several different languages without having to manually label each document for every language.
Not ideal if you only work with single-language documents or if your primary goal is sentiment analysis rather than topic identification.
Stars
9
Forks
4
Language
Python
License
—
Category
Last pushed
Mar 23, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/BUTSpeechFIT/BaySMM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
MilaNLProc/contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings...
vinid/cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora
spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics
criteo-research/CausE
Code for the Recsys 2018 paper entitled Causal Embeddings for Recommandation.
vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support...