BUTSpeechFIT/BaySMM

A Bayesian Multilingual Document Model

Experimental

This project helps researchers and data analysts who work with large volumes of multilingual text to automatically discover topics shared across documents. You provide a collection of text documents, potentially in different languages, and it identifies the overarching themes they have in common, without requiring labeled examples for every language. It suits anyone analyzing multilingual content for shared insights.

No commits in the last 6 months.

Use this if you need to find common themes and topics in a collection of documents written in several different languages without having to manually label each document for every language.

Not ideal if you only work with single-language documents or if your primary goal is sentiment analysis rather than topic identification.

multilingual-content-analysis topic-discovery cross-lingual-research document-classification natural-language-processing

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 15 / 25

Language

Python

License

—

Higher-rated alternatives

MilaNLProc/contextualized-topic-models

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings...

vinid/cade

Compass-aligned Distributional Embeddings. Align embeddings from different corpora

spcl/ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics

criteo-research/CausE

Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support...
