ClimSocAna/tecb-de

German Text Embedding Clustering Benchmark

Experimental

This project helps researchers and data scientists evaluate how well different language models group German texts by meaning or topic. It takes German text datasets (such as book titles, news articles, and Reddit posts) and measures how accurately a given model clusters them into their predefined categories. It is designed for anyone working with German language data who needs to compare the performance of text embedding models on clustering tasks.
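As a rough illustration of what "clustering into predefined categories" can mean in practice, the sketch below scores a set of cluster assignments against gold category labels using V-measure, a common metric for this kind of evaluation. Whether this project uses exactly this metric is an assumption; the function names and the toy labels are hypothetical, and the embedding and clustering steps (for example k-means over model embeddings) are left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(gold, predicted):
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    h_gold = entropy(gold)
    h_pred = entropy(predicted)
    # Homogeneity: each cluster contains only one gold category.
    homogeneity = (
        1.0 if h_gold == 0 else 1.0 - conditional_entropy(gold, predicted) / h_gold
    )
    # Completeness: each gold category lands in a single cluster.
    completeness = (
        1.0 if h_pred == 0 else 1.0 - conditional_entropy(predicted, gold) / h_pred
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy data: gold categories and cluster ids from some model.
gold = ["sport", "sport", "politik", "politik"]
clusters = [1, 1, 0, 0]  # cluster ids are arbitrary; only the grouping matters
score = v_measure(gold, clusters)
```

Because the score depends only on how texts are grouped, not on the cluster ids themselves, it can be used to compare different embedding models on the same labelled dataset.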

No commits in the last 6 months.

Use this if you are a researcher or data scientist evaluating or developing natural language processing models for German text clustering and need benchmark datasets and results.

Not ideal if you are looking for a ready-to-use tool to cluster your own German texts without evaluating different underlying models.

Tags: German NLP, text clustering, language model evaluation, topic modeling, computational linguistics

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 5 / 25

Language: Python

License: MIT

Featured in

Embeddings Are Easier Than Whatever You're Doing Instead

You're Shipping AI You Can't Measure

Higher-rated alternatives

embeddings-benchmark/mteb

MTEB: Massive Text Embedding Benchmark

harmonydata/harmony

The Harmony Python library: a research tool for psychologists to harmonise data and...

yannvgn/laserembeddings

LASER multilingual sentence embeddings as a pip package

embeddings-benchmark/results

Data for the MTEB leaderboard

Hironsan/awesome-embedding-models

A curated list of awesome embedding models tutorials, projects and communities.
