AstraBert/SenTrEv

Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs


Emerging

This tool helps data scientists and AI/ML engineers building Retrieval Augmented Generation (RAG) applications compare text embedding models. It takes your PDF, DOCX, PPTX, HTML, CSV, or XML documents and a selection of embedding models, then reports performance statistics such as retrieval accuracy, retrieval time, and carbon emissions. The output helps you confidently choose the model that retrieves relevant information from your documents most effectively.
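The kind of comparison described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration of one common way to score retrieval, not SenTrEv's API: the function names (`chunk_text`, `evaluate_embedder`, `make_hashing_embedder`), the fixed-size word chunking, the "query with a random slice of each chunk" protocol, and the success-rate and MRR metrics are all assumptions made for the example. The toy hashing embedders stand in for real Sentence Transformers models so the sketch runs without downloads.

```python
import math
import random
import statistics
import time
import zlib
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# An embedder maps a batch of texts to one vector per text. With Sentence
# Transformers you might pass something like
#   lambda texts: SentenceTransformer(name).encode(list(texts)).tolist()
Embedder = Callable[[Sequence[str]], List[Vector]]


def chunk_text(text: str, words_per_chunk: int = 40) -> List[str]:
    """Split a document into fixed-size word chunks (hypothetical chunking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def evaluate_embedder(embed: Embedder, chunks: List[str],
                      query_frac: float = 0.5, seed: int = 0) -> Dict[str, float]:
    """Query with a random contiguous slice of each chunk, then record whether the
    source chunk ranks first (success rate), its reciprocal rank (MRR), and latency."""
    rng = random.Random(seed)
    corpus = embed(chunks)
    hits, reciprocal_ranks, latencies = 0, [], []
    for idx, chunk in enumerate(chunks):
        words = chunk.split()
        n = max(1, int(len(words) * query_frac))
        start = rng.randint(0, len(words) - n)
        query = " ".join(words[start:start + n])
        t0 = time.perf_counter()
        q = embed([query])[0]
        scores = [cosine(q, d) for d in corpus]
        # sorted() is stable, so tied scores keep their original chunk order.
        ranking = sorted(range(len(chunks)), key=lambda j: scores[j], reverse=True)
        latencies.append(time.perf_counter() - t0)
        rank = ranking.index(idx) + 1
        if rank == 1:
            hits += 1
        reciprocal_ranks.append(1.0 / rank)
    return {
        "success_rate": hits / len(chunks),
        "mrr": statistics.mean(reciprocal_ranks),
        "mean_retrieval_time_s": statistics.mean(latencies),
    }


def make_hashing_embedder(dim: int) -> Embedder:
    """Toy bag-of-words embedder: word counts hashed into `dim` buckets (deterministic CRC32)."""
    def embed(texts: Sequence[str]) -> List[Vector]:
        vectors = []
        for text in texts:
            vec = [0.0] * dim
            for word, count in Counter(text.lower().split()).items():
                vec[zlib.crc32(word.encode("utf-8")) % dim] += count
            vectors.append(vec)
        return vectors
    return embed


doc = " ".join(f"token{i}" for i in range(400))
chunks = chunk_text(doc)                       # 10 chunks of 40 words
weak = evaluate_embedder(make_hashing_embedder(1), chunks)      # cannot separate chunks
strong = evaluate_embedder(make_hashing_embedder(4096), chunks)
print("weak:", weak)
print("strong:", strong)
```

The 1-dimensional embedder gives every chunk the same cosine score, so its ranking falls back to document order and it only "succeeds" on the first chunk; the 4096-dimensional one keeps words apart and should rank the source chunk first almost every time. Comparing such per-model dictionaries side by side is the kind of decision the benchmark is meant to support.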

No commits in the last 6 months. Available on PyPI.

Use this if you need to objectively compare text embedding models and select the most effective one for your RAG system, based on each model's retrieval performance, speed, and environmental impact on your own document types.

Not ideal if you are looking for a simple, off-the-shelf RAG solution or if you don't need to benchmark multiple embedding models and fine-tune retrieval performance.

Retrieval Augmented Generation, document intelligence, information retrieval, natural language processing, AI model evaluation

Stale 6m

Maintenance 0 / 25

Adoption 7 / 25

Maturity 25 / 25

Community 4 / 25



Language: Python

License: MIT

Higher-rated alternatives

Michael-JB/bm25

A BM25 embedder, scorer, and search engine, written in Rust.

jeanCarloMachado/PythonSearch

A minimalistic search engine for productivity that stores documents as code

neuml/codequestion

🔎 Semantic search for developers

chnsh/deep-semantic-code-search

Deep Semantic Code Search aims to explore a joint embedding space for code and description...

aws-samples/tabular-column-semantic-search

Code accompanying AWS blog post "Build a Semantic Search Engine for Tabular Columns with...
