AstraBert/SenTrEv
Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs
This tool helps data scientists and AI/ML engineers working on Retrieval Augmented Generation (RAG) applications compare text embedding models. It takes your PDF, DOCX, PPTX, HTML, CSV, or XML documents and a selection of embedding models, then reports performance statistics such as accuracy, retrieval time, and carbon emissions. The output helps you choose the model that retrieves relevant information from your documents most efficiently.
No commits in the last 6 months. Available on PyPI.
Use this if you need to objectively compare text embedding models for your RAG system and select the most effective one by evaluating retrieval performance, speed, and environmental impact on your own document types.
Not ideal if you are looking for a simple, off-the-shelf RAG solution or if you don't need to benchmark multiple embedding models and fine-tune retrieval performance.
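To make the kind of comparison described above concrete, here is a minimal sketch of a retrieval evaluation loop. It does not use SenTrEv's API: `toy_embed`, `cosine`, and `evaluate` are hypothetical names, the embedder is a bag-of-words stand-in rather than a Sentence Transformers model, and carbon-emission tracking is left out. It only illustrates how top-1 accuracy and mean retrieval time could be measured for one embedder.

```python
import math
import time
from collections import Counter


def toy_embed(text):
    """Stand-in embedder (assumption): bag-of-words counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate(embed, chunks, queries):
    """Return top-1 accuracy and mean per-query retrieval time.

    queries is a list of (query_text, index_of_expected_chunk) pairs.
    """
    chunk_vecs = [embed(c) for c in chunks]
    hits = 0
    elapsed = 0.0
    for query, expected in queries:
        start = time.perf_counter()
        query_vec = embed(query)
        # Pick the chunk whose vector is most similar to the query vector.
        best = max(range(len(chunks)), key=lambda i: cosine(query_vec, chunk_vecs[i]))
        elapsed += time.perf_counter() - start
        hits += int(best == expected)
    return {"accuracy": hits / len(queries), "mean_time_s": elapsed / len(queries)}


chunks = [
    "BM25 ranks documents by term frequency and document length",
    "Sentence Transformers map sentences to dense vectors",
    "Carbon emissions depend on the energy used during computation",
]
queries = [
    ("how are sentences mapped to dense vectors", 1),
    ("what do carbon emissions depend on", 2),
]
result = evaluate(toy_embed, chunks, queries)
print(result)
```

Running `evaluate` once per candidate embedder on the same chunks and queries gives numbers that can be compared side by side.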
Stars: 30
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Jan 20, 2025
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/AstraBert/SenTrEv"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
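The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the path layout `quality/<category>/<owner>/<repo>` is inferred from the single URL shown above, the response is assumed to be JSON, and `quality_url` and `fetch_quality` are hypothetical helper names, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred from the curl example)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the endpoint returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("embeddings", "AstraBert", "SenTrEv"))
```

The fields of the returned record are not documented here, so the example prints the parsed response instead of reading specific keys.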
Higher-rated alternatives
Michael-JB/bm25
A BM25 embedder, scorer, and search engine, written in Rust.
jeanCarloMachado/PythonSearch
A minimalistic search engine for productivity that stores documents as code
neuml/codequestion
🔎 Semantic search for developers
chnsh/deep-semantic-code-search
Deep Semantic Code Search aims to explore a joint embedding space for code and description...
aws-samples/tabular-column-semantic-search
Code accompanying AWS blog post "Build a Semantic Search Engine for Tabular Columns with...