jacobmarks/semantic-document-search-plugin
Semantically search through OCR text blocks with Qdrant, Sentence Transformers, and FiftyOne!
This tool helps you quickly find specific information within large collections of scanned documents, like research papers or historical archives. It takes your scanned documents (with text extracted by OCR) and a natural language query, then identifies and shows you the most relevant text blocks. This is ideal for researchers, librarians, or anyone who needs to pinpoint exact content across many documents without relying on exact keyword matches.
No commits in the last 6 months.
Use this if you need to intelligently search through digitized documents, finding relevant passages even when your search terms aren't exact matches to the text.
Not ideal if you are working with born-digital text documents where keyword search is sufficient, or if your documents are not yet processed with OCR.
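The search flow described above (embed each OCR text block, embed the query, return the closest blocks) can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the plugin's actual code: `toy_embed` is a hypothetical bag-of-words stand-in for a Sentence Transformers model, and the in-memory scan in `search_blocks` stands in for a Qdrant vector index. Because the stand-in only matches shared words, it does not show the inexact-match behavior; that would come from a real embedding model plugged in through the `embed` parameter.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedding: lowercase bag-of-words counts.

    A real setup would call a sentence-embedding model here instead.
    """
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_blocks(
    query: str,
    blocks: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank text blocks by similarity to the query and keep the best top_k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(block)), block) for block in blocks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up OCR text blocks, for illustration only.
blocks = [
    "Annual rainfall totals for the coastal region",
    "Minutes of the library board meeting",
    "Rainfall and temperature records, 1890 to 1910",
]
results = search_blocks("rainfall records", blocks, top_k=2)
for score, block in results:
    print(f"{score:.3f}  {block}")
```

A vector database earns its place once the collection is too large for a linear scan like this one; the ranking idea stays the same.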
Stars: 9
Forks: —
Language: Python
License: —
Category:
Last pushed: Apr 05, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/jacobmarks/semantic-document-search-plugin"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
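The same request can be made from Python. The sketch below uses only the standard library and rests on assumptions: that the path after `/quality/` is a category segment followed by `owner/name` (inferred from the single URL shown above), and that the response body is UTF-8 text. The function names `build_quality_url` and `fetch_quality` are hypothetical, and how an API key would be supplied is not stated here, so the sketch covers only the keyless request.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL from a category and an 'owner/name' repo string."""
    owner, name = repo.split("/", 1)
    parts = [quote(part, safe="") for part in (category, owner, name)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (assumed UTF-8; format not assumed)."""
    with urlopen(build_quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = build_quality_url("vector-db", "jacobmarks/semantic-document-search-plugin")
print(url)
```

Calling `fetch_quality("vector-db", "jacobmarks/semantic-document-search-plugin")` would perform the network request; it is left out of the top-level script so the example runs offline.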
Higher-rated alternatives
meilisearch/meilisearch
A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.
nuclia/nucliadb
NucliaDB, The AI Search database for RAG
vespa-engine/vespa
AI + Data, online. https://vespa.ai
ICIJ/datashare
A self‑hosted search engine for documents
PrithivirajDamodaran/FlashRank
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and...