ofirbsh/ai_embedder_engine
AI Embedder Engine: An open-source Python engine for generating embeddings from PDFs, storing them in Parquet, and indexing with FAISS for semantic search.
This project turns large PDF documents into organized, searchable data. You provide the PDFs, and it writes structured data files containing embeddings (numerical representations of the document content) that are ready for semantic search. It suits researchers, legal professionals, technical writers, and anyone else who needs to find specific information quickly in a large collection of domain-specific documents.
No commits in the last 6 months.
Use this if you need to transform many PDF documents into a format that enables powerful semantic search or integration with AI systems like RAG (Retrieval Augmented Generation).
Not ideal if you only need to perform simple keyword searches or if your documents are not primarily text-based PDFs.
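The chunk, embed, store, and query flow described above can be sketched in a few lines. This is a minimal illustration only, not the project's code or API: it assumes the PDF text has already been extracted, it swaps in a toy hashing embedder (which captures word overlap, not meaning) for a real embedding model, and it uses a brute-force cosine scan over an in-memory list in place of Parquet storage and a FAISS index. All function names here are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # toy vector size, chosen arbitrarily for this sketch


def chunk_text(text, max_words=40):
    """Split extracted text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dim=DIM):
    """Stand-in for a real embedding model: hash tokens into buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, chunks, vectors, top_k=2):
    """Brute-force cosine search; a vector index would replace this scan at scale."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), chunk) for v, chunk in zip(vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample text standing in for content extracted from a PDF.
document = (
    "The lease term begins on the first day of January. "
    "Either party may terminate the agreement with ninety days notice. "
    "Payment is due monthly and late fees apply after ten days."
)
chunks = chunk_text(document, max_words=10)
vectors = [toy_embed(c) for c in chunks]  # the rows a Parquet file would hold
results = search("terminate the agreement", chunks, vectors)
```

Each entry in `results` is a `(score, chunk)` pair, highest score first. In a real pipeline the vectors would come from a trained model, so that queries match on meaning rather than on shared words.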
Stars: 8
Forks: 1
Language: Python
License: not listed
Category:
Last pushed: Aug 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ofirbsh/ai_embedder_engine"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
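The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the path layout (category, then owner, then repository) is inferred from the single curl example above, the function names are hypothetical, and the response body is returned as undecoded-structure text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="embeddings"):
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one curl example on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(owner, repo, category="embeddings", timeout=10):
    """Fetch the raw response body as text (UTF-8 is assumed, not confirmed)."""
    url = build_quality_url(owner, repo, category)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("ofirbsh", "ai_embedder_engine")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit.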
Higher-rated alternatives
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.