ofirbsh/ai_embedder_engine

AI Embedder Engine: An open-source Python engine for generating embeddings from PDFs, storing them in Parquet, and indexing with FAISS for semantic search.

/ 100

Experimental

This project turns large PDF documents into organized, searchable data. You provide your PDFs, and it outputs structured data files containing numerical representations (embeddings) of the document content, ready for semantic search. It suits researchers, legal professionals, technical writers, or anyone who needs to find specific information quickly in a large collection of domain-specific documents.

No commits in the last 6 months.

Use this if you need to transform many PDF documents into a format that enables powerful semantic search or integration with AI systems like RAG (Retrieval Augmented Generation).

Not ideal if you only need to perform simple keyword searches or if your documents are not primarily text-based PDFs.
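
To make the idea of searching by embeddings concrete, here is a minimal sketch of the retrieval step. It is not the project's code: the description above mentions a learned embedding model, Parquet storage, and a FAISS index, and this sketch swaps all three for a toy bag-of-words vector and a brute-force cosine search in plain Python. Every name in it (`tokenize`, `build_vocab`, `embed`, `search`) and the sample text chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(chunks):
    """Map every word seen in the chunks to a vector position."""
    words = sorted({w for chunk in chunks for w in tokenize(chunk)})
    return {word: i for i, word in enumerate(words)}


def embed(text, vocab):
    """Toy embedding (assumption): unit-length word-count vector.

    A real pipeline would call a learned embedding model here.
    Words outside the vocabulary are ignored.
    """
    counts = Counter(w for w in tokenize(text) if w in vocab)
    vec = [0.0] * len(vocab)
    for word, n in counts.items():
        vec[vocab[word]] = float(n)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, chunks, index, vocab, top_k=2):
    """Brute-force cosine search; a vector index would replace this loop.

    Vectors are unit length, so the dot product equals cosine similarity.
    """
    q = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), chunk)
        for chunk, vec in zip(chunks, index)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical text chunks, standing in for text extracted from PDFs.
chunks = [
    "The lease term begins on the first day of January.",
    "Dosage should be adjusted for patients with renal impairment.",
    "Install the package and run the indexing script.",
]
vocab = build_vocab(chunks)
index = [embed(chunk, vocab) for chunk in chunks]

results = search("when does the lease term begin", chunks, index, vocab)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

The word-count vectors only match exact words, which is the limitation of keyword search noted above. A learned embedding model places related phrasings near each other, so a query can retrieve a passage even when it shares few words with it.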

information-retrieval document-management legal-research medical-information technical-documentation

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 4 / 25

Maturity 15 / 25

Community 8 / 25

Language

Python

License

—

Higher-rated alternatives

deepset-ai/haystack-tutorials

Here you can find all the Tutorials for Haystack 📓

aryn-ai/sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

MaartenGr/PolyFuzz

Fuzzy string matching, grouping, and evaluation.

unum-cloud/USearch

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...

towhee-io/towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
