dgarnitz/vectorflow

VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.


Emerging

This tool helps data engineers and ML practitioners set up a reliable pipeline to convert large volumes of raw text data (like PDFs, HTML, or Word documents) into numerical representations called 'vector embeddings'. It takes your raw files, processes them, and then stores these embeddings in a vector database of your choice. This is essential for building applications that need to understand and search through vast amounts of text, such as advanced chatbots or recommendation systems.
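The ingest, transform, and store flow described above can be sketched in a few lines. This is a minimal illustration, not VectorFlow's actual API: `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `run_pipeline` are all hypothetical names, the hash-based embedder is a stand-in for a real embedding model, and the in-memory dictionary is a stand-in for a real vector database client.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(chunk: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for an embedding model.

    Hashes each word into one of `dim` buckets, then L2-normalizes.
    """
    vec = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical placeholder for a vector database client."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, key: str, vector: List[float], text: str) -> None:
        # Insert or overwrite the row stored under this key.
        self.rows[key] = (vector, text)


def run_pipeline(doc_id: str, text: str, store: InMemoryVectorStore) -> int:
    """Chunk a document, embed each chunk, and write it to the store."""
    chunks = chunk_text(text)
    for i, chunk in enumerate(chunks):
        store.upsert(f"{doc_id}-{i}", toy_embed(chunk), chunk)
    return len(chunks)
```

In a real deployment the embedder and store would be replaced by an embedding model and a vector database client, and the loop would typically be distributed across workers with retries to handle high volumes.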

698 stars. No commits in the last 6 months.

Use this if you need to continuously process and store large batches of diverse text documents as vector embeddings in a scalable and fault-tolerant way.

Not ideal if you only need to embed a few documents manually or are working with non-textual data types, as this version focuses on high-volume text processing.

data-engineering information-retrieval document-processing search-infrastructure AI-pipeline

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25


Stars: 698

Forks

Language: Python

License: Apache-2.0

Higher-rated alternatives

Azure/azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.

curiosity-ai/catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...

supabase/embeddings-generator

GitHub Action to generate embeddings from the markdown files in your repository.

vector-ai/vectorai

Vector AI — A platform for building vector based applications. Encode, query and analyse data...

wagtail/wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
