dgarnitz/vectorflow
VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes them to a vector DB of your choice.
This tool helps data engineers and ML practitioners set up a reliable pipeline to convert large volumes of raw text data (like PDFs, HTML, or Word documents) into numerical representations called 'vector embeddings'. It takes your raw files, processes them, and then stores these embeddings in a vector database of your choice. This is essential for building applications that need to understand and search through vast amounts of text, such as advanced chatbots or recommendation systems.
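The ingest, transform, and store stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VectorFlow's actual API: `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `run_pipeline` are hypothetical names, the embedding is a toy hashed bag-of-words standing in for a real embedding model, and the in-memory list stands in for a real vector database.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Stand-in for a vector database; keeps (id, vector, text) tuples in a list."""
    records: list[tuple[str, list[float], str]] = field(default_factory=list)

    def upsert(self, record_id: str, vector: list[float], text: str) -> None:
        self.records.append((record_id, vector, text))


def run_pipeline(documents: dict[str, str], store: InMemoryVectorStore,
                 chunk_size: int = 200, overlap: int = 50) -> int:
    """Chunk each document, embed each chunk, and write it to the store.

    Returns the number of chunks written.
    """
    written = 0
    for doc_id, text in documents.items():
        for idx, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            store.upsert(f"{doc_id}-{idx}", toy_embed(chunk), chunk)
            written += 1
    return written
```

In a real deployment the embedding call and the store would be network services, which is where batching, retries, and queueing start to matter for the high-volume case.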
698 stars. No commits in the last 6 months.
Use this if you need to continuously process and store large batches of diverse text documents as vector embeddings in a scalable and fault-tolerant way.
Not ideal if you only need to embed a few documents manually or are working with non-textual data types, as this version focuses on high-volume text processing.
Stars: 698
Forks: 51
Language: Python
License: Apache-2.0
Category:
Last pushed: May 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/dgarnitz/vectorflow"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
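The same request can be made from Python with only the standard library. This is a rough sketch: the `category/owner/repo` path layout is inferred from the single curl URL above, the assumption that the endpoint returns JSON is not confirmed by this page, and `build_quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming (unverified) that it is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily quota.
    print(fetch_quality("embeddings", "dgarnitz", "vectorflow"))
```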
Higher-rated alternatives
Azure/azure-search-vector-samples
A repository of code samples for Vector search capabilities in Azure AI Search.
curiosity-ai/catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...
supabase/embeddings-generator
GitHub Action to generate embeddings from the markdown files in your repository.
vector-ai/vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data...
wagtail/wagtail-vector-index
Store Wagtail pages & Django models as embeddings in vector databases