dgarnitz/vectorflow

VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes them to a vector DB of your choice.

Score: 42 / 100 (Emerging)

This tool helps data engineers and ML practitioners set up a reliable pipeline to convert large volumes of raw text data (like PDFs, HTML, or Word documents) into numerical representations called 'vector embeddings'. It takes your raw files, processes them, and then stores these embeddings in a vector database of your choice. This is essential for building applications that need to understand and search through vast amounts of text, such as advanced chatbots or recommendation systems.
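The ingest, transform, and store stages described above can be illustrated with a minimal sketch. This is not VectorFlow's API: `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `run_pipeline` are hypothetical names chosen for illustration, the hash-based embedding is a stand-in for a real embedding model, and the in-memory list is a stand-in for a real vector database.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(chunk: str, dim: int = 8) -> List[float]:
    """Placeholder embedding (assumption): hash words into buckets, then
    L2-normalize. A real pipeline would call an embedding model here."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Stand-in for a vector database (assumption): keeps rows in a list."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float], str]] = []

    def upsert(self, row_id: str, vector: List[float], text: str) -> None:
        self.rows.append((row_id, vector, text))


def run_pipeline(docs: Dict[str, str], store: InMemoryVectorStore,
                 chunk_size: int = 200, overlap: int = 50) -> int:
    """Chunk each document, embed each chunk, write it to the store.
    Returns the number of vectors written."""
    written = 0
    for doc_id, text in docs.items():
        for idx, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            store.upsert(f"{doc_id}:{idx}", toy_embed(chunk), chunk)
            written += 1
    return written
```

Calling `run_pipeline({"report": raw_text}, InMemoryVectorStore())` writes one row per chunk. A high-volume setup would typically add queuing, batching, and retries around these stages, which this sketch leaves out.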

698 stars. No commits in the last 6 months.

Use this if you need to continuously process and store large batches of diverse text documents as vector embeddings in a scalable and fault-tolerant way.

Not ideal if you only need to embed a few documents manually or are working with non-textual data types, as this version focuses on high-volume text processing.

data-engineering information-retrieval document-processing search-infrastructure AI-pipeline
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25

Stars: 698
Forks: 51
Language: Python
License: Apache-2.0
Last pushed: May 16, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/dgarnitz/vectorflow"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
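The same request can be made from Python with only the standard library. This is a sketch under two assumptions not confirmed by the page: that the endpoint returns JSON, and that the path follows a `category/owner/repo` pattern (inferred from the single URL shown above). `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed to be
# category/owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this repository the call would be `fetch_quality("embeddings", "dgarnitz", "vectorflow")`. The response fields are not documented here, so inspect the returned object before relying on any particular key.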