bacalhau-project/bacalhau
Community-driven, simple, yet powerful framework for fast, cost-effective distributed Compute over Data.
This framework helps data scientists, machine learning engineers, and operations teams process extremely large datasets without needing to move them. You provide your data and the computations you want to run, and it orchestrates the execution directly where the data resides. This eliminates costly data transfers and speeds up processing for tasks like log analysis or distributed model training.
Use this if you need to perform computations on massive datasets distributed across different locations and want to minimize data movement and network egress costs.
Not ideal if your data is small, centralized, and can be easily moved to a single processing unit or if you require real-time, ultra-low-latency processing for interactive applications.
Stars: 853
Forks: 101
Language: Go
License: Apache-2.0
Category:
Last pushed: Mar 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/bacalhau-project/bacalhau"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
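The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions: the page does not document the response format, so the code assumes the endpoint returns JSON, and it assumes the path segments after `/quality/` are a category followed by the `owner/repo` name, as the curl example suggests. The helper names `build_url` and `fetch_quality` are hypothetical and chosen for illustration.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Build the endpoint URL (assumed layout: <base>/<category>/<owner>/<repo>)."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON; the schema is not documented here,
    so the parsed value is returned as-is without picking out fields.
    """
    request = urllib.request.Request(
        build_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("data-engineering", "bacalhau-project/bacalhau")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is worth considering when polling more than a few repositories.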
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.