treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Data lakes typically feed complex pipelines for analytics and machine learning. lakeFS helps data engineers, data scientists, and data analysts manage their raw and processed data: they can create isolated testing environments, enforce data quality, and roll back to previous versions of the data. It adds version control on top of the data you already keep in cloud storage, producing well-governed, reproducible datasets.
5,207 stars. Actively maintained with 50 commits in the last 30 days.
Use this if you need to reliably test changes to your data processing pipelines, reproduce past data states for debugging or compliance, or implement strong data quality gates before publishing data to critical dashboards and models.
Not ideal if your organization doesn't use cloud object storage for its data lake or if your data management needs are simple and don't require complex versioning or rigorous testing environments.
Stars: 5,207
Forks: 435
Language: Go
License: Apache-2.0
Category:
Last pushed: Mar 19, 2026
Commits (30d): 50
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/treeverse/lakeFS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
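The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The URL is the one shown in the curl command; the `category/owner/repo` path layout, the helper names `quality_url` and `fetch_quality`, and the assumption that the response body is JSON are all inferred for illustration and are not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single example URL on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON; the response schema is not
    documented here, so the decoded value is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("data-engineering", "treeverse", "lakeFS"))
```

Since keyless access is limited to 100 requests/day, a script that polls this endpoint would likely want to cache the result locally rather than refetch on every run.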
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.