treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

68 / 100 (Established)

Managing a data lake often involves complex data pipelines for analytics and machine learning. lakeFS helps data engineers, data scientists, and data analysts manage raw and processed data: they can create isolated testing environments, ensure data quality, and roll back to previous versions of data. It adds version control to existing data in cloud object storage, producing well-governed, reproducible datasets.

5,207 stars. Actively maintained with 50 commits in the last 30 days.

Use this if you need to reliably test changes to your data processing pipelines, reproduce past data states for debugging or compliance, or implement strong data quality gates before publishing data to critical dashboards and models.

Not ideal if your organization doesn't use cloud object storage for its data lake or if your data management needs are simple and don't require complex versioning or rigorous testing environments.

Tags: data-lake-management, ETL-testing, data-governance, data-versioning, data-reproducibility
No package, no dependents
Maintenance 23 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25
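
The four subscores above add up to the overall score of 68 / 100. The page does not state the formula, so treating the total as a plain sum of four 25-point categories is an assumption that merely happens to match the published numbers. A minimal check:

```python
# Subscores as listed on this page, each out of 25.
subscores = {"Maintenance": 23, "Adoption": 10, "Maturity": 16, "Community": 19}

# Assumption: the headline score is the unweighted sum of the four categories.
total = sum(subscores.values())
max_total = 25 * len(subscores)

print(f"{total} / {max_total}")  # 68 / 100
```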


Stars: 5,207
Forks: 435
Language: Go
License: Apache-2.0
Last pushed: Mar 19, 2026
Commits (30d): 50

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/treeverse/lakeFS"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
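
The same request can be made from a script. The sketch below uses only the Python standard library and rests on two assumptions not confirmed by this page: that the URL path follows a `category/owner/repo` pattern (inferred from the single example above) and that the response body is JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch one record and decode it, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the actual request.
    print(build_quality_url("data-engineering", "treeverse", "lakeFS"))
```

Because the response schema is not documented here, `fetch_quality` returns the decoded body as-is rather than picking out specific fields. With the keyless limit of 100 requests/day, it is worth caching results instead of polling.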