kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
This tool helps data professionals manage large files and directories, like datasets or models, alongside their project code. It allows you to commit, checkout, fetch, and push these large data assets using simple commands. Data scientists, machine learning engineers, and even digital designers can use this to keep their data in sync with their code versions.
219 stars. No commits in the last 6 months.
Use this if you need a fast, lightweight way to version large data files and build data pipelines, especially if you prioritize speed and simplicity over an all-in-one machine learning platform.
Not ideal if you need integrated experiment tracking, metric logging, or a 'batteries-included' suite of tools for an entire machine learning workflow.
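The four actions named in the description (commit, checkout, fetch, push) can be scripted from Python. The following is a minimal sketch, assuming each action maps one-to-one onto a `dud` subcommand of the same name and that the `dud` executable is on your PATH. The helper names `dud_argv` and `run_dud` are hypothetical and not part of the tool. Real invocations may accept extra arguments, so check the tool's own help output before relying on this.

```python
import shutil
import subprocess

# Actions named in the description above. Assumption: each one is a
# `dud` subcommand with the same name.
ACTIONS = ("commit", "checkout", "fetch", "push")


def dud_argv(action):
    """Build the argument list for one action (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["dud", action]


def run_dud(action):
    """Run one action, failing early if the dud binary is not installed."""
    if shutil.which("dud") is None:
        raise FileNotFoundError("dud executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(dud_argv(action), check=True)
```

Keeping argument construction separate from execution lets you inspect or log the command before anything touches your data.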
Stars: 219
Forks: 10
Language: Go
License: BSD-3-Clause
Category:
Last pushed: Jul 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/kevin-hanselman/dud"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
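The same request can be made from a script. This is a minimal sketch using only the Python standard library. It assumes the endpoint follows the `/quality/<category>/<owner>/<repo>` pattern seen in the curl URL above and that it returns JSON; the response fields are not documented here, so the code does not assume any. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response. Assumption: the body is JSON."""
    req = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("data-engineering", "kevin-hanselman", "dud")` would issue the same request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a reasonable precaution.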
Higher-rated alternatives
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of...
alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
mindsdb/dbt-mindsdb
dbt adapter for connecting to MindsDB
Bread-Technologies/Bread-Dataset-Viewer
VS Code extension to easily view and handle large datasets. Look at JSONL/Parquet/CSV files...