kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

Score: 37 / 100 (Emerging)

This tool helps data professionals manage large files and directories, like datasets or models, alongside their project code. It allows you to commit, checkout, fetch, and push these large data assets using simple commands. Data scientists, machine learning engineers, and even digital designers can use this to keep their data in sync with their code versions.

219 stars. No commits in the last 6 months.

Use this if you need a fast, lightweight way to version large data files and build data pipelines, especially if you prioritize speed and simplicity over an all-in-one machine learning platform.

Not ideal if you need integrated experiment tracking, metric logging, or a 'batteries-included' suite of tools for an entire machine learning workflow.

Tags: data-versioning, data-pipelines, data-management, MLOps, reproducibility
Status: Stale (6 months), No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 9 / 25
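
The four category scores above add up to the overall score shown at the top. The short sketch below checks that arithmetic. It assumes the overall score is a plain sum of the categories, which matches the numbers on this page but is not a method the page itself states.

```python
# Category scores as listed on this page, each out of 25.
subscores = {"Maintenance": 2, "Adoption": 10, "Maturity": 16, "Community": 9}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {maximum}")  # prints "37 / 100"
```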


Stars: 219
Forks: 10
Language: Go
License: BSD-3-Clause
Last pushed: Jul 27, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/kevin-hanselman/dud"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
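
For scripted use, the same request can be made from Python with the standard library only. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the idea that the path generalizes to category/owner/repo is inferred from the single URL above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment.

    Assumption: other repositories follow the same
    category/owner/repo layout as the example URL.
    """
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the endpoint and decode the body.

    Assumption: the body is JSON. A non-2xx status raises
    urllib.error.HTTPError, for example once the daily limit is reached.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("data-engineering", "kevin-hanselman", "dud")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the generated address without spending any of the 100 keyless requests per day.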