AndreaBozzo/dataprof

Library and CLI for profiling tabular data

Score: 52 / 100 (Established)

This tool helps data professionals understand and assess the quality of their tabular data, whether it lives in a CSV, JSON, or Parquet file or in a database. You give it raw data, and it produces a detailed report with per-column statistics, detected data types, and an overall data quality score based on ISO standards. It's designed for data analysts, data scientists, and data engineers who need to get a handle on large datasets quickly.

Available on PyPI.

Use this if you need to quickly understand the structure, content, and quality of large tabular datasets without worrying about memory limitations.

Not ideal if you're looking for a visual, interactive dashboard to explore your data, as this tool provides programmatic reports rather than graphical interfaces.

data-quality data-analysis data-engineering data-governance dataset-exploration
No Dependents
Maintenance 13 / 25
Adoption 9 / 25
Maturity 24 / 25
Community 6 / 25

Stars: 14
Forks: 1
Language: Rust
License: MIT
Last pushed: Mar 18, 2026
Monthly downloads: 75
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/AndreaBozzo/dataprof"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
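
The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl call above, using only the standard library. It rests on two assumptions that this page does not confirm: that the URL path is always category, owner, and repository in that order (inferred from the single example URL), and that the endpoint returns a JSON body. The helper names `build_quality_url` and `fetch_quality` are hypothetical and chosen for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository.

    Assumption: the path layout is <category>/<owner>/<repo>,
    inferred from the one example URL shown above.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the response body is JSON; its fields are not
    documented on this page, so the parsed value is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("data-engineering", "AndreaBozzo", "dataprof")` issues the same GET request as the curl command and counts against the keyless daily limit.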