AndreaBozzo/dataprof
Library and CLI for profiling tabular data
This tool helps data professionals understand and assess the quality of their tabular data, whether it lives in a CSV, JSON, or Parquet file or in a database. Given raw data, it produces a detailed report with statistics for each column, the detected data types, and an overall data quality score based on ISO standards. It is designed for data analysts, data scientists, and data engineers who need to get a handle on large datasets quickly.
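To make "per-column statistics, detected types, and a quality score" concrete, here is a minimal sketch of that kind of report in plain Python. It is not dataprof's API and not its ISO-based scoring. The functions `infer_type` and `profile_csv` are hypothetical, and a simple mean-completeness ratio stands in for the real quality score as an assumption.

```python
import csv
import io


def is_null(value):
    """Treat missing fields (None) and blank strings as nulls."""
    return value is None or value.strip() == ""


def infer_type(values):
    """Return the narrowest type ('integer', 'float', 'string') that fits every non-null value."""
    non_null = [v for v in values if not is_null(v)]
    if not non_null:
        return "empty"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_null:
                cast(v)
            return name
        except ValueError:
            continue  # some value did not parse; try the next, wider type
    return "string"


def profile_csv(text):
    """Hypothetical profiler: per-column type, null count, completeness, plus a 0-100 score."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        nulls = sum(1 for v in values if is_null(v))
        columns[name] = {
            "type": infer_type(values),
            "nulls": nulls,
            "completeness": 1 - nulls / len(values) if values else 0.0,
        }
    # Assumed stand-in for a quality score: mean completeness across columns, scaled to 0-100.
    score = 100 * sum(c["completeness"] for c in columns.values()) / len(columns) if columns else 0.0
    return columns, score
```

For example, `profile_csv("id,price,city\n1,9.5,Oslo\n2,,Rome\n3,4.25,\n")` reports `id` as integer, `price` as float, and `city` as string, with one null in each of the last two columns. A real profiler adds many more dimensions (uniqueness, validity, consistency), so treat this only as an illustration of the report's shape.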
Available on PyPI.
Use this if you need to quickly understand the structure, content, and quality of large tabular datasets without worrying about memory limitations.
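A common way for any profiler to keep memory bounded on large inputs is to compute statistics in a single streaming pass and keep only running aggregates, rather than loading the whole table. The sketch below illustrates the idea with Welford's online algorithm for mean and variance. It is a generic example, not a description of how dataprof works internally. `RunningStats` and `stream_numeric_profile` are hypothetical names, and the code assumes the profiled column is numeric.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass numeric summary using Welford's algorithm; memory use is constant."""
    count: int = 0
    nulls: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, raw):
        if raw is None or raw.strip() == "":
            self.nulls += 1
            return
        x = float(raw)  # assumes a numeric column; non-numeric text raises ValueError
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        # Sample variance; undefined (NaN) for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


def stream_numeric_profile(lines, column):
    """Read CSV rows one at a time; only the running statistics stay in memory."""
    stats = RunningStats()
    for row in csv.DictReader(lines):
        stats.update(row.get(column))
    return stats


if __name__ == "__main__":
    sample = io.StringIO("k,v\na,2\nb,4\nc,\nd,4\n")
    s = stream_numeric_profile(sample, "v")
    print(s.count, s.nulls, s.mean, s.minimum, s.maximum)
```

Because `csv.DictReader` pulls rows lazily, passing a file opened with `open(path, newline="")` instead of a `StringIO` keeps memory use independent of the file's length.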
Not ideal if you're looking for a visual, interactive dashboard to explore your data: this tool produces programmatic reports rather than a graphical interface.
Stars: 14
Forks: 1
Language: Rust
License: MIT
Last pushed: Mar 18, 2026
Monthly downloads: 75
Commits (30d): 0
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.