capitalone/DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
This tool helps data professionals understand what's inside their datasets, including identifying sensitive information like Personally Identifiable Information (PII). You input a file (CSV, Avro, Parquet, JSON, or plain text), and it outputs a detailed report with schema information, statistical summaries for each column, and detected entities. This is for anyone who needs to quickly assess the quality, structure, and privacy implications of raw data before using it for analysis or applications.
1,548 stars. No commits in the last 6 months.
Use this if you need a quick, comprehensive overview of a dataset's content, structure, and potential sensitive data without manual inspection.
Not ideal if you primarily need advanced data transformation, complex cleaning, or visual data exploration tools.
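To make the "file in, report out" idea concrete, here is a minimal standard-library sketch of a per-column report for a small CSV. It is not DataProfiler's API or implementation. The function names (`infer_type`, `profile_csv`), the report keys, and the simplistic email pattern are all hypothetical, chosen only to illustrate schema inference, summary statistics, and a crude sensitive-data check.

```python
import csv
import io
import re
import statistics

# Hypothetical, deliberately simple pattern; real entity detection is far more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def infer_type(values):
    """Return 'int', 'float', or 'string' for a list of non-empty strings."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text):
    """Build a per-column report: inferred type, null count, basic stats, email flag."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        raw = [row[column] for row in rows]
        present = [v for v in raw if v not in ("", None)]
        col_type = infer_type(present) if present else "unknown"
        entry = {
            "type": col_type,
            "null_count": len(raw) - len(present),
            "unique_count": len(set(present)),
        }
        if col_type in ("int", "float"):
            numbers = [float(v) for v in present]
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
            entry["mean"] = statistics.mean(numbers)
        # Flag the column only if every non-empty value matches the pattern.
        entry["looks_like_email"] = bool(present) and all(
            EMAIL_RE.match(v) for v in present
        )
        report[column] = entry
    return report


# Made-up sample data for illustration only.
sample = (
    "name,age,email\n"
    "Ada,36,ada@example.com\n"
    "Grace,,grace@example.com\n"
    "Alan,41,alan@example.com\n"
)
report = profile_csv(sample)
for column, entry in report.items():
    print(column, entry)
```

A dedicated profiler covers many more formats, statistics, and entity types than this; the sketch only shows the shape of the output such a tool produces for each column.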
Stars: 1,548
Forks: 185
Language: Python
License: Apache-2.0
Category:
Last pushed: Sep 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/capitalone/DataProfiler"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
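The same request can be made from Python with the standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the category/owner/repo path layout is inferred from the single curl example above, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Assemble the endpoint URL (path layout assumed from the one curl example)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category, owner, repo, timeout=10):
    """Return the raw response body as text; the format is an unknown here."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


url = build_quality_url("data-engineering", "capitalone", "DataProfiler")
print(url)
```

Calling `fetch_quality("data-engineering", "capitalone", "DataProfiler")` performs the actual network request and counts against the daily limit.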
Higher-rated alternatives
Data-Centric-AI-Community/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Snowflake-Labs/emerging-solutions-toolbox
The Emerging Solutions Toolbox is a collection of solutions created by Snowflake's Solution...
giagiannis/data-profiler
Data profiler is an attempt to model the behavior of a given operator for a set of datasets.
vyshakA/Orange-VoIP-FreePBX-Trunk
📞 Add Orange home phone service to FreePBX as a VoIP trunk with simple steps and secure login...
darsh276/snowflake-mh9
❄️ Simplify data management with snowflake-mh9, a tool that streamlines interactions with...