alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a command-line application for working with CSV files.
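Because CleverCSV is described as a drop-in replacement for the csv module, code written against the standard library's Sniffer/reader interface should work with either module. The sketch below assumes that shared interface (a `Sniffer` class with a `sniff` method, and a `reader` function that accepts a dialect). If CleverCSV is not installed, it falls back to the standard library, so Python's simpler built-in sniffer does the detection instead. The alias `csvlib` and the sample data are illustrative only.

```python
import io

try:
    # Assumes CleverCSV mirrors the csv module's Sniffer/reader interface.
    import clevercsv as csvlib  # pip install clevercsv
except ImportError:
    # Fallback: the standard library's simpler heuristic sniffer.
    import csv as csvlib

sample = "a;b;c\n1;2;3\n4;5;6\n"

# Detect the dialect (delimiter, quoting) from the raw text...
dialect = csvlib.Sniffer().sniff(sample)

# ...then parse with the same module's reader and the detected dialect.
rows = list(csvlib.reader(io.StringIO(sample), dialect))
print(repr(dialect.delimiter), rows)
```

Only the import changes between the two modules. The rest of the code is written against the interface they share.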
This tool helps data professionals work with messy CSV files without inspecting them by hand. Given a CSV file, it automatically detects the dialect (delimiter, quote character, and escape character), so the data can be read straight into tables or dataframes. Data analysts, scientists, and anyone who regularly works with diverse datasets would find it useful.
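For the "read the data into tables" workflow, CleverCSV offers convenience functions such as `read_table` (rows as lists of strings) and `read_dataframe` (a pandas DataFrame). The helper `load_rows` below is hypothetical. It calls `clevercsv.read_table` when the package is available. Otherwise it falls back to `csv.Sniffer` plus `csv.reader`, a swapped-in and less robust technique that is not CleverCSV's detection method. The pipe-delimited sample file is illustrative.

```python
import csv
import os
import tempfile

try:
    import clevercsv  # pip install clevercsv
except ImportError:
    clevercsv = None


def load_rows(path):
    """Return the file's rows as lists of strings, detecting the dialect.

    load_rows is a hypothetical helper written for this sketch.
    """
    if clevercsv is not None:
        # CleverCSV detects the dialect and parses the file in one call.
        return clevercsv.read_table(path)
    # Fallback: sniff with the standard library, then rewind and parse.
    with open(path, newline="") as fp:
        dialect = csv.Sniffer().sniff(fp.read())
        fp.seek(0)
        return list(csv.reader(fp, dialect))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "messy.csv")
    with open(path, "w", newline="") as fp:
        # Pipe-delimited, with quoted fields that contain commas.
        fp.write('id|name\n1|"Smith, J."\n2|"Doe, A."\n')
    rows = load_rows(path)

print(rows)
```

The quoted fields keep their embedded commas because the detected dialect treats `|` as the delimiter and `"` as the quote character.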
1,322 stars. Available on PyPI.
Use this if you frequently receive or work with CSV files that have inconsistent delimiters, quote characters, or other formatting issues, and you want to automate the cleaning process.
Not ideal if your data is primarily in structured databases or well-defined formats other than CSV, or if you prefer manual configuration for every file.
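When files arrive with different delimiters, a quick batch check can report the detected format of each file before anything is loaded. This is a minimal sketch that assumes the same csv-compatible `Sniffer` interface as above. `detect_delimiters` and `sample_size` are hypothetical names. Files where detection fails are mapped to `None` so they can be reviewed manually.

```python
import os
import tempfile

try:
    # Assumes CleverCSV mirrors the csv module's Sniffer interface.
    import clevercsv as csvlib
except ImportError:
    import csv as csvlib


def detect_delimiters(paths, sample_size=10_000):
    """Map each path to the delimiter detected from its first sample_size chars.

    detect_delimiters and sample_size are hypothetical names for this sketch.
    """
    result = {}
    for path in paths:
        with open(path, newline="") as fp:
            sample = fp.read(sample_size)
        try:
            dialect = csvlib.Sniffer().sniff(sample)
        except Exception:  # the stdlib raises csv.Error when detection fails
            dialect = None
        # getattr also covers a sniffer that returns None instead of raising.
        result[path] = getattr(dialect, "delimiter", None)
    return result


samples = {
    "a.csv": "x,y\n1,2\n3,4\n",
    "b.tsv": "x\ty\n1\t2\n3\t4\n",
    "c.txt": "x;y\n1;2\n3;4\n",
}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w", newline="") as fp:
            fp.write(text)
        paths.append(path)
    detected = {os.path.basename(p): d for p, d in detect_delimiters(paths).items()}

print(detected)
```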
Stars: 1,322
Forks: 79
Language: Python
License: MIT
Last pushed: Jan 12, 2026
Commits (30d): 0
Dependencies: 3
Related frameworks
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.