codepawl/loclean
An AI Data Cleaning Library
Need to clean up messy text data or extract specific information from it? This tool transforms unstructured text, such as customer feedback or product descriptions, into clean, structured data without sending anything to external services. It takes text or dataframes as input and outputs organized, type-safe data that is ready for analysis or further processing. It is aimed at data engineers, data scientists, and anyone handling large volumes of text in industries with strict privacy requirements.
Use this if you need to reliably clean, extract, or scrub sensitive information from text data while ensuring privacy by keeping all processing on your local machines.
Not ideal if your data cleaning tasks are simple or manual, or if you don't need the structured extraction capabilities of AI models.
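To make "scrub sensitive information" and "type-safe output" concrete, here is a minimal standard-library sketch of the idea. It is not loclean's API and does not use an AI model: the patterns and the names `scrub`, `extract_price`, and `Price` are hypothetical, chosen only to illustrate local masking and typed extraction.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only; they are not taken from loclean.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
PRICE_RE = re.compile(r"(?P<currency>[$€£])\s?(?P<amount>\d+(?:\.\d{1,2})?)")


@dataclass
class Price:
    """Typed result, so downstream code gets a float rather than raw text."""
    currency: str
    amount: float


def scrub(text: str) -> str:
    """Mask emails and phone numbers; everything runs in the local process."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def extract_price(text: str) -> Optional[Price]:
    """Return the first price found in the text, or None if there is none."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Price(currency=match.group("currency"), amount=float(match.group("amount")))


feedback = "Paid $49.99, contact me at jane.doe@example.com or +1 415-555-0100."
clean = scrub(feedback)
price = extract_price(feedback)
```

Regular expressions only cover rigid formats like these. The listing describes the library as using AI models for extraction, which is what would handle free-form text that fixed patterns miss.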
Stars: 10
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/codepawl/loclean"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
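The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are made up for illustration, and the response is assumed to be JSON, since the listing does not document its schema. How an API key would be supplied is not stated here, so the sketch makes the keyless request only.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix copied from the curl example in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repo.

    Assumes the body is JSON; the response fields are not documented here.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("codepawl", "loclean")` sends the request and counts against the daily quota, so it is worth caching the result if you query the same repository repeatedly.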
Higher-rated alternatives
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MigoXLab/dingo
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
TheDataStation/pneuma
LLM-Powered Data Discovery System for Tabular Data
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio