nxank4/loclean
⚡️ The All-in-One Local AI Data Cleaning Library
This tool helps data professionals clean and structure text data, and redact sensitive information from it, using local AI. You provide unstructured text or dataframes containing free-text descriptions, and it outputs cleaned data or structured JSON that matches the schema you define. It's aimed at data scientists, analysts, and engineers handling proprietary or private data.
Available on PyPI.
Use this if you need to process sensitive customer records, internal reports, or other confidential text data that absolutely cannot leave your company's systems for cleaning or extraction.
Not ideal if you're comfortable sending your data to cloud-based AI services, or if you only need basic, rule-based text cleaning that doesn't require AI-level language understanding.
Stars: 10
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/nxank4/loclean"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
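The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body (the page does not document the response format), and the helper names `build_url` and `fetch_quality` are hypothetical, introduced here for illustration. The `llm-tools` path segment is copied from the curl example above and may differ for other projects.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; the "llm-tools" segment is
# taken as-is and may not apply to projects listed elsewhere.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Return the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for owner/repo (hypothetical helper).

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError that the caller should handle.
    """
    with urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nxank4", "loclean")` issues the same GET request as the curl command. Without a key, the 100 requests/day limit stated above applies, so cache results if you poll more than occasionally.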
Higher-rated alternatives
NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
MigoXLab/dingo
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
TheDataStation/pneuma
LLM-Powered Data Discovery System for Tabular Data
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio