J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
This tool helps you identify duplicate records within a single dataset or link related records across two separate datasets. You provide your datasets, and it helps you find matches even when there are slight differences in names, addresses, or other identifiers. This is ideal for data analysts, researchers, or anyone working with large datasets that may contain inconsistent entries for the same real-world entity.
1,047 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to clean up data by finding and merging duplicate entries, or if you need to combine information about the same individuals or items from different data sources.
Not ideal if you're looking for a no-code solution, as this tool requires writing Python code to configure and execute the record linkage process.
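To give a sense of what "finding matches despite slight differences" involves, here is a minimal conceptual sketch that uses only Python's standard library (`difflib`). It is not the recordlinkage API. The function names `similarity` and `link_records`, the sample records, and the 0.85 threshold are all hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(left, right, fields, threshold=0.85):
    """Compare every left record with every right record.

    Returns (left_index, right_index, score) for each pair whose mean
    similarity across `fields` is at least `threshold` (an assumed cutoff).
    """
    matches = []
    for (i, rec_a), (j, rec_b) in product(enumerate(left), enumerate(right)):
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical sample data: two sources describing overlapping people.
customers = [
    {"name": "Jonathan Smith", "city": "Amsterdam"},
    {"name": "Maria Garcia", "city": "Utrecht"},
]
subscribers = [
    {"name": "Maria Garcia", "city": "Utrecht"},
    {"name": "Jonathon Smith", "city": "Amsterdam"},
    {"name": "Peter Jansen", "city": "Rotterdam"},
]

links = link_records(customers, subscribers, fields=("name", "city"))
for left_idx, right_idx, score in links:
    print(customers[left_idx]["name"], "<->", subscribers[right_idx]["name"], score)
```

With this sample data, the misspelled "Jonathon Smith" is still paired with "Jonathan Smith", and the exact "Maria Garcia" records are paired with each other. Comparing every record against every other, as this sketch does, scales poorly, which is one reason a dedicated toolkit is preferable for large datasets.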
Stars: 1,047
Forks: 154
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Feb 21, 2024
Commits (30d): 0
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/J535D165/recordlinkage"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
treeverse/dvc
🦉 Data Versioning and ML Experiments
runpod/runpod-python
🐍 | Python library for RunPod API and serverless worker SDK.
microsoft/vscode-jupyter
VS Code Jupyter extension
4paradigm/OpenMLDB
OpenMLDB is an open-source machine learning database that provides a feature platform computing...
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning...