J535D165/recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python

57 / 100 (Established)

This tool helps you identify duplicate records within a single dataset or link related records across two separate datasets. You provide your datasets, and it helps you find matches even when there are slight differences in names, addresses, or other identifiers. This is ideal for data analysts, researchers, or anyone working with large datasets that may contain inconsistent entries for the same real-world entity.

1,047 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to clean up data by finding and merging duplicate entries, or if you need to combine information about the same individuals or items from different data sources.

Not ideal if you're looking for a no-code solution, as this tool requires writing Python code to configure and execute the record linkage process.
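To give a sense of the kind of Python involved, here is a minimal, standard-library-only sketch of the underlying idea: compare candidate record pairs and flag those whose names are nearly identical. It does not use the recordlinkage API. The sample records, field names, the `find_duplicates` helper, and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Jonathan Smith", "city": "Boston"},
    {"id": 2, "name": "Jonathon Smith", "city": "Boston"},
    {"id": 3, "name": "Maria Garcia", "city": "Austin"},
    {"id": 4, "name": "Maria Garcya", "city": "Austin"},
    {"id": 5, "name": "Wei Chen", "city": "Boston"},
]


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(rows, threshold=0.85):
    """Return id pairs whose names are at least `threshold` similar.

    Only records sharing the same city are compared (a simple form of
    "blocking"), which keeps the number of comparisons small.
    """
    matches = []
    for left, right in combinations(rows, 2):
        if left["city"] != right["city"]:
            continue  # different block: skip the comparison entirely
        if similarity(left["name"], right["name"]) >= threshold:
            matches.append((left["id"], right["id"]))
    return matches


duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

A dedicated toolkit is typically a better fit than a hand-rolled loop like this once datasets grow or several fields need to be compared and weighed together.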

data-matching data-deduplication customer-data-integration research-data-cleaning entity-resolution
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 22 / 25


Stars: 1,047
Forks: 154
Language: Python
License: BSD-3-Clause
Last pushed: Feb 21, 2024
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/J535D165/recordlinkage"
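The same request can be sketched in Python with only the standard library. This assumes the endpoint returns JSON, which the page does not state. The path layout (category, owner, repository) is inferred from the single URL shown above, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the quality data.

    Assumption: the response body is JSON. Adjust the decoding step
    if the API returns another format.
    """
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return json.load(response)


url = quality_url("ml-frameworks", "J535D165", "recordlinkage")
print(url)
# To perform the request: data = fetch_quality("ml-frameworks", "J535D165", "recordlinkage")
```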

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.