ipums/hlink

Hierarchical record linkage at scale

51 / 100 (Established)

hlink helps researchers and data analysts link records across large, messy datasets that lack unique identifiers. It takes multiple data files, such as historical census records, and identifies which records likely refer to the same entity. The output is a linked dataset that makes it easier to analyze relationships and trends across sources.
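To make the idea of linking records without a shared key concrete, here is a minimal sketch using only the Python standard library. It is not hlink's API or its algorithm: the function names, the sample records, the ten-year gap between files, and the thresholds are all assumptions chosen for illustration. It compares names with a string-similarity ratio and only considers candidates whose age is plausible.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; real census data has many more fields.
census_1900 = [
    {"id": "a1", "name": "John Smith", "age": 30},
    {"id": "a2", "name": "Mary Jones", "age": 25},
]
census_1910 = [
    {"id": "b1", "name": "Jon Smith", "age": 41},
    {"id": "b2", "name": "Mary Jones", "age": 35},
    {"id": "b3", "name": "Peter Brown", "age": 50},
]


def name_similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, years_apart=10, threshold=0.85, max_age_gap=2):
    """Pair each left record with its most similar right record, if any.

    A candidate is considered only when its age is within max_age_gap of
    the age expected after years_apart years, and is accepted only when
    the name similarity reaches the threshold.
    """
    links = []
    for l in left:
        best, best_score = None, threshold
        for r in right:
            expected_age = l["age"] + years_apart
            if abs(r["age"] - expected_age) > max_age_gap:
                continue  # age is implausible, skip the name comparison
            score = name_similarity(l["name"], r["name"])
            if score >= best_score:
                best, best_score = r, score
        if best is not None:
            links.append((l["id"], best["id"], best_score))
    return links


links = link_records(census_1900, census_1910)
for left_id, right_id, score in links:
    print(left_id, "->", right_id, f"(similarity {score:.2f})")
```

The sketch compares every pair of records, which is only workable for small inputs. Tools built for large datasets typically narrow the candidate pairs first and score them with trained models rather than a single fixed threshold.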

Available on PyPI.

Use this if you need to accurately link individual records across large, separate datasets, even when direct matching keys are unavailable or unreliable.

Not ideal if your datasets are small or already have perfect unique identifiers for merging, or if you prefer to manually review every potential link.

data-matching historical-research social-science-data demographic-analysis probabilistic-matching
Maintenance 10 / 25
Adoption 5 / 25
Maturity 25 / 25
Community 11 / 25


Stars: 13
Forks: 2
Language: Python
License: MPL-2.0
Last pushed: Jan 20, 2026
Commits (30d): 0
Dependencies: 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ipums/hlink"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
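The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository in a given category."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumes the endpoint returns a JSON body; adjust the parsing if the
    actual response format differs.
    """
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "ipums/hlink")` issues the same GET request as the curl command and counts against the same daily limit.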