ipums/hlink
Hierarchical record linkage at scale
This tool helps researchers and data analysts link large, messy datasets that lack unique identifiers. It takes in multiple data files, such as historical census records, and identifies which records likely refer to the same entity. The output is a linked dataset, which makes it easier to analyze relationships and trends across sources.
Available on PyPI.
Use this if you need to accurately link individual records across large, separate datasets, even when direct matching keys are unavailable or unreliable.
Not ideal if your datasets are small, have perfect unique identifiers for merging, or you prefer a fully manual review process for every potential link.
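To make the idea of linking records without a shared key concrete, here is a minimal sketch using only the Python standard library. It is not hlink's API or algorithm; the sample records, field names, `name_similarity`, `link_records`, and the threshold values are all hypothetical and chosen for illustration. It groups candidates by birthplace (a simple form of blocking), filters on birth year, and keeps the best fuzzy name match above a threshold.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical sample data: two small "census" files with no shared identifier.
census_a = [
    {"id": "a1", "name": "John Smith", "birth_year": 1872, "birthplace": "Ohio"},
    {"id": "a2", "name": "Mary Johnson", "birth_year": 1880, "birthplace": "Iowa"},
]
census_b = [
    {"id": "b1", "name": "Jon Smith", "birth_year": 1873, "birthplace": "Ohio"},
    {"id": "b2", "name": "Mary Jonson", "birth_year": 1880, "birthplace": "Iowa"},
    {"id": "b3", "name": "Peter Olsen", "birth_year": 1880, "birthplace": "Iowa"},
]


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.85, max_year_gap=2):
    """Link each left record to its best-scoring right record, if any.

    Illustrative only: threshold and max_year_gap are assumed values,
    not defaults taken from any real linkage tool.
    """
    # Blocking: only compare records that share a birthplace.
    blocks = defaultdict(list)
    for rec in right:
        blocks[rec["birthplace"]].append(rec)

    links = []
    for left_rec in left:
        best = None  # (right id, score) of the best candidate so far
        for right_rec in blocks.get(left_rec["birthplace"], []):
            # Skip candidates whose birth years are too far apart.
            if abs(left_rec["birth_year"] - right_rec["birth_year"]) > max_year_gap:
                continue
            score = name_similarity(left_rec["name"], right_rec["name"])
            if score >= threshold and (best is None or score > best[1]):
                best = (right_rec["id"], score)
        if best is not None:
            links.append((left_rec["id"], best[0], round(best[1], 3)))
    return links


links = link_records(census_a, census_b)
for left_id, right_id, score in links:
    print(left_id, "->", right_id, score)
```

Blocking keeps the number of pairwise comparisons manageable, which matters once the inputs grow beyond toy size. A sketch like this would not scale to full census files.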
Stars: 13
Forks: 2
Language: Python
License: MPL-2.0
Last pushed: Jan 20, 2026
Commits (30d): 0
Dependencies: 9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ipums/hlink"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
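The same request can be made from Python with the standard library. This is a sketch under assumptions: the split of the path into category, owner, and repository is inferred from the URL above, the response is assumed to be JSON (the page does not state the format), and `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow a category/owner/repo pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("ml-frameworks", "ipums", "hlink")
print(url)
# Call fetch_quality("ml-frameworks", "ipums", "hlink") to perform the request.
```

The network call is kept in a separate function so the URL can be built and inspected without using up the daily request allowance.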
Related frameworks
treeverse/dvc
🦉 Data Versioning and ML Experiments
runpod/runpod-python
🐍 | Python library for RunPod API and serverless worker SDK.
microsoft/vscode-jupyter
VS Code Jupyter extension
4paradigm/OpenMLDB
OpenMLDB is an open-source machine learning database that provides a feature platform computing...
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning...