dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning

Quality score: 49 / 100 (Emerging)

This tool helps you combine, clean, and organize messy datasets by identifying and linking similar records, even if they aren't an exact match. You provide one or more tables (like customer lists or product catalogs), and it identifies duplicates, merges related entries, or groups similar items together. Data analysts, marketers, or anyone dealing with complex, unstandardized business data would find this useful for tasks like customer 360 views or inventory management.
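To make the idea of linking records that are not exact matches concrete, here is a minimal sketch. It deliberately does not use linktransformer or any deep learning model: as a stand-in it scores string similarity with the standard library's `difflib.SequenceMatcher`. The function names (`similarity`, `link_records`) and the `threshold` value are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, threshold=0.6):
    """Map each name in `left` to its most similar name in `right`.

    A name is linked only if its best score reaches `threshold`;
    otherwise it maps to None. The threshold is an illustrative
    assumption, not a recommended setting.
    """
    links = {}
    for name in left:
        # Pick the candidate with the highest similarity score.
        best = max(right, key=lambda cand: similarity(name, cand), default=None)
        if best is not None and similarity(name, best) >= threshold:
            links[name] = best
        else:
            links[name] = None
    return links
```

Called with `["Acme Corp.", "Zzyzx"]` on the left and `["ACME Corporation", "Initech LLC"]` on the right, this links the first name to `"ACME Corporation"` and leaves the second unmatched. An embedding-based linker replaces the character-level score with semantic similarity, which is what lets it cope with abbreviations, word reordering, and translations that character matching misses.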

135 stars.

Use this if you need to accurately link, deduplicate, or categorize records across different datasets where common identifiers might be inconsistent or missing.

Not ideal if your data is already clean and normalized and needs only exact-match joins, or if you need to process extremely large datasets without access to cloud-based LLM services.

data-matching customer-360 data-quality master-data-management entity-resolution
No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25


Stars 135
Forks 13
Language Python
License GPL-3.0
Last pushed Feb 15, 2026
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dell-research-harvard/linktransformer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
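The same request as the curl command above can be made from Python. This is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which the page does not state, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for an owner/repo pair."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the quality record.

    Assumption: the response body is JSON. The field names are not
    documented on this page, so none are relied on here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("dell-research-harvard", "linktransformer")` performs the network request. With the keyless limit of 100 requests/day, it is worth caching the result rather than fetching it repeatedly.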