Vatshayan/Data-Duplication-Removal-using-Machine-Learning

Final-year project on removing duplicate data using machine learning, with source code and report.

29 / 100 (Experimental)

This project helps data professionals clean up datasets by identifying and removing duplicate records. It takes in a dataset containing redundant information and outputs a cleaner version with only unique entries. This is useful for anyone working with large datasets, such as data analysts, researchers, or data entry specialists, who need accurate, non-repeated information.
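The input/output behavior described above can be illustrated with a minimal sketch. This is not the project's machine-learning method: it shows only exact-match deduplication after light normalization, and the names `normalize`, `drop_duplicates`, and the sample records are hypothetical, chosen purely for illustration.

```python
from typing import Dict, Iterable, List, Sequence


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(value.lower().split())


def drop_duplicates(records: Iterable[Dict[str, str]],
                    key_fields: Sequence[str]) -> List[Dict[str, str]]:
    """Keep the first record for each normalized key, preserving input order.

    Assumption: two records are duplicates when all `key_fields` match
    after normalization. A learned similarity model would replace this rule.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(normalize(str(record.get(field, ""))) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: the second row repeats the first with different casing and spacing.
rows = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada  lovelace", "email": "ADA@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
cleaned = drop_duplicates(rows, ["name", "email"])
```

Here `cleaned` keeps the first Ada Lovelace record and the Alan Turing record. Near-duplicates that differ by typos or abbreviations would pass through this rule unchanged, which is the kind of case a similarity-based approach is meant to handle.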

No commits in the last 6 months.

Use this if you have a dataset with many duplicate entries and need an automated way to identify and remove them.

Not ideal if your dataset is very small or if you need to manually review each potential duplicate, as this tool focuses on automated detection.

data-cleaning data-management record-deduplication information-quality data-preprocessing
No License | Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 13 / 25

Stars: 67
Forks: 8
Language: Jupyter Notebook
License: None
Last pushed: Dec 01, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Vatshayan/Data-Duplication-Removal-using-Machine-Learning"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
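The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the listing does not confirm: that the path follows a category/owner/repository layout (inferred from the single URL above), and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL. The path layout is inferred from the one example URL."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption (unverified): the endpoint returns a JSON document.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Rebuilds the URL used in the curl command; no network request is made here.
url = quality_url(
    "ml-frameworks",
    "Vatshayan",
    "Data-Duplication-Removal-using-Machine-Learning",
)
```

Calling `fetch_quality` with the same three arguments would perform the request. With the keyless limit of 100 requests/day, it is worth caching the result rather than refetching on every run.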