Vatshayan/Data-Duplication-Removal-using-Machine-Learning

Final-year project on removing duplicated data using machine learning, with source code and report.

/ 100

Experimental

This project helps data professionals clean up datasets by identifying and removing duplicate records. It takes in a dataset containing redundant information and outputs a cleaner version with only unique entries. This is useful for anyone working with large datasets, such as data analysts, researchers, or data entry specialists, who need accurate, non-repeated information.
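To make the idea of identifying and removing duplicate records concrete, here is a minimal sketch using only the Python standard library. It is not the project's own method: the listing does not say which model or features the project uses, so this sketch swaps in plain string similarity (`difflib.SequenceMatcher`) as a stand-in. The function names `normalize` and `deduplicate`, the `threshold` value, and the sample rows are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase each field and collapse whitespace so trivial variants compare equal."""
    return tuple(" ".join(str(value).lower().split()) for value in record)


def deduplicate(records, threshold=0.9):
    """Return records with exact and near-duplicate rows removed, keeping first occurrences.

    `threshold` is an assumed similarity cutoff in [0, 1], not a value from the project.
    """
    kept = []        # original records that survive
    kept_keys = []   # normalized text of each surviving record
    seen = set()     # fast path for exact matches after normalization
    for record in records:
        key = " | ".join(normalize(record))
        if key in seen:
            continue  # exact duplicate once case and spacing are ignored
        if any(SequenceMatcher(None, key, other).ratio() >= threshold
               for other in kept_keys):
            continue  # near duplicate of a record already kept
        seen.add(key)
        kept.append(record)
        kept_keys.append(key)
    return kept


# Hypothetical sample data for illustration only.
rows = [
    ("Alice Smith", "alice@example.com"),
    ("alice  smith", "ALICE@example.com"),  # same record, different case and spacing
    ("Alice Smyth", "alice@example.com"),   # one character differs
    ("Bob Jones", "bob@example.com"),
]
unique_rows = deduplicate(rows)
print(unique_rows)
```

Each new record is compared against every record already kept, so the cost grows quadratically with dataset size. For large datasets, a real pipeline would likely need blocking or indexing before any pairwise comparison.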

No commits in the last 6 months.

Use this if you have a dataset with many duplicate entries and need an automated way to identify and remove them.

Not ideal if your dataset is very small or if you need to manually review each potential duplicate, as this tool focuses on automated detection.

data-cleaning data-management record-deduplication information-quality data-preprocessing

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 13 / 25

Language: Jupyter Notebook

License: none

Higher-rated alternatives

Cloud-CV/EvalAI

:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI

fireindark707/Python-Schema-Matching

A python tool using XGboost and sentence-transformers to perform schema matching task on tables.

graphbookai/graphbook

Visual AI development framework for training and inference of ML models, scaling pipelines, and...

visual-layer/fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...

github/CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
