Vatshayan/Data-Duplication-Removal-using-Machine-Learning
Final-year project on removing duplicate data with machine learning, including source code and a report.
This project helps data professionals clean up datasets by identifying and removing duplicate records. It takes in a dataset containing redundant information and outputs a cleaner version with only unique entries. This is useful for anyone working with large datasets, such as data analysts, researchers, or data entry specialists, who need accurate, non-repeated information.
No commits in the last 6 months.
Use this if you have a dataset with many duplicate entries and need an automated way to identify and remove them.
Not ideal if your dataset is very small or if you need to manually review each potential duplicate, as this tool focuses on automated detection.
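As a rough illustration of the kind of automated cleanup described above, the sketch below removes exact and near-duplicate records from a small list. It is not the repository's method, which this page does not describe: it swaps in plain string similarity from Python's standard `difflib` module as a stand-in for a learned model. The helper names (`normalize`, `similarity`, `remove_duplicates`), the sample rows, and the `0.9` threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and collapse whitespace in every field so trivial variants compare equal."""
    return tuple(" ".join(str(field).lower().split()) for field in record)


def similarity(a, b):
    """Return a similarity score between 0 and 1 for two normalized records."""
    return SequenceMatcher(None, " | ".join(a), " | ".join(b)).ratio()


def remove_duplicates(records, threshold=0.9):
    """Keep the first occurrence of each record and drop later near-duplicates.

    The threshold is an illustrative cut-off, not a tuned value.
    """
    kept = []       # original records that survive
    kept_keys = []  # their normalized forms, used for comparison
    for record in records:
        key = normalize(record)
        if any(similarity(key, seen) >= threshold for seen in kept_keys):
            continue  # treated as a duplicate of an earlier record
        kept.append(record)
        kept_keys.append(key)
    return kept


rows = [
    ("Alice Smith", "alice@example.com"),
    ("alice  smith", "ALICE@example.com"),  # identical after normalization
    ("Bob Jones", "bob@example.com"),
    ("Alice Smyth", "alice@example.com"),   # near-duplicate, one character differs
]
unique_rows = remove_duplicates(rows)
print(unique_rows)
```

Because every record is compared against all records kept so far, this sketch only suits small inputs; a larger dataset would need blocking or indexing before pairwise comparison.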
Stars: 67
Forks: 8
Language: Jupyter Notebook
License: not listed
Category:
Last pushed: Dec 01, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Vatshayan/Data-Duplication-Removal-using-Machine-Learning"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
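The same request can be sketched in Python with only the standard library. The URL layout (category, then owner, then repository name) is inferred from the single curl example above, and the JSON response format is an assumption, since this page does not document the response. The function names `build_url` and `fetch_quality` are hypothetical, and API-key handling is left out because it is not described here.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Assemble the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and decode it, assuming the response body is JSON."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url(
    "ml-frameworks",
    "Vatshayan",
    "Data-Duplication-Removal-using-Machine-Learning",
)
print(url)
```

Calling `fetch_quality` with the same three arguments performs the network request and counts against the daily request limit.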
Higher-rated alternatives
Cloud-CV/EvalAI
Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A Python tool that uses XGBoost and sentence-transformers to perform schema matching on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.