PROGRESSIVE DUPLICATION DETECTION USING RABIN-KARP ALGORITHM


Sundari P, Deepasamili S

Abstract

Detecting exact duplicate records in a data source is an important task. Duplicate record detection is the problem of identifying records in a database that represent the same real-world entity. Because duplicate records do not share a common key, detecting them is difficult. An efficient and accurate content-based online duplicate detection method, able to identify duplicate content in large storage datasets, is a fundamental research goal. Despite recent progress in duplicate detection, developing an accurate duplicate detection mechanism for large-scale databases remains very challenging. This paper presents a progressive duplicate detection algorithm that increases the efficiency of finding duplicates in online web datasets. The approach performs progressive duplicate detection using the Rabin-Karp algorithm: a hash value is generated for each data item in the tuples, and these hash values are used to match the data. In experiments on multiple datasets, the algorithm achieves high precision and better accuracy in duplicate detection. The Rabin-Karp duplicate detection algorithm outperforms other duplicate detection algorithms in terms of throughput and efficiency.
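The hashing-and-matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation; it rests on several assumptions. Only exact duplicates are handled. Each whole tuple is serialized into one string and hashed with a Rabin-Karp style polynomial hash, although hashing each field separately is an equally plausible reading of the abstract. "Progressive" is taken to mean reporting each duplicate pair as soon as it is found rather than after the whole dataset is processed. The names `rabin_karp_hash`, `record_key`, and `progressive_duplicates`, the base 256, the modulus 1,000,000,007, and the field separator are hypothetical choices. Because Rabin-Karp hashes can collide, records with equal hashes are confirmed by direct comparison.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical parameters: any base and large prime modulus work for a sketch.
BASE = 256
MODULUS = 1_000_000_007


def rabin_karp_hash(text: str, base: int = BASE, modulus: int = MODULUS) -> int:
    """Polynomial hash h = sum(ord(c_i) * base**(n-1-i)) mod modulus,
    computed with Horner's rule as in the Rabin-Karp algorithm."""
    h = 0
    for ch in text:
        h = (h * base + ord(ch)) % modulus
    return h


def record_key(record: Sequence[str]) -> str:
    # Assumption: fields are joined with the ASCII unit separator so that
    # ("ab", "c") and ("a", "bc") serialize to different strings.
    return "\x1f".join(record)


def progressive_duplicates(
    records: Iterable[Sequence[str]],
) -> Iterator[Tuple[int, int]]:
    """Yield (first_index, current_index) for each exact duplicate,
    immediately when it is encountered (a generator, so results are
    available before the input is exhausted)."""
    # Hash value -> list of (index, serialized record) for distinct records.
    buckets: Dict[int, List[Tuple[int, str]]] = {}
    for idx, record in enumerate(records):
        key = record_key(record)
        bucket = buckets.setdefault(rabin_karp_hash(key), [])
        # Equal hashes may be a collision, so confirm with a direct comparison.
        match = next((j for j, k in bucket if k == key), None)
        if match is not None:
            yield (match, idx)
        else:
            bucket.append((idx, key))


if __name__ == "__main__":
    rows = [
        ("Alice", "Chennai"),
        ("Bob", "Madurai"),
        ("Alice", "Chennai"),
        ("Bob", "Madurai"),
        ("Carol", "Salem"),
    ]
    for pair in progressive_duplicates(rows):
        print(pair)  # prints (0, 2) then (1, 3)
```

Because only hash buckets are kept, each incoming record is compared against the few earlier records that share its hash rather than against every earlier record. The rolling-update property of Rabin-Karp, which drops the leading character and appends a new one in constant time, is not needed when whole records are compared; it would matter only if duplicates were searched for as substrings within longer text.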

