Entropy Based Measurement of Text Dissimilarity for Duplicate – Detection
- Venkatesh Kumar
- G. Rajendran
Abstract
The problem of identifying approximate similarity between pair of strings is an essential step for data cleansing and data integration process. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity potential duplicate. But existing system does not produce the similarity percentage between pair of strings. In this paper we propose a method using entropy and information gain (IG) to find dissimilarity between pair of strings to increase the accuracy of data.
- Full Text: PDF
- DOI:10.5539/mas.v4n9p142
This work is licensed under a Creative Commons Attribution 4.0 License.
Journal Metrics
(The data was calculated based on Google Scholar Citations)
h5-index (July 2022): N/A
h5-median(July 2022): N/A
Index
- Aerospace Database
- American International Standards Institute (AISI)
- BASE (Bielefeld Academic Search Engine)
- CAB Abstracts
- CiteFactor
- CNKI Scholar
- Elektronische Zeitschriftenbibliothek (EZB)
- Excellence in Research for Australia (ERA)
- JournalGuide
- JournalSeek
- LOCKSS
- MIAR
- NewJour
- Norwegian Centre for Research Data (NSD)
- Open J-Gate
- Polska Bibliografia Naukowa
- ResearchGate
- SHERPA/RoMEO
- Standard Periodical Directory
- Ulrich's
- Universe Digital Library
- WorldCat
- ZbMATH
Contact
- Sunny LeeEditorial Assistant
- mas@ccsenet.org