PSNM: An Algorithm for Detecting Duplicates in Oceanographic Data
This work presents a new method for identifying duplicates in surface meteorology data using the PSNM (Progressive Sorted Neighborhood Method) algorithm. Duplicate detection is the process of identifying multiple representations of the same real-world entity in the data. The method must process large ocean data sets in a short time. The PSNM algorithm makes duplicate detection more efficient, reducing execution time and delivering useful results much earlier than traditional approaches. It is observed that all possible duplicates in the data can be identified using this method. This work also proposes a new way to access the resulting (duplicate-eliminated) data, with authorization restrictions based on the type of user and their needs, and with conversion to different file formats.
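The abstract does not describe the implementation. As an illustration only, the core idea of a progressive sorted neighborhood method can be sketched as follows: sort the records once by a key, then compare neighbours in order of increasing rank distance (all pairs at distance 1, then distance 2, and so on up to the window size). Because the closest neighbours are the most likely duplicates, most duplicates are reported early, so a caller can stop at any time and still have useful results. This is a simplified sketch that omits refinements of the published algorithm such as partitioning of data sets too large for memory. The record layout `(platform_id, minutes_since_start, air_temp_c)`, the sort key, and the `same_observation` matching rule are hypothetical assumptions, not the authors' actual choices.

```python
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def psnm(
    records: Sequence[T],
    sort_key: Callable[[T], object],
    is_duplicate: Callable[[T, T], bool],
    window: int,
) -> Iterator[Tuple[T, T]]:
    """Simplified progressive sorted neighborhood sketch.

    Sorts once, then compares pairs in order of increasing rank distance
    (1, 2, ..., window - 1), yielding duplicate pairs as soon as they are found.
    """
    ordered = sorted(records, key=sort_key)
    n = len(ordered)
    for dist in range(1, window):          # nearest neighbours first
        for i in range(n - dist):
            a, b = ordered[i], ordered[i + dist]
            if is_duplicate(a, b):
                yield a, b


# Hypothetical surface-meteorology records: (platform_id, minutes_since_start, air_temp_c)
obs = [
    ("BUOY-1", 0, 28.1),
    ("BUOY-2", 0, 27.4),
    ("BUOY-1", 60, 28.3),
    ("BUOY-1", 0, 28.1),   # exact repeat of the first record
    ("BUOY-2", 1, 27.4),   # near-duplicate: same reading, one minute later
]


def same_observation(a, b):
    # Hypothetical rule: same platform, timestamps within 2 minutes,
    # and air temperatures within 0.05 degC.
    return a[0] == b[0] and abs(a[1] - b[1]) <= 2 and abs(a[2] - b[2]) <= 0.05


pairs = list(psnm(obs, sort_key=lambda r: (r[0], r[1]),
                  is_duplicate=same_observation, window=3))
# Two pairs: the repeated BUOY-1 record and the BUOY-2 near-duplicate.
print(pairs)
```

Since `psnm` is a generator, a time-constrained caller can consume only the first few pairs (for example with `next(...)` or `itertools.islice`) and still receive the most likely duplicates first.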
Keywords: Duplicate detection; PSNM; CTD; Data cleaning
This work was completed at INCOIS, Hyderabad. The authors thank the Director, ESSO-INCOIS, Hyderabad, for the encouragement and facilities provided, and the scientists for their support and guidance throughout this project and the preparation of this manuscript. We also express our gratitude to our professors at the college and to Prof. S.C. Satapathy (Head of Department), ANITS, Visakhapatnam, for his continuous support and encouragement.