An Enhanced K-means Clustering Based Outlier Detection Techniques to Improve Water Contamination Detection and Classification
In many data mining applications, detecting the outliers in a dataset is a primary step. Outlier detection in data mining typically relies on distance-based, clustering-based, or spatial methods. This paper addresses locating outliers in large, multidimensional datasets. The k-means clustering algorithm first partitions a dataset into clusters, and the outliers in each cluster are then identified using one of the outlier detection methods. The k-means algorithm is enhanced in three ways: the first uses a different distance metric, and the second and third automate the estimation of the number of clusters, k, and the selection of the initial seeds within the enhanced clustering algorithm. After clustering, outliers are detected in the drinking water dataset. The results show that classification accuracy and speed improve and that the normalized root mean square error decreases.
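The cluster-then-detect pipeline described above can be sketched as follows. This is a minimal, non-authoritative illustration, not the authors' enhanced algorithm: k is passed in by hand rather than estimated automatically, the deterministic seeding (`quantile_seeds`) is a hypothetical stand-in for the paper's seed selection, and the outlier rule (distance to the point's own centroid greater than `factor` times the cluster's median distance) is an assumed example of a distance-based method. The pluggable `dist` argument mirrors the idea of swapping the distance metric; `manhattan` is just one possible alternative.

```python
import math
import statistics
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    # Alternative metric for illustration; not necessarily the paper's metric.
    return sum(abs(x - y) for x, y in zip(a, b))


def quantile_seeds(data: List[Vector], k: int) -> List[List[float]]:
    # Hypothetical deterministic seeding (assumption, not the paper's method):
    # sort points by coordinate sum and take the middle point of each of k
    # equal slices, so extreme values (likely outliers) do not become seeds.
    ordered = sorted(data, key=sum)
    n = len(ordered)
    return [list(ordered[(2 * i + 1) * n // (2 * k)]) for i in range(k)]


def kmeans(data: List[Vector], k: int, dist: Metric = euclidean, max_iter: int = 100):
    centroids = quantile_seeds(data, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments are stable, so the centroids have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_outliers(data, labels, centroids, dist: Metric = euclidean, factor: float = 3.0):
    # Hypothetical distance-based rule applied per cluster: flag a point whose
    # distance to its own centroid exceeds `factor` x the cluster's median distance.
    dists = [dist(p, centroids[lab]) for p, lab in zip(data, labels)]
    outliers: List[int] = []
    for j in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        cutoff = factor * statistics.median([dists[i] for i in idx])
        outliers.extend(i for i in idx if dists[i] > cutoff)
    return sorted(outliers)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (30, 30)]
    centroids, labels = kmeans(points, 2)
    print(labels)                                       # [0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(cluster_outliers(points, labels, centroids))  # [8]
```

On this toy data the point (30, 30) is absorbed into the second cluster and pulls its centroid to (14.4, 14.4), but its distance to that centroid is still well above three times the cluster's median distance, so it is flagged. Using the median rather than the mean keeps the threshold from being inflated by the outlier itself.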
Keywords: K-means · Similarity matrix · Dissimilarity coefficient · Fixed-width clustering · Distance-based · Density-based
The authors express their gratitude to the TWAD Board for its wholehearted support in providing the dataset for this research.
The authors express their gratitude to the Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamil Nadu, India, for supporting the progress of this research work.