An Enhanced K-means Clustering Based Outlier Detection Techniques to Improve Water Contamination Detection and Classification

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 31)

Abstract

In many data mining applications, the first step is detecting the outliers in a dataset. Outlier detection in data mining is normally based on distance-based, clustering-based or spatial methods. This paper deals with locating outliers in large, multidimensional datasets. The k-means clustering algorithm partitions the dataset into a number of clusters, and the outliers in each cluster are then found using any one of the outlier detection methods. The k-means clustering algorithm is enhanced in three ways. The first enhancement uses a different distance metric. The second and third automate the estimation of the value of ‘k’ and the selection of initial seeds in the enhanced clustering algorithm. Outliers are detected in a drinking water dataset after the clustering process is complete. The results show that classification accuracy and speed are improved and that the normalized root mean square error is reduced.
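The two-stage pipeline in the abstract (cluster the data with k-means, then look for outliers inside each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' enhanced algorithm. The abstract does not name the alternative distance metric, the k-estimation rule or the seed-selection procedure, so the sketch uses plain Euclidean distance (`math.dist`), takes `k` as a given parameter, and uses a hypothetical deterministic seeding rule, `sorted_chunk_seeds`. The per-cluster rule in `cluster_outliers` (flag a point whose distance to its centroid exceeds the cluster's mean distance plus `t` standard deviations, with `t = 2`) is likewise an illustrative choice.

```python
import math
from statistics import mean, pstdev


def sorted_chunk_seeds(points, k):
    """Hypothetical deterministic seeding: sort points by coordinate sum, split the
    sorted list into k contiguous chunks and use each chunk's mean as a seed."""
    order = sorted(points, key=sum)
    n = len(order)
    chunks = [order[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [[mean(col) for col in zip(*chunk)] for chunk in chunks]


def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means with Euclidean distance: assign every point to its
    nearest centroid, move each centroid to its members' mean, and stop when
    the assignment no longer changes."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = sorted_chunk_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


def cluster_outliers(points, labels, centroids, t=2.0):
    """Return indices of points whose distance to their own centroid exceeds
    mean + t * (population) standard deviation of that cluster's distances.
    Clusters with fewer than two members are skipped."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for j in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < 2:
            continue
        d = [dists[i] for i in idx]
        cutoff = mean(d) + t * pstdev(d)
        flagged.extend(i for i in idx if dists[i] > cutoff)
    return sorted(flagged)


if __name__ == "__main__":
    group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2)]
    group_b = [(x + 10, y + 10) for x, y in group_a]
    data = group_a + group_b + [(4, 4)]  # (4, 4) is the stray point
    labels, centroids = kmeans(data, k=2)
    print("labels:", labels)
    print("outlier indices:", cluster_outliers(data, labels, centroids))
```

In the demo, k-means absorbs the stray point at (4, 4) into the nearer cluster, and the per-cluster distance rule then flags it (index 14) while leaving the other points alone. Substituting a different metric for `math.dist` would also call for a matching centroid update; for Manhattan distance, for example, the coordinate-wise median rather than the mean minimizes the within-cluster cost.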

Keywords

K-means; Similarity matrix; Dissimilarity coefficient; Fixed-width clustering; Distance-based; Density-based

Notes

Acknowledgments

The authors express their gratitude to the TWAD Board for its wholehearted support in providing the dataset for this research.

The authors also express their gratitude to the Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamil Nadu, India, for supporting the progress of this research work.


Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India
