
Medoid-Shift for Noise Removal to Improve Clustering

  • Pasi Fränti
  • Jiawei Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)

Abstract

We propose to use medoid-shift to reduce the noise in data prior to clustering. The method processes every point by calculating its k-nearest neighbors (k-NN), and then replacing the point by the medoid of its neighborhood. The process can be iterated. After the data cleaning process, any clustering algorithm can be applied that is suitable for the data.
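The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `medoid_shift`, the parameters `k` and `iterations`, the use of Euclidean distance, and the choice to count a point as a member of its own neighborhood are all assumptions made for the example.

```python
import numpy as np

def medoid_shift(points, k=3, iterations=1):
    """Replace every point by the medoid of its k-NN neighborhood.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    data = np.asarray(points, dtype=float)
    for _ in range(iterations):
        # Full pairwise Euclidean distance matrix (O(n^2) memory, fine for a sketch).
        diff = data[:, None, :] - data[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Assumption: the neighborhood is the point itself plus its k nearest neighbors.
        hood = np.argsort(dist, axis=1, kind="stable")[:, :k + 1]
        shifted = np.empty_like(data)
        for i, members in enumerate(hood):
            # The medoid is the member with the smallest total distance to the others.
            sub = dist[np.ix_(members, members)]
            medoid = members[sub.sum(axis=1).argmin()]
            shifted[i] = data[medoid]
        # All points are updated together before the next iteration.
        data = shifted
    return data

if __name__ == "__main__":
    noisy = [[0.0], [1.0], [2.0], [10.0]]  # the last point is an outlier
    print(medoid_shift(noisy, k=2))
```

Because a medoid is always one of the original points, the cleaned data stays within the input set, and the result can be passed to any clustering algorithm that suits the data.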

Keywords

Clustering · Noise removal · Outlier detection

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computing, University of Eastern Finland, Joensuu, Finland
