Handling Noise and Outliers in Fuzzy Clustering

  • Christian Borgelt
  • Christian Braune
  • Marie-Jeanne Lesot
  • Rudolf Kruse
Chapter
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 326)

Abstract

Since it is an unsupervised data analysis approach, clustering relies solely on the location of the data points in the data space or, alternatively, on their relative distances or similarities. As a consequence, clustering can suffer from noisy data points and outliers, which can obscure the cluster structure in the data and thus may drive clustering algorithms to yield suboptimal or even misleading results. Fuzzy clustering is no exception in this respect, although it has a built-in degree of robustness: outliers and, more generally, data points that are atypical for the clusters in the data have a lesser influence on the cluster parameters. Starting from this aspect, we provide in this paper an overview of approaches that make fuzzy clustering less sensitive to noise and outliers, and we categorize them according to the component of standard fuzzy clustering they modify.
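The robustness aspect mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter. It assumes the standard fuzzy c-means membership formula with fuzzifier m, in which a point's membership to a cluster is proportional to d^(-2/(m-1)) and the point enters the center update with weight u^m. The function name, the one-dimensional setting, the two cluster centers, and the sample points are all hypothetical choices made for illustration.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership degrees of a 1-D point x.

    Assumes x does not coincide with any center (all distances > 0)
    and a fuzzifier m > 1.
    """
    exponent = -2.0 / (m - 1.0)
    inverse = [abs(x - c) ** exponent for c in centers]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical setup: two cluster centers on the real line.
centers = [0.0, 10.0]
m = 2.0

for label, x in [("typical", 1.0), ("outlier", 100.0)]:
    u = fcm_memberships(x, centers, m)
    # A point enters the center update of cluster i with weight u_i ** m.
    weights = [ui ** m for ui in u]
    print(label, [round(ui, 3) for ui in u], [round(w, 3) for w in weights])
```

Under these assumptions the point near the first center gets a membership close to 1 for that cluster, while the distant point gets memberships of roughly 0.45 and 0.55, so its update weights (the squared memberships) are well below 1. In hard c-means the same point would count with weight 1 toward its assigned cluster. The outlier's influence is reduced but not zero, which is the gap that the approaches surveyed in the chapter address.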

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Christian Borgelt (1)
  • Christian Braune (2)
  • Marie-Jeanne Lesot (3)
  • Rudolf Kruse (2)

  1. European Centre for Soft Computing, Edificio de Investigación, Campus Mieres, Mieres, Spain
  2. Dept. Knowledge Processing and Language Engineering, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, Magdeburg, Germany
  3. Sorbonne Universités, UPMC Univ Paris 06, Paris, France
