
Unsupervised Niche Clustering: Discovering an Unknown Number of Clusters in Noisy Data Sets

Chapter in: Evolutionary Computation in Data Mining

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 163)

Abstract

As a valuable unsupervised learning tool, clustering is crucial to many applications in pattern recognition, machine learning, and data mining. Evolutionary techniques have been used successfully as global searchers in difficult problems, particularly in the optimization of non-differentiable functions, and hence they have the potential to improve clustering. However, existing evolutionary clustering techniques suffer from one or more of the following shortcomings: (i) they are not robust in the presence of noise, (ii) they assume a known number of clusters, and (iii) the size of the search space grows exponentially with the number of clusters or with the number of data points. We present a robust clustering algorithm, called the Unsupervised Niche Clustering algorithm (UNC), that overcomes all of the above difficulties. UNC can successfully find dense areas (clusters) in feature space and determine the number of clusters automatically. The clustering problem is converted into a multimodal function optimization problem within the context of genetic niching. Robust cluster scales are estimated dynamically using a hybrid learning scheme coupled with the genetic optimization of the cluster centers, so that the algorithm adapts to clusters of different sizes and to different noise contamination rates. Genetic optimization enables our approach to handle data with both numeric and qualitative attributes, and general subjective, non-metric, and even non-differentiable dissimilarity measures.
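The abstract describes UNC only at a high level. The Python sketch below illustrates the general idea under stated assumptions: candidate cluster centers are scored by a density-style fitness, each candidate's scale is refined by a local (non-genetic) update alongside the genetic search, and a niching replacement rule lets several density peaks survive at once. The Gaussian weight, the fitness Σw/σ², the weighted-mean scale update with a floor, and the use of deterministic crowding as the niching scheme are all assumptions chosen for illustration, not equations taken from the chapter; the function names are hypothetical.

```python
import numpy as np

def robust_weights(X, center, sigma2):
    """Gaussian weight of each point for one candidate cluster (assumed form)."""
    d2 = np.sum((X - center) ** 2, axis=1)  # squared Euclidean distances
    return np.exp(-d2 / (2.0 * sigma2)), d2

def density_fitness(X, center, sigma2):
    """Hypothetical density fitness: total weight divided by the squared scale."""
    w, _ = robust_weights(X, center, sigma2)
    return float(w.sum() / sigma2)

def update_scale(X, center, sigma2, floor=0.01):
    """Local scale step (assumed): sigma^2 <- weighted mean squared distance.

    The floor stops a candidate sitting on an isolated noise point from
    shrinking its scale to zero and gaining unbounded fitness.
    """
    w, d2 = robust_weights(X, center, sigma2)
    total = w.sum()
    if total == 0.0:  # candidate far from every point: keep the old scale
        return sigma2
    return max(float(np.dot(w, d2) / total), floor)

def crowding_replace(X, parents, children, scales):
    """Deterministic crowding: each child competes only with its nearest parent.

    parents, children: pairs of candidate centers; scales: the parents' sigma^2,
    which the child paired with that parent inherits for the comparison.
    """
    p1, p2 = parents
    c1, c2 = children
    # Pair children with parents so that the total displacement is smallest.
    straight = np.linalg.norm(p1 - c1) + np.linalg.norm(p2 - c2)
    crossed = np.linalg.norm(p1 - c2) + np.linalg.norm(p2 - c1)
    if crossed < straight:
        c1, c2 = c2, c1
    survivors = []
    for parent, child, s in zip((p1, p2), (c1, c2), scales):
        better = density_fitness(X, child, s) > density_fitness(X, parent, s)
        survivors.append(child if better else parent)
    return survivors

def converge_scale(X, center, sigma2=1.0, steps=20):
    """Repeat the local scale step for a fixed candidate center."""
    for _ in range(steps):
        sigma2 = update_scale(X, center, sigma2)
    return sigma2

# Synthetic data: one dense cluster at the origin plus uniform background noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.uniform(-4.0, 4.0, size=(40, 2))])

s_dense = converge_scale(X, np.zeros(2))
s_noise = converge_scale(X, np.array([3.5, -3.5]))
f_dense = density_fitness(X, np.zeros(2), s_dense)
f_noise = density_fitness(X, np.array([3.5, -3.5]), s_noise)

survivors = crowding_replace(
    X,
    parents=(np.array([1.5, 0.0]), np.array([3.5, -3.5])),
    children=(np.zeros(2), np.array([10.0, 10.0])),
    scales=(s_dense, s_noise),
)
```

In this sketch the candidate at the dense cluster settles near the cluster's true spread and scores far higher than a candidate in the noise, and in the crowding step the child that lands on the cluster replaces its parent while the child that lands in empty space is rejected. A full niching run would wrap these pieces in a generational loop (random pairing, crossover, mutation, scale refresh) and keep the surviving fitness peaks as cluster centers, so the number of clusters would follow from how many peaks survive rather than being fixed in advance.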




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nasraoui, O., Leon, E., Krishnapuram, R. (2005). Unsupervised Niche Clustering: Discovering an Unknown Number of Clusters in Noisy Data Sets. In: Ghosh, A., Jain, L.C. (eds) Evolutionary Computation in Data Mining. Studies in Fuzziness and Soft Computing, vol 163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32358-9_8


  • DOI: https://doi.org/10.1007/3-540-32358-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22370-2

  • Online ISBN: 978-3-540-32358-7

  • eBook Packages: Engineering, Engineering (R0)
