Abstract
As a valuable unsupervised learning tool, clustering is crucial to many applications in pattern recognition, machine learning, and data mining. Evolutionary techniques have been used successfully as global search methods for difficult problems, particularly the optimization of non-differentiable functions, which makes them promising tools for clustering. However, existing evolutionary clustering techniques suffer from one or more of the following shortcomings: (i) they are not robust in the presence of noise, (ii) they assume a known number of clusters, and (iii) the size of the search space grows exponentially with the number of clusters or with the number of data points. We present a robust clustering algorithm, called the Unsupervised Niche Clustering algorithm (UNC), that overcomes all of these difficulties. UNC finds dense areas (clusters) in feature space and determines the number of clusters automatically. To do so, it converts the clustering problem into a multimodal function optimization problem within the context of genetic niching. Robust estimates of cluster scale are computed dynamically by a hybrid learning scheme coupled with the genetic optimization of the cluster centers, which lets UNC adapt to clusters of different sizes and to different noise contamination rates. Genetic optimization also enables the approach to handle data with both numeric and qualitative attributes, and to use general, subjective, non-metric, and even non-differentiable dissimilarity measures.
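The abstract says UNC recasts clustering as multimodal optimization under genetic niching, with robust scales estimated alongside the evolved centers, but it does not give the formulas. The sketch below is therefore not the chapter's algorithm. It only illustrates the general idea under stated assumptions: a Gaussian-weighted density fitness for a candidate center, a weighted-mean fixed-point update for that center's scale, and classic Goldberg and Richardson fitness sharing as a stand-in niching step. The names `scale_and_density` and `shared_fitness`, the parameters `sigma2`, `iters`, `sigma2_min`, `sigma_share`, and `alpha`, and the toy data are all hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance; a non-metric dissimilarity could be swapped in."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scale_and_density(center, data, sigma2=1.0, iters=30, sigma2_min=1e-6):
    """Return (sigma2, fitness) for one candidate cluster center (assumed scheme).

    Alternates Gaussian weights w_j = exp(-d_j^2 / (2 * sigma2)) with the
    weighted-mean update sigma2 = sum(w_j * d_j^2) / sum(w_j). Points that are
    far from the center relative to the current scale get near-zero weight,
    so noise and other clusters barely inflate the scale of a compact cluster.
    The fitness sum(w_j) / sigma2 rewards dense, compact regions.
    """
    d2 = [sq_dist(center, x) for x in data]
    for _ in range(iters):
        w = [math.exp(-d / (2.0 * sigma2)) for d in d2]
        total = sum(w)
        if total == 0.0:  # every weight underflowed: center is far from all data
            return sigma2, 0.0
        sigma2 = max(sum(wi * d for wi, d in zip(w, d2)) / total, sigma2_min)
    w = [math.exp(-d / (2.0 * sigma2)) for d in d2]
    return sigma2, sum(w) / sigma2


def shared_fitness(population, raw, sigma_share=3.0, alpha=1.0):
    """Fitness sharing (Goldberg and Richardson): divide by the niche count.

    Candidates crowding the same peak split its fitness, so a population can
    keep one sub-population per cluster instead of converging to a single one.
    """
    shared = []
    for ci, fi in zip(population, raw):
        niche = 0.0
        for cj in population:  # includes ci itself, so niche >= 1
            d = math.sqrt(sq_dist(ci, cj))
            if d < sigma_share:
                niche += 1.0 - (d / sigma_share) ** alpha
        shared.append(fi / niche)
    return shared


# Hypothetical toy data: two compact 3x3 grids plus three scattered noise points.
cluster_a = [(0.5 * i, 0.5 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
cluster_b = [(10 + 0.5 * i, 10 + 0.5 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
noise = [(5.0, 0.0), (0.0, 5.0), (20.0, -5.0)]
data = cluster_a + cluster_b + noise

# Candidate centers: two in cluster A's niche, one at cluster B, one in empty space.
population = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (5.0, 5.0)]
raw = [scale_and_density(c, data)[1] for c in population]
shared = shared_fitness(population, raw)
for c, f, s in zip(population, raw, shared):
    print(c, round(f, 2), round(s, 2))
```

In this toy run, centers sitting on either cluster score far higher than the center in empty space, so the fitness landscape has one peak per cluster; sharing then halves (roughly) the fitness of the two candidates crowding cluster A's peak while leaving the lone candidate at cluster B untouched. A real niching GA would apply selection, crossover, and mutation on top of such a fitness; how UNC itself does this is described in the chapter, not here.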
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nasraoui, O., Leon, E., Krishnapuram, R. (2005). Unsupervised Niche Clustering: Discovering an Unknown Number of Clusters in Noisy Data Sets. In: Ghosh, A., Jain, L.C. (eds) Evolutionary Computation in Data Mining. Studies in Fuzziness and Soft Computing, vol 163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32358-9_8
DOI: https://doi.org/10.1007/3-540-32358-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22370-2
Online ISBN: 978-3-540-32358-7
eBook Packages: Engineering, Engineering (R0)