Abstract
In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method “Fuzzy-RW,” outperforms other frequently used algorithms in data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.
Similar content being viewed by others
Notes
As it is usually the case in data clustering applications, the employed distance measures do not have to necessarily satisfy the properties of a metric (see, e.g., Chen and Zhang 2004).
Data clustering on linear manifolds, or affine spaces was first introduced by Bock (1974). Adopting the terminology of Haralick and Harpaz (2007), we say that \(L\) is a linear manifold in a vector space \(V\) if for some vector subspace \(S\) of \(V\) and some translation \(t\in V,\,L=\{t+s \mid s\in S\}.\)
References
Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, Basel
Alamgir M, von Luxburg U (2011) Phase transition in the family of p-resistances. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger K (eds) Advances in neural information processing systems (NIPS), vol 24. http://books.nips.cc/papers/files/nips24/NIPS2011_0278.pdf
Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1
Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 16:1373–1396
Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17
Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10: 191–203
Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214
Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen (in German)
Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34
Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New York
Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053
Cao F, Delon J, Desolneux A, Museé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119
Cao F, Lisani J-L, Morel J-M, Museé P, Sur F (2008) A theory of shape identification. Springer, Berlin
Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330
Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916
Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence
Coifman R, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30
Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417
Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New York
Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92
Fu L, Medico E (2007) FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinforma 8(3). doi:10.11861471-2105-8-3
Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia
Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660
Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141
Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684
Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744
He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
Higham D, Kalna G, Kibble M (2007) Spectral clustering and its use in bioinformatics. J Comput Appl Math 204(1):25–37
Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666
Kimmel R, Sapiro G (2003) The mathematics of face recognition. SIAM News 36(3). http://www.siam.org/news/news.php?id=309
Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New York
Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698
Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593
Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258
Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297
Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine Learning, pp 625–632
Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin
Muller N, Magaia L, Herbst B (2004) Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Rev 46(3):518–545
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856
Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained Clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Image Segm 22(8):888–905
Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895
Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., Chichester
Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670
Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
von Luxburg U, Radl A, Hein M (2010) Getting lost in space: large sample analysis of the commute distance. In: Lafferty J, Williams CKI, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in neural information processing systems (NIPS), vol 23. http://books.nips.cc/papers/files/nips23/NIPS2010_0929.pdf
Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) In: Proceedings of the 13th European symposium on artificial, neural networks (ESANN), pp 317–324
Acknowledgments
The research of SL has been supported in part by an Alberta Wolfe Research Fellowship from the Iowa State University Mathematics department. The research of AM has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under Grant DMS 0931642. The research of SS is supported in part by NSF DMS-1159026.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
The procedure commonly employed in the literature for minimizing the FCM functional (6) is an alternating directions scheme, originally proposed by Bezdek et al. (1984). For completeness, we provide a listing of the algorithm below. More details can be found in Gan et al. (2007) and Bezdek et al. (1984), among others.
The convergence criterion in line \(6\) is usually chosen to be of the form \(||U^{(r)}-U^{(r-1)}||<\varepsilon \) for some pre-specified \(\varepsilon >0\) (Gan et al. 2007). Here, \(U^{(r)}\) and \(U^{(r-1)}\) denote the values of the fuzzy membership matrix \(U=(u_{ij})_{i\le k, j\le n}\) in the \(r\) and \(r-1\) iteration of the loop, respectively.
Now, the \(k\) clusters of data points may be decided in terms of thresholding with respect to the membership matrix, or sometimes data points can be assigned to clusters based on their maximal membership values.
Rights and permissions
About this article
Cite this article
Liu, S., Matzavinos, A. & Sethuraman, S. Random walk distances in data clustering and applications. Adv Data Anal Classif 7, 83–108 (2013). https://doi.org/10.1007/s11634-013-0125-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11634-013-0125-7
Keywords
- Spectral clustering
- Fuzzy clustering methods
- Random walks
- Graph Laplacian
- Mahalanobis
- Face identification