Random walk distances in data clustering and applications

  • Sijia Liu
  • Anastasios Matzavinos
  • Sunder Sethuraman
Regular Article


In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method, “Fuzzy-RW,” outperforms other frequently used algorithms on data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.


Spectral clustering · Fuzzy clustering methods · Random walks · Graph Laplacian · Mahalanobis · Face identification

Mathematics Subject Classification (2000)

60J20 · 62H30



The research of SL has been supported in part by an Alberta Wolfe Research Fellowship from the Iowa State University Mathematics department. The research of AM has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under Grant DMS 0931642. The research of SS is supported in part by NSF DMS-1159026.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sijia Liu (1)
  • Anastasios Matzavinos (1, corresponding author)
  • Sunder Sethuraman (2)
  1. Department of Mathematics, Iowa State University, Ames, USA
  2. Department of Mathematics, University of Arizona, Tucson, USA
