Random walk distances in data clustering and applications

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method, “Fuzzy-RW,” outperforms other frequently used algorithms on data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.

Notes

  1. For further discussion of the emerging role of data geometry in the development of data clustering algorithms, see, e.g., Chen and Lerman (2009), Haralick and Harpaz (2007), and Coifman and Lafon (2006).

  2. As is usually the case in data clustering applications, the employed distance measures do not necessarily have to satisfy the properties of a metric (see, e.g., Chen and Zhang 2004).

  3. Data clustering on linear manifolds, or affine spaces, was first introduced by Bock (1974). Adopting the terminology of Haralick and Harpaz (2007), we say that \(L\) is a linear manifold in a vector space \(V\) if, for some vector subspace \(S\) of \(V\) and some translation \(t\in V\), \(L=\{t+s \mid s\in S\}\).
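
As a small illustration of the definition in Note 3, the following worked example uses a concrete choice of \(V\), \(S\), and \(t\). These particular choices are assumptions made only for illustration and do not come from the article.

```latex
% Illustrative choice (not from the article): V = R^2, S = the horizontal axis.
\[
  V=\mathbb{R}^2,\qquad
  S=\{(a,0)\mid a\in\mathbb{R}\},\qquad
  t=(0,1).
\]
% The linear manifold is S shifted by t, i.e. the horizontal line y = 1.
\[
  L=\{t+s\mid s\in S\}=\{(a,1)\mid a\in\mathbb{R}\}.
\]
% L does not contain the origin, so it is not a vector subspace of V;
% choosing t = (0,0) instead gives L = S.
```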

References

  • Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, Basel

  • Alamgir M, von Luxburg U (2011) Phase transition in the family of p-resistances. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger K (eds) Advances in neural information processing systems (NIPS), vol 24. http://books.nips.cc/papers/files/nips24/NIPS2011_0278.pdf

  • Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1

  • Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  • Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

  • Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17

  • Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10:191–203

  • Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214

  • Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoeck & Ruprecht, Göttingen (in German)

  • Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34

  • Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New York

  • Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053

  • Cao F, Delon J, Desolneux A, Musé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119

  • Cao F, Lisani J-L, Morel J-M, Musé P, Sur F (2008) A theory of shape identification. Springer, Berlin

  • Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330

  • Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916

  • Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence

  • Coifman R, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30

  • Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417

  • Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New York

  • Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92

  • Fu L, Medico E (2007) FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinforma 8(3). doi:10.1186/1471-2105-8-3

  • Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia

  • Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660

  • Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141

  • Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684

  • Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744

  • He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340

  • Higham D, Kalna G, Kibble M (2007) Spectral clustering and its use in bioinformatics. J Comput Appl Math 204(1):25–37

  • Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666

  • Kimmel R, Sapiro G (2003) The mathematics of face recognition. SIAM News 36(3). http://www.siam.org/news/news.php?id=309

  • Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New York

  • Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698

  • Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593

  • Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258

  • MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297

  • Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine learning, pp 625–632

  • Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin

  • Muller N, Magaia L, Herbst B (2004) Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Rev 46(3):518–545

  • Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856

  • Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58

  • Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  • Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895

  • Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., Chichester

  • Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670

  • Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  • von Luxburg U, Radl A, Hein M (2010) Getting lost in space: large sample analysis of the commute distance. In: Lafferty J, Williams CKI, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in neural information processing systems (NIPS), vol 23. http://books.nips.cc/papers/files/nips23/NIPS2010_0929.pdf

  • Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) Proceedings of the 13th European symposium on artificial neural networks (ESANN), pp 317–324

Acknowledgments

The research of SL has been supported in part by an Alberta Wolfe Research Fellowship from the Iowa State University Mathematics department. The research of AM has been supported in part by the Mathematical Biosciences Institute and the National Science Foundation under Grant DMS 0931642. The research of SS is supported in part by NSF DMS-1159026.

Author information

Corresponding author

Correspondence to Anastasios Matzavinos.

Appendix

The procedure commonly employed in the literature for minimizing the FCM functional (6) is an alternating directions scheme, originally proposed by Bezdek et al. (1984). For completeness, we provide a listing of the algorithm below. More details can be found in Gan et al. (2007) and Bezdek et al. (1984), among others.

[Figure a1: listing of the alternating minimization algorithm for the FCM functional (6)]

The convergence criterion in line \(6\) of the listing is usually chosen to be of the form \(\Vert U^{(r)}-U^{(r-1)}\Vert <\varepsilon \) for some pre-specified \(\varepsilon >0\) (Gan et al. 2007). Here, \(U^{(r)}\) and \(U^{(r-1)}\) denote the values of the fuzzy membership matrix \(U=(u_{ij})_{i\le k, j\le n}\) at iterations \(r\) and \(r-1\) of the loop, respectively.
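
The alternating scheme and its stopping rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the article's listing: the functional (6) is not reproduced in this section, so the code assumes the standard fuzzy c-means objective \(\sum_{i}\sum_{j}u_{ij}^{m}\Vert x_j-c_i\Vert ^2\) with squared Euclidean distance and fuzzifier \(m>1\), and it uses the Frobenius norm in the convergence test. The function name `fcm` and the parameters `m`, `eps`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random


def fcm(points, k, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Alternating minimization for the standard FCM objective (assumed form).

    points: list of n coordinate tuples of equal length
    Returns (U, centers) with U[i][j] = membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; every column is normalized to sum to 1.
    U = [[rng.random() + 1e-12 for _ in range(n)] for _ in range(k)]
    for j in range(n):
        col = sum(U[i][j] for i in range(k))
        for i in range(k):
            U[i][j] /= col

    centers = []
    for _ in range(max_iter):
        # Step 1: with U fixed, each center is a weighted mean of the data.
        centers = []
        for i in range(k):
            w = [U[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total
                for d in range(dim)))

        # Step 2: with the centers fixed, update the memberships.
        U_new = [[0.0] * n for _ in range(k)]
        for j in range(n):
            d2 = [sum((points[j][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            zeros = [i for i in range(k) if d2[i] == 0.0]
            if zeros:
                # The point coincides with one or more centers.
                for i in zeros:
                    U_new[i][j] = 1.0 / len(zeros)
            else:
                inv = [d2[i] ** (-1.0 / (m - 1.0)) for i in range(k)]
                s = sum(inv)
                for i in range(k):
                    U_new[i][j] = inv[i] / s

        # Stopping rule ||U^(r) - U^(r-1)|| < eps (Frobenius norm assumed).
        diff = math.sqrt(sum((U_new[i][j] - U[i][j]) ** 2
                             for i in range(k) for j in range(n)))
        U = U_new
        if diff < eps:
            break
    return U, centers


# Toy data (hypothetical): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
U, centers = fcm(data, k=2)
```

Each column of the returned `U` sums to one, so it can be read as a soft assignment of the corresponding data point to the \(k\) clusters. Replacing the squared Euclidean distance in Step 2 with another dissimilarity changes the center update as well, so that substitution is not a drop-in change.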

The \(k\) clusters of data points can then be determined either by thresholding the entries of the membership matrix or by assigning each data point to the cluster in which its membership value is maximal.
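
Both ways of turning a membership matrix into clusters can be sketched as follows. This is an illustrative sketch only: the function names `harden_by_max` and `harden_by_threshold`, the threshold parameter `tau`, and the example matrix are hypothetical and do not come from the article. The matrix is stored with rows indexed by clusters and columns by data points, matching \(U=(u_{ij})_{i\le k, j\le n}\).

```python
def harden_by_max(U):
    """Assign each point j to the cluster i with the largest U[i][j]."""
    k, n = len(U), len(U[0])
    return [max(range(k), key=lambda i: U[i][j]) for j in range(n)]


def harden_by_threshold(U, tau=0.5):
    """For each cluster, list the points whose membership is at least tau.

    Depending on tau, a point may fall into several clusters or into none.
    """
    return [[j for j, u in enumerate(row) if u >= tau] for row in U]


# Hypothetical membership matrix: 2 clusters (rows), 4 points (columns).
U = [[0.9, 0.6, 0.2, 0.45],
     [0.1, 0.4, 0.8, 0.55]]

print(harden_by_max(U))              # [0, 0, 1, 1]
print(harden_by_threshold(U, 0.6))   # [[0, 1], [2]]
```

In the thresholded output, the last point is left unassigned because neither of its memberships reaches `tau`, which is one practical difference between the two rules.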

About this article

Cite this article

Liu, S., Matzavinos, A. & Sethuraman, S. Random walk distances in data clustering and applications. Adv Data Anal Classif 7, 83–108 (2013). https://doi.org/10.1007/s11634-013-0125-7

  • DOI: https://doi.org/10.1007/s11634-013-0125-7
