Abstract
To address the difficulties of “data noise sensitivity” and “cluster center variance” in mainstream clustering algorithms, we propose a robust approach for unambiguously identifying cluster centers in data contaminated with noise; it combines the strengths of homophilic degrees and graph kernels. In-degrees exhibit a homophilic distribution when ordered by their associated sorted out-degrees, which makes it easy to separate clusters from noise. We then apply the diffusion kernel to the graph formed by the clusters to obtain a graph kernel matrix, which serves as a measure of global similarity. Based on local data densities and global similarities, the proposed approach identifies cluster centers precisely. Experiments on various synthetic and real-world databases verify the superiority of our algorithm over state-of-the-art algorithms.
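The diffusion kernel step can be illustrated with a minimal sketch. It assumes an undirected graph given as a symmetric adjacency matrix and uses the standard diffusion kernel definition K = exp(-beta * L) with graph Laplacian L = D - A. The function name, the value of beta, and the toy graph are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def diffusion_kernel(adjacency: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """Return K = exp(-beta * L) for a symmetric adjacency matrix, with L = D - A.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    A = np.asarray(adjacency, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.allclose(A, A.T):
        raise ValueError("adjacency must be symmetric (undirected graph)")
    laplacian = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential can be formed from its
    # eigendecomposition: exp(-beta * L) = V diag(exp(-beta * lambda)) V^T.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T


# Toy graph (assumed for illustration): a path 0 - 1 - 2 plus an isolated node 3.
toy_adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
)
K = diffusion_kernel(toy_adjacency, beta=0.5)
```

Entries of K act as global similarities: nodes joined by short paths receive larger values than nodes joined only by longer paths, and nodes in different connected components receive zero. A larger beta spreads similarity further along the graph.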
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China under Grant Nos. 613278050 and 61210013.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, H., Zhao, D., Cao, L., Sun, F. (2016). A Precise and Robust Clustering Approach Using Homophilic Degrees of Graph Kernel. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_21
DOI: https://doi.org/10.1007/978-3-319-31750-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science, Computer Science (R0)