Abstract
Recently a new clustering algorithm called “affinity propagation” (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, in many cases the similarities in large scale data are not sparse. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix: a local approach, partition affinity propagation (PAP), and a global approach, landmark affinity propagation (LAP). PAP first passes messages within subsets of the data and then merges the results to serve as the initial step of the full iterations, which can effectively reduce the number of iterations needed for clustering. LAP first passes messages between landmark data points and then clusters the non-landmark data points; it is a global approximation method for speeding up the clustering of large data. Experiments are conducted on several datasets, including random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 60533090 and 60603096), the National Hi-Tech Research and Development Program (863) of China (No. 2006AA010107), the Key Technology R&D Program of China (No. 2006BAH02A13-4), the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652), and the Cultivation Fund of the Key Scientific and Technical Innovation Project of MOE, China (No. 706033)
About this article
Cite this article
Xia, Dy., Wu, F., Zhang, Xq. et al. Local and global approaches of affinity propagation clustering for large scale data. J. Zhejiang Univ. Sci. A 9, 1373–1381 (2008). https://doi.org/10.1631/jzus.A0720058
DOI: https://doi.org/10.1631/jzus.A0720058
Key words
- Clustering
- Affinity propagation
- Large scale data
- Partition affinity propagation
- Landmark affinity propagation