Local and global approaches of affinity propagation clustering for large scale data

Journal of Zhejiang University-SCIENCE A

Abstract

Recently, a new clustering algorithm called “affinity propagation” (AP) was proposed, which efficiently clusters sparsely related data by passing messages between data points. In many cases, however, the large scale data we want to cluster have similarities that are not sparse. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix: a local approach, partition affinity propagation (PAP), and a global approach, landmark affinity propagation (LAP). PAP first passes messages within subsets of the data and then merges the results as the initial step of the iterations, which effectively reduces the number of iterations needed for clustering. LAP first passes messages between the landmark data points and then clusters the non-landmark data points; it is a global approximation method that speeds up clustering of large scale data. Experiments on many datasets, including random data points, manifold subspaces, images of faces, and Chinese calligraphy, demonstrate that the two approaches are feasible and practicable.
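To make the message passing concrete, the following is a minimal Python sketch, not the authors' implementation. `affinity_propagation` follows the standard responsibility and availability updates of Frey and Dueck (2007) on a dense similarity matrix. `landmark_ap` is one plausible reading of the LAP idea summarized above (cluster the landmarks, then attach each remaining point to its most similar exemplar); the paper's actual procedure may differ. The function names, the 1-D toy data, the negative squared distance similarity, the median preference, and the damping value are all assumptions made for illustration.

```python
from statistics import median


def similarity_matrix(points):
    """Dense similarities for 1-D points: negative squared distance (assumed)."""
    n = len(points)
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    off_diag = [S[i][k] for i in range(n) for k in range(n) if i != k]
    pref = median(off_diag)  # shared preference: a common heuristic, assumed here
    for k in range(n):
        S[k][k] = pref
    return S


def affinity_propagation(S, damping=0.5, iters=200):
    """Return, for each point, the index of its exemplar (requires n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = total  # self-availability a(k,k)
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback so the sketch always returns a labelling
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def landmark_ap(points, landmark_idx):
    """Hypothetical landmark variant: AP on landmarks, then nearest exemplar."""
    sub = [points[j] for j in landmark_idx]
    sub_labels = affinity_propagation(similarity_matrix(sub))
    # Map exemplar positions in the landmark subset back to original indices.
    labels = {landmark_idx[j]: landmark_idx[e] for j, e in enumerate(sub_labels)}
    exemplars = sorted(set(labels.values()))
    for i, x in enumerate(points):
        if i not in labels:  # non-landmark point: pick the most similar exemplar
            labels[i] = max(exemplars, key=lambda e: -(x - points[e]) ** 2)
    return [labels[i] for i in range(len(points))]


points = [0.0, 1.0, 2.0, 0.5, 1.5, 6.0, 7.0, 8.0, 6.5, 7.5]
labels = landmark_ap(points, landmark_idx=[0, 1, 2, 5, 6, 7])
print(labels)  # one exemplar index per point
```

In this sketch the message-passing loop touches only the landmark-by-landmark similarity matrix, so its cost depends on the number of landmarks rather than on the full data size; each non-landmark point then needs only one comparison per exemplar.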

Author information

Corresponding author

Correspondence to Fei Wu.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 60533090 and 60603096), the National Hi-Tech Research and Development Program (863) of China (No. 2006AA010107), the Key Technology R&D Program of China (No. 2006BAH02A13-4), the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652), and the Cultivation Fund of the Key Scientific and Technical Innovation Project of MOE, China (No. 706033).

About this article

Cite this article

Xia, Dy., Wu, F., Zhang, Xq. et al. Local and global approaches of affinity propagation clustering for large scale data. J. Zhejiang Univ. Sci. A 9, 1373–1381 (2008). https://doi.org/10.1631/jzus.A0720058

  • DOI: https://doi.org/10.1631/jzus.A0720058
