Knowledge and Information Systems

, Volume 51, Issue 3, pp 941–963 | Cite as

Fast affinity propagation clustering based on incomplete similarity matrix

Regular Paper

Abstract

Affinity propagation (AP) is a recently proposed clustering algorithm, which has been successful used in a lot of practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its efficiency, which has become the bottleneck when applying AP for large-scale problems. In the literature, most of the methods proposed to improve the efficiency of AP are based on implementing the message-passing on a sparse similarity matrix, while neither the decline in effectiveness nor the improvement in efficiency is theoretically analyzed. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Different from previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, then further reduced by sparseness according to k nearest neighbors. More importantly, we provide theoretical analysis, based on which the improvement of efficiency in our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP can achieve comparable clustering performances with the original AP algorithm, while the computational efficiency has been improved with a several-fold speed-up on small data sets and a dozens-of-fold on larger-scale data sets.

Keywords

Exemplar-based clustering Affinity propagation Incomplete similarity matrix Fast algorithm 

Notes

Acknowledgments

This work was partly supported by the Natural Science Foundation of China under Grant (Nos. 71171030 and 71501023) and partly by Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 71421001).

References

  1. 1.
    Barbakh W, Wu Y, Fyfe C (2009) Review of clustering algorithms. Non-standard parameter adaptation for exploratory data analysis, studies in computational intelligence, vol 249. Springer, Berlin, pp 7–28CrossRefGoogle Scholar
  2. 2.
    Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396CrossRefMATHGoogle Scholar
  3. 3.
    Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications, 2nd edn. Springer series in statistics. Springer, BerlinGoogle Scholar
  4. 4.
    Chisci L, Mavino A, Perferi G, Sciandrone M, Anile C, Colicchio G, Fuggetta F (2010) Real time epileptic seizure prediction using AR models and support vector machines. IEEE Trans Biomed Eng 57(5):1124–1132CrossRefGoogle Scholar
  5. 5.
    Drezner Z, Hamacher HW (2002) Facility location: applications and theory. Springer, BerlinCrossRefMATHGoogle Scholar
  6. 6.
    Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New YorkMATHGoogle Scholar
  7. 7.
    Dueck D (2009) Affinity propagation: clustering data by passing messages. Canadian thesis, University of TorontoGoogle Scholar
  8. 8.
  9. 9.
    Farinelli A, Denitto M, Bicego M (2011) Biclustering of expression microarray data using affinity propagation. In: Proceedings of the 6th international conference on pattern recognition in bioinformatics (PRIB’11). Springer, Berlin, pp 13–24Google Scholar
  10. 10.
    Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976MathSciNetCrossRefMATHGoogle Scholar
  11. 11.
    Frey BJ, Dueck D (2008) Response to comment on “clustering by passing messages between data points”. Science 319(5864):726CrossRefGoogle Scholar
  12. 12.
    Fujiwara Y, Irie G, Kitahara T (2011) Fast algorithm for affinity propagation. In: Proceedings of the twenty-second international joint conference on artificial intelligence (IJCAI’11). IJCAI/AAAI, Barcelona, Spain, pp 2238–2243Google Scholar
  13. 13.
    Givoni IE (2012) Beyond affinity propagation: message passing algorithms for clustering. Canadian thesis, University of TorontoGoogle Scholar
  14. 14.
    Givoni IE, Frey BJ (2009) Semi-supervised affinity propagation with instance-level constraints. In: Dyk DAV, Welling M (eds) JMLR proceedings of the 12th international conference on artificial intelligence and statistics. JMLR.org, Clearwater, FL, USA, pp 161–168Google Scholar
  15. 15.
    Givoni IE, Chung C, Frey BJ (2011) Hierarchical affinity propagation. In: 27th Conference on uncertainty in artificial intelligence (UAI’11), Barcelona, SpainGoogle Scholar
  16. 16.
    Guan R, Shi X, Marchese M, Yang C, Liang Y (2011) Text clustering with seeds ffinity propagation. IEEE Trans Knowl Data Eng 23(4):627–637CrossRefGoogle Scholar
  17. 17.
    Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, Los AltosMATHGoogle Scholar
  18. 18.
    Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666CrossRefGoogle Scholar
  19. 19.
    Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37CrossRefGoogle Scholar
  20. 20.
    Jia C, Jiang Y, Yu J (2010) Affinity propagation on identifying communities in social and biological networks. Knowledge science, engineering and management, vol 6291., Lecture notes in computer scienceSpringer, Berlin, pp 597–602CrossRefGoogle Scholar
  21. 21.
    Jia Y, Wang J, Zhang C, Hua XS (2008) Finding image exemplars using fast sparse affinity propagation. In: Proceedings of the 16th ACM international conference on multimedia (MM ’08). ACM, New York, USA, pp 639–642Google Scholar
  22. 22.
    Kariv O, Hakimi S (1979) An algorithmic approach to network location problems: the p-centers. SIAM J Appl Math 37(3):513–538MathSciNetCrossRefMATHGoogle Scholar
  23. 23.
    Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New YorkCrossRefMATHGoogle Scholar
  24. 24.
    Kazantseva A, Szpakowicz S (2011) Linear text segmentation using affinity propagation. Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 284–293Google Scholar
  25. 25.
    Kschischang FR, Frey BJ, Loeliger HA (2001) Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory 47(2):498–519MathSciNetCrossRefMATHGoogle Scholar
  26. 26.
    Lai D, Nardini C, Lu H (2011) Partitioning networks into communities by message passing. Phys Rev E 83(016):115MathSciNetGoogle Scholar
  27. 27.
    Leone M, Sumedha Weigt M (2007) Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics 23(20):2708–2715CrossRefGoogle Scholar
  28. 28.
    MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297Google Scholar
  29. 29.
    Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326CrossRefGoogle Scholar
  30. 30.
    Sun L, Guo C (2014) Incremental affinity propagation clustering based on message passing. IEEE Trans Knowl Data Eng 26(11):2731–2744CrossRefGoogle Scholar
  31. 31.
    Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In: Proceedings of the 26th annual international conference on machine learning (ICML ’09). ACM, pp 1073–1080Google Scholar
  32. 32.
    Xiao J, Wang J, Tan P, Quan L (2007) Joint affinity propagation for multiple view segmentation. In: IEEE 11th international conference on computer vision (ICCV ’07). IEEE, pp 1–7Google Scholar
  33. 33.
    Xu R, Wunsch D II (2005) Survey of clustering algorithms. IEEE Trans Neur Netw 16(3):645–678CrossRefGoogle Scholar
  34. 34.
    Zhang K, Kwok JT (2010) Clustered nystrom method for large scale manifold learning and dimension reduction. IEEE Trans Neur Netw 21(10):1576–1587CrossRefGoogle Scholar

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Leilei Sun
    • 1
  • Chonghui Guo
    • 1
  • Chuanren Liu
    • 2
  • Hui Xiong
    • 3
  1. 1.Institute of Systems EngineeringDalian University of TechnologyDalianChina
  2. 2.Decision Sciences and MIS DepartmentDrexel UniversityPhiladelphiaUSA
  3. 3.Management Science and Information Systems DepartmentRutgers UniversityNewarkUSA

Personalised recommendations