Affinity Propagation Clustering Using Centroid-Deviation-Distance Based Similarity

  • Yifan Xie
  • Xing Wang
  • Long Zhang
  • Guoxian Yu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1072)

Abstract

Clustering is a fundamental task in data mining. Affinity propagation clustering (APC) has proven effective in various domains. APC iteratively propagates messages between samples, updates the responsibility and availability matrices, and uses these matrices to choose the centroid (or exemplar) of each cluster. However, because APC chooses actual sample points as exemplars, an exemplar may not coincide with the true centroid of the cluster it represents, so there may be some deviation between the exemplars and the true cluster centroids. As a result, samples near the decision boundary may have a relatively large similarity to the exemplar of a cluster they do not belong to, and they are easily clustered incorrectly. To mitigate this problem, we propose an improved APC based on centroid-deviation-distance similarity (APC-CDD). APC-CDD first applies k-means to all samples to find more realistic cluster centroids, and then calculates the approximate centroid deviation distance of each cluster. It then adjusts the similarity between pairwise samples by subtracting the centroid deviation distance of the clusters they belong to. Finally, it applies APC with this centroid-deviation-distance based similarity to group the samples. Our empirical study on synthetic and UCI datasets shows that APC-CDD outperforms the original APC and other related approaches.
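
For orientation, the responsibility and availability updates of the original APC can be written in the standard form given by Frey and Dueck (2007). The symbols s, r, a and c below are assumed notation, since the abstract defines none, and the block covers only the original message passing, not the APC-CDD similarity adjustment, whose exact formula the abstract does not state.

```latex
% Assumed notation (not taken from the abstract):
%   s(i,k): similarity of sample i to candidate exemplar k; s(k,k) is the preference
%   r(i,k): responsibility sent from sample i to candidate exemplar k
%   a(i,k): availability sent from candidate exemplar k to sample i
%   c_i:    exemplar finally assigned to sample i
\begin{align}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\} \\
c_i &= \arg\max_{k} \bigl\{ a(i,k) + r(i,k) \bigr\}
\end{align}
```

Because each exemplar c_i is constrained to be one of the samples, it can sit away from the true cluster centroid, which is the deviation the abstract sets out to correct through the similarity s.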

Keywords

Clustering; Affinity propagation; Decision boundary; Centroid-Deviation-Distance based similarity

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. College of Computer and Information Sciences, Southwest University, Chongqing, China
