Computing, Volume 99, Issue 9, pp 813–840

Weighted clustering of attributed multi-graphs

  • Andreas Papadopoulos
  • George Pallis
  • Marios D. Dikaiakos

Abstract

An information network modeled as an attributed multi-graph contains objects described by heterogeneous attributes and connected by multiple types of edges. In this paper we study the problem of identifying groups of related objects, namely clusters, in an attributed multi-graph. It is a challenging task since a good balance between the structural and attribute properties of the objects must be achieved, while each edge-type and each attribute contains different information and is of different importance to the clustering task. We propose a unified distance measure for attributed multi-graphs which is the first to consider simultaneously the individual importance of each object property, i.e. attribute and edge-type, as well as the balance between the sets of attributes and edges. Based on this, we design an iterative parallelizable algorithm for CLustering Attributed Multi-graPhs called CLAMP, which automatically balances the structural and attribute properties of the vertices, and clusters the network such that objects in the same cluster are characterized by similar attributes and connections. Extensive experimentation on synthetic and real-world datasets demonstrates the superiority of the proposed approach over several state-of-the-art clustering methods.
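The abstract does not give the distance formula itself, so the following is only a minimal sketch of the general idea, not CLAMP's actual measure. It assumes that a per-attribute and a per-edge-type distance between two objects are already available, and that they are combined as a convex combination. The function name `weighted_distance`, its parameters, and the `balance` factor are all hypothetical names introduced for illustration.

```python
from typing import Sequence


def weighted_distance(
    attr_dists: Sequence[float],
    edge_dists: Sequence[float],
    attr_weights: Sequence[float],
    edge_weights: Sequence[float],
    balance: float,
) -> float:
    """Illustrative weighted distance between two objects (hypothetical).

    attr_dists[i]   distance of the two objects on attribute i
    edge_dists[j]   distance of the two objects on edge-type j
    attr_weights    importance of each attribute (assumed to sum to 1)
    edge_weights    importance of each edge-type (assumed to sum to 1)
    balance         share given to the attribute part, in [0, 1];
                    the structural part receives 1 - balance
    """
    if len(attr_dists) != len(attr_weights):
        raise ValueError("one weight is needed per attribute")
    if len(edge_dists) != len(edge_weights):
        raise ValueError("one weight is needed per edge-type")
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")

    # Weighted sum over individual attributes.
    attr_part = sum(w * d for w, d in zip(attr_weights, attr_dists))
    # Weighted sum over individual edge-types.
    edge_part = sum(w * d for w, d in zip(edge_weights, edge_dists))
    # Balance between the attribute set and the edge set.
    return balance * attr_part + (1.0 - balance) * edge_part
```

For example, `weighted_distance([0.2, 0.8], [0.5], [0.5, 0.5], [1.0], balance=0.5)` weighs two attributes equally against a single edge-type. In an iterative scheme of the kind the abstract describes, the weights and the balance would be re-estimated between clustering passes rather than fixed by hand; how CLAMP does this is specified in the paper, not here.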

Keywords

Clustering · Information networks · Attributed multi-graphs

Mathematics Subject Classification

05C22 05C40 05C78 68W10 68W15 62H30 91C20 

Notes

Acknowledgements

This work was partially supported by the EU Commission in terms of the PaaSport 605193 FP7 Project (FP7-SME-2013).

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  • Andreas Papadopoulos (1)
  • George Pallis (1)
  • Marios D. Dikaiakos (1)
  1. Department of Computer Science, University of Cyprus, Nicosia, Cyprus
