Weighted clustering of attributed multi-graphs

Papadopoulos, Andreas; Pallis, George; Dikaiakos, Marios D.

doi:10.1007/s00607-016-0526-5

Published: 01 December 2016

Computing, volume 99, pages 813–840 (2017)

Abstract

An information network modeled as an attributed multi-graph contains objects described by heterogeneous attributes and connected by multiple types of edges. In this paper we study the problem of identifying groups of related objects, namely clusters, in an attributed multi-graph. It is a challenging task since a good balance between the structural and attribute properties of the objects must be achieved, while each edge-type and each attribute contains different information and is of different importance to the clustering task. We propose a unified distance measure for attributed multi-graphs which is the first to consider simultaneously the individual importance of each object property, i.e. attribute and edge-type, as well as the balance between the sets of attributes and edges. Based on this, we design an iterative parallelizable algorithm for CLustering Attributed Multi-graPhs called CLAMP, which automatically balances the structural and attribute properties of the vertices, and clusters the network such that objects in the same cluster are characterized by similar attributes and connections. Extensive experimentation on synthetic and real-world datasets demonstrates the superiority of the proposed approach over several state-of-the-art clustering methods.
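
The abstract describes a distance that weights each attribute and each edge-type individually while also balancing the attribute set against the edge set. The following is only a minimal sketch of that idea, not the paper's actual measure: the squared-difference form, the function name `unified_distance`, and the parameters `attr_w`, `edge_w` and `alpha` are all assumptions introduced here for illustration.

```python
import numpy as np

def unified_distance(attrs_u, attrs_v, links_u, links_v, attr_w, edge_w, alpha):
    """Illustrative weighted distance between two vertices u and v.

    All names and the functional form are hypothetical, not taken from the paper.
    attrs_*: shape (n_attributes,), numeric attribute values scaled to [0, 1]
    links_*: shape (n_edge_types, n_vertices), edge weights in [0, 1] per edge-type
    attr_w:  shape (n_attributes,), per-attribute importance, assumed to sum to 1
    edge_w:  shape (n_edge_types,), per-edge-type importance, assumed to sum to 1
    alpha:   value in [0, 1]; balance between attribute and structural parts
    """
    attrs_u = np.asarray(attrs_u, dtype=float)
    attrs_v = np.asarray(attrs_v, dtype=float)
    links_u = np.asarray(links_u, dtype=float)
    links_v = np.asarray(links_v, dtype=float)
    attr_w = np.asarray(attr_w, dtype=float)
    edge_w = np.asarray(edge_w, dtype=float)

    # Attribute part: each attribute contributes according to its own weight.
    attr_part = np.sum(attr_w * (attrs_u - attrs_v) ** 2)

    # Structural part: compare the connection profiles of u and v
    # separately for every edge-type, then weight each edge-type.
    per_type = np.mean((links_u - links_v) ** 2, axis=1)
    struct_part = np.sum(edge_w * per_type)

    # Balance the two sets of properties with a single factor.
    return float(alpha * attr_part + (1.0 - alpha) * struct_part)
```

With `alpha = 1` only the attributes matter and with `alpha = 0` only the connections matter. In the approach the abstract describes, the weights and the balance are adjusted automatically during clustering rather than fixed by hand as they are in this sketch.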

Notes

Similarly, overlapping clustering assigns an object to multiple clusters with binary memberships [32]. Membership probabilities, however, provide more information, e.g. the importance of an object within a cluster [14].
This dataset is available online at the EU Open Data Portal: http://open-data.europa.eu.
Edge weights have been scaled to [0, 1].
Type-similar Connectivity can be calculated on directed graphs as well.
Other distance functions, such as Minkowski or semantic distances, could be adopted as well.
Hence, \(\mathscr {C}_k\) is a valid parameter to Eqs. (1)–(6).
Also, it is suitable for our problem since Eq. (8) is differentiable. Alternatively, optimization techniques such as simulated annealing and Newton’s optimization method could be adopted. However, these techniques may introduce new parameters into the model, e.g. a temperature parameter, or require expensive computations at each iteration, e.g. second-order derivatives, and they do not guarantee better results.
Alternatively, several centroid initialization methods could be extended and used in the proposed approach, such as the works of Bahmani et al. [3] and Shen and Meng [23], to preprocess the network aiming to reduce the number of iterations and/or improve clustering accuracy.
The full DBLP dataset is available at http://kdl.cs.umass.edu/data/dblp/dblp-info.html.
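
Two of the notes above mention scaling edge weights to [0, 1] and the Minkowski distance as an alternative distance function. The sketch below illustrates both under stated assumptions: the notes do not say how the scaling was done, so min-max scaling is assumed here, and the helper names `scale_to_unit_interval` and `minkowski` are hypothetical.

```python
import numpy as np

def scale_to_unit_interval(weights):
    """Min-max scale edge weights to [0, 1] (assumed scaling scheme)."""
    w = np.asarray(weights, dtype=float)
    lo, hi = w.min(), w.max()
    if hi == lo:
        # All weights are equal; map them to 0 to avoid dividing by zero.
        return np.zeros_like(w)
    return (w - lo) / (hi - lo)

def minkowski(x, y, p=2.0):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))
```

Scaling the weights first keeps every edge-type on a comparable range, so a distance computed over connection profiles is not dominated by the edge-type with the largest raw weights.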

References

Akoglu L, Tong H, Meeder B, Faloutsos C (2012) PICS: parameter-free identification of cohesive subgroups in large attributed graphs. In: Proceedings of the 12th SIAM international conference on data mining, SDM 2012
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable k-means++. Proc VLDB Endow 5(7):622–633
Barbieri N, Bonchi F, Galimberti E, Gullo F (2015) Efficient and effective community search. Data Min Knowl Discov 29(5):1406–1433
Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
Bothorel C, Cruz JD, Magnani M, Micenkova B (2015) Clustering attributed graphs: models, measures and methods. Netw Sci 3:408–444
Cheng H, Zhou Y, Huang X, Yu J (2012) Clustering large attributed information networks: an efficient incremental computing approach. Data Min Knowl Discov 25(3):450–477
Galbrun E, Gionis A, Tatti N (2014) Overlapping community detection in labeled graphs. Data Min Knowl Discov 28(5–6):1586–1610
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York
Gunnemann S, Farber I, Raubach S, Seidl T (2013) Spectral subspace clustering for graphs with feature vectors. In: 2013 IEEE 13th international conference on data mining (ICDM), pp 231–240. doi:10.1109/ICDM.2013.110
Hu X, Xu L (2004) Investigation on several model selection criteria for determining the number of cluster. Neural Inf Process Lett Rev 4(1):1–10
Huang HC, Chuang YY, Chen CS (2012) Multiple kernel fuzzy clustering. IEEE Trans Fuzzy Syst 20(1):120–134
Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304
Klawonn F, Höppner F (2003) What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier. In: Advances in intelligent data analysis V, Lecture Notes in Computer Science, vol 2810. Springer, Berlin, pp 254–264
Kumar A, Rai P, Daume H (2011) Co-regularized multi-view spectral clustering. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger K (eds) Advances in neural information processing systems, vol 24. Curran Associates, Inc., pp 1413–1421
Li N, Sun H, Chipman KC, George J, Yan X (2014) A probabilistic approach to uncovering attributed graph anomalies. In: Zaki MJ, Obradovic Z, Tan P, Banerjee A, Kamath C, Parthasarathy S (eds) Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, SIAM, pp 82–90
Mann GS, McCallum A (2007) Efficient computation of entropy gradient for semi-supervised conditional random fields. Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume. Short Papers, Association for Computational Linguistics, pp 109–112
Papadopoulos A, Pallis G, Dikaiakos MD (2013) Identifying clusters with attribute homogeneity and similar connectivity in information networks. IEEE/WIC/ACM international conference on web intelligence
Papadopoulos A, Rafailidis D, Pallis G, Dikaiakos M (2015) Clustering attributed multi-graphs with information ranking. In: Database and expert systems applications, Lecture Notes in Computer Science. Springer International Publishing
Perozzi B, Akoglu L, Sánchez PI, Müller E (2014) Focused clustering and outlier detection in large attributed graphs. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, KDD ’14
Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471
Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Shen S, Meng Z (2012) Optimization of initial centroids for k-means algorithm based on small world network. In: Shi Z, Leake D, Vadera S (eds) Intelligent information processing VI, IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, pp 87–96
Steinbach M, Kumar V (2005) Cluster analysis: basic concepts and algorithms. In: Introduction to data mining, 1st edn. Pearson Addison Wesley
Steinhaeuser K, Chawla N (2008) Community detection in a large real-world social network. In: Liu H, Salerno J, Young M (eds) Social computing, behavioral modeling, and prediction. Springer, USA, pp 168–175
Sun H, Huang J, Han J, Deng H, Zhao P, Feng B (2010) gSkeletonClu: density-based network clustering via structure-connected tree division or agglomeration. In: Proceedings of the 2010 IEEE international conference on data mining. IEEE Computer Society, Washington, DC, ICDM ’10, pp 481–490. doi:10.1109/ICDM.2010.69
Sun Y, Aggarwal CC, Han J (2012) Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. Proc VLDB Endow 5
Vuokko N, Terzi E (2010) Reconstructing randomized social networks. In: Proceedings of the SIAM international conference on data mining, SDM 2010, April 29–May 1, 2010, Columbus, pp 49–59
Xu X, Yuruk N, Feng Z, Schweiger TAJ (2007) SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, KDD ’07, pp 824–833. doi:10.1145/1281192.1281280
Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: Proceedings of the 2012 international conference on management of data. ACM, New York, SIGMOD ’12
Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2014) GBAGC: a general bayesian framework for attributed graph clustering. ACM Trans Knowl Discov Data 9(1):5:1–5:43
Yang J, McAuley J, Leskovec J (2013) Community detection in networks with node attributes. In: IEEE international conference on data mining, IEEE, pp 1151–1156. doi:10.1109/ICDM.2013.167
Zhong E, Fan W, Yang Q, Verscheure O, Ren J (2010) Cross validation framework to choose amongst models and datasets for transfer learning. In: Proceedings of the 2010 European conference on machine learning and knowledge discovery in databases: part III. Springer, Berlin, ECML PKDD’10, pp 547–562
Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow 2(1):718–729

Acknowledgements

This work was partially supported by the EU Commission in terms of the PaaSport 605193 FP7 Project (FP7-SME-2013).

Author information

Authors and Affiliations

Department of Computer Science, University of Cyprus, P.O. Box 20537, 1678, Nicosia, Cyprus
Andreas Papadopoulos, George Pallis & Marios D. Dikaiakos

Corresponding author

Correspondence to Andreas Papadopoulos.

About this article

Cite this article

Papadopoulos, A., Pallis, G. & Dikaiakos, M.D. Weighted clustering of attributed multi-graphs. Computing 99, 813–840 (2017). https://doi.org/10.1007/s00607-016-0526-5

Received: 02 April 2016
Accepted: 17 November 2016
Published: 01 December 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s00607-016-0526-5
