Efficiently summarizing attributed diffusion networks

Abstract

Given a large attributed social network, can we find a compact, diffusion-equivalent representation while keeping the attribute properties? Diffusion networks with user attributes, such as friendship, email communication, and people-contact networks, are increasingly commonplace in the real world. However, analyzing them is challenging due to their large size. In this paper, we first formulate a novel problem of summarizing an attributed diffusion graph so as to preserve its attribute and influence-based properties. Next, we propose ANeTS, an effective, sub-quadratic, parallelizable algorithm to solve this problem: it finds the best set of candidate nodes and merges them to construct a smaller network of ‘super-nodes’ that preserves the desired properties. Extensive experiments on diverse real-world datasets show that ANeTS outperforms all state-of-the-art baselines (some of which do not even finish in 14 days). Finally, we show how ANeTS helps in multiple applications, such as topic-aware viral marketing and sense-making of diverse graphs from different domains.
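To make the idea of merging nodes into ‘super-nodes’ concrete, the following is a minimal Python sketch of contracting one pair of nodes in a weighted, directed, attributed graph. It is not the ANeTS algorithm: the function name `merge_nodes`, the dictionary-based graph representation, and the merge rule (averaging the weights of edges that become parallel, averaging the attribute vectors) are assumptions made purely for illustration, and the abstract does not specify how candidates are scored or how weights are recomputed.

```python
from collections import defaultdict


def merge_nodes(edges, attrs, a, b, merged="ab"):
    """Contract nodes a and b into a single super-node (illustrative only).

    edges: dict mapping (u, v) -> influence weight of the directed edge u -> v
    attrs: dict mapping node -> list of numeric attribute values
    Assumed rule (not taken from the paper): edges that become parallel
    after the contraction get the mean of their weights, and the
    super-node gets the element-wise mean of the two attribute vectors.
    """
    def rename(n):
        return merged if n in (a, b) else n

    buckets = defaultdict(list)
    for (u, v), w in edges.items():
        u2, v2 = rename(u), rename(v)
        if u2 != v2:  # skip self-loops, including the former a<->b edges
            buckets[(u2, v2)].append(w)

    new_edges = {e: sum(ws) / len(ws) for e, ws in buckets.items()}
    new_attrs = {n: vec for n, vec in attrs.items() if n not in (a, b)}
    new_attrs[merged] = [(x + y) / 2 for x, y in zip(attrs[a], attrs[b])]
    return new_edges, new_attrs


# Hypothetical toy network: four users, two attributes each.
edges = {("a", "b"): 0.5, ("a", "c"): 0.2, ("b", "c"): 0.4, ("c", "d"): 0.3}
attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.0, 0.0]}
small_edges, small_attrs = merge_nodes(edges, attrs, "a", "b")
```

Repeating such a contraction over a chosen set of node pairs yields a smaller network; which pairs to choose, and how to reweight edges so that diffusion behavior is preserved, is precisely the problem the paper addresses and is left open in this sketch.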




Author information

Corresponding author

Correspondence to Sorour E. Amiri.

Additional information

Responsible editor: Jesse Davis, Elisa Fromont, Derek Greene, and Björn Bringmann.



About this article


Cite this article

Amiri, S.E., Chen, L. & Prakash, B.A. Efficiently summarizing attributed diffusion networks. Data Min Knowl Disc 32, 1251–1274 (2018). https://doi.org/10.1007/s10618-018-0572-z


Keywords

  • Attributed graph
  • Summarization
  • Topic aware influence maximization