Data Mining and Knowledge Discovery, Volume 32, Issue 5, pp 1251–1274

Efficiently summarizing attributed diffusion networks

  • Sorour E. Amiri
  • Liangzhe Chen
  • B. Aditya Prakash
Article
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2018

Abstract

Given a large attributed social network, can we find a compact, diffusion-equivalent representation while keeping the attribute properties? Diffusion networks with user attributes, such as friendship, email communication, and people-contact networks, are increasingly commonplace in the real world. However, analyzing them is challenging due to their large size. In this paper, we first formalize a novel problem: summarizing an attributed diffusion graph so as to preserve its attributes and influence-based properties. Next, we propose ANeTS, an effective, sub-quadratic, parallelizable algorithm to solve this problem: it finds the best set of candidate nodes and merges them to construct a smaller network of ‘super-nodes’ that preserves the desired properties. Extensive experiments on diverse real-world datasets show that ANeTS outperforms all state-of-the-art baselines (some of which do not even finish in 14 days). Finally, we show how ANeTS helps in multiple applications, such as topic-aware viral marketing and sense-making of diverse graphs from different domains.
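
To make the idea of merging nodes into a 'super-node' concrete, here is a minimal sketch in Python. It is not the ANeTS algorithm: the abstract does not specify how candidates are chosen or how weights and attributes are combined. The function name `merge_into_super_node`, the edge-dictionary representation, and the two combination rules (mean of attribute vectors, maximum of parallel edge weights) are all assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edges = Dict[Tuple[Hashable, Hashable], float]  # (source, target) -> influence weight
Attrs = Dict[Hashable, List[float]]             # node -> attribute vector


def merge_into_super_node(
    edges: Edges, attrs: Attrs, group: Iterable[Hashable], super_id: Hashable
) -> Tuple[Edges, Attrs]:
    """Collapse the nodes in `group` into one node called `super_id`.

    Illustrative assumptions (not taken from the paper):
      * the super-node's attribute vector is the mean of its members' vectors;
      * parallel edges created by the merge keep the maximum weight.
    """
    group = set(group)

    # Combine attributes: element-wise mean over the merged nodes.
    members = [attrs[v] for v in group]
    dim = len(members[0])
    merged_attr = [sum(vec[i] for vec in members) / len(members) for i in range(dim)]

    new_attrs = {v: a for v, a in attrs.items() if v not in group}
    new_attrs[super_id] = merged_attr

    # Redirect edges: any endpoint inside the group now points at the super-node.
    new_edges: Edges = {}
    for (u, v), w in edges.items():
        u2 = super_id if u in group else u
        v2 = super_id if v in group else v
        if u2 == v2:
            continue  # drop self-loops, including edges internal to the merged group
        new_edges[(u2, v2)] = max(w, new_edges.get((u2, v2), 0.0))

    return new_edges, new_attrs
```

Repeating such a merge over many candidate groups shrinks the graph. Which groups to merge, and how to reweight edges so that diffusion behaviour and attributes are preserved, is the substance of the paper and is not reproduced here.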

Keywords

  • Attributed graph
  • Summarization
  • Topic aware influence maximization

Supplementary material

Supplementary material 1: 10618_2018_572_MOESM1_ESM.pdf (PDF, 1307 KB)


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Computer Science, Virginia Tech, Blacksburg, USA