Data Mining and Knowledge Discovery, Volume 27, Issue 3, pp 321–343

Activity preserving graph simplification

  • Francesco Bonchi
  • Gianmarco De Francisci Morales
  • Aristides Gionis
  • Antti Ukkonen


We study the problem of simplifying a given directed graph by keeping a small subset of its arcs. Our goal is to maintain the connectivity required to explain a set of observed traces of information propagation across the graph. Unlike previous work, we do not make any assumption about an underlying model of information propagation. Instead, we approach the task as a combinatorial problem. We prove that the resulting optimization problem is \(\mathbf{NP}\)-hard. We show that a standard greedy algorithm performs very well in practice, even though it does not have theoretical guarantees. Additionally, if the activity traces have a tree structure, we show that the objective function is supermodular, and experimentally verify that the approach for size-constrained submodular minimization recently proposed by Nagano et al. (28th International Conference on Machine Learning, 2011) produces very good results. Moreover, when applied to the task of reconstructing an unobserved graph, our methods perform comparably to a state-of-the-art algorithm devised specifically for this task.
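The greedy approach mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact objective, so this example assumes one for illustration: a node in a trace counts as covered when it is reachable from the earliest-activated node of that trace through kept arcs that respect the activation order. The names `coverage` and `greedy_simplify`, the trace representation (a dict from node to activation time), and the tie-breaking rule are all hypothetical choices, not taken from the paper.

```python
from collections import deque


def coverage(kept_arcs, traces):
    """Assumed objective: number of (trace, node) pairs reachable from the
    trace's earliest-activated node using only kept, time-respecting arcs."""
    total = 0
    for times in traces:
        # Assumption: each trace has a single source, its earliest node.
        source = min(times, key=times.get)
        # Keep only arcs whose endpoints are both active in this trace and
        # whose direction agrees with the activation order.
        succ = {}
        for u, v in kept_arcs:
            if u in times and v in times and times[u] < times[v]:
                succ.setdefault(u, []).append(v)
        # Breadth-first search from the source over the filtered arcs.
        seen = {source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in succ.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        total += len(seen) - 1  # the source itself needs no explanation
    return total


def greedy_simplify(arcs, traces, k):
    """Pick up to k arcs, each time adding the arc that yields the highest
    coverage; ties go to the lexicographically smallest arc."""
    kept = []
    candidates = sorted(arcs)
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda a: coverage(kept + [a], traces))
        kept.append(best)
        candidates.remove(best)
    return kept


# Toy input: four arcs and two traces (node -> activation time).
arcs = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
traces = [{"a": 0, "b": 1, "c": 2}, {"a": 0, "c": 1, "d": 2}]
kept = greedy_simplify(arcs, traces, 3)
```

On this toy input, three of the four arcs suffice to reach the same coverage as the full graph. Re-evaluating the objective for every candidate at every step keeps the sketch short but is quadratic in the number of arcs, so a practical implementation would likely maintain marginal gains incrementally.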


Information propagation, Graph sparsification, Submodular minimization


  1. Arenas A, Duch J, Fernández A, Gómez S (2007) Size reduction of complex networks preserving modularity. New J Phys 9(6):176
  2. Edmonds J (2003) Submodular functions, matroids, and certain polyhedra. In: Combinatorial optimization—Eureka, You Shrink!, Springer, Berlin, pp 11–26
  3. Elkin M, Peleg D (2005) Approximating \(k\)-spanner problems for \(k > 2\). Theor Comput Sci 337(1):249–277
  4. Foti NJ, Hughes JM, Rockmore DN (2011) Nonparametric sparsification of complex multiscale networks. PLoS One 6(2):e16431
  5. Fujishige S (2005) Submodular functions and optimization, vol 58. Elsevier Science, Amsterdam
  6. Fung WS, Hariharan R, Harvey NJ, Panigrahi D (2011) A general framework for graph sparsification. In: Proceedings of the 43rd annual ACM symposium on theory of computing, ACM, pp 71–80
  7. Gomez-Rodriguez M, Leskovec J, Krause A (2010) Inferring networks of diffusion and influence. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1019–1028
  8. Gomez-Rodriguez M, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. In: Proceedings of the 28th international conference on machine learning, pp 561–568
  9. Iwata S, Orlin JB (2009) A simple combinatorial algorithm for submodular function minimization. In: Proceedings of the twentieth annual ACM-SIAM symposium on discrete algorithms, Society for Industrial and Applied Mathematics, pp 1230–1237
  10. Jamali M, Ester M (2010) Modeling and comparing the influence of neighbors on the behavior of users in social and similarity networks. In: 2010 IEEE international conference on data mining workshops (ICDMW), IEEE, pp 336–343
  11. Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 137–146
  12. Krause A (2010) SFO: a toolbox for submodular function optimization. J Mach Learn Res 11:1141–1144
  13. Leskovec J, Faloutsos C (2007) Scalable modeling of real graphs using Kronecker multiplication. In: Proceedings of the 24th international conference on machine learning, ACM, pp 497–504
  14. Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 497–506
  15. Mathioudakis M, Bonchi F, Castillo C, Gionis A, Ukkonen A (2011) Sparsification of influence networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 529–537
  16. Misiołek E, Chen DZ (2006) Two flow network simplification algorithms. Inf Process Lett 97(5):197–202
  17. Nagano K, Kawahara Y, Aihara K (2011) Size-constrained submodular minimization through minimum norm base. In: Proceedings of the 28th international conference on machine learning, pp 977–984
  18. Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Progr 14(1):265–294
  19. Peleg D, Schäffer AA (1989) Graph spanners. J Graph Theory 13(1):99–116
  20. Quirin A, Cordon O, Santamaria J, Vargas-Quesada B, Moya-Anegón F (2008) A new variant of the pathfinder algorithm to generate large visual science maps in cubic time. Inf Process Manag 44(4):1611–1623
  21. Serrano E, Quirin A, Botia J, Cordón O (2010) Debugging complex software systems by means of pathfinder networks. Inf Sci 180(5):561–583
  22. Serrano MÁ, Boguñá M, Vespignani A (2009) Extracting the multiscale backbone of complex weighted networks. Proc Nat Acad Sci USA 106(16):6483–6488
  23. Srikant R, Yang Y (2001) Mining web logs to improve website organization. In: Proceedings of the 10th international conference on World Wide Web, ACM, pp 430–437
  24. Svitkina Z, Fleischer L (2011) Submodular approximation: sampling-based algorithms and lower bounds. SIAM J Comput 40(6):1715–1737
  25. Toivonen H, Mahler S, Zhou F (2010) A framework for path-oriented network simplification. In: Advances in intelligent data analysis IX, Springer, Berlin, pp 220–231
  26. Wolfe P (1976) Finding the nearest point in a polytope. Math Progr 11(1):128–149
  27. Zhou F, Mahler S, Toivonen H (2010) Network simplification with minimal loss of connectivity. In: 2010 IEEE 10th international conference on data mining (ICDM), IEEE, pp 659–668

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Francesco Bonchi (1)
  • Gianmarco De Francisci Morales (1)
  • Aristides Gionis (3)
  • Antti Ukkonen (2)

  1. Yahoo! Research Barcelona, Barcelona, Spain
  2. Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
  3. Department of Information and Computer Science, Aalto University, Espoo, Finland
