Finding lasting dense subgraphs

Abstract

Graphs form a natural model for relationships and interactions between entities, for example, between people in social and cooperation networks, servers in computer networks, or tags and words in documents and tweets. But which of these relationships or interactions are the most lasting ones? In this paper, we study the following problem: given a set of graph snapshots, which may correspond to the state of an evolving graph at different time instances, identify the set of nodes that are the most densely connected in all snapshots. We call this problem the Best Friends Forever (\(\text{BFF}\)) problem. We provide definitions of density over multiple graph snapshots that capture different semantics of connectedness over time, and we study the corresponding variants of the \(\text{BFF}\) problem. We then look at the On–Off \(\text{BFF}\) (\(\textsc{O}^{\textsc{2}}\text{BFF}\)) problem, which relaxes the requirement that nodes be connected in all snapshots and asks for the densest set of nodes in at least \(k\) of a given set of graph snapshots. We show that this problem is NP-complete for all definitions of density, and we propose a set of efficient algorithms. Finally, we present experiments with synthetic and real datasets that show both the efficiency of our algorithms and the usefulness of the \(\text{BFF}\) and \(\textsc{O}^{\textsc{2}}\text{BFF}\) problems.
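
To make the problem statement concrete, the following is a minimal sketch of one possible reading of "densely connected in all snapshots". It assumes that the density of a node set in a single snapshot is its average induced degree and that the score over all snapshots is the minimum of these values; the abstract does not fix either choice, so both are assumptions. The peeling heuristic, the function names (`induced_degrees`, `min_average_degree`, `greedy_peel`), and the toy snapshots are hypothetical illustrations, not the algorithms proposed in the paper.

```python
def induced_degrees(edges, nodes):
    """Degree of every node in `nodes` within the subgraph induced by `nodes`."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in deg and v in deg:
            deg[u] += 1
            deg[v] += 1
    return deg


def min_average_degree(snapshots, nodes):
    """Assumed aggregate density: minimum over snapshots of the average degree."""
    if not nodes:
        return 0.0
    return min(
        sum(induced_degrees(edges, nodes).values()) / len(nodes)
        for edges in snapshots
    )


def greedy_peel(snapshots):
    """Heuristic: repeatedly drop the node with the weakest worst-case degree.

    Returns the best-scoring node set seen while peeling, with its score.
    Nodes must be mutually comparable (e.g. all strings) so ties break
    deterministically.
    """
    nodes = {v for edges in snapshots for edge in edges for v in edge}
    best_set, best_score = set(nodes), min_average_degree(snapshots, nodes)
    while len(nodes) > 1:
        degs = [induced_degrees(edges, nodes) for edges in snapshots]
        # Worst-case degree of a node = its smallest degree across snapshots.
        victim = min(sorted(nodes), key=lambda v: min(d[v] for d in degs))
        nodes.remove(victim)
        score = min_average_degree(snapshots, nodes)
        if score > best_score:
            best_set, best_score = set(nodes), score
    return best_set, best_score


# Toy input: the triangle a-b-c persists, the other edges change over time.
snapshot_1 = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
snapshot_2 = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
lasting_set, lasting_score = greedy_peel([snapshot_1, snapshot_2])
print(sorted(lasting_set), lasting_score)
```

On this toy input the triangle that appears in both snapshots is the set the sketch keeps, since every other node has a low degree in at least one snapshot. Swapping the outer `min` for an average, or the average degree for the minimum degree, would give other density semantics of the kind the abstract alludes to.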

Notes

  1. http://dblp.uni-trier.de/

  2. https://snap.stanford.edu/data/oregon1.html

  3. https://snap.stanford.edu/data/oregon2.html

  4. http://www.caida.org/data/as-relationships/

  5. https://snap.stanford.edu/data/as.html

  6. https://github.com/ksemer/BestFriendsForever-BFF-

Author information

Corresponding author

Correspondence to Konstantinos Semertzidis.

Additional information

Responsible editor: Evimaria Terzi.

Cite this article

Semertzidis, K., Pitoura, E., Terzi, E. et al. Finding lasting dense subgraphs. Data Min Knowl Disc 33, 1417–1445 (2019). https://doi.org/10.1007/s10618-018-0602-x

Keywords

  • Dense subgraphs
  • Graph history
  • Social networks