Triangle minimization in large networks

  • Regular Paper
Knowledge and Information Systems

Abstract

The number of triangles is a fundamental metric for analyzing the structure and function of a network. In this paper, for the first time, we investigate the triangle minimization problem in a network under edge (node) attack, where the attacker aims to minimize the number of triangles in the network by removing \(k\) edges (nodes). We show that the triangle minimization problem under edge (node) attack is a submodular function maximization problem, which can be solved efficiently. Specifically, we propose a degree-based edge (node) removal algorithm and a near-optimal greedy edge (node) removal algorithm for approximately solving the triangle minimization problem under edge (node) attack. In addition, we introduce two pruning strategies and an approximate marginal gain evaluation technique to further speed up the greedy edge (node) removal algorithm. We conduct extensive experiments over 12 real-world datasets to evaluate the proposed algorithms, and the results demonstrate the effectiveness, efficiency and scalability of our algorithms.
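To make the greedy idea concrete, the following is a minimal sketch of greedy edge removal, assuming an undirected simple graph with integer node IDs. The function names (`count_triangles`, `greedy_edge_removal`) are illustrative and not taken from the paper, and the sketch omits the pruning strategies and the approximate marginal gain evaluation mentioned above. It relies only on the fact that removing an edge \((u, v)\) destroys exactly as many triangles as \(u\) and \(v\) have common neighbors.

```python
def count_triangles(adj):
    """Count triangles in an undirected graph given as {node: set(neighbors)}."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbor w > v closes the triangle (u, v, w) exactly once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def greedy_edge_removal(edges, k):
    """Remove up to k edges, each time picking the edge that lies in the most triangles.

    Illustrative sketch only: every round rescans all edges.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    removed = []
    for _ in range(k):
        best, best_gain = None, 0
        for u in adj:
            for v in adj[u]:
                if u < v:
                    # Marginal gain = number of triangles that contain edge (u, v).
                    gain = len(adj[u] & adj[v])
                    if gain > best_gain:
                        best, best_gain = (u, v), gain
        if best is None:  # no edge lies in a triangle any more
            break
        u, v = best
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(best)
    return removed, count_triangles(adj)


# Usage: the complete graph on 4 nodes has 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
removed, remaining = greedy_edge_removal(k4, 2)
print(removed, remaining)
```

On this small example, the first removal destroys two triangles, and the second removal picks the opposite edge and destroys the remaining two.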

Notes

  1. http://news.cnet.com/delete-10-facebook-friends-get-a-free-whopper/.

References

  1. Albert R et al (2000) Error and attack tolerance of complex networks. Nature 406:378–382

  2. Alon N et al (1997) Finding and counting given length cycles. Algorithmica 17(3):209–223

  3. Avron H (2010) Counting triangles in large graphs using randomized matrix trace estimation. In: Proceedings of KDD-LDMTA’10

  4. Bar-Yossef Z et al (2002) Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA

  5. Barabasi A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  6. Becchetti L et al (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: KDD

  7. Brin S, Page L (1997) PageRank: bringing order to the web. Tech. rep, Stanford Digital Library Project

  8. Buriol LS et al (2006) Counting triangles in data streams. In: PODS

  9. Callaway DS et al (2000) Network robustness and fragility: percolation on random graphs. Phys Rev Lett 85(25):5468–5471

  10. Chu S, Cheng J (2011) Triangle listing in massive networks and its applications. In: KDD

  11. Cohen R et al (2000) Resilience of the internet to random breakdowns. Phys Rev Lett 85(21):5626–5628

  12. Coleman JS (1988) Social capital in the creation of human capital. Am J Sociol 94:95–120

  13. Durand M, Flajolet P (2003) Loglog counting of large cardinalities (extended abstract). In: ESA, pp 605–617

  14. Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652

  15. Flajolet P et al (2003) Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: ESA, pp 605–617

  16. Flajolet P, Martin GN (1985) Probabilistic counting algorithms for data base applications. J Comput Syst Sci 31(2):182–209

  17. Godsil C, Royle GF (2001) Algebraic graph theory. Springer, Berlin

  18. Hanneman RA, Riddle M (2005) Introduction to social network methods. University of California, Riverside. http://faculty.ucr.edu/~hanneman/nettext/

  19. Hochbaum DS (1996) Approximation algorithms for NP-hard problems. PWS Publishing Company, Boston, MA

  20. Itai A, Rodeh M (1978) Finding a minimum circuit in a graph. SIAM J Comput 7(4):413–423

  21. Jowhari H, Ghodsi M (2005) New streaming algorithms for counting triangles in graphs. In: COCOON

  22. Kempe D et al (2003) Maximizing the spread of influence through a social network. In: KDD

  23. Krause A, Guestrin C (2007) Near-optimal observation selection using submodular functions. In: AAAI

  24. Krause A, Horvitz E (2008) A utility-theoretic approach to privacy and personalization. In: AAAI

  25. Krause A et al (2008) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res 9:235–284

  26. Latapy M (2008) Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor Comput Sci 407:1–3

  27. Leskovec J (2010) Stanford network analysis project

  28. Leskovec J et al (2007) Cost-effective outbreak detection in networks. In: KDD

  29. Li R-H, Yu JX (2011) Scalable diversified ranking on large graphs. In: ICDM

  30. Li R-H, Yu JX (2013) Scalable diversified ranking on large graphs. IEEE Trans Knowl Data Eng 25(9):2133–2146

  31. Li R-H et al (2014a) Random-walk domination in large graphs. In: ICDE

  32. Li R-H et al (2012) Measuring robustness of complex networks under MVC attack. In: CIKM

  33. Li R-H et al (2014b) Measuring the impact of MVC attack in large complex networks. Inf Sci 278:685–702

  34. Lin H, Bilmes J (2010) Multi-document summarization via budgeted maximization of submodular functions. In: HLT-NAACL

  35. Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: ACL

  36. McPherson M et al (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415–444

  37. Minoux M (1978) Accelerated greedy algorithms for maximizing submodular set functions. Lecture Notes in Control and Information Sciences. Springer, Berlin

  38. Nemhauser GL et al (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14:265–294

  39. Palmer CR et al (2002) ANF: a fast and scalable tool for data mining in massive graphs. In: KDD, pp 81–90

  40. Schank T (2007) Algorithmic aspects of triangle-based network analysis. PhD Thesis, University Karlsruhe (TH)

  41. Schank T, Wagner D (2005) Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA

  42. Schneider CM et al (2011) Mitigation of malicious attacks on networks. PNAS 108(10):3838–3841

  43. Seshadhri C et al (2012) Fast triangle counting through wedge sampling. CoRR abs/1202.5230

  44. Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: WWW

  45. Tong H et al (2012) Gelling, and melting, large graphs by edge manipulation. In: CIKM

  46. Tong H et al (2010) On the vulnerability of large graphs. In: ICDM

  47. Tsourakakis CE et al (2009) DOULION: counting triangles in massive graphs with a coin. In: KDD

  48. Vazirani VV (2001) Approximation algorithms. Springer, Berlin

  49. Vondrak J (2010) Submodularity and curvature: the optimal algorithm. RIMS Kokyuroku Bessatsu B23:253–266

  50. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442

  51. Zafarani R, Liu H (2009) Social Computing Data Repository at ASU

Acknowledgments

We thank the anonymous reviewers for their helpful comments. The work was supported in part by (1) NSFC Grant 61402292 and the Natural Science Foundation of SZU (Grant No. 201438), and (2) the Research Grants Council of the Hong Kong SAR, China, Grants 14209314 and 418512.

Author information

Corresponding author

Correspondence to Rong-Hua Li.

Appendix

Proof of Theorem 2.1

First, it is known that the set cover with frequency constraint (SCFC) problem is NP-hard [19, 48]. Given a ground set \(\mathcal {U}\), a collection of \(n\) subsets \(\mathcal {S}=\{S_1, S_2, \cdots , S_n\}\) with \(\bigcup _i S_i=\mathcal {U}\), and a frequency parameter \(t\) (\(t < n\)), the SCFC problem is to find the minimum number of subsets in \(\mathcal {S}\) that together cover all elements of \(\mathcal {U}\). Here, the frequency parameter \(t\) means that every element of \(\mathcal {U}\) is included in \(t\) subsets of \(\mathcal {S}\).

Consider a special case of the SCFC problem with the additional constraint that the intersection of any three subsets in \(\mathcal {S}\) has at most one element (i.e., \(|S_i \cap S_j \cap S_k| \le 1\) for any pairwise distinct \(i, j, k\)). For convenience, we refer to this problem as the intersection-bounded SCFC (IBSCFC) problem. Below, we show that the IBSCFC problem is also NP-hard. Suppose to the contrary that there is a polynomial-time algorithm \(\mathcal {A}\) that solves the IBSCFC problem. Whenever \(|S_i \cap S_j \cap S_k| > 1\) in an SCFC instance, we discard the “redundant-common elements” from the subsets \(S_i, S_j, S_k\) so that \(| \tilde{S}_i \cap \tilde{S}_j \cap \tilde{S}_k|=1\), where \(\tilde{S}_i\) denotes the subset \(S_i\) after discarding the redundant-common elements (i.e., only one common element of \(S_i, S_j, S_k\) is left). The SCFC instance then becomes an IBSCFC instance, and we invoke algorithm \(\mathcal {A}\) to solve it. Importantly, the optimal solution (the IDs of the selected subsets) obtained by algorithm \(\mathcal {A}\) is also an optimal solution of the original SCFC instance, for two reasons. First, for any \(S_i, S_j, S_k\) with \(|S_i \cap S_j \cap S_k| > 1\), the redundant-common elements appear only in these three subsets (by our constraint, each element is included in exactly three subsets, i.e., \(t = 3\)), so discarding them does not affect the optimal solution. Second, the optimal solution obtained by algorithm \(\mathcal {A}\) must contain at least one of \(S_i, S_j, S_k\), because the one common element left in these three subsets must be covered by some subset in the optimal solution, and that subset also covers the discarded elements. Hence, there would be a polynomial-time algorithm for the SCFC problem, which is a contradiction.

Second, we consider the maximum coverage version of the IBSCFC problem, called IBMCFC, where the goal is to find \(k\) subsets in \(\mathcal {S}\) that maximize the cardinality of their union. This problem is also NP-hard. Suppose not; then there is a polynomial-time algorithm \(\mathcal {B}\) that solves the IBMCFC problem. Since \(\bigcup _i S_i=\mathcal {U}\), we can invoke \(\mathcal {B}\) at most \(n\) times, enumerating \(k\) from \(1\) to \(n\), and stop at the smallest \(k\) for which the returned subsets cover \(\mathcal {U}\); this yields an optimal solution of the IBSCFC problem. That is, there would be a polynomial-time algorithm for the IBSCFC problem, which is a contradiction.
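The enumeration over \(k\) in this step can be sketched as follows. This is only an illustration of the argument: the hypothetical polynomial-time algorithm \(\mathcal {B}\) is replaced by an exponential-time brute-force stand-in, and the function names are assumptions made for this example.

```python
from itertools import combinations


def max_coverage_bruteforce(subsets, k):
    """Stand-in for the hypothetical oracle B: indices of k subsets with the largest union.

    Brute force (exponential time); used only to illustrate the reduction.
    """
    return max(
        combinations(range(len(subsets)), k),
        key=lambda ids: len(set().union(*(subsets[i] for i in ids))),
    )


def min_set_cover_via_max_coverage(universe, subsets, oracle=max_coverage_bruteforce):
    """Solve set cover by calling a maximum-coverage oracle for k = 1, 2, ..., n."""
    for k in range(1, len(subsets) + 1):
        ids = oracle(subsets, k)
        covered = set().union(*(subsets[i] for i in ids))
        if covered == universe:
            # The smallest k whose best union is the whole universe is the optimum.
            return ids
    return None  # only reached if the subsets do not cover the universe


# Usage with a 5-subset instance over two elements.
universe = {"u1", "u2"}
subsets = [{"u1"}, {"u1"}, {"u1", "u2"}, {"u2"}, {"u2"}]
print(min_set_cover_via_max_coverage(universe, subsets))
```

Because the oracle is called at most \(n\) times, a polynomial-time oracle would give a polynomial-time set cover algorithm, which is exactly the contradiction used in the proof.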

Third, to prove the theorem, we give a reduction from the IBMCFC problem. Specifically, for each subset \(S_i\), we create an edge \(e_i\) with \(2|S_i|\) stubs, which are used to combine the end nodes of different edges. Each end node of the edge is associated with \(|S_i|\) stubs, and these stubs are labeled by the IDs of the elements in \(S_i\). Then, for any three pairwise distinct subsets \(S_i\), \(S_j\), and \(S_k\) with \(|S_i \cap S_j \cap S_k| =1\), we combine the end nodes of their corresponding edges that carry the same stub label so that the three edges form a triangle. As an example, let \(\mathcal {U}=\{u_1, u_2\}\), \(S_1=\{u_1\}\), \(S_2=\{u_1\}\), \(S_3=\{u_1, u_2\}\), \(S_4=\{u_2\}\), \(S_5=\{u_2\}\). Clearly, each element in \(\mathcal {U}\) is in three subsets, and any three subsets have at most one common element. For each subset, we create an edge with stubs as shown in the left part of Fig. 5, and then construct the graph shown in the right part of Fig. 5. By this construction, each triangle corresponds to an element in \(\mathcal {U}\), and each edge \(e_i\) in the resulting graph corresponds to a subset \(S_i\) in \(\mathcal {S}\). Removing an edge \(e_i\) therefore destroys exactly the triangles whose elements are covered by \(S_i\). As a result, an optimal solution of the triangle minimization problem (by edge removal) in the resulting graph is an optimal solution of the IBMCFC problem. Since IBMCFC is NP-hard, the triangle minimization problem by edge removal is also NP-hard. This completes the proof. \(\square \)

Fig. 5 Illustration of the graph construction
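The correspondence between removed edges and covered elements can be checked on the example instance. The sketch below hard-codes one graph consistent with the description (two triangles sharing the edge \(e_3\)); the node names `a`, `b`, `c`, `d` and this concrete layout are assumptions, since the figure itself is not reproduced here.

```python
from itertools import combinations

# Example instance from the proof: subset ID -> elements.
subsets = {1: {"u1"}, 2: {"u1"}, 3: {"u1", "u2"}, 4: {"u2"}, 5: {"u2"}}

# Assumed layout: edge e_i for subset S_i; e1, e2, e3 form the triangle for u1,
# and e3, e4, e5 form the triangle for u2.
edges = {1: ("a", "c"), 2: ("b", "c"), 3: ("a", "b"), 4: ("a", "d"), 5: ("b", "d")}


def triangles(edge_list):
    """Return the set of node triples that form a triangle in the given edge list."""
    edge_set = {frozenset(e) for e in edge_list}
    nodes = sorted({x for e in edge_list for x in e})
    return {
        t for t in combinations(nodes, 3)
        if all(frozenset(p) in edge_set for p in combinations(t, 2))
    }


all_triangles = triangles(list(edges.values()))


def destroyed(ids):
    """Number of triangles destroyed by removing the edges with the given IDs."""
    kept = [e for i, e in edges.items() if i not in ids]
    return len(all_triangles) - len(triangles(kept))


def covered(ids):
    """Number of elements covered by the subsets with the given IDs."""
    return len(set().union(*(subsets[i] for i in ids)))


# For every choice of edges, triangles destroyed == elements covered.
consistent = all(
    destroyed(ids) == covered(ids)
    for k in range(1, len(edges) + 1)
    for ids in combinations(edges, k)
)
print(len(all_triangles), consistent)
```

On this instance the graph has two triangles, one per element of \(\mathcal {U}\), and removing \(e_3\) alone destroys both, matching the fact that \(S_3\) alone covers \(\mathcal {U}\).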

Proof of Theorem 2.2

Similar to the proof of Theorem 2.1, we give a reduction from the IBMCFC problem. Following the notation used in the proof of Theorem 2.1, we create a graph \(G\) for an instance of the triangle minimization problem by node removal as follows. For each \(S_i\) in \(\mathcal {S}\), we create a node \(v_i\). For each pair \(S_i\) and \(S_j\) (\(i \ne j\)), we create an edge \((v_i, v_j)\) if and only if \(S_i \cap S_j \ne \emptyset \). By this construction, each node corresponds to a subset, and each triangle corresponds to an element. One can easily check that an optimal solution of the triangle minimization problem by node removal is an optimal solution of the IBMCFC problem. Thus, the theorem is established.
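As a sanity check, the node-based construction can be applied to the example instance used in the proof of Theorem 2.1. The sketch below is illustrative only; the helper names are assumptions, and it verifies the correspondence on this one small instance rather than in general.

```python
from itertools import combinations

# Example instance from the proof of Theorem 2.1: subset ID -> elements.
subsets = {1: {"u1"}, 2: {"u1"}, 3: {"u1", "u2"}, 4: {"u2"}, 5: {"u2"}}


def intersection_graph(subsets):
    """Create edge (v_i, v_j) if and only if S_i and S_j share an element."""
    return {
        frozenset((i, j))
        for i, j in combinations(subsets, 2)
        if subsets[i] & subsets[j]
    }


def triangles(nodes, edge_set):
    """Return the set of node triples whose three pairs are all edges."""
    return {
        t for t in combinations(sorted(nodes), 3)
        if all(frozenset(p) in edge_set for p in combinations(t, 2))
    }


edge_set = intersection_graph(subsets)
all_triangles = triangles(subsets, edge_set)


def destroyed(ids):
    """Number of triangles that contain at least one removed node."""
    return sum(1 for t in all_triangles if set(t) & set(ids))


def covered(ids):
    """Number of elements covered by the subsets with the given IDs."""
    return len(set().union(*(subsets[i] for i in ids)))


# For every choice of nodes, triangles destroyed == elements covered.
consistent = all(
    destroyed(ids) == covered(ids)
    for k in range(1, len(subsets) + 1)
    for ids in combinations(subsets, k)
)
print(sorted(all_triangles), consistent)
```

Here the triangles are \(\{v_1, v_2, v_3\}\) for \(u_1\) and \(\{v_3, v_4, v_5\}\) for \(u_2\), so removing \(v_3\) destroys both, just as \(S_3\) covers both elements.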

Cite this article

Li, RH., Yu, J.X. Triangle minimization in large networks. Knowl Inf Syst 45, 617–643 (2015). https://doi.org/10.1007/s10115-014-0800-9
