Patterns and anomalies in k-cores of real-world graphs with applications

Shin, Kijung; Eliassi-Rad, Tina; Faloutsos, Christos

doi:10.1007/s10115-017-1077-6

Regular Paper
Published: 28 June 2017

Volume 54, pages 677–710 (2018)
Knowledge and Information Systems

Abstract

What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns that are related to k-cores and emerge in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: the coreness of a vertex (i.e., the maximum k such that the vertex belongs to the k-core) is strongly correlated with its degree. (2) Core-Triangle Pattern: degeneracy (i.e., the maximum k such that the k-core exists) obeys a 3-to-1 power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns: (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs. (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12\(\times\) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17\(\times\) faster than its competitors with comparable accuracy.
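
The definitions of k-core, coreness, and degeneracy in the abstract can be illustrated with a minimal sketch. It assumes an undirected simple graph given as an edge list and uses the standard peeling procedure (repeatedly removing a minimum-degree vertex); the function names `coreness` and `degeneracy` are hypothetical and are not taken from the paper, and this is not the authors' implementation.

```python
import heapq
from collections import defaultdict


def coreness(edges):
    """Return {vertex: coreness} for an undirected simple graph.

    Assumption: vertices are mutually comparable (e.g., all ints),
    because they are used to break ties inside the heap.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    core, k = {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale heap entry, skip it
        k = max(k, d)  # coreness never decreases along the peeling order
        core[v] = k
        for w in adj[v]:
            if w not in core:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


def degeneracy(edges):
    """Maximum k such that the k-core is non-empty (0 for an empty graph)."""
    return max(coreness(edges).values(), default=0)
```

For example, a 4-clique with one extra pendant vertex has degeneracy 3: the clique vertices have coreness 3 and the pendant vertex has coreness 1. Vertices that appear in no edge are not reported by this sketch.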

Notes

This paper is an extended version of [49].
http://konect.uni-koblenz.de/networks/petster-friendships-hamster.
http://konect.uni-koblenz.de/networks/petster-friendships-cat.
http://konect.uni-koblenz.de/networks/friendster.
http://www.caida.org/data/as-relationships.
http://www.caida.org/tools/measurements/skitter.
Spearman’s rank correlation coefficient \(\rho \) [52] is the standard (Pearson) correlation coefficient r of the ranks. Here, \(\rho \) is equivalent to r between the ranks of vertices in terms of degree and their ranks in terms of coreness. Using \(\rho \) is known to be more robust to outlying values than simply using r. We ignored isolated vertices when computing \(\rho \).
The fractional rank of an item is one plus the number of items greater than it plus half the number of other items equal to it.
Strength of core–periphery structure. The correlation between the adjacency matrix of the measured graph and that of a graph with perfect core–periphery structure. See [10] for details.
Strength of community structure. The fraction of the edges within communities minus such fraction expected in a randomly connected graph. See [39] for details.
Isolated vertices are ignored when we compute Spearman’s rank correlation coefficient \(\rho \).
We used a machine with 2.67 GHz Intel Xeon E7-8837 CPUs and 1TB RAM.
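
The notes on Spearman’s \(\rho \) and fractional ranks can be made concrete with a small sketch. It assumes two equal-length lists of numbers (for example, degrees and corenesses of the non-isolated vertices) with at least two distinct values in each list; the function names are hypothetical, and the quadratic-time ranking is chosen for readability, not speed.

```python
from math import sqrt


def fractional_ranks(values):
    """Rank = 1 + (#items greater) + 0.5 * (#other items equal)."""
    ranks = []
    for x in values:
        greater = sum(1 for y in values if y > x)
        equal_others = sum(1 for y in values if y == x) - 1  # exclude x itself
        ranks.append(1 + greater + 0.5 * equal_others)
    return ranks


def pearson(xs, ys):
    """Standard Pearson correlation coefficient r.

    Assumption: neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the fractional ranks."""
    return pearson(fractional_ranks(xs), fractional_ranks(ys))
```

With this convention the largest value gets rank 1 and tied values share the average of the ranks they occupy, so `fractional_ranks([10, 20, 20, 5])` gives `[3.0, 1.5, 1.5, 4.0]`.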

References

Abello J, Resende MG, Sudarsky S (2002) Massive quasi-clique detection. In: Latin American symposium on theoretical informatics, Springer, pp 598–612
Akoglu L, McGlohon M, Faloutsos C (2010) Oddball: spotting anomalies in weighted graphs. In: Pacific–Asia conference on knowledge discovery and data mining, Springer, pp 410–421
Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688
Albert R, Jeong H, Barabási AL (1999) Internet: diameter of the world-wide web. Nature 401(6749):130–131
Alvarez-Hamelin JI, Dall’Asta L, Barrat A, Vespignani A (2006) Large scale networks fingerprinting and visualization using the \(k\)-core decomposition. Adv Neural Inf Process Syst 18:41
Alvarez-Hamelin JI, Dall’Asta L, Barrat A, Vespignani A (2008) \(K\)-core decomposition of Internet graphs: hierarchies, self-similarity and measurement biases. Netw Heterog Media 3:371
Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform 4(1):2
Batagelj V, Zaversnik M (2003) An O(m) algorithm for cores decomposition of networks. arXiv:cs/0310049
Beutel A, Xu W, Guruswami V, Palow C, Faloutsos C (2013) Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In: Proceedings of the 22nd international conference on world wide web, ACM, pp 119–130
Borgatti SP, Everett MG (2000) Models of core/periphery structures. Soc Netw 21(4):375–395
Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577
Brouwer AE, Haemers WH (2001) Spectra of graphs. Springer, Berlin
Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: International Workshop on approximation algorithms for combinatorial optimization, Springer, pp 84–95
Cheng J, Ke Y, Chu S, Özsu MT (2011) Efficient core decomposition in massive networks. In: 2011 IEEE 27th international conference on data engineering, IEEE, pp 51–62
Cohen J (2008) Trusses: cohesive subgraphs for social network analysis. In: National security agency technical report, p 16
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 233–240
De Stefani L, Epasto A, Riondato M, Upfal E (2016) TRIÈST: counting local and global triangles in fully-dynamic streams with fixed memory size. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 825–834
Erdős P (1963) On the structure of linear graphs. Israel J Math 1(3):156–160
Farach-Colton M, Tsai MT (2014) Computing the degeneracy of large graphs. In: Latin American symposium on theoretical informatics, Springer, pp 250–260
Freuder EC (1982) A sufficient condition for backtrack-free search. J ACM (JACM) 29(1):24–32
Gehrke J, Ginsparg P, Kleinberg J (2003) Overview of the 2003 KDD cup. ACM SIGKDD Explor Newslett 5(2):149–151
Giatsidis C, Malliaros F, Thilikos DM, Vazirgiannis M (2014) Corecluster: a degeneracy based graph clustering framework. In: Twenty-sixth annual conference on innovative applications of artificial intelligence, AAAI, pp 29–31
Hall BH, Jaffe AB, Trajtenberg M (2001) The NBER patent citation data file: lessons, insights and methodological tools. doi:10.3386/w8498
Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016a) Fraudar: bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 895–904
Hooi B, Song HA, Papalexakis E, Agrawal R, Faloutsos C (2016b) Matrices, compression, learning curves: formulation, and the GROUPNTEACH algorithms. In: Pacific–Asia conference on knowledge discovery and data mining, Springer, pp 376–387
Huang X, Lu W, Lakshmanan LV (2016) Truss decomposition of probabilistic graphs: semantics and algorithms. In: Proceedings of the 2016 ACM SIGMOD international conference on management of data, ACM, pp 77–90
Jiang M, Beutel A, Cui P, Hooi B, Yang S, Faloutsos C (2015) A general suspiciousness metric for dense blocks in multimodal data. In: 2015 IEEE international conference on data mining, IEEE, pp 781–786
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 137–146
Kitsak M, Gallos LK, Havlin S, Liljeros F, Muchnik L, Stanley HE, Makse HA (2010) Identification of influential spreaders in complex networks. Nat Phys 6(11):888–893
Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: European conference on machine learning, Springer, pp 217–226
Kwak H, Lee C, Park H, Moon S (2010) What is twitter, a social network or a news media?. In: Proceedings of the 19th international conference on world wide web, ACM, pp 591–600
Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic mathematically tractable graph generation and evolution, using kronecker multiplication. In: European conference on principles of data mining and knowledge discovery, Springer, pp 133–145
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123
Lim Y, Kang U (2015) Mascot: memory-efficient and accurate sampling for counting local triangles in graph streams. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 685–694
Luce RD (1950) Connectivity and generalized cliques in sociometric group structure. Psychometrika 15(2):169–190
Macdonald B, Shakarian P, Howard N, Moores G (2012) Spreaders in the network sir model: an empirical study. arXiv preprint arXiv:1208.4269
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM conference on internet measurement, ACM, pp 29–42
Mokken RJ (1979) Cliques, clubs and clans. Qual Quant 13(2):161–173
Newman ME (2006) Modularity and community structure in networks. Proc Nat Acad Sci 103(23):8577–8582
Pandit S, Chau DH, Wang S, Faloutsos C (2007) Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th international conference on world wide web, ACM, pp 201–210
Prakash BA, Sridharan A, Seshadri M, Machiraju S, Faloutsos C (2010) Eigenspokes: surprising patterns and scalable community chipping in large graphs. In: Pacific–Asia conference on knowledge discovery and data mining, Springer, pp 435–448
Rossi MEG, Malliaros FD, Vazirgiannis M (2015) Spread it good, spread it fast: identification of influential nodes in social networks. In: Proceedings of the 24th international conference on world wide web (companion volume), ACM, pp 101–102
Sarıyüce AE, Gedik B, Jacques-Silva G, Wu KL, Çatalyürek ÜV (2013) Streaming algorithms for \(k\)-core decomposition. Proc VLDB Endow 6(6):433–444
Sarıyüce AE, Seshadhri C, Pinar A, Çatalyürek ÜV (2015) Finding the hierarchy of dense subgraphs using nucleus decompositions. In: Proceedings of the 24th international conference on world wide web, ACM, pp 927–937
Schank T (2007) Algorithmic aspects of triangle-based network analysis. Ph.D. thesis, Universität Karlsruhe (TH), Fakultät für Informatik
Seidman SB, Foster BL (1978) A graph theoretic generalization of the clique concept. J Math Sociol 6(1):139–154
Seidman SB (1983) Network structure and minimum degree. Soc Netw 5(3):269–287
Shin K, Hooi B, Faloutsos C (2016a) M-zoom: fast dense-block detection in tensors with quality guarantees. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 264–280
Shin K, Eliassi-Rad T, Faloutsos C (2016b) Corescope: graph mining using \(k\)-core analysis—patterns, anomalies and algorithms. In: 2016 16th IEEE international conference on data mining, IEEE, pp 469–478
Shin K, Hooi B, Jisu K, Faloutsos C (2017a) D-cube: dense-block detection in terabyte-scale tensors. In: Proceedings of the Tenth ACM international conference on web search and data mining, ACM, pp 681–690
Shin K, Hooi B, Jisu K, Faloutsos C (2017b) Densealert: incremental dense-subtensor detection in tensor streams. arXiv preprint arXiv:1706.03374
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
Tsourakakis CE (2008) Fast counting of triangles in large real networks without counting: algorithms and laws. In: 2008 eighth IEEE international conference on data mining, IEEE, pp 608–617
Tsourakakis CE, Kang U, Miller GL, Faloutsos C (2009) Doulion: counting triangles in massive graphs with a coin. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 837–846
Van Loan CF (2000) The ubiquitous kronecker product. J Comput Appl Math 123(1):85–100
Wang J, Cheng J (2012) Truss decomposition in massive networks. Proc VLDB Endow 5(9):812–823
Wuchty S, Almaas E (2005) Peeling the yeast protein network. Proteomics 5(2):444–449
Zhang S, Zhou D, Yildirim MY, Alcorn S, He J, Davulcu H, Tong H (2017) HiDDen: hierarchical dense subgraph detection with application to financial fraud detection. In: Proceedings of the 2017 SIAM international conference on data mining, SIAM, pp 570–578

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. CNS-1314632 and IIS-1408924. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. Kijung Shin was supported by KFAS Scholarship. Tina Eliassi-Rad was supported by NSF CNS-1314603 and by DTRA HDTRA1-10-1-0120. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information

Authors and Affiliations

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Kijung Shin & Christos Faloutsos
Network Science Institute, Northeastern University, Boston, MA, USA
Tina Eliassi-Rad

Corresponding author

Correspondence to Kijung Shin.

Appendices

Appendix A: Interpreting sparsity patterns

We explain sparsity patterns and how to interpret them. The sparsity pattern of a graph is a plot with the axes representing the rows and columns of the adjacency matrix. For each nonzero entry (i.e., edge in the graph), a point is plotted, thus displaying sparsity patterns in the adjacency matrix.

Figure 19a shows the sparsity pattern of the degeneracy–core of Caida Dataset. The rows in the plot indicate vertices, and they are divided into two ranges, which correspond to the core and the periphery. The vertices in the core are densely connected with each other, as seen in region A in Fig. 19b. The vertices in the periphery are well connected to the vertices in the core (regions B and C) but rarely connected to each other (region D). The vertices in the core are further divided into three communities, each of which corresponds to a range of the columns in Fig. 19a. The vertices in the same community are particularly well connected to each other, as seen in regions A1, A2, and A3 in Fig. 19c, which correspond to the sparsity patterns of the communities.
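
As a rough illustration of how such a plot is built, the sketch below renders a sparsity pattern as text. It assumes an undirected edge list and a vertex ordering chosen by the reader (for example, core vertices first, then periphery vertices); the function name `sparsity_pattern` and the toy graph are hypothetical and unrelated to the Caida data.

```python
def sparsity_pattern(edges, order):
    """Return the sparsity pattern as a list of strings.

    Row i and column i both correspond to order[i]; '#' marks a nonzero
    adjacency-matrix entry (an edge) and '.' marks a zero entry.
    """
    index = {v: i for i, v in enumerate(order)}
    n = len(order)
    grid = [["." for _ in range(n)] for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        grid[i][j] = grid[j][i] = "#"  # undirected graph: symmetric matrix
    return ["".join(row) for row in grid]


# Toy graph: core {a, b}, periphery {c, d}.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d")]
toy_rows = sparsity_pattern(toy_edges, ["a", "b", "c", "d"])
```

In the toy output, the top-left 2-by-2 block (core to core) and the off-diagonal blocks (core to periphery) contain marks, while the bottom-right block (periphery to periphery) is empty, which is the qualitative shape described for regions A to D above.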

Appendix B: Core-D with a small number of samples

Figure 20 presents the accuracy of Core-D with different sample sizes in the two largest datasets. Even when the number of samples was smaller than the number of vertices, Core-D, especially the Overall Model, estimated degeneracy accurately and reliably. Thus, Core-D remains effective even when the amount of available memory space is less than n.

Appendix C: Measuring influence using SIR model simulation

To evaluate influence as a spreader, we simulate spreading processes using the SIR model [29], a widely used epidemic model. Initially, a vertex chosen as the seed is in the infectious state (I-state), while the others are in the susceptible state (S-state). Each vertex in the I-state infects each of its neighbors in the S-state with probability \(\beta \) (infection rate) and then enters the recovered state (R-state). This is repeated until no vertex is in the I-state. The influence of a seed, the initially infected vertex, can be quantified by the number of vertices infected at any time during the process. To reduce random effects, we repeat the whole process 100 times and use the average number of infected vertices as the measure of influence. \(\beta \) is set close to the epidemic threshold \(\lambda _{1}^{-1}\), where \(\lambda _{1}\) is the largest eigenvalue of the adjacency matrix, as in previous work [42].
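
The simulation described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the graph is a dictionary mapping each vertex to an iterable of neighbors, it processes infections in synchronous rounds, and the function name `sir_influence` and its parameters are hypothetical.

```python
import random


def sir_influence(adj, seed, beta, runs=100, rng=None):
    """Average number of vertices ever infected when starting from `seed`."""
    rng = rng or random.Random(0)  # fixed default seed for reproducibility
    total = 0
    for _ in range(runs):
        infectious = [seed]      # vertices currently in the I-state
        ever_infected = {seed}   # vertices in the I-state or the R-state
        while infectious:
            newly_infected = []
            for v in infectious:
                for w in adj.get(v, ()):
                    # w is susceptible if it has never been infected
                    if w not in ever_infected and rng.random() < beta:
                        ever_infected.add(w)
                        newly_infected.append(w)
            # all previously infectious vertices recover after one round
            infectious = newly_infected
        total += len(ever_infected)
    return total / runs
```

Two sanity checks follow directly from the model: with `beta = 0` only the seed is ever infected, and with `beta = 1` every vertex reachable from the seed is infected.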

About this article

Cite this article

Shin, K., Eliassi-Rad, T. & Faloutsos, C. Patterns and anomalies in k-cores of real-world graphs with applications. Knowl Inf Syst 54, 677–710 (2018). https://doi.org/10.1007/s10115-017-1077-6

Received: 16 March 2017
Revised: 06 June 2017
Accepted: 16 June 2017
Published: 28 June 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10115-017-1077-6
