# Patterns and anomalies in *k*-cores of real-world graphs with applications

## Abstract

What do the *k*-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A *k*-core is the maximal subgraph in which all vertices have degree at least *k*. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns that relate to *k*-cores and emerge in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: the coreness of a vertex (i.e., the maximum *k* such that the vertex belongs to the *k*-core) is strongly correlated with its degree. (2) Core-Triangle Pattern: the degeneracy of a graph (i.e., the maximum *k* such that the *k*-core exists) obeys a *3-to-1* power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy-cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns: (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs. (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to **12\(\times\) faster** than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to **17\(\times\) faster** than its competitors with comparable accuracy.
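The definitions of *k*-core, coreness, and degeneracy used above can be made concrete with a minimal sketch of core decomposition by repeated peeling. This is the generic textbook peeling idea, not the implementation used in this work; the function name `core_decomposition` and the toy graph are illustrative assumptions, and the loop is quadratic-time for readability (linear-time bucket-based variants exist).

```python
from collections import defaultdict


def core_decomposition(edges):
    """Return {vertex: coreness} for an undirected simple graph (edge list).

    Illustrative sketch only: repeatedly remove a vertex of minimum
    remaining degree; the running maximum of those degrees is its coreness.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining_degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    coreness = {}
    k = 0
    while remaining:
        # Peel a vertex with the smallest degree among the remaining ones.
        v = min(remaining, key=remaining_degree.__getitem__)
        k = max(k, remaining_degree[v])
        coreness[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                remaining_degree[w] -= 1
    return coreness


# Toy graph (hypothetical): a 4-clique {a, b, c, d} plus a path d - e - f.
edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
coreness = core_decomposition(edges)
degeneracy = max(coreness.values())  # largest k for which the k-core exists
```

On this toy graph the 4-clique forms the 3-core, so the degeneracy is 3. Vertex `d` has degree 4 but coreness 3, and `e` has degree 2 but coreness 1, which illustrates that coreness never exceeds degree. Comparing the two quantities per vertex is the kind of relationship Mirror Pattern describes.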

## Keywords

Graph, *k*-core, Degeneracy, Influential node, Anomaly detection, *k*-truss

## Notes

### Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. CNS-1314632 and IIS-1408924. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. Kijung Shin was supported by a KFAS Scholarship. Tina Eliassi-Rad was supported by NSF CNS-1314603 and by DTRA HDTRA1-10-1-0120. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
