Knowledge and Information Systems, Volume 54, Issue 3, pp 677–710

Patterns and anomalies in k-cores of real-world graphs with applications

  • Kijung Shin
  • Tina Eliassi-Rad
  • Christos Faloutsos
Regular Paper

Abstract

What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns that are related to k-cores and emerge in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., the maximum k such that a vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., the maximum k such that the k-core exists) obeys a 3-to-1 power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy-cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns: (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs; (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12× faster than its competitor; and (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17× faster than its competitors with comparable accuracy.
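
The definitions of k-core, coreness, and degeneracy given in the abstract can be made concrete with a minimal sketch of the standard peeling procedure: repeatedly remove a vertex of minimum remaining degree and record the largest such degree seen so far. This is not the authors' Core-A, Core-D, or Core-S; the function names `core_numbers` and `degeneracy` are hypothetical, the graph is assumed to be undirected and simple, and the minimum-degree search is deliberately naive (quadratic in the number of vertices) for readability.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: coreness} for an undirected graph given as (u, v) pairs.

    Illustrative peeling sketch only; self-loops are ignored and
    duplicate edges collapse because neighbours are stored in sets.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Remaining degree of each vertex among vertices not yet peeled.
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    coreness = {}
    k = 0
    while remaining:
        # Peel a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # Coreness never decreases along the peeling order.
        k = max(k, degree[v])
        coreness[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return coreness


def degeneracy(edges):
    """Largest k for which the k-core is non-empty (0 for an empty graph)."""
    return max(core_numbers(edges).values(), default=0)


# A 4-clique on a, b, c, d with one pendant vertex e attached to a.
example_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("a", "e"),
]
print(core_numbers(example_edges))
print(degeneracy(example_edges))
```

In the example, vertex a has degree 4 but coreness 3, and e has degree and coreness 1, which illustrates that coreness is bounded above by degree without always being equal to it.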

Keywords

Graph, k-core, Degeneracy, Influential node, Anomaly detection, k-truss

Notes

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. CNS-1314632 and IIS-1408924. Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. Kijung Shin was supported by a KFAS Scholarship. Tina Eliassi-Rad was supported by NSF CNS-1314603 and by DTRA HDTRA1-10-1-0120. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
  2. Network Science Institute, Northeastern University, Boston, USA