Data Mining and Knowledge Discovery, Volume 29, Issue 3, pp 626–688

Graph based anomaly detection and description: a survey


Abstract

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data has become ubiquitous, and techniques for structured graph data have recently come into focus. Because objects in graphs exhibit long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised versus (semi-)supervised approaches, static versus dynamic graphs, and attributed versus plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
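To make the "unsupervised, static, plain graph" setting concrete, here is a minimal sketch of one egonet-based heuristic in the spirit of OddBall (Akoglu, McGlohon and Faloutsos, 2010). It assumes an unweighted, undirected edge list. For each node i it computes the number of neighbors N_i and the number of edges E_i in its egonet (the node, its neighbors, and all edges among them). It then fits the power law E = C · N^θ by least squares in log-log space and scores each node by how far it falls off that curve. The function names (`egonet_features`, `fit_power_law`, `egonet_outlier_scores`) are hypothetical, and the scoring rule follows the OddBall "out-of-line" score as we understand it. Treat this as an illustration, not a reference implementation of any method covered by the survey.

```python
import math
from collections import defaultdict
from itertools import combinations


def egonet_features(edges):
    """Map each node to (N_i, E_i): neighbor count and egonet edge count."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    feats = {}
    for node, nbrs in adj.items():
        # Each edge among the neighbors is seen once from each endpoint, hence // 2.
        among = sum(1 for a in nbrs for b in adj[a] if b in nbrs) // 2
        # Egonet edges = spokes from the node to its neighbors + edges among neighbors.
        feats[node] = (len(nbrs), len(nbrs) + among)
    return feats


def fit_power_law(points):
    """Least-squares fit of log E = log C + theta * log N; returns (C, theta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(e) for _, e in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) * (x - mx) for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta = sxy / sxx if sxx > 0 else 1.0  # fallback when all degrees coincide
    return math.exp(my - theta * mx), theta


def egonet_outlier_scores(edges):
    """Score nodes by how far E_i lies from the fitted curve C * N_i ** theta."""
    feats = egonet_features(edges)
    if not feats:
        return {}
    c, theta = fit_power_law(list(feats.values()))
    scores = {}
    for node, (n, e) in feats.items():
        expected = c * n ** theta
        # Out-of-line score: ratio of actual to expected, damped by the log of the gap.
        ratio = max(e, expected) / min(e, expected)
        scores[node] = ratio * math.log(abs(e - expected) + 1)
    return scores


# Toy graph: a 5-leaf star next to a 5-node clique.
edges = [("hub", f"leaf{i}") for i in range(5)]
edges += list(combinations([f"k{i}" for i in range(5)], 2))
scores = egonet_outlier_scores(edges)
print(max(scores, key=scores.get))  # prints hub
```

In this toy graph the fitted curve is pulled up by the dense clique egonets. The star hub, with five neighbors but no edges among them, therefore has far fewer egonet edges than the fit predicts and receives the highest score. The pair (N_i, E_i) next to its expected value also gives a simple, human-readable "why" for the flag.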

Keywords

Anomaly detection, Graph mining, Network anomaly detection, Event detection, Change point detection, Fraud detection, Anomaly description, Visual analytics

Notes

Acknowledgments

This material is based upon work supported by the Army Research Office (ARO) under Cooperative Agreement Numbers W911NF-14-1-0029 and W911NF-09-2-0053, the Defense Advanced Research Projects Agency (DARPA) under Contract Numbers W911NF-11-C-0088, W911NF-11-C-0200 and W911NF-12-C-0028, the National Science Foundation (NSF) under Grant Nos. IIS-1217559 and IIS1017415, by Region II University Transportation Center under the Project number 49997-33-25, and the Stony Brook University Office of Vice President for Research. Any findings and conclusions expressed in this material are those of the author(s) and do not necessarily reflect the position or the policy of the U.S. Government and the other funding parties, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

  152. Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL, pp 157–166Google Scholar
  153. Lee DD, Sebastian HS (2000) Algorithms for non-negative matrix factorization. In: Proceedings of the 14th annual conference on neural information processing systems (NIPS), Denver, CO, pp 556–562Google Scholar
  154. Lee K, Caverlee J, Webb S (2010) Uncovering social spammers: social honeypots + machine learning. In: Proceedings of the 33rd international conference on research and development in information retrieval (SIGIR), Switzerland, Geneva, pp 435–442Google Scholar
  155. Leeuwen M, Siebes A (2008) Streamkrimp: detecting change in data streams. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), Antwerp, Belgium. Springer, Berlin, pp 672–687Google Scholar
  156. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL. ACM, pp 177–187Google Scholar
  157. Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World Wide Web (WWW), Raleigh, NC, New York, NY, USA, ACM, pp 631–640Google Scholar
  158. Li G, Semerci M, Yener B, Zaki MJ (2011a) Graph classification via topological and label attributes. In: Proceedings of the 9th international workshop on mining and learning with graphs (MLG), San Diego, USAGoogle Scholar
  159. Li L, Liang C.-JM, Liu J, Nath S, Terzis A, Faloutsos C (2011b) Thermocast: a cyber-physical forecasting model for data centers. In: Proceedings of the 17th ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA. ACMGoogle Scholar
  160. Li Z, Xiong H, Liu Y, Zhou A (2010) Detecting blackhole and volcano patterns in directed networks. In: Proceedings of the 10th IEEE international conference on data mining (ICDM), Sydney, Australia. IEEE Computer Society, pp 294–303Google Scholar
  161. Liben-Nowell D, Kleinberg JM (2003) The link prediction problem for social networks. In: Proceedings of the 12th ACM conference on information and knowledge management (CIKM), New Orleans, LA, pp 556–559Google Scholar
  162. Lieto G, Orsini F, Pagano G (2008) Cluster analysis for anomaly detection. In: Proceedings of the 2nd international conference on complex, intelligent and software intensive systems (CISIS), Barcelona, Spain, volume 53 of advances in soft computing. Springer, Berlin, pp 163–169Google Scholar
  163. Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery (DMKD), San Diego, CA. ACM, pp 2–11Google Scholar
  164. Liu B, Xiao Y, Cao L, Hao Z, Deng F (2013) Svdd-based outlier detection on uncertain data. Knowl Inf Syst 34(3):597–618CrossRefGoogle Scholar
  165. Liu C, Yan X, Yu H, Han J, Philip SY (2005) Mining behavior graphs for “backtrace” of noncrashing bugs. In: Proceedings of the 5th SIAM international conference on data mining (SDM), Newport Beach, CAGoogle Scholar
  166. Lu Q, Getoor L (2003) Link-based classification. In: Proceedings of the 20th international conference on machine learning (ICML), Washington, DCGoogle Scholar
  167. Ma J, Saul LK, Savage S, Voelker GM (2009) Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM international conference on knowledge discovery and data mining (SIGKDD), Paris, France. ACM, pp 1245–1254Google Scholar
  168. Macindoe O, Richards W (2010) Graph comparison using fine structure analysis. In: International conference on privacy, security, risk and trust (SocialCom/PASSAT), pp 193–200Google Scholar
  169. Macskassy S, Provost F (2003) A simple relational classifier. In: Proceedings of the KDD-workshop on multi-relational data mining (MRDM), Washington, DC, pp 64–76Google Scholar
  170. Margineantu DD, Wong W-K, Dash D (2010) Machine learning algorithms for event detection. Mach Learn 79(3):257–259CrossRefGoogle Scholar
  171. McGlohon M, Bay S, Anderle MG, Steier DM, Faloutsos C (2009) Snare: a link analytic system for graph labeling and risk detection. In: Proceedings of the 15th ACM international conference on knowledge discovery and data mining (SIGKDD), Paris, France, pp 1265–1274Google Scholar
  172. Medina A, Lakhina A, Matta I, Byers JW (2001) BRITE: an approach to universal topology generation. In: Proceedings of the IEEE 9th international symposium on modeling, analysis and simulation of computer and telecommunication systems. IEEE Computer SocietyGoogle Scholar
  173. Melnik S, Garcia-Molina H, Rahm E (2002) Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: Proceedings of the 18th international conference on data engineering (ICDE), San Jose, CAGoogle Scholar
  174. Miller DJ, Browning J (2003) A mixture model and em-based algorithm for class discovery, robust classification, and outlier rejection in mixed labeled/unlabeled data sets. IEEE Trans Pattern Anal Mach Intell 25(11):1468–1483CrossRefGoogle Scholar
  175. Mongiovi M, Bogdanov P, Ranca R, Singh AK, Papalexakis EE, Faloutsos C (2013) Netspot: spotting significant anomalous regions on dynamic networks. In: Proceedings of the 13th SIAM international conference on data mining (SDM), Texas-Austin, TXGoogle Scholar
  176. Montgomery DC (1997) Introduction to statistical quality control, 3rd edn. Wiley, New YorkMATHGoogle Scholar
  177. Müller E, Schiffer M, Seidl T (2010) Adaptive outlierness for subspace outlier ranking. In: Proceedings of the 19th ACM conference on information and knowledge management (CIKM), Toronto, Canada. ACM, pp 1629–1632Google Scholar
  178. Müller E, Assent I, Sanchez PI, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: Proceedings of the 12th IEEE international conference on data mining (ICDM), Brussels, Belgium. IEEE Computer Society, pp 529–538Google Scholar
  179. Müller E, Sánchez PI, Mülle Y, Böhm K (2013) Ranking outlier nodes in subspaces of attributed graphs. In: Proceedings of the 4th international workshop on graph data management: techniques and applicationsGoogle Scholar
  180. Naus JI (1982) Approximations for distributions of scan statistics. J Am Stat Assoc 77(377):177–183CrossRefMATHMathSciNetGoogle Scholar
  181. Neil J (2011) Scan statistics for the online detection of locally anomalous subgraphs. PhD thesis, University of New MexicoGoogle Scholar
  182. Neill DB, Wong W.-K (2009) A tutorial on event detection tutorial. In: ACM international conference on knowledge discovery and data mining (SIGKDD)Google Scholar
  183. Neville J, Jensen D (2000) Iterative classification in relational data. In: Proceedings of the AAAI workshop on learning statistical models from relational data. AAAI Press, pp 13–20Google Scholar
  184. Neville J, Jensen D (2003) Collective classification with relational dependency networks. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DCGoogle Scholar
  185. Neville J, Jensen D, Friedland L, Hay M (2003) Learning relational probability trees. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DCGoogle Scholar
  186. Neville J, Simsek O, Jensen D, Komoroske J, Palmer K, Goldberg HG (2005) Using relational knowledge discovery to prevent securities fraud. In: Proceedings of the 11th ACM international conference on knowledge discovery and data mining (SIGKDD), Chicago, IL, pp 449–458Google Scholar
  187. Newman MEJ (2004) Detecting community structure in networks. Eur Phys J B 38:321–330CrossRefGoogle Scholar
  188. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582CrossRefGoogle Scholar
  189. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113CrossRefGoogle Scholar
  190. Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems. MIT Press, pp 849–856Google Scholar
  191. Nikulin V, Huang T-H (2012) Unsupervised dimensionality reduction via gradient-based matrix factorization with two adaptive learning rates. J Mach Learn Res Proc Track 27:181–194Google Scholar
  192. Noble CC, Cook DJ (2003) Graph-based anomaly detection. In: Proceedings of the 9th ACM international conference on knowledge discovery and data mining (SIGKDD), Washington, DC, pp 631–636Google Scholar
  193. Noh JD, Rieger H (2004) Random walks on complex networks. Phys Rev Lett 92:118701CrossRefGoogle Scholar
  194. Ntoulas A, Najork M, Manasse M, Fetterly D (2006) Detecting spam web pages through content analysis. In: Proceedings of the World Wide Web conference. Edinburgh, Scotland, pp 83–92Google Scholar
  195. Orair GH, Teixeira CHC, Wang Y, Meira W Jr, Parthasarathy S (2010) Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow 3(2):1469–1480CrossRefGoogle Scholar
  196. Otey ME, Ghoting A, Parthasarathy S (2006) Fast distributed outlier detection in mixed-attribute data sets. Data Min Knowl Discov 12(2–3):203–228CrossRefMathSciNetGoogle Scholar
  197. Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th annual meeting of the association for computational linguistics (ACL), Portland, OR, pp 309–319Google Scholar
  198. Ott M, Cardie C, Hancock JT (2012) Estimating the prevalence of deception in online review communities. In: Proceedings of the 21st international conference on World Wide Web (WWW). Lyon, France. ACM, pp 201–210Google Scholar
  199. Pandit S, Chau DH, Wang S, Faloutsos C (2007) Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th international conference on World Wide Web (WWW), Alberta, CanadaGoogle Scholar
  200. Papadimitriou P, Dasdan A, Garcia-Molina H (2008) Web graph similarity for anomaly detection. J Internet Serv Appl 1(1):1167Google Scholar
  201. Papadimitriou S, Sun J (2008) DisCo: distributed co-clustering with map-reduce: a case study towards petabyte-scale end-to-end mining. In: Proceedings of the 8th IEEE international conference on data mining (ICDM), Pisa, Italy. IEEE Computer Society, pp 512–521Google Scholar
  202. Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) Loci: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering (ICDE), Bangalore, India. IEEE Computer Society, pp 315–326Google Scholar
  203. Papalexakis EE, Faloutsos C, Sidiropoulos ND (2012) Parcube: sparse parallelizable tensor decompositions. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD). Bristol, UK, pp 521–536Google Scholar
  204. Pauwels EJ, Ambekar O (2011) One class classification for anomaly detection: support vector data description revisited. In: Proceedings of the 11th IEEE international conference on data mining (ICDM), vol 6870, Vancouver, Canada, pp 25–39Google Scholar
  205. Peabody M (2003) Finding groups of graphs in databases. Master’s thesis, Drexel UniversityGoogle Scholar
  206. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2(6):559–572CrossRefGoogle Scholar
  207. Peel L, Clauset A (2014) Detecting change points in the large-scale structure of evolving networks. CoRR, abs/1403.0989Google Scholar
  208. Pelillo M (1999) Replicator equations, maximal cliques, and graph isomorphism. Neural Comput 11(8):1933–1955CrossRefGoogle Scholar
  209. Perozzi B, Akoglu L, Sanchez PI, Müller E (2014) Focused clustering and outlier detection in large attributed graphs. In: ACM special interest group on knowledge discovery and data mining (SIG-KDD)Google Scholar
  210. Phua C, Alahakoon D, Lee V (2004) Minority report in fraud detection: classification of skewed data. SIGKDD Explor 6(1):50–59CrossRefGoogle Scholar
  211. Phua C, Lee VCS, Smith-Miles K, Gayler RW (2010) A comprehensive survey of data mining-based fraud detection research. CoRR, abs/1009.6119Google Scholar
  212. Pincombe B (2005) Anomaly detection in time series of graphs using arma processes. ASOR Bull 24(4): 2–10Google Scholar
  213. Priebe CE, Conroy JM, Marchette DJ, Park Y (2005) Scan statistics on enron graphs. Comput Math Organ Theory 11(3):229–247. ISSN 1381–298XGoogle Scholar
  214. Provos N, McNamee D, Mavrommatis P, Wang K, Modadugu N (2007) The ghost in the browser: analysis of web-based malware. In: Proceedings of the 1st workshop on hot topics in understanding botnets (HotBots)Google Scholar
  215. Radke RJ, Andra S, Al-Kofahi O, Roysam B (2005) Image change detection algorithms: a systematic survey. IEEE Trans Image Process 14(3):294–307CrossRefMathSciNetGoogle Scholar
  216. Rahman MS, Huang T.-K., Madhyastha HV, Faloutsos M 2012) Efficient and scalable socware detection in online social networks. In: Proceedings of the 21st USENIX conference on Security symposium (Security). USENIX Association, pp 32–32Google Scholar
  217. Ramakrishnan C, Milnor W, Perry M, Sheth A (2005) Discovering informative connection subgraphs in multi-relational graphs. In: SIGKDD explorations special issue on link miningGoogle Scholar
  218. Rissanen J (1999) Hypothesis selection and testing by the MDL principle. Comput J 42:260–269CrossRefMATHGoogle Scholar
  219. Rossi RA, Gallagher B, Neville J, Henderson K (2012) Role-dynamics: fast mining of large dynamic networks. In: Proceedings of the 21st international conference on World Wide Web (WWW), Lyon, France, WWW ’12 Companion. ACM, pp 997–1006Google Scholar
  220. Rossi RA, Gallagher B, Neville J, Henderson K (2013) Modeling dynamic behavior in large evolving graphs. In: Proceeding of the 6th ACM international conference on Web search and data mining (WSDM), pp 667–676Google Scholar
  221. Ruts I, Rousseeuw PJ (1996) Computing depth contours of bivariate point clouds. Comput Stat Data Anal 23(1):153–168CrossRefMATHGoogle Scholar
  222. Saltenis V (2004) Outlier detection based on the distribution of distances between data points. Informatica (Lithuanian Academy of Sciences) 15(3):399–410MathSciNetGoogle Scholar
  223. Schubert E, Zimek A, Kriegel H-P (2012) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Mining Knowl Discov 28(1): 190–237. doi:10.1007/s10618-012-0300-z
  224. Sen P, Namata G, Bilgic M, Getoor L, Gallagher B, Eliassi-Rad T (2008) Collective classification in network data. AI Mag 29(3):93–106Google Scholar
  225. Shi J, Malik J (1997) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22:888–905Google Scholar
  226. Shoubridge P, Kraetzl M, Ray D (1999) Detection of abnormal change in dynamic networks. In: Information, decision and control, 1999. IDC 99. Proceedings. pp 557–562Google Scholar
  227. Shoubridge P, Kraetzl M, Wallis WD, Bunke H (2002) Detection of abnormal change in a time series of graphs. J Interconnect Netw 3(1–2):85–101CrossRefGoogle Scholar
  228. Smets K, Vreeken J (2011) The Odd One Out: identifying and characterising anomalies. In: Proceedings of the 11th SIAM international conference on data mining (SDM), Mesa, AZ, pp 804–815Google Scholar
  229. Sun H, Huang J, Han J, Deng H, Zhao P, Feng B (2010) gskeletonclu: density-based network clustering via structure-connected tree division or agglomeration. In: Proceedings of the 10th IEEE international conference on data mining (ICDM), Sydney, Australia. IEEE Computer Society, pp 481–490Google Scholar
  230. Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Proceedings of the 5th IEEE international conference on data mining (ICDM), Houston, TX. IEEE Computer Society, pp 418–425Google Scholar
  231. Sun J, Tao D, Faloutsos C (2006) Beyond streams and graphs: dynamic tensor analysis. In: Proceedings of the 12th ACM international conference on knowledge discovery and data mining (SIGKDD), Philadelphia, PA, pp 374–383Google Scholar
  232. Sun J, Faloutsos C, Papadimitriou S, Yu PS (2007a) GraphScope: parameter-free mining of large time-evolving graphs. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA. ACM, pp 687–696Google Scholar
  233. Sun J, Xie Y, Zhang H, Faloutsos C (2007b) Less is more: compact matrix decomposition for large sparse graphs. In: Proceedings of the 7th SIAM international conference on data mining (SDM), Minneapolis, MNGoogle Scholar
  234. Sun J, Xie Y, Zhang H, Faloutsos C (2008) Less is more: sparse graph mining with compact matrix decomposition. Stat Anal Data Min 1(1): 6–22. ISSN 1932–1864Google Scholar
  235. Taniguchi M, Haft M, Hollmen J, Tresp V (1998) Fraud detection in communication networks using neural and probabilistic methods. Acoust Speech Signal Process 2:1241–1244Google Scholar
  236. Tantipathananandh C, Berger-Wolf T (2009) Constant-factor approximation algorithms for identifying dynamic communities. In: Proceedings of the 15th ACM international conference on knowledge discovery and data mining (SIGKDD), Paris, France. ACM, pp 827–836Google Scholar
  237. Tantipathananandh C, Berger-Wolf T (2011) Finding communities in dynamic social networks. In: Proceedings of the 11th IEEE international conference on data mining (ICDM), Vancouver, Canada. IEEE, pp 1236–1241Google Scholar
  238. Tantipathananandh C, Berger-Wolf T, Kempe D (2007) A framework for community identification in dynamic social networks. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA, New York, NY, USA, ACM, pp 717–726Google Scholar
  239. Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: Proceedings of the 18th conference on uncertainty in artificial intelligence (UAI), Alberta, Canada, pp 485–492Google Scholar
  240. Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM international conference on knowledge discovery and data mining (SIGKDD), Philadelphia, PA, pp 404–413Google Scholar
  241. Tong H, Lin C-Y (2011) Non-negative residual matrix factorization with application to graph anomaly detection. In: Proceedings of the 11th SIAM international conference on data mining (SDM), Mesa, AZ, pp 143–153Google Scholar
  242. Tong H, Lin C-Y (2012) Non-negative residual matrix factorization: problem definition, fast solutions, and applications. Stat Anal Data Min 5(1):3–15CrossRefMathSciNetGoogle Scholar
  243. Tong H, Papadimitriou S, Jimeng S, Yu PS, Faloutsos C (2008) Colibri: fast mining of large static and dynamic graphs. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas, NV, pp 686–694Google Scholar
  244. Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23(1):31–42CrossRefMathSciNetGoogle Scholar
  245. Vishwanathan SVN, Schraudolph NN, Kondor RI, Borgwardt KM (2010) Graph kernels. J Mach Learn Res 11:1201–1242MATHMathSciNetGoogle Scholar
  246. Wang G, Xie S, Liu B, Yu PS (2011a) Review graph based online store review spammer detection. In: Proceedings of the 11th IEEE international conference on data mining (ICDM), Vancouver, Canada, pp 1242–1247Google Scholar
  247. Wang G, Xie S, Liu B, Yu PS (2012a) Identify online store review spammers via social review graph. ACM Trans Intell Syst Technol 3(4):61Google Scholar
  248. Wang L, Rege M, Dong M, Ding Y (2012b) Low-rank kernel matrix factorization for large-scale evolutionary clustering. IEEE Trans Knowl Data Eng 24(6):1036–1050CrossRefGoogle Scholar
  249. Wang X, Wang X, Wilkes DM (2012c) A minimum spanning tree-inspired clustering-based outlier detection technique. In: Proceedings of the 12th IEEE international conference on data mining (ICDM), Belgium, Brussels, pp 209–223Google Scholar
  250. Wang Y, Parthasarathy S, Tatikonda S (2011b) Locality sensitive outlier detection: a ranking driven approach. In: Proceedings of the 27th international conference on data engineering (ICDE), Hannover, Germany, pp 410–421Google Scholar
  251. Watts DJ (1999) Small worlds. Princeton University Press, Princeton, NJGoogle Scholar
  252. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442. ISSN 00280836Google Scholar
  253. Wilson RC, Zhu P (2008) A study of graph spectra for comparing graphs and trees. J Pattern Recognit 41(9):2833–2841CrossRefMATHGoogle Scholar
  254. Wong W.-K., Moore A, Cooper G, Wagner M (2005) What’s strange about recent events (wsare): an algorithm for the early detection of disease outbreaks. J Mach Learn Res 6:1961–1998. ISSN 1532–4435Google Scholar
  255. Wu B, Goel V, Davison BD (2006) Propagating trust and distrust to demote web spam. In: Proceedings of the workshop models of trust for the Web (MTW) at the 15th international World Wide Web Conference (WWW), Edinburgh, Scotland, volume 190 of CEUR workshop proceedingsGoogle Scholar
  256. Wu R-S, Ou C-S, Lin HY, Chang S-I, Yen DC (2012) Using data mining technique to enhance tax evasion detection performance. Expert Syst Appl 39(10):8769–8777CrossRefGoogle Scholar
  257. Xie S, Wang G, Lin S, Yu PS (2012) Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM international conference on knowledge discovery and data mining (SIGKDD), Beijing, China, pp 823–831Google Scholar
  258. Xu X, Yuruk N, Feng Z, Schweiger TAJ (2007) Scan: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA. ACM, pp 824–833Google Scholar
  259. Yedidia JS, Freeman WT, Weiss Y (2003) Understanding belief propagation and its generalizations. In: Exploring AI in the new millennium. Morgan Kaufmann Publishers Inc, pp 239–269Google Scholar
  260. Zager L, Verghese G (2008) Graph similarity scoring and matching. Appl Math Lett 21(1):86–94CrossRefMATHMathSciNetGoogle Scholar
  261. Zhao P, Han J, Sun Y (2009) P-rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM), Hong Kong, China. ACM, pp 553–562Google Scholar
  262. Zhu B, Sastry S (2011) Revisit dynamic arima based anomaly detection. In: International conference on privacy, security, risk and trust (Social-Com/PASSAT), pp 1263–1268Google Scholar
  263. Zimek A, Schubert E, Kriegel H-P (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min 5(5):363–387CrossRefMathSciNetGoogle Scholar
  264. Zimek A, Campello RJGB, Sander J (2014) Ensembles for unsupervised outlier detection: challenges and research questions. A position paper. SIGKDD Explor Newsl 15(1):11–22CrossRefGoogle Scholar

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
  2. Department of Computer Science, City College, City University of New York, New York, USA
  3. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
