Abstract
Anomaly detection has become an established research topic in areas such as bibliometrics, informatics, and computer networks, including security-oriented and social networks. Most existing anomaly detection techniques have limitations and do not focus specifically on detecting anomalous groups. Anomaly detection is also a crucial problem in processing large-scale datasets, where the goal is to find abnormal values or unusual events. We survey existing group anomaly detection techniques because identifying groups that are inconsistent with regular group patterns helps to mitigate risks, prevent malicious collaborative activities, and yield other explanatory insights. We divide group anomaly detection techniques into activity-based and graph-based methods. The graph-based methods are further classified as static versus dynamic and attributed versus plain graph methods. We also list the datasets used in various studies to detect group anomalies, along with the detected anomalies and the performance measures used to validate the results. Finally, we present applications of group anomaly detection, the research challenges it poses to the scientific community, and some future trends for this research area.
Notes
One-class support vector machine for detecting anomalous behavior [103].
Density-based local outliers [104].
Statistics of extremes: theory and applications [105].
Structural comparison measure of objects by Jeh and Widom [106].
Ranking algorithm for web search engines proposed by Brin and Page [107].
Minimum description length (MDL) principle, which measures the compression of data, proposed by Rissanen [108].
“The law of anomalous numbers” by Frank Benford [109].
Singular value decomposition for determining the rank and range of a data matrix, proposed by Golub and Reinsch [110].
en.wikipedia.org/wiki/Kolmogorov–Smirnov_test.
metacademy.org/graphs/concepts/f_measure.
Modularity by Newman and Girvan [111].
en.wikipedia.org/wiki/Perplexity.
The h-index, proposed by Hirsch, measures the citation impact of an author's publications [112].
References
Muandet K, Schölkopf B. One-class support measure machines for group anomaly detection. 2013. arXiv preprint arXiv:1303.0309.
Tong H, Papadimitriou S, Sun J, Yu PS, Faloutsos C. Colibri: fast mining of large static and dynamic graphs. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, p. 686–94. ACM; 2008.
Kuppa A, Grzonkowski S, Asghar MR, Le-Khac NA. Finding rats in cats: detecting stealthy attacks using group anomaly detection. In: 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), p. 442–449. IEEE; 2019.
He Z, Xu X, Deng S. Discovering cluster-based local outliers. Pattern Recogn Lett. 2003;24(9):1641–50.
Eberle W, Holder L, Massengill B. Graph-based anomaly detection applied to homeland security cargo screening. In: Twenty-Fifth International FLAIRS Conference. 2012.
Borghesi A, Bartolini A, Lombardi M, Milano M, Benini L. Anomaly detection using autoencoders in high performance computing systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, p. 9428–33. 2019.
Duan D, Li Y, Jin Y, Lu Z. Community mining on dynamic weighted directed graphs. In: Proceedings of the 1st ACM International Workshop on Complex Networks Meet Information and Knowledge Management, p. 11–8. ACM; 2009.
Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C. Fraudar: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 895–904. ACM; 2016.
Zhao P, Han J, Sun Y. P-Rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, p. 553–62. ACM; 2009.
Kurt MN, Yilmaz Y, Wang X. Real-time nonparametric anomaly detection in high-dimensional settings. IEEE Trans Pattern Anal Mach Intell. 2020. https://doi.org/10.1109/TPAMI.2020.2970410.
Du B, Zhang S, Cao N, Tong H. First: fast interactive attributed subgraph matching. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1447–56. ACM; 2017.
Chalapathy R, Chawla S. Deep learning for anomaly detection: a survey. 2019. arXiv preprint arXiv:1901.03407.
Pang G, Shen C, Cao L, Hengel AVD. Deep learning for anomaly detection: a review. 2020. arXiv preprint arXiv:2007.02500.
Noble CC, Cook DJ. Graph-based anomaly detection. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 631–6. ACM; 2003.
Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv (CSUR). 2009;41(3):15.
Salehi M, Rashidi L. A survey on anomaly detection in evolving data: [with application to forest fire risk prediction]. ACM SIGKDD Explor Newsl. 2018;20(1):13–23.
Yu R, Qiu H, Wen Z, Lin C, Liu Y. A survey on social media anomaly detection. ACM SIGKDD Explor Newsl. 2016;18(1):1–14.
Yu R, He X, Liu Y. Glad: group anomaly detection in social media analysis. ACM Trans Knowl Discov Data (TKDD). 2015;10(2):18.
Chalupsky H. Unsupervised link discovery in multi-relational data via rarity analysis. In: Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, p. 171–8. IEEE; 2003.
Macskassy SA, Provost F. A simple relational classifier. In: Proceedings of the KDD-Workshop on Multi-Relational Data Mining (MRDM), Washington, DC, p. 64–76. 2003.
Bergman L, Hoshen Y. Classification-based anomaly detection for general data. 2020. arXiv preprint arXiv:2005.02359.
Toth E, Chawla S. Group deviation detection methods: a survey. ACM Comput Surv (CSUR). 2018;51(4):1–38.
Kaur R, Singh S. A survey of data mining and social network analysis based anomaly detection techniques. Egypt Inform J. 2016;17(2):199–216.
Manzoor E, Lamba H, Akoglu L. xstream: Outlier detection in feature-evolving data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1963–72. 2018.
Maurus S, Plant C. Let's see your digits: anomalous-state detection using Benford's law. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 977–86. ACM; 2017.
Siffer A, Fouque PA, Termier A, Largouet C. Anomaly detection in streams with extreme value theory. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1067–75. ACM; 2017.
Xiong L, Poczos B, Schneider J, Connolly A, Vanderplas J. Hierarchical probabilistic models for group anomaly detection. In: JMLR WCP Proceedings of the International Conference on Artificial Intelligence and Statistics AISTATS, vol. 15, p. 789–97. 2011.
Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(4–5):993–1022.
Xiong L, Póczos B, Schneider J. Group anomaly detection using flexible genre models. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems (NIPS), p. 1–9. 2011.
Angiulli F, Pizzuti C. Fast outlier detection in high dimensional spaces. In: European Conference on Principles of Data Mining and Knowledge Discovery, p. 15–27. Berlin: Springer; 2002.
Song W, Dong W, Kang L. Group anomaly detection based on Bayesian framework with genetic algorithm. Inf Sci. 2020;533:138–49.
Nachman B, Shih D. Anomaly detection with density estimation. Phys Rev D. 2020;101(7):075042.
Gong D, Liu L, Le V, Saha B, Mansour MR, Venkatesh S, Hengel AVD. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE International Conference on Computer Vision, p. 1705–14. 2019.
Sun X, Yin H, Liu B, Chen H, Cao J, Shao Y, Hung NQV. Heterogeneous hypergraph embedding for graph classification. 2020. arXiv preprint arXiv:2010.10728.
Liu B, Sun X, Ni Z, Cao J, Luo J, Liu B, Fu X. Co-detection of crowdturfing microblogs and spammers in online social networks. World Wide Web. 2020;23(1):573–607.
Sala A, Cao L, Wilson C, Zablit R, Zheng H, Zhao BY. Measurement-calibrated graph models for social network experiments. In: Proceedings of the 19th International Conference on World Wide Web, p. 861–70. ACM; 2010.
Warrender C, Forrest S, Pearlmutter B. Detecting intrusions using system calls: Alternative data models. In: Security and Privacy, 1999. Proceedings of the 1999 IEEE Symposium on, p. 133–45. IEEE; 1999.
Eswaran D, Faloutsos C, Guha S, Mishra N. Spotlight: detecting anomalies in streaming graphs. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1378–1386. 2018.
Babbar S, Surian D, Chawla S. A causal approach for mining interesting anomalies. In: Canadian Conference on Artificial Intelligence, p. 226–232. Berlin: Springer; 2013.
Bakshy E, Rosenn I, Marlow C, Adamic L. The role of social networks in information diffusion. In: Proceedings of the 21st International Conference on World Wide Web, p. 519–28. ACM; 2012.
Turner R, Ghahramani Z, Bottone S. Fast online anomaly detection using scan statistics. In: Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on, p. 385–90. IEEE; 2010.
Friedland L, Jensen D. Finding tribes: identifying close-knit individuals from employment patterns. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 290–9. ACM; 2007.
Das K, Schneider J. Detecting anomalous records in categorical datasets. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 220–9. ACM; 2007.
Boniol P, Linardi M, Roncallo F, Palpanas T. Automated anomaly detection in large sequences. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), p. 1834–7. IEEE; 2020.
Lin J, Keogh E, Lonardi S, Chiu B. A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, p. 2–11. ACM; 2003.
Chen HH, Giles CL. ASCOS: an asymmetric network structure context similarity measure. In: Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on, p. 442–449. IEEE; 2013.
Kang U, Tsourakakis CE, Appel AP, Faloutsos C, Leskovec J. Hadi: mining radii of large graphs. ACM Trans Knowl Discov Data (TKDD). 2011;5(2):8.
Chau DH, Akoglu L, Vreeken J, Tong H, Faloutsos C. TourViz: interactive visualization of connection pathways in large graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1516–9. 2012.
Chakrabarti D, Zhan Y, Faloutsos C. RMAT: a recursive model for graph mining. In: SIAM International Conference on Data Mining. 2004.
Rattigan MJ, Jensen D. The case for anomalous link discovery. ACM SIGKDD Explor Newsl. 2005;7(2):41–7.
Eberle W, Holder L. Discovering structural anomalies in graph-based data. In: Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on, p. 393–8. IEEE; 2007.
Maruhashi K, Guo F, Faloutsos C. Multiaspectforensics: pattern mining on large-scale heterogeneous networks with tensor analysis. In: Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, p. 203–210. IEEE; 2011.
Atzmueller M, Doerfel S, Mitzlaff F. Description-oriented community detection using exhaustive subgroup discovery. Inf Sci. 2016;329:965–84.
Tantipathananandh C, Berger-Wolf TY. Finding communities in dynamic social networks. In: Data Mining (ICDM), 2011 IEEE 11th International Conference on, p. 1236–41. IEEE; 2011.
Araujo M, Papadimitriou S, Günnemann S, Faloutsos C, Basu P, Swami A, Koutra D. Com2: fast automatic discovery of temporal (‘comet’) communities. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, p. 271–83. Springer, Cham; 2014.
Fan J, Zhang Q, Zhu J, Zhang M, Yang Z, Cao H. Robust deep auto-encoding Gaussian process regression for unsupervised anomaly detection. Neurocomputing. 2020;376:180–90.
Eswaran D, Faloutsos C. Sedanspot: detecting anomalies in edge streams. In: 2018 IEEE International Conference on Data Mining (ICDM), p. 953–958. IEEE; 2018.
Fernandes G, Rodrigues JJ, Carvalho LF, Al-Muhtadi JF, Proença ML. A comprehensive survey on network anomaly detection. Telecommun Syst. 2019;70(3):447–89.
Liu Z, Yu JX, Ke Y, Lin X, Chen L. Spotting significant changing subgraphs in evolving graphs. In: Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, p. 917–922. IEEE; 2008.
Tong H, Lin CY. Non-negative residual matrix factorization with application to graph anomaly detection. In: Proceedings of the 2011 SIAM International Conference on Data Mining, p. 143–53. Society for Industrial and Applied Mathematics; 2011.
Kim MS, Han J. CHRONICLE: a two-stage density-based clustering algorithm for dynamic networks. In: Discovery Science (DS), p. 152–67. Springer; 2009.
Sun H, Huang J, Han J, Deng H, Zhao P, Feng B. gskeletonclu: Density-based network clustering via structure-connected tree division or agglomeration. In: Data Mining (ICDM), 2010 IEEE 10th International Conference on, p. 481–90. IEEE; 2010.
Xu X, Yuruk N, Feng Z, Schweiger TA. Scan: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 824–33. ACM; 2007.
Akoglu L, Chau DH, Vreeken J, Tatti N, Tong H, Faloutsos C. Mining connection pathways for marked nodes in large graphs. In: Proceedings of the 2013 SIAM International Conference on Data Mining, p. 37–45. Society for Industrial and Applied Mathematics; 2013.
Bontemps L, McDermott J, Le-Khac NA. Collective anomaly detection based on long short-term memory recurrent neural networks. In: International Conference on Future Data and Security Engineering, p. 141–52. Cham: Springer; 2016.
Forbes AG, Burks A, Lee K, Li X, Boutillier P, Krivine J, Fontana W. Dynamic influence networks for rule-based models. IEEE Trans Visual Comput Graphics. 2018;24(1):184–94.
Rossi RA, Gallagher B, Neville J, Henderson K. Modeling dynamic behavior in large evolving graphs. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, p. 667–76. ACM; 2013.
Cheng H, Tan PN, Potter C, Klooster S. A robust graph-based algorithm for detection and characterization of anomalies in noisy multivariate time series. In: Data Mining Workshops, 2008. ICDMW'08. IEEE International Conference on, p. 349–358. IEEE; 2008.
Mongiovi M, Bogdanov P, Ranca R, Papalexakis EE, Faloutsos C, Singh AK. Netspot: spotting significant anomalous regions on dynamic networks. In: Proceedings of the 2013 SIAM International Conference on Data Mining, p. 28–36. Society for Industrial and Applied Mathematics; 2013.
Guille A, Favre C. Event detection, tracking, and visualization in twitter: a mention-anomaly-based approach. Soc Netw Anal Min. 2015;5(1):18.
Tielenburg N. Automating outlier detection in academic publishing. Master's Thesis, Open Universiteit Nederland. 2017.
Hochenbaum J, Vallis OS, Kejariwal A. Automatic anomaly detection in the cloud via statistical learning. 2017. arXiv preprint arXiv:1704.07706.
Yuan Q, Zhang W, Zhang C, Geng X, Cong G, Han J. Pred: periodic region detection for mobility modeling of social media users. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, p. 263–72. ACM; 2017.
Jiang R, Fei H, Huan J. Anomaly localization for network data streams with graph joint sparse PCA. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 886–94. ACM; 2011.
Sun J, Tao D, Faloutsos C. Beyond streams and graphs: dynamic tensor analysis. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 374–83. ACM; 2006.
Chalapathy R, Toth E, Chawla S. Group anomaly detection using deep generative models. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, p. 173–89. Cham: Springer; 2018.
Tonta Y, Darvish HR. Diffusion of latent semantic analysis as a research tool: a social network analysis approach. J Informetr. 2010;4(2):166–74.
Thudumu S, Branch P, Jin J, Singh JJ. A comprehensive survey of anomaly detection techniques for high dimensional big data. J Big Data. 2020;7(1):1–30.
Kriegel HP, Kröger P, Schubert E, Zimek A. Outlier detection in axis-parallel subspaces of high dimensional data. Adv Knowl Discov Data Mining. 2009. https://doi.org/10.1007/978-3-642-01307-2_86.
Aggarwal CC. High-dimensional outlier detection: the subspace method. In: Outlier analysis. Springer International Publishing; 2017. p. 149–84.
Akoglu L, Tong H, Koutra D. Graph-based anomaly detection and description: a survey. Data Min Knowl Discov. 2015;29(3):626–88.
Muller E, Assent I, Iglesias P, Mulle Y, Bohm K. Outlier ranking via subspace analysis in multiple views of the data. In: Data Mining (ICDM), 2012 IEEE 12th International Conference on, p. 529–38. IEEE; 2012.
Meng G, Liu Y, Zhang J, Pokluda A, Boutaba R. Collaborative security: a survey and taxonomy. ACM Comput Surv (CSUR). 2015;48(1):1.
McFowland E, Speakman S, Neill DB. Fast generalized subset scan for anomalous pattern detection. J Mach Learn Res. 2013;14(1):1533–61.
Zheng L, Li Z, Li J, Li Z, Gao J. AddGraph: anomaly detection in dynamic graph using attention-based temporal GCN. In: IJCAI, p. 4419–25. 2019.
Yan E, Ding Y, Jacob EK. Overlaying communities and topics: an analysis on publication networks. Scientometrics. 2012;90(2):499–513.
Pereira DA, Ribeiro-Neto B, Ziviani N, Laender AH, Gonçalves MA, Ferreira AA. Using web information for author name disambiguation. In: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital libraries, p. 49–58. ACM; 2009.
Hayat MK, Daud A. Anomaly detection in heterogeneous bibliographic information networks using co-evolution pattern mining. Scientometrics. 2017;113(1):149–75.
Zhang D. PRAAG algorithm in anomaly detection. Kommunikationsteknik, Sweden. 2016. 1–56.
Daud A. Using time topic modeling for semantics-based dynamic research interest finding. Knowl Based Syst. 2012;26:154–63.
Amjad T, Ding Y, Daud A, Xu J, Malic V. Topic-based heterogeneous rank. Scientometrics. 2015;104(1):313–34.
Amjad T, Daud A, Che D, Akram A. MuICE: mutual influence and citation exclusivity author rank. Inf Process Manag. 2016;52(3):374–86.
Amjad T, Daud A, Akram A, Muhammed F. Impact of mutual influence while ranking authors in a co-authorship network. Kuwait J Sci. 2016;43(3):101–109.
Amjad T, Daud A, Aljohani NR. Ranking authors in academic social networks: a survey. Libr Hi Tech. 2018;36(1):97–128.
Daud A, Amjad T, Siddiqui MA, Aljohani NR, Abbasi RA, Aslam MA. Correlational analysis of topic specificity and citations count of publication venues. Libr Hi Tech. 2019;37(1):8–18.
Amjad T, Daud A. Indexing of authors according to their domain of expertise. Malays J Libr Inf Sci. 2017;22(1):69–82.
Daud A, Aljohani NR, Abbasi RA, Rafique Z, Amjad T, Dawood H, Alyoubi KH. Finding rising stars in co-author networks via weighted mutual influence. In: Proceedings of the 26th International Conference on World Wide Web Companion, p. 33–41. International World Wide Web Conferences Steering Committee; 2017.
Daud A, Ahmed W, Amjad T, Nasir JA, Aljohani NR, Abbasi RA, Ahmad I. Who will cite you back? Reciprocal link prediction in citation networks. Libr Hi Tech. 2017;35(4):509–20.
Airoldi EM, Blei DM, Fienberg SE, Xing EP, Jaakkola T. Mixed membership stochastic block models for relational data with application to protein-protein interactions. In: Proceedings of the international biometrics society annual meeting, vol. 15. 2006.
Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann Math Stat. 1956;27(3):832–837.
Danos V, Feret J, Fontana W, Harmer R, Krivine J. Rule-based modelling of cellular signalling. In: International Conference on Concurrency Theory, p. 17–41. Berlin, Heidelberg: Springer; 2007.
Danos V, Feret J, Fontana W, Harmer R, Krivine J. Rule-based modelling, symmetries, refinements. In: International Workshop on Formal Methods in Systems Biology, p. 103–122. Berlin, Heidelberg: Springer; 2008.
Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443–1471.
Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, p. 93–104. 2000.
Beirlant J, Goegebeur Y, Segers J, Teugels JL. Statistics of extremes: theory and applications. John Wiley & Sons; 2006.
Jeh G, Widom J. Simrank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 538–543. 2002.
Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst. 1998;30(1–7):107–117.
Rissanen J. Hypothesis selection and testing by the MDL principle. Comput J. 1999;42:260–269.
Benford F. The law of anomalous numbers. Proc Am Philos Soc. 1938;78:551–572.
Golub GH, Reinsch C. Handbook series linear algebra: singular value decomposition and least squares solutions. Numer Math. 1970;14:403–420.
Newman ME, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69(2):026113.
Hirsch JE. An index to quantify an individual's scientific research output. Proc Natl Acad Sci. 2005;102(46):16569–16572.
Cite this article
Feroze, A., Daud, A., Amjad, T. et al. Group Anomaly Detection: Past Notions, Present Insights, and Future Prospects. SN COMPUT. SCI. 2, 219 (2021). https://doi.org/10.1007/s42979-021-00603-x
DOI: https://doi.org/10.1007/s42979-021-00603-x