
Data Mining and Knowledge Discovery, Volume 32, Issue 5, pp 1444–1480

Explaining anomalies in groups with characterizing subspace rules

  • Meghanath Macha
  • Leman Akoglu
Article
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2018

Abstract

Anomaly detection has numerous applications and has been studied extensively. We consider a complementary problem with a much sparser literature: anomaly description. Interpretation of anomalies is crucial for practitioners for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces), which “reverse-engineers” the known anomalies by identifying (1) the groups (or patterns) that they form, and (2) the characterizing subspace and feature rules that separate each anomalous pattern from normal instances. Explaining anomalies in groups not only saves analyst time and gives insight into the various types of anomalies, but also draws attention to potentially critical, repeating anomalies. In developing x-PACS, we first establish a set of desiderata for the anomaly description problem. From a descriptive data mining perspective, our method exhibits all five desired properties in our desiderata: it can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high-dimensional space, (iii) interpretable by human analysts, (iv) different from normal patterns of the data, and finally (v) succinct, providing a short data description. No existing work on anomaly description satisfies all of these properties simultaneously. Furthermore, x-PACS is highly parallelizable; its running time is linear in the number of data points and exponential in the (typically small) size of the largest characterizing subspace. The anomalous patterns that x-PACS finds constitute interpretable “signatures”, and although it is not our primary goal, they can be used for anomaly detection. Through extensive experiments on real-world datasets, we show the effectiveness and superiority of x-PACS in anomaly explanation over various baselines, and demonstrate its competitive detection performance compared to the state-of-the-art.
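To give a concrete sense of the kind of output described above, the following is a minimal, hypothetical sketch of a "subspace rule": an axis-aligned interval over a small subset of features that covers a group of anomalies while excluding normal points. This is an illustration of the rule format only, not the x-PACS algorithm; the function name, the toy data, and the interval-based rule form are all assumptions made for the example.

```python
def subspace_rule(anomalies, normals, features):
    """Build an interval rule over `features` that bounds all anomalies,
    and count how many normal points it wrongly covers (false alarms)."""
    # One (lo, hi) interval per feature in the characterizing subspace.
    rule = {f: (min(a[f] for a in anomalies), max(a[f] for a in anomalies))
            for f in features}

    def covered(point):
        # A point matches the rule if it falls inside every interval.
        return all(lo <= point[f] <= hi for f, (lo, hi) in rule.items())

    false_alarms = sum(covered(n) for n in normals)
    return rule, false_alarms

# Toy data: the anomalous group stands out in features 0 and 1;
# feature 2 carries no signal, so the rule ignores it.
anomalies = [(9.0, 9.2, 0.1), (9.5, 8.8, 0.9), (9.1, 9.0, 0.5)]
normals   = [(1.0, 2.0, 0.4), (2.5, 1.5, 0.6), (1.8, 2.2, 0.2)]

rule, fa = subspace_rule(anomalies, normals, features=[0, 1])
# rule == {0: (9.0, 9.5), 1: (8.8, 9.2)}; fa == 0
```

A rule like "9.0 ≤ feature 0 ≤ 9.5 and 8.8 ≤ feature 1 ≤ 9.2" is the sort of short, human-readable "signature" an analyst can act on, which is the interpretability property the abstract emphasizes.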

Keywords

Anomalies · Interpretability · Subspace clustering · Minimum description length · High dimensional data · Submodular optimization

Notes

Acknowledgements

This research is sponsored by NSF CAREER 1452425 and IIS 1408287, the ARO Young Investigator Program under Contract No. W911NF-14-1-0029, and the PwC Risk and Regulatory Services Innovation Center at Carnegie Mellon University. Any conclusions expressed in this material are those of the authors and do not necessarily reflect the views, either expressed or implied, of the funding parties.


Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Heinz College, Carnegie Mellon University, Pittsburgh, USA
