Explaining anomalies in groups with characterizing subspace rules

Abstract

Anomaly detection has numerous applications and has been studied extensively. We consider a complementary problem that has a much sparser literature: anomaly description. Interpretation of anomalies is crucial for practitioners for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces), which “reverse-engineers” the known anomalies by identifying (1) the groups (or patterns) that they form, and (2) the characterizing subspace and feature rules that separate each anomalous pattern from normal instances. Explaining anomalies in groups not only saves analyst time and gives insight into the various types of anomalies, but also draws attention to potentially critical, repeating anomalies. In developing x-PACS, we first lay out a set of desiderata for the anomaly description problem. From a descriptive data mining perspective, our method exhibits the five desired properties in these desiderata. Namely, it can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high-dimensional space, (iii) interpretable by human analysts, (iv) different from the normal patterns of the data, and finally (v) succinct, providing a short data description. No existing work on anomaly description satisfies all of these properties simultaneously. Furthermore, x-PACS is highly parallelizable; its running time is linear in the number of data points and exponential in the (typically small) size of the largest characterizing subspace. The anomalous patterns that x-PACS finds constitute interpretable “signatures” and, although it is not our primary goal, they can be used for anomaly detection. Through extensive experiments on real-world datasets, we show the effectiveness and superiority of x-PACS in anomaly explanation over various baselines, and demonstrate its competitive detection performance compared to the state-of-the-art.

Notes

  1. In this text, the phrases ‘anomalous pattern’, ‘clustered anomalies’, and ‘group of anomalies’ are used interchangeably.

  2. The x- refers to the number of packs, which we identify automatically via our data encoding scheme (Sect. 3.3). We adopt this naming convention after X-means (Pelleg and Moore 2000), which finds the number of k-means clusters automatically in an information-theoretic way.

  3. KDE involves two parameters—the number of points sampled to construct the smooth curve and the kernel bandwidth. We set the sample size to 512 points and use Silverman’s rule of thumb (Silverman 2018) to set the bandwidth (a short sketch of this setup follows these notes).

  4. For categorical features, we would instead use histogram density estimation.

  5. We grid-search over all combinations of \(\alpha \in \{10^{-6}, 10^{-5},\ldots , 1\}\) and \(\lambda \in \{10^{-3}, 10^{-2},\ldots , 10^3\}\).

  6. If the anomalous patterns are to be used for detection, we estimate a full \(\mathbf {U}\) matrix (i.e., possibly rotated ellipsoid).

  7. The value of \(f\) is chosen according to the required floating-point precision in the normalized feature space \(\mathbb {R}^{d}\).

  8. The cost of encoding an arbitrary integer \(K\) is \(L_{\mathbb {N}}(K) = \log ^\star (K) + \log _2(c)\), where \(c \approx 2.865064\) and \(\log ^\star (K) = \log _2(K) + \log _2(\log _2(K)) + \ldots \), summing only the positive terms (Rissanen 1978). We drop \(\log _2(c)\) as it is constant for all packings (a numerical sketch of these costs follows these notes).

  9. Another way to identify the normal points in a pack is to sort the points by their distance to the center and send the indices of the normal points in this sorted list of length \(m_k\). This costs more for \(n_k\ge 2\): \(n_k\log _2 m_k> \log _2 \frac{m_k^{n_k}}{n_k!} > \log _2 \binom{m_k}{n_k}\).

  10. Intuitively, this is where \(R_{\ell }\) drops when we add a new pack to \(\mathcal {S}\) (with positive cost) that does not cover any new anomalies.

  11. For instance, if we have \(t\) hyper-rectangles of dimension \(d_{\max }\), the complexity would be \(O(t2^{d_{\max }} + md_{\max })\), which we can rewrite as \(O(c^{d_{\max }} + md_{\max })\).

  12. In practice, the solver converges in 20–100 iterations.

  13. R package pre: https://CRAN.R-project.org/package=pre.

  14. Note that, like any supervised method, x-PACS can only detect future instances of anomalies of known types.

  15. The SVDD optimization diverged for some high-dimensional datasets; therefore, we applied PCA as a preprocessing step.
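The following is a minimal sketch, in Python, of the density-estimation setup described in Note 3. It assumes SciPy's gaussian_kde and a hypothetical one-dimensional feature array feature_values; it only illustrates the stated parameter choices (512 points, Silverman bandwidth) and is not the authors' implementation.

    # Minimal sketch for Note 3 (not the authors' code).
    # Assumes SciPy; `feature_values` is a hypothetical 1-d numeric feature.
    import numpy as np
    from scipy.stats import gaussian_kde

    feature_values = np.random.randn(1000)                    # placeholder data
    kde = gaussian_kde(feature_values, bw_method="silverman") # Silverman's rule of thumb
    grid = np.linspace(feature_values.min(), feature_values.max(), 512)  # 512 points
    density = kde(grid)                                       # smooth density curve over the grid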
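As a worked illustration of the quantities in Notes 8 and 9, the sketch below (again Python, with hypothetical pack sizes \(m_k = 200\) and \(n_k = 5\)) computes the universal integer code length \(L_{\mathbb {N}}(K)\) and compares the index-list cost against the binomial-coefficient cost of marking the normal points in a pack.

    # Sketch for Notes 8-9 (not the authors' code); values m_k, n_k are hypothetical.
    from math import comb, log2

    def log_star(k):
        """log*(k) = log2(k) + log2(log2(k)) + ..., keeping only the positive terms."""
        total, term = 0.0, float(k)
        while True:
            term = log2(term)
            if term <= 0:
                break
            total += term
        return total

    def universal_int_cost(k, c=2.865064):
        """L_N(k) = log*(k) + log2(c); the log2(c) term is constant across packings."""
        return log_star(k) + log2(c)

    m_k, n_k = 200, 5
    print(universal_int_cost(m_k))   # cost of encoding the integer m_k
    print(n_k * log2(m_k))           # index-list encoding of Note 9 (larger)
    print(log2(comb(m_k, n_k)))      # binomial-coefficient encoding (smaller)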

References

  1. Aggarwal CC (2015) Outlier analysis. Springer, Cham, pp 237–263

  2. Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, SIGMOD ’99. ACM, New York, NY, USA, pp 61–72

  3. Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, SIGMOD ’98. ACM, New York, NY, USA, pp 94–105

  4. Angiulli F, Fassetti F, Palopoli L (2009) Detecting outlying properties of exceptional objects. ACM Trans Database Syst (TODS) 34(1):7:1–7:62

  5. Angiulli F, Fassetti F, Palopoli L (2013) Discovering characterizations of the behavior of anomalous subpopulations. IEEE Trans Knowl Data Eng 25(6):1280–1292

  6. Buchbinder N, Feldman M, Naor JS, Schwartz R (2014) Submodular maximization with cardinality constraints. In: Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms, SODA ’14. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1433–1452

  7. Cheng C-H, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’99. ACM, New York, NY, USA, pp 84–93

  8. Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3(4):261–283

  9. Cohen WW (1995) Fast effective rule induction. In: Prieditis A, Russell S (eds) Machine learning proceedings. Morgan Kaufmann, San Francisco, pp 115–123

  10. Dang XH, Assent I, Ng RT, Zimek A, Schubert E (2014) Discriminative features for identifying and interpreting outliers. In: 2014 IEEE 30th international conference on data engineering, pp 88–99

  11. Dang XH, Micenková B, Assent I, Ng RT (2013) Local outlier detection with interpretation. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 304–320

  12. Dave V, Guha S, Zhang Y (2012) Measuring and fingerprinting click-spam in ad networks. In: Proceedings of the ACM SIGCOMM 2012 conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM ’12. ACM, New York, NY, USA, pp 175–186

  13. Deng H (2014) Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456

  14. Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: 2017 IEEE international conference on computer vision (ICCV), pp 3449–3457

  15. Friedman JH, Popescu BE (2008) Predictive learning via rule ensembles. Ann Appl Stat 2(3):916–954

  16. Gamberger D, Lavrac N (2002) Expert-guided subgroup discovery: methodology and application. J Artif Int Res 17(1):501–527

  17. Gharan SO, Vondrák J (2011) Submodular maximization by simulated annealing. In: Proceedings of the twenty-second annual ACM-SIAM symposium on discrete algorithms, SODA ’11. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 1098–1116

  18. Görnitz N, Kloft M, Brefeld U (2009) Active and semi-supervised data domain description. In: Buntine W, Grobelnik M, Mladenić D, Shawe-Taylor J (eds) Machine learning and knowledge discovery in databases. Springer, Berlin Heidelberg, pp 407–422

  19. Günnemann S, Seidl T, Krieger R, Müller E, Assent I (2009) Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: 2009 Ninth IEEE international conference on data mining (ICDM), pp 377–386

  20. Hara S, Hayashi K (2016) Making tree ensembles interpretable. arXiv preprint arXiv:1606.05390

  21. He J, Carbonell J (2010) Co-selection of features and instances for unsupervised rare category analysis. In: Proceedings of the 10th SIAM international conference on data mining, SDM 2010, pp 525–536

  22. He J, Tong H, Carbonell J (2010) Rare category characterization. In: 2010 IEEE international conference on data mining, pp 226–235

  23. Herrera F, Carmona CJ, González P, del Jesus MJ (2011) An overview on subgroup discovery: foundations and applications. Knowl Inf Syst 29(3):495–525

  24. Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th international conference on data engineering, pp 1037–1048

  25. Keller F, Müller E, Wixler A, Böhm K (2013) Flexible and adaptive subspace search for outlier analysis. In: Proceedings of the 22nd ACM international conference on information and knowledge management, CIKM ’13. ACM, New York, NY, USA, pp 1381–1390

  26. Klösgen W (1996) Explora: A multipattern and multistrategy discovery assistant. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, Menlo Park, CA, USA, pp 249–271

  27. Klösgen W, May M (2002) Census data mining: an application. In: Proceedings of the 6th European conference on principles and practice of knowledge discovery in databases (PKDD), Helsinki, Finland

  28. Knorr EM, Ng RT (1999) Finding intensional knowledge of distance-based outliers. In: Proceedings of the 25th international conference on very large data bases, VLDB ’99. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 211–222

  29. Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, vol 70 of Proceedings of machine learning research. International Convention Centre, Sydney, Australia, pp 1885–1894

  30. Kopp M, Pevný T, Holeňa M (2014) Interpreting and clustering outliers with sapling random forests. In: ITAT 2014. European conference on information technologies—applications and theory. Institute of Computer Science AS CR, pp 61–67

  31. Kriegel HP, Kröger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining (ICDM’05), p 8

  32. Kriegel H-P, Kröger P, Schubert E, Zimek A (2009a) Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho T-B (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 831–838

  33. Kriegel H-P, Kröger P, Zimek A (2009b) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58

  34. Kriegel HP, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: 2012 IEEE 12th international conference on data mining, pp 379–388

  35. Kuo C-T, Davidson I (2016) A framework for outlier description using constraint programming. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. AAAI Press, pp 1237–1243

  36. Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable and explorable approximations of black box models. CoRR, abs/1707.01154

  37. Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD ’05. ACM, New York, NY, USA, pp 157–166

  38. Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on twitter. In: AAAI international conference on weblogs and social media (ICWSM). Citeseer

  39. Loekito E, Bailey J (2008) Mining influential attributes that capture class and group contrast behaviour. In: Proceedings of the 17th ACM conference on information and knowledge management, CIKM ’08. ACM, New York, NY, USA, pp 971–980

  40. Micenková B, Ng RT, Dang XH, Assent I (2013) Explaining outliers by subspace separability. In: 2013 IEEE 13th international conference on data mining, pp 518–527

  41. Moise G, Sander J, Ester M (2006) P3C: a robust projected clustering algorithm. In: Sixth international conference on data mining (ICDM’06), pp 414–425

  42. Montavon G, Samek W, Müller K (2018) Methods for interpreting and understanding deep neural networks. Digit Signal Process Rev J 73:1–15

  43. Mukherjee A, Venkataraman V, Liu B, Glance N (2013) What yelp fake review filter might be doing? In: 7th International AAAI conference on weblogs and social media, ICWSM 2013. AAAI Press

  44. Müller E, Assent I, Steinhausen U, Seidl T (2008) OutRank: ranking outliers in high dimensional data. In: 2008 IEEE 24th international conference on data engineering workshop, pp 600–603

  45. Müller E, Assent I, Iglesias P, Mülle Y, Böhm K (2012) Outlier ranking via subspace analysis in multiple views of the data. In: 2012 IEEE 12th international conference on data mining, pp 529–538

  46. Müller E, Schiffer M, Seidl T (2011) Statistical selection of relevant subspace projections for outlier ranking. In: 2011 IEEE 27th international conference on data engineering, pp 434–445

  47. Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor Newsl 6(1):90–105

  48. Pelleg D, Moore AW (2000) \({X}\)-means: Extending \({K}\)-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, ICML ’00. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 727–734

  49. Pevný T, Kopp M (2014) Explaining anomalies with sapling random forests. In: Information technologies—applications and theory workshops, posters, and tutorials (ITAT 2014)

  50. Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1135–1144

  51. Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  52. Sequeira K, Zaki M (2004) SCHISM: a new approach for interesting subspace mining. In: Fourth IEEE international conference on data mining (ICDM ’04), pp 186–193

  53. Silverman BW (2018) Density estimation for statistics and data analysis. Routledge, Abingdon

  54. Tax DM, Duin RP (2004) Support vector data description. Mach Learn 54(1):45–66

  55. Liu FT, Ting KM, Zhou Z-H (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining (ICDM), pp 413–422

  56. Vreeken J, van Leeuwen M, Siebes A (2011) Krimp: mining itemsets that compress. Data Min Knowl Disc 23(1):169–214

  57. Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, pp 78–87

  58. Zhang H, Diao Y, Meliou A (2017) EXstream: explaining anomalies in event stream monitoring. In: Proceedings of the 20th international conference on extending database technology (EDBT), pp 156–167

Acknowledgements

This research is sponsored by NSF CAREER 1452425 and IIS 1408287, ARO Young Investigator Program under Contract No. W911NF-14-1-0029, and the PwC Risk and Regulatory Services Innovation Center at Carnegie Mellon University. Any conclusions expressed in this material are those of the authors and do not necessarily reflect the views, either expressed or implied, of the funding parties.

Author information

Corresponding author

Correspondence to Meghanath Macha.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible editors: Jesse Davis, Elisa Fromont, Derek Greene, Bjørn Bringmann.

Cite this article

Macha, M., Akoglu, L. Explaining anomalies in groups with characterizing subspace rules. Data Min Knowl Disc 32, 1444–1480 (2018). https://doi.org/10.1007/s10618-018-0585-7

Keywords

  • Anomalies
  • Interpretability
  • Subspace clustering
  • Minimum description length
  • High dimensional data
  • Submodular optimization