Beyond Outlier Detection: LookOut for Pictorial Explanation

  • Nikhil Gupta
  • Dhivya Eswaran
  • Neil Shah
  • Leman Akoglu
  • Christos Faloutsos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

Why is a given point in a dataset marked as an outlier by an off-the-shelf detection algorithm? Which feature(s) best explain it? What is the best way to convince a human analyst that the point is indeed an outlier? We provide succinct, interpretable, and simple pictorial explanations of outlying behavior in multi-dimensional real-valued datasets while respecting the limited attention of human analysts. Specifically, we propose to output a few focus-plots, i.e., pairwise feature plots, from a few carefully chosen feature subspaces. The proposed LookOut makes four contributions: (a) problem formulation: we introduce an "analyst-centered" problem formulation for explaining outliers via focus-plots; (b) explanation algorithm: we propose a plot-selection objective and the LookOut algorithm to approximate it with optimality guarantees; (c) generality: our explanation algorithm is both domain- and detector-agnostic; and (d) scalability: LookOut scales linearly with the number of input outliers to explain and with the explanation budget. Our experiments show that LookOut performs near-ideally in terms of maximizing the explanation objective on several real datasets, while producing visually interpretable and intuitive results in explaining ground-truth outliers. Code related to this paper is available at: https://github.com/NikhilGupta1997/Lookout.
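To make the plot-selection idea concrete, here is a minimal, dependency-free sketch under stated assumptions; it is not the authors' implementation. It assumes (1) the candidate focus-plots are all pairs of features, (2) each outlier receives a nonnegative score in each 2-D plot, here a simple k-nearest-neighbour distance swapped in for whatever per-plot scoring the authors actually use, and (3) the plot-selection objective is the sum, over outliers, of each outlier's best score among the chosen plots. Under assumption (3) the objective is monotone submodular, so the standard greedy algorithm is guaranteed to reach at least a (1 - 1/e) fraction of the optimum; this is one way an optimality guarantee of the kind mentioned in the abstract can arise. The names `knn_score`, `score_matrix`, and `greedy_select` are hypothetical.

```python
import math
from itertools import combinations


def knn_score(points, idx, k=3):
    """Stand-in outlier score (assumption): mean distance from points[idx] to its k nearest neighbours."""
    dists = sorted(math.dist(points[idx], q) for j, q in enumerate(points) if j != idx)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def score_matrix(data, outliers, k=3):
    """Score each outlier in every candidate focus-plot (every pair of features)."""
    plots = list(combinations(range(len(data[0])), 2))
    scores = {}
    for a, b in plots:
        pts = [(row[a], row[b]) for row in data]  # 2-D projection onto features a and b
        scores[(a, b)] = {o: knn_score(pts, o, k) for o in outliers}
    return plots, scores


def greedy_select(plots, scores, outliers, budget):
    """Greedily maximize the assumed objective f(S) = sum over o of max_{p in S} scores[p][o].

    Assumes nonnegative scores, so f(empty set) = 0 and f is monotone submodular.
    """
    selected = []
    best = {o: 0.0 for o in outliers}  # best score of each outlier among the selected plots

    def gain(p):
        # Marginal gain of adding plot p to the current selection.
        return sum(max(scores[p][o] - best[o], 0.0) for o in outliers)

    for _ in range(min(budget, len(plots))):
        candidates = [p for p in plots if p not in selected]
        p_star = max(candidates, key=gain)
        if gain(p_star) <= 0.0:
            break  # no remaining plot explains any outlier better than the current selection
        selected.append(p_star)
        for o in outliers:
            best[o] = max(best[o], scores[p_star][o])
    return selected, sum(best.values())


if __name__ == "__main__":
    # Rows 0-4 follow x0 == x1; row 5 breaks that trend but is unremarkable in each single feature.
    data = [
        [0, 0, 1, 0],
        [1, 1, 0, 1],
        [2, 2, 1, 0],
        [3, 3, 0, 1],
        [4, 4, 1, 0],
        [4, 0, 0, 1],
    ]
    plots, scores = score_matrix(data, outliers=[5])
    selected, value = greedy_select(plots, scores, [5], budget=2)
    print(selected)  # [(0, 1)]
```

In the toy example, the single focus-plot of features 0 and 1 is enough to show why row 5 is outlying, so the greedy loop stops early even though the budget allows two plots. Each greedy round scans every remaining plot and every outlier, so the cost grows as budget × (number of plots) × (number of outliers), i.e., linearly in the outliers and the budget, consistent with the scalability claim in the abstract.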

Keywords

Outlier detection · Pictorial explanation · Interpretability

Notes

Acknowledgments

This material is based upon work supported by the National Science Foundation (NSF) under Grants No. CNS-1314632, IIS-1408924 and IIS-1408287, by NSF CAREER 1452425, and by the PwC Risk and Regulatory Services Innovation Center at Carnegie Mellon University. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Nikhil Gupta (1)
  • Dhivya Eswaran (2)
  • Neil Shah (3)
  • Leman Akoglu (2)
  • Christos Faloutsos (2)

  1. IIT Delhi, New Delhi, India
  2. CMU, Pittsburgh, USA
  3. Snap Inc., Santa Monica, USA