An explainable outlier detection method using region-partition trees

The Journal of Supercomputing

Abstract

Most outlier detection methods output an outlier score that measures how far a data sample deviates from the normal data pattern. However, it is difficult to choose an optimal threshold on outlier scores that separates outliers from normal data samples. In this paper, we propose a tree-based outlier detection method that computes normalized outlier scores for data samples. In particular, it provides binary labels for outlier prediction without requiring a threshold on the outlier score. Using training data consisting of normal data samples, the proposed method builds a multi-way splitting tree, called a region-partition tree (RP-tree), in which the normal data region is described by the partition of the data region into leaf nodes. By utilizing a region-partition table (RP-table), which stores the splitting attributes and interval partitions, the RP-tree can be constructed so that it finely splits the normal data region while keeping the tree reasonably small. From an ensemble of RP-trees, the proposed method computes normalized outlier scores in the range [0, 1], and data samples with an outlier score of 1 are predicted as outliers. It also identifies the attributes responsible for an outlier prediction. Experimental results demonstrate the outlier detection performance of the proposed method: it obtained an average F1-value of 0.72 and an AUC score of 0.96, while the second-highest values among the compared methods were an F1-value of 0.57 and an AUC score of 0.94.
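The scoring idea in the abstract can be illustrated with a toy sketch. This is not the RP-tree or RP-table construction from the paper, whose details are not given in this preview. It is a simplified stand-in that assumes equal-width intervals per attribute, a random attribute subset per tree, and a "normal region" defined as the set of cells occupied by training samples. All names (`ToyPartitionEnsemble`, `interval`, `attrs_per_tree`, `bins`) are hypothetical.

```python
import random


def interval(v, lo, hi, bins):
    """Index of the equal-width interval holding v; -1 or bins if v lies outside [lo, hi]."""
    if v < lo:
        return -1
    if v > hi:
        return bins
    width = (hi - lo) / bins
    return 0 if width == 0 else min(int((v - lo) / width), bins - 1)


class ToyPartitionEnsemble:
    """Toy illustration only: NOT the paper's RP-tree algorithm."""

    def __init__(self, train, n_trees=20, attrs_per_tree=2, bins=4, seed=0):
        rng = random.Random(seed)
        dims = range(len(train[0]))
        self.bins = bins
        # Per-attribute range observed in the normal training data.
        self.bounds = [(min(r[a] for r in train), max(r[a] for r in train)) for a in dims]
        # Per-attribute set of intervals that hold at least one normal sample.
        self.seen = [{self._idx(r, a) for r in train} for a in dims]
        # Each "tree" is a multi-way partition over a random attribute subset;
        # its occupied cells (leaves) describe the normal region.
        self.trees = []
        for _ in range(n_trees):
            attrs = tuple(rng.sample(list(dims), attrs_per_tree))
            cells = {tuple(self._idx(r, a) for a in attrs) for r in train}
            self.trees.append((attrs, cells))

    def _idx(self, x, a):
        lo, hi = self.bounds[a]
        return interval(x[a], lo, hi, self.bins)

    def score(self, x):
        """Fraction of trees in which x falls outside the normal region, in [0, 1]."""
        flagged = sum(
            tuple(self._idx(x, a) for a in attrs) not in cells
            for attrs, cells in self.trees
        )
        return flagged / len(self.trees)

    def predict(self, x):
        """Binary label without a tuned threshold: outlier only if every tree flags x."""
        return self.score(x) == 1.0

    def responsible_attributes(self, x):
        """Attributes whose value lies in an interval no normal sample occupied."""
        return [a for a in range(len(self.bounds)) if self._idx(x, a) not in self.seen[a]]


# Synthetic "normal" data with three attributes (made up for illustration).
train = [[i / 10, (i * 3 % 10) / 10, (i * 7 % 10) / 10] for i in range(50)]
model = ToyPartitionEnsemble(train)

print(model.score(train[0]))                            # a training sample
print(model.score([99.0, 99.0, 99.0]))                  # far from all normal data
print(model.responsible_attributes([0.5, 0.5, 9.0]))    # only the third attribute deviates
```

In this sketch a sample is labelled an outlier only when all trees agree, which mirrors the abstract's rule that a score of 1 means outlier. A sample that deviates in a single attribute gets a score below 1 here, because trees that do not use that attribute cannot see the deviation. How the actual method handles that case is not described in the preview.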

Acknowledgements

This research was supported in part by Korea Electric Power Corporation (Grant Number: R18XA05) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1F1A1062341).

Author information

Corresponding author

Correspondence to Cheong Hee Park.

About this article

Cite this article

Park, C.H., Kim, J. An explainable outlier detection method using region-partition trees. J Supercomput 77, 3062–3076 (2021). https://doi.org/10.1007/s11227-020-03384-x

  • DOI: https://doi.org/10.1007/s11227-020-03384-x
