An explainable outlier detection method using region-partition trees

Park, Cheong Hee; Kim, Jiil

doi:10.1007/s11227-020-03384-x

Published: 20 July 2020

Volume 77, pages 3062–3076, (2021)

The Journal of Supercomputing

Abstract

Most outlier detection methods output an outlier score that measures how far a data sample deviates from the normal data pattern. However, it is difficult to choose an optimal threshold on outlier scores that separates outliers from normal data samples. In this paper, we propose a tree-based outlier detection method that computes normalized outlier scores for data samples. In particular, it provides binary labels for outlier prediction without the need to determine a threshold on the outlier score. Using training data that consists of normal data samples, the proposed method builds a multi-way splitting tree, called a region-partition tree (RP-tree), in which the normal data region is described by the partition of the data region into leaf nodes. By using a region-partition table (RP-table), which stores the information on splitting attributes and interval partitions, an RP-tree can be constructed so that it finely splits the normal data region while keeping the size of the tree reasonably small. From an ensemble of RP-trees, the proposed method computes normalized outlier scores in the range [0, 1], and data samples with an outlier score of 1 are predicted as outliers. It also identifies the attributes responsible for an outlier prediction. Experimental results demonstrate the outlier detection performance of the proposed method: it obtained an average F1-value of 0.72 and an AUC score of 0.96, while the second-highest performance among the compared methods was an F1-value of 0.57 and an AUC score of 0.94.
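The abstract does not give the construction details of the RP-tree or the RP-table, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a much simpler stand-in: each "tree" is an equal-width grid over a random subset of attributes, a leaf region counts as normal if at least one training sample falls in it, and the normalized score is the fraction of trees in which a sample lands outside every normal region. The names `GridTree`, `fit_ensemble`, `outlier_score`, `predict_outlier`, and `responsible_attributes`, and all parameter defaults, are hypothetical.

```python
import numpy as np


class GridTree:
    """Simplified stand-in for one RP-tree (an assumption, not the paper's method):
    an equal-width interval partition over a few attributes."""

    def __init__(self, attrs, n_bins):
        self.attrs = np.asarray(attrs)
        self.n_bins = n_bins

    def fit(self, X):
        sub = X[:, self.attrs]
        self.lo = sub.min(axis=0)
        self.hi = sub.max(axis=0)
        span = self.hi - self.lo
        # Constant attributes get width 1.0 to avoid division by zero.
        self.width = np.where(span > 0, span / self.n_bins, 1.0)
        idx = self._cells(sub)
        # Leaf regions (joint cells) that contain at least one normal sample.
        self.occupied = {tuple(int(v) for v in row) for row in idx}
        # Per-attribute occupied intervals, used to explain a prediction.
        self.occupied_1d = [{int(v) for v in idx[:, j]} for j in range(idx.shape[1])]
        return self

    def _cells(self, sub):
        idx = np.floor((sub - self.lo) / self.width).astype(int)
        # The training maximum belongs to the last interval, not a new one.
        idx[sub == self.hi] = self.n_bins - 1
        return idx

    def _cell(self, x):
        return tuple(int(v) for v in self._cells(x[self.attrs][None, :])[0])

    def is_outside(self, x):
        """True if x falls in a region that held no training sample."""
        return self._cell(x) not in self.occupied

    def blamed(self, x):
        """Attributes whose own interval held no training sample."""
        cell = self._cell(x)
        return [int(a) for a, c, occ in zip(self.attrs, cell, self.occupied_1d)
                if c not in occ]


def fit_ensemble(X, n_trees=50, n_attrs=2, n_bins=8, seed=0):
    """Build the ensemble from normal-only training data X (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = min(n_attrs, d)
    return [GridTree(rng.choice(d, size=k, replace=False), n_bins).fit(X)
            for _ in range(n_trees)]


def outlier_score(trees, x):
    """Fraction of trees that place x outside the normal region, in [0, 1]."""
    return sum(t.is_outside(x) for t in trees) / len(trees)


def predict_outlier(trees, x):
    """Binary label without a tuned threshold: outlier only if the score is 1."""
    return outlier_score(trees, x) == 1.0


def responsible_attributes(trees, x):
    """Union of attributes blamed by any tree in the ensemble."""
    return sorted({a for t in trees for a in t.blamed(x)})
```

Under these assumptions, any training sample scores 0 by construction, and a point that lies far outside the training range in every attribute scores 1 and is labeled an outlier. A point that is extreme in only one attribute is flagged only by the grids that include that attribute, which is also the attribute `responsible_attributes` reports.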

Acknowledgements

This research was supported in part by Korea Electric Power Corporation (Grant Number: R18XA05) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1F1A1062341).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chungnam National University, Daejeon, Korea
Cheong Hee Park & Jiil Kim

Corresponding author

Correspondence to Cheong Hee Park.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Park, C.H., Kim, J. An explainable outlier detection method using region-partition trees. J Supercomput 77, 3062–3076 (2021). https://doi.org/10.1007/s11227-020-03384-x

Published: 20 July 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11227-020-03384-x
