Outlier detection using binary decision diagrams

Kutsuna, Takuro; Yamamoto, Akihiro

doi:10.1007/s10618-016-0486-6

Published: 17 November 2016

Data Mining and Knowledge Discovery, volume 31, pages 548–572 (2017)

Takuro Kutsuna¹ &
Akihiro Yamamoto²

Abstract

We propose a novel method for outlier detection using binary decision diagrams. Leave-one-out density is proposed as a new measure for detecting outliers, which is defined as a ratio of the number of data elements inside a region to the volume of the region after a focused datum is removed. We show that leave-one-out density can be evaluated very efficiently on a set of regions around each datum in a given dataset by using binary decision diagrams. The time complexity of the proposed method is nearly linear with respect to the size of the dataset, while the outlier detection accuracy is still comparable to that of other methods. Experimental results show the effectiveness of the proposed method.
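The definition of leave-one-out density can be illustrated with a small brute-force sketch. It makes no use of binary decision diagrams and is not the authors' algorithm: the axis-aligned cube region, the `half_width` parameter, and the function name `leave_one_out_density` are assumptions introduced here purely for illustration. The abstract speaks of a set of regions around each datum; this sketch evaluates a single region.

```python
from typing import List, Sequence


def leave_one_out_density(
    data: Sequence[Sequence[float]], index: int, half_width: float
) -> float:
    """Density of the cube around data[index] after removing data[index].

    Assumption: the region is an axis-aligned cube of side 2 * half_width
    centered on the focused datum. The paper's regions may differ.
    """
    center = data[index]
    dim = len(center)
    count = 0
    for j, point in enumerate(data):
        if j == index:
            continue  # leave the focused datum out of the count
        if all(abs(p - c) <= half_width for p, c in zip(point, center)):
            count += 1
    volume = (2.0 * half_width) ** dim
    return count / volume


# Three clustered points and one isolated point.
points: List[Sequence[float]] = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = [leave_one_out_density(points, i, 0.5) for i in range(len(points))]
print(scores)
```

A low score marks a datum whose surrounding region is nearly empty once the datum itself is removed, which is the intuition behind treating it as an outlier. This naive version costs quadratic time in the dataset size; the abstract's claim of nearly linear time relies on the BDD-based evaluation, which is not reproduced here.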

Notes

Smoothing a Boolean function f with respect to a variable x means existentially quantifying x, that is, taking the disjunction of the two cofactors of f: \(f|_{x=1} \vee f|_{x=0}\).
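As a small illustration, smoothing is read here in its usual sense from the BDD literature: existential quantification of x, the disjunction of f with x fixed to false and f with x fixed to true. The sketch below works on plain Python predicates rather than on decision diagrams; the helper name `smooth` and the positional `var_index` argument are hypothetical.

```python
from itertools import product
from typing import Callable


def smooth(f: Callable[..., bool], var_index: int) -> Callable[..., bool]:
    """Return the smoothing of f with respect to its var_index-th argument.

    The result is f with that argument set to False, OR-ed with f with
    that argument set to True, so it no longer depends on the argument.
    """

    def smoothed(*args: bool) -> bool:
        low = list(args)
        low[var_index] = False
        high = list(args)
        high[var_index] = True
        return bool(f(*low) or f(*high))

    return smoothed


def conj(x: bool, y: bool) -> bool:
    """Example function f(x, y) = x AND y."""
    return x and y


g = smooth(conj, 0)  # smoothing x out of (x AND y) leaves y
for bits in product([False, True], repeat=2):
    print(bits, g(*bits))
```

In a BDD package the same operation is typically provided as an existential-abstraction routine that works directly on the diagram instead of enumerating assignments.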

Author information

Authors and Affiliations

Toyota Central R&D Labs. Inc., 41-1, Yokomichi, Nagakute, Aichi, 480-1192, Japan
Takuro Kutsuna
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Akihiro Yamamoto

Corresponding author

Correspondence to Takuro Kutsuna.

Additional information

Responsible editor: Srinivasan Parthasarathy

About this article

Cite this article

Kutsuna, T., Yamamoto, A. Outlier detection using binary decision diagrams. Data Min Knowl Disc 31, 548–572 (2017). https://doi.org/10.1007/s10618-016-0486-6

Received: 19 January 2015
Accepted: 01 November 2016
Published: 17 November 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10618-016-0486-6
