Outlier detection using binary decision diagrams

Data Mining and Knowledge Discovery

Abstract

We propose a novel method for outlier detection using binary decision diagrams. We introduce leave-one-out density as a new measure for detecting outliers: the ratio of the number of data elements inside a region to the volume of that region, evaluated after the datum in focus is removed. We show that leave-one-out density can be evaluated very efficiently on a set of regions around each datum in a given dataset by using binary decision diagrams. The time complexity of the proposed method is nearly linear in the size of the dataset, while its outlier detection accuracy remains comparable to that of other methods. Experimental results show the effectiveness of the proposed method.
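The definition of leave-one-out density can be illustrated with a brute-force sketch. This is not the binary decision diagram algorithm of the article, and it runs in quadratic rather than near-linear time. The axis-aligned cube around each datum, the `half_width` parameter, and the function name `loo_densities` are assumptions made only for this illustration.

```python
from typing import List, Sequence


def loo_densities(data: Sequence[Sequence[float]], half_width: float) -> List[float]:
    """Naive leave-one-out density of every datum.

    Assumption for illustration: the region around a datum is an
    axis-aligned cube of side 2 * half_width centered on that datum.
    """
    densities = []
    for i, center in enumerate(data):
        dim = len(center)
        # Count the other data elements inside the cube; the focused
        # datum itself is left out of the count.
        inside = sum(
            1
            for j, point in enumerate(data)
            if j != i
            and all(abs(p - c) <= half_width for p, c in zip(point, center))
        )
        volume = (2.0 * half_width) ** dim
        densities.append(inside / volume)
    return densities


# Four clustered points and one isolated point (made-up toy data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = loo_densities(points, half_width=0.5)
# The isolated point has no neighbors left once it is removed,
# so its leave-one-out density is the lowest.
most_outlying = min(range(len(points)), key=lambda i: scores[i])
```

A low leave-one-out density marks a datum as a candidate outlier, because removing it leaves its surrounding region nearly empty.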

Notes

  1. Smoothing a Boolean function f with respect to a variable x means existentially quantifying x, that is, taking the disjunction of the two cofactors of f: \(f|_{x=1} \vee f|_{x=0}\).
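
Read as existential quantification, smoothing with respect to x gives a function that no longer depends on x: it is true wherever some value of x makes f true. A minimal sketch follows, assuming Boolean functions are modeled as Python callables over variable assignments. The names `cofactor` and `smooth` are illustrative and do not come from the article or from any BDD package.

```python
from itertools import product
from typing import Callable, Dict

Assignment = Dict[str, bool]
BoolFn = Callable[[Assignment], bool]


def cofactor(f: BoolFn, var: str, value: bool) -> BoolFn:
    """Restrict f by fixing `var` to `value`."""
    return lambda a: f({**a, var: value})


def smooth(f: BoolFn, var: str) -> BoolFn:
    """Existentially quantify `var`: OR of the two cofactors of f."""
    pos = cofactor(f, var, True)
    neg = cofactor(f, var, False)
    return lambda a: pos(a) or neg(a)


# Example: f = x AND y. Smoothing away x leaves a function equal to y.
f: BoolFn = lambda a: a["x"] and a["y"]
g = smooth(f, "x")

# Check on every assignment that g ignores x and agrees with y.
agrees_with_y = all(
    g({"x": x, "y": y}) == y for x, y in product([False, True], repeat=2)
)
```

A BDD package performs the same operation symbolically on the diagram rather than by enumerating assignments.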

Author information

Corresponding author

Correspondence to Takuro Kutsuna.

Additional information

Responsible editor: Srinivasan Parthasarathy

About this article

Cite this article

Kutsuna, T., Yamamoto, A. Outlier detection using binary decision diagrams. Data Min Knowl Disc 31, 548–572 (2017). https://doi.org/10.1007/s10618-016-0486-6

  • DOI: https://doi.org/10.1007/s10618-016-0486-6
