Revisiting Attribute Independence Assumption in Probabilistic Unsupervised Anomaly Detection

Aryal, Sunil; Ting, Kai Ming; Haffari, Gholamreza

doi:10.1007/978-3-319-31863-9_6

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9650)

Included in the following conference series:

Pacific-Asia Workshop on Intelligence and Security Informatics

Abstract

In this paper, we revisit the simple probabilistic approach to unsupervised anomaly detection that estimates the multivariate probability as a product of univariate probabilities, assuming attributes are generated independently. We show that this simple traditional approach performs competitively with, or better than, five state-of-the-art unsupervised anomaly detection methods across a wide range of data sets from categorical, numeric or mixed domains. It is arguably the fastest anomaly detector: on high-dimensional data sets, it is one order of magnitude faster than the fastest state-of-the-art method.
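
The idea described in the abstract can be illustrated with a minimal Python sketch for categorical data. It is not the authors' implementation: the function names, the toy data and the use of Laplace smoothing are assumptions made here for illustration only. Each attribute's value frequencies are counted separately, and an instance is scored by the sum of the logs of its univariate probabilities, so a lower score suggests a more anomalous instance.

```python
import math
from collections import Counter


def fit_univariate_counts(rows):
    """Count value frequencies separately for each attribute (column)."""
    n_attrs = len(rows[0])
    return [Counter(row[i] for row in rows) for i in range(n_attrs)]


def log_probability(row, counts, n_rows):
    """Log of the product of univariate probabilities under independence.

    Laplace smoothing (count + 1) / (n + k) is an assumption of this sketch,
    where k is the number of distinct values seen for the attribute.
    """
    total = 0.0
    for value, counter in zip(row, counts):
        k = len(counter)
        total += math.log((counter[value] + 1) / (n_rows + k))
    return total


# Hypothetical toy data set with two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]

counts = fit_univariate_counts(rows)
scores = [log_probability(row, counts, len(rows)) for row in rows]

# Rank instances from most to least anomalous (lowest log-probability first).
ranked = sorted(range(len(rows)), key=lambda i: scores[i])
print(ranked)
```

Fitting needs one pass over the data to build the per-attribute counts, and scoring an instance needs one lookup per attribute, which is consistent with the speed claim made in the abstract.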

Notes

1. Though \(\hat{P}(\cdot)\) can be estimated directly in numeric domains, it is a lot easier and faster to do it in categorical domains.
2. http://www3.cs.stonybrook.edu/~leman/pubs.html

Acknowledgments

We would like to thank Prof. Takashi Washio for providing very useful comments and suggestions. We are thankful to the anonymous reviewers for their critical comments to improve the quality of the paper.

Author information

Authors and Affiliations

Clayton School of Information Technology, Monash University, Victoria, Australia
Sunil Aryal & Gholamreza Haffari
School of Engineering and Information Technology, Federation University, Victoria, Australia
Sunil Aryal & Kai Ming Ting

Corresponding author

Correspondence to Sunil Aryal.

Editor information

Editors and Affiliations

The University of Hong Kong, Hong Kong, Hong Kong
Michael Chau
Virginia Tech, Blacksburg, Virginia, USA
G. Alan Wang
The University of Arizona, Tucson, Arizona, USA
Hsinchun Chen

About this paper

Cite this paper

Aryal, S., Ting, K.M., Haffari, G. (2016). Revisiting Attribute Independence Assumption in Probabilistic Unsupervised Anomaly Detection. In: Chau, M., Wang, G., Chen, H. (eds) Intelligence and Security Informatics. PAISI 2016. Lecture Notes in Computer Science, vol 9650. Springer, Cham. https://doi.org/10.1007/978-3-319-31863-9_6

DOI: https://doi.org/10.1007/978-3-319-31863-9_6
Published: 29 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31862-2
Online ISBN: 978-3-319-31863-9
eBook Packages: Computer Science; Computer Science (R0)
