Abstract
Attribute Value Frequency (AVF) is a simple, fast, and effective method for detecting outliers in categorical (nominal) data. Previous work has shown that AVF requires less processing time than other existing techniques while maintaining very good outlier detection accuracy. However, AVF works only on static data, so it cannot be used in data stream applications such as sensor data monitoring. In this paper, we introduce a modified version of AVF, called One Pass AVF, to deal with streaming categorical data. We compare the new algorithm with AVF in terms of outlier detection accuracy. We also apply One Pass AVF to detect unreliable data points (i.e., outliers) in a marine sensor data monitoring application. The experiments show that the proposed algorithm is as effective as AVF while also being capable of detecting outliers in streaming categorical data.
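To make the idea concrete, the following is a minimal Python sketch and not the authors' implementation. The function `avf_scores` follows the standard batch definition of AVF, in which a point's score is the mean frequency of its attribute values over the whole data set and low scores suggest outliers. The class `StreamingAVF` and its method `update_and_score` are hypothetical names for an assumed single-pass variant that updates the counts as each point arrives and divides by the number of points seen so far; the abstract does not specify how One Pass AVF actually maintains or normalises its counts.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def avf_scores(rows: Sequence[Sequence[Hashable]]) -> List[float]:
    """Batch AVF: mean frequency of each row's attribute values.

    Needs the full data set up front, because the per-attribute
    counts are computed before any row is scored.
    """
    n_attrs = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


class StreamingAVF:
    """Assumed single-pass variant (illustrative only, not the paper's algorithm)."""

    def __init__(self, n_attrs: int) -> None:
        self.counts: List[Counter] = [Counter() for _ in range(n_attrs)]
        self.seen = 0

    def update_and_score(self, row: Sequence[Hashable]) -> float:
        """Update the running counts with `row`, then score it.

        Assumption: dividing by the number of points seen keeps scores
        comparable as the stream grows.
        """
        self.seen += 1
        for j, value in enumerate(row):
            self.counts[j][value] += 1
        total = sum(self.counts[j][value] for j, value in enumerate(row))
        return total / (len(row) * self.seen)


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
batch = avf_scores(rows)

detector = StreamingAVF(n_attrs=2)
stream = [detector.update_and_score(row) for row in rows]
```

In this sketch each arriving point is scored against only the data seen so far, so early scores rest on very few observations; a practical detector would likely need a warm-up period or a threshold that adapts over time.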
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Tan, S.C., Yip, S.H., Rahman, A. (2013). One Pass Outlier Detection for Streaming Categorical Data. In: Uden, L., Wang, L., Hong, TP., Yang, HC., Ting, IH. (eds) The 3rd International Workshop on Intelligent Data Analysis and Management. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7293-9_4
DOI: https://doi.org/10.1007/978-94-007-7293-9_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7292-2
Online ISBN: 978-94-007-7293-9
eBook Packages: Physics and Astronomy (R0)