One Pass Outlier Detection for Streaming Categorical Data

  • Swee Chuan Tan
  • Si Hao Yip
  • Ashfaqur Rahman
Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


Attribute Value Frequency (AVF) is a simple, fast, and effective method for detecting outliers in nominal categorical data. Previous work has shown that AVF requires less processing time than other existing techniques while maintaining very good outlier detection accuracy. However, AVF works only on static data, so it cannot be used in data stream applications such as sensor data monitoring. In this paper, we introduce a modified version of AVF, known as One Pass AVF, to deal with streaming categorical data. We compare this new algorithm with AVF in terms of outlier detection accuracy. We also apply One Pass AVF to detect unreliable data points (i.e., outliers) in a marine sensor data monitoring application. Experiments show that the proposed algorithm is as effective as AVF while also being capable of detecting outliers in streaming categorical data.
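To make the idea of frequency-based scoring on a stream concrete, the following is a minimal sketch, not the One Pass AVF algorithm itself, whose details the abstract does not give. It assumes the usual AVF notion that a point is scored by the average frequency of its attribute values, with low scores suggesting outliers. The class name `StreamingAVF`, the method `score_then_update`, the score-before-update order, and the use of relative rather than raw frequencies are all illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class StreamingAVF:
    """Hypothetical streaming AVF-style scorer (illustrative assumption,
    not the algorithm proposed in the paper)."""

    def __init__(self, n_attributes: int) -> None:
        # One frequency table per attribute, updated as points arrive.
        self.counts: List[Counter] = [Counter() for _ in range(n_attributes)]
        self.n_seen = 0

    def score_then_update(self, point: Sequence[Hashable]) -> float:
        """Score a point against the counts seen so far, then add it.

        The score is the mean relative frequency of the point's attribute
        values; lower scores suggest outliers. Each point is visited once.
        """
        if len(point) != len(self.counts):
            raise ValueError("point has the wrong number of attributes")

        if self.n_seen == 0:
            # No history yet: the very first point cannot be scored
            # meaningfully, so it receives 0.0 (a warm-up artefact).
            score = 0.0
        else:
            total = sum(c[v] / self.n_seen for c, v in zip(self.counts, point))
            score = total / len(self.counts)

        # Update the frequency tables with the new point.
        for c, v in zip(self.counts, point):
            c[v] += 1
        self.n_seen += 1
        return score
```

Feeding the scorer five copies of `("a", "x")` followed by `("b", "y")` gives the last point a score of 0.0, since neither of its values has been seen before. Relative frequencies are used here only so that scores stay comparable as the stream grows; a practical detector would also need a warm-up period and a threshold, neither of which is specified in the abstract.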


Data stream, Outlier, Categorical data, Attribute value frequency, One pass



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. School of Business, SIM University, Singapore, Singapore
  2. Intelligent Sensing and Systems Laboratory, CSIRO, Hobart, Australia
