One Pass Outlier Detection for Streaming Categorical Data

  • Conference paper
The 3rd International Workshop on Intelligent Data Analysis and Management

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

Attribute Value Frequency (AVF) is a simple, fast, and effective method for detecting outliers in nominal categorical data. Previous work has shown that AVF requires less processing time than other existing techniques while maintaining very good outlier detection accuracy. However, AVF works on static data only, so it cannot be used in data stream applications such as sensor data monitoring. In this paper, we introduce a modified version of AVF, known as One Pass AVF, to deal with streaming categorical data. We compare this new algorithm with AVF in terms of outlier detection accuracy. We also apply One Pass AVF to detect unreliable data points (i.e., outliers) in a marine sensor data monitoring application. The proposed algorithm is shown experimentally to be as effective as AVF while also being capable of detecting outliers in streaming categorical data.
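
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. The batch function follows the usual definition of the AVF score (the mean frequency of a point's attribute values, with low scores suggesting outliers). The streaming class is a hypothetical illustration of how frequency counts could be maintained in a single pass; the abstract does not describe the One Pass AVF procedure, so the class name `StreamingAVF`, the method `update_and_score`, and the normalisation by the number of points seen are all assumptions.

```python
from collections import Counter


def avf_scores(rows):
    """Batch AVF: mean frequency of each row's attribute values.

    Needs the full data set up front, so it makes two passes over the data
    (one to count, one to score).
    """
    m = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / m for row in rows]


class StreamingAVF:
    """Hypothetical one-pass variant (an assumption, not the paper's method).

    Counts are updated as each point arrives, and the point is scored
    against everything seen so far, itself included.
    """

    def __init__(self, n_attributes):
        self.counts = [Counter() for _ in range(n_attributes)]
        self.n_seen = 0

    def update_and_score(self, row):
        self.n_seen += 1
        for counter, value in zip(self.counts, row):
            counter[value] += 1
        mean_freq = sum(c[v] for c, v in zip(self.counts, row)) / len(row)
        # Assumed normalisation: divide by the number of points seen so that
        # scores stay comparable as the stream grows.
        return mean_freq / self.n_seen


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
batch = avf_scores(rows)

detector = StreamingAVF(n_attributes=2)
stream = [detector.update_and_score(row) for row in rows]
```

In both variants the last row, whose values `"b"` and `"z"` each occur only once, receives the lowest score and would be the first candidate outlier.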

Author information

Corresponding author

Correspondence to Swee Chuan Tan.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Tan, S.C., Yip, S.H., Rahman, A. (2013). One Pass Outlier Detection for Streaming Categorical Data. In: Uden, L., Wang, L., Hong, TP., Yang, HC., Ting, IH. (eds) The 3rd International Workshop on Intelligent Data Analysis and Management. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7293-9_4
