
Data Intensive vs Sliding Window Outlier Detection in the Stream Data — An Experimental Approach

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)


Abstract

This paper addresses the problem of outlier detection in stream data. The authors propose a new approach to outlier detection in stream data that builds on well-known outlier detection algorithms. The method is based on a sliding window, defined as the sequence of past stream observations that are closest to the newly arriving object. As might be expected, the outlier detection accuracy of this model is lower than that of a model using all historical data, but the difference is not statistically significant. Several well-known outlier detection methods serve as the basis of the model.
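The abstract does not name the detectors or window parameters, so the following is only a minimal Python sketch of the two settings being compared. It assumes a univariate stream, reads "closest" as "most recent", and uses a simple z-score rule as a stand-in for the well-known detectors. The function names `zscore_is_outlier`, `detect_full_history` and `detect_sliding_window`, the window size and the 3-standard-deviation threshold are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev
from typing import Iterable, List


def zscore_is_outlier(reference: Iterable[float], x: float, threshold: float = 3.0) -> bool:
    """Hypothetical stand-in detector: flag x if it lies more than `threshold`
    sample standard deviations from the mean of the reference observations."""
    ref = list(reference)
    if len(ref) < 2:
        return False  # too little reference data to estimate the spread
    mu = mean(ref)
    sigma = stdev(ref)
    if sigma == 0:
        return x != mu  # constant reference: any deviation is treated as unusual
    return abs(x - mu) / sigma > threshold


def detect_full_history(stream: Iterable[float], threshold: float = 3.0) -> List[bool]:
    """Data-intensive variant: score each new object against all past observations."""
    history: List[float] = []
    flags: List[bool] = []
    for x in stream:
        flags.append(zscore_is_outlier(history, x, threshold))
        history.append(x)  # memory grows with the length of the stream
    return flags


def detect_sliding_window(stream: Iterable[float], window: int = 50,
                          threshold: float = 3.0) -> List[bool]:
    """Sliding-window variant: score each new object against the `window` most
    recent past observations only (assumption: 'closest' means most recent)."""
    recent: deque = deque(maxlen=window)  # the oldest observation is evicted automatically
    flags: List[bool] = []
    for x in stream:
        flags.append(zscore_is_outlier(recent, x, threshold))
        recent.append(x)
    return flags


if __name__ == "__main__":
    # Toy stream: alternating 10/11 readings with a single spike at index 20.
    stream = [10.0, 11.0] * 10 + [40.0] + [10.0, 11.0] * 5
    full = detect_full_history(stream)
    windowed = detect_sliding_window(stream, window=6)
    print([i for i, f in enumerate(full) if f])      # expected: [20]
    print([i for i, f in enumerate(windowed) if f])  # expected: [20]
```

On this toy stream both variants flag only the spike. The windowed variant keeps memory bounded at `window` observations and can follow gradual changes in the stream. The cost is that each decision rests on less reference data, which is consistent with the small accuracy loss reported above. Recomputing the mean and deviation costs O(`window`) per object; incremental statistics would be cheaper but are omitted for clarity.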

Acknowledgements

This work was partially supported by the Polish National Centre for Research and Development (NCBiR) grant PBS2/B9/20/2013 within the Applied Research Programmes. The infrastructure was supported by the “PL-LAB2020” project, contract POIG.02.03.01-00-104/13-00.

Author information

Corresponding author: Marcin Michalak.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., Przystałka, P. (2016). Data Intensive vs Sliding Window Outlier Detection in the Stream Data — An Experimental Approach. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_7

  • DOI: https://doi.org/10.1007/978-3-319-39384-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39383-4

  • Online ISBN: 978-3-319-39384-1

  • eBook Packages: Computer Science (R0)
