Abstract
This paper addresses the problem of outlier detection in stream data. The authors propose a new approach that applies well-known outlier detection algorithms to stream data. The method is based on a sliding window, defined as the sequence of past stream observations that are closest to the newly arriving object. As may be expected, the outlier detection accuracy of this model is lower than that of a model that uses all historical data, but the difference is not statistically significant. Several well-known outlier detection methods are used as the basis of the model.
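The contrast between the two settings can be illustrated with a minimal sketch. It is not the authors' implementation: the z-score rule stands in for whichever well-known detector is plugged in, "closest" is assumed to mean the most recent observations, and the names `zscore_outlier`, `detect_stream`, `window_size` and `threshold` are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def zscore_outlier(history, x, threshold=3.0):
    """Flag x if it lies more than `threshold` standard deviations
    from the mean of `history` (a stand-in detector, assumed here)."""
    values = list(history)
    if len(values) < 2:
        return False  # not enough past data to judge
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return x != mu  # constant history: any different value stands out
    return abs(x - mu) / sigma > threshold


def detect_stream(stream, window_size=None, threshold=3.0):
    """Score each arriving value against past observations only.

    window_size=None keeps the whole history (data-intensive variant);
    an integer keeps only the most recent `window_size` observations
    (sliding-window variant).
    """
    history = deque(maxlen=window_size)  # maxlen=None means unbounded
    flags = []
    for x in stream:
        flags.append(zscore_outlier(history, x, threshold))
        history.append(x)  # the new object joins the history afterwards
    return flags


stream = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.3, 25.0, 10.0]
window_flags = detect_stream(stream, window_size=5)
full_flags = detect_stream(stream)
```

In this toy stream both variants flag the value 25.0; the sliding-window variant does so while storing only five past observations, which is the trade-off the abstract describes.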
Acknowledgements
This work was partially supported by the Polish National Centre for Research and Development (NCBiR) grant PBS2/B9/20/2013 within the Applied Research Programmes. The infrastructure was supported by the “PL-LAB2020” project, contract POIG.02.03.01-00-104/13-00.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kalisch, M., Michalak, M., Sikora, M., Wróbel, Ł., Przystałka, P. (2016). Data Intensive vs Sliding Window Outlier Detection in the Stream Data — An Experimental Approach. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_7
DOI: https://doi.org/10.1007/978-3-319-39384-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39383-4
Online ISBN: 978-3-319-39384-1
eBook Packages: Computer Science (R0)