Abstract
In this paper, we study the problem of projected outlier detection in high-dimensional data streams and propose a new technique, the Stream Projected Outlier deTector (SPOT), to identify outliers embedded in subspaces. SPOT constructs a Sparse Subspace Template (SST), a set of subspaces obtained by unsupervised and/or supervised learning, to detect projected outliers effectively. A Multi-Objective Genetic Algorithm (MOGA) is employed as the search method for finding outlying subspaces in the training data, from which SST is built. During the detection stage, SST evolves online to cope with the dynamics of data streams. Experimental results demonstrate the efficiency and effectiveness of SPOT in detecting outliers in high-dimensional data streams.
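To make the idea of a projected outlier concrete, the following is a minimal sketch, not the SPOT algorithm itself. It assumes data scaled to [0, 1), a hand-picked list of subspaces standing in for SST (the paper obtains SST by learning and MOGA search, which is not reproduced here), an equi-width grid, and a simple sparsity test against the count expected under a uniform distribution. The class name `SubspaceGridDetector` and the parameters `bins`, `decay`, and `min_ratio` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class SubspaceGridDetector:
    """Toy projected-outlier detector over a fixed set of subspaces.

    Illustrative assumption: a point is outlying in a subspace when the
    grid cell it falls into holds far fewer points than a uniform
    distribution would put there.
    """

    def __init__(self, subspaces, bins=4, decay=1.0, min_ratio=0.2):
        self.subspaces = [tuple(s) for s in subspaces]  # stand-in for SST
        self.bins = bins            # equi-width intervals per dimension
        self.decay = decay          # 1.0 means no forgetting of old data
        self.min_ratio = min_ratio  # sparsity threshold (hypothetical)
        self.counts = {s: defaultdict(float) for s in self.subspaces}
        self.total = 0.0

    def _cell(self, point, subspace):
        # Map the projection of the point onto the subspace to a grid cell.
        return tuple(
            min(int(point[d] * self.bins), self.bins - 1) for d in subspace
        )

    def update(self, point):
        """Fold one stream point into the (optionally decayed) cell counts."""
        if self.decay < 1.0:
            for cells in self.counts.values():
                for key in cells:
                    cells[key] *= self.decay
        self.total = self.total * self.decay + 1.0
        for s in self.subspaces:
            self.counts[s][self._cell(point, s)] += 1.0

    def outlying_subspaces(self, point):
        """Return the subspaces in which the point lands in a sparse cell."""
        flagged = []
        for s in self.subspaces:
            expected = self.total / (self.bins ** len(s))
            observed = self.counts[s].get(self._cell(point, s), 0.0)
            if expected > 0 and observed / expected < self.min_ratio:
                flagged.append(s)
        return flagged


# Synthetic stream: dimensions 0 and 1 always agree, dimension 2 is unrelated.
detector = SubspaceGridDetector([(0,), (1,), (2,), (0, 1)])
for i in range(100):
    v = (i % 10) / 10 + 0.05
    w = ((i * 7) % 10) / 10 + 0.05
    detector.update((v, v, w))

suspicious = detector.outlying_subspaces((0.1, 0.9, 0.5))
ordinary = detector.outlying_subspaces((0.1, 0.1, 0.5))
print(suspicious, ordinary)
```

The query point (0.1, 0.9, 0.5) looks normal in each single dimension and is flagged only in the two-dimensional subspace (0, 1), which is the situation the abstract describes as an outlier embedded in a subspace. Setting `decay` below 1.0 lets old points fade, a crude stand-in for coping with stream dynamics.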
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, J., Gao, Q., Wang, H., Liu, Q., Xu, K. (2009). Detecting Projected Outliers in High-Dimensional Data Streams. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2009. Lecture Notes in Computer Science, vol 5690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03573-9_53
DOI: https://doi.org/10.1007/978-3-642-03573-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03572-2
Online ISBN: 978-3-642-03573-9
eBook Packages: Computer Science (R0)