Abstract
In this paper, we formally define outlier detection in categorical data as an optimization problem from a global viewpoint. We also present an algorithm based on a local-search heuristic for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.
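The abstract does not spell out the objective function or the search moves. As a rough illustration only, the sketch below assumes an entropy-based objective: choose k records whose removal minimizes the summed per-attribute Shannon entropy of the remaining data, with attributes treated as independent. It then improves an initial choice by swapping records into and out of the outlier set for as long as the objective keeps dropping. The function names, the first-k initialization and the `max_passes` cap are hypothetical. This is a sketch under those assumptions, not the authors' algorithm.

```python
import math
from collections import Counter
from typing import Sequence, Set, Tuple

Record = Tuple[str, ...]


def dataset_entropy(records: Sequence[Record]) -> float:
    """Sum of per-attribute Shannon entropies (bits).

    Assumption: attributes are treated as independent, so the dataset
    entropy is approximated by adding up each column's entropy.
    """
    if not records:
        return 0.0
    n = len(records)
    total = 0.0
    for j in range(len(records[0])):
        counts = Counter(r[j] for r in records)
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total


def remaining_entropy(data: Sequence[Record], outliers: Set[int]) -> float:
    """Entropy of the records left after removing the candidate outliers."""
    return dataset_entropy([r for i, r in enumerate(data) if i not in outliers])


def local_search_outliers(data: Sequence[Record], k: int,
                          max_passes: int = 100) -> Set[int]:
    """Hypothetical swap-based local search for k outliers.

    Starts from the first k records (an arbitrary choice) and accepts any
    swap of a non-outlier with an outlier that lowers the entropy of the
    remaining data. Stops when a full pass finds no improving swap.
    """
    if not 0 <= k <= len(data):
        raise ValueError("k must be between 0 and len(data)")
    outliers = set(range(k))
    best = remaining_entropy(data, outliers)
    for _ in range(max_passes):
        improved = False
        for i in range(len(data)):
            if i in outliers:
                continue
            for o in list(outliers):
                candidate = (outliers - {o}) | {i}
                e = remaining_entropy(data, candidate)
                if e < best - 1e-12:  # strict improvement only
                    outliers, best, improved = candidate, e, True
                    break
        if not improved:
            break
    return outliers
```

For example, with `data = [("a", "x")] * 5 + [("b", "y")]`, calling `local_search_outliers(data, k=1)` returns `{5}`, the one record that differs in every attribute. Recomputing the entropy from scratch for each candidate swap keeps the sketch short but costs O(n·m) per evaluation for n records and m attributes; a practical version would update the attribute counts incrementally.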
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, Z., Deng, S., Xu, X. (2005). An Optimization Model for Outlier Detection in Categorical Data. In: Huang, DS., Zhang, XP., Huang, GB. (eds) Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538059_42
DOI: https://doi.org/10.1007/11538059_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28226-6
Online ISBN: 978-3-540-31902-3
eBook Packages: Computer Science (R0)