An Optimization Model for Outlier Detection in Categorical Data

  • Conference paper
Advances in Intelligent Computing (ICIC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3644)

Abstract

In this paper, we formally define the problem of outlier detection in categorical data as an optimization problem from a global viewpoint. Moreover, we present a local-search heuristic based algorithm for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.
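The abstract does not state the objective function or the search moves, so the following is only a minimal sketch of how "outlier detection as optimization plus local search" could look. It assumes, purely for illustration, that the objective is the sum of per-attribute Shannon entropies of the data left after removing k records, and that the search swaps one labeled outlier for one non-outlier at a time. The names `dataset_entropy`, `local_search_outliers`, and `max_passes` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def dataset_entropy(rows):
    """Sum of per-attribute Shannon entropies (assumed objective, not the paper's)."""
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        for count in Counter(column).values():
            p = count / n
            total -= p * log2(p)
    return total


def local_search_outliers(rows, k, max_passes=10):
    """Hypothetical swap-based local search for k records whose removal
    minimizes the entropy of the remaining records."""

    def cost(outlier_ids):
        kept = [r for i, r in enumerate(rows) if i not in outlier_ids]
        return dataset_entropy(kept)

    outliers = set(range(k))  # arbitrary start: label the first k rows
    best = cost(outliers)
    for _ in range(max_passes):
        improved = False
        for i in range(len(rows)):
            if i in outliers:
                continue
            # Try exchanging row i with each current outlier; keep the best
            # strictly improving exchange, if there is one.
            best_swap = None
            for j in outliers:
                candidate = (outliers - {j}) | {i}
                c = cost(candidate)
                if c < best - 1e-12:
                    best, best_swap = c, candidate
            if best_swap is not None:
                outliers = best_swap
                improved = True
        if not improved:  # local optimum reached
            break
    return sorted(outliers), best


if __name__ == "__main__":
    # Toy data: eight identical records plus two records with rare values.
    data = [("a", "x")] * 4 + [("b", "y")] + [("a", "x")] * 4 + [("c", "z")]
    print(local_search_outliers(data, k=2))
```

A search of this kind only reaches a local optimum, and this sketch recomputes the objective from scratch for every candidate swap, so it is meant for reading rather than for large datasets.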

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, Z., Deng, S., Xu, X. (2005). An Optimization Model for Outlier Detection in Categorical Data. In: Huang, DS., Zhang, XP., Huang, GB. (eds) Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538059_42

  • DOI: https://doi.org/10.1007/11538059_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28226-6

  • Online ISBN: 978-3-540-31902-3

  • eBook Packages: Computer Science (R0)
