A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data

  • Ke Zhang
  • Marcus Hutter
  • Huidong Jin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)

Abstract

Detecting outliers that are grossly different from, or inconsistent with, the rest of a dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets because of implicit data patterns and parameter-setting issues. To address these issues, we define a novel Local Distance-based Outlier Factor (LDOF) that measures the outlier-ness of objects in scattered datasets. LDOF uses the location of an object relative to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF’s false-detection probability. Experimentally, LDOF compares favorably to classical KNN- and LOF-based outlier detection. In particular, it is less sensitive to parameter values.
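
The abstract does not state the formula, so the following is only a minimal sketch under an assumption: that LDOF is the ratio of an object's average distance to its k nearest neighbours over the average pairwise distance among those neighbours. The function name `ldof`, the Euclidean metric, and the toy data are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations


def ldof(points, index, k):
    """Assumed LDOF score of points[index] from its k nearest neighbours."""
    if not 2 <= k < len(points):
        raise ValueError("k must satisfy 2 <= k < len(points)")
    p = points[index]
    # Euclidean distance from p to every other point, nearest first.
    others = sorted(
        (math.dist(p, q), j) for j, q in enumerate(points) if j != index
    )
    neighbours = others[:k]
    # Average distance from p to its k nearest neighbours.
    knn_dist = sum(d for d, _ in neighbours) / k
    # Average pairwise distance among the neighbours themselves.
    pairs = list(combinations([j for _, j in neighbours], 2))
    inner_dist = sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)
    if inner_dist == 0:
        # All neighbours coincide; treat any separation as infinitely deviant.
        return math.inf if knn_dist > 0 else 1.0
    return knn_dist / inner_dist


# Hypothetical toy data: four corners of a unit square plus one distant point.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
scores = [ldof(data, i, k=3) for i in range(len(data))]
```

Under this assumed definition, a score near 1 means the object sits about as far from its neighbours as they sit from one another, while a much larger score marks an object lying outside its neighbourhood. In the toy data the distant point receives by far the highest score.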

Keywords

local outlier; scattered data; k-distance; KNN; LOF; LDOF

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ke Zhang (1)
  • Marcus Hutter (1, 2)
  • Huidong Jin (1, 2, 3)
  1. RSISE, Australian National University, Australia
  2. National ICT Australia (NICTA), Canberra Lab, ACT, Australia
  3. CSIRO Mathematical and Information Sciences, Acton ACT 2601, Australia
