Grid-ODF: Detecting Outliers Effectively and Efficiently in Large Multi-dimensional Databases

  • Wei Wang
  • Ji Zhang
  • Hai Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3801)

Abstract

In this paper, we propose a novel outlier mining algorithm, called Grid-ODF, that takes into account both the local and global perspectives of outliers for effective detection. The notion of Outlying Degree Factor (ODF), which reflects both density and distance, is introduced to rank outliers. A grid structure that partitions the data space enables Grid-ODF to be implemented efficiently. Experimental results show that Grid-ODF outperforms existing outlier detection algorithms such as LOF and KNN-distance in terms of both effectiveness and efficiency.
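The abstract does not give the definition of ODF, so the following is only a toy sketch of the general idea it describes: partition the data space into a grid, use cell occupancy as a density estimate, and combine it with a distance term to rank points. The function names, the `cell_size` and `dense_threshold` parameters, and the scoring formula (sparsity of a point's cell multiplied by its distance to the nearest dense-cell centroid) are all assumptions made for illustration and should not be read as the Grid-ODF algorithm itself.

```python
from collections import defaultdict
import math


def grid_cells(points, cell_size):
    """Map each grid cell (tuple of integer indices) to the points inside it."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    return cells


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))


def toy_outlier_scores(points, cell_size, dense_threshold):
    """Score each point; a higher score means more outlying.

    Hypothetical scoring, NOT the paper's ODF:
      score = (1 / points in the point's cell)
              * (distance to the nearest dense-cell centroid)
    A cell counts as dense if it holds at least `dense_threshold` points.
    """
    cells = grid_cells(points, cell_size)
    dense_centroids = [
        centroid(pts) for pts in cells.values() if len(pts) >= dense_threshold
    ]
    if not dense_centroids:
        raise ValueError("no dense cells; lower dense_threshold or enlarge cell_size")

    scores = {}
    for pts in cells.values():
        sparsity = 1.0 / len(pts)  # density factor taken from grid occupancy
        for p in pts:
            # distance factor: how far the point lies from the nearest dense region
            nearest = min(math.dist(p, c) for c in dense_centroids)
            scores[p] = sparsity * nearest
    return scores


# Usage: five clustered points and one isolated point (points must be tuples).
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4), (0.5, 0.5), (9.5, 9.5)]
scores = toy_outlier_scores(points, cell_size=1.0, dense_threshold=3)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the grid is built in a single pass and each point is compared only against dense-cell centroids rather than against every other point, a sketch like this avoids the all-pairs distance computation that a naive nearest-neighbour ranking would need.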

References

  1. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic Subspace Clustering of High Dimensional Data Mining Application. In: SIGMOD 1999, Philadelphia, PA (1999)
  2. Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. John Wiley, Chichester (1994)
  3. Breunig, M., Kriegel, H.-P., Ng, R., Sander, J.: LOF: Identifying Density-Based Local Outliers. In: SIGMOD 2000, Dallas, Texas (2000)
  4. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: KDD 1996, Portland, Oregon (1996)
  5. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2000)
  6. Hawkins, D.: Identification of Outliers. Chapman and Hall, London (1980)
  7. Hinneburg, A., Keim, D.A.: An Efficient Approach to Cluster in Large Multimedia Databases with Noise. In: KDD 1998, New York City, NY (1998)
  8. Jin, W., Tung, A.K.H., Han, J.: Finding Top_n Local Outliers in Large Database. In: SIGKDD 2001, San Francisco, CA (2001)
  9. Knorr, E.M., Ng, R.T.: Algorithms for Mining Distance-based Outliers in Large Dataset. In: VLDB 1998, New York, NY (1998)
  10. Knorr, E.M., Ng, R.T.: Finding Intentional Knowledge of Distance-based Outliers. In: VLDB 1999, Edinburgh, Scotland (1999)
  11. Ng, R., Han, J.: Efficient and Effective Clustering Methods for Spatial Data Mining. In: VLDB 1994, Santiago, Chile (1994)
  12. Preparata, F., Shamos, M.: Computational Geometry: an Introduction. Springer, Heidelberg (1988)
  13. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000, Dallas, Texas (2000)
  14. Ruts, I., Rousseeuw, P.: Computing Depth Contours of Bivariate Point Clouds. Computational Statistics and Data Analysis 23, 153–168 (1996)
  15. Sheikholeslami, G., Chatterjee, S., Zhang, A.: WaveCluster: A Wavelet based Clustering Approach for Spatial Data in Very Large Database. VLDB Journal 8(3-4), 289–304 (1999)
  16. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996, Montreal, Canada (1996)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Wei Wang (1)
  • Ji Zhang (2)
  • Hai Wang (3)
  1. College of Educational Science, Nanjing Normal University, China
  2. Faculty of Computer Science, Dalhousie University, Halifax, Canada
  3. Sobey School of Business, Saint Mary’s University, Halifax, Canada
