Enhancing Effectiveness of Outlier Detections for Low Density Patterns

  • Jian Tang
  • Zhixiang Chen
  • Ada Wai-chee Fu
  • David W. Cheung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2336)

Abstract

Outlier detection is concerned with discovering exceptional behaviors of objects in data sets. It is becoming an increasingly useful tool in applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, identifying computer intrusion, and detecting health problems. In this paper, we introduce a connectivity-based outlier factor (COF) scheme that improves the effectiveness of an existing local outlier factor (LOF) scheme when a pattern itself has a neighbourhood density similar to that of an outlier. We give theoretical and empirical analysis to demonstrate the improvement in effectiveness and the capability of the COF scheme in comparison with the LOF scheme.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jian Tang (1)
  • Zhixiang Chen (2)
  • Ada Wai-chee Fu (1)
  • David W. Cheung (3)
  1. Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong
  2. Department of Computer Science, University of Texas-Pan American, Texas, USA
  3. Department of Computer Science and Information Systems, University of Hong Kong, Pokfulam, Hong Kong
  4. Memorial University of Newfoundland, Canada
