Enhancing Effectiveness of Outlier Detections for Low Density Patterns
Conference paper
Abstract
Outlier detection is concerned with discovering exceptional behaviors of objects in data sets. It is becoming an increasingly useful tool in applications such as detecting credit card fraud, discovering criminal behavior in e-commerce, identifying computer intrusions, and detecting health problems. In this paper, we introduce a connectivity-based outlier factor (COF) scheme that improves the effectiveness of an existing local outlier factor (LOF) scheme when a pattern itself has a neighbourhood density similar to that of an outlier. We give theoretical and empirical analyses to demonstrate the improved effectiveness and the capability of the COF scheme in comparison with the LOF scheme.
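The abstract gives no formulas, so the following is only a minimal sketch of one common formulation of a connectivity-based outlier factor, not necessarily the exact scheme of the paper. It assumes that a point's "average chaining distance" is a weighted mean of the edge costs met while linking the point to its k nearest neighbours, always taking the cheapest edge next, with earlier edges weighted more heavily. The score is then the ratio of a point's chaining distance to the mean chaining distance of its neighbours. The function names (`k_nearest`, `avg_chaining_distance`, `cof_scores`), the tie-breaking by index, and the toy data are all illustrative assumptions.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def avg_chaining_distance(points, i, neighbours):
    """Weighted mean of edge costs when linking points[i] to its neighbours.

    At each step the cheapest edge from the already connected set to a
    remaining neighbour is taken; earlier edges receive larger weights.
    """
    connected = [i]
    remaining = list(neighbours)
    r = len(neighbours) + 1  # size of the neighbourhood including the point
    total = 0.0
    for step in range(1, r):
        cost, nxt = min(
            (math.dist(points[c], points[j]), j)
            for c in connected
            for j in remaining
        )
        weight = 2.0 * (r - step) / (r * (r - 1))  # weights sum to 1
        total += weight * cost
        connected.append(nxt)
        remaining.remove(nxt)
    return total


def cof_scores(points, k):
    """Connectivity-based score per point; values well above 1 suggest outliers.

    Assumes no duplicate points (otherwise the denominator can be zero).
    """
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    ac = [avg_chaining_distance(points, i, neigh[i]) for i in range(n)]
    return [k * ac[i] / sum(ac[j] for j in neigh[i]) for i in range(n)]


# Toy data (assumed): a sparse straight-line pattern plus one point off the line.
line = [(float(x), 0.0) for x in range(10)]
points = line + [(4.5, 3.0)]
scores = cof_scores(points, k=3)
```

On this toy input every point on the line scores about 1, while the point off the line scores about 2, because reaching its neighbourhood requires one long edge before the short edges along the line.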
Keywords
Outlier Detection; Large Data Base; Local Outlier Factor; Reachability Distance; Exceptional Behavior
Copyright information
© Springer-Verlag Berlin Heidelberg 2002