Ranking Outliers Using Symmetric Neighborhood Relationship
Abstract
Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers lie where the density distributions in the neighborhood differ significantly, for example, objects from a sparse cluster close to a denser cluster, this may result in a wrong estimate. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
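The idea of combining neighbors and reverse neighbors can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following are assumptions made for illustration only: density is taken as the inverse of the distance to the k-th nearest neighbor, the symmetric neighborhood is the union of an object's k-nearest neighbors and its reverse k-nearest neighbors, and the score is the average density over that neighborhood divided by the object's own density. The function names are hypothetical, the search is brute force, and the efficient top-n mining algorithms mentioned above are not reproduced here.

```python
import math

def knn_lists(points, k):
    """Return (neighbors, kdist): for each point, the indices of its k nearest
    neighbors and the distance to its k-th nearest neighbor (brute force)."""
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending (ties broken by index).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
        kdist.append(dists[k - 1][0])
    return neighbors, kdist

def symmetric_outlier_scores(points, k):
    """Score each point using both its neighbors and its reverse neighbors.
    Assumes distinct points, so every k-distance is positive."""
    neighbors, kdist = knn_lists(points, k)
    n = len(points)
    # Reverse neighbors of i: every j that lists i among its k nearest neighbors.
    reverse = [[] for _ in range(n)]
    for j, nbrs in enumerate(neighbors):
        for i in nbrs:
            reverse[i].append(j)
    # Assumed density estimate: inverse of the k-distance.
    density = [1.0 / d for d in kdist]
    scores = []
    for i in range(n):
        # Symmetric neighborhood: neighbors together with reverse neighbors.
        space = set(neighbors[i]) | set(reverse[i])
        avg_density = sum(density[j] for j in space) / len(space)
        scores.append(avg_density / density[i])
    return scores

def top_n_outliers(points, k, n):
    """Indices of the n points with the highest scores, highest first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]

if __name__ == "__main__":
    # Five points in a tight group plus one isolated point at index 5.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    print(top_n_outliers(pts, k=2, n=1))  # the isolated point ranks first
```

In this toy data set the isolated point has no reverse neighbors, so its neighborhood consists only of dense points and its score is high. A score near 1 indicates an object whose density matches that of its symmetric neighborhood.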
Keywords
Outlier Detection, Mining Algorithm, Local Outlier, Neighbor Query, Neighboring Object
References
- 1. Aggarwal, C., Yu, P.: Outlier Detection for High Dimensional Data. In: SIGMOD 2001 (2001)
- 2. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-based Local Outliers. In: SIGMOD (2000)
- 3. Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: PKDD 2004 (2004)
- 4. Chen, Z.X., Fu, A.W., Tang, J.: On Complementarity of Cluster and Outlier Detection Schemes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737. Springer, Heidelberg (2003)
- 5. Chiu, A.L., Fu, A.W.: Enhancements on Local Outlier Detection. In: IDEAS 2003 (2003)
- 6. Ester, M., Kriegel, H.P., et al.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases. In: KDD 1996 (1996)
- 7. Guha, S., Rastogi, R., Shim, K.: Cure: An Efficient Clustering Algorithm for Large Databases. In: SIGMOD 1998 (1998)
- 8. Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier Detection Using k-nearest Neighbour Graph. In: ICPR 2004 (2004)
- 9. Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco
- 10. Jagadish, H., Koudas, N., Muthukrishnan, S.: Mining Deviants in a Time Series Database. In: VLDB 1999 (1999)
- 11. Jin, W., Tung, K.H., Han, J.W.: Mining Top-n Local Outliers in Large Databases. In: KDD 2001 (2001)
- 12. Knorr, E., Ng, R.: Algorithms for Mining Distance-Based Outliers in Large Datasets. In: VLDB 1998 (1998)
- 13. Knorr, E., Ng, R.: Finding Intensional Knowledge of Distance-Based Outliers. In: VLDB 1999 (1999)
- 14. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: SIGMOD 2000 (2000)
- 15. Muthukrishnan, S., Shah, R., Vitter, J.S.: Mining Deviants in Time Series Data Streams. In: SSDBM 2004 (2004)
- 16. Ng, R., Han, J.W.: Efficient and Effective Clustering Method for Spatial Data Mining. In: VLDB 1994 (1994)
- 17. Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast Outlier Detection Using the Local Correlation Integral. In: ICDE 2003 (2003)
- 18. Papadimitriou, S., Faloutsos, C.: Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)
- 19. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: SIGMOD 1995 (1995)
- 20. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000 (2000)
- 21. Shekhar, S., Lu, C.T., Zhang, P.S.: Detecting Graph-based Spatial Outliers. In: KDD 2001 (2001)
- 22. Tang, J., Chen, Z.X., et al.: Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 535. Springer, Heidelberg (2002)
- 23. Wong, W.K., Moore, A.W., et al.: Rule-Based Anomaly Pattern Detection for Detecting Disease Outbreaks. In: AAAI 2002 (2002)
- 24. Yiu, M.L., Mamoulis, N.: Clustering Objects on a Spatial Network. In: SIGMOD 2004 (2004)
- 25. Yiu, M.L., et al.: Aggregate Nearest Neighbor Queries in Road Networks. IEEE Trans. Knowl. Data Eng 17(6) (2005)
- 26. Zhang, T., et al.: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996 (1996)