Ranking Outliers Using Symmetric Neighborhood Relationship

Jin, Wen; Tung, Anthony K. H.; Han, Jiawei; Wang, Wei

doi:10.1007/11731139_68

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Outlier mining in databases aims to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The density distribution at the location of an object has so far been estimated from the density distribution of its k-nearest neighbors [2,11]. However, when an outlier lies where the density distributions in the neighborhood differ significantly, for example when objects from a sparse cluster are close to a denser cluster, this estimate may be wrong. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, we develop several mining algorithms that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
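
The idea of scoring an object against both its neighbors and its reverse neighbors can be illustrated with a minimal sketch. The abstract does not give the formulas, so the details here are assumptions made for illustration: density is taken as the inverse of the k-distance, the score is the average density over the union of k-nearest neighbors and reverse k-nearest neighbors divided by the object's own density, and the function names (knn_indices, symmetric_outlier_scores, top_n_outliers) are hypothetical. The brute-force neighbor search is for clarity only and is not one of the mining algorithms developed in the paper.

```python
import math


def knn_indices(points, k):
    """Brute-force k-nearest neighbors: returns (neighbor index lists, k-distances)."""
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first (ties broken by index).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
        kdist.append(dists[k - 1][0])
    return neighbors, kdist


def symmetric_outlier_scores(points, k):
    """Score each point using its neighbors and reverse neighbors (illustrative)."""
    neighbors, kdist = knn_indices(points, k)
    n = len(points)

    # Reverse neighbors of j: every i that lists j among its k nearest neighbors.
    reverse = [set() for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            reverse[j].add(i)

    # Assumed density estimate: inverse k-distance.
    # This fails for duplicate points whose k-distance is zero.
    density = [1.0 / d for d in kdist]

    scores = []
    for i in range(n):
        space = set(neighbors[i]) | reverse[i]  # symmetric neighborhood
        avg_density = sum(density[j] for j in space) / len(space)
        scores.append(avg_density / density[i])  # larger means more outlying
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n points with the highest scores, highest first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(sample, k=2, n=1))
```

On the sample data, four points form a tight square and the fifth sits far away, so the isolated point receives the highest score and is returned first.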

References

1. Aggarwal, C., Yu, P.: Outlier Detection for High Dimensional Data. In: SIGMOD 2001 (2001)
2. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-based Local Outliers. In: SIGMOD (2000)
3. Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: PKDD 2004 (2004)
4. Chen, Z.X., Fu, A.W., Tang, J.: On Complementarity of Cluster and Outlier Detection Schemes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737. Springer, Heidelberg (2003)
5. Chiu, A.L., Fu, A.W.: Enhancements on Local Outlier Detection. In: IDEAS 2003 (2003)
6. Ester, M., Kriegel, H.P., et al.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases. In: KDD 1996 (1996)
7. Guha, S., Rastogi, R., Shim, K.: Cure: An Efficient Clustering Algorithm for Large Databases. In: SIGMOD 1998 (1998)
8. Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier Detection Using k-nearest Neighbour Graph. In: ICPR 2004 (2004)
9. Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco
10. Jagadish, H., Koudas, N., Muthukrishnan, S.: Mining Deviants in a Time Series Database. In: VLDB 1999 (1999)
11. Jin, W., Tung, K.H., Han, J.W.: Mining Top-n Local Outliers in Large Databases. In: KDD 2001 (2001)
12. Knorr, E., Ng, R.: Algorithms for Mining Distance-Based Outliers in Large Datasets. In: VLDB 1998 (1998)
13. Knorr, E., Ng, R.: Finding Intensional Knowledge of Distance-Based Outliers. In: VLDB 1999 (1999)
14. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: SIGMOD 2000 (2000)
15. Muthukrishnan, S., Shah, R., Vitter, J.S.: Mining Deviants in Time Series Data Streams. In: SSDBM 2004 (2004)
16. Ng, R., Han, J.W.: Efficient and Effective Clustering Method for Spatial Data Mining. In: VLDB 1994 (1994)
17. Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast Outlier Detection Using the Local Correlation Integral. In: ICDE 2003 (2003)
18. Papadimitriou, S., Faloutsos, C.: Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)
19. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: SIGMOD 1995 (1995)
20. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000 (2000)
21. Shekhar, S., Lu, C.T., Zhang, P.S.: Detecting Graph-based Spatial Outliers. In: KDD 2001 (2001)
22. Tang, J., Chen, Z.X., et al.: Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 535. Springer, Heidelberg (2002)
23. Wong, W.K., Moore, A.W., et al.: Rule-Based Anomaly Pattern Detection for Detecting Disease Outbreaks. In: AAAI 2002 (2002)
24. Yiu, M.L., Mamoulis, N.: Clustering Objects on a Spatial Network. In: SIGMOD 2004 (2004)
25. Yiu, M.L., et al.: Aggregate Nearest Neighbor Queries in Road Networks. IEEE Trans. Knowl. Data Eng. 17(6) (2005)
26. Zhang, T., et al.: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996 (1996)

Author information

Authors and Affiliations

School of Computing Science, Simon Fraser University, Canada
Wen Jin
Department of Computer Science, National University of Singapore, Singapore
Anthony K. H. Tung
Department of Computer Science, Univ. of Illinois at Urbana-Champaign, USA
Jiawei Han
Department of Computer Science, Fudan University, China
Wei Wang

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Jin, W., Tung, A.K.H., Han, J., Wang, W. (2006). Ranking Outliers Using Symmetric Neighborhood Relationship. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_68

DOI: https://doi.org/10.1007/11731139_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)