Ranking Outliers Using Symmetric Neighborhood Relationship

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers lie in a region where the density distributions of the neighborhood differ significantly, for example when objects from a sparse cluster are close to a denser cluster, this may result in a wrong estimate. To avoid this problem, we propose a simple but effective measure for local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
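
The idea of scoring an object against both its neighbors and its reverse neighbors can be illustrated with a minimal Python sketch. The abstract does not give the formula, so several choices below are assumptions made for illustration only: density is taken as the inverse of the distance to the k-th nearest neighbor, the score is the average density over the union of neighbors and reverse neighbors divided by the object's own density, and the search is brute force rather than one of the paper's mining algorithms. The function names `knn_lists`, `symmetric_outlier_scores`, and `top_n_outliers` are hypothetical.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest neighbors.

    Returns (neighbors, kdist): for each point, the indices of its k nearest
    neighbors and the distance to the k-th one. Ties are broken by index,
    so exactly k neighbors are kept (a simplification).
    """
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
        kdist.append(ranked[k - 1][0])
    return neighbors, kdist


def symmetric_outlier_scores(points, k):
    """Score each point against its neighbors AND reverse neighbors.

    Assumption: density(p) = 1 / k-distance(p), which requires that no point
    has k exact duplicates (otherwise the k-distance is zero).
    """
    neighbors, kdist = knn_lists(points, k)

    # Reverse neighbors of j: every point i that lists j among its k nearest.
    reverse = [set() for _ in points]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            reverse[j].add(i)

    density = [1.0 / d for d in kdist]
    scores = []
    for i in range(len(points)):
        # Symmetric neighborhood: union of neighbors and reverse neighbors.
        space = set(neighbors[i]) | reverse[i]
        avg_density = sum(density[j] for j in space) / len(space)
        # A score well above 1 means the surroundings are much denser than
        # the point itself, i.e. the point looks like a local outlier.
        scores.append(avg_density / density[i])
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n points with the highest scores, highest first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    # Four points forming a tight square plus one isolated point (index 4).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(pts, k=2, n=1))  # prints [4]
```

In this toy data set the isolated point has no reverse neighbors, while the cluster points that it selects as neighbors do count it as a reverse neighbor, which lowers their average surrounding density slightly. That asymmetry is the information a purely k-nearest-neighbor estimate would not use.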

References

  1. Aggarwal, C., Yu, P.: Outlier Detection for High Dimensional Data. In: SIGMOD 2001 (2001)

  2. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-based Local Outliers. In: SIGMOD (2000)

  3. Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: PKDD 2004 (2004)

  4. Chen, Z.X., Fu, A.W., Tang, J.: On Complementarity of Cluster and Outlier Detection Schemes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737. Springer, Heidelberg (2003)

  5. Chiu, A.L., Fu, A.W.: Enhancements on Local Outlier Detection. In: IDEAS 2003 (2003)

  6. Ester, M., Kriegel, H.P., et al.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases. In: KDD 1996 (1996)

  7. Guha, S., Rastogi, R., Shim, K.: Cure: An Efficient Clustering Algorithm for Large Databases. In: SIGMOD 1998 (1998)

  8. Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier Detection Using k-nearest Neighbour Graph. In: ICPR 2004 (2004)

  9. Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco

  10. Jagadish, H., Koudas, N., Muthukrishnan, S.: Mining Deviants in a Time Series Database. In: VLDB 1999 (1999)

  11. Jin, W., Tung, K.H., Han, J.W.: Mining Top-n Local Outliers in Large Databases. In: KDD 2001 (2001)

  12. Knorr, E., Ng, R.: Algorithms for Mining Distance-Based Outliers in Large Datasets. In: VLDB 1998 (1998)

  13. Knorr, E., Ng, R.: Finding Intensional Knowledge of Distance-Based Outliers. In: VLDB 1999 (1999)

  14. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: SIGMOD 2000 (2000)

  15. Muthukrishnan, S., Shah, R., Vitter, J.S.: Mining Deviants in Time Series Data Streams. In: SSDBM 2004 (2004)

  16. Ng, R., Han, J.W.: Efficient and Effective Clustering Method for Spatial Data Mining. In: VLDB 1994 (1994)

  17. Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast Outlier Detection Using the Local Correlation Integral. In: ICDE 2003 (2003)

  18. Papadimitriou, S., Faloutsos, C.: Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)

  19. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest Neighbor Queries. In: SIGMOD 1995 (1995)

  20. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000 (2000)

  21. Shekhar, S., Lu, C.T., Zhang, P.S.: Detecting Graph-based Spatial Outliers. In: KDD 2001 (2001)

  22. Tang, J., Chen, Z.X., et al.: Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 535. Springer, Heidelberg (2002)

  23. Wong, W.K., Moore, A.W., et al.: Rule-Based Anomaly Pattern Detection for Detecting Disease Outbreaks. In: AAAI 2002 (2002)

  24. Yiu, M.L., Mamoulis, N.: Clustering Objects on a Spatial Network. In: SIGMOD 2004 (2004)

  25. Yiu, M.L., et al.: Aggregate Nearest Neighbor Queries in Road Networks. IEEE Trans. Knowl. Data Eng. 17(6) (2005)

  26. Zhang, T., et al.: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996 (1996)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, W., Tung, A.K.H., Han, J., Wang, W. (2006). Ranking Outliers Using Symmetric Neighborhood Relationship. In: Ng, W.K., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_68

  • DOI: https://doi.org/10.1007/11731139_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
