
Ranking Outliers Using Symmetric Neighborhood Relationship

  • Wen Jin
  • Anthony K. H. Tung
  • Jiawei Han
  • Wei Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

Mining outliers in a database aims to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. So far, the density distribution at the location of an object has been estimated from the density distribution of its k-nearest neighbors [2,11]. However, this can produce a wrong estimate when an outlier lies where the density distributions in its neighborhood differ significantly, for example, when an object from a sparse cluster is close to a denser cluster. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers discovered are more meaningful. To compute such local outliers efficiently, we develop several mining algorithms that detect the top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
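The abstract does not state the exact formula, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' algorithm. It assumes (1) the density at an object is the inverse of its k-distance (distance to its k-th nearest neighbor), (2) the symmetric neighborhood of an object is the union of its k-nearest neighbors and its reverse k-nearest neighbors (objects that count it among their own k-nearest neighbors), and (3) the outlier score is the mean density over that neighborhood divided by the object's own density. The function names `symmetric_outlier_scores` and `top_n_outliers` are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force, Euclidean).

    Ties in distance are broken by the lower index.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def symmetric_outlier_scores(points, k):
    """Score every object using kNN plus reverse kNN (assumes len(points) > k)."""
    n = len(points)
    nn = [knn(points, i, k) for i in range(n)]

    # Reverse neighbors: q is a reverse neighbor of p if p is among q's kNN.
    rnn = [set() for _ in range(n)]
    for q in range(n):
        for p in nn[q]:
            rnn[p].add(q)

    # ASSUMPTION: density = 1 / k-distance; the epsilon guards against duplicates.
    kdist = [math.dist(points[i], points[nn[i][-1]]) for i in range(n)]
    density = [1.0 / max(d, 1e-12) for d in kdist]

    scores = []
    for p in range(n):
        # Symmetric neighborhood: k-nearest neighbors united with reverse neighbors.
        space = set(nn[p]) | rnn[p]
        avg_density = sum(density[o] for o in space) / len(space)
        # ASSUMPTION: ratio > 1 means p is sparser than its symmetric neighborhood.
        scores.append(avg_density / density[p])
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n highest-scoring objects, most outlying first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

This brute-force version costs O(n²) distance computations and is meant only to show how reverse neighbors enter the density estimate; it does not reproduce the pruning that the abstract says the proposed top-n algorithms use to stay efficient.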

Keywords

Outlier Detection; Mining Algorithm; Local Outlier; Neighbor Query; Neighboring Object
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Aggarwal, C., Yu, P.: Outlier Detection for High Dimensional Data. In: SIGMOD 2001 (2001)
  2. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying Density-based Local Outliers. In: SIGMOD 2000 (2000)
  3. Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: PKDD 2004 (2004)
  4. Chen, Z.X., Fu, A.W., Tang, J.: On Complementarity of Cluster and Outlier Detection Schemes. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737. Springer, Heidelberg (2003)
  5. Chiu, A.L., Fu, A.W.: Enhancements on Local Outlier Detection. In: IDEAS 2003 (2003)
  6. Ester, M., Kriegel, H.P., et al.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases. In: KDD 1996 (1996)
  7. Guha, S., Rastogi, R., Shim, K.: CURE: An Efficient Clustering Algorithm for Large Databases. In: SIGMOD 1998 (1998)
  8. Hautamäki, V., Kärkkäinen, I., Fränti, P.: Outlier Detection Using k-Nearest Neighbour Graph. In: ICPR 2004 (2004)
  9. Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco
  10. Jagadish, H., Koudas, N., Muthukrishnan, S.: Mining Deviants in a Time Series Database. In: VLDB 1999 (1999)
  11. Jin, W., Tung, K.H., Han, J.W.: Mining Top-n Local Outliers in Large Databases. In: KDD 2001 (2001)
  12. Knorr, E., Ng, R.: Algorithms for Mining Distance-Based Outliers in Large Datasets. In: VLDB 1998 (1998)
  13. Knorr, E., Ng, R.: Finding Intensional Knowledge of Distance-Based Outliers. In: VLDB 1999 (1999)
  14. Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: SIGMOD 2000 (2000)
  15. Muthukrishnan, S., Shah, R., Vitter, J.S.: Mining Deviants in Time Series Data Streams. In: SSDBM 2004 (2004)
  16. Ng, R., Han, J.W.: Efficient and Effective Clustering Method for Spatial Data Mining. In: VLDB 1994 (1994)
  17. Papadimitriou, S., Kitagawa, H., et al.: LOCI: Fast Outlier Detection Using the Local Correlation Integral. In: ICDE 2003 (2003)
  18. Papadimitriou, S., Faloutsos, C.: Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)
  19. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest Neighbor Queries. In: SIGMOD 1995 (1995)
  20. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: SIGMOD 2000 (2000)
  21. Shekhar, S., Lu, C.T., Zhang, P.S.: Detecting Graph-based Spatial Outliers. In: KDD 2001 (2001)
  22. Tang, J., Chen, Z.X., et al.: Enhancing Effectiveness of Outlier Detections for Low Density Patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 535. Springer, Heidelberg (2002)
  23. Wong, W.K., Moore, A.W., et al.: Rule-Based Anomaly Pattern Detection for Detecting Disease Outbreaks. In: AAAI 2002 (2002)
  24. Yiu, M.L., Mamoulis, N.: Clustering Objects on a Spatial Network. In: SIGMOD 2004 (2004)
  25. Yiu, M.L., et al.: Aggregate Nearest Neighbor Queries in Road Networks. IEEE Trans. Knowl. Data Eng. 17(6) (2005)
  26. Zhang, T., et al.: An Efficient Data Clustering Method for Very Large Databases. In: SIGMOD 1996 (1996)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wen Jin (1)
  • Anthony K. H. Tung (2)
  • Jiawei Han (3)
  • Wei Wang (4)
  1. School of Computing Science, Simon Fraser University, Canada
  2. Department of Computer Science, National University of Singapore, Singapore
  3. Department of Computer Science, Univ. of Illinois at Urbana-Champaign, USA
  4. Department of Computer Science, Fudan University, China
