
OPTICS-OF: Identifying Local Outliers

  • Markus M. Breunig
  • Hans-Peter Kriegel
  • Raymond T. Ng
  • Jörg Sander
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1704)

Abstract

For many KDD applications, such as detecting criminal activities in E-commerce, finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how ‘isolated’ the object is with respect to the surrounding clustering structure. In this paper, we formally introduce a new notion of outliers that bases outlier detection on the same theoretical foundation as density-based cluster analysis. Our notion of an outlier is ‘local’ in the sense that the outlier-degree of an object is determined by taking into account the clustering structure in a bounded neighborhood of the object. We demonstrate that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches, and we also present an algorithm for finding them. Furthermore, we show that by combining the outlier detection with a density-based method to analyze the clustering structure, we can get the outliers almost for free if we already want to perform a cluster analysis on a data set.
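The abstract describes the local outlier-degree only informally. As an illustration, here is a minimal Python sketch of a local, density-based outlier score in the style of the Local Outlier Factor (LOF), the closely related measure the same authors published later. That this matches the exact OPTICS-OF definitions is an assumption. The function name `local_outlier_factors`, the parameter `min_pts`, the Euclidean distance, and the brute-force O(n²) neighbour search are illustrative choices, not the paper's algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_outlier_factors(points: Sequence[Point], min_pts: int) -> List[float]:
    """Hypothetical LOF-style local outlier degree for every point (brute force).

    Assumes distinct points; duplicates would make a density infinite.
    """
    n = len(points)
    if not 0 < min_pts < n:
        raise ValueError("min_pts must be in [1, len(points) - 1]")

    # Pairwise Euclidean distances (math.dist needs Python 3.8+).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance: distance to the min_pts-th nearest neighbour, excluding the point itself.
    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[min_pts - 1]
        k_dist.append(kd)
        # The neighbourhood keeps ties at the k-distance, so it may hold more than min_pts points.
        neighbours.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance
    # reach(p, o) = max(k_dist(o), d(p, o)) over p's neighbours o.
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # Outlier degree: mean ratio of the neighbours' density to the point's own density.
    # Values near 1 mean "as dense as its surroundings"; larger values mean more isolated.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    data = cluster + [(5.0, 5.0)]
    for p, score in zip(data, local_outlier_factors(data, min_pts=2)):
        print(p, round(score, 2))
```

In the usage example, the four points of the unit square score 1.0 and the distant point scores roughly 6. So the score is a matter of degree, not a binary label. The k-distance used here is closely related to the core-distance that OPTICS already computes for its cluster ordering (conventions differ on whether the point itself is counted). This is one plausible reading of why the outliers come "almost for free" alongside a density-based cluster analysis.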

Keywords

Cluster Structure, Outlier Detection, Local Outlier, Fraud Detection, Core Object

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Markus M. Breunig (1)
  • Hans-Peter Kriegel (1)
  • Raymond T. Ng (1)
  • Jörg Sander (1)
  1. Institute for Computer Science, University of Munich, Munich, Germany