Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Outlier Detection

  • Arthur Zimek
  • Erich Schubert
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_80719-1

Synonyms

Definition

Outlier detection aims at identifying those objects in a database that are unusual, i.e., different from the majority of the data and therefore suspected of resulting from contamination, error, or fraud. In statistical modeling, the assessment of “being unusual” is typically based on a parametric model of the data: objects that do not fit the modeled distribution well are identified as outliers. In the database context, the statistical intuition of “being unusual” is typically modeled in an approximate but more efficient, nonparametric way by (local) density estimates and comparison to some reference set.
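The contrast between the two views can be illustrated with a minimal sketch. It is not taken from the entry: the function names `z_scores` and `knn_distance_scores`, the toy data, and the choice of a Gaussian fit and a k-nearest-neighbor distance as stand-ins for "parametric model" and "(local) density estimate" are all assumptions made for illustration.

```python
import math
import statistics


def z_scores(values):
    """Parametric view (assumed Gaussian): score = |x - mean| / stddev.

    Assumes the values are not all identical (stddev > 0).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [abs(x - mu) / sigma for x in values]


def knn_distance_scores(points, k):
    """Nonparametric view: score = distance to the k-th nearest neighbor.

    A large distance indicates a sparse neighborhood, i.e., low density
    relative to the rest of the data. Brute force, O(n^2) distances.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: one value and one point lie far from the rest.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.0]
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

z = z_scores(values)
knn = knn_distance_scores(points, k=2)

# Index of the highest-scoring (most outlying) object under each view.
top_z = max(range(len(values)), key=z.__getitem__)
top_knn = max(range(len(points)), key=knn.__getitem__)
print(top_z, top_knn)
```

In both cases the result is a score per object rather than a binary label; where to place the threshold between inliers and outliers is a separate decision that this sketch leaves open.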

Historical Background

Filtering out those observations that look suspiciously different from the majority of observations is a procedure that has probably been practiced tacitly ever since people began to study data collections and tried to make sense of observations. In the eighteenth century,...


Recommended Reading

  1. Hawkins D. Identification of outliers. London: Chapman and Hall; 1980.
  2. Barnett V, Lewis T. Outliers in statistical data. 3rd ed. Chichester: Wiley; 1994.
  3. Rousseeuw PJ, Hubert M. Robust statistics for outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov. 2011;1(1):73–9.
  4. Knorr EM, Ng RT, Tucakov V. Distance-based outliers: algorithms and applications. VLDB J. 2000;8(3–4):237–53.
  5. Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas; 2000. p. 427–38.
  6. Angiulli F, Pizzuti C. Outlier mining in large high-dimensional data sets. IEEE Trans Knowl Data Eng. 2005;17(2):203–15.
  7. Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas; 2000. p. 93–104.
  8. Schubert E, Zimek A, Kriegel HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc. 2014;28(1):190–237.
  9. Orair GH, Teixeira C, Wang Y, Meira Jr W, Parthasarathy S. Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endowment. 2010;3(2):1469–80.
  10. Zimek A, Schubert E, Kriegel HP. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min. 2012;5(5):363–87.
  11. Zimek A, Campello RJGB, Sander J. Ensembles for unsupervised outlier detection: challenges and research questions. ACM SIGKDD Explor. 2013;15(1):11–22.
  12. Chandola V, Banerjee A, Kumar V. Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng. 2012;24(5):823–39.
  13. Akoglu L, Tong H, Koutra D. Graph-based anomaly detection and description: a survey. Data Min Knowl Disc. 2014; doi:10.1007/s10618-014-0365-y.
  14. Kriegel HP, Kröger P, Schubert E, Zimek A. Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa; 2011. p. 13–24.
  15. Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York; 2013. p. 1009–12.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
  2. Heidelberg University, Heidelberg, Germany

Section editors and affiliations

  • Dimitrios Gunopulos (1)
  1. Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, USA