A Typology of Data Anomalies

Foorthuis, Ralph

doi:10.1007/978-3-319-91476-3_3

Ralph Foorthuis (ORCID: orcid.org/0000-0003-1132-4767)

Conference paper
First Online: 18 May 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 854)

Abstract

Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism of ‘black box’ algorithms and analytics, it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and, as a framework, assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.

Author information

Authors and Affiliations

UWV, La Guardiaweg 116, 1040 HG, Amsterdam, The Netherlands
Ralph Foorthuis

Corresponding author

Correspondence to Ralph Foorthuis.

Editor information

Editors and Affiliations

Universidad de Cádiz, Cádiz, Cádiz, Spain
Jesús Medina
Universidad de Málaga, Málaga, Málaga, Spain
Manuel Ojeda-Aciego
Universidad de Granada, Granada, Spain
José Luis Verdegay
Universidad de Granada, Granada, Spain
David A. Pelta
Universidad de Málaga, Málaga, Málaga, Spain
Inma P. Cabrera
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, New York, USA
Ronald R. Yager

About this paper

Cite this paper

Foorthuis, R. (2018). A Typology of Data Anomalies. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_3

DOI: https://doi.org/10.1007/978-3-319-91476-3_3
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91475-6
Online ISBN: 978-3-319-91476-3
eBook Packages: Computer Science, Computer Science (R0)