A Typology of Data Anomalies

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 854)

Abstract

Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism of ‘black box’ algorithms and analytics, it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and, as a framework, assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.

Corresponding author

Correspondence to Ralph Foorthuis.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Foorthuis, R. (2018). A Typology of Data Anomalies. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 854. Springer, Cham. https://doi.org/10.1007/978-3-319-91476-3_3

  • DOI: https://doi.org/10.1007/978-3-319-91476-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91475-6

  • Online ISBN: 978-3-319-91476-3

  • eBook Packages: Computer Science (R0)