
Detecting and classifying outliers in big functional data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose two new outlier detection methods for identifying and classifying different types of outliers in (big) functional data sets. The proposed methods are based on an existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD detects and classifies outliers by computing, for each curve, three indices, all based on the concepts of linear regression and correlation, which measure outlyingness in terms of shape, magnitude and amplitude relative to the other curves in the data. ‘Semifast-MUOD’, the first method, uses a sample of the observations to compute the indices, while ‘Fast-MUOD’, the second method, uses the point-wise or \(L_1\) median to compute the indices. The classical boxplot is used to separate the indices of the outliers from those of the typical observations. Performance evaluation of the proposed methods on simulated data shows significant improvements over MUOD, both in outlier detection and in computational time. We show that Fast-MUOD is especially well suited to handling big and dense functional data sets, with very low computational time compared to other methods. Further comparisons with some recent outlier detection methods for functional data also show superior or comparable outlier detection accuracy for the proposed methods. We apply the proposed methods to weather, population growth, and video data.
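The Fast-MUOD idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the index formulas, so the definitions used here are assumptions: each curve is regressed on the point-wise median curve, the shape index is taken as |1 - Pearson correlation|, the amplitude index as |slope - 1|, and the magnitude index as |intercept|. The function names `muod_style_indices` and `boxplot_flags`, the 1.5 × IQR upper fence, and the toy data are likewise hypothetical and are not taken from the authors' code.

```python
import math
import random
from statistics import mean, median, quantiles


def muod_style_indices(curves):
    """Compute assumed shape/amplitude/magnitude indices against the point-wise median.

    curves: list of equal-length lists, one list of evaluations per curve.
    """
    n_points = len(curves[0])
    # Point-wise median curve: the median across curves at each evaluation point.
    med = [median(c[j] for c in curves) for j in range(n_points)]
    med_mean = mean(med)
    med_dev = [m - med_mean for m in med]
    med_ss = sum(d * d for d in med_dev)

    indices = {"shape": [], "amplitude": [], "magnitude": []}
    for curve in curves:
        cur_mean = mean(curve)
        cur_dev = [y - cur_mean for y in curve]
        cross = sum(a * b for a, b in zip(cur_dev, med_dev))
        cur_ss = sum(d * d for d in cur_dev)
        rho = cross / math.sqrt(cur_ss * med_ss)   # Pearson correlation with the median
        slope = cross / med_ss                     # least-squares slope of curve on median
        intercept = cur_mean - slope * med_mean    # least-squares intercept
        indices["shape"].append(abs(1.0 - rho))
        indices["amplitude"].append(abs(slope - 1.0))
        indices["magnitude"].append(abs(intercept))
    return indices


def boxplot_flags(values, whisker=1.5):
    """Flag values above the classical boxplot upper fence Q3 + whisker * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + whisker * (q3 - q1)
    return [v > fence for v in values]


# Toy data (hypothetical): 50 noisy sine curves with three planted outliers.
rng = random.Random(0)
t = [i / 99 for i in range(100)]
curves = [[math.sin(2 * math.pi * x) + 0.05 * rng.gauss(0, 1) for x in t]
          for _ in range(50)]
curves[0] = [y + 5.0 for y in curves[0]]                 # magnitude outlier (shifted)
curves[1] = [3.0 * math.sin(2 * math.pi * x) for x in t]  # amplitude outlier (scaled)
curves[2] = [math.cos(2 * math.pi * x) for x in t]        # shape outlier (phase change)

indices = muod_style_indices(curves)
flags = {name: boxplot_flags(vals) for name, vals in indices.items()}
```

Because every curve is compared only with a single median curve, the cost of this sketch grows linearly in the number of curves, which is in line with the abstract's claim that Fast-MUOD suits big and dense data sets. A curve can be flagged on more than one index: the cosine curve above, for instance, also has a slope far from 1.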

Code availability

Code for the proposed methods is available online on GitHub at https://github.com/otsegun/fastmuod.

Acknowledgements

This research was funded in part by Agencia Estatal de Investigación (AEI) grant number AEI/10.13039/501100011033. This research was also partially supported by the Regional Government of Madrid (CM) grant EdgeData-CM (P2018/TCS4499, cofunded by FSE & FEDER) and Agencia Estatal de Investigación (AEI) grant PID2019-109805RB-I00/AEI/10.13039/501100011033. The authors are grateful to the editor and the referees for their constructive and insightful comments, which led to considerable improvements in this paper.

Author information

Corresponding author

Correspondence to Oluwasegun Taiwo Ojo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

11634_2021_460_MOESM1_ESM.pdf

[Simulation Results:] (ESM_1.pdf) Additional simulation results comparing Fast-MUOD computed with the \(L_1\) median against Fast-MUOD computed with the point-wise median, and showing the outlier detection performance of all methods considered at higher contamination rates. Also included are comparisons of different correlation coefficients for computing the shape index \(I_S\). Further simulation results using smaller sample sizes and fewer evaluation points are presented, together with a sensitivity analysis of outlier detection performance when more noise is added to the simulation models.

About this article

Cite this article

Ojo, O.T., Fernández Anta, A., Lillo, R.E. et al. Detecting and classifying outliers in big functional data. Adv Data Anal Classif 16, 725–760 (2022). https://doi.org/10.1007/s11634-021-00460-9

  • DOI: https://doi.org/10.1007/s11634-021-00460-9
