Assessing the effects of multivariate functional outlier identification and sample robustification on identifying critical PM2.5 air pollution episodes in Medellín, Colombia

Environmental and Ecological Statistics

Abstract

Identifying critical episodes of environmental pollution, both as an outlier identification problem and as a classification problem, is a common application of multivariate functional data analysis. This article addresses the effects of robustifying multivariate functional samples on the identification of critical pollution episodes in Medellín, Colombia. To do so, it compares 18 depth-based outlier identification methods and highlights, through simulation, the best options in terms of precision. It then applies the two best-performing methods to robustify a real dataset of air pollution (PM2.5 concentration) in the Metropolitan Area of Medellín, Colombia, and compares the effects of robustifying the samples on the accuracy of supervised classification through the multivariate functional DD-classifier. Our results show that 10 out of 20 methods reviewed perform better for at least one kind of outlier. Nevertheless, no clear positive effects of robustification were identified with the real dataset.

Notes

  1. The notation is standardized for uniformity and does not necessarily match that of the original sources.

  2. The original weighting function proposed by Claeskens et al. (2014) gives each point t a weight that corresponds to the proportion of the amplitude of the multivariate dataset at that point.

  3. Simplicial depth is also used with Method one, as can be seen in López-Pintado et al. (2014), but that method is not explored in this document.

  4. Although the definition of depth region in Hubert et al. (2015) is based on the halfspace depth, any other depth function that meets the properties enumerated above can be used to build depth regions. This is stated but not demonstrated in Hubert et al. (2015).

  5. This inequality is shown in more detail in Ieva and Paganoni (2017), with the corresponding plots of (modified) band depth (X axis) against (modified) epigraph index (Y axis). Every point falls below the boundary given by the aforementioned inequality, but points that fall too far from the boundary could be understood as shape outliers, while points with low (M)BD could be considered magnitude outliers.

  6. The indicator ranges from 0.5, the poorest performance, where all positives are false and none of them are true, to 2, where all positives are true and none of them are false. This indicator must nevertheless be complemented with the false positive rate.

  7. The imputation algorithm consists of estimating the smoothed mean value for each t using a cubic spline, followed by a modification of the EM algorithm with the following steps: (1) replace missing values with estimates; (2) estimate the parameters \(\mu \) and \(\Sigma \); (3) estimate the level of each multivariate time series; (4) re-estimate the missing values with the new parameters (Junger and Ponce de Leon 2015). The procedure was carried out using the R package mtsdi.
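
As a rough illustration of the iteration described in Note 7, the following Python sketch alternates between estimating \(\mu \) and \(\Sigma \) and re-estimating the missing entries. It is not the mtsdi implementation: it assumes Gaussian data with at least two variables, uses conditional-mean imputation without the covariance correction of a full EM step, and omits the spline-based level estimation (step 3). The function name `em_style_impute` and its parameters are hypothetical.

```python
import numpy as np


def em_style_impute(X, n_iter=200, tol=1e-8):
    """Simplified EM-style imputation (hypothetical helper, not mtsdi).

    X is an (n_obs, n_vars) array with np.nan marking missing entries.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)

    # Step 1: replace missing values with initial estimates (column means).
    filled = np.where(miss, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Step 2: estimate mu and Sigma from the completed data.
        mu = filled.mean(axis=0)
        sigma = np.cov(filled, rowvar=False, bias=True)
        previous = filled.copy()

        # Step 4: re-estimate missing values as conditional means
        # given the observed entries of the same row.
        for i in np.flatnonzero(miss.any(axis=1)):
            m = miss[i]
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                filled[i, m] = mu[m]
                continue
            coef = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])

        # Stop once the imputed values no longer change.
        if np.max(np.abs(filled - previous)) < tol:
            break

    return filled
```

Observed entries are never modified; only the positions that were missing in the input are updated at each pass.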

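The relationship described in Note 5 can be explored numerically. The sketch below is a minimal Python illustration using the univariate definitions of modified band depth (bands formed by pairs of curves) and modified epigraph index, not the multivariate versions discussed in Ieva and Paganoni (2017). The parabola coefficients in `mbd_upper_bound` are the univariate bound as assumed here and should be checked against the cited work; all function names are hypothetical, and the curves are assumed to have no ties on the evaluation grid.

```python
from itertools import combinations

import numpy as np


def modified_band_depth(curves):
    """Univariate MBD with bands delimited by pairs of curves."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    depth = np.zeros(n)
    for j, k in combinations(range(n), 2):
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        # Proportion of grid points at which each curve lies in the band.
        inside = (curves >= lower) & (curves <= upper)
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)


def modified_epigraph_index(curves):
    """Univariate MEI: average proportion of time other curves lie on or above."""
    curves = np.asarray(curves, dtype=float)
    # above[i, j, t] is True where curve j is on or above curve i at point t.
    above = curves[None, :, :] >= curves[:, None, :]
    return above.mean(axis=(1, 2))


def mbd_upper_bound(mei, n):
    """Assumed parabolic bound on MBD as a function of MEI (univariate case)."""
    a0 = a2 = -2.0 / (n * (n - 1))
    a1 = 2.0 * (n + 1) / (n - 1)
    return a0 + a1 * mei + a2 * n**2 * mei**2


def distance_to_boundary(curves):
    """Gap between the bound and the observed MBD for every curve."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    mbd = modified_band_depth(curves)
    mei = modified_epigraph_index(curves)
    return mbd_upper_bound(mei, n) - mbd
```

Under these assumptions, curves that never cross one another sit on the boundary, while a curve that crosses many others lies farther below it, which is the behaviour the note associates with shape outliers.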

Author information

Corresponding author

Correspondence to Luis Miguel Roldán-Alzate.

Additional information

Handling Editor: Dr. Luiz Duczmal.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Roldán-Alzate, L.M., Zuluaga, F. Assessing the effects of multivariate functional outlier identification and sample robustification on identifying critical PM2.5 air pollution episodes in Medellín, Colombia. Environ Ecol Stat 29, 801–825 (2022). https://doi.org/10.1007/s10651-022-00544-5

  • DOI: https://doi.org/10.1007/s10651-022-00544-5
