Detection of outliers in pollutant emissions from the Soto de Ribera coal-fired power plant using functional data analysis: a case study in northern Spain

For more than a century, air pollution has been one of the most important environmental problems in cities. It threatens human health and causes many deaths worldwide every year. This paper addresses functional outlier detection. Functional data analysis is a relatively recent mathematical tool for recognizing outliers, and it is applied here to the emissions of a coal-fired power plant. Two methods are used: the functional high-density region (HDR) boxplot and the functional bagplot. The functional bagplot is derived from the bivariate bagplot, which is applied to the first two robust principal component scores. Both methods were used to detect outliers in pollutant emission curves over time. These curves were built from the discrete records of an air quality monitoring station, which were then smoothed for each pollutant. Both methods are tested on pollutant emissions recorded over a long period in an urban area. The emissions are first arranged as vectors whose components are the pollutant concentration values at each observation. Although the emissions are recorded discretely, the functional methods treat them as curves and identify outliers by comparing curves rather than vectors. An outlier is therefore no longer a point but a curve, and functional depth serves as the measure of distance between curves. The case study concerns emissions from a coal-fired power plant on the outskirts of Oviedo, the capital of the Principality of Asturias in northern Spain. The strengths of the functional methods are also explained.
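To make the idea of depth-based curve comparison concrete, the following is a minimal sketch in Python, not the procedure used in this study. It assumes an integrated depth of the Fraiman-Muniz type (one common choice of functional depth, chosen here only for illustration), works on curves already evaluated on a common time grid, and uses synthetic data. The function names `fraiman_muniz_depth`, `flag_outliers` and `make_demo_curves` are hypothetical, and the smoothing step and both plots (HDR boxplot and bagplot) are omitted.

```python
import math


def fraiman_muniz_depth(curves):
    """Integrated depth of each curve in a sample on a common time grid.

    curves: list of equal-length lists, one list of values per curve.
    Returns one depth per curve; low depth means far from the bulk.
    """
    n = len(curves)
    n_points = len(curves[0])
    depths = [0.0] * n
    for t in range(n_points):
        values = [curve[t] for curve in curves]
        for i, x in enumerate(values):
            # Empirical CDF of the sample at grid point t, evaluated at x.
            ecdf = sum(1 for v in values if v <= x) / n
            # Univariate depth: largest near the median, smallest in the tails.
            depths[i] += 1.0 - abs(0.5 - ecdf)
    # Average over the grid (a Riemann approximation of the integral).
    return [d / n_points for d in depths]


def flag_outliers(curves, n_flag=1):
    """Return the indices of the n_flag curves with the lowest depth."""
    depths = fraiman_muniz_depth(curves)
    order = sorted(range(len(curves)), key=lambda i: depths[i])
    return sorted(order[:n_flag])


def make_demo_curves():
    """Synthetic daily curves (assumed data, 24 hourly values each)."""
    hours = range(24)
    curves = []
    # Nine similar curves with small vertical offsets.
    for k in range(9):
        offset = 0.1 * (k - 4)
        curves.append(
            [10 + offset + 2 * math.sin(2 * math.pi * h / 24) for h in hours]
        )
    # One curve with an abnormally high level throughout the day.
    curves.append([20 + 2 * math.sin(2 * math.pi * h / 24) for h in hours])
    return curves


curves = make_demo_curves()
outliers = flag_outliers(curves, n_flag=1)
print(outliers)  # prints [9]
```

In this sketch the number of curves to flag is fixed by hand. In practice a cutoff on the depth values would have to be chosen, for example by a resampling procedure, and that choice is not specified here.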

The authors wish to acknowledge the computational support provided by the Department of Mathematics at the University of Oviedo, as well as pollutant data from the Santa Marina air quality automated monitoring station supplied by the Section of Industry and Energy from the Government of Asturias (Spain). We would like to thank Anthony Ashworth for his revision of the English grammar and spelling of the manuscript.

Author information

Correspondence to Fernando Sánchez-Lasheras.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible editor: Marcus Schulz

About this article

Cite this article

Sánchez-Lasheras, F., Ordóñez-Galán, C., García-Nieto, P.J. et al. Detection of outliers in pollutant emissions from the Soto de Ribera coal-fired power plant using functional data analysis: a case study in northern Spain. Environ Sci Pollut Res 27, 8–20 (2020). https://doi.org/10.1007/s11356-019-04435-4


  • Functional data analysis
  • Outlier detection
  • Air pollution
  • Gas emissions
  • Functional bagplot
  • Functional high-density region (HDR) boxplot