Forecasting Plantago pollen: improving feature selection through random forests, clustering, and Friedman tests

  • Original Paper
Theoretical and Applied Climatology

Abstract

Predicting pollen concentrations is of great importance both for patients and for public health institutions. In this paper, we present a data-driven forecasting approach that makes no assumptions about the underlying phenomena affecting the plants and the pollination process. Machine learning is used to build the model and to select the most important variables for prediction. Through nonparametric hypothesis testing, we show that some variables are indeed more important than others, and that carefully combining these variables leads to more accurate and parsimonious models, which avoid the long computation times of more complex models while outperforming them in forecast precision. By enriching the set of selected variables according to the clustered Friedman importance ranks, the average prediction error is reduced from 4.57 to 4.40 grains/m³, a 3.5% average improvement across the locations studied, together with a 50% reduction in execution time.
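
As a rough illustration of the nonparametric testing step mentioned above, the following sketch computes a Friedman statistic over variable importance scores. It is not the authors' code: the variable names, the importance values, and the helper functions `rank_row` and `friedman_statistic` are all hypothetical, and treating each row as one block (for instance, one location or one resampling run) is an assumption made here for the example. The classic formula without tie correction is used.

```python
# Illustrative sketch only: the importance scores are made up, and the
# helpers below are hypothetical, not taken from the paper.

def rank_row(scores):
    """Rank scores within one block (1 = most important), averaging ties."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        k = i
        # Extend k over a run of tied scores.
        while k + 1 < len(order) and scores[order[k + 1]] == scores[order[i]]:
            k += 1
        avg = (i + k) / 2 + 1  # mean of the 1-based positions i+1 .. k+1
        for pos in range(i, k + 1):
            ranks[order[pos]] = avg
        i = k + 1
    return ranks


def friedman_statistic(blocks):
    """Return (mean rank per variable, Friedman chi-square statistic).

    blocks: list of n rows, each holding k importance scores.
    Classic formula, without tie correction.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return [s / n for s in rank_sums], chi2


# Hypothetical candidate variables and made-up importance scores
# (e.g. from a random forest), one row per block.
variables = ["temp_lag1", "pollen_lag1", "rain_lag1", "wind_lag1"]
importances = [
    [0.30, 0.45, 0.15, 0.10],
    [0.25, 0.50, 0.14, 0.11],
    [0.33, 0.40, 0.12, 0.15],
    [0.41, 0.38, 0.13, 0.08],
    [0.27, 0.48, 0.16, 0.09],
]

mean_ranks, chi2 = friedman_statistic(importances)

# Chi-square critical value at the 5% level with k - 1 = 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

for name, mr in sorted(zip(variables, mean_ranks), key=lambda p: p[1]):
    print(f"{name}: mean rank {mr:.1f}")
print(f"Friedman statistic: {chi2:.2f}")
print("importance ranks differ" if chi2 > CRITICAL_5PCT_DF3 else "no evidence of a difference")
```

With these made-up scores the statistic is about 13.08, above the 5% critical value of 7.815, so the hypothesis that all four variables are equally important would be rejected. In practice the mean ranks could then be grouped (for example by clustering) to decide which variables to keep; that step is not shown here.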

Acknowledgments

The authors would like to thank Patricia Cervigón (Comunidad de Madrid) and Montserrat Gutiérrez Bustillo (Universidad Complutense de Madrid) for their assistance in obtaining the data for this study.

Author information

Corresponding author

Correspondence to Ricardo Navares.

About this article

Cite this article

Navares, R., Aznarte, J.L. Forecasting Plantago pollen: improving feature selection through random forests, clustering, and Friedman tests. Theor Appl Climatol 139, 163–174 (2020). https://doi.org/10.1007/s00704-019-02954-1

  • DOI: https://doi.org/10.1007/s00704-019-02954-1
