Abstract
Predicting pollen concentrations is of great importance for both patients and public health institutions. In this paper, we present a data-driven forecasting approach that makes no assumptions about the underlying phenomena affecting the plants and the pollination process. Machine learning is used to build a model and to select the most important variables for prediction. Through nonparametric hypothesis testing, we show that some variables are indeed more important than others and that carefully combining these variables leads to more accurate and parsimonious models, which avoid the long computation times of more complex models while outperforming them in forecast precision. By enriching the set of selected variables according to the clustered Friedman importance ranks, the average prediction error is reduced from 4.57 to 4.40 grains/m³, a 3.5% average improvement across the locations studied, together with a 50% reduction in execution time.
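As a rough illustration of how importance scores can be turned into Friedman ranks, the following minimal sketch ranks variables within each block (for example, one block per location) and computes the Friedman chi-square statistic from the mean ranks. It is not the authors' pipeline: the random forest, the clustering step, and any post-hoc procedure are omitted, no tie correction is applied, and the function names and importance values are hypothetical.

```python
def average_ranks(values):
    """Rank scores from 1 (largest) to k; tied scores share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(importances):
    """Return (mean rank per variable, Friedman chi-square without tie correction).

    importances: list of n rows (blocks), each holding k importance scores.
    """
    n = len(importances)
    k = len(importances[0])
    rank_rows = [average_ranks(row) for row in importances]
    mean_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    expected = (k + 1) / 2  # mean rank if all variables were equally important
    chi2 = 12 * n / (k * (k + 1)) * sum((r - expected) ** 2 for r in mean_ranks)
    return mean_ranks, chi2


# Hypothetical importance scores: 4 locations (rows) x 3 variables (columns).
scores = [
    [0.50, 0.30, 0.20],
    [0.60, 0.30, 0.10],
    [0.40, 0.35, 0.25],
    [0.55, 0.25, 0.20],
]
mean_ranks, chi2 = friedman_statistic(scores)
print(mean_ranks, chi2)
```

A large statistic relative to a chi-square distribution with k − 1 degrees of freedom suggests that the variables do not share the same importance rank. In practice a library routine that also handles ties and returns a p-value would be preferable to this hand-rolled version.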
Acknowledgments
The authors would like to thank Patricia Cervigón (Comunidad de Madrid) and Montserrat Gutiérrez Bustillo (Universidad Complutense de Madrid) for their assistance in obtaining the data for this study.
Cite this article
Navares, R., Aznarte, J.L. Forecasting Plantago pollen: improving feature selection through random forests, clustering, and Friedman tests. Theor Appl Climatol 139, 163–174 (2020). https://doi.org/10.1007/s00704-019-02954-1
DOI: https://doi.org/10.1007/s00704-019-02954-1