Abstract
Technologies for monitoring pollen concentrations in real-time have made substantial advances in recent years and are becoming increasingly available. This opens the possibility of calibrating numerical pollen forecast models in real-time and of making a significant step forward in the quality of pollen forecasts. We present a method to use real-time pollen measurements in numerical pollen forecast models. The main idea is to calibrate model parameterizations and not to assimilate measurements in a nudging sense. This ensures that the positive effect persists throughout the forecast period and does not vanish after a few forecast hours. We propose to adapt in real-time both the model phenology scheme and the overall tuning factor that are present in any numerical pollen forecast model. To test this approach, we used the numerical pollen forecast model COSMO-ART (COnsortium for Small-scale MOdelling-Aerosols and Reactive Trace gases) on a mesh size of 1.1 km covering the greater Alpine domain. Test runs covered two pollen seasons and included Corylus, Alnus, Betula and Poaceae pollen. Comparison with daily measurements from 13 Swiss pollen stations revealed that the model improvements are large, but fine-tuning of the method remains a challenge. We conclude that the presented approach is a first valuable step towards comprehensive real-time calibration of numerical pollen forecast models.
1 Introduction
Pollen forecasts allow allergy sufferers to minimize exposure and thus symptoms (Tummon et al., 2021). Area-covering pollen forecasts can be provided by numerical pollen forecast models such as COSMO-ART (Vogel et al., 2009) or SILAM (System for Integrated modeLling of Atmospheric coMposition) (Sofiev et al., 2015). However, operational pollen forecast models run freely over the whole pollen season, i.e. the models are not adapted according to the actual pollen measurements. This reduces the quality of the forecasts.
The reason for the lack of data assimilation is the well-established pollen monitoring technology. Since the 1950s, Hirst-type pollen monitoring devices (Hirst, 1952; Galán et al., 2014; PD CEN/TS 16868, 2015) have represented the standard of pollen measurements. A shortcoming of this technology is that the pollen grains are identified and counted manually under the microscope. As a consequence, data become available with a delay of one to several days, which is too late for pollen assimilation in numerical pollen forecast models.
In the past years, new monitoring technologies allowing real-time measurements of pollen concentrations witnessed breakthroughs (Huffman et al., 2020). These technologies emerged from research fields such as robotics, image recognition, laser and fluorescence. An overview of the available devices can be found in Clot et al. (2020). There is an increasing number of European countries where real-time pollen monitoring networks are being established (Tummon et al., 2022) supported by the EUMETNET AutoPollen Programme (Clot et al., 2020).
These developments open new opportunities for numerical pollen forecast models as pollen data assimilation becomes possible. The aim of this study is to develop and test a method to improve numerical pollen forecasts by using actual pollen concentration data in COSMO-ART. We focus on the research question: Is it possible to substantially improve the numerical pollen dispersion model COSMO-ART by incorporating pollen data in hindcast runs as if they were real-time pollen data?
We restrict the study to COSMO-ART but the method may be implemented in other numerical pollen forecast models as well. Additionally, we neither investigate lead-time dependencies nor ensembles. Instead, we focus on deterministic analysis cycles giving the paper a case study character.
2 Data and methods
2.1 Rationale for the calibration approach
The most obvious way to use real-time pollen data in numerical pollen forecast models would be tuning the initial conditions (i.e. starting concentrations). However, Sofiev (2019) found that benefits of classical data assimilation are limited to the first few hours of the forecasts. Using SILAM runs, he showed that half of the positive effect is gone within \(\approx\) 5 h, while 90% vanishes in less than 24 h. Yet lead times longer than 5–24 h are of great interest to many of the user groups.
Consequently, we propose not to adapt the initial conditions but rather to adapt suitable parameterizations (denoted “calibration” hereafter) of the pollen model. This way it is possible to maintain the benefit from real-time data to the end of the forecast run, regardless of its length. One crucial question is which parameters of the parameterizations are suitable for tuning.
All numerical pollen forecast models include a number of parameterizations when explicitly calculating phenology (the state of the season), pollen emission, transport and sedimentation. Naturally, the most effective way to improve model output would be to adapt parameterizations with the largest uncertainties. However, uncertainty attribution is not trivial and uncertain itself. Errors in the phenology model’s ability to predict the start of the pollen season will result in significant forecasting errors as concentrations may increase sharply at the beginning of the season. In addition, errors at the start of the season tend to negatively influence the correctness of both the peak and the end of the pollen season.
2.1.1 Calibration of the phenology model
The binary character of season timing (emission vs. no emission) and the impact on end users led us to focus in a first step on model phenology. In COSMO-ART, this module is based on growing degree day models (GDD) that are described in Pauling et al. (2014) and Zink et al. (2013). This module involves a field of temperature thresholds that are variable in space but constant in time.
We suggest making the field of temperature sum thresholds variable also in time. If the temperature sum thresholds are exceeded but no pollen has been observed up to that date, a temperature increment corresponding to 1 day is added to the temperature sum thresholds at pollen stations. Similarly, the temperature sum thresholds can be lowered. Then, the new, station-based temperature sum thresholds are extrapolated onto the whole domain using inverse distance weighting.
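The station-level part of this rule can be sketched as follows. This is a minimal illustration and not the COSMO-ART implementation: the function name, the argument names, the use of a given `daily_increment` as the "1 day" temperature increment and the exact condition for lowering the threshold (pollen observed before the model season has started) are assumptions made for the example.

```python
def adjust_threshold(threshold, temp_sum, daily_increment, pollen_observed):
    """Return an updated temperature sum threshold for one station.

    threshold        current temperature sum threshold at the station
    temp_sum         temperature sum accumulated by the model so far
    daily_increment  temperature sum contribution of one day (assumed input)
    pollen_observed  True if pollen has already been measured at the station
    """
    season_started_in_model = temp_sum >= threshold
    if season_started_in_model and not pollen_observed:
        # Model season starts too early: raise the threshold by one day's worth.
        return threshold + daily_increment
    if pollen_observed and not season_started_in_model:
        # Assumed mirror case: model season starts too late, lower the threshold.
        return max(threshold - daily_increment, 0.0)
    # Model and observations agree on the state of the season: no change.
    return threshold
```

Called once per day and station, such a rule shifts the modelled season start by at most one day per call, so a single unusual observation cannot move the threshold abruptly.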
2.1.2 Calibration of the tuning factor
The remaining error of the pollen forecast is hard to attribute. Major sources of uncertainty arise from the plant distribution, the emission parameterization, the quality of the weather parameter forecasts and sub-scale processes. Hence, it is hard to adapt specific parameterizations in a conceptually sound way. That is why we opted to implement a bias reduction scheme that accounts for most remaining sources of uncertainties.
In all numerical pollen forecast models, there is a tuning factor that aggregates the various factors influencing the pollen emission flux. In COSMO-ART, this is a single species-specific number that was determined based on past pollen seasons at pollen stations and then averaged to minimize the overall bias. This number was held constant and applied to the whole model domain. It accounted neither for mast years nor for regional differences. Adapting this number is supported by Sofiev (2019), who found that correction of the total seasonal pollen emission field has a positive impact. However, as whole-season data are not available before or during the pollen season, we instead use the tuning factor, which plays a similar role to the total seasonal pollen emission.
We suggest replacing this number by a variable field that is updated whenever new real-time pollen data become available (e.g. every hour, as is the case in our hindcast runs). Initial tests have shown that this number, called debiasing factor F hereafter, calculated by
$$F = \sqrt[24]{\frac{\sum (obs_{120h})}{\sum (mod_{120h})}} \qquad (1)$$
where \(\sum (obs_{120h})\) and \(\sum (mod_{120h})\) denote the sums of the measured and modelled concentrations at a station over the past 120 h (cf. Appendix A.1), yields promising results. F values are calculated for each station and are subsequently extrapolated onto the whole domain using inverse distance weighting. Main considerations include that the change must not take effect too quickly to avoid over-reaction (e.g. due to a very strong, but temporally limited local source). The twenty-fourth root was chosen so that F takes full effect after 24 h (if the routine is called hourly). In addition, species-specific upper and lower limits of F are implemented to ensure that F stays in a predefined range. The results are sensitive to these implementation details and are discussed in Sect. 3.
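A minimal sketch of one hourly debiasing step and of the spatial extrapolation is given below. It uses the final-variant formula from Appendix A.1 (twenty-fourth root of the ratio of the 120-h sums). Everything else is an assumption made for illustration: the function names, the multiplicative accumulation of the hourly steps, the application of the limits to the accumulated factor and the distance exponent of 2 in the inverse distance weighting.

```python
import math


def hourly_step(obs_120h, mod_120h, root=24):
    """Hourly multiplicative step: root-th root of observed/modelled sums.

    The caller is expected to skip stations whose modelled sum is zero.
    """
    return (sum(obs_120h) / sum(mod_120h)) ** (1.0 / root)


def update_tuning(current, step, lower, upper):
    """Apply one hourly step and keep the result inside [lower, upper]."""
    return min(max(current * step, lower), upper)


def idw(x, y, stations, power=2.0):
    """Inverse distance weighting of station values at grid point (x, y).

    stations: list of (x_station, y_station, value) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for xs, ys, value in stations:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            return value  # grid point coincides with a station
        weight = distance ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator
```

With a constant observed/modelled ratio, 24 consecutive calls of `update_tuning` multiply the starting value by the full ratio, which mirrors the statement that F takes full effect after 24 h when the routine is called hourly.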
2.2 Data
The analyses are based on three data sets in measurement space.
1. Measurements: hourly pollen concentration from 13 Swiss pollen stations (Basel, Bern, Buchs, Lausanne, Genève, La-Chaux-de-Fonds, Luzern, Locarno, Lugano, Münsterlingen, Neuchâtel, Visp and Zürich). The measurements were made using Hirst pollen traps because no real-time pollen monitoring device was available yet. Species include Alnus, Betula, Corylus and Poaceae.
2. Baseline model runs: This data set includes modelled concentrations at pollen stations from COSMO-1E which is an operational configuration of the COSMO-ART model at MeteoSwiss. The COSMO-1E configuration is based on a horizontal mesh of 1.1 km and 80 vertical levels. It covers the greater Alpine domain (approx. 1280 km × 860 km). For Alnus (January–March), Corylus (January–March) and Betula (March–June), the database includes the pollen seasons 2020 and 2021. For Poaceae (March–October), the pollen seasons 2019–2020 are available. The runs were performed as 24-h cycles without the calibration module. Hourly concentrations are retrieved from the model output.
3. Calibrated model runs: same as the baseline model configuration but with the calibration module.
The analysis is carried out for each pollen species individually, but for all stations and both years combined.
The temporal resolution for this analysis is daily (i.e. hourly concentrations are converted into daily averages). Sensitivity studies with 12-hourly averages have been carried out, but are not shown as the results were similar.
Measurements of low pollen concentrations have a high measurement uncertainty (Adamov et al., 2021). Therefore, comparisons of modelled with measured data for concentrations \(\le\) 10 Pollen/\(m^{3}\) should be interpreted with caution. To calculate the log and relative error metrics, 0.001 was added to the modelled and measured values for numerical reasons.
Missing data occur in the measurements (not in model data). Daily averages from hourly values were calculated without missing values if at least 50% of hourly values are present. Otherwise, that specific day was excluded from the analysis for all three data sets. The impact of missing data was evaluated and can be considered minimal.
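The 50% rule can be illustrated with a short sketch. The function name and the representation of missing hours as `None` are assumptions made for this example; the published analysis itself was carried out in R.

```python
def daily_mean(hourly, min_fraction=0.5):
    """Mean of the available hourly values of one day, or None.

    hourly: list of 24 hourly concentrations, with None marking a missing hour.
    Returns None if fewer than min_fraction of the hours are present, in which
    case the day is excluded from the analysis.
    """
    present = [value for value in hourly if value is not None]
    if len(present) < min_fraction * len(hourly):
        return None
    return sum(present) / len(present)
```

A day for which this function returns `None` for the measurements would then be dropped from the baseline and calibrated model series as well, so that all three data sets cover the same days.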
2.3 Statistical analyses
The analysis and the preprocessed data can be freely retrieved on GitHub in reproducible form: https://github.com/sadamov/realtime_calibration.git
Data preparation and visualization were carried out in R 4.2.1 (R Core Team, 2022) using various packages from the Tidyverse (Wickham et al., 2019). The statistical analysis presented in the next paragraphs is based on the methodologies from Adamov et al. (2021).
After an initial residual analysis, the measurements were converted into logarithmic concentrations for statistical comparison with log-transformed model data (base 10 log). Even the log concentrations did not always fulfil the requirements of standard statistical methods (i.e. assuming normality of independent errors with constant variance and mean zero). These assumptions were tested with an analysis of variance (ANOVA) and quantile–quantile plots of the residuals, Tukey–Anscombe plots of residuals vs. fitted values and index plots. As the data did not fulfil the assumptions mentioned above, robust and/or simple statistical methods were applied. For every metric and figure, it is mentioned whether log-transformed or original concentrations were used.
For the numerical assessment, standard metrics such as bias, standard deviation (SD), mean absolute error (MAE) and root-mean-squared logarithmic error (RMSLE) were calculated. These cover the most typical metrics used in the weather community. As the data do not fulfil many of the assumptions mentioned above, categorical verification allows for a more detailed assessment. For the assessment of health impact classes, accuracy, Cohen’s kappa, mean absolute error of ordered levels and a pairwise comparison were carried out. The pairwise comparison and simultaneous confidence intervals for the estimated effects were calculated using the nparcomp package (Konietschke et al., 2015) with the Dunnett method, where the Hirst measurements were chosen as the reference level (see Table 2). The resulting estimator can be interpreted as a proxy for the relative difference in median between modelled and measured concentrations.
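The concentration-based metrics can be sketched as follows. The base-10 logarithm and the offset of 0.001 are taken from the text; the sign convention (model minus measurement), the use of the sample standard deviation of the errors and the function name are assumptions made for this example and may differ from the published R code.

```python
import math
import statistics

OFFSET = 0.001  # added before taking logarithms, as stated in Sect. 2.2


def error_metrics(modelled, measured):
    """Bias, SD, MAE and RMSLE for paired daily concentrations."""
    errors = [m - o for m, o in zip(modelled, measured)]
    log_errors = [
        math.log10(m + OFFSET) - math.log10(o + OFFSET)
        for m, o in zip(modelled, measured)
    ]
    return {
        "bias": statistics.fmean(errors),
        "sd": statistics.stdev(errors),  # spread of the errors around the bias
        "mae": statistics.fmean([abs(e) for e in errors]),
        "rmsle": math.sqrt(statistics.fmean([e * e for e in log_errors])),
    }
```

Because the RMSLE is computed on log-transformed data, it weights relative rather than absolute deviations, which is why small changes in this metric can still correspond to sizeable changes in concentration.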
Additional categorical metrics are shown in Appendix A.2. These metrics together allow the reader to get a full picture of the improvements and shortcomings of the new calibration module. The combination of health impact-based and concentration-based metrics allows a good understanding of the model performance and the main areas of further improvement.
The validation at station level (i.e. modelled vs. measured values) is carried out using the same stations as were used for the model calibration. Given that our calibration targets parameterizations of the pollen module and not concentrations directly, the effect of data leakage can be considered reasonably low. Still, as soon as more stations are available, a station split for model validation should be performed.
3 Results and discussion
We present detailed analysis, metrics and graphs for Alnus pollen. For Corylus, Betula and Poaceae, an overview of several statistical metrics is shown in Appendix A.1. Figure 1 shows the raw data as time series for Alnus on a log scale. In January, the level of the calibrated runs is more elevated than the baseline runs and thus closer to the measurements. The main pollen season occurs in February in all three data sets. Around February 10, the calibrated runs seem to be higher than both the baseline runs and the measurements, whereas in March, a number of baseline runs appear to be too high. On the whole, the baseline and the calibration runs are quite similar.
The positive effect of the calibration method is best visible at the beginning (increasing the pollen level) and at the end of the season (decreasing the pollen level). Both the adaptation of the phenology and the debiasing factor have contributed to this result.
The strong peak in the calibrated run (around February 10) is enhanced by the debiasing factor F (cf. Eq. 1) because of the preceding days with too low model values. A variable tuning factor F increases the range of possible modelled values, making it more likely that the whole range of measurements can be modelled. On the other hand, the forecasts can also be further away from the measurements.
If the airborne pollen concentrations fluctuate on a frequency similar to the memory of the debiasing scheme (5 days in the current implementation), a shortcoming of the proposed debiasing method is revealed. Compared to the measurements, the modelled fluctuations tend to be shifted by half of the wavelength resulting in alternating over- and underestimations. In mid-latitudes, the weather often changes on a timescale of a few days making such fluctuations plausible.
This potential issue has to be weighed against the significant benefits in terms of robustness induced by the 5-day memory. Sofiev et al. (2017) also opted for a 5-day assimilation window when fusing model predictions and measurements into an optimized ensemble. They reported a "robust combination of the models" when this time window was applied. Short-term, local fluctuations of the measurements may not be representative for a large area. When using the 5-day memory, these local fluctuations average out. This is important because the calibrated model values at stations are subsequently extrapolated onto the model grid. The more stations are available, the more robust the results of the proposed calibration scheme become. However, the stations should be well distributed over the area of interest.
We have tested the sensitivity of the calibration scheme to the length of the memory and finally optimized this parameter. In the end, it is a trade-off between delayed reaction time inducing shifted wavelengths on the one side and robustness on the other side. This study presents the framework that worked best out of many sensitivity tests carried out for the same years and model domain (see Appendix A for more details about other experimental setups).
Figure 2 illustrates the overall effect of the calibration method. The median of the calibrated run is closer to the median of the measurements than the median of the baseline model. Also, as indicated by the interquartile range, the variation of the calibrated run is more similar to the measurements than the baseline model.
Figure 3 (Bland–Altman plot) shows the dependence of the model/measurement differences on the model/measurement mean. This allows assessment of the model error for different concentration levels separately. Comparison of the two panels in Fig. 3 reveals that overestimation of the baseline model becomes larger with increasing concentration values as shown by the Loess smoother (blue line). This feature is much less pronounced in the calibrated run. The blue confidence band shows that the uncertainty of the Loess smoother becomes larger with increasing values as well. The conical shape of the point cloud is typical for a comparison of two time series where their difference strongly depends on the magnitude of at least one of the time-series absolute values. Note that the difference (y-axis) can maximally be twice as large as the mean (x-axis) for any given pair of modelled and measured values. Further note that the model/measurement mean values \(>~300~Pollen/m^3\) were hidden from this plot for better visibility of all health impact classes.
In the calibrated and baseline run, most data points stay within the dashed red lines (\(\pm 1.96 * sd\) interval in Fig. 3). In the calibrated run, the data points are also closer to symmetry. These results also illustrate the effect of the calibration scheme. Nevertheless, model/measurement differences can still be large and will be subject to future improvements.
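The quantities behind such a Bland–Altman plot can be sketched as follows. The function name and the sign convention (model minus measurement) are assumptions made for this example; whether raw or transformed concentrations are plotted is not specified by the sketch.

```python
import statistics


def bland_altman(modelled, measured):
    """Return pairwise means, differences and the +/- 1.96 * sd limits."""
    means = [(m + o) / 2 for m, o in zip(modelled, measured)]
    diffs = [m - o for m, o in zip(modelled, measured)]
    centre = statistics.fmean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return means, diffs, (centre - half_width, centre + half_width)
```

For non-negative concentrations, the absolute difference of a pair can never exceed the sum of the two values, i.e. twice their mean, which explains the conical envelope of the point cloud.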
Figure 4 depicts the logarithmic (base 10) measurement/model data as scatter plot. Both models overestimate the concentrations for values \(<~100~Pollen/m^3\). This feature is more pronounced in the baseline model data. For both models, the spread around the red line is quite large resulting in rather low correlation coefficients especially for the baseline model. For values \(>~500~Pollen/m^3\), where the uncertainty of the Loess smoother is large, the baseline model underestimates the concentrations less than the calibrated run.
The noticeable range of the correlation coefficients (Kendall, Spearman and Pearson) indicates considerable uncertainty of the correlation estimate (Fig. 4). Still, the correlation based on the calibration run is consistently higher than the ones based on baseline data indicating improved results. The absolute increase in the correlation may appear relatively low (roughly 0.06 for Pearson R). However, as logarithmic data were used, the improvement for Pearson R is still considerable.
It is crucial to know the performance of the models for each health impact class separately. Overall, error metrics are dominated by days with low pollen occurrence, simply because they are so numerous. For health applications, higher categories are arguably much more important. Figure 5 shows boxplots for measurements and both models for each impact class. For the impact classes "weak", "medium" and "strong", the calibrated run outperforms the baseline model both in terms of median and variation. On the other hand, the baseline model performs better for the impact class "very strong". Across all categories, the spread of the error is reduced in the calibration run.
Table 1 shows four distinct metrics for testing the model performance. The large reduction of the bias from 30.2 Pollen/\(m^{3}\) (baseline run) to 10.2 Pollen/\(m^{3}\) (calibrated run) is mainly the result of the debiasing method (cf. Eq. 1). However, the standard deviation that is independent of the bias was reduced much less. The mean absolute error (MAE), an overall measure that incorporates both systematic (bias) and random errors (standard deviation), was also reduced substantially. The reduction of the root-mean-square error of the logarithmic data (RMSLE) looks only marginal. However, one should keep in mind that it is based on logarithmic data. Hence, the reduction is also considerable.
The metrics in Table 2 show that the calibrated run outperforms the baseline model also based on categorical data. The impact class is correctly forecast by the calibrated run with an accuracy of 50.1%, the error amounts to only 0.571 impact classes on average (MAE). Both metrics were substantially improved compared to the baseline model (accuracy = 41.8% and MAE = 0.719). The kappa metric confirms these findings (Table 2).
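The three categorical metrics can be sketched as follows, assuming that the health impact classes have already been encoded as ordered integers (for example 0 for "nothing" up to 4 for "very strong"). The encoding, the function name and the use of unweighted Cohen's kappa are assumptions made for this example; the concentration thresholds of the classes are not part of the sketch.

```python
from collections import Counter


def categorical_metrics(predicted, reference):
    """Accuracy, unweighted Cohen's kappa and MAE of ordered class levels.

    predicted, reference: equally long lists of integer class levels.
    Kappa is undefined if both series contain one identical class only.
    """
    n = len(reference)
    accuracy = sum(p == r for p, r in zip(predicted, reference)) / n
    mae_levels = sum(abs(p - r) for p, r in zip(predicted, reference)) / n
    pred_freq = Counter(predicted)
    ref_freq = Counter(reference)
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(pred_freq[c] * ref_freq[c] for c in ref_freq) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, kappa, mae_levels
```

An MAE of ordered levels of 0.571, as reported for the calibrated run, thus means that the forecast class is off by slightly more than half a class on average.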
The estimator is a robust method that should be considered the best choice to compare these three time series from a statistical point of view. Multiple testing, non-normality and other factors are accurately treated. The confidence interval of the estimator shows that these results are statistically significant because it does not include 0.5 (Konietschke et al., 2015), i.e. both the baseline and the calibration runs are different from the measurements. The difference between the calibrated model run and the measurements is smaller than the difference between the baseline model run and the measurements.
Additional metrics for evaluating health impact classes as well as additional species can be found in Appendix A.1. Most metrics show similar improvements as for Alnus. However, as shown in Table 7, the results for Corylus and Betula are mixed. Reasons may include that the model of Corylus was recently developed and already its baseline profits from developments that are not yet included in the modelling of the other species (i.e. improved season description curves). In addition, inter-annual variability may substantially influence the results. Early flowering species such as Corylus or Alnus experience more inter-annual variability of season start date than later flowering species such as Betula or Poaceae. For Betula, the largest errors come from the frequently occurring days falling into the "nothing" or "weak" category. The end of the season was extended too much by the calibration module in both years. As a result, the calibrated model produced pollen concentrations that were too high during the full month of June. This highlights the need for further research and fine-tuning of the calibration of both the phenology and the strength of the pollen season.
4 Conclusions
We present a method to continuously adapt the parameterization of the numerical pollen forecast model COSMO-ART. This method was tested in hindcast runs using Hirst-type pollen measurements as a proxy for real-time pollen data. A number of metrics consistently show that the proposed method leads to a substantial improvement of COSMO-ART which was the main research question of this study. This applies to analyses based both on metric and ordinal data.
Tuning the phenology and subsequently debiasing the pollen emission flux involves a number of parameterisations itself. This study presents only one implementation of this conceptual approach. Fine-tuning of the approach (especially the 5-day memory in debiasing) is likely to further improve the performance of COSMO-ART. The sensitivity of the tuning factor F to the number of measurement stations and the spatial interpolation method should be investigated systematically.
To our knowledge, the implementation of the described method is the first ever online incorporation of pollen measurements into a numerical pollen forecast model. The forecast improvements were achieved by regular corrections of phenological model parameters and the overall tuning factor. As real-time pollen data become increasingly available, further improvements could be achieved in the future.
References
Adamov, S., Clot, B., Crouzy, B., Gehrig, R., Graber, M. J., Lemonis, N., Sallin, C., & Tummon, F. (2021). Statistical understanding of measurement variability of Hirst-type volumetric pollen and spore samplers. Aerobiologia. https://doi.org/10.1007/s10453-021-09724-5
Clot, B., Gilge, S., Hajkova, L., Magyar, D., Scheifinger, H., Sofiev, M., Bütler, F., & Tummon, F. (2020). The EUMETNET AutoPollen programme: establishing a prototype automatic pollen monitoring network in Europe. Aerobiologia. https://doi.org/10.1007/s10453-020-09666-4
Galán, C., Smith, M., Thibaudon, M., Frenguelli, G., Oteros, J., Gehrig, R., Berger, U., Clot, B., Brandao, R., & EAS QC Working Group. (2014). Pollen monitoring: minimum requirements and reproducibility of analysis. Aerobiologia, 30, 385–395. https://doi.org/10.1007/s10453-014-9335-5
Hirst, J. M. (1952). An automatic volumetric spore trap. Annals of Applied Biology, 39, 257–265. https://doi.org/10.1111/j.1744-7348.1952.tb00904.x
Huffman, J. A., Perring, A. E., Savage, N. J., Clot, B., Crouzy, B., Tummon, F., Shoshanim, O., Damit, B., Schneider, J., Sivaprakasam, V., Zawadowicz, M. A., Crawford, I., Gallagher, M., Topping, D., Doughty, D. C., Hill, S. C., & Pan, Y. (2020). Real-time sensing of bioaerosols: review and current perspectives. Aerosol Science and Technology, 54, 5. https://doi.org/10.1080/02786826.2019.1664724
Konietschke, F., Placzek, M., Schaarschmidt, F., & Hothorn, L. A. (2015). nparcomp: an R software package for nonparametric multiple comparisons and simultaneous confidence intervals. The Journal of Statistical Software, 64, 1–17. https://doi.org/10.18637/jss.v064.i09
Pauling, A., Gehrig, R., & Clot, B. (2014). Toward optimized temperature sum parameterizations for forecasting the start of the pollen season. Aerobiologia, 30, 45–57. https://doi.org/10.1007/s10453-013-9308-0
PD CEN/TS 16868. (2015). Ambient air. Sampling and analysis of airborne pollen grains and fungal spores for allergy networks. volumetric hirst method. British Standards Document. https://doi.org/10.3403/30314080U
R Core Team. (2022). R: A language and environment for statistical computing. R foundation for statistical computing, Vienna, Austria. https://www.R-project.org/
Sofiev, M. (2019). On possibilities of assimilation of near-real-time pollen data by atmospheric composition models. Aerobiologia, 35, 523–531. https://doi.org/10.1007/s10453-019-09583-1
Sofiev, M., Vira, J., Kouznetsov, R., Prank, M., Soares, J., & Genikhovich, E. (2015). Construction of the SILAM Eulerian atmospheric dispersion model based on the advection algorithm of Michael Galperin. Geoscientific Model Development, 8, 3497–3522. https://doi.org/10.5194/gmd-8-3497-2015
Sofiev, M., Ritenberga, O., Albertini, R., Arteta, J., Belmonte, J., Geller Bernstein, C., Bonini, M., Celenk, S., Damialis, A., Douros, J., Elbern, H., Friese, E., Galan, C., Oliver, G., Hrga, I., Kouznetsov, R., Krajsek, K., Magyar, D., Parmentier, J., … Vokou, D. (2017). Multi-model ensemble simulations of olive pollen distribution in Europe in 2014: current status and outlook. Atmospheric Chemistry and Physics, 17, 12341–12360. https://doi.org/10.5194/acp-17-12341-2017
Tummon, F., Arboledas, L. A., Bonini, M., Guinot, B., Hicke, M., Jacob, C., Kendrovski, V., McCairns, W., Petermann, E., Peuch, V. H., Pfaar, O., Sicard, M., Sikoparija, B., & Clot, B. (2021). The need for pan-European automatic pollen and fungal spore monitoring: a stakeholder workshop position paper. Clinical and Translational Allergy. https://doi.org/10.1002/clt2.12015
Tummon, F., Brufaerts, N., Celenk, S., Choël, M., Clot, B., Crouzy, B., Galán, C., Gilge, S., Hajkova, L., Mokin, V., O’Connor, D., Rodinkova, V., Sauliene, I., Sikoparija, B., Sofiev, M., Sozinova, O., Tesendic, D., & Vasilatou, K. (2022). Towards standardisation of automatic pollen and fungal spore monitoring: best practises and guidelines. Aerobiologia. https://doi.org/10.1007/s10453-022-09755-6
Vogel, B., Vogel, H., Bäumer, D., Bangert, M., Lundgren, K., Rinke, R., & Stanelle, T. (2009). The comprehensive model system COSMO-ART – radiative impact of aerosol on the state of the atmosphere on the regional scale. Atmospheric Chemistry and Physics, 9, 8661–8680. https://doi.org/10.5194/acp-9-8661-2009
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., Francois, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Zink, K., Pauling, A., Rotach, M. W., Vogel, H., Kaufmann, P., & Clot, B. (2013). EMPOL 1.0: a new parameterization of pollen emission in numerical weather prediction models. Geoscientific Model Development, 6, 1961–1975. https://doi.org/10.5194/gmd-6-1961-2013
Acknowledgements
This paper is a contribution to the EUMETNET (EUropean METeorological NETwork) AutoPollen Programme, which develops a prototype automatic pollen monitoring network in Europe. It covers all aspects of the information chain from measurements to communicating information to the public. Many thanks to the CHAPo project which coordinates the AutoPollen efforts within Switzerland. Special thanks to Dr. Regula Gehrig and Dr. Benoît Crouzy and two anonymous reviewers who helped to improve the article. We also acknowledge access to Tsa at the Swiss National Supercomputing Centre, Switzerland, under the MeteoSwiss’ share with the project ID s83.
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich.
Author information
Contributions
Both authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by S. Adamov. The first draft of the manuscript was written by A. Pauling, and both authors commented on previous versions of the manuscript. Both authors read and approved the final manuscript.
Ethics declarations
Compliance with Ethical Standards
The authors received support from MeteoSwiss for this article, where both authors are employed. Internal review has been carried out by Dr. Regula Gehrig, an editor of Aerobiologia. No funding was received to assist with the preparation of this manuscript. No funding was received for conducting this study. No funds, grants or other support were received. The authors have no relevant financial or non-financial interests to disclose. The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.
A Appendix
A.1 Time series at station level for Alnus
Figures 6 and 7 show daily Alnus pollen concentration differences between modelled and measured values (Hirst measurements).
Figures 8–11 show the temporal evolution of the tuning factor F for different calibration model variants. For three model variants, both years 2020 and 2021 are available; for model variant 3, only 2020 is available. Below are the respective experimental settings for the four model variants. In the current model implementation, four main factors were subject to sensitivity studies.
Four adjustable parameters
1. Concentration thresholds for both measured and modelled values that trigger the calibration. In the final model, these are set to \(\sum (obs_{24h}) \ge 240~Pollen/m^{3}\), \(\sum (mod_{24h}) \ge 240~Pollen/m^{3}\) and \(\sum (obs_{120h}) \ge 720~Pollen/m^{3}\), \(\sum (mod_{120h}) \ge 720~Pollen/m^{3}\). Calibration of the tuning factor F at a station is only triggered if these thresholds are reached.
2. The minimum and maximum values (i.e. climatological limits) that the tuning factor F can reach.
3. Reaction time of the tuning factor adjustment. In the operational setup at MeteoSwiss, the calibration is run every hour. In the final variant, the adjustment is set to:
$$\begin{aligned} F = \sqrt[24]{\frac{\sum (obs_{120h})}{\sum (mod_{120h})}} \end{aligned}$$
Therefore, the adjustment is carried out in hourly steps, and a full adjustment is reached after 24 h.
4. The data used for the calculation of the change ratio. In the final variant, the past 120 h are used.
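The four parameters above can be illustrated with a minimal Python sketch of one hourly calibration step at a single station. It is not the COSMO-ART implementation. The function name, the argument layout and the limits `f_min` and `f_max` are hypothetical placeholders, since the text does not give the climatological limits. The sketch further assumes that all four thresholds must be met and that the 24th root is applied as a multiplicative change to the current tuning factor.

```python
# Thresholds taken from the text (sums of hourly concentrations, Pollen/m^3).
THRESHOLD_24H = 240.0
THRESHOLD_120H = 720.0


def update_tuning_factor(f_current, obs_hourly, mod_hourly,
                         root=24, f_min=0.1, f_max=10.0):
    """Return the tuning factor after one hourly calibration step.

    obs_hourly, mod_hourly: the last 120 hourly measured and modelled
    concentrations at one station (oldest first).
    f_min, f_max: placeholder climatological limits (assumed values).
    """
    obs_24, mod_24 = sum(obs_hourly[-24:]), sum(mod_hourly[-24:])
    obs_120, mod_120 = sum(obs_hourly[-120:]), sum(mod_hourly[-120:])

    # Parameter 1: skip calibration unless every threshold is reached
    # (assumption: all four conditions must hold at the same time).
    if (obs_24 < THRESHOLD_24H or mod_24 < THRESHOLD_24H
            or obs_120 < THRESHOLD_120H or mod_120 < THRESHOLD_120H):
        return f_current

    # Parameters 3 and 4: root of the 120 h observed/modelled ratio.
    change = (obs_120 / mod_120) ** (1.0 / root)

    # Parameter 2: keep the result within the climatological limits.
    return min(max(f_current * change, f_min), f_max)
```

If the ratio of the two 120 h sums stayed constant, 24 consecutive hourly steps would multiply the tuning factor by exactly that ratio. In a running model the modelled sum responds to each update, so the ratio itself changes from hour to hour.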
Four model variants
For each model variant, the differences from the final variant are noted.
- Final variant (Fig. 8): This is the variant described in this paper.
- Variant 1 (Fig. 9): The tuning factor adjustment is slower, with a 120th root adjustment every hour:
$$F = \sqrt[120]{\frac{\sum (obs_{120h})}{\sum (mod_{120h})}}$$
Climatological limits for the tuning factor F were not yet implemented.
- Variant 2 (Fig. 10): The tuning factor adjustment is faster, with a full adjustment every hour:
$$F = \frac{\sum (obs_{120h})}{\sum (mod_{120h})}$$
- Variant 3 (Fig. 11): The tuning factor adjustment is faster, with a full adjustment every hour. The change to the tuning factor is calculated from the past 24 h of measured and modelled values instead of the past 120 h:
$$F = \frac{\sum (obs_{24h})}{\sum (mod_{24h})}$$
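The four variants differ only in the averaging window and in the root applied to the ratio, which the following Python sketch makes explicit. The dictionary `VARIANTS` and the function `change_ratio` are hypothetical names introduced for illustration. Thresholds and climatological limits are left out to keep the comparison short.

```python
# Hypothetical encoding of the variants as
# (averaging window in hours, root of the hourly adjustment).
VARIANTS = {
    "final": (120, 24),
    "variant_1": (120, 120),
    "variant_2": (120, 1),
    "variant_3": (24, 1),
}


def change_ratio(obs_hourly, mod_hourly, variant):
    """Hourly change ratio F for one variant.

    obs_hourly, mod_hourly: at least 120 hourly measured and modelled
    concentrations at one station (oldest first).
    """
    window, root = VARIANTS[variant]
    obs_sum = sum(obs_hourly[-window:])
    mod_sum = sum(mod_hourly[-window:])
    return (obs_sum / mod_sum) ** (1.0 / root)


if __name__ == "__main__":
    # Made-up series: measurements rise sharply in the last 24 h.
    obs = [10.0] * 96 + [40.0] * 24
    mod = [10.0] * 120
    for name in VARIANTS:
        print(name, round(change_ratio(obs, mod, name), 4))
```

With these made-up series, variant 3 reacts to the full 24 h ratio in a single step, while variant 1 changes the tuning factor by well under one percent per hour. This shows the trade-off between reaction time and stability that the sensitivity studies explore.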
A.2 Additional metrics for Alnus
See Table 3.
A.3 Metrics for Betula
A.4 Metrics for Corylus
A.5 Metrics for Poaceae
A.6 Metrics definitions
See Table 13. The numerical metrics were calculated as follows:
- Bias = mean(error)
- SD = sd(error)
- MAE = mean(abs(error))
- RMSLE = \(\sqrt{mean((\log_{10}(1 + modelled\ conc.) - \log_{10}(1 + measured\ conc.))^2)}\)
where the error is defined as \(modelled\ concentration - measured\ concentration\). For the categorical metrics, the concentrations were transformed to an ordinal scale ranging from 0 (nothing) to 4 (very strong). The error was then calculated between these pseudo-numeric values of the modelled and measured impact categories.
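The four numerical metrics defined above can be sketched in a few lines of Python using only the standard library. The function name is hypothetical. The sketch assumes that sd denotes the sample standard deviation (divisor n - 1), as in R, which the text does not state explicitly.

```python
import math
import statistics


def numeric_metrics(modelled, measured):
    """Bias, SD, MAE and RMSLE for paired daily concentrations."""
    # error = modelled concentration - measured concentration
    errors = [m - o for m, o in zip(modelled, measured)]
    # Log-transformed errors for the RMSLE, with log10(1 + x).
    log_errors = [math.log10(1 + m) - math.log10(1 + o)
                  for m, o in zip(modelled, measured)]
    return {
        "bias": statistics.fmean(errors),
        "sd": statistics.stdev(errors),  # sample SD (assumption)
        "mae": statistics.fmean([abs(e) for e in errors]),
        "rmsle": math.sqrt(statistics.fmean([e * e for e in log_errors])),
    }
```

Because of the logarithm, the RMSLE weights relative rather than absolute deviations, so a few days with very high concentrations dominate it far less than they dominate bias or MAE.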
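The following formulas define the categorical metrics for the simplest, binary scenario. Consistent with the sensitivity and specificity formulas, A, B, C and D denote the counts of true positives, false positives, false negatives and true negatives in the 2 × 2 contingency table. In our case there are five health impact classes, and the formulas have to be extended accordingly.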
- \(Accuracy = \frac{A + D}{A + B + C + D}\)
- \(Kappa = \frac{2 * (A * D - B * C)}{(A + B) * (B + D) + (A + C) * (C + D)}\) (also called Heidke skill score)
- \(Sensitivity = \frac{A}{A + C}\)
- \(Specificity = \frac{D}{B + D}\)
- \(Pos\ Pred\ Value = \frac{sensitivity * prevalence}{(sensitivity * prevalence) + (1 - specificity) * (1 - prevalence)}\)
- \(Neg\ Pred\ Value = \frac{specificity * (1 - prevalence)}{(1 - sensitivity) * prevalence + specificity * (1 - prevalence)}\)
- \(Precision = \frac{A}{A + B}\)
- \(Recall = \frac{A}{A + C}\)
- \(F1 = \frac{(1 + \beta ^2) * precision * recall}{\beta ^2 * precision + recall}\), where \(\beta\) is chosen such that recall is considered \(\beta\) times as important as precision (\(\beta = 1\) for F1)
- \(Prevalence = \frac{A + C}{A + B + C + D}\)
- \(Det\ Rate = \frac{A}{A + B + C + D}\)
- \(Det\ Prevalence = \frac{A + B}{A + B + C + D}\)
- \(Bal\ Accuracy = \frac{sensitivity + specificity}{2}\)
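For the binary case, the categorical metrics can be computed directly from the four cell counts, as in the following Python sketch. The function name is hypothetical, and the cell layout (A true positives, B false positives, C false negatives, D true negatives) is an assumption inferred from the sensitivity and precision formulas. Kappa is computed in the standard two-class Heidke skill score form, with the sum of the two marginal products in the denominator. The five-class extension used in the paper is not covered.

```python
def binary_metrics(a, b, c, d, beta=1.0):
    """Categorical metrics for a 2 x 2 contingency table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives (assumed layout).
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    prevalence = (a + c) / n
    precision = a / (a + b)
    recall = sensitivity
    ppv_num = sensitivity * prevalence
    npv_num = specificity * (1 - prevalence)
    return {
        "accuracy": (a + d) / n,
        # Two-class Heidke skill score (equal to Cohen's kappa).
        "kappa": 2 * (a * d - b * c)
                 / ((a + b) * (b + d) + (a + c) * (c + d)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": ppv_num
                          / (ppv_num + (1 - specificity) * (1 - prevalence)),
        "neg_pred_value": npv_num
                          / ((1 - sensitivity) * prevalence + npv_num),
        "precision": precision,
        "recall": recall,
        # beta = 1 gives the usual F1 score.
        "f_beta": (1 + beta ** 2) * precision * recall
                  / (beta ** 2 * precision + recall),
        "prevalence": prevalence,
        "detection_rate": a / n,
        "detection_prevalence": (a + b) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

When prevalence is taken from the same table, as here, the positive predictive value reduces to the precision A / (A + B). The two formulas differ only if an external prevalence is supplied.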
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Adamov, S., Pauling, A. A real-time calibration method for the numerical pollen forecast model COSMO-ART. Aerobiologia 39, 327–344 (2023). https://doi.org/10.1007/s10453-023-09796-5
DOI: https://doi.org/10.1007/s10453-023-09796-5