Over the past few years, several instruments have been developed to monitor pollen automatically. These measurement systems produce large quantities of data which, in most cases, are analysed using machine learning algorithms to identify and count pollen grains in real time; see for example Kawashima et al. (2017), O’Connor et al. (2015), Crouzy et al. (2016), Tešendić et al. (2020), Oteros et al. (2015), Šauliene et al. (2019) and Huffman et al. (2019) for a review. While these algorithms are efficient at handling such data and generally provide good results, pollen identification is sometimes ambiguous and can lead to false positive detections. Another source of false positive detections is the incompleteness of training sets. Supervised classification algorithms (as used in the present manuscript) assign particles to one of the classes present in the training set: particles belonging to other classes will always result in false positive detections, albeit with lower classification certainty. Even moderate misclassification rates can have serious consequences. For example, an algorithm that mislabels a few percent of Alnus pollen grains as Betula may wrongly predict significant Betula pollen concentrations in the middle of February. These falsely identified particles (denoted “false positives” hereafter) are particularly important to avoid in an operational context, where providing wrong information to the public can have important negative consequences. The false positive rate for a given pollen taxon depends on a combination of factors: the performance of the classification model, the capability of the device to perform stable, reproducible measurements on particles, and the volume of air sampled, which determines how quickly the supervision method proposed here can come into effect.

The issue of false positives (Tešendić et al., 2020) is well known, and a number of methods are used in the field of automatic pollen monitoring to deal with them. One technique applies strict thresholds to ensure that only particles identified with very high certainty are taken into account (Crouzy et al., 2016). This, however, has the disadvantage that many particles that do not quite meet the prediction threshold are missed, leading to a potential underestimate of the counts for the taxon of interest. A second technique essentially makes use of prior knowledge, applying on/off switches based on pollen calendars so that a particular pollen type is counted only if it is expected in the air at that time of year. This method can miss long-range transport events or simply highly unusual seasons when a particular pollen is airborne at an unexpected time of the year.

Fig. 1

Average daily airborne pollen concentration of Fraxinus from the Swisens Poleno with (“supervised”) and without (“raw”) the application of the proposed method (upper panel) and of Betula from the Plair Rapid-E also with (“supervised”) and without (“raw”) the application of the proposed method (lower panel). In both cases, manual observations (“hirst”) following Galán et al. (2014) are also shown for comparison.

Here, we propose an alternative method (denoted “supervision” hereafter) to deal with the problem of false positives. The idea is to take advantage of the fact that automatic pollen monitoring instruments, in particular the Hund BAA500 (Oteros et al., 2015) and the Swisens Poleno (Sauvageat et al., 2020), sample a large volume of air and count a large number of particles. The proposed method is suitable for various automatic pollen monitoring systems. Devices that sample a smaller volume of air, like the Plair Rapid-E (Šauliene et al., 2019; Tešendić et al., 2020), will present a more pronounced lag before the activation and deactivation of pollen taxa. As soon as concentrations of \(\sim 10\ \mathrm{grains/m^3}\) have been reached, the automatic pollen detector will observe at least a few pollen grains that can be identified with almost perfect certainty over the duration of, for example, a day. The method is applied as follows: (1) The identification algorithm is applied and a particle is assigned to a particular class based on the label that has the highest prediction value. (2) The classification is disregarded if it fails to pass a first certainty threshold, which should not be set too high so as to preserve good sampling. (3) For the particle to be considered a true detection, a certain number of particles of the same class need to have been identified with very high confidence, that is, with certainty above a second threshold, over the past user-defined number of hours. In a nutshell, the classifier runs twice with different thresholds, once in a counting mode (soft threshold) and once in a supervising mode (hard threshold). The values for the two certainty thresholds, the number of particles that have to fulfill the second threshold, and the period over which this number of particles has to be observed are the parameters that need to be set for this method.
They may differ from one pollen taxon to another or from one device to another, although fixing general values compatible with the thresholds relevant for allergies already gives good results. Once the activation thresholds of the supervision algorithm have been calibrated, the proposed method relies solely on measurements of airborne particles. This presents a decisive advantage over calendar-based approaches (Tešendić et al., 2020) in a changing climate, or when moving the device to other climatic regions.
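The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the operational implementation: the names `SupervisionParams` and `supervise` are hypothetical, and the sketch assumes that particles arrive sorted in time, that the look-back window is strictly causal (only earlier particles count), and that high-confidence particles are tracked separately for each taxon.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SupervisionParams:
    soft_threshold: float  # first (counting) certainty threshold
    hard_threshold: float  # second (supervising) certainty threshold
    min_high_conf: int     # high-confidence particles required in the window
    window: timedelta      # look-back period


def supervise(particles, params):
    """Return the (time, taxon) pairs that are counted.

    `particles` is an iterable of (time, scores) sorted by time, where
    `scores` maps each taxon label to the classifier certainty.
    """
    recent = defaultdict(deque)  # taxon -> times of high-confidence particles
    counted = []
    for time, scores in particles:
        # Step 1: assign the label with the highest prediction value.
        taxon = max(scores, key=scores.get)
        certainty = scores[taxon]
        # Step 2: disregard classifications below the soft threshold.
        if certainty < params.soft_threshold:
            continue
        # Step 3: count only if enough earlier particles of the same taxon
        # passed the hard threshold within the look-back window.
        times = recent[taxon]
        while times and times[0] < time - params.window:
            times.popleft()
        if len(times) >= params.min_high_conf:
            counted.append((time, taxon))
        # Remember this particle for later ones if it is high-confidence.
        if certainty >= params.hard_threshold:
            times.append(time)
    return counted


# Usage with made-up particles: five high-confidence detections activate
# the taxon, so the next moderately certain particle is counted.
start = datetime(2021, 4, 1)
params = SupervisionParams(0.65, 0.98, 5, timedelta(hours=48))
stream = [(start + timedelta(hours=h), {"Betula": 0.99, "Alnus": 0.01})
          for h in range(5)]
stream.append((start + timedelta(hours=5), {"Betula": 0.68, "Alnus": 0.32}))
counted = supervise(stream, params)
```

In this causal form, the first high-confidence particles of a season are not counted themselves, which is one way to picture the activation lag mentioned above. Whether an operational setup reprocesses those early particles afterwards is not specified here.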

To provide a concrete example, let us assume that a detection algorithm is applied, and for a given particle assigns a certainty value of 0.68 to the label Betula, 0.25 to the label Alnus, and 0.07 to the label Corylus. The particle needs to pass the first detection threshold of, for example, 0.65 to be a candidate to be labeled Betula. However, only if during the previous 48 hours at least 5 other particles with the label Betula have been identified with a certainty of at least 0.98 will the particle be assigned the label Betula and counted as such. If this requirement is not met, the particle is not counted. While determining which threshold values are optimal may take a little adjustment, once put into practice the supervision proves very efficient at removing falsely identified particles. Figure 1 presents examples of the application of the supervision to both the Plair Rapid-E and Swisens Poleno airflow cytometers. Both instruments were run by MeteoSwiss in Payerne, Switzerland (for details on the measuring site, see Crouzy et al. (2016)) using either the MeteoSwiss operational recognition algorithm (Sauvageat et al., 2020) for the Swiss Automatic Pollen Network (Swisens Poleno) or the MeteoSwiss algorithm for the Plair Rapid-E as described in Šauliene et al. (2019). Regarding scaling, we followed the method described in Crouzy et al. (2016). In an operational setup at MeteoSwiss, the scaling factor used is constant over the years and the same factor is used on all instruments. False positive detections are to some extent flattened by the scaling; the signal-to-noise ratio, however, remains unchanged. We decided to present different taxa on different devices in order to show that the method is not restricted either to a single device or to a single taxon. Fraxinus and Betula were chosen to assess performance because of their importance for operational monitoring in the region where the measurements were performed (Swiss Plateau).
For the two instruments, average daily pollen concentrations are shown with and without the application of supervision and compared against the manual Hirst reference. For the Swisens Poleno the first threshold was set at 0.75, the second threshold at 0.999 and the number of high-confidence particles at 30 over the previous 48 h. For the Plair Rapid-E the first threshold was set at 0.65, the second threshold at 0.99 and the number of high-confidence particles at 15 over the previous 24 h. Adapting thresholds to optimal values requires at least one measuring campaign with reference instruments, or measurements performed in a region with comparable pollen taxa. We strongly advise keeping all raw data from the automatic pollen monitoring systems in order to reconstruct coherent time series once the iterative process of optimizing thresholds has been completed. It is clear that the peaks outside of the main pollen season are completely removed with the application of the supervision method, with wrongly identified particles no longer counted as such. Differences observed within the season (after label activation and before deactivation) are related on the one hand to the uncertainty in manual Hirst measurements (Oteros et al., 2017) and on the other hand to the uncertainty in automatic measurements (Lieberherr et al., 2021). The method allows the efficient removal of false positive detections in situations such as the one shown in the upper panel of Fig. 1. Care must, however, be taken in situations with a high raw false positive rate such as the one shown in the lower panel of Fig. 1, as the results rely heavily on the application of the supervision method presented here: failure of the supervision method may result in completely wrong pollen concentrations, either by cutting real peaks or by letting false positives pass through.
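The worked Betula example and the parameter sets quoted above can be written down as a small configuration with a single-particle decision rule. This is only a sketch: `SupervisionParams`, `PARAMS`, `decide` and the dictionary keys are hypothetical names, and the number of recent high-confidence particles is passed in directly rather than computed from a data stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupervisionParams:
    soft_threshold: float  # first certainty threshold
    hard_threshold: float  # second certainty threshold
    min_high_conf: int     # required number of high-confidence particles
    window_hours: int      # period over which they must have been observed


# Values quoted in the text; the keys are illustrative names only.
PARAMS = {
    "worked_example": SupervisionParams(0.65, 0.98, 5, 48),
    "swisens_poleno": SupervisionParams(0.75, 0.999, 30, 48),
    "plair_rapid_e": SupervisionParams(0.65, 0.99, 15, 24),
}


def decide(scores, n_recent_high_conf, params):
    """Return the counted label, or None if the particle is not counted.

    `n_recent_high_conf` is the number of other particles of the winning
    label that reached `params.hard_threshold` within the last
    `params.window_hours` hours (assumed to be computed elsewhere).
    """
    label = max(scores, key=scores.get)
    if scores[label] < params.soft_threshold:
        return None  # fails the first (soft) threshold
    if n_recent_high_conf < params.min_high_conf:
        return None  # taxon not activated by the supervision
    return label


scores = {"Betula": 0.68, "Alnus": 0.25, "Corylus": 0.07}
in_season = decide(scores, 5, PARAMS["worked_example"])    # "Betula"
off_season = decide(scores, 0, PARAMS["worked_example"])   # None
rapid_e = decide(scores, 10, PARAMS["plair_rapid_e"])      # None: 10 < 15
```

Keeping the parameters in one place like this makes it straightforward to re-run the decision on stored raw data whenever the thresholds are revised.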