Where to search: the use of opportunistic data for the detection of an invasive forest pest

Early detection is important for the management of invasive alien species. In the last decade citizen science has become an important source of such data. Here, we used opportunistic records from the “LIFE ARTEMIS” citizen science project, in which people submitted records from places where they observed tree pests, to understand the distribution of a rapidly spreading forest pest: the oak lace bug (Corythucha arcuata) in Slovenia. These citizen science records were not distributed randomly. We constructed a species distribution model for C. arcuata that accounted for the biased distribution of citizen science by using the records of other tree pests and diseases from the same project as pseudo-absences (so-called constrained pseudo-absences), and compared this to a model with pseudo-absences selected randomly from across Slovenia. We found that the constrained pseudo-absence model showed that C. arcuata was more likely to be found in the east, in places with more oak trees and at lower elevations, and also closer to highways and railways, indicating introduction and dispersal by accidental human transport. The outputs from the model with random pseudo-absences were broadly similar, although estimates from this model tended to be higher and less precise, and some factors that were significant (proximity to minor roads and human settlements) were artefacts of recorder bias, showing the importance of taking the distribution of recording into account wherever possible. The finding that C. arcuata is more likely to be found near highways allows us to design advice for where future citizen science should be directed for efficient early detection.


Introduction
Invasive alien species are a threat to biological diversity and ecosystems (Vila et al. 2011). The number of alien species is still increasing and does not appear to have reached saturation globally (Seebens et al. 2017). Although not all alien species will become invasive, 15% are regarded as invasive (Hulme 2009). Invasive alien species affect biodiversity and ecosystem services through competition, predation, crop damage, hybridisation or as disease vectors (Lowe et al. 2000). It is estimated that introduction of invasive alien pests can negatively impact up to 10% of the European total carbon stores (Seidl et al. 2018) and substantially reduce growth rate and yields of crops (Battisti et al. 2014; Marcolin et al. 2021), and many species have been driven extinct because of invasive alien species (Clavero and García-Berthou 2005), especially on islands (Spatz et al. 2017).
It is therefore important to deal rapidly with invasive alien species and an important first step is to provide effective early detection.
Early detection of invasive alien species is important to reduce their chance of establishing or spreading by facilitating rapid control and eradication measures. However, the cost of early detection can be high and citizen science is a solution to provide long-term and large-scale surveillance for invasive alien species (Johnson et al. 2020). Surveillance of invasive species by citizen scientists is often based on opportunistic reporting of observations by the general public (Pocock et al. 2017), but it would be of benefit to know where it is best to search for recently introduced species in order to focus detection efforts by professionals and citizen scientists. Prior knowledge on the distribution of species from similar countries can be very helpful to predict the likely distribution of invasive species, as has been done for forest pests in the United States (Hudgins et al. 2017).
One of the challenges with citizen science data is that there is an uneven distribution of records, due to uneven coverage of recorders (Isaac and Pocock 2015). This is especially problematic at global scales (Beck et al. 2014; Boakes et al. 2010; Martin et al. 2012), and can depend upon the spatial extent of the projects (Lloyd et al. 2020). However, even at national scales the effect of spatial bias is still apparent, with the intensity of records from unstructured citizen science projects often being strongly positively related to human population density, roads and certain habitat types (Boakes et al. 2016; Geldmann et al. 2016; Mair and Ruete 2016; Pernat et al. 2021; Petersen et al. 2021). This bias is affected by aspects of the design and complexity of the citizen science project, and the species being sampled (Boakes et al. 2016; Geldmann et al. 2016; Petersen et al. 2021). There are several methods to account for these biases in analyses, especially where there are data from which to infer absences (Bird et al. 2014; Johnston et al. 2020; Robinson et al. 2018). For example, observations of other species sampled in the same study can be used as evidence that observers sampled an area but did not find the target species (Phillips et al. 2009), and detecting invasive alien species can be a motivator for people to undertake multispecies sampling (Petersen et al. 2021). However, when unstructured citizen science only targets a single species, such as an invasive species of concern, it is harder to account for these biases because it is difficult to distinguish between absence of the species and absence of observers. Potentially, if we can understand these patterns of recording, and the biases and gaps within them, then they can inform where to prioritise future sampling (Pocock et al. 2017; Tulloch et al. 2013).
Here we selected the oak lace bug (Corythucha arcuata Say, [1832] (Hemiptera: Tingidae)) as a model species to explore the influence of citizen science biases for modelling the distribution of an invasive alien species. The oak lace bug is native to North America, was first sampled in Italy (2000) and Turkey (2002), and has spread throughout Europe. In the last decade, strong infestations have been reported from Russia. The species is able to spread rather quickly as it is often spread passively; records of the oak lace bug have often been made near roads and railways, where all life stages of the insect can be transported by pedestrians, traffic and trains (Küçükbasmacı 2014; Liebhold et al. 2013; Simov et al. 2018), although this could also be due to more observers in these areas. The species can also be passively spread by the wind (Küçükbasmacı 2014). The oak lace bug mainly feeds on oak species (Quercus spp.) but is also capable of infesting other types of woody plants, such as elm (Ulmus spp.), lime (Tilia spp.) and rose (Rosa spp.), as spillover from oaks. The damage to the trees is caused by sap-feeding by adults and nymphs, which causes a decrease in photosynthesis (Nikolić et al. 2018), premature defoliation and discolouration of the leaves (Paulin et al. 2020) and susceptibility to other pests, diseases and pollution (Küçükbasmacı 2014). The oak lace bug is regarded as a problem for both forestry and recreation because it can potentially affect the wood production and the aesthetics of the forests (Bălăcenoiu et al. 2021).
The oak lace bug is a species that will potentially gain the attention of the general public because of its strong visual impact on oak trees (Bălăcenoiu et al. 2021). Therefore, citizen scientists can have great value in the early detection of this species, but guidance is required on where citizen scientists should make observations for the early detection of the oak lace bug.
The aims of this paper are to test the usefulness of opportunistic citizen science observations for creating a model that predicts where the species is likely to be first detected, and to explore the impact of biases in the citizen science data on the model outputs. We explored the influence of predictor variables on species distribution models that accounted for the sampling bias of occurrences due to the uneven distribution of citizen science sampling and on models that did not. For the model that accounted for this bias, we expected that the predictor variables would be related to the biology and invasion pathways for this species (i.e. Csóka et al. 2020; Küçükbasmacı 2014; Liebhold et al. 2013), because biases in citizen science are accounted for in the model. With the model that did not explicitly account for this bias, we expected that the variables would be partly confounded with the behaviour of citizen scientists (Boakes et al. 2016; Petersen et al. 2021), leading to inaccurate estimates of the presence of C. arcuata. We further expected that this model would show greater uncertainty than the model accounting for recorder bias. We aimed to use the model accounting for recorder bias to make recommendations about designing future early detection with citizen science data for this species. We used the invasion of the oak lace bug in Slovenia as a test case. The first occurrence of the oak lace bug was in 2016 (Jurc and Jurc 2016), which coincided with the start of the citizen science project LIFE ARTEMIS, and so the oak lace bug was recorded through that project (Crow et al. 2020). Therefore, the early stage of invasion was potentially well recorded with citizen science, which has not been the case in any other country in which the oak lace bug was introduced. Having data on both the oak lace bug and other alien species gave us the opportunity to test the influence of biases of recorder behaviour on analyses of invasive species spread.

Area description
Slovenia is a small Central European country (20,273 km²) located between Italy, Austria, Hungary and Croatia. Despite its size, the country is geographically diverse. It is made up of portions of four major European geographic landscapes: the Hungarian plains (Pannonian region) in the east, the karstic Dinaric mountains in the south, and the Alpine region in the north and northwest (Ciglič and Perko 2012); these are divided into six different climatic regions (Kozjek et al. 2017). Furthermore, Slovenia is an important crossroad for transport in Europe (with major routes from Austria to Croatia and Italy to Hungary), which could be important for the introduction and accidental transport of the oak lace bug. Oak species, the main host for oak lace bug, are distributed all over the country: Quercus cerris is distributed in the west and the east of the country, Q. petraea is all over the country, Q. pubescens mainly in the west of the country and Q. robur mainly in the central part and the east of the country (Jogan et al. 2001).

Databases
Data on the presence of the oak lace bug were collected over the period 2017-2020 as part of a citizen science activity run through the LIFE ARTEMIS project (LIFE15 GIE/SI/000770) (Crow et al. 2020). In this project, media attention was given to the first observations and possible consequences of the oak lace bug in Slovenia in order to encourage citizen scientists to collect observations of this species with the "Invazivke" information system (Ogris 2020). The "Invazivke" information system is a database for collecting records on non-native species (plants, fungi, insects, and mammals) associated with forests in Slovenia. In total 158 species are in "Invazivke" of which 94 species now have citizen science occurrence data. Records of all species were georeferenced and accompanied by photographic evidence, allowing them to be verified.
For the study we used presence records of the oak lace bug from "Invazivke" during 2017-2020, together with records of other species made by observers who had also recorded the oak lace bug, because we knew that they had the ability to identify this species (Arzenšek et al. 2021). We also included six additional presence records from a single professional in 2017, when the LIFE ARTEMIS project had only just begun. In order to reduce the impact of clusters of records, we used a 250 × 250 m grid across Slovenia and randomly selected one observation per grid cell (hereafter a 'location') per year to give a total of 204 unique location × year combinations. For the constrained pseudo-absence records we selected unique location × year combinations without records of oak lace bug, but with records of any other species recorded in "Invazivke" made by the set of people who had recorded oak lace bug (n = 1378). We restricted ourselves to this set of people because we were confident that they had the ability to detect oak lace bug if it had been present when they recorded other species. We describe these as pseudo-absences (Phillips et al. 2009) because they were from grid cells without observations of the oak lace bug, but without confirmation that the oak lace bug was absent. For the analysis, we randomly selected a number of constrained pseudo-absence points, i.e. data points of other alien species reported by the same people in the information system "Invazivke", each year that was equal to the number of oak lace bug records that year. We also selected an equal number of random pseudo-absences, i.e. selected from any location without a record of oak lace bug.
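The thinning and pseudo-absence selection described above can be sketched as follows. This is a minimal illustration only, assuming records are (x, y, year) tuples in a projected coordinate system measured in metres; the function names (`cell_of`, `thin_to_locations`, `sample_pseudo_absences`) and the toy coordinates are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

CELL_SIZE_M = 250  # grid resolution stated in the text


def cell_of(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a projected point."""
    return (int(x // cell_size), int(y // cell_size))


def thin_to_locations(records, rng):
    """Keep one randomly chosen record per (cell, year) combination."""
    groups = defaultdict(list)
    for x, y, year in records:
        groups[(cell_of(x, y), year)].append((x, y, year))
    return {key: rng.choice(group) for key, group in sorted(groups.items())}


def sample_pseudo_absences(presence_keys, candidate_keys, rng):
    """For each year, draw as many pseudo-absence (cell, year) keys as there are presences."""
    presences_per_year = defaultdict(int)
    for _cell, year in presence_keys:
        presences_per_year[year] += 1
    chosen = []
    for year, n in sorted(presences_per_year.items()):
        # Candidate cells must have no oak lace bug record in that year.
        pool = sorted(k for k in candidate_keys
                      if k[1] == year and k not in presence_keys)
        chosen.extend(rng.sample(pool, min(n, len(pool))))
    return chosen


# Toy data (hypothetical coordinates, not real observations).
rng = random.Random(1)
presence_records = [(100, 100, 2017), (120, 180, 2017), (600, 100, 2017), (100, 100, 2018)]
other_species_records = [(100, 100, 2017), (900, 900, 2017), (1300, 200, 2017),
                         (1700, 50, 2017), (900, 900, 2018)]

presences = thin_to_locations(presence_records, rng)
candidates = thin_to_locations(other_species_records, rng)
pseudo_absences = sample_pseudo_absences(set(presences), set(candidates), rng)
```

For random pseudo-absences, the same sampling function could be given every grid cell in the country as candidates rather than only cells with records of other species.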
A range of predictor variables was used in this study covering physical, human and ecological attributes (Table 1). These variables differed in their spatial resolution and so were unified to the model grid cell dimension of 250 × 250 m (n = 327,540) covering the whole area of Slovenia. We resampled the elevation from the source 12.5 m spatial resolution to the model cell resolution using the Resample tool of Esri ArcMap 10.6.1 with bilinear interpolation, which calculates the value of each pixel as a distance-weighted average of the values of the four nearest input pixels (GURS 2006). We calculated the slope from a digital elevation model with the Slope tool of Esri ArcGIS software using the planar method, which calculates slope on a projected flat plane using a 2D Cartesian coordinate system (GURS 2006). We obtained the volume of oak trees (m³) as a proxy for oak tree density from a forest stand database that is managed by the Slovenia Forest Service (ZGS 2020). Almost all oak species in Slovenia were considered, i.e. Quercus petraea, Q. robur, Q. pubescens and Q. cerris, because they are hosts of the oak lace bug. Then, we calculated the shortest distance from each grid cell centroid to forest edges, highways, local roads, and railways using the STDistance function of the geometry datatype in Microsoft SQL Server 2016 (GURS 2020b). We calculated the area of settlement as the area of buildings in the grid cell using the Land Register of Slovenia (GURS 2020a). Finally, we projected the citizen science data for the oak lace bug and other invasive alien species onto the grid cells, resulting in 204 unique grid cells with oak lace bug records and 1378 with records of other species. Duplicate cells were removed.

Statistical analysis
The variables were checked for outliers and multicollinearity using visual inspection (Zuur et al. 2010) and variance inflation factors calculated with the library "car" (Fox and Weisberg 2019) in the statistical programming language R (R Core Team 2021). The elevation and the surface of urbanized areas, distance to local road, distance to railway, distance to highway, area of the settlement and amount of oak were log10(x + 1) transformed in order to reduce the leverage of the extreme values. The variables were visualized with a density plot prepared in the library ggplot2 (Wickham 2016) and described in the results. We then standardized all variables to have a mean of zero and a standard deviation of one because this gave standardized effect sizes that allowed us to directly compare the importance of the variables (Schielzeth 2010). For the analysis, a generalized linear model (GLM) was used with a binomial error structure.
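As an illustration of the transformation and standardization steps, the sketch below applies a log10(x + 1) transform followed by centring and scaling to unit standard deviation. It is written in Python rather than the R used in the study, the variable values are invented, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math
import statistics


def log10_plus_one(values):
    """Apply log10(x + 1), which is defined at zero and compresses large values."""
    return [math.log10(v + 1.0) for v in values]


def standardize(values):
    """Centre to mean zero and scale to unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical distances to the nearest highway, in metres.
distance_to_highway_m = [0.0, 9.0, 99.0, 999.0, 9999.0]
transformed = log10_plus_one(distance_to_highway_m)
z_scores = standardize(transformed)
```

Because every predictor ends up on the same scale, the fitted coefficients can be compared directly as standardized effect sizes.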
In total, we ran three species distribution models, each of which tested for the effect of these variables against dependent variables of presence and pseudo-absence data. The constrained pseudo-absence model was trained on the presence of oak lace bug and constrained pseudo-absence points: this was to assess the true distribution of the oak lace bug, by accounting for the uneven distribution of citizen science recorders. The random pseudo-absence model was trained on the presence of oak lace bug and random pseudo-absence points: this assessed the distribution of the oak lace bug, but did not account for the uneven distribution of citizen science recorders. A sizeable difference between the random pseudo-absence model and the constrained pseudo-absence model would indicate that the uneven distribution of recorders likely led to biases that affected the results of analysis. A third model was run to describe the spatial biases in the pattern of citizen science reporting. This citizen science reporting model was trained on all citizen science data from the "LIFE ARTEMIS" project in the "Invazivke" database and an equal number of random pseudo-absence points.
We included the independent variables listed in Table 1 in each model. For the constrained pseudo-absence model and the citizen science reporting model, spatial autocorrelation in the residuals was detected with the Moran's I index. Therefore, Moran's eigenvectors were incorporated in these models with the function "ME" from the R package "spatialreg" (Bivand and Piras 2015) to avoid underestimating the magnitude of the parameters.
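To make the autocorrelation check concrete, the following sketch computes Moran's I for a set of residuals. It assumes binary weights (1 for pairs of points within a chosen distance, 0 otherwise), which is one common choice and not necessarily the weighting used in the study; the function name and toy data are hypothetical. It does not implement the Moran eigenvector filtering performed by "ME".

```python
import math


def morans_i(values, coords, max_distance):
    """Moran's I with binary distance-band weights (w_ij = 1 if d_ij <= max_distance)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_distance:
                weight_sum += 1.0
                cross_products += deviations[i] * deviations[j]
    sum_of_squares = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_products / sum_of_squares)


# Six cell centroids along a line, 250 m apart (toy example).
coords = [(250.0 * k, 0.0) for k in range(6)]
clustered_residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
alternating_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

i_clustered = morans_i(clustered_residuals, coords, max_distance=250.0)
i_alternating = morans_i(alternating_residuals, coords, max_distance=250.0)
```

Values well above zero, as for the clustered residuals, indicate that neighbouring cells have similar residuals, which is the situation in which a spatial correction is warranted.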
We built all models using all the predictor variables in Table 1 and evaluated them via cross-validation. Training and validation datasets were selected randomly with a ratio of 80:20. For the training dataset, this was done separately for both random and constrained pseudo-absence (each with 1,103 data points) and presence data (164 data points). For the validation dataset, there were 40 data points for the presence data and 275 pseudo-absence data points (both for the random and constrained pseudo-absence points). In order to test the sensitivity of the models to the input data, we ran the constrained pseudo-absence model and random pseudo-absence model six times, each with different subsets of pseudo-absences (random or constrained, depending on the model). For the citizen science reporting model there was a larger dataset available for the citizen science data compared to the other models, but to support comparability with the other models, the models were run on randomly chosen subsets of the data that were equal in size to the oak lace bug models, and this was replicated six times as with the models described above. To provide a single estimate of the size and significance of each variable from across the six replicates of each of the three models, we used Monte Carlo simulation: for each variable, we simulated 1,000 iterations from the mean and variance from each replicate model, and hence calculated the overall mean and variance across models. We fitted the averaged model on the validation dataset for validation of the models. The area under the curve (AUC) in a receiver operating characteristic (ROC) plot was calculated for the validation dataset with the library "PresenceAbsence" (Freeman and Moisen 2008).
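The pooling across replicates and the AUC calculation can be sketched as below. This is one plausible reading of the procedure, assuming each replicate's coefficient is simulated from a normal distribution with that replicate's estimate and standard error and that all draws are then pooled; the replicate values are invented, and the AUC is computed by direct pairwise comparison rather than with the "PresenceAbsence" library.

```python
import random
import statistics


def pool_replicates(estimates, std_errors, n_draws=1000, seed=0):
    """Pool replicate estimates by simulating n_draws normal draws per replicate."""
    rng = random.Random(seed)
    draws = []
    for estimate, std_error in zip(estimates, std_errors):
        draws.extend(rng.gauss(estimate, std_error) for _ in range(n_draws))
    return statistics.fmean(draws), statistics.variance(draws)


def auc(scores_presence, scores_absence):
    """Probability that a random presence scores above a random pseudo-absence.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


# Hypothetical coefficient for one variable from six replicate models.
replicate_estimates = [0.82, 0.95, 0.88, 1.01, 0.79, 0.93]
replicate_std_errors = [0.20, 0.22, 0.19, 0.21, 0.20, 0.23]
pooled_mean, pooled_variance = pool_replicates(replicate_estimates, replicate_std_errors)

# Hypothetical predicted probabilities for validation points.
validation_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

The pooled variance combines the uncertainty within each replicate with the spread between replicates, so it is at least as large as a typical single-replicate variance.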
We expected that there may be a systematic difference between the constrained and random pseudo-absence models in the model estimates and their precision, and that this might be affected by the estimated probability of presence for a location. We tested this by comparing the estimate and its variance from the constrained and random pseudo-absence models for a range of locations. The estimates of probability were not evenly distributed across the range 0-1 so, in order to balance the distribution of points across the estimated values of presence, we randomly selected ten cells from each 0.1 interval of probability of the constrained pseudo-absence model (in total 100 cells: ten randomly selected from the range 0 to 0.1, ten from 0.1 to 0.2, and so on). For each selected cell we obtained the probabilities of presence from the constrained and random pseudo-absence models. We also calculated the precision of these estimates by using Monte Carlo simulation with the model parameters and their variances to generate 10,000 predictions and calculated the inter-quartile range (IQR) of the predictions from each model for each selected cell. The IQR was arcsine square-root transformed to reduce the bias due to the boundary effect of the proportion data (Gotelli and Ellison 2012). The prediction and transformed IQR were plotted against the prediction from the constrained pseudo-absence model and a "loess" smoother was applied using the library "ggplot2" (Wickham 2016) to identify differences in variability between the two models.
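A minimal sketch of the precision calculation for a single cell is given below. It assumes a logistic (binomial GLM) link and, for simplicity, treats the coefficients as independent normal variables, ignoring any covariance between them; the coefficient values, standard errors and covariate values are invented for illustration.

```python
import math
import random
import statistics


def inv_logit(eta):
    """Convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_predictions(coefs, std_errors, covariates, n_sims=10000, seed=0):
    """Simulate predicted probabilities for one cell from uncertain coefficients."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_sims):
        eta = sum(rng.gauss(b, se) * x
                  for b, se, x in zip(coefs, std_errors, covariates))
        predictions.append(inv_logit(eta))
    return predictions


def transformed_iqr(predictions):
    """Inter-quartile range of the predictions, arcsine square-root transformed."""
    q1, _median, q3 = statistics.quantiles(predictions, n=4)
    return math.asin(math.sqrt(q3 - q1))


# Hypothetical standardized coefficients: intercept, oak volume, distance to highway.
coefs = [-1.0, 0.8, -0.5]
covariates = [1.0, 1.2, -0.3]  # the leading 1.0 multiplies the intercept

precise = simulate_predictions(coefs, [0.1, 0.1, 0.1], covariates)
imprecise = simulate_predictions(coefs, [0.4, 0.4, 0.4], covariates)
iqr_precise = transformed_iqr(precise)
iqr_imprecise = transformed_iqr(imprecise)
```

Larger coefficient variances translate into a wider IQR of predicted probability for the same cell, which is the quantity compared between the two pseudo-absence models.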

Results
Initial observation of the data indicated that the distribution of environmental variables differed between the oak lace bug records and the constrained and random pseudo-absence data (Fig. 2). In particular, the citizen science records (oak lace bug and other species) appeared to be distributed closer to highways, roads, railways and the areas with lower slope than the randomly selected locations. The oak lace bug tended to be recorded further east than other citizen science records.
The constrained pseudo-absence model was expected to be the most accurate description of the variables explaining the oak lace bug distribution because it accounted for spatial biases in the citizen science data. It shows that the oak lace bug was, from the strongest to weakest standardized slopes, significantly more likely to be found where there are more oaks, in the east of the country, closer to highways, at lower elevation and close to railways (Table 2, Fig. 1b, Fig. 2). The year was also an explanatory variable, showing that in 2020 there was a higher probability of recording compared to the reference year 2017, which is probably indicative of the increasing abundance of the species over time, but could also be explained by increased awareness of the species amongst reporters.
The random pseudo-absence model had some broad similarities to the constrained pseudo-absence model in terms of magnitude and significance of many of the variables (Table 2). The six replicates of this model did not show any spatial autocorrelation. However, there were important differences between this and the constrained pseudo-absence model. These were most visible in the map of the predictions (Fig. 2), with higher predicted probability of presence in more accessible locations (near roads and habitation) in the west of Slovenia. These differences indicate the importance of taking account of spatial biases in reporting when analysing these data, although the similarities between the two models indicate that some broad patterns can be identified correctly despite spatial biases in the presence data.
The citizen science reporting model explained the distribution of citizen science records. Although the records will be constrained by the presence of tree pests and diseases and alien plants, we expected that, overall, they provide a reasonable assessment of the distribution of recorders. This model suggests that people are most likely to record in accessible places (from highest to lowest standardized slope): at lower elevations, more in the western part of Slovenia, near transportation, especially local roads, near the forest edge, in places with more human habitation, in places with less sloping ground, close to railways and highways, and in places with larger amounts of oak trees (Table 2, Fig. 1d, Fig. 2), although in this model only the longitude and distance from local road were statistically significant. This model shows that the difference between the two oak lace bug distribution models (Fig. 1b and c) was explained by the spatial bias in citizen science recording.
The constrained pseudo-absence model's validation AUC (0.85) was similar to that of the citizen science reporting model (0.83) but considerably lower than that of the random pseudo-absence model (0.95). This may be because the latter was overfitted, suggesting that sampling bias can inflate AUC scores. Finally, we expected that the model estimates and their precision would vary between the constrained pseudo-absence model (i.e. the model that took account of the spatial bias in citizen science reporting) and the random pseudo-absence model. We found that in general, the random pseudo-absence model overestimated the probability compared to the constrained pseudo-absence model (Fig. 3a). Also, uncertainty was lowest for very low and high estimates of probability, but for any level of predicted probability, the random pseudo-absence model had higher uncertainty than the constrained pseudo-absence model (Fig. 3b). This suggests that not taking account of the uneven spatial distribution of citizen science recording leads to predictions that, in this case, are biased to be higher than expected and to have greater uncertainty than expected. Therefore, using information on the distribution of recording effort in modelling is important for accurate predictions of the probability of presence.

The distribution of the oak lace bug in Slovenia
Here we showed that a national, multi-year citizen science dataset was effective for assessing the increasing distribution of an invasive tree pest, the oak lace bug, in Slovenia. In common with many citizen science projects, this one was unstructured (i.e. 'opportunistic'), meaning that people could contribute when and where they chose. Here we modelled habitat suitability for the oak lace bug with citizen science data on occurrences, taking the uneven distribution of recorders into account, to predict where oak lace bug occurred. The observed locations of the oak lace bug generally matched the predictions from our species distribution model (constrained pseudo-absence model in Table 2; Fig. 1a and b). However, it appeared to be under-recorded in the far east of Slovenia, relatively over-reported around the large city of Ljubljana, and there was a region in the south west of Slovenia where the oak lace bug was predicted to be present but with no records yet, indicating likely suitability of the region for invasion by this species. The model showed the relationship of oak lace bug with predictor variables, specifically that the species was more likely to be present near highways and railways, in places with high oak density and at lower altitudes (constrained pseudo-absence model in Table 2). Although not much is known about its natural dispersal, adults of the closely related plane lace bug (C. ciliata) do not fly long distances (Wu and Lui 2016), so the association with major travel routes indicates that it spreads to other locations via traffic and railroads (Küçükbasmacı 2014; Liebhold et al. 2013; Simov et al. 2018), probably via accidental transport of oak logs. Indeed, the first observation in Slovenia was along a railroad (Jurc and Jurc 2017), and one early observation was at a rest stop along the highway near Maribor (pers. comm. M. de Groot).
The oak lace bug first arrived in the east of Slovenia (Jurc and Jurc 2017), from the Balkans and Hungary, where large outbreaks had already been found (Jurc and Jurc 2017), and spread westwards, so longitude was an important predictor of current distribution. However, several citizen science observations have already been made in the west of the country. Populations do not seem to be well established there yet, but it seems likely that populations will establish there in the future, with our model indicating regions most susceptible to early establishment (Fig. 2b).
We also found that the amount of oak positively affects the presence of the oak lace bug. This is not surprising because the species mainly, although not exclusively, uses oaks as its host. Outbreaks of oak lace bug could be mitigated by increased tree species diversity, as has also been shown for other herbivores (Jactel et al. 2017), although it remains a question whether changing woodland structure would decrease the likelihood of outbreaks. In Slovenia, oak is more prevalent at lower altitudes (San-Miguel-Ayanz et al. 2016) but even taking oak abundance into account, altitude was also an important predictor; it remains uncertain whether oak lace bug is restricted by temperature, as is the case for many species (Ratte 1984), or whether it will spread to these less accessible places over time.
Fig. 3 Simulated probability of the presence of oak lace bug and its uncertainty from 1000 randomly selected cells. Figure 3a shows the correlation between the simulated probability of constrained pseudo-absence model and the random pseudo-absence model, and the probability of constrained pseudo-absence model. The grey dots and ribbon are the simulated probability mean and the variability between the first and third quartile of the constrained pseudo-absence model. The black dots and lines are the simulated probability mean and the variability between the first and third quartile of the random pseudo-absence model. Figure 3b shows the simulated variability between the first and third quartile association with the probability of the constrained pseudo-absence model (black line) and the random pseudo-absence model (grey line).
We note that this model shows only where the species will be found in this stage of its invasion; a thorough assessment of habitat suitability for the species can be obtained when it invades the whole area and is closer to dynamic equilibrium.

Best use of citizen science in species distribution modelling
In this project, we had the benefit of using presence records from other species in the same citizen science project as pseudo-absences in our dataset. This was important because the citizen science recording effort was spatially biased (Model 3 in Table 2). The pattern of citizen science recording was strongly influenced by accessibility of sites to people, as shown by the importance (absolute size of standardised regression coefficient) of variables such as proximity to settlements, local roads and railways, proximity to forest edges, and elevation (Table 2). Several other studies have found that citizen science recording is strongly influenced by factors related to accessibility at both broad scales, e.g. proximity to towns, and local scales, e.g. proximity to roads (Boakes et al. 2010; Isaac and Pocock 2015; Mair and Ruete 2016; Petersen et al. 2021).
By comparing the results of the model with constrained pseudo-absences to that with random pseudo-absences (i.e. the models that did and did not take account of recorder bias, respectively), we showed that not taking recorder effort into account led to biased model outputs. The random pseudo-absence model (Fig. 2c) had higher predicted presence near settlements in the centre and west of Slovenia, compared to the constrained pseudo-absence model (Fig. 2b). Consideration of differences in the model variables (Table 2) also showed that the model with random pseudo-absences found that being near a forest edge had higher probability of presence. All these differences mirror the distribution of citizen science recording effort (Fig. 2d). In general, the model with random pseudo-absences tended to have higher predicted presence for any specific location (Fig. 3a) and, for any value of predicted presence, tended to be less precise (greater uncertainty; Fig. 3b), which was because it did not account for the spatial bias in the distribution of citizen science recorders.
All these results show the importance of taking the distribution of citizen science recording effort into account when analysing citizen science data of invasive species. However, gaining a true understanding of the distribution of potential recording effort is very difficult. In the past, researchers have ignored the problem or used proxies such as other people recording similar species (Pocock et al. 2017). Here, we took pseudo-absence points from the citizen science records of other tree pests and diseases, thereby using them as a proxy for recorder effort. However, this was not a perfect measure of 'absence': the other records covered many different species, many of which were not insects, and some of which occurred in different seasons from the oak lace bug. Also, there may be some places (e.g. pristine forests) where people looked but did not record any of the species, so recording effort would be under-estimated there. One solution to this problem would be to ask people to record absences. This would work best with a more structured sampling approach, but the motivation to record an absence is very different from the motivation to record a presence, and structured sampling is inefficient, and arguably unethical (Pocock et al. 2020), for early detection where the probability of detection is low. In addition, the paucity of data from less accessible locations (Fig. 2d) could reduce the statistical power to identify some biologically informative variables in the model with constrained pseudo-absences.
Many citizen science projects only record presence data, and there is no information on recording effort, so it is not possible to run a model with constrained pseudo-absences. Although there are important differences between our two models of oak lace bug distribution (Figs. 2 and 3, Table 2), the general patterns of distribution are similar. This indicates that presence-only data can be used for species distribution modelling, although great care should be taken when interpreting the results of analysis. Ultimately, including the distribution of potential recording in unstructured citizen science is a vital advance that will support robust analysis of citizen science data for the early detection of invasive species.
Implications of our findings for early detection of invasive species
As well as considering what the results tell us about the distribution of oak lace bug, they also help us consider how to improve citizen science for the early detection of this species. The finding that oak lace bug was most likely to be found along highways, even once biases in recorder effort were taken into account, highlights that these types of sites should be targeted for early detection of this species. Indeed, because citizen science records were also more likely near highways (Citizen science reporting model in Table 2), there is a valuable convergence between the places where people choose to record and the places where they need to look for early detection of this species.
Planning for early detection should include the use of citizen science (Pocock et al. 2017), given the growing importance of citizen science in this field (Roy et al. 2018). Therefore, when designing citizen science for oak lace bug in areas where it is not yet present, such as western Slovenia or neighbouring countries, it would be valuable to recommend that recorders focus on stands of oaks near highways, in an effort to target early detection of the species (Morisette et al. 2020). Conversely, where the species is becoming established, it could be more valuable to gather records from places where we have least information, e.g. further away from settlements and at higher elevations.
Overall, citizen science is a valuable approach to invasive species early detection and monitoring. The approach is not free of biases and challenges in the use of the data but, as we have shown, with care these can be overcome to make the most of the data and so honour the valuable contribution from members of the public. We emphasise that understanding the distribution of potential recording effort in citizen science is vital for benefiting effectively from this opportunity. It helps us use the records properly to construct unbiased distribution models of the invasive species and so understand the predictors of its distribution. This will help us to better target recording effort, whether through citizen science, e.g. via local publicity campaigns and targeted information on where to look, or through professional monitoring to fill gaps in the citizen science recording effort.