The Anthropocene epoch is characterized by a severe crisis in biodiversity, leading to a rapid and unprecedented global loss of species and genetic diversity (Ceballos et al. 2015; Turvey and Crees 2019). Insects are especially affected, with up to 40% of species at risk of extinction (Riley 1986; Habel et al. 2019; Sánchez-Bayo and Wyckhuys 2019). In some protected areas of Central Europe, insects have lost 75% of their biomass (Hallmann et al. 2017) and the species included in the grassland butterfly index declined by 39% in number of individuals in recent decades (Warren et al. 2021). This widespread decline could have significant impacts on ecosystems, as insects play a crucial role in ecological networks (Rosenberg et al. 1986; Yang and Gratton 2014; Noriega et al. 2018; Montgomery et al. 2020).

Despite the urgency for monitoring local extinctions and declines, species trends across space and time are largely unavailable, especially for less visible and unpopular taxa like most insects (Rocha-Ortega et al. 2021; Montgomery et al. 2020; Lobo 2016). Insects are also characterized by a taxonomic complexity that is often solvable only by specialists (Roskov et al. 2019; Marshall 2008; Hochkirch et al. 2022), so monitoring schemes are not available for most insect taxa or, if present, are limited to specific regions and run by professional taxonomists.

Due to the lack of long-term monitoring data, detecting extinctions at local and regional scales in many southern European biodiversity hotspots must be based on time series of occurrence data. Published records follow a typical pattern: there are a few large datasets published in dedicated faunistic papers, rarely replicated over time, and many single records published over the years. This pattern reduces the completeness of time series. In recent decades, citizen science, the involvement of the public in scientific projects (Heigl et al. 2019; Fontaine et al. 2021; Chowdhury et al. 2023a), has repeatedly demonstrated its contribution to documenting species occurrences and spread of invasive species (Crall et al. 2010; Gallo and Waitt 2011; Maistrello et al. 2016; Mannino and Balistreri 2018) and in monitoring protected species and entire communities, as in the case of the European Butterfly Monitoring Scheme (eBMS) (Zapponi et al. 2017; Campanaro et al. 2017; Oberhauser and Prysby 2008; Warren et al. 2021). Citizen science activities provide both structured and unstructured data. Structured data come from organized initiatives that aim to involve the public in pre-defined objectives, such as monitoring a specific taxon (e.g. Callaghan et al. 2020; Krabbenhoft and Kashian 2020; Carpaneto et al. 2017). Unstructured data are provided randomly, without a predefined and taxon-specific (or area-specific) goal, and usually come from citizens uploading their observations on citizen science platforms (e.g. iNaturalist and eBird (Sullivan et al. 2009)). Records from unstructured citizen science are numerous and more distributed temporally and spatially than those from structured citizen science. However, they are more influenced by operator biases since the type and amount of data depend on the user’s personal preferences (Isaac and Pocock 2015; Callaghan et al. 2021; Van Eupen et al. 2021). 

In this study, we assess the contribution of unstructured citizen science records to ruling out local extinctions in the Italian National Parks. Specifically, we address three main research questions:

  1.

    To what extent can unstructured citizen science data supplement existing knowledge on butterfly diversity obtained through literature to determine the persistence or local extinction of butterfly populations? Unlike literature data, citizen science records are more recent and have risen significantly in number in recent years (Fischer et al. 2021), offering the possibility of resolving concerns regarding local extinctions.

  2.

    To what extent do data collected through citizen science on Italian butterflies depend on species appearance? The number of records uploaded on citizen science platforms may be influenced by species features. Larger species with a wider geographical distribution and longer flight periods might receive more records (Callaghan et al. 2021; Barbato et al. 2021; Stoudt et al. 2022; Van Eupen et al. 2022).

  3.

    Does species appearance affect participants differently depending on their level of engagement in citizen science? A more dedicated involvement in citizen science activities may lead to the observation of less common and less noticeable species (Callaghan et al. 2021). We examined whether users demonstrating greater effort are influenced differently by species appearance and whether they document a larger portion of butterfly diversity compared to those with lower effort, with a similar number of observations.

To address these questions, we analyzed 39,929 butterfly records from literature and 58,993 verified records of Italian butterflies on iNaturalist, one of the best-known and most used citizen science platforms (Aristeidou et al. 2021; Echeverria et al. 2021; Cambria et al. 2021; Nugent 2018; Sanderson et al. 2021). We selected butterflies as a model taxon due to their ecological relevance (Cruden and Hermann-Parker 1979; Courtney et al. 1982; Jennersten 1984), their concerning conservation status (Franzén and Johannesson 2007; Dirzo et al. 2014; McDermott Long et al. 2017; Schultz et al. 2019) and their ease of monitoring by citizen scientists (van Swaay et al. 2008; Wei et al. 2016; Prudic et al. 2017).

Within Italy, National Parks are particularly suitable for our study. They are protected areas that host a high diversity of fauna and flora (Capotorti et al. 2012). Additionally, these are the most extensively studied areas, ensuring a large quantity and quality of literature data on species occurrence that can serve as a basis for evaluating how unstructured citizen science data can complement traditional scientific research. Finally, National Parks are also where citizen science activities occur most frequently, both as independent and organized activities such as bioblitzes (Lundmark 2003).

We also provide two novel and significant resources: (a) updated butterfly checklists for all Italian National Parks, with time series of occurrence for each species starting from the year 1806 (see the presence of Danaus chrysippus in Vesuvio National Park), and (b) a new R package that includes a suite of functions designed to calculate the potential extinction upon time series (PETS) index introduced by Labadessa et al. (2021). We use this index to evaluate the importance of unstructured citizen science records in reducing the perception of local butterfly extinctions.

Materials and methods

Data collection

We collected butterfly occurrences from two sources (see Table 1 for definitions of the main terms used in the study):

  1.

    Data collected in the Italian CkMap (Balletto et al. 2007). This resource contains literature data and butterfly specimen records from the main national collections. The database was published in 2007 and is continuously updated by EB. As of December 2021, it contains 335,499 records of Italian butterflies. The spatial resolution of the checklist is 10 × 10 km squares. We only included in our analysis the occurrences whose square center is no more than 5 km from a National Park perimeter. Of the 39,929 CkMap records retained for the Italian National Parks, 20,191 lack a precise observation date, so they are unusable for our analyses and were marked as NA data. CkMap therefore provides a total of 19,738 usable records.

  2.

    iNaturalist observations. We identified 58,993 records of Italian butterflies uploaded until December 2021 with a location error lower than 1000 m, a threshold chosen to select high-quality data. Of these, 7427 records fell within National Park perimeters.

Table 1 Definition of the main terms used in the study

We did not collect data from the Global Biodiversity Information Facility (GBIF) for the following reasons: (i) it is impossible to verify the species identification of citizen science data as pictures are not uploaded, (ii) it contains a mixture of data collected by professional recorders and citizen scientists, making it difficult to separate the two, (iii) in many cases, the observer is not recorded or cannot be precisely identified, which would impede our ability to reliably assess user effort.

All the collected data were organized in the Darwin Core format, a widely used standard format in biodiversity research applications (Wieczorek et al. 2009, 2012; Groom et al. 2019). The main fields for each record are “occurrenceID”, which contains the specimen reference; “Scientific name”, in genus species format; “Locality”, where the occurrence is located; “decimalLatitude” and “decimalLongitude”, which are the locality coordinates in decimal degrees; “basisOfRecord”, which specifies the source of the record (“literature” for CkMap records and “iNaturalist” for records obtained from that platform); “recordedby”, which gives the source of bibliographic information, the reference collection, or the iNaturalist user who uploaded the data; and finally “catalogNumber”, which contains the URL of iNaturalist observations. A Darwin Core file for each National Park is available in the repository of the PETS package.

Aim 1—evaluating unstructured citizen science contribution in butterfly diversity monitoring

We assessed the Potential Extinction upon Time Series (PETS) to evaluate the role of citizen science in dispelling doubts about local extinctions. The PETS formula introduced by Labadessa et al. (2021) (Eq. 1) assesses the perception of local extinction in the past years due to effective species losses or to the absence of recent occurrence data.

$$PETS = \frac{\sum_{i=1}^{n}\left(last\,year - {last\,occ}_{i}\right)}{\sum_{i=1}^{n}\left(last\,year - {first\,occ}_{i}\right) + 1} \tag{1}$$

In the PETS formula (Eq. 1), first occ_i and last occ_i are the years of the first and the last observation of species i, respectively; last year is the year of the assessment (the end of the study, 2021 in this case) and is the same for all species; n is the number of species recorded in the local butterfly community. The potential extinction of each species is based on the difference between the last year and the year of its last record (represented by the red bar in Fig. 2a), relative to the time since its first observation (represented by the cyan bar in Fig. 2a). If all the species observed in the past have also been observed in the last year, the PETS index is equal to zero. The output of the PETS analysis includes the PETS index, the species list ordered by last observation date, and a graphic representation (Fig. 2b, c). In the graphs, each row on the Y axis represents a species, and observation years on the X axis are marked as colored squares, with the color indicating the source type. The species with more recent observations are displayed at the top, while those with older records are shown at the bottom.
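
As a minimal illustration of Eq. 1, the following Python sketch recomputes the index for a small community. It is not the code of the PETS R package: the function name `pets`, the dictionary input, and the species label are hypothetical, and the sketch assumes that the "+ 1" in the denominator is added once to the summed term, as the equation is printed.

```python
def pets(occurrences, last_year):
    """Potential Extinction upon Time Series for one community (sketch).

    occurrences: dict mapping a species name to its observation years.
    last_year:   year of the assessment, identical for all species.
    Assumption: the "+ 1" of Eq. 1 is added once to the whole denominator.
    """
    unconfirmed = 0  # summed years since each species was last recorded
    known = 0        # summed years since each species was first recorded
    for years in occurrences.values():
        years = list(years)
        if not years:
            continue  # species without dated records cannot enter the index
        unconfirmed += last_year - max(years)
        known += last_year - min(years)
    return unconfirmed / (known + 1)


# Single-species case of Fig. 2a: records in 1981, 1985 and 2011, assessed in 2021.
example = {"species_a": [1981, 1985, 2011]}
print(pets(example, 2021))  # ratio 10 / 41
```

In this example the numerator is 2021 − 2011 = 10 years without confirmation and the denominator is (2021 − 1981) + 1 = 41, under the stated reading of the formula.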

For each National Park we calculated two PETS values: (i) the PETS0 index, obtained from literature data only (CkMap), and (ii) the PETS1 index, obtained from literature and citizen science data combined (CkMap and iNaturalist). We used the “pets” function of the newly created PETS R package, which is freely available. The difference between PETS0 and PETS1 (ΔPETS) represents the contribution of iNaturalist records in dispelling the perception of local extinction in each National Park butterfly community.
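
The comparison between PETS0 and PETS1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the “pets” function of the R package: the tuple layout, the function names, and the toy records are hypothetical, and the "+ 1" of Eq. 1 is again assumed to be added once to the summed denominator.

```python
from collections import defaultdict


def pets(occurrences, last_year):
    """Eq. 1 for a dict of species -> observation years (sketch)."""
    num = sum(last_year - max(years) for years in occurrences.values())
    den = sum(last_year - min(years) for years in occurrences.values()) + 1
    return num / den


def delta_pets(records, last_year):
    """Return (PETS0, PETS1, delta) for one park.

    records: iterable of (species, year, basis_of_record) tuples, where
    basis_of_record is "literature" or "iNaturalist" as in the Darwin Core files.
    """
    literature = defaultdict(list)  # CkMap records only
    combined = defaultdict(list)    # CkMap plus iNaturalist records
    for species, year, basis in records:
        combined[species].append(year)
        if basis == "literature":
            literature[species].append(year)
    pets0 = pets(literature, last_year)
    pets1 = pets(combined, last_year)
    return pets0, pets1, pets0 - pets1


# Hypothetical toy park: one species is reconfirmed by a recent iNaturalist record.
toy_records = [
    ("sp_a", 1950, "literature"),
    ("sp_a", 2020, "iNaturalist"),
    ("sp_b", 1980, "literature"),
    ("sp_b", 2000, "literature"),
]
pets0, pets1, delta = delta_pets(toy_records, 2021)
```

Here the single iNaturalist record moves the last occurrence of one species from 1950 to 2020, which lowers the index and yields a positive ΔPETS.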

We checked if the Parks characterized by a high fraction of records missing the year of collection (NA data) also showed a higher PETS0, PETS1 and ΔPETS by using Spearman's correlation.

Aims 2 and 3—the interplay between user effort, species traits and documented biodiversity

We assessed if users who put in different levels of effort produce records with a different value in establishing local extinction and in assessing local diversity. We scored the effort as the number of records for each user in the studied National Parks and transformed it by square root (effort in Eq. 2). To assess the value of each record we calculated its contribution in establishing the PETS values (PETSc) as follows: for each National Park, we iteratively removed one iNaturalist record from the dataset and recalculated the PETS1 index without that single datum (PETS1ir). Then we scored PETSc as the absolute value of the difference between PETS1ir and PETS1. A Generalized Linear Mixed Model (GLMM) was used to verify if single records produced by users who put in more effort have a higher PETSc. National Parks and users were included as random factors. The contribution to the observed PETS1 was analyzed by using the following model.

$$PETSc \sim Effort + \left(1|Park\right) + \left(1|User\right) \tag{2}$$

We selected a Tweedie distribution for the response variable since it provides a flexible family to deal with non-negative, highly right-skewed data as well as symmetric and heavy-tailed data. We used the “glmmTMB” function of the glmmTMB R package (Brooks et al. 2017).
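
The leave-one-out contribution of each record (PETSc), which is the response variable of the model above, can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names, the (species, year) tuple layout, and the toy records are hypothetical, and the "+ 1" of Eq. 1 is assumed to be added once to the summed denominator.

```python
def pets(records, last_year):
    """Eq. 1 for a list of (species, year) tuples (sketch)."""
    first, last = {}, {}
    for species, year in records:
        first[species] = min(year, first.get(species, year))
        last[species] = max(year, last.get(species, year))
    num = sum(last_year - last[s] for s in last)
    den = sum(last_year - first[s] for s in first) + 1
    return num / den


def pets_contributions(literature, inaturalist, last_year):
    """PETSc of every iNaturalist record in one park.

    Each record is removed in turn, PETS1 is recomputed without it, and the
    absolute difference from the full PETS1 is stored.
    """
    full = pets(literature + inaturalist, last_year)
    contributions = []
    for i in range(len(inaturalist)):
        reduced = inaturalist[:i] + inaturalist[i + 1:]
        without_record = pets(literature + reduced, last_year)
        contributions.append(abs(without_record - full))
    return contributions


# Hypothetical toy park: the first record reconfirms a long-unrecorded species,
# the second repeats a year already covered by the literature.
literature = [("sp_a", 1950), ("sp_b", 1980), ("sp_b", 2000)]
inaturalist = [("sp_a", 2020), ("sp_b", 2000)]
petsc = pets_contributions(literature, inaturalist, 2021)
```

In this toy case only the first record changes the index when removed, so a record that merely duplicates known information receives a PETSc of zero.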

We then analyzed the relationship between butterfly species traits, user effort, and the frequency of species records in the entire Italian butterfly dataset. To do this, we used all 58,993 iNaturalist records for Italian butterflies identified by the authors. We first obtained the number of records for each species uploaded by each user (records_sp_us, response variable in Eq. 3). Then, for each species we obtained two functional traits from Middleton-Welling et al. (2020): (i) the wing index (WI), a measure of wing size and a proxy for species visual appearance; (ii) a set of phenology traits (maximum and minimum number of flight months in Europe, first and last month of flight, number of generations) that were subjected to Principal Component Analysis (PCA) to obtain a single component (Ph, Fig. S1) (Dapporto et al. 2019), which represents species appearance due to the duration of the adult flight period. From CkMap we also obtained (iii) the number of 10 × 10 km UTM cells where the species has been recorded in Italy (Dis, a measure of species appearance based on distribution). User effort was calculated as the number of butterfly records uploaded by each user.

A Generalized Linear Mixed Model (GLMM) was used to assess the effect of these three species features in determining the number of records for each species uploaded by each user. Interactions between species’ traits and user effort were also included in the model. Species and users were included as random factors.

$$Records\_sp\_us \sim \left(\text{WI} + \text{Ph} + \text{Dis}\right) * Effort + \left(1|User\right) + \left(1|Species\right) \tag{3}$$

Count data were analyzed using a Poisson family. We used the “glmmTMB” function of the glmmTMB R package. A Type-III analysis-of-variance table was calculated using the “Anova” function of the car R package. The interactions were visualized using the “plot_model” function of the sjPlot R package (Lüdecke 2023) with default settings.
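
Written out in full, and assuming the default log link of the Poisson family (the link is not stated in the text), the model formula for the number of records per species and user corresponds to the linear predictor below. The coefficient symbols β and the random-intercept symbols a and b are notation introduced here for illustration only; s indexes species and u indexes users.

$$\begin{aligned}
\log \mathbb{E}\left[Records\_sp\_us_{s,u}\right] ={}& \beta_0 + \beta_1 \text{WI}_s + \beta_2 \text{Ph}_s + \beta_3 \text{Dis}_s + \beta_4 Effort_u \\
&+ \beta_5 \text{WI}_s\,Effort_u + \beta_6 \text{Ph}_s\,Effort_u + \beta_7 \text{Dis}_s\,Effort_u + a_u + b_s, \\
a_u \sim \mathcal{N}\left(0, \sigma^2_{User}\right), &\quad b_s \sim \mathcal{N}\left(0, \sigma^2_{Species}\right)
\end{aligned}$$

Under this reading, the three interaction coefficients (β5 to β7) describe how the slope of each species feature changes with user effort.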

Finally, we evaluated if comparable amounts of records from users with varying levels of effort result in different levels of species diversity. To do this, we arranged the users based on the increasing number of records they had. Then, we separated the data into ten quantiles by aggregating the observations from users who exhibit increasing levels of effort, until each quantile boundary was reached. This method ensured that each quantile contained the same number of records, but the first quantiles were comprised of data submitted by users who showed lower levels of effort compared to the latter ones. The species diversity for each quantile was calculated using Hill numbers: species richness (q = 0), Shannon index (q = 1) and Simpson index (q = 2). These calculations were performed using the “hill_taxa” function of the hillR R Package (Li 2018). We used Spearman's tests to identify possible correlations between diversity values and the different level of user effort across the ten quantiles.
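
The binning and diversity calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the hillR implementation: the function names and toy records are hypothetical, records are split strictly by count (so the records of one user may straddle two bins, which may differ from the procedure actually used), and the values returned are Hill numbers, that is, richness for q = 0, the exponential of Shannon entropy for q = 1, and the inverse Simpson concentration for q = 2.

```python
import math
from collections import Counter


def hill_number(counts, q):
    """Hill number of order q for a list of per-species record counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit for q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


def effort_bins(records, n_bins=10):
    """Split (user, species) records into n_bins equally sized chunks.

    Records are ordered by the effort of their user (number of records
    uploaded), so the first chunks come from low-effort users.
    Assumes there are at least n_bins records.
    """
    effort = Counter(user for user, _ in records)
    ordered = sorted(records, key=lambda r: (effort[r[0]], r[0]))
    n = len(ordered)
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def diversity_per_bin(records, n_bins=10):
    """Hill numbers (q = 0, 1, 2) of the species recorded in each chunk."""
    results = []
    for chunk in effort_bins(records, n_bins):
        counts = list(Counter(species for _, species in chunk).values())
        results.append(tuple(hill_number(counts, q) for q in (0, 1, 2)))
    return results


# Hypothetical toy data: three users with 1, 2 and 3 records, split into 2 bins.
toy = [("u1", "a"), ("u2", "a"), ("u2", "b"),
       ("u3", "a"), ("u3", "b"), ("u3", "c")]
per_bin = diversity_per_bin(toy, n_bins=2)
```

Each bin holds the same number of records, so differences in the Hill numbers between bins reflect who contributed the records rather than how many records there are.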

Results

Aim 1—evaluating unstructured citizen science contribution in discarding local extinction

We obtained 47,356 records from the Italian National Parks, consisting of 39,929 occurrences from CkMap and 7427 from iNaturalist. While all observations collected by iNaturalist were provided with the year of collection, 50.6% of the records in CkMap, a total of 20,191, were not provided with this information and therefore not useful for analyzing time series.

The results of the PETS indexes for each National Park are displayed in Table S1 and arranged based on the contribution of iNaturalist in reducing the perception of local extinction, as indicated by the difference between PETS0 and PETS1 (ΔPETS).

The results show notable variations among the National Parks in terms of potential extinction of butterfly communities and the contribution of unstructured citizen science in reducing it (Fig. 1). Based on CkMap data only, nine Parks showed that over half of the time series were represented by unconfirmed presences (PETS0 > 0.5, Fig. 1). When iNaturalist data were added to the occurrence datasets, only two Parks maintained a PETS1 higher than 0.5. The PETS index dropped from a mean of 0.449 ± 0.227 (PETS0) to 0.279 ± 0.154 (PETS1; values are mean ± standard deviation), which indicates a marked reduction in the lack of knowledge about recent species occurrence (ΔPETS = 0.170 ± 0.120) (Fig. 1).

Fig. 1 The effect of iNaturalist data in reducing the potential extinction for each Italian National Park. In the map, the National Parks are marked in green, with the results obtained for PETS0 (left) and PETS1 (right) divided into classes and marked by dots colored from green (low potential extinction) to fuchsia (high potential extinction)

As a remarkable example, the Gargano National Park has a rich butterfly fauna, which was studied in two main campaigns during the 1940s and 1950s. However, over half of the species had gone unrecorded since 1980 (Fig. 2b), resulting in a high PETS0 value of 0.524. Citizen science activities confirmed 56 species in the last 5 years of the study (2017–2021), resulting in a large ΔPETS of 0.326 (Table S1 and Figs. 1 and 2b, c).

Fig. 2 (a) The rationale of the PETS algorithm to compute potential extinction based on three records (1981, 1985, and 2011) from 1981 to 2021. Same abbreviations as in Eq. 1. (b, c) The time series graphs produced by PETS analysis on the butterfly data of the Gargano National Park, where each species is represented by a row with its records as in (a). (b) The results for PETS0, where only literature records (CkMap, dark grey dots) are used to assess the potential for extinction, and (c) the results for PETS1, where data from iNaturalist (in red) are also added. The years with both kinds of records for any given species are reported in blue

There were no significant correlations between the percentage of NA data in literature (records for species without a precise year of observation) and different PETS evaluations (PETS0, rho = 0.237, P = 0.255; PETS1, rho = 0.110, P = 0.599; ΔPETS, rho = 0.297, P = 0.149).

PETS graphs for all National Parks, and the butterfly species lists including the first and the last observations, can be found in the Supplementary Results document (Appendix 1).

Aims 2 and 3—the interplay between user effort, species traits and documented biodiversity

The GLMM analyzing the effect of user effort showed a significant positive relationship with the contribution of each observation to determine the PETSc values (Estimate = 0.099, Standard error = 0.035, z value = 2.869, P = 0.004). This demonstrates that single records from more committed users have a higher likelihood of reducing the perception of local extinction.

The Italian data on butterflies collected on iNaturalist also showed that larger species with a wider distribution received a higher number of records (Table 2). The three species features had significant interactions with user effort (Table 2). Users with high effort (Fig. 3a) tended to record species with a wider distribution more frequently. This relationship was less evident for users with a low engagement (Fig. 3a). Similarly, the relationship between flight period and number of records was steeper for users with high engagement (Fig. 3b). Wingspan showed a different trend since users with higher effort tended to record smaller species more frequently while less engaged users showed a steeper and opposite trend, thus reporting mainly large species (Fig. 3c).

Table 2 The effect of the three variables of species appearance on the number of observations per user in a GLMM. Interactions with user effort (number of records per observer) are also reported

Fig. 3 Marginal effects of interaction terms in GLMMs between species features and user effort, visualized as predicted trends for users showing minimum and maximum effort: (a) species range, (b) phenology and (c) wingspan

We found that a similar number of records uploaded by users with high effort encompassed a higher diversity in terms of the number of detected species (richness, q = 0) and the evenness of recorded individuals among species (Shannon index, q = 1; and Simpson index, q = 2) (q = 0: Rho = 0.893, P < 0.001; q = 1: Rho = 0.939, P < 0.001; q = 2: Rho = 0.939, P < 0.001; Fig. 4).

Fig. 4 The relationship between user effort and three diversity indexes based on Hill numbers (q = 0: richness, q = 1: Shannon index, q = 2: Simpson index) across ten quantiles, each containing a similar number of observations, from users showing increasing effort. The number of users included in each quantile is indicated in parentheses

Discussion

We evaluated the impact of citizen science records on reducing the perception of local extinctions in butterfly communities in Italian National Parks, which have an extraordinary diversity within the European and Mediterranean regions. The records by citizen scientists confirmed that several potential local extinctions were actually due to lack of recent records. Additionally, we found that observers with varying levels of effort on iNaturalist had varying contribution to this process, primarily because they recorded different levels of butterfly diversity and responded differently to various aspects of species appearance. These findings provide crucial information for National Parks to develop effective strategies for promoting citizen science initiatives to monitor butterfly populations over time.

Aim 1—the potential extinction upon time series approach

The establishment of the targeted Butterfly Monitoring Scheme (BMS) citizen science project has allowed precise tracking of the overall decline in butterfly populations over the past decades (e.g. Warren et al. 2021). The BMS has also helped detect the effects of climate change on butterfly distribution and community composition, as well as the correlation between population trends and functional traits and phylogeny (Parmesan et al. 1999; Devictor et al. 2012; Bonelli et al. 2022; Halsch et al. 2021; Melero et al. 2022). However, long-term BMS data are only available for a few European countries and parts of North America (Warren et al. 2021).

Local butterfly extinctions have been documented globally, even in areas without monitoring schemes (Finland: van Bergen et al. 2020; Panama: Basset et al. 2015; California: Preston et al. 2012; Italy: Bonelli et al. 2011, 2022). In most cases, this evidence was based on exceptional datasets, mostly from the past decades, which allowed for the evaluation of a few butterfly communities. The PETS index can integrate knowledge from multiple sources such as literature, museum data, expert collections, standardized monitoring, and unstructured citizen science to evaluate the possibility of local extinctions, even in the absence of exceptional datasets. Our dataset shows that published records are scarce or outdated for most Italian National Parks, and the occurrence of butterflies is not confirmed for more than one-third of the time since their first sighting. Although there is no correlation between the fraction of literature records without a collection date and the contribution of citizen science in reducing the perception of extinction, the large fraction of records without a precise date greatly hinders the possibility of obtaining complete time series. In the light of the current biodiversity crisis, we recommend that researchers report precise dates for their observations, especially considering the lack of well-established rules for writing faunistic papers. In this regard, citizen science records are less affected by missing collection dates.

Due to a general lack of data, the likelihood of local extinctions in PETS0 appears to be quite high, with many species remaining unrecorded in National Parks for decades. This highlights the need for field investigations to confirm the presence of previously recorded species. Such efforts can be costly, but the contribution of citizen science can help reduce the costs. The use of the PETS algorithm revealed that iNaturalist data can play an important role in recording butterfly populations, reducing the lack of knowledge about persistence from an average of 45% (PETS0) to 28% (PETS1). Another significant finding is the considerable variability in PETS0 and PETS1, as well as in the difference between them (ΔPETS), among different National Parks. This variability is largely dependent on the time since the first and last faunistic study of each park, and to some extent on the level of citizen science activity.

Aim 2 and 3—the effect of species traits and user effort on iNaturalist occurrences

Single observations uploaded on iNaturalist by users who put in more effort contribute more to evaluating the potential for local extinctions in Italian National Parks. This is expected if more committed users tend to record a higher proportion of butterfly diversity, being more focused on taking pictures of different species and being differently affected by species appearance.

In general, our findings align with previous research on birds by Callaghan et al. (2021), which demonstrated that the availability of unstructured citizen science data depends on species appearance. For butterflies, we found that larger species with a longer flight period and broader geographical distribution are more likely to have a greater number of records available on the iNaturalist platform. A higher number of records for species showing a wider distribution and a longer flight period should not be considered a bias but a desirable property, since these trends are the basis of high-quality data obtained from structured monitoring schemes (e.g., in transect counts). However, this property is only shown by iNaturalist users uploading a high number of observations, as documented by the strong interactions of phenology and species range with user effort. For less engaged users, these relationships are weaker than for highly engaged users. This result could arise because less engaged users do not use iNaturalist frequently, so their observations constitute samples too small to reflect phenology and distribution.

While a positive relationship linking upload frequency with phenology and distribution is a desired property of data, the tendency to document large species more often is a typical bias of citizen science (Kral-O’Brien et al. 2020; Isaac et al. 2011; Moranz 2010; Dennis et al. 2006). This expected behavior is generally confirmed in our analysis because larger species scored a higher number of records. However, the interaction between user engagement and butterfly size showed a significant effect. In fact, the decision to upload an observation does not only depend on the probability of encountering a given species, but also on other factors, such as the personal appreciation for that species (e.g. Callaghan et al. 2021; Isaac and Pocock 2015). In this case too, highly committed users provide more representative data, as they do not seem to be selectively attracted to bigger and more visible butterfly species.

The preference for capturing pictures of both large and small butterflies by highly engaged users is likely to contribute to the higher diversity observed in their records, both in terms of species richness and evenness. It is possible that these users may learn more about the taxonomy of the butterfly group they are interested in and photograph rarer or less conspicuous species. Additionally, these users may also search more widely to find species with limited distributions, and document butterflies during different seasons. Furthermore, it is possible that a highly committed user may actively search for species not encountered yet, which may further contribute to a more diverse sample of species captured in photographs.

Final remarks

Protected areas play a critical role in conserving biodiversity, promoting sustainability, and raising public awareness of the importance of natural capital and ecosystem services (Bastian 2013; Geldmann et al. 2013; Millennium Ecosystem Assessment 2013; Stolton et al. 2015; Chowdhury et al. 2023). Involving citizens in biodiversity monitoring through citizen science has been shown to be an effective and efficient way to gather data and information (Fontaine et al. 2021; Mannino and Balistreri 2018; Dennis et al. 2017; Zapponi et al. 2017). We documented that citizen science data can also be used to complement existing literature data to more accurately determine the possibility for local extinction and community erosion. This information can then be used by National Parks to prioritize their conservation efforts and save financial resources. National Parks should encourage citizens to participate in both structured and unstructured projects to gather standard and opportunistic data. This can be done through events such as bioblitzes, where people are educated about the importance of monitoring biodiversity and encouraged to upload their observations to platforms like iNaturalist.

It is important to be aware of the limitations of citizen science, including the unequal contributions and quality of data provided by differently engaged users. To address this, National Parks should also promote activities that educate and engage the general public, such as workshops focused on taxa identification with the help of expert taxonomists. In Italy, this has already begun with the hosting of the first Italian BMS workshop in the Sila National Park in 2019, which has since been replicated in five other National Parks. This increased knowledge is likely to result in higher quality data and less influence from aesthetic preferences (Callaghan et al. 2021; Barbato et al. 2021; Randler 2021). Additionally, individuals who are highly engaged in unstructured citizen science are more likely to participate in targeted projects, such as the globally successful Butterfly Monitoring Scheme (Warren et al. 2021). The Italian National Parks are also committed to carrying out pollinator monitoring, including butterfly counts, through Environment Ministry funding with the involvement of volunteers and experts. Improving taxonomy knowledge through citizen science can also help to address the shortage of taxonomists as outlined by the Red List of Taxonomists, a European Commission-funded initiative to increase awareness of the available expertise for preserving insect biodiversity (Hochkirch et al. 2022).