1 Introduction

Climate change is predicted to become an increasingly important cause of biodiversity decline (Thomas et al. 2004; Pereira et al. 2010). Species’ responses to climate are often complex and present challenges for modelling and prediction. We illustrate the models of this paper with reference to butterflies, the most comprehensively monitored insects. Their population status is increasingly recognised as an indicator for changes in biodiversity as they respond sensitively and rapidly to changes in habitat and climate (Thomas 2005).

Previous studies of UK butterflies imply positive associations of populations with warm summer weather, but predicted relationships with winter weather are variable (Roy et al. 2001; Dennis and Sparks 2007; Isaac et al. 2011). Evidence for shifts in phenology (Roy and Sparks 2000) and increases in voltinism (Altermatt 2010) have also been presented.

Extensive sources of citizen-science count data for butterflies are available both in the UK and around the world, and there is much interest in developing robust modelling approaches to assist the monitoring and understanding of species’ responses to change. Butterflies have multi-stage life cycles, and counts fluctuate within each year in response to their emergence as adults, which is generally the only life stage with widespread data. Soulsby and Thomas (2012) developed a mathematical model for this variation, but only allowed for discrete, non-overlapping generations. Other models have been proposed to describe the within-year variation, both non-parametrically using generalised additive models (GAMs, Rothery and Roy 2001; Dennis et al. 2013) and via stochastic mixture models (Matechou et al. 2014; Dennis et al. 2014). Counts adjusted for seasonal fluctuations can then be used to produce longer-term trends, but existing methods do not impose any relationship between counts from one year to the next, which is the topic of this paper.

Causes of variation in both abundance and seasonal pattern from one year to the next are multi-faceted, relating to numbers during the previous year, as well as other factors driving the unobserved stages of the life-cycle, such as weather. We describe a novel dynamic framework which models count data across multiple sites from consecutive years, with abundance in any given year driven by that in the previous year. We adapt the approach for bivoltine species, with the first brood in a year feeding into the second. The models can be fitted efficiently using concentrated likelihoods. Performance is illustrated for a sample of species, making comparisons with indices generated from GAMs, and introducing new methods of exploring covariate dependence.

Although we present and illustrate the work in terms of butterflies, it may be applied to other insect species, possibly after modification appropriate to their ecology. For example, the flightless longhorn beetle, Dorcadion fuliginator, takes two years to reach maturity (Baur et al. 2005), as do many dragonflies and some crickets. The models may also be adapted for the study of migrant bird populations.

2 Materials and Methods

For any species, suppose counts of adults are recorded at S sites, each visited on up to T occasions, in each of Y successive years. Each can be treated as the realisation of a random variable from a suitable discrete distribution. For example, if this is taken as Poisson, with expectation \(\lambda _{i,j,k}\) for site i, visit j and year k, the likelihood has the form

$$\begin{aligned} \text {L}({\varvec{\rho }}, {\varvec{\mu }}, {\varvec{\sigma }},{\varvec{N}}_1;{\varvec{y}}) = \prod _{i=1}^S\prod _{j=1}^T\prod _{k=1}^Y\frac{\text {exp}(-\lambda _{i,j,k})\lambda _{i,j,k}^{y_{i,j,k}}}{y_{i,j,k}!}, \end{aligned}$$

where \(\{y_{i,j,k}\}\) are the counts and \({\varvec{\rho }}\), \({\varvec{\mu }}\), \({\varvec{\sigma }}\) and \({\varvec{N}}_1\), are the model parameters which we describe in the next sections.

We adopt the Poisson distribution throughout, but there are other possibilities, such as negative-binomial or zero-inflated Poisson, for which an approximate concentrated likelihood approach is possible (Dennis et al. 2014). Alternatively, Pagel et al. (2014) accounted for overdispersion with respect to this simple model by a mixed log-normal-Poisson distribution.

The methods of the paper provide joint modelling of data obtained at different temporal scales. We consider two model types which are structurally different: a phenomenological model based simply on normal probability density functions and mechanistic models that are based upon stopover models, which involve mechanisms allowing for estimation of survival.

2.1 Phenomenological Model for Univoltine Species

For a univoltine species, the counts within a season increase from zero and then decrease to zero corresponding to the emergence and death of adult butterflies. This variation may be described by Normal probability density \(N(\mu _{i,k},\sigma _{i,k}^2)\), corresponding to site i and year k, so that for the jth visit at time \(t_{i,j,k}\) (e.g. week number in the season) we have

$$\begin{aligned} \lambda _{i,j,k}=N_{i,k} \frac{1}{\sigma _{i,k}\sqrt{2\pi }}\text {exp}\left\{ -\frac{(t_{i,j,k}-\mu _{i,k})^2}{2\sigma _{i,k}^2}\right\} , \end{aligned}$$
(1)

which we write as \(\lambda _{i,j,k} = N_{i,k}a_{i,j,k}\), where \(N_{i,k}\) provides an estimate of relative abundance for site i in a given year, k, and \(\{a_{i,j,k}\}\) describes the seasonal variation over visits within that year. Thus for site i and year k, the counts for any visit have a Poisson distribution with mean value proportional to the Normal probability density function centred on \(\mu _{i,k}\).

We allow the relative abundance \(N_{i,k+1}\), for site i and year \(k+1\), to depend upon that in the previous year, \(N_{i,k}\), in a deterministic first-order auto-regressive manner via a growth rate, \(\rho _{i,k}\) which, assuming the species does not overwinter as an adult (in which case a model for multiple generations is required), we define as “productivity”, i.e. \(N_{i,k+1} = \rho _{i,k}N_{i,k}\). Developing this recursion over time provides

$$\begin{aligned} \lambda _{i,j,1} = N_{i,1}a_{i,j,1} \quad \text {and}\quad \lambda _{i,j,k}=N_{i,k}a_{i,j,k} = \left( N_{i,1}\prod _{m=1}^{k-1} \rho _{i,m}\right) a_{i,j,k}\quad \text {for }k>1, \end{aligned}$$
(2)

which is similar to the model in Freeman and Newson (2008), but with a seasonal component. The productivities, \(\{\rho _{i,k}\}\), describe the successes of a given generation over sites (i) for each year (k) and represent products of the number of eggs laid per adult and the probability of each egg reaching the adult stage in the next generation. The expressions of Eq. (2) characterise the univoltine models of the paper, with different formulations for the seasonal pattern, \(\{a_{i,j,k}\}\), providing different models, as we shall see for a mechanistic model formulation in Sect. 2.3.

2.2 Phenomenological Model for Bivoltine Species

Bivoltine butterfly species have two broods each year, with the adults of the second brood arising from the eggs laid by the adults of the first. We may extend the model above to describe counts from two annual broods by incorporating two Normal distributions in the model for \(\lambda _{i,j,k}\). Thus we set

$$\begin{aligned} \lambda _{i,j,k}= & {} N_{i,k,1}\frac{1}{\sigma _{i,k,1}\sqrt{2\pi }}\text {exp}\left\{ -\frac{(t_{i,j,k}-\mu _{i,k,1})^2}{2\sigma _{i,k,1}^2}\right\} \\&+\, N_{i,k,2}\frac{1}{\sigma _{i,k,2}\sqrt{2\pi }}\text {exp}\left\{ -\frac{(t_{i,j,k}-\mu _{i,k,2})^2}{2\sigma _{i,k,2}^2}\right\} , \end{aligned}$$

which we may write as

$$\begin{aligned} \lambda _{i,j,k} \equiv N_{i,k,1}a_{i,j,k,1} + N_{i,k,2}a_{i,j,k,2}, \end{aligned}$$

where at site i in year k the relative abundance for the first brood is given by \(N_{i,k,1}\) and for the second brood by \(N_{i,k,2}\). For the means and variances of the two Normal densities, the final subscripts designate brood, and we have \(\mu _{i,k,2} > \mu _{i,k,1}\).

Whereas in Dennis et al. (2014) two broods are described by a mixture of probability density functions, here the relative abundance of a second brood in each year is assumed to depend on that of the first brood that year. Dependence between the two broods in any year is introduced by defining

$$\begin{aligned} N_{i,k,2} = \rho _{i,k,1}N_{i,k,1}, \end{aligned}$$

in addition to the between-year dependence, now given by

$$\begin{aligned} N_{i,k+1,1} = \rho _{i,k,2}N_{i,k,2}. \end{aligned}$$

Thus, \(\rho _{i,k,1}\) represents the productivity of the first brood in a given year k, and \(\rho _{i,k,2}\) represents the productivity of the second brood, which feeds into the relative abundance of the first brood of the following year, \(N_{i,k+1,1}\). So developing the recursion over time we write

$$\begin{aligned} \lambda _{i,j,1}&= N_{i,1,1}a_{i,j,1,1} + N_{i,1,2}a_{i,j,1,2}\nonumber \\&= N_{i,1,1}(a_{i,j,1,1}+\rho _{i,1,1}a_{i,j,1,2}), \end{aligned}$$
(3)

and

$$\begin{aligned} \lambda _{i,j,k}&= N_{i,k,1}a_{i,j,k,1} + N_{i,k,2}a_{i,j,k,2}\nonumber \\&= \left( N_{i,1,1}\prod _{m=1}^{k-1}\prod _{b=1}^2\rho _{i,m,b}\right) a_{i,j,k,1} + \left( N_{i,1,1}\rho _{i,k,1}\prod _{m=1}^{k-1}\prod _{b=1}^2\rho _{i,m,b}\right) a_{i,j,k,2}\nonumber \\&= N_{i,1,1}\left( a_{i,j,k,1}+\rho _{i,k,1}a_{i,j,k,2}\right) \prod _{m=1}^{k-1}\prod _{b=1}^2\rho _{i,m,b} \quad \text {for }k>1. \end{aligned}$$
(4)

The extension to multivoltine species with greater than two broods each year is immediate, though not greatly applicable to the UK species. The new development for bivoltine/ multivoltine species is naturally based on the fact that the relative size of a given brood depends on the productivity of the previous brood. Notationally, we denote the phenomenological models by \(\hbox {P}_{B}\), where B is the number of broods.

2.3 Mechanistic and Stopover Models

Relatively little is known regarding butterfly survival, and what is known results from local short-term mark-recapture programmes, which are expensive. To build survival into our models we introduce additional parameters, the emergence times of adults, which are typically unknown and of interest in their own right as indicators of phenological change in a specific, key point in a species’ life-cycle. We do this as follows.

Suppose first of all that there is only one brood, and a site abundance \(N_{i,k}\) for site i and year k. In order to describe the emergence times we introduce parameters \(\beta _{i,d-1,k}\), which describe the proportions of \(N_{i,k}\) emerging at site i and just prior to visit d in year k. The expected number of individuals at site i at time \(t_{i,j,k}\) in year k is given as

$$\begin{aligned} \lambda _{i,j,k} = N_{i,k}a_{i,j,k}= N_{i,k}\left\{ \sum _{d=1}^j \beta _{i,d-1,k}\left( \prod _{m=d}^{j-1}\phi _{i,m,k}\right) \right\} , \end{aligned}$$
(5)

where the index \(d=1,\ldots ,j\) indicates the possible times of emergence for an individual detected on visit j. The parameters \(\beta _{i,d-1,k}\) describe the proportions of \(N_{i,k}\) emerging at site i and visit d in year k, such that \(\sum _{d=1}^T\beta _{i,d-1,k}=1\), for each site i and year k. We define \(\phi _{i,m,k}\) as the probability that an individual that is present at site i at visit m in year k, will remain at that site until visit \(m+1\). So for example, \(\lambda _{i,3,k}=N_{i,k}\left( \beta _{i,0,k}\phi _{i,1,k}\phi _{i,2,k} + \beta _{i,1,k}\phi _{i,2,k}+\beta _{i,2,k}\right) \).

In order that the emergence parameters have the right type of shape we can set

$$\begin{aligned} \beta _{i,d-1,k}=F_{i,k}(t_{i,d,k})-F_{i,k}(t_{i,d,k}-1), \end{aligned}$$
(6)

where \(F_{i,k}(t_{i,d,k})=P(X \le t_{i,d,k})\) for \(X\sim N(\mu _{i,k},\sigma _{i,k}^2)\), where \(\mu _{i,k}\) is the mean date of emergence and \(\sigma ^2_{i,k}\) is the associated variance. For each i, k, \(\beta _{i,0,k}=F_{i,k}(1)\) and \(\beta _{i,T-1,k}=1-F_{i,k}(T-1)\).

This is a simple stopover model, proposed for butterfly data by Matechou et al. (2014); see also Dennis et al. (2014). Stopover models are used in describing data on migrating birds, which rest and feed during their journey at particular stopover sites where observations take place. Typically, the resulting counts, graphed over time, reflect successive waves of birds arriving, staying and then leaving. Matechou et al. (2014) observe that this is the same pattern seen when adult butterflies are counted within a season, with for example bivoltine species analogous to two waves of birds observed at a stopover site.

In Matechou et al. (2014), the above model is extended to account for multivoltine data, and the expression of Eq. (6) then becomes a mixture of terms, each of which is an area under an appropriate probability density function. In the multivoltine case, we need a different dynamic mechanistic model, in order to allow for the abundance of one brood to feed into that of a succeeding brood, during the same year.

In the univoltine dynamic stopover model, the recursions of Eq. (2) apply, but now with the different specification of \(\{a_{i,j,k}\}\) provided by Eq. (5). For the multivoltine case in the dynamic mechanistic model, we assign a separate site abundance to each brood in a year. Thus, we assume the two broods for bivoltine species to be separate such that, for site i, visit j and brood b, in year k, we extend Eq. (5) to give

$$\begin{aligned} a_{i,j,k,b}= \sum _{d=1}^j \beta _{i,d-1,k,b} \left( \prod _{m=d}^{j-1}\phi _{i,d,k,b}\right) , \quad \text {for } b=1,2, \end{aligned}$$
(7)

where we define \(\{\phi _{i,d,k,b}\}\) as the appropriate survival probabilities of an individual from one visit to the next, which are now estimated separately for each brood. This development is an extension of that in the original specification of Matechou et al. (2014). For brood b, the parameters \(\{\beta _{i,d-1,k,b}\}\) describe the proportions of \(N_{i,k,b}\) arriving at visit d, and are modelled here using Normal distributions, so that

$$\begin{aligned} \beta _{i,d-1,k,b}=F_{i,k,b}(t_{i,d,k})-F_{i,k,b}(t_{i,d,k}-1), \end{aligned}$$

where \(F_{i,k,b}(t_{i,d,k})=\text {Pr}(X\le t_{i,d,k})\), for \(X\sim N(\mu _{i,k,b},\sigma ^2_{i,k,b})\), and \(\mu _{i,k,b}\) is the appropriate mean date of emergence of adults for brood b and \(\sigma _{i,k,b}^2\) is the corresponding variance. For each i, k, and b, \(\beta _{i,0,k,b}=F_{i,k,b}(1)\) and \(\beta _{i,T-1,k,b}=1-F_{i,k,b}(T-1)\). The recursions of Eqs. (3) and (4) then apply, but now with the new specification of \(\{a_{i,j,k,b}\}\) from Eq. (7). Notationally, we specify the dynamic mechanistic model by \(\hbox {M}_{B}\), where B is the number of broods.

2.4 Concentrated Likelihood

We fit models to data by maximum likelihood. As in Dennis et al. (2014), the number of parameters in the likelihood can be reduced by S, using a concentrated likelihood approach. S is typically large for these models and so computational efficiency is substantially increased. We consider first the univoltine case. Using Eq. (2), apart from an additive constant, the log-likelihood for site i may be written as

$$\begin{aligned} \ell _i=\text {Log}(\text {L}_i)&=\sum _{j=1}^T \left[ -N_{i,1}a_{i,j,1}+y_{i,j,1}\text {log}\left( N_{i,1}a_{i,j,1}\right) \right. \nonumber \\&\quad \left. + \sum _{k=2}^Y\left\{ -N_{i,1}a_{i,j,1}\prod _{m=1}^{k-1} \rho _{i,m} + y_{i,j,k}\text {log}\left( N_{i,1}a_{i,j,k}\prod _{m=1}^{k-1} \rho _{i,m}\right) \right\} \right] . \end{aligned}$$
(8)

For the data from all sites, the log-likelihood is \(\ell =\sum _{i=1}^S\ell _i\). Using Eq. (8) we obtain

$$\begin{aligned} \frac{\partial \ell }{\partial N_{i,1}}= \sum _{j=1}^T \left\{ -a_{i,j,1} + \frac{y_{i,j,1}}{N_{i,1}} + \sum _{k=2}^Y\left( -a_{i,j,1}\prod _{m=1}^{k-1}\rho _{i,m} + \frac{y_{i,j,k}}{N_{i,1}} \right) \right\} , \end{aligned}$$

and equating to zero we find

$$\begin{aligned} N_{i,1}=\sum _{j=1}^T\frac{\sum _{k=1}^Y y_{i,j,k}}{a_{i,j,1}+\sum _{k=2}^Ya_{i,j,k}\prod _{m=1}^{k-1}\rho _{i,m}}. \end{aligned}$$
(9)

We note how \(N_{i,1}\) is a weighted sum over visits of totals at site i across years. Thus, despite an apparent strong dependence of \(\{N_{i,k}\}\) on \(\{N_{i,1}\}\) in (2), this is only a consequence of the deterministic links between the \(\{N_{i,k}\}\), and all data contribute to the estimation of \(\{N_{i,1}\}\), and hence \(\{N_{i,k}\}\). Substitution of the expressions for \(\{N_{i,1}\}\) from (9) in (8) results in a concentrated likelihood, which is maximised with respect to only the parameters associated with \({\varvec{\rho }}\) and \(\mathbf {a}\) (which contain the elements of \({\varvec{\mu }}\) and \({\varvec{\sigma }}\)). Estimation of \(\{N_{i,1}\}\) is then made by substituting estimates of \(\{a_{i,j,k}\}\) and \(\{\rho _{i,m}\}\) into (9). The above approach holds for both phenomenological and mechanistic models.

The concentrated likelihood for the bivoltine case is given similarly in the appendix. We maximise the concentrated likelihoods using the optim function in R (R Core Team 2015), with the limited-memory BFGS algorithm (Byrd et al. 1995). Associated R code for the dynamic models is provided in the Supplementary Material.

2.5 Annual Index of Abundance

In the following, we write \(\widehat{\theta }\) for the maximum-likelihood estimate of \(\theta \), for any parameter \(\theta \). The averages of the relative site abundance estimates, for each year k, are used to create an index of abundance \(G_k\) for year k. For a univoltine species, we set

$$\begin{aligned} G_k=\frac{1}{S}\sum _{i=1}^S \widehat{N}_{i,k} = {\left\{ \begin{array}{ll}\displaystyle \frac{1}{S}\sum _{i=1}^S \widehat{N}_{i,1} &{} \text {if } k=1\\ \displaystyle \frac{1}{S}\sum _{i=1}^S \left( \widehat{N}_{i,1}\prod _{m=1}^{k-1}\widehat{\rho }_{i,m}\right) &{} \text {if } k>1,\end{array}\right. } \end{aligned}$$
(10)

for \(k=1,\ldots ,Y\), from equations (2). Similarly for the bivoltine case, we estimate an index \(G_{k,b}\) for each brood, \(b=1,2\), as

$$\begin{aligned} G_{k,1}=\frac{1}{S}\sum _{i=1}^S \widehat{N}_{i,k,1} ={\left\{ \begin{array}{ll}\displaystyle \frac{1}{S}\sum _{i=1}^S\widehat{N}_{i,1,1} &{} \text {if }k=1\\ \displaystyle \frac{1}{S}\sum _{i=1}^S \left( \widehat{N}_{i,1,1}\prod _{m=1}^{k-1}\prod _{b=1}^2\widehat{\rho }_{i,m,b}\right) &{} \text {if } k>1,\end{array}\right. } \end{aligned}$$
(11)

and

$$\begin{aligned} G_{k,2}=\frac{1}{S}\sum _{i=1}^S \widehat{N}_{i,k,2} = {\left\{ \begin{array}{ll}\displaystyle \frac{1}{S}\sum _{i=1}^S\widehat{N}_{i,1,1}\widehat{\rho }_{i,k,1}&{}\text {if } k=1\\ \displaystyle \frac{1}{S}\sum _{i=1}^S \left( \widehat{N}_{i,1,1}\widehat{\rho }_{i,k,1}\prod _{m=1}^{k-1}\prod _{b=1}^2\widehat{\rho }_{i,m,b}\right) &{} \text {if } k>1,\end{array}\right. } \end{aligned}$$
(12)

for \(k=1,\ldots ,Y\), making use of the recursions demonstrated in Eqs. (3) and (4). The separate brood indices can be added to produce a single, annual index but there is potentially great ecological benefit in maintaining them separately, as each corresponds to different times of year and may be driven by different environmental factors.

Once indices are formed they are plotted against year, and we shall see examples in Sect. 3. Standard errors for the indices can be obtained via bootstrapping, as for other methods (Dennis et al. 2013, 2014). Error bars are not presented here for clarity, but in general the differences between the indices derived from the dynamic models and alternative methods (which we explore in the next section) are smaller than the size of the errors.

2.6 Application

We apply the dynamic models to national monitoring scheme data for a subset of UK butterfly species. The UK Butterfly Monitoring Scheme (UKBMS) is the primary source of count data for UK butterflies. The scheme relies on recorders who count butterflies under favourable conditions each week between early April and late September, the main period for butterfly activity. This results in a maximum of \(T=26\) each year, though typically not all of the 26 designated visits are made, so the data do not need to be equally spaced. The UKBMS has grown gradually since it began in 1976 to over 1100 sites monitored in 2012 (Botham et al. 2013). Population trends are typically calculated annually for 56 of the 59 butterfly species regularly found in the UK.

Many studies of UKBMS data involve application to a single illustrative species (Matechou et al. 2014; Pagel et al. 2014). We demonstrate the dynamic models with application to a sample of representative, taxonomically and ecologically diverse species. Six univoltine and five bivoltine species were selected, with varying range size, habitat requirements and phenologies, although very scarce, habitat-specialist species, which generally have limited data, were not considered in this analysis. Each model was fitted to data for 1978–2011. The UK butterfly transect data used in this study are archived by the UKBMS (http://www.ukbms.org).

Sites at which the species of interest was never recorded or at which monitoring was undertaken for fewer than five years were excluded from this analysis. For illustration, a subset of 100 monitored sites was randomly selected for each species, with the exception of Holly Blue, for which a sample of up to 200 sites was instead taken, since using only 100 sites produced bias in the estimates of productivity.

We illustrate the performance of the dynamic models in terms of abundance indices, productivity, survival and phenology. Additional figures and tables are given in Appendix S1 of the Supplementary Material. This will be done for the samples of the univoltine and bivoltine species, with and without the addition of covariates. Where parameters were assumed to be constant spatially the subscript for site, i, is omitted. In models for bivoltine species, we let \(\mu _2=\mu _1 + \mu _d\), where \(\mu _1\ge 0\) and \(\mu _d>0\), to ensure that \(\mu _2 > \mu _1\).

The covariates we select are northing and measures of temperature. They were chosen to demonstrate the potential of the models, and may not be optimal. All covariates were standardised to have zero mean and unit variance. We use monthly mean and minimum Central England Temperature data (Parker et al. 1992).

The average minimum daily temperature during October–March was used as a covariate for overwinter productivity. For bivoltine species, the mean temperature within the flight period of the first brood was used to describe productivity of the first brood. Productivities, which are necessarily positive, were regressed on the log scale. Survival in mechanistic models was logistically regressed on mean temperature within the flight period of the brood of interest. Scientific names and approximate flight periods for the species studied are provided in Table S1.1, and the latter were used to indicate the relevant temperature covariates. Due to interest in the possible effect of covariates on estimates of survival, we primarily use mechanistic models when covariates are employed and phenomenological models otherwise, however alternatives are also possible.

3 Results

3.1 Indices

Indices of abundance are derived from estimates of annual productivity and estimates of initial abundance from the dynamic model, as described in Sect. 2.5. Here \(\mu \) and \(\sigma \) have been considered constant, although varying these between years provides useful information and we shall see examples of this later, but it had no distinguishable effect on indices of abundance. We compare relative abundance indices for model \(\hbox {P}_{1}\) and an approach with GAM-based models for seasonal patterns, currently adopted by the UKBMS and described by Dennis et al. (2013). To compare the different indices, each index was standardised to have zero mean and unit variance. An additional comparison with the generalised abundance index (GAI) approach (Dennis et al. 2014) and consideration of goodness-of-fit are given in Appendix S2 of the Supplementary Material.

The fitted phenomenological dynamic models discussed in the context of indices have 35 and 71 parameters for \(B=1,2\), respectively. Given that \(Y=34\), in the univoltine case there are 33 annual estimates \(\rho _k\), as well as \(\mu \) and \(\sigma \), and in the bivoltine case, there are 34 parameters \(\rho _{k,1}\) and 33 parameters \(\rho _{k,2}\), in addition to \(\mu _1\), \(\mu _d\), \(\sigma _1\) and \(\sigma _2\).

Figure 1a gives a comparison between annual indices of abundance for six univoltine species. There is a good agreement between the indices resulting from the dynamic model and the standard GAM approach (Dennis et al. 2013). By estimating an index for each brood (Eqs. 11 and 12), dynamic models \(\hbox {P}_{2}\) allow us to add more information to indices for bivoltine species, which we illustrate in two different ways in Figs. 2a and S1.1. We see how the dynamic model allows us to elaborate the indices produced by the GAM approach, by providing a separate index for each brood in the bivoltine case. This could, for example, reveal differing trends between broods.

Fig. 1
figure 1

a Relative abundance indices for each univoltine species from model \(\hbox {P}_{1}\) (black, open circles, solid lines) and the GAM approach (red, open triangles, dashed lines) and b annual estimates of productivity, \(\rho _k\), from model \(\hbox {P}_{1}\), which was fitted to estimate \(\rho _k\) across sites for each year. The horizontal dashed line separates productivities above/below unity, corresponding to growth/decline compared to the previous year. Dotted lines represent 95 % confidence intervals for productivity (Color figure online).

For model verification, Appendix S3 of the Supplementary Material summarises the results of applying the dynamic models to simulated data.

3.2 Productivity

Figure 1b presents estimates of productivity for the univoltine species, from fitting model \(\hbox {P}_{1}\). Values of \(\rho _{k}\) greater than unity indicate growth compared to the previous year, and values less than unity indicate decline. Hence as anticipated we see a tendency for productivities less than unity for species in decline, such as Small Skipper in recent years, while for Marbled White productivities tend to be above unity during the initial period of growth, followed by recent fluctuations about unity, when the population appears to be relatively stable.

Fig. 2
figure 2

a Relative abundance indices for each bivoltine species for the first (black, open circles, solid lines) and second (blue, closed diamonds, solid lines) broods from model \(\hbox {P}_{2}\) and the GAM approach (red, open triangles, dashed lines) and b annual estimates of productivity for the first (\(\rho _{k,1}\), black, open circles) and second (\(\rho _{k,2}\), blue, closed diamonds) brood from model \(\hbox {P}_{2}\), which was fitted to estimate \(\rho _{k,b}\) across sites for each brood and year. The horizontal dashed line separates productivities above/below unity, corresponding to growth/decline compared to the previous brood (Color figure online).

Figure 2b presents estimated productivities for each brood for five bivoltine species, using model \(\hbox {P}_{2}\). Values above/below unity represent growth/decline relative to the previous brood. In Figure S1.2, we see how the productivities reflect the relative sizes of the fitted seasonal curves, for which the average over the series is shown (standardised to sum to unity). The relative sizes of the broods will actually vary with productivity each year.

Figure 3 shows the results of including covariates in the dynamic models, in this case for model \(\hbox {M}_{1}\) with \(\rho _{i,k}\) and \(\phi _{i,k}\) (which we revisit in Sect. 3.3) varying with temperature and northing. It is interesting that with the exception of Gatekeeper, higher productivity is significantly associated with cooler winters and in all cases with more Northerly latitudes (regression coefficients and associated standard errors are presented in Table S1.2a).

Fig. 3
figure 3

Predicted productivity with varying minimum overwinter temperature from model \(\hbox {M}_{1}\). Each line represents one of 25 equally spaced northing values within the species range (red at southern sites and blue at northern sites). Model \(\hbox {M}_{1}\) was fitted with productivity, \(\rho _{i,k}\), and survival probability, \(\phi _{i,k}\), regressed on temperature and northing (Color figure online).

Fig. 4
figure 4

Predicted productivity with varying temperature from model \(\hbox {M}_{2}\). The mean temperature during the first brood, and the minimum overwinter temperature were used as covariates for productivity of the first and second brood, respectively. Each line represents one of 25 equally spaced Northing values within the species range (red at southern sites and blue at northern sites). Model \(\hbox {M}_{2}\) was fitted with productivity, \(\rho _{i,k,b}\), for each brood, b regressed on temperature and northing and survival probability, \(\phi _{i,k,b}\), for each brood, b, regressed on temperature (Color figure online).

Figure 4 shows the effect of adding covariates for bivoltine species, in this case for model \(\hbox {M}_{2}\) with productivity varying with temperature and northing, and survival varying with temperature, which we discuss further in Sect. 3.3. As detailed in Sect. 2.6, first-brood productivity was associated with the mean temperature during the first brood, and second-brood productivity with the minimum winter overwinter temperature. Associations of first-brood productivity, \(\rho _{i,k,1}\), with northing and weather varied between species, and regression coefficients for the slope parameters were generally significant (Table S1.3a). The association of higher productivity with cooler winters shown for univoltine species is also found for the second brood of the bivoltine species, with the exception of Wall Brown and Holly Blue, which is a common garden visitor, unlike the other species which favour grasslands, as well as gardens in the case of Small White and Green-veined White.

Given an estimate of productivity for each year, if desired the geometric mean of the productivities over time may be used to provide a simple comparison between species.

3.3 Survival

The mechanistic models allow estimation of the survival probabilities, \(\phi \), of butterflies, from which adult life expectancies (in weeks) can be estimated by \(1/(1-\phi )\), assuming that a species does not overwinter as an adult. Variation in life expectancy with temperature is displayed for univoltine species in Figure S1.3, and for bivoltine species in Figure S1.4, based on the models fitted with covariates in the previous section. Tables S1.2 and S1.3 show the parameter estimates and associated standard errors from the \(\hbox {M}_{B}\) models with covariates. For comparison, estimates are also included for the \(\hbox {P}_{1}\) and \(\hbox {P}_{2}\) models with covariates for \(\rho \), which are not presented in the figures, but produce analogous estimates of the shared parameters. There are differences in \(\mu \) and \(\sigma \) since in the mechanistic model \(\mu \) represents the mean date of emergence which will be earlier than the mean flight date, and \(\sigma \) relates to the length of the period of emergence, which will be shorter than the length of the flight period in the \(\hbox {P}_{1}\) model. The associated errors for \(\mu \) and \(\sigma \) are smaller for the \(\hbox {P}_{1}\) than for the \(\hbox {M}_{1}\) model. For the bivoltine species, there is more variation in the estimates from \(\hbox {P}_{2}\) and \(\hbox {M}_{2}\). As in the univoltine case, standard errors from the phenomenological model tend to be smaller than those from the mechanistic model. The \(\hbox {M}_{B}\) models with covariates have 8 and 14 parameters for \(B=1,2\), respectively, compared to the \(\hbox {P}_{B}\) models with 5 and 10 parameters for \(B=1,2\), respectively. In these cases, reduced precision is a consequence of greater model complexity.

For univoltine species, there was a significant negative association of life expectancy with higher average temperature during the flight period for four out of six species (Figure S1.3; Table S1.2a). Four univoltine species indicated significantly greater survival at southerly sites. Standard errors in Table S1.2a) are generally small, but are large for two instances for Green Hairstreak, which exhibit flatness in the associated plots (Figs. 3, S1.3).

As for the associations of first-brood productivity with weather, in bivoltine species we find that the variation in first brood life expectancy with temperature differs between the species sampled, and slope estimates were only significant for three out of five species (Table S1.3a). With the exception of Holly Blue, life expectancy for the second brood of the bivoltine species increased significantly with temperature. Fitting model \(\hbox {M}_{2}\) with covariates for northing and weather on \(\rho \) and \(\phi \) for each brood produced unrealistic estimates of lifespan for Brown Argus and Holly Blue, hence in Figure S1.4 we allow \(\phi \) in the \(\hbox {M}_{2}\) model to vary with temperature and brood only. This requires further investigation, but is likely to be due to the relatively large number of parameters in model \(\hbox {M}_{2}\) and/or relatively small size of the sample. The corresponding standard errors for this model are sometimes large, for example, for Holly Blue (Table S1.3a).

3.4 Phenology

Here we demonstrate the potential to produce estimates of phenology using the dynamic models. The \(\hbox {P}_{1}\) and \(\hbox {P}_{2}\) models were fitted with \(\varvec{\rho }\), \(\varvec{\mu }\) and \(\varvec{\sigma }\) each varying with year. Hence the \(\hbox {P}_{1}\) model requires 101 parameters to be estimated, corresponding to 33 parameters for \(\rho _k\) and 34 parameters each for \(\mu _k\) and \(\sigma _k\). Similarly the \(\hbox {P}_{2}\) model has 203 parameters: 34 for \(\rho _{k,1}\), 33 for \(\rho _{k,2}\), and 34 each for \(\mu _{k,1}\), \(\mu _{k,d}\), \(\sigma _{k,1}\) and \(\sigma _{k,2}\). To identify potential phenological trends, the models were also fitted with the parameters of interest regressed upon year (indicated by blue lines), as in the models fitted to univoltine species for comparison with the GAI in Appendix S2 of the Supplementary Material. We perform simple linear regressions post model-fitting to identify potential trends between \(\mu \) and productivity \(\rho \), where green lines indicate significant regressions (p-value \(<0.05\)).

Figure 5 gives the mean and standard deviation of the flight periods for the univoltine species and corresponding figures for the bivoltine species are given in Figures S1.5 and S1.6. Figures 5a and S1.5 suggest that the mean flight period date, \(\mu \), has advanced for all species and broods, which is consistent with what is expected under climate change (Sparks and Yates 1997; Roy and Sparks 2000). From Figures 5b and S1.6 we see that the length of the flight period has generally increased, also in agreement with previous findings (Roy and Sparks 2000). Table S2.1 suggests significant increases in \(\sigma \) for 5 out of 6 univoltine species. The location of the fitted line for the Marbled White \(\sigma _k\) in Figure 5b is due to the increase in sample size over time giving more weight to the later years. Figures S1.5 and S1.6 show a small number of outliers which require further investigation.

Fig. 5
figure 5

Annual estimates of a \(\mu _k\) and b \(\sigma _k\) from model \(\hbox {P}_{1}\), which was fitted to estimate \(\rho _k\), \(\mu _k\) and \(\sigma _k\) across sites for each year. Blue lines indicate fitting log-linear regressions on year for \(\mu \) and \(\sigma \), as in Table S2.1a) (Color figure online).

With the exception of Green Hairstreak, for the six univoltine species there was no clear relationship between \(\mu _k\) and \(\rho _k\) (Figure S1.7). For Green Hairstreak, which emerges early in the season, lower productivities are associated with an earlier flight period, which may lead to declines if advances in phenology continue with changes in climate. For most of the five bivoltine species, significant patterns between the mean flight period for each generation and the associated productivity were not found (Figure S1.8). However for Brown Argus and Green-veined White, productivity of the second generation was lower when the mean flight period date of the second brood, \(\mu _{k,2}\), was advanced.

These results show that the dynamic models predict phenological changes consistent with expected patterns. The dynamic models allow for improved estimates of phenology to be studied in combination with demographic parameters, to reveal potential novel insights. Changes in phenology may also be modelled using the mechanistic models, in order to separate changes in emergence time from changes in survival.

4 Discussion

The dynamic model framework allows novel investigation of the drivers of fluctuations in abundance and provides a basis that can be adapted to both the study species and research aim. We have presented only a preliminary application. The methods of Dennis et al. (2014), which model data for each year separately, may be better suited for estimating indices of abundances efficiently (see Table S2.1), whereas dynamic models provide additional information of value for understanding demography. However, the agreement of the indices obtained from the different methods provides confidence that the dynamic models are performing correctly, and that for multivoltine species indices may be derived separately for each brood.

For the majority of the sample species, higher overwinter productivity was associated with cooler winters, which may act to reduce the impact of pathogens. Variability in lifespan and first brood productivity of bivoltine species differed more between species. Given that species have different life-histories, further research may look for trait-based variation, for example overwintering stage: egg, larva, chrysalis or adult, all of which may be affected most severely by different environmental factors. For example, Diamond et al. (2011) explored relationships between changes in date of first appearance and species’ traits.

Further work is needed to explore the relevant covariates driving changes in productivity, survival and phenology. Spatial covariates such as habitat/land-cover may describe additional variation in the parameters. Inclusion of local weather could identify the period within the life-cycle for which weather has the most impact on the adult stage. Growing degree-days may also be explored (Hodgson et al. 2011). In this study, covariates were included additively on a logistic linear scale, whereas true relationships may be non-linear, for example productivity/survival might be limited by extremes in weather. The models could also be extended to model variation in productivity stochastically.

Alternatives to the Normal distribution for describing seasonal variation could be explored, for example to describe skewness (Calabrese 2012). This study has only accounted for species which are distinctly univoltine or bivoltine. A spline may be used to define complex seasonal patterns (Dennis et al. 2014), and the models could be extended to allow more than two broods each year. The models may be developed to accommodate variation in voltinism, where the first generation contributes to both the second generation within the same year and first generation the following year, with relevance for study of potential “lost generations” (Dyck et al. 2015). The dynamic models may also be used to study species which aestivate under hot summer conditions (Spieth et al. 2011; Grill et al. 2013).

The dynamic models produce realistic estimates of parameters relevant to phenology, providing further validation of the models. Phenological studies have typically involved measures such as mean first encounter, mean peak encounter and mean length of the flight period (Roy and Sparks 2000; Diamond et al. 2014; Karlsson 2014), which may be driven by observer behaviour. The improved estimates of phenology from dynamic models provide the opportunity to study linkages between changes in phenology and changes in abundance and productivity, for example phenological mismatch (Hindle et al. 2015).

Using a phenomenological model may be optimal in scenarios with limited data, but the mechanistic model allows for additional insights by estimating survival. Spatio-temporal variation in the lifespans of butterflies has had limited attention, as have potential linkages with other parameters, for example to explore how phenology affects survival, or whether variation in survival can influence productivity. Using a mechanistic model separates relevant parameters, for example to determine whether an increase in flight period length is due to an extended period of emergence, or increased lifespan. The model could be adapted to explore synchrony in populations (Powney et al. 2010), either between sites of a given species or across sites but between multiple species, by incorporating random effects (Lahoz-Monfort et al. 2011, 2013), for example in the \(\rho \) parameter for productivity. Density dependence, which has been highlighted for some butterflies (Nowicki et al. 2009), may be incorporated here in productivity and/or survival by introducing a dependency on the relative abundance. Additionally, allowing for spatial dependence of \(\rho \) and autocorrelation in abundance may be advantageous (Johnson et al. 2012). Pagel et al. (2014) included spatially autocorrelated random effects when modelling mean population density, but did not account for the within-year variability in counts.

For some threatened, conservation-priority UK butterflies, such as Large Blue Phengaris arion, Brown Hairstreak Thecla betulae and Marsh Fritillary Euphydryas aurinia, data are available on other stages of the butterfly life-cycle, such as counts of caterpillars or eggs. An attraction of the model framework proposed is the potential for the incorporation of data from multiple stages of the life-cycle, which could aid the monitoring and conservation of rarer species for which coverage from standard monitoring schemes can be limited.

The dynamic models may address the “lack of mechanistic understanding about factors driving butterfly population dynamics” (Isaac et al. 2011). Future application will generate hypotheses for further investigation, with the potential to illuminate features of butterfly phenology and demography which are at present poorly understood.