1 Introduction

Ecological time series data are often characterised by periodicities such as diel variation, i.e. recurrent patterns over a 24-h period. Ignoring periodic variation can invalidate statistical inference, e.g. standard errors might be underestimated due to residual autocorrelation [1]. Perhaps more importantly, adequately modelling such periodic variation is crucial to comprehensively understand behavioural dynamics, for example to identify times of day at which individuals tend to be most active, allowing inference on a species’ temporal niche [2, 3].

Fortunately, technological advances in, e.g. GPS tracking, accelerometry, and computer vision allow ecologists to study diel variation in much more detail than was previously possible [4, 5]. One popular tool for modelling ecological time series data and the periodicities therein is given by the class of hidden Markov models (HMMs), which links the observed ecological data (e.g. step lengths and turning angles in animal movement) to underlying non-observable states (e.g. resting, foraging, travelling) [6].

In the existing literature, two different approaches have been used to infer periodic variation using HMMs. First, relatively basic HMMs can be used to infer an animal’s behavioural sequence (state decoding), based on which diel variation can be investigated using simple visualisations [5, 7, 8] or an additional regression analysis [9, 10]. From a statistical perspective, such a two-stage approach will often not be ideal: the uncertainty in the state allocation is not propagated, statistical inference on the periodic effects is not straightforward, and the dimension of the state space may be overestimated due to the misspecification of the basic model (see [11] for the latter). Second, periodic variation is nowadays often directly incorporated in HMMs using trigonometric modelling, for instance by relating the state transition probabilities to the hour of the day using sine and cosine functions with 24-h periods [12,13,14,15,16]. While this will often be sufficient, such parametric modelling of the periodic effect may lack the flexibility to capture complex periodic variation, for example with multiple activity peaks over the day. In principle, this limitation can be overcome by including multiple sine and cosine basis functions with different wavelengths [17,18,19,20]. However, this can lead to numerical instability, and it can be tedious to select an adequate order.

In this contribution, we explore a more flexible, nonparametric estimation of periodicities in the state-switching dynamics of an HMM using cyclic splines. This avoids any a priori assumptions on the functional shape of the periodic effect, allowing us to infer arbitrarily complex behavioural diel patterns. For inference, we devise an expectation–maximisation (EM) algorithm that isolates the estimation of the nonparametric periodic effect. This allows us to exploit the powerful machinery available for nonparametric (regression) modelling, specifically P-splines and other smoothing methods implemented in existing software packages such as mgcv [21]. The feasibility of the proposed approach is illustrated in two case studies, in which we investigate diel variation of African elephants (Loxodonta africana) and of common fruit flies (Drosophila melanogaster).

2 Methods

2.1 Notation and Basics

HMMs are used to model time series data \(x_1,\ldots ,x_T\) (e.g. step lengths of an animal) driven by underlying states \(s_1,\ldots ,s_T\) (e.g. the behavioural modes). In a basic HMM, the latent state process is assumed to be a Markov chain with N states, characterised by the initial state probabilities

$$\begin{aligned} \delta _j^{(1)}=\Pr (S_1=j), \end{aligned}$$

\(j=1,\ldots ,N\), and the transition probability matrix (t.p.m.)

$$\begin{aligned} \varvec{\Gamma }^{(t)} = \bigl ( \gamma _{ij} ^{(t)} \bigr ), \quad \text {where} \quad \gamma _{ij}^{(t)} = \Pr (S_{t}=j \mid S_{t-1}=i), \end{aligned}$$

\(i,j=1,\ldots ,N\), \(t=2,\ldots ,T\). The state active at time t selects which of N possible state-dependent distributions \(f_1,\ldots ,f_N\) generates the observation \(x_t\):

$$\begin{aligned} f(x_t \mid s_t=j) = f_j (x_t). \end{aligned}$$

Covariates—including time of day—can be included in either the state-dependent distributions \(f_1,\ldots ,f_N\) or the state transition probabilities \(\gamma _{ij}^{(t)}\). We focus on the latter, as in ecological applications the main interest typically lies in the state process and its drivers, including temporal effects.

2.2 Trigonometric Modelling of Time-of-Day Variation

Including covariates in the state process amounts to regression modelling within the HMM, where covariates affect the state transition probabilities and hence the behavioural decisions made by an animal. Specifically, if at time \(t-1\) the animal is in state i, the categorical distribution of states at time t is given by the vector \((\gamma _{i1}^{(t)},\ldots ,\gamma _{iN}^{(t)})\). The covariate-dependence of this categorical distribution is typically modelled using a multinomial logistic regression, which is achieved by applying the inverse multinomial logit link to each row i of the t.p.m.,

$$\begin{aligned} \gamma _{ij} ^{(t)} = \frac{e^{\tau _{ij}^{(t)}}}{\sum _{k = 1}^{N} e^{\tau _{ik} ^{(t)}}}, \end{aligned}$$
(1)

defining \(\tau _{ii} ^{(t)}=0\) (reference category). The linear predictors \(\tau _{ij}^{(t)}\) can include, inter alia, simple linear effects, polynomial effects, interaction terms, and random effects. Without the latter (for ease of notation), the general form of the linear predictor for \(\gamma _{ij}^{(t)}\) is

$$\begin{aligned} \tau _{ij}^{(t)} = {\textbf{z}}_t' \varvec{\beta }^{(ij)} = \beta _0^{(ij)} + \beta _1^{(ij)} z_{t1} + \cdots + \beta _p^{(ij)} z_{tp}, \end{aligned}$$

where \(z_{tk}\) are covariates. When the aim is to model periodic patterns in the state-switching dynamics, the linear predictor can be extended by including trigonometric basis functions with the desired periodicity. For example, for modelling diel variation in a time series with hourly data, a possible simple form of the linear predictor is

$$\begin{aligned} \tau _{ij}^{(t)} = {\textbf{z}}_t' \varvec{\beta }^{(ij)} + \omega ^{(ij)} \sin \biggl ( \frac{2\pi t}{24} \biggr ) + \psi ^{(ij)} \cos \biggl ( \frac{2\pi t}{24} \biggr ), \end{aligned}$$

with the additional coefficients \(\omega ^{(ij)}\) and \(\psi ^{(ij)}\) to be estimated alongside \(\varvec{\beta }^{(ij)}\). General periodicities are modelled analogously, replacing the 24 in the denominator by the period length (i.e. the number of sequential observations before completing one period).
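Putting the link function (1) and the simple trigonometric predictor together, the mechanics can be sketched in a few lines of code. The following is a minimal illustration with hypothetical coefficient values (not estimates from any fitted model), computing one row of the t.p.m. for a 2-state HMM at a given hour:

```python
import numpy as np

def tpm_row(tau):
    """Inverse multinomial logit link (1): map a row of linear predictors,
    with the diagonal entry fixed at 0 (reference category), to probabilities."""
    e = np.exp(tau - np.max(tau))  # max-shift for numerical stability
    return e / e.sum()

def tau_offdiag(t, beta0, omega, psi, period=24):
    """Simple periodic linear predictor with one sine/cosine pair."""
    return (beta0 + omega * np.sin(2 * np.pi * t / period)
            + psi * np.cos(2 * np.pi * t / period))

# hypothetical 2-state example: row 1 of the t.p.m. at hour t = 6
tau12 = tau_offdiag(6, beta0=-2.0, omega=0.8, psi=-0.5)
row1 = tpm_row(np.array([0.0, tau12]))  # tau_11 = 0 (reference)
```

The max-shift inside `tpm_row` leaves the probabilities unchanged but avoids overflow when linear predictors are large.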

With only a single harmonic, i.e. one sine and one cosine basis function, the flexibility of the periodic component of the linear predictor is rather limited: this formulation implies that the periodic component has only one maximum per period (such that patterns with multiple activity peaks throughout a day may not be adequately captured). The flexibility can be increased by including additional trigonometric functions with different wavelengths:

$$\begin{aligned} \tau _{ij}^{(t)} = {\textbf{z}}_t' \varvec{\beta }^{(ij)} + \sum _{k=1}^{K} \omega _{k}^{(ij)} \sin \biggl ( \frac{2\pi k t}{24} \biggr ) + \sum _{k=1}^{K} \psi _{k}^{(ij)} \cos \biggl ( \frac{2\pi k t}{24} \biggr ), \end{aligned}$$
(2)

see for example [19, 20]. By increasing K, arbitrarily flexible (smooth) modelling of the periodic effect can be achieved. However, when complex periodic patterns are to be modelled, it can be tedious to select an adequate order K, and the risk of overfitting looms. It may then be more straightforward to avoid making any assumptions on the functional shape of the periodic effect and instead use nonparametric smoothing methods. Such an approach to estimating periodic effects is presented and explored in the following.
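The flexibility gain from additional harmonics can be demonstrated directly. The sketch below (with hypothetical coefficient values) evaluates the periodic part of (2) on a fine grid and counts local maxima: with \(K=1\) the periodic component always has a single peak per period, whereas \(K=2\) already admits bimodal shapes.

```python
import numpy as np

def periodic_part(t, omega, psi, period=24):
    """Periodic component of Eq. (2): K sine/cosine harmonics with
    coefficients omega[k-1], psi[k-1]."""
    t = np.asarray(t, float)[..., None]
    k = np.arange(1, len(omega) + 1)
    return (omega * np.sin(2 * np.pi * k * t / period)
            + psi * np.cos(2 * np.pi * k * t / period)).sum(axis=-1)

def count_local_maxima(y):
    """Number of strict interior local maxima of a sampled curve."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

t = np.linspace(0, 24, 2401)[:-1]
one = periodic_part(t, omega=np.array([1.0]), psi=np.array([0.0]))          # K = 1
two = periodic_part(t, omega=np.array([1.0, 0.0]), psi=np.array([0.0, 0.9]))  # K = 2
```

Here `one` has a single daily peak while `two` has two, mirroring the bimodal activity patterns discussed in the case studies.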

2.3 Cyclic Splines for Modelling Time-of-Day Variation

We now consider nonparametric modelling of the periodic effect, replacing the sum of trigonometric basis functions in (2) by a spline. Specifically, we construct this spline as a linear combination of Q basis functions,

$$\begin{aligned} \tau _{ij} ^{(t)}= {\textbf{z}}_t' \varvec{\beta }^{(ij)} + \sum _{q=1}^Q a_{q}^{(ij)} B_q(t \; \text {mod} \; 24), \end{aligned}$$
(3)

with the scaling coefficients \(a_{1}^{(ij)},\ldots ,a_Q^{(ij)}\) to be estimated. We use cubic B-spline basis functions \(B_1,\ldots ,B_Q\), which are easy to compute and yield visually smooth functions. To enforce the desired periodicity, these are wrapped at the boundaries of the support [21]; see Fig. 1 for an illustration with \(Q=8\) and period length 24 h. Again, general periodicities are modelled analogously.

Fig. 1

Example set of \(Q=8\) cyclic basis functions for modelling diel variation
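The wrapping of the basis can be made concrete in code. The sketch below is our own minimal construction (not the exact mgcv implementation): ordinary cubic B-splines are built on a uniform knot grid extended beyond \([0, 24)\), and basis functions sitting exactly one period apart are added together, yielding Q cyclic basis functions that still sum to one everywhere.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Ordinary B-spline basis via the Cox-de Boor recursion
    (assumes strictly increasing, uniform knots)."""
    x = np.atleast_1d(np.asarray(x, float))
    B = np.array([(x >= knots[i]) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], float).T
    for d in range(1, degree + 1):
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for i in range(B.shape[1] - 1):
            left = (x - knots[i]) / (knots[i + d] - knots[i])
            right = (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1])
            Bn[:, i] = left * B[:, i] + right * B[:, i + 1]
        B = Bn
    return B

def cyclic_basis(x, Q, period=24.0, degree=3):
    """Q cyclic B-spline basis functions: ordinary B-splines on a knot
    grid extended beyond [0, period), with the copies that sit exactly
    one period apart added together (the 'wrapping')."""
    x = np.atleast_1d(np.asarray(x, float)) % period
    h = period / Q
    knots = np.arange(-degree, Q + degree + 1) * h  # uniform, extended
    B = bspline_basis(x, knots, degree)             # Q + degree columns
    Bc = B[:, :Q].copy()
    Bc[:, :degree] += B[:, Q:Q + degree]            # wrap at the boundary
    return Bc
```

Evaluating `cyclic_basis` on a grid over \([0, 24)\) with \(Q=8\) reproduces the kind of basis shown in Fig. 1: each function is smooth across midnight, and the set forms a partition of unity.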

In practice, a large Q (e.g. 20) is typically used to guarantee sufficient flexibility. Overfitting is avoided by including a penalty on the sum of squared differences between the coefficients \(a_q^{(ij)}\) associated with adjacent B-splines, an approach commonly referred to as P-spline modelling, cf. [22]. While, strictly speaking, this model formulation is still parametric, it is commonly labelled nonparametric because with a large Q the modelling flexibility is effectively unlimited and the individual coefficients \(a_q^{(ij)}\) have no meaningful interpretation.
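The difference penalty is conveniently written in matrix form, which is how such penalties are typically handled in software. A small sketch with hypothetical coefficient values (note that for a truly cyclic smooth the penalty is also wrapped around the boundary, which mgcv handles internally):

```python
import numpy as np

Q, lam = 8, 5.0
# second-order difference matrix: D @ a gives (Delta^2 a_q), q = 3..Q
D = np.diff(np.eye(Q), n=2, axis=0)   # shape (Q-2, Q)
S = D.T @ D                           # penalty matrix, so a' S a = sum (Delta^2 a_q)^2
a = np.linspace(-1, 1, Q) ** 2        # hypothetical spline coefficients
penalty = 0.5 * lam * a @ S @ a       # matches (lam/2) * sum_{q=3}^{Q} (Delta^2 a_q)^2
```

The quadratic form \(\lambda \, {\textbf{a}}' D'D {\textbf{a}}\) makes clear that the penalised fit remains a standard (generalised) ridge-type problem.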

2.4 EM-Based Estimation of HMMs with Cyclic Splines

The model formulation presented in the previous section effectively involves nonparametric regression modelling within HMMs. For example, in case of \(N=2\) states, the model features a nonparametric logistic regression for each of the state-switching probabilities \(\gamma _{12}^{(t)}\) and \(\gamma _{21}^{(t)}\) (see Eq. (1)). For such nonparametric regression modelling, the inferential machinery (including software packages) is well-established. Therefore, we apply the expectation–maximisation (EM) algorithm to isolate the estimation of the logistic regression component from the estimation of the other parameters of the HMM, in particular those associated with the state-dependent process. This allows us to exploit the tools available for nonparametric logistic regression modelling.

To set up the EM algorithm, we consider the complete-data log-likelihood (CDLL) of the HMM, i.e. the joint log-likelihood of the observations and the states,

$$\begin{aligned} \ell _{\text {CDLL}} (\varvec{\theta })&= \log \biggl ( \delta _{s_1}^{(1)} \prod _{t=2}^{T} \gamma _{s_{t-1}, s_t}^{(t)} \prod _{t=1}^{T} f_{s_t} (x_t) \biggr ) \nonumber \\&= \log \delta _{s_1}^{(1)} + \sum _{t=2}^{T}\log \gamma _{s_{t-1}, s_t}^{(t)} + \sum _{t=1}^{T}\log f_{s_t}(x_t), \end{aligned}$$
(4)

with \(\varvec{\theta }\) the set of parameters necessary to define \(\delta _j^{(1)}\), \(\varvec{\Gamma }^{(t)}\), and the state-dependent distributions \(f_j(x)\). Each iteration of the EM algorithm involves an E-step, which replaces all functions of the unobserved states in the CDLL by their conditional expectations (given the data and the current parameter values), and an M-step, which optimises the resulting CDLL with respect to \(\varvec{\theta }\). In the case of HMMs, the appeal of the EM algorithm lies in the fact that the M-step neatly splits into several separate optimisation problems, namely one each for the initial distribution, the t.p.m., and the state-dependent distributions, which we exploit below to conveniently estimate the cyclic spline component.

To apply the E-step, we define the indicator variables

$$\begin{aligned} u_j(t)= \left\{ \begin{array}{ll} 1 &{} \quad \text {if } \ s_t = j, \\ 0 &{} \quad \text {otherwise}, \\ \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} v_{ij}(t)= \left\{ \begin{array}{ll} 1 &{}\quad \text {if } \ s_{t-1} = i \ \text { and } \ s_t = j, \\ 0 &{} \quad \text {otherwise}, \end{array} \right. \end{aligned}$$

and rewrite the CDLL as

$$\begin{aligned} \ell _{\text {CDLL}} (\varvec{\theta })&= \sum _{j=1}^N u_j(1) \log \delta _{j}^{(1)} + \sum _{i=1}^N \sum _{j=1}^N \sum _{t=2}^{T} v_{ij}(t) \log \gamma _{ij}^{(t)} \nonumber \\ {}&\quad + \sum _{j=1}^N \sum _{t=1}^{T} u_j(t) \log f_{j}(x_t). \end{aligned}$$
(5)

In the E-step, the indicator variables are then replaced by their conditional expectations

$$\begin{aligned} {\hat{u}}_j(t)=\Pr (S_t=j \mid x_1,\ldots ,x_T, \varvec{\theta }) \end{aligned}$$

and

$$\begin{aligned} {\hat{v}}_{ij}(t)=\Pr (S_{t-1}=i, S_t=j \mid x_1,\ldots ,x_T, \varvec{\theta }), \end{aligned}$$

with \(\varvec{\theta }\) the current guess of the parameter vector. These conditional expectations are calculated using the standard forward and backward recursions (see [23]).
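As a concrete illustration, these E-step quantities can be computed with the standard scaled forward and backward recursions. The sketch below is generic code (not tied to any particular emission distribution): it takes precomputed state-dependent densities `dens[t, j]` \(= f_j(x_t)\) and a possibly time-varying t.p.m.

```python
import numpy as np

def e_step(delta, Gamma, dens):
    """Scaled forward-backward recursions; returns
    u_hat[t, j]    = Pr(S_t = j | x_1..x_T) and
    v_hat[t, i, j] = Pr(S_{t-1} = i, S_t = j | x_1..x_T).
    Gamma has shape (T, N, N), with Gamma[t] governing the step into
    time t (Gamma[0] is unused); dens has shape (T, N)."""
    T, N = dens.shape
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = delta * dens[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                      # forward pass, scaled
        alpha[t] = (alpha[t - 1] @ Gamma[t]) * dens[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):             # backward pass, same scaling
        beta[t] = Gamma[t + 1] @ (dens[t + 1] * beta[t + 1]) / c[t + 1]
    u_hat = alpha * beta
    v_hat = np.zeros((T, N, N))
    for t in range(1, T):
        v_hat[t] = (alpha[t - 1][:, None] * Gamma[t]
                    * (dens[t] * beta[t])[None, :] / c[t])
    return u_hat, v_hat
```

The scaling constants `c[t]` prevent numerical underflow for long series; their logs also sum to the log-likelihood, which is useful for monitoring convergence.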

The M-step then involves optimising the CDLL (5), with \({u}_i(t)\) and \({v}_{ij}(t)\) replaced by \({\hat{u}}_i(t)\) and \({\hat{v}}_{ij}(t)\), respectively, with respect to the parameter vector \(\varvec{\theta }\). The updated estimates of the initial state distribution as well as the state-dependent distributions are obtained as comprehensively described in [23]. For the particular model formulation considered in this contribution, the interest (and challenge) lies in the update of the parameters that affect the state transition probabilities, i.e. the second term in (5),

$$\begin{aligned} \sum _{i=1}^{N} \sum _{j=1}^{N} \sum _{t=2}^{T} {v}_{ij}(t) \log \gamma _{ij}^{(t)}. \end{aligned}$$
(6)

The first summation in (6) corresponds to the N rows of the t.p.m., each of which implies a categorical regression model for the transition to the next state. We can estimate the associated parameters of each of these regressions separately. For example, for \(N=2\), (6) becomes

$$\begin{aligned} \sum _{j=1}^{N} \sum _{t=2}^{T} {v}_{1j}(t) \log \gamma _{1j}^{(t)} + \sum _{j=1}^{N} \sum _{t=2}^{T} {v}_{2j}(t) \log \gamma _{2j}^{(t)}. \end{aligned}$$

Each of these two terms is the log-likelihood of a logistic regression model. For example, the first term can be rewritten as

$$\begin{aligned} \sum _{t=2}^{T} {v}_{11}(t) \log \bigl (1 - \gamma _{12}^{(t)}\bigr ) + \sum _{t=2}^{T} {v}_{12}(t) \log \gamma _{12}^{(t)}. \end{aligned}$$
(7)

In logistic regression terminology, the sum \(\sum _{t=2}^{T} {v}_{12}(t)\) gives the number of “successes” (here, the number of switches from state 1 to state 2), and \(\sum _{t=2}^{T} {v}_{11}(t)\) is the number of “failures” (here the number of instances when the process remains in state 1). Within EM, the indicator variables \({v}_{ij}(t)\) are replaced by their conditional expectations \({\hat{v}}_{ij}(t)\), such that (7) becomes a weighted log-likelihood.
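For intuition, in the special case of an intercept-only predictor (no covariates), maximising this weighted log-likelihood has a closed form: the estimated switching probability is the expected number of 1→2 switches divided by the expected number of time points spent in state 1. A sketch with hypothetical E-step weights:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
v11 = rng.uniform(size=T)  # hypothetical weights v_hat_11(t)
v12 = rng.uniform(size=T)  # hypothetical weights v_hat_12(t)

def weighted_loglik(g):
    """Weighted Bernoulli log-likelihood (7), with v replaced by v_hat."""
    return v11.sum() * np.log(1 - g) + v12.sum() * np.log(g)

# closed-form maximiser for an intercept-only predictor:
gamma12 = v12.sum() / (v11.sum() + v12.sum())
```

With covariates or splines in the predictor, no closed form exists; the same weighted likelihood is instead maximised by a (penalised) logistic regression fit, e.g. via mgcv's `gam` with its `weights` argument.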

The time-varying transition probability \(\gamma _{12}^{(t)}\) in (7) is modelled using cyclic P-splines; see (1) and (3). As described in Sect. 2.3, a wiggliness penalty is added to the weighted log-likelihood, for example

$$\begin{aligned} -\frac{\lambda _i}{2} \sum _{q=3}^Q \bigl (\triangle ^2 a_q^{(ij)}\bigr )^2, \end{aligned}$$

with \(\triangle ^2\) denoting the second-order difference operator and \(\lambda _i\) the (state-dependent) smoothing parameter (see [22]). The estimation of this weighted nonparametric logistic regression can conveniently be carried out using well-established machinery, including existing software. In the case studies below, we implemented this part of the M-step using the mgcv package in R [21].

The updated parameter estimates are then used in the E-step of the next iteration. The E and M steps are repeated until a user-defined convergence criterion is met [24], e.g. the increase in the likelihood between two consecutive iterations falling below some threshold. This iterative scheme identifies a (local) maximum of the likelihood function. To increase the chances of finding the global maximum, several sets of starting values for \(\varvec{\theta }\) should be tried.
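Schematically, the overall procedure is a standard EM loop. The sketch below is generic: the `e_step` and `m_step` callables stand in for the model-specific computations described above, and the toy example at the end (trivially "estimating" a mean) only exercises the control flow.

```python
import numpy as np

def em(theta0, e_step, m_step, tol=1e-6, max_iter=500):
    """Generic EM loop: alternate E- and M-steps until the log-likelihood
    increase between consecutive iterations falls below tol."""
    theta, ll_old = theta0, -np.inf
    for _ in range(max_iter):
        expectations, ll = e_step(theta)  # e.g. weights u_hat, v_hat + log-lik
        theta = m_step(expectations)      # separate updates, incl. the GAM fit
        if ll - ll_old < tol:
            break
        ll_old = ll
    return theta, ll

# toy example: "EM" for a sample mean (one-parameter model, no latent states)
x = np.array([1.0, 2.0, 6.0])
theta_hat, ll = em(0.0,
                   e_step=lambda th: (x, -np.sum((x - th) ** 2)),
                   m_step=lambda ex: ex.mean())
```

In practice, one would run `em` from several starting values `theta0` and keep the run with the highest final log-likelihood.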

3 Case Studies

3.1 African Elephant

We consider hourly GPS data collected for an African elephant in Etosha National Park, Namibia, from October 2008 to August 2010. The data are available from the Movebank Data Repository [25, 26], cf. [27]. From the positional data, we calculate the Euclidean step lengths as well as the turning angles between consecutive compass directions. Based on these two metrics, we aim to investigate diel patterns in the elephant’s behaviour. The empirical step length distributions throughout the day indicate a relatively complex diel variation with two activity modes, which may be difficult to model adequately using parametric periodic effects (Fig. 2).

Fig. 2

Boxplots of the elephant’s step lengths for each time of day. Outliers in the right tail are not shown for visual clarity

We model the data using 2-state HMMs with gamma and von Mises distributions for the step lengths and turning angles, respectively, assuming conditional independence of the two variables given the states [28, 29]. For modelling diel variation in the state-switching dynamics, we consider the cyclic P-spline approach (using the default options implemented in mgcv), the trigonometric approach (2) with \(K=1\), 2, and 3, and, as an additional benchmark, a model with a homogeneous Markov chain (i.e. no diel variation). All fitted models feature an “encamped” state with relatively short step lengths and frequent reversals in direction (state 1) and an “exploratory” state with longer steps and higher persistence in direction (state 2); see Fig. 6 in the Appendix.

Fig. 3

Estimated transition probabilities of the elephant as a function of time of day, for the different HMMs considered. For the P-spline model, the pointwise 95% confidence intervals (CIs) based on the Bayesian posterior covariance matrix, as provided by mgcv, are shown. The other CIs are omitted for visual clarity

Figure 3 displays the time-varying probabilities of switching states (left panel: from state 1 to state 2; right panel: vice versa) as estimated under the nonparametric and the parametric approaches. All models detect a reduction in exploratory activity during the night. However, the flexible P-spline approach additionally captures a bimodal diel variation, with more frequent switching to the exploratory mode in the early morning hours but also in the early afternoon. In contrast, the commonly used trigonometric effect modelling with \(K=1\) (i.e. one sine and one cosine basis function) is not sufficiently flexible to identify this bimodality. When the order is increased to \(K=2\), the bimodality is identified; however, only with \(K=3\) does the parametric approach produce results similar to those obtained using splines (indicating that even \(K=2\) might be too inflexible). Furthermore, the proportion of time spent in the exploratory state, calculated for each time of day based on the Viterbi-decoded states, varies notably across the five models fitted (see Fig. 4). This underlines the importance of adequately modelling diel variation, as inflexible models can invalidate inference on the state process.

To formally compare the spline-based model with the parametric alternatives using trigonometric basis functions, we consult the Akaike information criterion (AIC) and the Bayesian information criterion (BIC; see Table 1). The AIC and the BIC favour the trigonometric models with \(K=5\) and \(K=3\), respectively, with the spline-based approach arriving at a similarly complex model as measured by the effective degrees of freedom (edf). Notably, this demonstrates that the penalised spline approach effectively performs data-driven model selection, as the wiggliness penalty \(\lambda \) is chosen to achieve a favourable balance between underfitting and overfitting.

Table 1 AIC and BIC values of the different HMMs considered for the elephant data
Fig. 4

Proportion of time spent in state 2 (“exploratory”) by the elephant according to Viterbi state decoding based on the different models considered

3.2 Common Fruit Flies

In the second case study, we consider the locomotor activity of laboratory wild-type Drosophila melanogaster (iso31) [30]. We collected 2- to 3-day-old male flies and entrained them individually to a standard 12-h-light/12-h-dark condition (LD) for 4.5 days in locomotor tubes. Subsequently, we subjected them to 6 days of constant darkness (DD). The temperature was kept constant (25\(^{\circ }\)C). During these 10 days, locomotor activity was recorded using the Drosophila Activity Monitor (DAM) system (TriKinetics Inc.) by counting the number of times each fly interrupted the infrared beam passing through the middle of its locomotor tube. We consider two time series for each of 15 individuals, one under light condition LD and the other under condition DD. Each observation is the number of beam crossings during a 30-min interval.

The time series of half-hourly counts are modelled using a 2-state HMM with negative binomial state-dependent distributions. For the state transition probabilities, we consider the same time-varying predictors as in the elephant example, additionally allowing for different periodic effects under the two light conditions. The fitted models’ states are associated with low and high activity, with state-dependent mean counts of 2.7 and 54.9, respectively, obtained under the spline-based model (see Fig. 7 in the Appendix).

Fig. 5

Model-implied probability of the fruit flies occupying the high-activity state for conditions LD (left) and DD (right), as implied under the different models fitted. For the P-spline model, the pointwise 95% CIs are shown, obtained via Monte Carlo simulation using the Bayesian posterior covariance matrix provided by mgcv. The other CIs are omitted for visual clarity. Horizontal bars indicate the light–dark cycle (LD) in black and white, while under constant darkness (DD), the previous times of light are indicated in grey

Figure 5 shows the time-varying probability of occupying the high-activity state as obtained for the different models considered. Note that these are effectively summary statistics implied by the time-varying t.p.m., which are shown here to facilitate the comparison between the two light conditions. The results emphasise the importance of allowing for sufficient modelling flexibility in the periodic effects, as the commonly used approach with only one sine and one cosine basis function fails to capture several key characteristics: (1) the bimodal activity pattern over the course of the day; (2) the near-certain occupancy of the high-activity state in the evening hours (and of the low-activity state during the night) in the DD condition; (3) the fact that the first activity peak is less pronounced, and the second more pronounced, in the DD condition. The other three models—i.e. those with trigonometric effect modelling and either \(K=2\) or \(K=3\) as well as the spline-based model—all yield similar results. The slight midday peak only revealed by the spline-based approach—as well as by trigonometric models with \(K\ge 4\) (not displayed in the figure)—was also found in another study on activity patterns of fruit flies, albeit under varying temperature conditions [31]. Furthermore, the model comparison shows that the spline-based approach leads to a similar fit as the trigonometric approach, with the BIC favouring the latter with \(K=4\) (see Table 2).

Table 2 AIC and BIC values of the different HMMs considered for the fruit fly data

4 Conclusion

Periodic variation in time series data is often of key interest but can be challenging to adequately incorporate in state-switching models. To reveal potentially complex patterns, e.g. multiple activity peaks throughout the day, we explored a flexible nonparametric approach using cyclic P-splines. As illustrated in two case studies, such flexibility in the modelling of periodic variation can uncover relevant patterns that may otherwise go unnoticed, emphasising the potential usefulness of the approach, in particular in settings where periodic variation is of primary interest (cf. [32,33,34]).

We implemented the HMM with cyclic P-splines building on the mgcv functionality within the EM algorithm. However, this is not the only option for conducting inference for such a model. In particular, optimisation of the HMM’s marginal likelihood, obtained by integrating out the spline coefficients using the Laplace approximation [35], has recently been explored and is already implemented in the immensely flexible R package hmmTMB [36]. Moreover, both hmmTMB and the presented EM algorithm are not limited to P-spline modelling but could also incorporate other functional relationships within HMMs, such as random effects or more complex smoothing functions provided by mgcv.

The manifold possibilities for flexibly modelling covariate effects further complicate the already difficult task of model selection in HMMs, which is aggravated by the interplay between the number of states and the modelling of the state process [37]. Although information criteria offer some guidance in identifying an adequately complex model, additional considerations regarding interpretability and computational cost need to be taken into account. For example, even if a spline-based model fits the data slightly better than a simpler parametric one, this advantage may be outweighed by the additional computational burden. HMMs therefore demand a holistic approach to model selection, one that pragmatically balances goodness of fit against the study aims.

In practice, nonparametric modelling of periodic variation allows temporal patterns to be investigated without any restrictive a priori assumptions. When used as an exploratory tool, the approach may of course also show that simple trigonometric modelling is sufficient. In both case studies presented in this contribution, information criteria did indeed favour trigonometric modelling of the periodic variation, though only when using considerably more basis functions than are commonly applied in ecological modelling. Selecting an adequate number of basis functions can be tedious in practice, such that the spline-based approach, with its automated optimisation of the bias–variance trade-off via penalised likelihood, may often be the more convenient option. In any case, incorporating the flexibility required to capture periodic variation, whether via nonparametric or parametric modelling, is crucial for guaranteeing valid inference on behavioural processes.