Introduction

Remote sensing and telemetry data are widely used to study environmental impacts on animal life (Kays et al. 2015; Nathan et al. 2022). Analyses based on these sources already helped to uncover a variety of animal-environment interactions (e.g. Pekarsky et al. 2021) and can assist to examine landscape changes and their implications for individuals and entire populations (del Mar Delgado et al. 2018). These, in turn, provide a baseline for ecologists and allow landscape planners and managers to make informed decisions that benefit animal welfare and conservation (Kerr and Ostrovsky 2003; Pettorelli et al. 2011).

Vegetation characteristics and their dynamics are key features of animal habitats because they are linked to the availability of food, nesting sites and shelter (He et al. 2015). Remote sensing time series—hereafter referred to as time series—are the primary means of capturing vegetation dynamics in animal-environment studies. Typically, data with moderate spatial (250–1000 m)/high temporal (~ 1–2 days) resolution are used to infer large-scale vegetation phenology (Pettorelli et al. 2014; Neumann et al. 2015). For example, the “Green Wave Surfer” studies relied on multitemporal MODIS imagery to identify areas/periods of high forage quality for herbivores (e.g. Aikens et al. 2017). Enhancements to existing satellite missions (e.g., Landsat) and new missions (e.g., Sentinel) are increasingly providing time series with high spatial (10–30 m)/medium temporal (~ 2–10 days) resolution (Roy and Yan 2018). Consequently, small-scale vegetation dynamics can also be identified and addressed in animal ecology. As an example, Standfuß et al. (2022) used Landsat-based Normalized Difference Vegetation Index (NDVI) time series to distinguish periods of favourable/poor prey accessibility for white storks at the field scale. They first identified local half-maximum dates from fitted intra-annual NDVI profiles per field (Fig. 1A). These delineate the midpoints between successive NDVI minima/maxima and are indicators for leaf-unfolding and loss of canopy structure (e.g. due to harvesting), respectively (Fisher et al. 2006; Bradley et al. 2007). Periods of short vegetation are then defined as consecutive days with NDVI values below those of the nearest half-maximum date(s), while the remaining periods should exhibit high vegetation. The former are known to be advantageous for storks to access to their prey (Johst et al. 2001).

Fig. 1
figure 1

Original discrimination of periods of favourable/poor prey accessibility for storks (based on half-maximum dates) on an exemplary field (A) and schematic representation of potential deviations of these periods (B) due to radiometric uncertainty and uncertainty about the fitted NDVI profile due to irregular temporal NDVI sampling

Continued improvements in time series data and methods, combined with advances in telemetry tracking and movement analyses (Nathan et al. 2022), are opening up new opportunities to study the effects of human-induced habitat change (Northrup et al. 2021). However, like any measurement system, both data sources are associated with uncertainties that can potentially confound the results of animal-environment studies. Telemetry devices record the location of animals in equal temporal intervals. In practice, sampling is often irregular and the recorded locations are error-prone (Frair et al. 2010). Irregular sampling is mostly caused by failed location attempts, which can be triggered by dense vegetation cover and further exacerbated in complex terrain (Dussault et al. 1999; D’Eon et al. 2002). Furthermore, for solar-powered devices, the temporal sampling interval can be reduced during prolonged periods of bad weather to conserve battery power (e.g. from 5 to 20 min in Rotics et al. (2016)). In addition to the irregular sampling, the tracked locations are inaccurate by at least a few metres, which is driven by landscape characteristics and by the Global Navigation Satellite System (GNSS) in use (Fleming et al. 2020). These errors are not consistent and can vary by location (Fleming et al. 2015, 2020). Earth observation satellites, by contrast, capture imagery of the entire planetary surface at regular temporal intervals. However, optical satellite data, the main source for vegetation monitoring, only provide valid observations under cloud-free conditions (Li et al. 2022). A regular temporal resolution is therefore rarely achieved, and coarser, irregular intervals are the norm (King et al. 2013). Even data products with fixed temporal intervals (e.g., MODIS 8-/16-day products) are affected. These usually specify regular observation dates, although the data for these are often derived from images taken days apart (Guindin-Garcia et al. 2012; Zeng et al. 2021). Besides the temporal irregularities, optical satellite imagery itself is subject to radiometric uncertainty. It can be caused by measurement and instrument noise, atmospheric conditions, calibration and data processing (Janesick 2001; Atkinson and Foody 2006; Chander et al. 2013), and expresses the degree of doubt about the measured values at each pixel/spectral band (Gorroño et al. 2018).

A thorough understanding of habitat selection and its impact on animals is crucial for effective conservation practices (Kerr and Ostrovsky 2003). Neglecting uncertainties in the input data can lead to incorrect conclusions and misdirected conservation efforts (Costa et al. 2018). To increase the likelihood of achieving good conservation results, uncertainties need to be quantified, reported (e.g., through tolerance intervals or uncertainty maps), and sensitively incorporated into conservation planning (Barry and Elith 2006; Jansen et al. 2022). Uncertainties in telemetry data can bias ecological inferences through overrepresentation of some better sampled habitats or spatial mismatch between used and sampled locations (D’Eon et al. 2002; Frair et al. 2010). Numerous methods have therefore been developed to address uncertainty in telemetry data. Advanced techniques treat animal movement not as a sequence of locations, but as a continuous-time stochastic process (CTSP) (Calabrese et al. 2016). They fit a movement model to the telemetry data and then perform probabilistic path reconstruction. A major advantage of the CTSP approach is that it can account for irregular sampling and location errors (e.g., through a pre-calibrated error model (Fleming et al. 2020)) in the modelling process. When errors are specified, path reconstruction is performed fully probabilistically by generating predictions of the ‘true’ locations and then interpolating between them, all depending on the movement model, data sampling and error (Fleming et al. 2015). The latter permits the integration and quantification of telemetry uncertainty when deriving metrics of habitat use, such as home ranges or utilization distributions (Calabrese et al. 2016).

Although advances in movement modelling methods enable accounting for uncertainties in telemetry data in animal-environment studies, uncertainties in time series data have been disregarded thus far. However, these can also bias inferences made about habitat selection, which is illustrated below using the work of Standfuß et al. (2022). Their analyses were based on intra-annual NDVI profiles fitted to temporal NDVI samples from Landsat time series. These data can be affected differently by time series uncertainties. First, the radiometric uncertainty per-pixel/spectral band would propagate to a certain scatter of the NDVI values of the temporal samples (Fig. 1B) (Graf et al. 2023). Second, uncertainties in the fit of intra-annual NDVI profiles are introduced by the irregular temporal sampling of NDVI (Schwieder et al. 2016). For example, by omitting just one of the initial temporal samples, the fitted NDVI profile of a field (Fig. 1B) can deviate significantly from the original one (Fig. 1A). Radiometric uncertainty and/or irregular temporal sampling can therefore affect the temporal differentiation of favourable/poor prey accessibility for storks (compare Fig. 1A/B) and ultimately the validity of the study’s findings.

To date, one factor potentially limiting the ability to account for time series uncertainties is likely to be the lack of information about them. First, there is no alternative system that provides daily reference measurements, such as NDVI, for large study areas. Second, per-pixel/spectral band radiometric uncertainties are often not reported by data providers, although efforts are underway for Sentinel-2 (e.g. Gorroño et al. 2018). Furthermore, the characterisation of radiometric uncertainties is both computationally and mathematically challenging, e.g. due to the so-called ‘curse of dimensionality’ (Kerr et al. 2015), and likely beyond the scope of individual studies. Therefore, it remains difficult, if not impossible, to assess whether and how the results of animal-environment studies are affected by the cumulative uncertainty of time series and telemetry data. This is particularly true when using time series data for which radiometric uncertainty has not yet been reported (e.g. Landsat).

To address this gap, we propose a data-driven approach to approximate uncertainties in time series, that can be applied in scenarios where official information on them is lacking. Coupled with advanced movement modelling, this allows to determine whether the results of animal-environment studies are robust to the cumulative uncertainties of time series and telemetry data. Using the study by Standfuß et al. (2022) as an example, we apply our approach and approximate time series uncertainties based on data subsampling. Additionally, we use a CTSP method to characterise uncertainties of the telemetry data through probabilistic path reconstruction. Based on these two uncertainty estimates, we aim to evaluate whether the storks’ preference for fields during periods of favourable prey accessibility can be confirmed. We use a Monte Carlo method to propagate the telemetry and time series uncertainties, and derive different estimates of stork habitat use and levels of prey accessibility at field-scale. These data are then applied in two analyses to derive probability distributions of the analyses results and quantify the output uncertainties. Both analyses examine foraging habitat selection but at different spatial scales. The first compares all used fields with all available fields in the storks’ breeding areas, while the second compares individual used (presence) fields with simultaneously unused (absence) ones in their immediate vicinity. By following this approach, we aim to answer two research questions:

RQ1:

Is the time series-based differentiation of prey accessibility still valid when accounting for cumulative input data uncertainties?

RQ2:

Does the number of temporal NDVI samples affect the validity of the time series-based differentiation of prey accessibility?

Material and methods

Study sites

The spatial focus of the study by Standfuß et al. (2022) was on the breeding areas of 18 adult storks (3.3 km around each nest) during the 2014 breeding season. These were located in north-eastern Germany (Fig. 2A) and consisted mainly of croplands and intensively cultivated grasslands (Zurell et al. 2018).

Fig. 2
figure 2

Overview of the study sites and the remote sensing time series data (A) and telemetry data (B) used. Additionally, the proxies used to capture the time series-based prey accessibility for storks (half-maximum dates and half-maximum-amplitude (HM-amplitude)) (D/E) and to characterise stork habitat use (foraging time) C are shown. Regarding D/E, periods of favourable prey accessibility occur when fields have low vegetation and low NDVI. However, what is considered a low NDVI value is subjective and depends on the vegetation type (compare D/E) (Pettorelli et al. 2011). Plot C additionally shows a Kernel Density Estimate (KDE), which was used to characterise the distribution of measured foraging time per field/stork/foraging bout, considering uncertainties in the telemetry data

Input data

The data used for the time series-based differentiation of prey accessibility for storks (Standfuß et al. 2022) are described below:

The spatial reference areas of the study were cropland and grassland fields within the storks’ breeding areas (total of 2855 cropland fields/2908 grassland fields). Thus, a dataset capturing the geometries of these entities was generated using the Digital Landcover Model of Germany (BKG 2018) and a Landsat-based landcover classification (Mack et al. 2016).

Field-level habitat use by the storks was assessed based on E-obs telemetry data (e-obs GmbH, Munich, Germany) with assigned behavioural information recorded during the 2014 breeding season (April-August). The data were initially divided into foraging bouts, that captured consecutive locations where the birds displayed typical foraging behaviour, i.e. ‘walking’ and brief ‘resting’ periods. Using CTSP movement modelling (Calabrese et al. 2016), multiple quasi-continuous movement paths (15 s resolution) were then simulated for each bout until convergence (Fig. 2B). Finally, these simulation sets were linked to the field geometry dataset to quantify the birds’ habitat use, measured as the median foraging time per field/foraging bout (Fig. 2C).

A time series of Landsat-7/-8 (Level-1) data from 2014 (58 Landsat-8 and 46 Landsat-7 scenes, spread over five tiles (Fig. 2A)) was used to capture periods of favourable/poor prey accessibility for storks per field. These data were initially atmospherically corrected to derive surface reflectance. Invalid pixels such as clouds, shadows or those affected by the SLC-off failure (Landsat-7) were excluded. Additionally, the Landsat-7 data were adjusted (harmonized) to the reflectance wavelength of the Landsat-8 data. After pre-processing, the NDVI was calculated for each image in the time series and linked to the field geometry dataset to derive the field-wise temporal NDVI samples (median per time step) (Fig. 2D/E). Only fields with ≥ six temporal NDVI samples were considered further, as Franke et al. (2012) found that fewer records did not adequately represent vegetation phenology. Using thin-plate spline curve fitting (Nychka et al. 2017), field-wise NDVI profiles with a 1-day resolution were then derived from the temporal NDVI samples. These formed the basis for the identification of the half-maximum dates (midpoints between successive NDVI minima/maxima) and for distinguishing periods of favourable (NDVI < NDVI closest half-maximum date) from those of poor (NDVI > NDVI closest half-maximum date) prey accessibility for storks. In addition, Standfuß et al. (2022) introduced and calculated the half-maximum amplitude (HM-amplitude) (Fig. 2D/E) to be used as a continuous predictor for stork prey accessibility in habitat selection modelling. It measures the difference between the NDVI of a given day (NDVIDay) and the NDVI of the closest half-maximum date(s) (NDVIHM_closest), which can be simply written as:

$$HM-amplitude= {NDVI}_{H{M}_{closest}}- {NDVI}_{Day}$$

It therefore takes positive values during favourable prey accessibility periods and negative values during periods with poor prey accessibility.

Additional details on the input data processing are provided in the “Supporting Information” by Standfuß et al. (2022).

Analyses of foraging habitat selection and key findings

Standfuß et al. (2022) investigated stork foraging habitat selection in two analyses carried out at two different spatial scales (analysis A—breeding area scale and analysis B—field scale). Their overall aim was to identify whether the time series-based differentiation of prey accessibility actually captures favourable/poor foraging conditions for storks. Furthermore, it was tested whether NDVI was less effective than HM-amplitude in predicting habitat selection (within analysis B). For storks, low NDVI values are also assumed to capture favourable foraging conditions (Zurell et al. 2018) (Fig. 2D/E).

Analysis A compared the habitat characteristics of all used fields with those of all available fields in the breeding areas on a daily basis. Habitat characteristics were specified as either favourable or poor conditions for storks to access prey. Use and availability were contrasted using the Manly selectivity index (Manly 2002) which enabled to identify days with significant (Chi-squared test: p-value < 0.05) selection (use > availability) or avoidance (use < availability) of the two habitat characteristics. The proportion of days (throughout the breeding season) with significant avoidance/selection of favourable/poor habitat characteristic were determined separately for croplands and grasslands. The results by Standfuß et al. (2022) indicated that storks tended to prefer favourable habitat characteristics when foraging, as these were selected to a greater extent than poor ones (croplands/grasslands). Furthermore, in croplands, poor habitat characteristics were avoided more than favourable habitat characteristics during the breeding season.

Analysis B was oriented towards the location of individual fields in use and compared them with simultaneously unused fields in their immediate vicinity. For this purpose, a presence-absence dataset was derived, which distinguished visited fields (foraging time > 1 min) and unvisited fields (random set of 3 fields with foraging time < 1 min within 1 km distance of a visited field (adopted from Zurell et al. (2018)). Each of the presence-absence fields was characterised by four predictor variables: nest distance (conditioned on the visiting stork), landcover, NDVI, and HM-amplitude. The relationship between foraging habitat selection (presence-absence) and the predictor variables was modelled with two Generalized Linear Mixed Models. In these models, both nest distance and landcover, and either HM-amplitude (Model 1) or NDVI (Model 2) were included as predictors. Stork identity was taken as random effect on slope and intercept of the predictors. The two-model setup was necessary because HM-amplitude and NDVI were correlated. Standfuß et al. (2022) found a significant positive effect (p-value < 0.001) of the HM-amplitude on habitat selection, providing further support that storks prefer to forage during periods with favourable prey accessibility. Furthermore, they identified HM-amplitude (Model 1) to be a more effective predictor than NDVI (Model 2), as its estimated predictive power (effect size) was comparatively stronger.

In the next sections we outline the characterisation and propagation of the cumulative uncertainties of the telemetry and time series data used by Standfuß et al. (2022), to quantify their impact on the results of the two habitat selection analyses. We adhere to the procedure presented in the “Guide to the Expression of Uncertainty in Measurement” (GUM) (JCGM 2008a) as well as one of its supplements (JCGM 2008b).

Characterisation of input data uncertainties

For the telemetry data, we focussed on uncertainties introduced through irregular sampling (bad weather/failed location attempts) and location inaccuracies (landscape and GNSS dependent). The e-obs data used were already pre-calibrated for positional error. Thus, every simulation with a CTSP movement model resulted in a prediction of one potential movement path taken by a stork during foraging, while accounting for irregular sampling and positional errors (Fleming et al. 2015). In each simulation set derived per foraging bout, all the paths taken together then indicate the probability that a stork foraged in a given area for a given time (Fleming et al. 2015). In Standfuß et al. (2022), foraging habitat use and all associated variables (presence-absence) were assessed on the basis of foraging time per field. By combining the sets of simulated movement paths with the field geometry dataset, we could generate probability distributions of the foraging times per field/stork during a foraging bout (Fig. 2C). These gave an estimate of the resulting uncertainty in the calculated foraging times.

With respect to the time series, we considered two sources of uncertainty; (1) the degree of doubt about the NDVI values at each temporal sample and (2) the irregular temporal sampling of NDVI.

(1) We considered the degree of doubt about the NDVI values of the temporal samples to be the result of radiometric uncertainty in the Landsat data. This radiometric uncertainty should propagate and lead to a certain spread in NDVI values when calculating the index (Graf et al. 2023). However, information on radiometric uncertainty of surface reflectance is not reported for Landsat data, and calculation using sensor calibration information is both challenging and an active field of research (e.g. Gorroño et al. 2023). It is therefore beyond the scope of this and other applied studies to date. To still approximate the spread of NDVI per sample day, we therefore proposed a data-driven approach based on subsampling the temporal NDVI samples per field. First, we drew several unordered sets (without replacement) of six temporal samples (the minimum required to approximate intra-annual vegetation phenology (Franke et al. 2012)) from the available temporal NDVI samples (Fig. 3A). The number of these subsample sets depended on the total number of available temporal NDVI samples per field. We set the minimum number of required subsample sets per field to > 75. The latter was a compromise between a reasonably sufficient approximation of the spread of NDVI values and minimal data loss (compared to Standfuß et al. (2022) in terms of the number of considered fields: Supplement A). We therefore relied on fields with ≥ 9 temporal NDVI samples, which allowed the generation of at least 84 subsample sets per field. Based on the temporal NDVI samples of the subsample sets, we generated several intra-annual NDVI profiles (1-day resolution) (Fig. 3B) per field using thin-plate spline curve fitting provided in the R package ‘fields’ (Nychka et al. 2017). The set of all NDVI profiles of a field allowed us to approximate probability distributions of possible NDVI values per day (Fig. 3C). The latter provided an estimate of the degree of doubt about the NDVI values on each temporal sampling day.

Fig. 3
figure 3

Schematic overview of the approach used to approximate radiometric uncertainties of the temporal NDVI samples per field. A shows a selection of subsample sets derived based on the original temporal NDVI samples of a field. B displays the NDVI profiles that were fitted per subsample set as well as the original fitted NDVI profile. C shows the probability distribution of the NDVI values of a sampling day (DOY 220) and the kernel density estimate (KDE), which mathematically characterises the NDVI scatter and thus, the radiometric uncertainty

(2) Uncertainties due to irregular temporal NDVI sampling were not approximated by probability distributions. We addressed these indirectly during uncertainty propagation (next section).

To mathematically describe the estimated degree of doubt about the NDVI values per temporal sampling day and about the foraging times per field/foraging bout, we derived probability density functions (PDFs) from the probability distributions using kernel density estimation (KDE) (Figs. 2 and 3C). KDE is a non-parametric method that does not require knowledge of the shape of the underlying distribution, and has proven advantageous in data-driven applications (Lataniotis 2019). Only the bandwidth h from which the PDFs are derived needs to be specified. Here, we used the R package ‘Kedd’ (Guidoum 2015) to determine appropriate h values through maximum-likelihood cross-validation, and then derived the PDFs through KDE using the R package ‘KernSmooth’ (Wand 2020).

Propagation of input data uncertainties

Below, we describe the propagation of the uncertainties in the telemetry and time series data to quantify their impact on the results of analyses A and B. We refer to the temporal NDVI samples per field and the field-wise foraging times per foraging bout, i.e. the input data of Standfuß et al. (2022), as Xfields and Xforaging_bouts, respectively. Similarly, we refer to the original results of analyses A and B as YA and YB.

Since the PDFs characterising the uncertainties of Xfields and Xforaging_Bouts could not be characterised by Gaussian or T-distributions, the probability distributions of YA and YB could not a priori be described through these distributions either. We therefore used Monte Carlo Simulation (MCS) to propagate the input data uncertainties (JCGM 2008b). The MCS repeatedly draws random samples – until convergence – from the input data PDFs and uses them to compute different estimates of the analysis outputs. By accumulating the latter, it is then possible to obtain an estimate of the PDF of an output quantity and thus characterise its uncertainty.

In order to draw samples from an input data PDF, we used inverse transform sampling (Fig. 4: Step B). Given an input variable defined over a domain Ω, we estimated its Cumulative Density Functions (CDFs) −\(f: \Omega \to [0;1]\)- by integration of their PDFs. In each MCS trial Mi, we then obtained input data samples—Xfields Mi and Xforaging_bouts Mi—by drawing random numbers—\({\xi }_{i}\)—from a uniform distribution in [0;1] and computing \({{f}_{e}}^{-1}({\xi }_{i})\). A single \({\xi }_{i}\) was drawn for each independent input entity, i.e., one for every field i of Xfields and one for every foraging bout i of Xforaging_bouts. This \({\xi }_{i}\) was used to compute samples e, either of the NDVI values of the temporal samples of a field i or of the foraging times per field during a foraging bout i.

Fig. 4
figure 4

Overview of the workflow (from top to bottom) for propagating input data uncertainties (telemetry and time series) (steps A–D) using Monte Carlo Simulation (MCS) to quantify the output uncertainties (results of analyses A and B) (step E). Step A applies only to the time series data and approximates uncertainties due to irregular temporal NDVI sampling by first defining the number (A_1) of temporal samples to be considered per field and then randomly selecting dates (day of the year) for these within the range of observation dates (A_2). Within step B inverse transform sampling is used to draw samples of the foraging times per field/foraging bout (telemetry) and of the NDVI values (time series) of the temporal samples defined in step A. Step C then uses the samples to derive one realization (X Mi) of each input dataset (telemetry/time series). All N input data sets from one MCS analysis (X M1-N) are then used in analyses A and B (step D) to derive probability distributions of the analysis results (step E). The latter provide the basis for the calculation of measures that quantify the output uncertainty (step E)

Within each MCS trial Mi, we then used the drawn NDVI values of the temporal samples of each field iXfields Mi—to first fit an intra-annual NDVI profile (1-day resolution) and then identify the half-maximum dates per field. These in turn provided the basis for distinguishing the periods of favourable/poor prey accessibility and for deriving the HM-amplitude (Fig. 4: Step C). Additionally, we generated presence-absence locations (fields) based on the sampled foraging times per field/foraging bout iXforaging_bouts Mi. Finally, the above datasets were used as input for analyses A and B to calculate one realisation of YA Mi and YB Mi (Fig. 4: Step D). Taken together, the results of all MCS trials (N) characterise the probability distributions of YA and YB, i.e. P(YA) and P(YB) (Fig. 4: Step E).

We conducted two sets of MCS analyses to answer the research questions posed in this study. These differed in terms of how the uncertainties due to irregular NDVI sampling (Fig. 4: Step A) (e.g. missing NDVI samples due to high cloud cover on certain dates) were addressed.

To assess the general validity of the original outcomes YA and YB and thus, of the time series-based differentiation of prey accessibility (RQ1), we performed a single separate MCS analysis (MCS_a). In each MCS trial Mi, we first randomly selected a number n of (at least six) temporal samples e of each field i (Xfields) within the range of observation dates (Fig. 4: Step A_1/A_2). Subsequently, we drew the NDVI values for all e in n (Fig. 4: Step B).

To evaluate the influence of the number of temporal NDVI samples on the analysis outcomes YA and YB (RQ2), we performed five separate MCS analyses (MCS_b-MCS_f). In each MCS analysis, we specified the number n of temporal NDVI samples e to be considered – six up to ten – for all Xfields beforehand and only selected their dates randomly during each MCS trial Mi (Fig. 4: Step A_1/A_2). To ensure comparability of the results of the five MCS analyses and enable a certain variation of the randomly selected dates, we only considered fields with ≥ 13 temporal NDVI samples to investigate RQ2. This decision allowed us to reproduce the original results despite further data loss (compared to the number of fields considered to assess RQ1: Supplement A), but limited the testable number n of temporal samples to a maximum of ten.

In order to minimise the number of MCS trials N and thus the computational time for each MCS analysis, we defined a convergence criterion. We calculated the standard deviation (SD) of the simulated results YA and YB after every 100th MCS trial and computed the difference between successive SDs (SD_diff). Convergence of an MCS analysis was achieved when the SD_diff was ≤ 0.001 three times in a row. The latter ensured that the SD_diff was not the result of randomly drawing similar samples within a batch of 100 simulations. Following this procedure, we conducted 1,300 MCS trials in the single separate MCS analysis (MCS_a) to address RQ1, and 1,100 (seven temporal samples) to 1,600 (ten temporal samples) trials in the five MCS analyses (MCS_b-MCS_f) to address RQ2.

Quantification of output uncertainties

Finally, we calculated three summary statistics (JCGM 2008a) based on P(YA) and P(YB) to quantify the output uncertainties (Fig. 4: Step E). The mean provides an estimate of the output quantities and the standard deviation (SD) is a proxy for their standard uncertainty. In addition, the spread of the output quantities can be characterised through tolerance intervals. These statistical intervals cover a pre-defined proportion of the population—here of P(YA) and P(YB)—with a certain level of confidence (coverage probability) (Meeker et al. 1991).

We derived mean, SD and 99%/95% tolerance intervals (covering 99% of the population with 95% confidence) based on P(YA) and P(YB) of each MCS analysis (RQ1: MCS_a and RQ2: MCS_b-MCS_f). As the probability distributions of the outputs were neither characterizable through Gaussian nor T-distributions, we computed the tolerance intervals with a non-parametric approach using the R package ‘tolerance’ (Young 2010).

RQ1:

Is the time series-based differentiation of prey accessibility still valid when accounting for cumulative input data uncertainties?

To answer RQ1, we investigated whether the results obtained after accounting for the uncertainties of the input data still coincided with the original results YA and YB. Hence, we tested whether the former still indicate that storks preferentially selected habitats with favourable prey accessibility and/or avoided habitats with poor prey accessibility when foraging. Moreover, we evaluated the predictive power (effect size, direction and significance) of the NDVI and the HM-amplitude with respect to the uncertainties.

RQ2:

Does the number of temporal NDVI samples influence the time series-based differentiation of prey accessibility?

RQ2 was examined in two ways. Similar to RQ1, we first tested if the results from the different MCS analyses (MCS_b-MCS_f) were still in line with the original results. Second, we evaluated whether the spread of P(YA) and P(YB) decreased/changed with more temporal samples.

In this study, we relied on subsets of the original data to investigate the effects of cumulative input data uncertainties. Specifically, we used a dataset with fields with ≥ 9 and ≥ 13 temporal NDVI samples to investigate RQ1 and RQ2, respectively. To assure consistency in the aforementioned comparisons, we recalculated the original results YA and YB based on these subsets using the original input data values.

Results

RQ1: Is the time series-based differentiation of prey accessibility still valid when accounting for cumulative input data uncertainties?

Analysis A Fig. 5 shows the probability distributions of the estimated proportions with which the storks selected/avoided foraging habitats (fields) with favourable/poor prey accessibility (hereafter termed favourable/poor habitats) in the breeding areas. After accounting for the uncertainties, the PDFs indicated that the birds selected favourable habitats over poor habitats in both, croplands and grasslands (Fig. 5A/B). At the same time, we found that poor habitats were avoided more often than favourable habitats in croplands but not in grasslands. The above-mentioned proportions could be distinguished from each other, as indicated by their mostly non-overlapping tolerance intervals. Moreover, the distributions of most of the estimated proportions of selection/avoidance of favourable/poor foraging habitats were consistent with those found in the original results by Standfuß et al. (2022) and fell within their 99% tolerance interval, except for ‘avoidance of poor habitats’ in grasslands (Supplement B).

Fig. 5
figure 5

Proportion(s) of days (during the breeding season 2014) (x-axis) on which the studied storks selected/avoided foraging habitats with good/poor prey accessibility in croplands (A) and grasslands (B). Each plot shows both, the original proportions (vertical lines) from Standfuß et al. (2022) as well as the proportions calculated considering the cumulative input data (telemetry and time series) uncertainties (quantified by tolerance intervals and probability density distributions)

Analysis B The probability distributions of the estimated effect size of the four predictors—nest distance, landcover, HM-amplitude and NDVI—used to model the field-wise foraging habitat selection by storks are shown in Fig. 6. Nest distance and landcover appeared to be only marginally affected by the input data uncertainties, as indicated by their narrow tolerance intervals in both, Model 1 and Model 2 (Fig. 6A/B). In comparison, the width of the tolerance interval and the standard uncertainty (SD) (Supplement: C) were larger for HM-amplitude (Model 1) and NDVI (Model 2). Overall, we observed a strong positive effect of HM-amplitude on stork habitat selection (Fig. 6A), indicated by the estimated effect size of 3.26 and the tolerance interval (2.61–3.92) not overlapping with zero (no effect). Moreover, this estimated effect size was only slightly smaller than its original counterpart (3.46) found by Standfuß et al. (2022). NDVI, on the contrary, had a negative effect on foraging habitat selection in both, the estimated and the original effect size (Fig. 6B). Nevertheless, our estimated effect size (− 0.67) was markedly larger (closer to zero) than the original effect size (− 1.54). This difference was so pronounced that the original effect size fell outside the estimated tolerance interval (− 1.34–0.04). In addition, we observed that the upper end of the tolerance interval of NDVI was approaching zero, indicating that the predictor did not have a significant effect on stork habitat selection in all MCS trials. Accordingly, we found nest distance, landcover and HM-amplitude to be significant (p-value < 0.001) in all 1,300 (100.00%) MCS trials while NDVI was only significant in 222 (17.0 8%) of them.

Fig. 6
figure 6

Effect size and direction (x-axis) of the four predictors—HM-amplitude (A), NDVI (B), nest distance (A/B) and landcover (grassland) (A/B)—used to model stork foraging habitat selection of white storks during the 2014 breeding season. Each plot shows both, the original effect size and direction (vertical line) from Standfuß et al. (2022)as well as the effect size and direction (mean) that were determined under consideration of the input data (telemetry and time series) uncertainties (including tolerance intervals and probability density distributions)

RQ2) Does the number of temporal NDVI samples influence the time series-based differentiation of prey accessibility?

Analysis A Fig. 7 shows the estimated proportions by which storks selected/avoided favourable/poor foraging habitats (fields) (x-axis) in the breeding areas as a function of the number of temporal NDVI samples considered per MCS analysis (y-axis). In croplands, the estimated proportions and tolerance intervals provided evidence that storks selected favourable habitats more often than poor ones, and vice versa, avoided poor habitats to a greater extent than favourable ones (Fig. 7A). This pattern was consistent with the original results of Standfuß et al. (2022), which fell within the tolerance intervals, and appeared to be independent of the number of temporal NDVI samples considered. Furthermore, the scatter associated with uncertainties of the estimates was similar among the results of the MCS analyses and decreased only slightly with more temporal samples. In grassland, however, a clear preference for favourable over poor habitats only emerged with ten temporal NDVI samples (Fig. 7B). When fewer temporal samples were considered, the estimates showed only a slight preference for favourable foraging habitats and were not consistent with the original results. The latter fell even outside the tolerance intervals in some cases. Moreover, when only six to nine temporal samples were used, the uncertainty was greater compared to ten temporal samples. This was reflected by wider and highly overlapping tolerance intervals and larger SDs (Supplement D).

Fig. 7
figure 7

Estimated proportion(s) (mean and tolerance intervals (99%/95%)) of days (during the 2014 breeding season) (x-axis) on which the studied storks selected/avoided foraging habitats with good/poor prey accessibility in croplands (A) and grasslands (B) as a function of the number of temporal NDVI samples considered per MCS analysis (y-axis: 6 to 10 samples). In addition, the original proportions (vertical lines) of Standfuß et al. (2022) are plotted

Analysis B Fig. 8 shows the estimated effect sizes of the predictors used to model the field-wise foraging habitat selection (x-axis) as a function of the number of temporal NDVI samples considered per MCS analysis (y-axis). In both, Model 1 and Model 2, nest distance and landcover were not affected by the number of temporal NDVI samples considered (Fig. 8A/B). We observed a strong positive effect of the HM-amplitude on foraging habitat selection, consistent with the original effect size in Standfuß et al. (2022), regardless of the number of temporal NDVI samples (Fig. 8A). This effect became most pronounced when nine to ten temporal NDVI samples were considered. The NDVI showed only a trend towards a small negative effect on foraging habitat selection in all MCS analyses. Furthermore, the effect size estimated through the different MCS analyses was significantly larger (smaller effect) than the original effect size from Standfuß et al. (2022), which fell always outside the estimated tolerance intervals (Fig. 8B). However, the effect size became more pronounced (smaller) when nine to ten temporal NDVI samples were used. With fewer temporal samples, the upper ends of the tolerance intervals of NDVI approached zero (no effect). Overall, nest distance, landcover and HM-amplitude were significant predictors in more than 99% of the MCS trials of all MCS analyses. In contrast, NDVI was not a significant predictor in most of the individual MCS trials of the different MCS analyses. However, as more temporal NDVI samples were considered, the proportion of trials in which NDVI was a significant predictor increased from 11.47% (six samples) to 40.55% (ten samples) (Supplement: E).

Fig. 8
figure 8

Estimated effect size and effect direction (x-axis) (mean and tolerance intervals (99%/95%)) of the predictors used to model stork foraging habitat selection as a function of the number of temporal NDVI samples considered per MCS analysis (y-axis: 6 to 10 temporal NDVI samples). The four predictors considered are HM-amplitude (A), NDVI (B), nest distance (A/B) and landcover (grassland) (A/B). In addition, the original effect size and effect direction of the predictors (vertical lines) as found by Standfuß et al. (2022) are shown

Discussion

We have proposed a data-driven approach that allows the approximation of time series uncertainties in scenarios where official information on these uncertainties is lacking. Considering the time series-based differentiation of prey accessibility for storks by Standfuß et al. (2022) as an example, we applied our approach to characterise uncertainties of Landsat-based NDVI time series. Additionally, we used CTSP movement modelling to estimate uncertainties of telemetry data. This allowed us, for the first time, to assess whether the results of an animal-environment study are robust to cumulative uncertainties of the two input data sources. Our results showed that after accounting for input data uncertainties, periods of favourable/poor prey accessibility were well discriminated, with storks showing the expected degree of preference/avoidance for them. However, in grasslands rather than croplands, we found that more temporal NDVI samples were needed to reliably identify these periods. Furthermore, NDVI derived from fitted NDVI profiles did not appear to be a coherent predictor of stork habitat selection when uncertainties were considered. The latter two points highlight the importance of assessing the impact of uncertainties in input data, which is essential for validating results and identifying shortcomings. Below, we discuss our findings and the proposed approach for characterising time series uncertainties.

RQ1:

Is the time series-based differentiation of prey accessibility still valid when accounting for cumulative input data uncertainties?

We demonstrated a high degree of agreement between our estimated results—of analyses A and B—from an MCS analysis and the original results by Standfuß et al. (2022). However, we also identified inconsistencies, mainly related to the NDVI (analysis B). Against our expectations, we found that our estimated effect size of NDVI had less predictive power than the original effect size. In fact, the latter fell outside the estimated tolerance interval. This behaviour was likely due to the subsampling approach we used to approximate the uncertainty due to irregular temporal NDVI sampling. Specifically, we took subsample sets of the temporal NDVI samples within the range of the original observation dates. Most of these sets consequently captured the impact of fewer temporal samples compared to the original number of samples for a given field. The effect of a higher number of temporal samples could not be considered. Although not optimal, our results indicated that NDVI is more sensitive to a decrease in sample size than the HM-amplitude. Both predictors were derived from the fitted NDVI profiles. However, the former took the fitted NDVI value of a given day, whereas the latter considered the shape of the whole NDVI profile instead. Our results suggest that despite variations in the daily NDVI values, the shape of the intra-annual NDVI profile often remained stable even when fewer temporal NDVI samples were used for fitting. The inferred periods of favourable/poor prey accessibility were therefore likely consistent with the originally identified periods. This would explain the better and more robust performance of HM-amplitude in habitat selection modelling, and the discrepancy in predictive power and significance of NDVI between the original (Standfuß et al. 2022) and our uncertainty-adjusted estimate.

Besides, we found that nest distance and landcover are hardly affected by the uncertainties addressed. As these predictors were not derived from the NDVI time series, they were only sensitive to the telemetry data uncertainty. E-obs telemetry data have a mean spatial uncertainty of ~ 10 m (Rotics et al. 2016), which is further amplified by the uncertainty associated with irregular temporal sampling (i.e. 5 min at best). However, these uncertainties do not seem to have led to a high degree of spatial mismatch. Consequently, we found little scatter in the estimates of nest distance and landcover.

RQ2:

Does the number of temporal NDVI samples influence time series-based differentiation of prey accessibility?

Our results showed that the number of temporal NDVI samples used to fit the NDVI profiles affected the validity of the time series-based differentiation of prey accessibility differently. In analysis A, we found that the results for grasslands were strongly dependent on the number of temporal NDVI samples. A possible explanation is the higher complexity of the intra-annual NDVI profiles of grasslands compared to croplands. Croplands in Europe are mainly harvested once a year (Estel et al. 2016), resulting in an intra-annual NDVI profile that increases to a (single) peak during the growing season and then declines due to senescence/harvesting practices (Veloso et al. 2017). Intensively cultivated grasslands, are mown twice or more and can have a more dynamic profile with multiple peaks (Itzerott and Kaden 2006; Franke et al. 2012). Conversely, this means that if fewer—six to eight—temporal NDVI samples are considered to fit the NDVI profiles, the mowing events may not always have been adequately captured (Griffiths et al. 2020). This may have affected the temporal distribution of identified periods of favourable/poor prey accessibility in grasslands and was a likely reason for the observed discrepancy with the original results of Standfuß et al. (2022). Our results for analysis B suggest that nest distance, landcover and HM-amplitude were relevant predictors of stork foraging habitat selection. HM-amplitude was even significant regardless of the number of temporal NDVI samples considered. The contribution of NDVI was less clear, but appeared to become more stable as more temporal data samples were used. A possible reason could be that the difference between the fitted NDVI profiles per field derived during the different trials per MCS analysis became smaller when more temporal samples were used. This in turn would have led to a smaller scatter of NDVI values per day and thus, a higher robustness in the predictive power of NDVI. In particular, the more complex the grassland profiles were, the more likely it was that they could be captured more reliably with a larger number of temporal NDVI samples. This would also explain why we observed improvements from nine temporal NDVI samples onwards; an increase in the predictive power/significance of NDVI and HM-amplitude (analysis B) and a more pronounced preference for grassland habitats with favourable prey accessibility (analysis A).

While we have shown a way to assess the effects of cumulative input data uncertainties in animal-environment research, some methodological challenges remain. First, the suggested subsampling approach may not be the most appropriate method for handling time series uncertainties. However, it may be the only option when official information about these uncertainties is lacking. In such cases, our approach enables to obtain a preliminary indication of the stability of the environmental predictors/proxies and of whether they are experiencing strong fluctuations that may reduce their significance in explaining habitat selection. However, radiometric uncertainty is the result of, amongst others, instrument noise and calibration uncertainties, that can be scene and time dependent (Janesick 2001; Chander et al. 2013). These factors have not been considered in the proposed subsampling method. For Sentinel-2 data, more sensor-specific calculation of radiometric uncertainty for Level-1 (top-of-atmosphere) data is already feasible (Gorroño et al. 2018) and is currently being extended to include some effects of atmospheric correction (Level-2 – surface reflectance) (Gorroño et al. 2023). Future research should therefore explore the use of these more causal estimates of radiometric uncertainty. For datasets where such information is not readily available, it is recommended to consider extending our proposed subsampling technique. One option could involve KDE weighting of the PDFs with known data quality layers to characterise the input data uncertainty in a more sensor-oriented manner. In addition, the increasing availability of time series with high spatial (10–30 m) and temporal resolution (~ 2 days) provides new opportunities to assess the effects of irregular temporal NDVI sampling. Using these data, future studies could not just explore the impact of the number of samples but also of their temporal distribution.

Second, the MCS analysis is a computationally intensive method. In our study we examined a relatively small study area, the breeding territories of 18 storks. In total, we performed six MCS analyses, each of which required more than 1,000 runs and several days of computation to converge. This will be challenging in studies covering larger areas, such as several countries or even continents. Future research should therefore look at ways of optimising the calculation, in addition to pure programming refinements. One approach could be to test alternative sampling schemes, such as Latin Hypercube or Quasi-Monte Carlo, which have been shown to converge faster than classical Monte Carlo approaches (Soboĺ 1990; Singhee and Rutenbar 2010).

Conclusion

Human-induced environmental change is threatening many species with extinction (Ceballos et al. 2015). After a long series of political efforts to protect the world’s biodiversity, the Kunming-Montreal Global Biodiversity Framework was released in December 2022, outlining, amongst others, targets that require immediate attention (CBD 2022). One focus is on the design of comprehensive conservation efforts, which depend on a sound understanding of how animals interact with their environment. Of particular interest are vegetation characteristics (Timmermans and Daniel Kissling 2023), which are often linked to the availability of food, shelter and nesting sites in animal habitats (He et al. 2015). Sensors such as Sentinel-2 and/or Landsat, offer an unprecedented potential to infer vegetation characteristics in great detail (e.g., through NDVI or half-maximum related proxies). Combining this information with animal movements from telemetry data then allows comprehensive studies of habitat selection, which provides the basis for predicting species distributions (Northrup et al. 2021). However, uncertainties in both data sources can lead to erroneous conclusions about the importance of environmental predictors and thus to uninformative predictions (Stoklosa et al. 2014). The impact of input data uncertainties on habitat selection and species distribution modelling should therefore be quantified to develop informed conservation measures (Jansen et al. 2022). However, to date, studies based on time series and telemetry data have only assessed the effect of uncertainties in telemetry data, not least because time series uncertainties are often not reported or difficult to obtain. As a fallback solution, we have proposed a data-driven approach to approximate time series uncertainties. In addition, we performed a first quantitative assessment of the impact of cumulative uncertainties of time series and telemetry data based on the study by Standfuß et al. (2022). We showed that their discrimination between periods of favourable/poor prey accessibility for storks was relatively robust. However, we also found that NDVI was significantly affected by uncertainties in the input data. This is not necessarily a generalisable finding and should be investigated on a case-by-case basis, as this behaviour may be different in other locations or with other data. Nevertheless, it demonstrates the importance of considering input data uncertainties to avoid false conclusions and misguided conservation efforts. Although further research is needed, we hope that our study provides a starting point for assessing the cumulative uncertainty of time series and telemetry data in animal-environment studies.