1 Introduction

Models describing the distribution patterns of species are important scientific tools for improving our understanding of biodiversity and species abundance, thereby enabling informed decisions for sustainable biodiversity management. This significance is particularly pronounced in the marine environment, where numerous species, habitats, and ecosystems have experienced substantial declines (McCauley et al. 2015), also due to climate change (Hoegh-Guldberg and Bruno 2010; Pires et al. 2018).

To effectively monitor fishery resources, spatiotemporal analyses of abundance indexes have been developed [e.g., Paradinas et al. (2017); Pennino et al. (2020); Dambrine et al. (2021)]. Particularly, species distribution modeling (SDM) has been employed to comprehend species dynamics, identify and assess the impacts of climate change, define species habitats, and design protected areas (Martínez-Minaya et al. 2018).

Several approaches have been established in the realm of SDM over the past few decades. SDMs can be employed to establish connections between species occurrence and/or abundance processes and environmental conditions. They also aid in identifying species’ ecological niches and drawing inferences regarding species distribution. Leathwick et al. (2005) employed multivariate adaptive regression splines to elucidate nonlinear relationships between environmental factors and species in New Zealand’s freshwater ecosystems. Reiss et al. (2011) and Attorre et al. (2011) conducted comparative studies involving diverse SDM techniques, encompassing regression, classification, envelope models, machine learning approaches, and spatial interpolation models. Their studies evaluated the environmental impacts on marine benthic species in the North Sea and tree species in the Italian Peninsula, respectively. To accommodate zero-inflated data within six bird species abundance indexes, Joseph et al. (2009) utilized zero-inflated N-mixture models. The growing recognition of the importance of integrating prior knowledge about species ecology and complex dynamics into the modeling process has driven an increased adoption of Bayesian approaches (Martínez-Minaya et al. 2018). Notably, negative binomial hidden Markov models were applied by Spezia et al. (2014), and hierarchical Bayesian models by Paradinas et al. (2017) and Adde et al. (2020).

Species distribution data often exhibit residual spatial autocorrelation, indicating that observations are not conditionally independent in space. This spatial autocorrelation frequently arises from the omission of significant environmental factors, such as climate conditions that influence species distribution, as well as from intrinsic factors like competition, dispersal, and aggregation (Miller 2012; Guélat and Kéry 2018). Indeed, applying spatial and non-spatial methods to the same dataset can yield different conclusions (Kühn 2007; Kneib et al. 2008). Moreover, a simulation study conducted by Guélat and Kéry (2018) revealed that neglecting spatial autocorrelation can lead to erroneous predictions, even when dealing with extensive datasets. Hence, it is imperative to account for spatial autocorrelation in SDM (Martínez-Minaya et al. 2018). However, certain case studies may involve regions of interest with distinctive shapes or boundaries, such as coastlines, introducing physical barriers. In such cases, the application of classical spatial models like the Matérn field may be misleading due to inherent smoothing over these physical barriers (Bakka et al. 2019).

In addition to the spatial distribution, the temporal scale constitutes an important component that merits consideration within the modeling process, since species abundance varies in both time and space (Hefley and Hooten 2016) and its temporal evolution is of ecological interest (Paradinas et al. 2017; Martínez-Minaya et al. 2018). Time series analysis operates on a principle similar to that of spatial statistics: observations closer in time are more closely related than distant ones.

The objective of the present study is to estimate the spatiotemporal distribution of the European sardine (Sardina pilchardus, Walbaum 1792) in the Northern part of the Canary upwelling system (Portuguese shelf). The study seeks to pinpoint the environmental drivers influencing sardine spatial dynamics, comprehend sardine dynamics across both time and space, and characterize the study region based on sardine occupancy.

The European sardine, a small pelagic fish, inhabits the eastern North Atlantic Ocean (from the North Sea to Senegal), the Mediterranean Sea, the Sea of Marmara, and the Black Sea (Whitehead 1985). It holds socioeconomic importance for Portugal and Spain, emerging as a key species for the purse-seine fishery (Mendes et al. 2019). The late 1990s witnessed a positive phase in the sardine stock, whereas the past decade recorded persistently low biomass levels (ICES 2017; Pennino et al. 2020; Izquierdo et al. 2022), and recent years have witnessed a revival (Cabrero et al. 2019). These fluctuations may be attributed to variability in oceanographic conditions, including climate-driven changes that have been demonstrated to affect small pelagic fish generally (Checkley et al. 2017), and the dynamics of the European sardine population specifically (Cabrero et al. 2019; Garrido et al. 2017).

Numerous studies have been undertaken to unravel the intricate aspects of sardine distribution, its habitat preferences, and its interplay with the marine ecosystem. Noteworthy examples include investigations into the effects of environmental conditions on sardine abundance and distribution in Mediterranean waters (Bellido et al. 2008; Voulgaridou and Stergiou 2003; Gordó-Vilaseca et al. 2021), the northwest coast of Africa (Bacha et al. 2017), the Alboran Sea (Jghab et al. 2019), the Bay of Biscay (Alvarez and Chifflet 2012), and the Portuguese continental coast and the northern Spanish waters (Santos et al. 2012). Despite the socioeconomic importance of this species, the understanding of sardine distribution on the Portuguese shelf remains limited. While Santos et al. (2012), Zwolinski et al. (2010), and Rodríguez-Climent et al. (2017) examined the temporal distribution of sardine and its association with environmental conditions, the spatial dimension was overlooked. In a different vein, Izquierdo et al. (2022) employed a Bayesian spatiotemporal approach to model the standardized sardine catch-per-unit-effort (CPUE) along the west coast of Portugal. Nonetheless, strong relationships between sardine CPUE and environmental conditions remained elusive.

This paper proposes a hierarchical spatiotemporal SDM designed to estimate the sardine distribution. The modeling of spatiotemporal dynamics for species distribution poses significant challenges, stemming from the growing complexity of the data, data collection methodologies, and the specific characteristics and evolutionary patterns of each species. Moreover, addressing hurdles such as the semi-continuous nature of the response variable, excessive zero values, discrepancies between presence and biomass processes, and the complex geomorphology of the study region adds further complexity. To effectively address these challenges, we propose a two-part model that decomposes the problem into a series of interconnected levels through probability functions. Our approach adopts a Bayesian framework for modeling species distribution, with the inference process facilitated by the integrated nested Laplace approximation (INLA) methodology (Rue et al. 2009). INLA enables the approximation of posterior marginals of latent Gaussian fields (GFs). Additionally, we use the stochastic partial differential equation (SPDE) technique to approximate a Barrier spatial GF (Bakka et al. 2019) with Matérn covariance functions by a discretely indexed Gaussian Markov random field (GMRF) (Lindgren et al. 2011). This choice avoids the computationally intensive factorization of dense covariance matrices. Moreover, our proposed model encompasses spatiotemporal and temporal effects, incorporating environmental conditions from the recent past. This facet permits the suggestion of different time lags, each accompanied by associated weights for every covariate. Finally, the introduced framework facilitates the construction of maps of species occupancy that can provide valuable insights for decision makers striving for marine sustainability.

This manuscript is structured into four main sections. Section 2 outlines the data, the study area, and the methodology employed. In Sect. 3, we present the obtained results, followed by a comprehensive discussion of the findings in Sect. 4.

2 Material and Methods

2.1 Data

The spatial distribution of sardine biomass was evaluated based on annual spring acoustic surveys from the Portuguese spring acoustic (PELAGO) series, conducted by the Portuguese Institute for the Sea and Atmosphere (IPMA) in Continental Portuguese waters (Fig. 1). Positioned at the northern extent of the Canary upwelling ecosystem, this marine region exhibits distinct topographic and oceanographic features along its western (west of 9\(^\circ \)W) and southern (east of 9\(^\circ \)W) coasts.

On the west coast, the presence of coastal upwelling dominates during spring/summer, characterized by a southward flowing upwelling jet over the shelf, cold water fronts, and filaments. These features arise due to the persistent northerly winds (Relvas et al. 2007; Teles-Machado et al. 2016). In contrast, winter on the west coast is dominated by the Iberian Poleward current, transporting warmer and saltier waters northward. The season is marked by warm fronts, eddies (Teles-Machado et al. 2016), and buoyant plumes of lower salinity arising from enhanced winter runoff (Peliz et al. 2002).

The southern coast is demarcated by Cape Santa Maria, a geographical boundary segregating distinct oceanographic regimes. To the west, a cyclonic cell, representative of the spring/summer circulation, propels west coast upwelling waters eastward into the Gulf of Cadiz (García-Lafuente et al. 2006).

Fig. 1 Study area map. Dashed and black lines indicate bathymetric contours at 100 m and 200 m depths, respectively

The main objective of this survey series is to monitor the spatial distribution of abundance, biomass, and various biological parameters of sardine and other small pelagic fish. The survey design entails continuous daytime acoustic measurements along parallel transects, facilitated by a calibrated 38-kHz echosounder. To process the data, the resulting backscatter from the water column is integrated and averaged over 1-nm (nautical mile) intervals and expressed as nautical area-scattering coefficients [NASC; \(S_A\) (in m\(^2\)nm\(^{-2}\))]. The inter-transect distance is 6 nm. The methodology underpinning the PELAGO series is detailed in Doray et al. (2021). Throughout the analyzed timeframe, 2000–2020 (with a gap in 2012), a total of 16,370 sardine NASC values were collected (refer to Table 1 of Supplementary Material S.1). Each NASC value is proportional to fish density and is henceforth adopted as a proxy for biomass at a specific pair of coordinates (longitude and latitude). A visual representation of the data’s spatial distribution is available in Fig. 2.

Fig. 2 Maps depicting the sardine biomass index (NASC, \(m^2 nm^{-2}\)) for each annual PELAGO survey conducted between 2000 and 2020 in Portuguese waters

To explore the relation between sardine distribution and environmental variables, a comprehensive dataset of daily environmental information was procured from the COPERNICUS server (https://resources.marine.copernicus.eu/products), encompassing the study region and time frame. Specifically, the dataset includes satellite-derived sea surface temperature (SST) measured in degrees Celsius (ESA SST CCI and C3S reprocessed sea surface temperature analyses https://doi.org/10.48670/moi-00169), chlorophyll-a concentration in \(mg~m^{-3}\) (Global Ocean Colour project https://doi.org/10.48670/moi-00281), and bathymetry in meters together with the intensity in \(m~s^{-1}\) and the direction in degrees of ocean currents (Atlantic-Iberian Biscay Irish-Ocean Physics Reanalysis https://doi.org/10.48670/moi-00029).

The Portuguese coastline undergoes an abrupt change in direction, forming an “L” shape. This geographical feature introduces significant biomass index variability, particularly in the meridional (N-S) orientation along the west coast and in the zonal (E-W) orientation along the south coast (refer to Fig. 2). This distinct pattern prompted the incorporation of a binary covariate, representing the coast with values “south” and “west”, to capture discrepancies not accounted for by the environmental conditions.

2.2 Spatiotemporal Species Distribution Model

2.2.1 Two-Part Model

The species biomass distribution is articulated as the outcome of two distinct components: the presence distribution and the biomass distribution under presence (Eq. 1).

Let \(Y_{\textbf{s}t}\) be the spatiotemporally distributed biomass process at year \(t=1,\ldots ,T\) and location \(\textbf{s} \in \mathfrak {D} \subset \mathbb {R}^2\) where \(\mathfrak {D}\) represents the study region. \(Z_{\textbf{s}t}\) denotes the presence sub-process, taking the binary value 0 if no species was observed at location \(\textbf{s}\) and year t, and 1 otherwise. That is, NASC \(=0\) indicates absence, while positive NASC values signify presence. \(Y_{\textbf{s}t} \vert (Z_{\textbf{s}t}=1)\) takes the positive value of biomass index (i.e., positive values of NASC) observed at location \(\textbf{s}\) and year t. Consequently, the distribution of the process of interest, the species biomass index (with \(\left[ . \right] \) denoting a distribution in Eq. 1), is given by:

(1)

Considering the semi-continuous nature of the data, the process governing presence is assumed to follow a Bernoulli distribution with probability \(\pi _{\textbf{s}t}\). For the biomass process under presence, a continuous distribution on the positive real line is required, such as the Gamma or log-Normal. In our case, we employed the Gamma distribution, parameterized by shape (\(a_{\textbf{s}t}\)) and rate (\(b_{\textbf{s}t}\)) parameters, so that its mean is \(a_{\textbf{s}t}/b_{\textbf{s}t}\).
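As a sketch of what this two-part (hurdle) structure implies, and assuming the standard mixed discrete-continuous form rather than quoting the authors' own notation, the distribution of the biomass index at an observed value \(y\) can be written as follows, where \(g(\cdot ; a, b)\) is a symbol introduced here for the Gamma density:

$$\begin{aligned} \left[ Y_{\textbf{s}t}=y \right] = {\left\{ \begin{array}{ll} 1-\pi _{\textbf{s}t}, &{} y=0,\\ \pi _{\textbf{s}t}\, g(y; a_{\textbf{s}t},b_{\textbf{s}t}), &{} y>0. \end{array}\right. } \end{aligned}$$

Under this form, and reading \(b_{\textbf{s}t}\) as a rate so that the conditional mean is \(a_{\textbf{s}t}/b_{\textbf{s}t}\), the unconditional expected biomass index would be \(\pi _{\textbf{s}t}\, a_{\textbf{s}t}/b_{\textbf{s}t}\).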

The proposed hierarchical model can be written as follows:

$$\begin{aligned} Z_{\textbf{s}t}&\sim Bernoulli (\pi _{\textbf{s}t})\nonumber \\ logit(\pi _{\textbf{s}ti})&=\alpha '+\beta ' X_{C\textbf{s}ti}+\sum _j^{p'}f'_j(K(X'_{j\textbf{s}ti},c,l))+\gamma '_t+W_{\textbf{s}t}\nonumber \\ Y_{\textbf{s}t} \vert \left( Z_{\textbf{s}t} =1 \right)&\sim Gamma (a_{\textbf{s}t},b_{\textbf{s}t})\nonumber \\ log(\mu _{\textbf{s}ti})=log(a_{\textbf{s}ti}/b_{\textbf{s}ti})&=\alpha +\beta X_{C\textbf{s}ti}+\sum _j^{p}f_j(K(X_{j\textbf{s}ti},c,l))+\gamma _t+kW_{\textbf{s}t} \end{aligned}$$
(2)

where i designates the ith day of the survey during year t. For modeling the biomass index \(\mu _{\textbf{s}ti}\) under presence, we adopted the logarithmic link function, whereas the probability of presence \(\pi _{\textbf{s}t}\) was represented through the logistic function (Eq. 2). The functions \(f_j(.)\) and \(f'_j(.)\) denote smoother functions, particularly B-splines, thin plate, and cubic regression splines. The intercepts \(\alpha \) and \(\alpha '\) serve as regression constants. \(X_{C\textbf{s}ti}\) signifies the binary covariate with a value of 0 assigned to the west part of the study region and 1 to the south part. Consequently, \(\beta \) and \(\beta '\) capture the impact of the south coast in comparison to the west coast for each respective process.

\(K(\cdot )\) represents a weighted average of environmental covariates \(X_{j\textbf{s}ti}\) observed at location \(\textbf{s}\) and day i of year t such that:

$$\begin{aligned} K(X_{j\textbf{s}ti},c,l)=\sum _{q=-l}^{l} w_{c-q} X_{j\textbf{s}t(i-(c-q))}. \end{aligned}$$
(3)

The weights \(w_{c-q}\) are determined through a Gaussian kernel density (GKD) function, as proposed by Sheather (2004). In this context, c represents the central day, indicating the mode of the GKD, for a time interval prior to day i when the process of interest was observed. Additionally, l denotes the distance between the mode and the minimum (or maximum) of the GKD. Notably, the constraint \(0 \le l \le c\) is upheld. The GKD function necessitates the bandwidth parameter for its computation. In our study, this bandwidth parameter was estimated using the maximum likelihood cross-validation (MLCV) method (Habbema et al. 1974; Duin 1976). This approach involves estimating the log-likelihood of the density at the \(k^{th}\) observation, with all observations except the \(k^{th}\) being considered. The utilization of the \(K(\cdot )\) function stems from the requirement to account for the effects of environmental conditions with a temporal delay relative to the date when the species biomass index was either estimated or observed. In essence, this function permits the incorporation of covariates from specific past moments and their amalgamation, assigning distinct degrees of importance to different past moments. Supplementary Materials S.3 provide illustrative examples of this concept.
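The weighted average in Eq. (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the bandwidth is passed in directly instead of being estimated by MLCV, and the weights are normalized to sum to one, which is an assumption made here for readability.

```python
import math


def gaussian_kernel_weights(c, l, bandwidth):
    """Weights w_{c-q}, q = -l..l, from a Gaussian kernel with mode at lag c.

    Returns a dict mapping each lag (c - q), in days before the observation
    day, to its weight. Normalising the weights to sum to 1 is an assumption.
    """
    if not 0 <= l <= c:
        raise ValueError("the constraint 0 <= l <= c must hold")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    lags = [c - q for q in range(-l, l + 1)]
    raw = [math.exp(-0.5 * ((lag - c) / bandwidth) ** 2) for lag in lags]
    total = sum(raw)
    return {lag: r / total for lag, r in zip(lags, raw)}


def lagged_covariate(series, i, c, l, bandwidth):
    """K(X, c, l) at day index i: kernel-weighted average of series[i - lag]."""
    if i - (c + l) < 0:
        raise IndexError("not enough history before day i")
    weights = gaussian_kernel_weights(c, l, bandwidth)
    return sum(w * series[i - lag] for lag, w in weights.items())


# Illustrative daily covariate values (made up), most recent day last.
daily_values = [0.40, 0.42, 0.45, 0.50, 0.48, 0.47, 0.52, 0.55]
# c = 2, l = 2 covers lags 0 to 4, i.e. the observation day and the four days before it.
k_value = lagged_covariate(daily_values, i=7, c=2, l=2, bandwidth=1.0)
```

With \(l=0\) the function reduces to the single covariate value observed c days earlier, and larger l spreads the weight symmetrically around that central day.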

\(W_{\textbf{s}t}\) denotes a progressive spatiotemporal phenomenon (Paradinas et al. 2017). Underlying this phenomenon is a precision matrix \(\textbf{Q}\), governed by the structure matrix \(\textbf{R}_W\), where \(\textbf{Q}=\tau _W\textbf{R}_W\) and \(\tau _W\) is an unknown scalar. Consequently, the structure matrix delineates the nature of temporal and/or spatial interdependencies among elements of \(\textbf{W}\), and it can be factorized as the Kronecker product of structure matrices of the corresponding interacting main effects. In our analysis, the phenomenon exhibits temporal change following a first-order autoregressive process, as defined by the equation:

$$\begin{aligned} W_{\textbf{s}t}=\delta W_{\textbf{s}(t-1)}+\xi _{\textbf{s}t} \end{aligned}$$
(4)

where \(t=2,\ldots ,T\), \(\left| \delta \right| <1\), and \(W_{\textbf{s}1} \sim N(0,\sigma _W^2/(1-\delta ^2))\). The selection of the first-order autoregressive model was motivated by the limited temporal span of the data (20 years) and the evident rapid shifts in species distribution, driven largely by climatic changes. Meanwhile, \(\xi _{\textbf{s}t}\) denotes a zero-mean GF, assumed to be temporally independent. Thus, \(Cov(\xi _{\textbf{s}t},\xi _{\textbf{u}h})=Cov(\xi _{\textbf{s}},\xi _{\textbf{u}})\) for \(t=h\) and \(\textbf{s} \ne \textbf{u}\).
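To make the temporal part of this structure concrete, the following plain-Python sketch builds the precision matrix of a stationary first-order autoregressive process and combines it with a spatial matrix through a Kronecker product. The function names are hypothetical, the small spatial matrix in the usage lines is a toy placeholder and not the Barrier SPDE precision, and the ordering (time outer, space inner) is an assumption.

```python
def ar1_precision(n_years, delta, sigma2=1.0):
    """Tridiagonal precision matrix of a stationary AR(1) process.

    Assumes W_1 ~ N(0, sigma2 / (1 - delta^2)) and innovations with variance sigma2.
    """
    if n_years < 2 or abs(delta) >= 1:
        raise ValueError("need n_years >= 2 and |delta| < 1")
    q = [[0.0] * n_years for _ in range(n_years)]
    for t in range(n_years):
        edge = t in (0, n_years - 1)
        q[t][t] = (1.0 if edge else 1.0 + delta ** 2) / sigma2
        if t + 1 < n_years:
            q[t][t + 1] = q[t + 1][t] = -delta / sigma2
    return q


def ar1_covariance(n_years, delta, sigma2=1.0):
    """Covariance implied by the same process: v * delta^|i - j|."""
    v = sigma2 / (1.0 - delta ** 2)
    return [[v * delta ** abs(i - j) for j in range(n_years)] for i in range(n_years)]


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Plain matrix product, used only to check the matrices against each other."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy usage: 3 years, 2 locations, and an illustrative temporal correlation.
temporal = ar1_precision(3, delta=0.18)
toy_spatial = [[2.0, -1.0], [-1.0, 2.0]]  # placeholder spatial structure matrix
joint = kronecker(temporal, toy_spatial)  # 6 x 6 space-time structure
```

Multiplying `ar1_precision` by `ar1_covariance` for the same settings returns the identity matrix up to rounding, which is a quick check that the tridiagonal form matches Eq. (4).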

To address discrepancies between the west and south coasts and the unique geomorphology of the Portuguese continental coast (as depicted in Fig. 2), \(\xi _{\textbf{s}}\) embodies a Barrier spatial GF (Bakka et al. 2019). This approach is rooted in the Matérn covariance function, albeit diverging from conventional methods of computing distances between two points. The methodology accommodates two distinct sets of paths: one confined solely to the study region and the other traversing a physical barrier. For the former set, a stationary Matérn field with marginal variance \(\sigma ^2\) and a range denoted by \(\phi \) is utilized. The latter set involves a Matérn field with the same \(\sigma ^2\) and a range that approximates zero. Defining this range close to zero effectively nullifies correlation over the physical barrier, which is extraneous to the region of interest.

The two linear predictors are linked through a shared spatiotemporal latent field, \(W_{\textbf{s}t}\), governed by a scaling parameter k. This parameter allows both processes to share the same spatiotemporal correlation structure while accommodating distinct covariance structures for each process. The proper interpretation of k within the presence probability linear predictor warrants attention. A value of \(k=1\) corresponds to equivalent covariance structures. Moreover, \(\vert k \vert <1\) signifies lower variability in the presence process when compared to the species biomass process, while \(\vert k \vert >1\) denotes a higher variability in the presence process. In the context of INLA, this approach can be effectively implemented through the copy model, as elucidated by Krainski et al. (2019).

Additionally, temporal dynamics are accounted for through unstructured temporal effects, denoted by \(\gamma _t\) and \(\gamma '_t\), characterized by a Gaussian exchangeable prior with a mean of zero and precisions \(\tau _{\gamma }\) and \(\tau _{\gamma '}\), respectively. These effects are introduced to model variability that cannot be explained by environmental conditions or the spatiotemporal structure. Instead, they encompass variations stemming from survey-related factors, such as disparities in survey timing or the utilization of different vessels.

To keep the priors minimally informative, they were assigned as follows: \(\alpha ,~\alpha ' \sim N(0,\infty ),~\beta ,~\beta ' \sim N(0,1000),~k\sim N(1,0.1)\), \(log(\tau _{\gamma }),log(\tau _{\gamma ~'}) \sim log\)-Gamma(1, 0.00005) and \(log \left( \frac{1+\delta }{1-\delta } \right) \sim N \left( \frac{20}{3}\right) \). For \(\sigma ^2\) and \(\phi \), the prior specification adhered to the PC prior framework (Simpson et al. 2017). A sensitivity analysis of the parameter priors was performed (see Supplementary Material S.6).

2.2.2 Model Evaluation

Concerning the evaluation and comparison of models, established metrics rooted in goodness of fit and complexity were employed to guide the selection of covariates and optimal parameter combinations for c and l in Eq. (3). Different combinations were examined for each covariate, with the exception of bathymetry, as it was considered a static covariate across time.

One metric frequently used for assessing model fit within the Bayesian framework is the deviance information criterion (DIC), introduced by Spiegelhalter et al. (2002). This criterion serves to identify models that best explain observed data, minimizing uncertainty concerning observations generated under similar conditions and at the same temporal juncture. The DIC comprises two components, one appraising model fit and the other gauging model complexity.

The conditional predictive ordinate (CPO) is a metric rooted in the posterior predictive distribution and derived through cross-validation methods (Roos and Held 2011). Specifically, the CPO index is the posterior predictive density evaluated at the \(k^{th}\) observation, conditional on all data except the \(k^{th}\) observation itself. In the context of two-part models, the CPO encompasses posterior predictive distributions of both estimated processes, denoted as \(P(\tilde{z}|z)\) and \(P((\tilde{y}|\tilde{z})|(y|z))\). A global measure of model fit is given by the log-conditional predictive ordinates (LCPO).
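As an illustration of how a global LCPO score can be obtained from per-observation CPO values, the sketch below uses the common definition of the mean negative log CPO. The function name and the input values are hypothetical, and how the presence and biomass parts are pooled is left open here, since the text does not specify it.

```python
import math


def lcpo(cpo_values):
    """Mean negative log conditional predictive ordinate; smaller is better."""
    if not cpo_values:
        raise ValueError("at least one CPO value is required")
    if any(v <= 0 for v in cpo_values):
        raise ValueError("CPO values must be strictly positive")
    return -sum(math.log(v) for v in cpo_values) / len(cpo_values)


# Made-up CPO values for two candidate models fitted to the same observations.
score_a = lcpo([0.60, 0.55, 0.70])
score_b = lcpo([0.30, 0.25, 0.40])
preferred = "A" if score_a < score_b else "B"
```

Because the score is a negated average log density, the model that assigns higher predictive density to the held-out observations receives the lower LCPO.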

Consequently, the selection of covariates, smoother terms, and c and l parameter combinations was guided by the pursuit of minimized DIC and LCPO values. The decision-making process is elucidated in detail in Supplementary Material S.4.

2.2.3 Areas of Occupancy

Given the acknowledged impact of ecosystem conditions on species distribution and behavior, the characterization of the study area in terms of species occupancy holds significant value for marine ecology and fisheries management. Four distinct categories of biomass index were delineated: rare, low occasional, high occasional, and preferred. The classification process was based on predictions of the biomass index and its associated uncertainty. Biomass index predictions were determined through the median of posterior predictive distribution, \(F^{-1}_{\tilde{Y}_{\textbf{s}t}}(0.5)\), derived for each new location using Equation (2) of Supplementary Material S.2. Concurrently, the level of uncertainty was defined by the uncertainty linked with the posterior distribution of the interest process, \(P\left( \vert W_{\textbf{s}t}\vert > \frac{4}{5}\sigma \right) \), since \(W_{\textbf{s}t}\) encapsulates unexplained variability inherent to the interest process, beyond the scope of considered covariates. \(\sigma \) denotes the estimated posterior mean of the marginal standard deviation, while \(\frac{4}{5}\) serves as a scaling parameter for the explained variability encapsulated by the spatial structure, the marginal standard deviation itself.

Two distinct thresholds were established, one for the biomass index and another for the uncertainty measure. These thresholds were informed by the annual median of the predicted biomass values within the study area, as well as the median of the uncertainty support, fixed at 0.5. Consequently, rare areas are characterized by biomass values falling below the biomass threshold and possessing low uncertainty (below the uncertainty threshold). Low occasional areas feature low biomass values coupled with high uncertainty (above the uncertainty threshold). In contrast, high occasional areas are marked by elevated biomass values exceeding the biomass threshold, accompanied by high uncertainty. Preferred areas are distinguished by high biomass values and low uncertainty.
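The classification rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the exceedance probability assumes a Gaussian approximation to the posterior marginal of \(W_{\textbf{s}t}\), and values exactly equal to a threshold are treated as low.

```python
import math


def exceedance_probability(mean, sd, threshold):
    """P(|W| > threshold) for W ~ N(mean, sd^2), a Gaussian approximation."""
    if sd <= 0 or threshold < 0:
        raise ValueError("need sd > 0 and threshold >= 0")

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    return 1.0 - cdf(threshold) + cdf(-threshold)


def occupancy_class(biomass, biomass_threshold, uncertainty, uncertainty_threshold=0.5):
    """Map a predicted biomass index and its uncertainty to one of four classes."""
    high_biomass = biomass > biomass_threshold  # ties count as low (assumption)
    high_uncertainty = uncertainty > uncertainty_threshold
    if high_biomass:
        return "high occasional" if high_uncertainty else "preferred"
    return "low occasional" if high_uncertainty else "rare"


# Made-up example: posterior summary of W at one location, with sigma = 1.
sigma = 1.0
p_uncertain = exceedance_probability(mean=0.1, sd=0.5, threshold=0.8 * sigma)
label = occupancy_class(biomass=120.0, biomass_threshold=80.0, uncertainty=p_uncertain)
```

In practice the biomass threshold would be recomputed for each year as the median of the predicted values over the study area, as stated in the text.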

Furthermore, the exploration of the temporal evolution of these distinct areas can facilitate the identification of both favorable and unfavorable habitats, an integral aspect in defining species habitat and comprehending species dynamics. In this particular study, favorable areas are defined as those exhibiting a sustained period of preferred occupancy, while unfavorable areas are characterized by rare and persistent species occupancy. For the purposes of this study, persistence was defined by a duration of 3 years due to the limited time series. The two-part model was fitted using the INLA approach in R software. The R code corresponding to the creation of kernel time-lagged covariates, the fitting of the Barrier model, and the definition of occupancy areas is available on Github (https://github.com/SilvaPDaniela/Environmental-Effects-on-the-Spatio-Temporal-Variability-of-Sardine-Distribution).
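The persistence rule for favorable and unfavorable areas can be sketched as follows. This is an illustrative Python sketch and not the R code from the repository; the function name and the cell identifiers are hypothetical, and it assumes one occupancy label per year for each grid cell over a window of consecutive years.

```python
def persistence_label(yearly_classes, duration=3):
    """Label a cell from its occupancy classes over `duration` consecutive years.

    Returns "favorable" if the cell is preferred in every year, "unfavorable"
    if it is rare in every year, and None otherwise.
    """
    if len(yearly_classes) != duration:
        raise ValueError("expected exactly one class per year in the window")
    if all(c == "preferred" for c in yearly_classes):
        return "favorable"
    if all(c == "rare" for c in yearly_classes):
        return "unfavorable"
    return None


# Made-up occupancy classes for three grid cells over a 3-year window.
window = {
    "cell_1": ["preferred", "preferred", "preferred"],
    "cell_2": ["rare", "rare", "rare"],
    "cell_3": ["preferred", "high occasional", "preferred"],
}
zones = {cell: persistence_label(classes) for cell, classes in window.items()}
```

Cells that change class within the window receive no label, so only locations with sustained preferred or rare occupancy appear as persistent zones.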

3 Results

Bathymetry, chlorophyll-a concentration, and ocean current intensity surfaced as explanatory variables for both sardine presence and the biomass index, whereas SST exclusively contributed to explaining the species’ biomass index (Table 1, Fig. 3).

The best model embraced purely unstructured annual effects, symbolized by \(\gamma _t\) and \(\gamma '_t\), for the biomass and presence processes (Table 1), respectively. Although structured annual effects, characterized by linear effects or autoregressive processes with varying orders, were also considered, their incorporation did not yield a discernible enhancement in model performance.

Table 1 Illustration of the most effective sardine biomass model for the Portuguese continental coast based on performance metrics, DIC and LCPO

Notably, the coastal indicator was relevant only for explaining the species’ presence, with the corresponding parameter yielding a posterior mean of 0.89 (Figure 3 of Supplementary Material S.5). Because this coefficient acts on the logit scale, it indicates that, on average, the odds of sardine occurrence are more than twice as high (\(\exp (0.89)\approx 2.4\)) on the southern coast as on the western coast. Chlorophyll-a observed in the four days leading up to, and including, the day of biomass index determination emerged as an influential factor in explaining sardine presence. Interestingly, the effect of chlorophyll-a exhibited a pronounced inflection point at 0.45 mg m\(^{-3}\), with a decreasing impact up to this threshold followed by an ascending effect beyond it. Noteworthy patterns also emerged in relation to water depth, where the probability of presence exhibited a peak at 30 m before declining. Bathymetry demonstrated limited influence, becoming irrelevant beyond a depth of about 615 m. The probability of sardine presence was positively correlated with lower current intensity values observed within the time interval from 20 to 28 days prior to biomass index determination (peaking around 0.08 m s\(^{-1}\), 0.16 m s\(^{-1}\), and 0.22 m s\(^{-1}\)) while exhibiting a negative relationship beyond 0.26 m s\(^{-1}\), though the intensity effect became irrelevant beyond a threshold of 0.37 m s\(^{-1}\).

The sardine biomass index shows a distinctive pattern in relation to SST, where an ascending trend is evident up to 14\(^\circ \)C, followed by a relevant decline starting at 15.3\(^\circ \)C with increasing SST. The biomass index displays an increasing trend aligned with chlorophyll-a levels, observed 17 days prior to sardine biomass estimation, up to 1.14 mg m\(^{-3}\). Subsequently, a prominent decrease manifests up to 1.25 mg m\(^{-3}\), followed by another relevant upsurge within the range of 10 to 30 mg m\(^{-3}\). The relationship between sardine biomass and bathymetry also reveals noteworthy insights, with a positive correlation observed. This influence is most pronounced at depths of up to 125 m, beyond which its effect becomes marginal, ultimately tapering off at depths exceeding 420 m. The effect of current intensity, observed within the interval from 4 to 16 days prior to biomass index determination, on the sardine biomass index is characterized by oscillating patterns, featuring four distinctive peaks of positive effect at 0.02, 0.1, 0.18, and 0.67 m s\(^{-1}\). Beyond the threshold of 0.7 m s\(^{-1}\), the influence of ocean current intensity becomes negligible.

Fig. 3 Fixed (environmental) effects for sardine presence (first column) and sardine biomass (second column) derived from biomass index modeling during the PELAGO surveys conducted between 2000 and 2020. Certain covariates are represented by acronyms (SST: sea surface temperature, CHL: chlorophyll-a concentration, INT: intensity of ocean currents). Vertical lines depict the 80% quantiles for each observed covariate, and K() refers to the function defined in Eq. (3)

The obtained posterior predictive distributions for the spatial covariance parameters as well as for the time correlation parameter were relevant (Figure 4 of Supplementary Material S.5). The spatial autocorrelation was almost null beyond about 86 km, while the mean annual dependence was estimated at approximately 0.18. The unstructured annual effects conveyed an annual variability of 2.38 for presence and 0.41 for the biomass index.

Both processes hinged on an identical spatiotemporal structure, scaled by the parameter k. The estimated mean of k stood at \(-\)0.48, indicating that the remaining unexplained variance in the biomass process under presence diverges notably from that in the presence process.

Upon completing the modeling process, we gained the capacity to map the mean of the posterior predictive distributions of presence probability (\(E\left[ \tilde{Z}_{\textbf{s}t} \right] \)) (Fig. 4) and the median of the posterior predictive distribution of the biomass index (\(F^{-1}_{\tilde{Y}_{\textbf{s}t}}(0.5)\)) (Fig. 5) for specific days. In this endeavor, we opted to employ the central days corresponding to the surveys under investigation, while also designating May 10, 2012, as a representative day for the omitted year within the dataset. This choice emerged from its positioning midway between the chosen days for 2011 and 2013. An examination of the resulting maps reveals predominant instances of low presence probability across the majority of the study area, particularly concentrated between Peniche and Lisboa. Remarkably, regions situated between Viana do Castelo and Porto, as well as those stretching from Aveiro to Figueira da Foz, often manifested elevated biomass values.

Figure 6 enables the identification of sustained sardine distribution areas over time. Noteworthy is the observation of a preferred occupancy of the species along the coastal expanse between Porto and Aveiro. Conversely, deeper zones along the west coast, along with coastal areas on the south coast, typified occasional occupancy of sardine.

Fig. 4

Map depicting the mean of the posterior predictive distribution for the probability of sardine presence (\(E[\tilde{Z}_{\textbf{s}t}]\)) on the representative day of each survey, obtained from the modeling of the biomass index during the PELAGO surveys conducted between 2000 and 2020

Fig. 5

Map depicting the median of the posterior predictive distribution for the biomass index, \(F^{-1}_{\tilde{Y}_{\textbf{s}t}}(0.5)\) (\(m^2~nm^{-2}\)), on the representative day of each survey, obtained from the modeling of the biomass index during the PELAGO surveys conducted between 2000 and 2020

Fig. 6

Maps illustrating the sardine occupancy areas categorized as rare, low occasional, high occasional, and preferred areas, based on the representative day of each survey

During 2000–2002, favorable zones predominated over unfavorable ones, whereas the opposite held during 2018–2020 (Fig. 7). In the first period, most favorable zones were concentrated north of Lisboa, while these zones were more dispersed in the latter period. Although distinct persistent zones were identified within each period, certain similarities can be discerned. An area close to the coast, in the south of the western coastal region, was consistently deemed unfavorable in both periods. Comparing the two periods, substantial changes become evident in the western part of the study region, particularly north of Lisboa. Furthermore, unfavorable zones proliferated in recent years, with two distinguishable regions emerging: one between Viana do Castelo and Porto, and another between Peniche and Lisboa. The paucity of persistent zones aligns with the marginal value of the time correlation parameter (Figure 4 of Supplementary Material S.5).

Fig. 7

Representation of favorable (preferred and persistent) and unfavorable (rare and persistent) sardine zones for two distinct 3-year periods: 2000–2002 (left) and 2018–2020 (right)

4 Discussion

Within the realm of SDM, a central aim is to devise comprehensive methodologies that capture the intricate interplay between a species and its ecosystem. However, relying solely on environmental factors might fall short of capturing the nuanced relationship between species and environment, particularly for small pelagic species, known for their heightened sensitivity to environmental fluctuations (Schickele et al. 2021). In this study, we put forth a dynamic spatiotemporal SDM tailored to predict sardine distribution across the waters of the Portuguese shelf and to relate sardine behavior to environmental conditions in a flexible way. We used a hierarchical Bayesian spatiotemporal approach capable of dealing with the complex spatiotemporal dynamics underlying biological phenomena. Environmental information, referenced in both time and space, was incorporated with temporal lags to relate ecosystem conditions to the process of interest.

Paradinas et al. (2017) and Izquierdo et al. (2022) have made substantial contributions to our current comprehension of the two-part model framework within spatiotemporal SDM. However, a crucial distinction emerges: neither model can accommodate the effects of environmental conditions with temporal lags. This omission is significant because temporal lags play a pivotal role in shaping environmental influences. We therefore extended the two-part framework with temporally lagged covariates, increasing its capacity to capture the intricacies of marine ecological dynamics.

Several studies have employed distributed lag approaches in their analyses. For instance, Gordó-Vilaseca et al. (2021) incorporated covariates with time lags to elucidate the spawning patterns of sardine and anchovy, while Tredennick et al. (2016) utilized time-lagged covariates to investigate the impact of climate change on plant populations. Compared with these approaches, our proposed methodology is more dynamic, as it employs shorter time lags (measured in days rather than months) and permits a diverse array of time period combinations, each with distinct weights. In essence, our approach allows time lags to span intervals, rendering the model a more comprehensive and realistic framework capable of mitigating prediction bias and addressing outliers stemming from extreme events. Heaton and Peng (2012) proposed a generalized linear model that incorporates temporal lags to capture the effects of heat on mortality in various US metropolitan areas. This approach necessitates the consideration of temporal lags over a wide interval, encompassing recent and distant past lags that exhibit decreasing correlation as the lag duration increases. However, recent past lags may not always be pertinent, as illustrated by the effect of chlorophyll-a on fish distribution and abundance. In this case, favorable chlorophyll-a conditions may require several weeks to influence zooplankton abundance and consequently attract fish to the area (Bellido et al. 2008). Pugh et al. (2019) introduced a spatiotemporal model for crop yield, incorporating soil water content as a weighted average observed at different temporal lags. The weighting was determined by a spatiotemporal kernel utilizing separable effects, which were discretized over a grid of time periods and locations due to computational constraints. However, this approach may prove restrictive when applied to species distribution data, considering that sampling locations evolve over time, and nonlinear effects of covariates need to be accounted for. Our proposed approach circumvents these limitations by accommodating a broader range of lag intervals and treating the spatial domain as continuous. It is important to highlight that applying time lags to static covariates may lack conceptual relevance in certain contexts.
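The idea of a covariate lagged over an interval of days, rather than at a single lag, can be sketched as below. This is a minimal illustration under stated assumptions: the function name `lagged_interval_mean`, the daily indexing, the example series, and the 14 to 28 day window are all hypothetical and are not the lag intervals or weights selected in this study.

```python
import numpy as np

def lagged_interval_mean(series, day, lag_start, lag_end, weights=None):
    """Weighted mean of a daily covariate over lags lag_start..lag_end before `day`.

    `series[i]` is the covariate value on day i at one location (hypothetical layout).
    """
    values = np.asarray(series, dtype=float)
    if not 0 <= lag_start <= lag_end:
        raise ValueError("need 0 <= lag_start <= lag_end")
    if day - lag_end < 0 or day >= len(values):
        raise ValueError("lag window falls outside the series")
    # Lags in days; the window holds values at day - lag_start, ..., day - lag_end.
    lags = np.arange(lag_start, lag_end + 1)
    window = values[day - lags]
    if weights is None:
        weights = np.ones(len(lags))  # equal weights by default
    return float(np.average(window, weights=weights))

# Illustrative daily chlorophyll-a series (made-up values: 0, 1, ..., 99).
chl = np.arange(100.0)

# Covariate for a survey on day 60, averaged over days 32 to 46 (lags of 2 to 4 weeks).
chl_lagged = lagged_interval_mean(chl, day=60, lag_start=14, lag_end=28)
print(chl_lagged)
```

Skipping the most recent days (here, lags shorter than 14 days) reflects the point made above that recent past lags are not always pertinent; passing a `weights` vector lets some days within the interval count more than others.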

Effective fish distribution monitoring is integral to harmonizing fisheries with marine spatial planning (Janßen et al. 2018). Successful fisheries management hinges on comprehending fish population spatial dynamics. In this context, we introduced a classification of the study region based on model outputs, offering an accessible alternative to complex distribution maps. This classification, simpler to interpret yet equally essential, aids conservation decisions and targeted management strategies in two scenarios. Firstly, it supports the protection of one species while pursuing another, optimizing catch outcomes. The proposed classification scheme identifies potential no-take zones, focusing on areas where protected species tend to gather, thereby minimizing their capture while permitting other species’ capture within the fishery. Secondly, our classification framework proves invaluable for conserving juvenile fish populations. By designating areas as preferred or even preferred plus high occasional for juveniles, these regions can be established as no-take zones within the stock’s habitat. Such spatial insight is pivotal for creating effective Marine Protected Areas, as underlined by Grüss et al. (2019).

Our classification process centers on predicted biomass estimates and associated uncertainties. To quantify these uncertainties, we examine the latent field probability in relation to a chosen scale of the marginal standard deviation. For our investigation, we adopted a scale of 4/5, albeit determined empirically. However, we advocate for a decision driven by biological insights and expert judgment to ensure credible and meaningful outcomes. Furthermore, exploring the persistence of rare and preferred zones offers valuable insights into species distribution shifts. Although our delineation is temporally limited due to a short time series, it is conceivable to extend this definition by incorporating multiple consecutive time periods. Such adaptations guided by biological insights or empirical testing offer promising avenues for further exploration.
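One possible reading of this uncertainty measure is an exceedance probability: the share of posterior draws in which the latent field at a location exceeds a fixed fraction of its marginal standard deviation. The sketch below assumes that reading; the function name, the array layout, and the example values are hypothetical, and the cut-offs that map these probabilities to the rare, occasional, and preferred classes are deliberately left out because they are not restated here.

```python
import numpy as np

def exceedance_probability(field_draws, sigma, scale=0.8):
    """Share of posterior draws where the latent field exceeds scale * sigma.

    field_draws: array of shape (n_draws, n_locations) (hypothetical layout).
    sigma: marginal standard deviation of the latent field.
    scale: fraction of sigma used as threshold (4/5 in this illustration).
    """
    draws = np.asarray(field_draws, dtype=float)
    threshold = scale * sigma
    # Boolean mean over draws gives the estimated probability per location.
    return (draws > threshold).mean(axis=0)

# Made-up draws for two locations, with sigma = 1 so the threshold is 0.8.
example_draws = np.array([
    [0.0, 0.5],
    [1.0, -1.0],
    [2.0, 0.2],
    [3.0, 0.9],
])
prob = exceedance_probability(example_draws, sigma=1.0)
print(prob)  # first location exceeds 0.8 in 3 of 4 draws, second in 1 of 4
```

Changing `scale` moves the threshold and therefore the resulting probabilities, which is why the choice of this value is best guided by biological insight and expert judgment.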

Under equivalent environmental conditions, a conspicuous divergence in sardine presence emerged, with fewer occurrences along the west coast than along the south coast. This divergence could potentially be attributed to a higher incidence of species absences observed in the west compared with the south during most surveys. The highest values of the sardine biomass index were concentrated in the north, near the coast, reflecting more productive zones attributed to the presence of river mouths and estuaries (Monteiro 2017; Zwolinski et al. 2010). The persistence of certain ecological niches over time was particularly conspicuous along the west coast. Remarkably, the influence of depth exhibited a nonlinear relationship with both occurrence and biomass, revealing sardine's affinity for shallow waters (up to 85 m). This affinity for shallow depths has been corroborated by other studies, including those conducted in Spanish Mediterranean waters (Bellido et al. 2008), the Northwestern Mediterranean Sea (Saraux et al. 2014), and the western Iberian coast (Zwolinski et al. 2010). While the inclusion of current direction yielded minimal enhancement to the model, the incorporation of current intensity demonstrated discernible improvement. By comparison, Izquierdo et al. (2022) noted a slight enhancement in their model's performance with the inclusion of current intensity, although it was excluded from their final model. In our view, the effect of ocean currents on sardine biomass is not easy to understand and deserves further investigation to identify the oceanographic processes signaled by current velocity that might be affecting sardine distributions, such as the effect of ocean fronts (Palomera 1992). The sardine biomass index was markedly higher in regions characterized by SST ranging between 13.1 and 14.6\(^\circ \)C, in line with existing literature on the distribution of sardine eggs (Coombs et al. 2006), sardine larvae (Garrido et al. 2016), and sardine reproduction (Zwolinski et al. 2010). A similar trend in the SST effect on sardine was discerned in the NW Mediterranean Sea (Gordó-Vilaseca et al. 2021), although the optimal temperature was estimated at around 13.5\(^\circ \)C, during a period of lower temperatures. Chlorophyll-a, an indicator of primary productivity, emerged as a pivotal determinant for both processes, which resonates with studies by Bacha et al. (2017); Garrido et al. (2017); Gordó-Vilaseca et al. (2021); Izquierdo et al. (2022). The temporal lag between chlorophyll-a and sardine biomass potentially mirrors the lag between phytoplankton and zooplankton blooms, a primary source of sustenance for sardines from larvae to mature individuals (Garrido et al. 2008).

The spatiotemporal structure contributed to a modest annual dependency in the sardine biomass index. Santos et al. (2012) discerned that sardine recruitment in a specific year is contingent upon the recruitment observed in the preceding year. On the other hand, Borges et al. (2003) conducted a time series analysis of sardine catch data spanning from 1946 to 1990, revealing that sardine catch is influenced by catches recorded up to 14 years earlier. Indeed, the last decades have been characterized by more recurrent and impactful environmental changes (Tang 2019) which makes it difficult to detect strong annual dependencies.

Studying the dependency between fish abundance and environmental variables improves our understanding of abundance dynamics, facilitates habitat delineation, and enhances our capacity to forecast marine species trends. Our study underscored the utility of Bayesian spatiotemporal modeling in elucidating species distribution patterns. While our investigation focused on a specific species inhabiting the Portuguese shelf, the proposed approach can be readily extended to other sets of species and geographical domains. We advocate for the exploration of various time lag combinations for covariates, guided by prior knowledge (e.g., ecological insights), given the inherent computational challenges of Bayesian methodologies, especially in higher-dimensional datasets.