Introduction

Eutrophication of coastal ecosystems is a global problem, especially in coastal lagoons (Nixon 1995; Cloern 2001). Increased nutrient inputs amplified by urbanization, agriculture or industry, lead to complex direct and indirect natural ecosystem responses (Schramm 1999; Viaroli et al. 2008). Anoxic crises, toxic algal blooms, loss of biodiversity, and, more broadly, deterioration of ecosystem functions and services are all consequences of anthropogenic eutrophication (Cloern 2001; Zaldívar et al. 2008a,b).

For decades, Mediterranean coastal lagoons have been exposed to anthropogenic eutrophication and are among the most vulnerable systems to such pressures (Viaroli et al. 2005; Zaldívar et al. 2008a, b; Souchu et al. 2010). They are influenced by intensively farmed and densely populated watersheds, particularly during the summer, when the Mediterranean Sea becomes a major tourist attraction across the world (Vogiatzakis et al. 2006). In addition to urban pressure, these ecosystems are particularly vulnerable to eutrophication due to their transitional condition (de Jonge et al. 2002; Newton et al. 2014), The Palavasian complex, for example, is composed of eight lagoons located along the French Mediterranean coast that have seen extensive eutrophication over the last four decades, owing mostly to nutrient over-enrichment from constant sewage discharges (Leruste et al. 2016). According to García-Ayllón (2017), the Mar Menor lagoon, located in the east of Spain’s Region of Murcia, has undergone a considerable process of severe anthropization during the preceding five decades. The rapid population increase of new jellyfish species, which reached over 100 million specimens every summer, was one of the most surprising indicators (Robledano et al. 2011). Thau Lagoon is another interesting example of Mediterranean coastal lagoon eutrophication. Thau lagoon is a coastal environment located on the Mediterranean French coast that is recognized for sustaining traditional shellfish farming in France. It has been subjected to eutrophication, which has resulted in large anoxic occurrences linked with enormous shellfish mortalities (Derolez et al. 2020). The Ghar el Melh lagoon, located on Tunisia’s north Mediterranean coast, is an excellent model for studying the eutrophication process in coastal Mediterranean lagoons. According to Shili et al. (2002), this lagoon had many dystrophic crises between 1994 and 1996. Furthermore, Turki et al. documented the proliferation of toxic algae species in the lagoon in 2007, including Kryptoperidinium foliaceum, Prorocentrum micans, and Anabaena sp.

Despite the availability of a large number of studies on eutrophication processes in natural aquatic habitats (García-Pintado et al. 2007; Trabelsi-Bahri et al. 2013; Derolez et al. 2020), very little progress has been made in anticipating and preventing eutrophication and its consequences in these ecosystems.

Targeted monitoring of major water pollution indicators can detect episodes of eutrophication in coastal areas. For example, chlorophyll-a (Chl-a) is the most significant pigment in aerobic photosynthetic species, and its measurement is used to evaluate the quantity of phytoplankton biomass in the water, the potential of algal bloom events, and, as a result, the degree of environmental eutrophication (Yu et al. 2019).

Traditionnaly, Chl-a in-situ sampling and measurement statetgies have necessitated regular monitoring and laboratory studies (Oh et al. 2007). Due to several constraints, such as (i) field monitoring costs, (ii) staff availability and resources, (iii) field safety issues, and (iv) long time intervals between data collection, reporting, and public notification, these programs have limited capacity for environmental managers to adequately monitor and respond to eutrophication episodes (Oh et al. 2007). As a result, in order to decrease the cost and time required for in-situ monitoring and laboratory studies, an early-warning proactive approach to Chl-a is required to avoid or mitigate the incidence of eutrophication and, ultimately, to reduce its risk on water bodies (Oh et al. 2007).

Forecasting potential algal bloom occurrences is a critical emergency management tool for monitoring water quality and protecting the aquatic system (Villanoy et al. 2005; Dippner et al. 2011; Recknagel et al. 2013, Recknagel et al. 2014). This has piqued ecologists’ curiosity, resulting in the creation of eutrophication or algal bloom predictive modeling techniques (Chen et al. 2015).

In general, these models are classified into two types: physical-based approaches and data-driven approaches. Physical-based simulations reveal the fundamental mechanisms for algae growth and outbreak (Hood et al. 2006; Zhang et al. 2013). Many physical, chemical, and biological processes, however, remain unknown due to the complexity of marine ecosystems and the wide variety of factors that require calibration. Such models are mostly used for scenario analysis rather than prediction (Recknagel et al. 2014).

Data-driven models based on autoregression, multivariate regression, piecewise regression, or Artificial Neural Networks (ANN) have been rapidly developed and applied to eutrophication and algal bloom forecasting as an alternative. When an explicit function to characterize the modeled system is not provided, ANNs are effective in predicting data (Chen et al. 2015). In 2007, Oh et al. have applied two ANN models to identify temporal phytoplankton community patterns in the Daechung reservoir (Korea). Wang et al. built an ANN model in 2010 to forecast cyanobacterial blooms based on meteorological variables for use in early warning systems in Dianchi Lake (China). However, because the construction of an ANN model is still based on trial and error, it lacks generic direction (Chen et al. 2015). Furthermore, ANN models are prone to falling into a local optimum (Hecht-Nielsen 1989). Regression models are set to identify governing factors and establish their approximate relationship to dependent variables (Cui et al. 2007; Onderka 2007; Davis et al. 2009; Wilhelm et al. 2011). In 1982, Ouchi had applied principle component analysis to develop a multivariate prediction model for algal biomass in the Northern Hiroshima Bay (Japan). In 2007, Lui et al. developed a vector-based autoregression model for algal bloom forecasting in Hong Kong waters. These models often have a good prediction performance for stationary systems. Eutrophication, on the other hand, is event-driven and characterized by non-stationary characteristics (Hasting 2001; Onderka 2007; Paerl and Huisman 2008; Recknagel et al. 2013). More significantly, aquatic ecological data are typically sparse and incomplete, with either hydro-environmental or biological factors missing (Donner 1982). Despite the availability of tools for dealing with missing and noisy data, such data makes developing multivariate regression models challenging (Donner 1982).

The Box and Jenkins (1976) method has been used widely in a variety of disciplines. The method is a dynamic, computer-based iterative process that generates an autoregressive, integrated moving average model (ARIMA), optimized for seasonal and trend factors (Gaynor and Kirkpatrick 1994).

The Box and Jenkins method is known for: 1) its capacity to deal with complex processes; 2) its adaptability in processing dependent time series data; 3) its advanced mathematical and statistical techniques and 4) the ease with which it is implemented. Box and Jenkins methodologies often provide the most reliable forecasting models for any data collection (Gaynor and Kirkpatrick 1994).

Armstrong’s (1985) comparative test on the ranking of extrapolation methods (from the highest rank as 1 to the lowest rank as 5) for both short and long range rated Box and Jenkins methods as 1.5 for short-range forecast accuracy and 2 for long-range forecast accuracy (Lu and Abou Rizk 2008).

Box and Jenkins approaches often start with the most recent observations and then examine recent forecasting errors to make appropriate modifications for future periods (Lu and Abou Rizk 2008). In doing so, they allow timely adjustments of error levels and provide a more flexible imitation of a particular complex trend or seasonality (Lu and Abou Rizk 2008). In addition, Box and Jenkins models are capable of dealing with those dependent time-series data which are not considered suitable for other methods. For example, a regression model has a standard assumption that the error term should be statistically independent. In reality, many time series related data are dependent or are correlated with each other (Lu and Abou Rizk 2008).

The method begins with the assumption that the process that created the time series may be estimated using an ARMA model if it is stationary, or an ARIMA model if non-stationary (Lu et al. 2008).

Time series analysis is a powerful tool for assessing water quality indicators and predicting future changes. Several studies have used ARIMA model. For instance, for predicting water levels in Lake Malawi (Makwinja et al. 2017), or water salinity in Apalachicola bay in Florida (Sun and Koch 2001) and for forecasting sulphur dioxide in Tehran (Hassanzadeh et al. 2009). Chen et al. (2015) have designed and proved the efficiency of an ARIMA model for predicting daily Chl-a concentrations in Taihu Lake (China) in comparison to a multivariate linear regression model (MVLR). Seasonal changes are common in natural systems generating variations in biomass and ecosystems structures (Prista et al. 2011). The seasonal, autoregressive, integrated moving average (SARIMA) model is composed of ARIMA model, including a seasonal component of the time series data. SARIMA is very frequently used for monthly time series that exhibit a seasonal pattern (Prista et al. 2011). Chl-a is a parameter known to be related to temperature (Tizro et al. 2014), which have seasonal characteristics. For that reason, a SARIMA model is implemented to handle the characteristics of seasonal variations, which improves the prediction accuracy (Tizro et al. 2014). In this context, the North Lagoon of Tunis is particularly an interesting case for a Mediterranean coastal lagoon eutrophication study. It is a South Mediterranean shallow coastal lagoon located in the north of Tunisia. This environment has a long history of pollution and was formerly one of the world’s most contaminated lagoons (Harbridge et al. 1976; Afli et al. 2008; Armi et al. 2008). To limit the anthropogenic input, the lagoon has been the subject of a restoration project that has been effective in greatly decreasing environmental deterioration. This operation was implemented in 1985. Before this project was conducted, the above-mentioned lagoon was the main outlet for solid waste and domestic/industrial wastewaters stemming from the city of Tunis (the capital of Tunisia). Dystrophic events, anoxia, fish mortality, and red waters were reported in the environment of the North Lagoon of Tunis (Shili et al., 2002). This coastal area remains relatively sensitive and its environmental monitoring is necessary to ensure the sustainability of its proper ecological functioning. In the light of this, the North Lagoon of Tunis was considered as the study case. Time series predictive modeling of key environmental parameters are good decision-making tools for sustainable management of this vulnerable coastal area in the center of an urban agglomeration.

The objective of this present study was indeed to investigate the applicability of a SARIMA model in algal bloom forecasting using Chl-a concentrations as a eutrophication indicator and a period of approximately three decades (January 1989–April 2018) of retrospective Chl-a data in the North Lagoon of Tunis.

Materials and methods

Study area

The North Lagoon of Tunis is located in the Southern Mediterranean Sea to the east of the city of Tunis (36°45′-36°52′ N and 10°10′-10°20′ E). Water is exchanged with the Mediterranean Sea via the Kheireddine canal, which is 800 m long, 40 m wide, and has a mean depth of roughly 2.5 m (Ben Charrada 1992). To reduce the eutrophication, the hydrological system was changed by installing a supplemental separation dam (Fig. 1), which led in a significant gain in biodiversity (Ben Charrada 1992; Shili et al. 2002; Rezgui et al. 2008). The North Lagoon of Tunis was designated a Ramsar Wetland of International Importance in January 2013 (Mdaini et al. 2019).

Fig. 1
figure 1

Location of the study area.

However, The North lagoon of Tunis is currently a completely artificial environment as a result of the human intervention, and the ecological follow-up is a necessity to ensure the proper ecological functioning of this ecosystem located in the center of the city.

Data collection

In this study, the monthly Chl-a data from January 1989 to April 2018 in the North Lagoon of Tunis were obtained from sampling campaigns carried out by Al-Buhaira Invest company that is in charge of the lagoon ecological stability, as a part of the monitoring program.

Box and Jenkins methodology

Time series data is a collection of measurements recorded sequentially and equally spaced across a discrete time interval. The fundamental assumption in time series modeling is that past characteristics will continue to occur in the future (Raman et al. 2018). The autoregressive integrated moving average (ARIMA) model developed by Box and Jenkins (1976) is the most widely used approach for time series analysis. The box and Jenkins approach rely on a mathematical model developed to characterize the time series data based on autocorrelation analysis (Chen et al. 2015). Following the creation of the model, it is utilized to anticipate future values using historical and current time series data (Chen et al. 2015). A major characteristic of time series models is the assumption of some form of statistical equilibrium (Hyndman and Athanasopoulos 2013). An assumption of this kind is that of stationarity (Pai and Lin 2005). Forecasting is based on a linear combination of past observations, which necessitates a stable time series with no visible trend (Pai and Lin 2005). The ARIMA model presumes that the process remains in statistical equilibrium with constant probability features across time (Box et al. 2008). If the mean or variance changes over time, the series may need to be transformed to make it stationary before modeling (Allard 1998).

Time series data are frequently non-stationary, with mean and variance fluctuations across time. Previously, researchers discovered that by differencing the time series, they could remove trend components in the mean (Helfenstein 1991). Typically, one or two orders of differencing are sufficient to prepare data for the technique (Dindarloo 2015). In addition to the trend components, time series related to the ecology field often show seasonal patterns. Box and Jenkins have developed a method to handle time series that contain seasonality (Box et al. 2008). In this case, the model is known as SARIMA model with S observations per period. It is represented by SARIMA (p,d,q) (P,D,Q)S, which has the following form:

$$\varphi (B)\Phi \left({B}^S\right){\left(1-{B}^S\right)}^D{\left(1-B\right)}^d{Z}_t=\theta (B)\Theta {\left({\mathrm{B}}^{\mathrm{S}}\right)}_{\upvarepsilon \mathrm{t}}$$
(1)

With

$$\varphi (B)=1-{\varphi}_1B-\dots -{\varphi}_p{B}^p$$
(2)
$$\theta (B)=1-{\theta}_1B-\dots -{\theta}_q{B}^q$$
(3)
$$\Phi\ \left({\mathrm{B}}^{\mathrm{S}}\right)=1-{\Phi}_1{\mathrm{B}}^{\mathrm{s}}-\dots -{\Phi}_{\mathrm{p}}{\mathrm{B}}^{\mathrm{S}\mathrm{P}}$$
(4)
$$\Theta \left({\mathrm{B}}^{\mathrm{S}}\right)=1-{\Theta}_1{\mathrm{B}}^{\mathrm{S}}-\dots -{\Theta}_{\mathrm{Q}}{\mathrm{B}}^{\mathrm{S}\mathrm{Q}}$$
(5)

Where p is the autoregressive order, d is the number of differencing operations, q is the moving average order and P, D, and Q are the corresponding seasonal orders.

To build an ARIMA or SARIMA model for a time series, Box and Jenkins (1976) proposed an iterative approach (Tiao 2001) involving four steps: stationarity check, identification and estimation, diagnostic and residual check and prediction (Fig. 2).

Fig. 2
figure 2

The prediction process using the SARIMA model

This technique has gained popularity, allowing for the practical use of time series models for forecasting. The stationarity check step entails appropriate time series differencing. It is carried out to attain stationarity and normality; the time series’ stationarity is evaluated using the Augmented Dickey Fuller (ADF) test. This test’s null hypothesis is that the data is non-stationary. If the null hypothesis is rejected, that implies stationary data with a p value of equal to or less than 0.05.

The transformed data correlation structure is identified by studying its autocorrelation (ACF) and partial autocorrelation (PACF) functions (Mishra and Desai 2005). The goal is to restrict the field of parsimonious models worthy of further research (Tiao 2001). The ACF and PACF graphs are used to identify the six parameters (p, d, q, P, D, Q) of the SARIMA model and various models’ parameters can be selected. The lowest Aikake information criterion (AIC) and Bayesian information criterion (BIC) proposed by Akaike (1973) and Schwartz (1978), respectively, are used to select the best-fit model (Fraley and Raftery 1998) among the candidate ones that were created in the previous step.

AIC estimates the relative amount of information lost by a given model: the less information a model loses, the greater the quality of that model (Aho et al. 2014).

$$AIC=\ln \left({\sigma_r}^2\right)+2r/T$$

The BIC is a model selection criteria that is closely connected to the AIC. It is preferable to use the model with the lowest BIC (Aho et al. 2014).

$$BIC=\ln \left({\sigma_r}^2\right)+ rn(T)/T$$

Where, σr2 is the maximum likelihood estimate of the innovation variance, r is the number of parameters in the model and T is the size of the sample series.

After an appropriate model is chosen, several tests are required (model fitness and residual checking) to verify whether the model is adequate for describing the studied process.

To do so, the ACF and quantile-to-quantile (Q-Q) figures of residuals are plotted. In addition, the forecast accuracy of the model is evaluated by splitting the data into two sets. The last observations in our data set are used to compare between simulated values and actual ones. Finally, the chosen SARIMA model is applied to predict the monthly Chl-a future values. The long time series, which covers approximately three decades of monthly Chl-a historical data, enabled us to perform multiple time steps ahead (Wei 1990) from May 2018 to December 2025.

The modeling of the SARIMA model was performed using Econometrics toolbox in the MATLAB software MATLAB® software (version 7,860,349 (R2020b), The Mathworks, MA, USA).

Results and discussion

The time series of monthly Chl-a values are shown in Fig. 3. The autocorrelation properties of the Chl-a time series in the North Lagoon of Tunis (Fig. 4) shows that the autocorrelation coefficients declined relatively quickly. It was preferable to do an ADF test to ensure the stationarity of the Chl-a concentration data (Table 1).

Fig. 3
figure 3

Temporal distribution of Chl-a concentrations in the North lagoon of Tunis

Fig. 4
figure 4

ACF (a) and PACF (b) of the monthly Chl-a variations in the North Lagoon of Tunis

Table 1 ADF test results applied to the Chl-a original time series.

The ADF test confirmed the stationarity of the Chl-a time series with a p value <0.05. According to that, the time series does not need transformation. It was not necessary to transform the data by differencing. Instead, we used the data on the original scale. Fitting a SARIMA model directly is advantageous for forecasting because forecasts are returned on the original scale.

Based on the PACF (Fig. 4), all SARIMA (p, d, q, P, D, Q)S models in which the autocorrelation delay p and seasonal autocorrelation delay P was less than or equal to 4 and the moving average q and seasonal moving average Q was less than or equal to 4, were tested. As mentioned above, Chl-a time series data contain a seasonal component. To analyze its time series from January 1989 through April 2018, we defined S = 12 because we have 12 observations per year.

It was found the minimal AIC and BIC values when performing a SARIMA (2,0,2)(2,0,2)12. With AIC = 628.91, BIC = 666.78 and R2 = 0.52. To avoid making remarkable changes in the original data, it is better to keep the number of parameters to a minimum, so that the values of p, P, q, Q, d, and D selected are less than or equal to 2 (Hintze 2007).

It was observable that the SARIMA (2,0,2)(2,0,2)12 developed in this study outperformed the SARIMA (1,0,0)(2,0,0)12 with AIC = 1593, BIC = 1612 and R2 = 0.46 reported by Raman et al. (2018).

In addition, the developed model performed better than the SARIMAX (1,0,0)(2,0,0)12 with AIC = 1114.2,

BIC = 1141.0 and R2 = 0.52 reported by Raman et al. (2018). Even though the two models have the same coefficient of determination (R2), the AIC and BIC of our SARIMA (2,0,2)(2,0,2)12 are the lowest.

It is important to mention, that SARIMAX is composed of a SARIMA model with external factors derived using PCA analysis.

Estimation of the SARIMA (2,0,2)(2,0,2)12 model parameters and their testing results are presented in Table 2  All estimated coefficients are statistically significant (p value <0.05).

Table 2 Parameter estimates and their testing results of the SARIMA (2, 0, 2) (2, 0, 2)12 model

The ACF of SARIMA (2,0,2)(2,0,2)12 residuals are presented in Fig. 5. ACF residuals suggested autocorrelations near 0. This indicates that the residuals did not deviate significantly from a 0 mean. In other words, it means that the residuals are not correlated.

Fig. 5
figure 5

ACF of residuals

The residual Q-Q plot (Fig. 6) suggests that the residuals are approximately normally distributed, with a slightly heavier tail. Given that, we can conclude that residuals of the SARIMA (2,0,2)(2,0,2)12 model are uncorrelated and normally distributed.

Fig. 6
figure 6

Residuals (Q-Q) plot

Comparison of observed Chl-a data with the fitted ones by the SARIMA (2,0,2)(2,0,2)12 model is presented in Fig. 7. There is a fairly good match between the observed values and the fitted values. To check the forecasting accuracy, Table 3, shows a comparison between the predicted values and the observed ones for the period from January 2017 to April 2018.

Fig. 7
figure 7

SARIMA (2,0, 2) (2,0,2)12 model fit of Chl-a content time series in the North Lagoon of Tunis from January 1989 through April 2018

Table 3 Comparison of predicted and observed monthly Chl-a variations using the SARIMA (2, 0, 2)(2, 0, 2)12 model for the data from January 2017 to April 2018

In this study, we have applied a SARIMA (p, d, q) (P, D, Q)S model to analyze Chl-a variations of monthly collected data in the North Lagoon of Tunis from 1989 through 2018, with the purpose of assessing and forecasting eutrophication and contributing to prevent any deterioration in the studied ecosystem. We have developed a SARIMA that closely fits Chl-a observed data. According to our results, the SARIMA model developed in this study was reliable with high validity, which suggests that our model could be an appropriate statistical tool to predict the future changing trends of Chl-a values or other key parameters, thus preventing high eutrophication scenarios in the North Lagoon of Tunis or other ecosystems. SARIMA offers various advantages over other prediction methods, such as moving average, exponential smoothing, neural networks, and fuzzy logic, owing to its forecasting power, particularly over long time periods, and its richer information on time-related changes (Linthicum et al. 1999; Box et al. 2008). Once we obtained the satisfactory model, we have used it to forecast future values of the Chl-a in the ecosystem. Fig. 8 shows the forecast results of Chl-a concentrations in the North Lagoon of Tunis. The model seems to provide realistic predictions for the future.

Fig. 8
figure 8

Time series plot of Chl-a concentrations in the North Lagoon of Tunis with forecasts at 75% confidence interval

To the best of our knowledge, this is the first study that has applied the SARIMA model to fit and forecast monthly Chl-a concentrations in the North Lagoon of Tunis.

Forecasting is essential in the creation and implementation of policies, particularly in the management of fisheries and aquatic resources. Accordingly, multiple time series model have been developed for forecasting purposes, including ARIMA model, Neural Networks, multiple linear regression (MLR), nonlinear regression (NLR), smoothing models, generalised auto regressive conditional heteroscedasticity (GARCH), Gaussian autoregressive models (Stergiou 1991; Stergiou et al. 1997; Romilly 2005). According to the studies, ARIMA validation errors are substantially smaller and it is a superior forecasting model (Raman et al. 2018).

Similarly, ARIMA model has been used in various studies in lagoons, such as on modeling of the great lakes freeze for ferrous scrap (Albertson and Aylen 1996), and on simulating spatio-temporal behaviour of atmospheric temperature data of the great lakes US (Agrawal 2011). ARIMA’s reliable forecasts with sufficient precision demonstrated an essential role for managers of aquatic natural environments (Raman et al. 2018).

It is known that the SARIMA modeling process needs a large amount of data. An early study suggested that a minimum of 50 observations are needed to build a reasonable SARIMA model (Wei 1990), understanding the majority of its variation and accurately predicting the data’s seasonality and correlation structure (Stergiou 1991; Pajuelo and Lorenzo 1995). To obtain a stable and precise SARIMA model, we had to collect over 350 observations of Chl-a data during the past three decades without interruption to be mentioned. We can state that the results of our study are robust enough.

It is important to mention that, using SARIMA model is money and time-saving for researchers, because it needs only one input variable and provides predictions with acceptable accuracy, which facilitates its applicability for algal blooms early warning use, especially, in vulnerable ecosystems.

Conclusion

Thanks to its biological stability, the North Lagoon of Tunis has a significant social and ecological value. Tourism (water sports), fishing, and marine bird conservation are only a few of the benefits provided by this ecosystem. As a result, in order to apply long-term management and conservation stratetgies, we must maintain a close eye on the eutrophication process and potential algal blooms episodes.

Forecasting Chl-a levels as eutrophication indicator is widely used in coastal ecosystems, as it gives the ability to the managers to identify future problems before they arise, make informed management decisions, develop data-driven strategies and evidence-based conservation programs.

In our study, we have used the SARIMA model to forecast future values of Chl-a in the North Lagoon of Tunis as a eutrophication indicator. This model was applied for a time window of approximately three decades. Different SARIMA (p, d, q) (P, D, Q)S models were implemented. The chosen one, a SARIMA (2,0,2) (2,0,2)12 with the lowest AIC and BIC, was used for forecasting. The goodness of fit was analyzed by comparing with the actual data from the last observations and checking the residuals. The residual diagnostic indicated that they are uncorrelated and normally distributed.

The forecasting results are quite satisfactory since the forecasting period seems to reproduce relatively well the normal Chl-a monthly content in the lagoon. Despite the fact that the restored North Lagoon of Tunis was classified as a Wetland of International Importance (Ramsar site), given its history and importance, this ecosystem is still fragile. The SARIMA model applied to historical data of Chl-a or any other key parameter could be an important tool to providing evidence that guides prevention and control interventions for the ecosystem.