Introduction

Due to the growing global population and socioeconomic development, water resource shortages and the balance between supply and demand are becoming increasingly acute, threatening sustainable development and ecosystem health/stability1,2. Water pollution is one of the main causes of water resource scarcity3. Globally, approximately 44% of domestic wastewater is discharged into the environment without safe treatment. The situation in developing countries is worse, wherein nearly 90% of sewage is discharged without treatment into natural water systems4. To address water shortage and pollution problems, many countries have constructed reservoirs and water diversion projects that provide reliable water sources for agriculture, industry, and urban development, improve living conditions, and play substantial roles in rational water resource use and protection3.

Reservoirs worldwide face numerous ecological threats5. Among these, eutrophication and the algal blooms it triggers are the most common6. An algal bloom is the phenomenon in which phytoplankton proliferate and accumulate beyond a threshold quantity in an aquatic environment with sufficient nutrients7,8. Algal blooms are more likely to occur in water bodies such as lakes and reservoirs, where long hydraulic retention times allow nutrients to accumulate easily8,9. China contains over 20% of the global population and more than 97,000 reservoirs, so water pollution and eutrophication risks in these reservoirs affect the health and safety of a portion of that population. The safety of the water supply in Chinese reservoirs also substantially influences water quality and water resource management worldwide.

The Danjiangkou Reservoir (DJKR) is the largest artificial lake in Asia and representative of reservoirs in China, as it is the core water source of the Middle Route of the South-to-North Water Diversion Project of China (MRSNWDPC), which is the longest inter-basin water diversion project in the world. Since the commissioning of the MRSNWDPC in 2014, the DJKR has transferred more than 60.5 billion m3 of water to North China, making it the primary water source for 26 large and medium cities and directly benefiting more than 100 million people10. The water quality of the DJKR directly affects the efficiency and effectiveness of the MRSNWDPC and the drinking water safety in the receiving area; therefore, it is essential to ensure the water quality and safety of the DJKR.

Researchers have conducted many investigations of the water quality and phytoplankton distribution in the DJKR. Water quality in the DJKR was relatively poor before the water diversion project was commissioned in 201411; in particular, it deteriorated from Classes I or II to Classes II or III between 2007 and 201012. After the water diversion project opened, the DJKR maintained good water quality conditions13. The phytoplankton population in the DJKR in 2009–2010 comprised diatoms and cyanobacteria, and the community structure was primarily influenced by conductivity14. To prepare for water diversion to the MRSNWDPC, the storage level in the DJKR was raised, after which the phytoplankton biomass decreased and cyanobacterial abundance increased15. After the water diversion project was commissioned, the proportion of diatoms in the DJKR decreased considerably, and cyanobacteria and green algae became the dominant groups from 2018 to 202016. The DJKR currently exhibits mesotrophic conditions17. Although the eutrophication level of the DJKR is not high, it is the core water source of the MRSNWDPC, so an adverse environmental event would jeopardize the water supply of North China and threaten over 100 million people. In past years, some sections of the DJKR have also experienced severe nitrogen and phosphorus pollution18, nearing eutrophic conditions19. This indicates that a potential risk of algal proliferation exists in the DJKR, and assessing this risk is essential. Therefore, it is crucial to understand the complex relationships between Chl-a concentrations and environmental factors in the DJKR to prevent the occurrence of algal blooms and safeguard water quality in both the DJKR and downstream areas.

Previous studies of the DJKR have shown that water quality between the different water bodies can influence each other due to their hydrologic connectivity13. Since the DJKR is the water source for the MRSNWDPC, the freshwater resources from the Dan and Han reservoirs pass through the Taocha (TC) station and are transported by the main canal to North China. Thus, the water quality at TC station can be affected by the sub-reservoirs, which in turn affects the water quality of the MRSNWDPC. Therefore, it is necessary to monitor the water quality at the TC intake of the MRSNWDPC to determine its algal proliferation potential risk under the influence of water exchange in the reservoir area.

Phytoplankton are affected by various environmental factors, including water temperature, pH value, nutrient contents, and their combined effects20,21,22. Under suitable environmental conditions (e.g., adequate nutrient contents, appropriate light and temperature conditions, and low water flow), algae proliferate rapidly23,24. This proliferation is also related to multiple environmental factors that interact with each other25,26,27. There have been numerous studies on the effects of environmental factors on phytoplankton in the DJKR28,29,30. These studies primarily used methods such as correlation analysis and Mantel tests, which can identify and describe the correlation between variables but are not effective in inferring causal relationships between multiple factors and planktonic algae. A more innovative approach is needed to explore the driving mechanisms of planktonic algae under the influence of multiple factors. Structural equation modeling (SEM) can be used to overcome this difficulty, as it can handle causal relationships between multiple factors31. SEM can also capture the direct effects between variables, as well as indirect effects generated by the mediating variables32. SEM visualizes the pathways through which environmental variables affect planktonic algae and has been widely applied to determine the mechanisms that influence the response and explanatory variables33,34.

The objectives of this study were to investigate the intricate interactions and driving mechanisms between Chl-a and environmental factors in the DJKR, and to assess the risk of algal blooms at the intake of the MRSNWDPC while considering the impacts of surrounding water areas. By employing structural equation modeling and multivariate statistical techniques, the research will provide a more comprehensive and detailed understanding of the interactions between Chl-a and environmental factors across the entire reservoir and at different stations, as well as the mutual influences of Chl-a across different stations, thus enhancing the understanding of algal proliferation mechanisms under multiple factors. However, due to limitations in data and numerical models, this study does not account for the effects of climate and hydrodynamic factors on algae, nor the impacts of pollutant inputs and biological migration under different hydrological conditions, which need to be addressed in future research.

Materials and methods

Study area and data collection

The DJKR (32° 36′–33° 48′ N, 110° 59′–111° 49′ E) straddles Hubei and Henan provinces and is the largest lake-type reservoir and protected drinking water source area in Asia, with a standard storage level of 170 m, storage capacity of 29.05 billion m3, and watershed covering 95,000 km2. The DJKR comprises the Dan (DR, Henan Province) and Han (HR, Hubei Province) reservoirs35 and is located in a low-mountainous and hilly region with a subtropical monsoon climate (i.e., with a warm spring, hot and rainy summer, cool fall, and cold and less rainy winter)36. The average annual temperature in the DJKR area ranges from 15 to 16 °C, and the annual precipitation ranges from 800 to 1000 mm, nearly 80% of which occurs from May to October. The main soil types in the study area are brown, yellow–brown, and yellow-cinnamon soils37, and the main land use types include woodlands, agricultural land (rice paddies and uplands), orchards, and wetlands38. Since water diversion via the MRSNWDPC officially began in December 2014, the DJKR has transferred more than 60.5 billion m3 of freshwater resources to North China. The DJKR is the primary water source for 26 large and medium cities, benefits more than 108 million people, and replenishes more than 9.5 billion m3 of ecological water in more than 50 rivers in northern China.

In this study, six representative water quality monitoring stations in the DJKR area were selected based on the Chinese national water quality monitoring programs, which covered two sub-reservoirs (i.e., the Dan and Han reservoirs). The Qingshan (QS), Longkoulanghe (LK), and Baqian (BQ) stations are located in the HR, and the Madeng (MD), Taochaqujie (QJ), and Taocha (TC) stations are located in the DR. The details of the water quality monitoring stations used in this study can be found in Fig. 1 and Table S1. The Changjiang River Scientific Research Institute of Changjiang Water Resources Commission collected water samples daily from March 2018 to February 2023. Nine water quality indicators were analyzed, i.e., water temperature (WT, °C), dissolved oxygen (DO, mg/L), pH, turbidity (Tur, NTU), total nitrogen (TN, mg/L), total phosphorus (TP, mg/L), ammonia nitrogen (NH3-N, mg/L), permanganate index (CODMn, mg/L), and Chl-a (μg/L). WT, pH, and DO were measured using a water quality multi-parameter monitor. NH3-N, TP, TN, and Chl-a were measured using a UV–visible spectrophotometer. Tur was measured using a portable turbidimeter, and CODMn was measured by an acid buret. Descriptions of the analytical methods and instrumentation can be found in the Supplementary Material (Table S2). The seasons were divided into spring (March to May), summer (June to August), autumn (September to November), and winter (December to February of the following year).

Figure 1

Locations of water quality monitoring stations of the DJKR from 2018 to 2023 in this study.

Methodology

The methodological framework of this study was divided into three parts to identify factors driving Chl-a, determine their roles, and estimate the influences of water bodies in the study area on algal proliferation at the TC intake (Fig. 2). We first identified the main factors affecting Chl-a at each station using stepwise multiple regression analysis. We then used principal component analysis (PCA) to downscale the multiple nutrient and water physicochemical indicators into composite variables. PCA removed potential collinearity among the indicators and reduced the number of variables used in the SEM, simplifying the model. SEM was then used to model the nutrient, water physicochemical, and organic pollution factors influencing Chl-a to determine the mechanism(s) driving planktonic algal blooms in the study area. Finally, the Copula function was used to construct the joint distribution functions of Chl-a in the different water bodies and at the intake, and Bayesian theory was applied to estimate algal proliferation risk at the TC water intake under the influence of the different water bodies. The data used for the above calculations and analysis were daily averages from March 2018 to February 2023.

Figure 2

Methodological framework used in this study to identify environmental driving pathways and assess the risk of algal proliferation.

Stepwise multiple regression analysis

The goal of the stepwise multiple regression method is to select the most important variables from a large set of variables. We used this method to determine the main factors driving Chl-a concentrations at each station. The finalized model is shown in Eq. (1):

$$Chl\text{-}a={a}_{0}+\sum_{i=1}^{n}{a}_{i}{X}_{i}$$
(1)

where a0 is a constant, ai represents the different coefficients, and Xi represents the different environmental factors. A log transformation (lg(x + 1)) was applied to pre-process the data before performing the stepwise multiple linear regressions, both of which were performed using the SPSS 24 software (USA, IBM).
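
The selection step behind Eq. (1) can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: SPSS enters and removes variables based on F-test probabilities, whereas this sketch uses a greedy forward search scored by the Bayesian information criterion (BIC) as a stand-in. The data are synthetic, and the function names (`log_transform`, `ols_bic`, `forward_stepwise`) are hypothetical.

```python
import numpy as np

def log_transform(x):
    """Apply the lg(x + 1) transform described in the text."""
    return np.log10(np.asarray(x, dtype=float) + 1.0)

def ols_bic(X, y):
    """Least-squares fit of y = a0 + sum(a_i * X_i); returns (coefficients, BIC)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # first column is the constant a0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
    return coef, bic

def forward_stepwise(X, y, names):
    """Greedy forward selection (assumed criterion: BIC, not the SPSS F-test)."""
    selected = []
    remaining = list(range(X.shape[1]))
    _, best_bic = ols_bic(X[:, selected], y)  # intercept-only model
    while remaining:
        bic, j = min((ols_bic(X[:, selected + [k]], y)[1], k) for k in remaining)
        if bic >= best_bic:
            break  # no candidate improves the model any further
        best_bic = bic
        selected.append(j)
        remaining.remove(j)
    coef, _ = ols_bic(X[:, selected], y)
    return [names[j] for j in selected], coef

# Synthetic example: the response depends on Tur and TP only (made-up values).
rng = np.random.default_rng(0)
n = 300
names = ["WT", "Tur", "TP"]
X = log_transform(rng.uniform(0.0, 10.0, size=(n, 3)))
chl_log = 0.2 + 0.8 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0.0, 0.05, n)

selected, coef = forward_stepwise(X, chl_log, names)
print(selected, np.round(coef, 2))
```

The returned coefficients are ordered as the constant followed by the selected factors in their order of entry, mirroring a0 and the ai in Eq. (1).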

PCA

PCA was applied to reduce the data dimensionality and improve its interpretability. PCA minimizes information loss by creating new uncorrelated principal components that sequentially maximize the variance39. The driving factors identified in the stepwise multiple regression analysis were categorized as water physicochemical, nutrient, and organic pollution factors according to a priori knowledge. The multiple water physicochemical and nutrient factors were downscaled using PCA. PCA was not applied for organic pollution downscaling, as only one indicator (CODMn) was used in this study. The principal components obtained from the PCA were referred to as water physicochemical (e.g., PChm 1 and PChm 2) and nutrient (e.g., Nutr 1 and Nutr 2) factors. If only one principal component remained after dimensionality reduction, it was referred to as PChm or Nutr. The PCA was performed using the SPSS 24 software (USA, IBM).

Each principal component is a linear combination of the original variables, with one principal component reflecting the characteristics of multiple variables. Before computing the principal components, data standardization is necessary. This is because different variables may have different units and scales, and variables with larger scales can disproportionately influence the covariance matrix, leading to significant bias in the PCA results. This step eliminates the effects of differing units and enhances the robustness of the results. When performing principal component analysis (PCA) using SPSS, the software automatically standardizes the data at the beginning of the calculation.
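
The effect of standardization can be sketched in a few lines of Python. This is an illustration with synthetic data, not the SPSS computation: the variable values are made up, the function name `pca_standardized` is hypothetical, and retaining components with eigenvalues above 1 (the Kaiser rule) is an assumption, since the retention rule is not stated here.

```python
import numpy as np

def pca_standardized(data):
    """PCA on z-scored data, i.e., an eigen-decomposition of the correlation matrix."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                      # uncorrelated component scores
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained

# Synthetic indicators on very different scales (hypothetical values).
rng = np.random.default_rng(1)
n = 200
wt = rng.normal(18.0, 6.0, n)                  # water temperature, deg C
do = 12.0 - 0.2 * wt + rng.normal(0.0, 0.5, n) # dissolved oxygen, mg/L
ph = rng.normal(8.1, 0.1, n)                   # pH
data = np.column_stack([wt, do, ph])

eigvals, eigvecs, scores, explained = pca_standardized(data)
n_keep = int(np.sum(eigvals > 1.0))            # assumed Kaiser criterion
print(np.round(explained, 3), n_keep)
```

Because every column is z-scored first, the wide numeric range of WT does not dominate the components, which is the bias the standardization step is meant to avoid.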

Structural equation modeling

SEM is a quantitative method based on statistical techniques that can be used to deal with multi-factor causal relationships31. SEM can reflect the direct influences of the variables, as well as the indirect influences generated by mediating variables32. The direct and indirect causal relationships between multiple variables are represented as paths (directional arrows) in an SEM diagram40. The topological structure comprises multiple variable nodes, and the paths between nodes reflect their driving mechanisms. SEM has been widely used to explore the driving mechanism between multiple variables41,42.

In this study, SEM was used to determine the environmental driving mechanism of Chl-a concentrations in the DJKR. This study utilized confirmatory factor analysis (CFA). Based on prior knowledge, water physicochemical factors were assumed to affect Chl-a directly and also indirectly through the organic pollution factor and nutrients. An initial SEM was constructed according to this hypothesis, followed by path analysis to quantify the relationships between variables. Finally, model fit was evaluated using fit indices, and the model was refined to achieve higher explanatory power and goodness of fit.

Copula function and Bayesian theory

Relationships among hydrologic variables are complex, and the variables follow various probability distributions. Thus, the Copula function is commonly used to determine the correlations between variables43. The Copula function splits a joint distribution into its marginal distributions and its correlation structure, which can be treated separately; it preserves all information about the correlation and can describe nonlinear and asymmetric correlations between variables44. According to Sklar's theorem, a bivariate joint distribution function with continuous marginal distributions F(x) and G(y) has a unique Copula function as follows:

$$H(x,y)=C\left(F(x),G(y)\right)$$
(2)

where H(x,y) is the bivariate joint distribution function of the random variables X and Y; F(x) is the marginal distribution function of the random variable X; G(y) is the marginal distribution function of the random variable Y; and C(F(x), G(y)) is the copula function that connects the marginal distribution functions F(x) and G(y).

In this study, the Copula function was used to construct bivariate joint distribution functions for the Chl-a concentrations between different monitoring stations. The conditional distribution functions for different conditions were obtained by applying Bayesian theory. Specifically, to calculate the probability that different Chl-a concentrations at the MD, QJ, QS, LK, and BQ (Chl-aMD, Chl-aQJ, Chl-aQS, Chl-aLK, and Chl-aBQ, respectively) stations will cause the Chl-a concentrations at the TC (Chl-aTC) station to change. The minimum–25%, 25–50%, 50–75%, and 75%–maximum Chl-a concentrations at the respective monitoring stations were used in this analysis. The conditional distribution function can be defined as follows:

$$F\left({Chl\text{-}a}_{TC}>x \mid {y}_{1}\le {Chl\text{-}a}_{others}<{y}_{2}\right)=1-\frac{C\left(x,{y}_{2}\right)-C\left(x,{y}_{1}\right)}{{F}_{Chl\text{-}a others}\left({y}_{2}\right)-{F}_{Chl\text{-}a others}\left({y}_{1}\right)}$$
(3)

where Chl-aTC represents the Chl-a monitored at the TC station; Chl-aothers represents the Chl-a monitored at one of the other stations; x is a specific value of Chl-aTC; y1 and y2 (y1 < y2) are specific values of Chl-aothers; C(x,y) is the copula-based joint distribution of Chl-aTC and Chl-aothers evaluated at (x, y), i.e., the copula applied to the two marginal distribution values as in Eq. (2); and \({F}_{Chl\text{-}a others}\) is the marginal distribution function of Chl-aothers.

A Chl-a concentration of 3.4 μg/L is the benchmark concentration for lakes in central-eastern China, as defined by the Ministry of Ecology and Environment of the People’s Republic of China45. Algal proliferation does not jeopardize water body functions when Chl-a concentrations are below this benchmark value. The Copula and Bayesian calculations were performed in R using the “copula” and “VineCopula” packages.
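
The conditional exceedance calculation can be sketched as follows. The probability that Chl-aTC exceeds x while Chl-aothers lies in a band [y1, y2) is one minus the joint probability of the band, C(F(x), G(y2)) − C(F(x), G(y1)), divided by the marginal probability of the band, G(y2) − G(y1). The sketch is in Python rather than R, and several inputs are assumptions: the Clayton family and the value θ = 2 are hypothetical (the fitted copula family is not named here), and the marginal value F(3.4 μg/L) is taken as 1 − 0.0717 from the observed exceedance frequency at TC reported in the Results.

```python
def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (assumed family, positive dependence)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exceedance_given_band(fx, g1, g2, theta):
    """P(X > x | y1 <= Y < y2), given fx = F(x), g1 = G(y1), g2 = G(y2)."""
    joint_band = clayton_copula(fx, g2, theta) - clayton_copula(fx, g1, theta)
    return 1.0 - joint_band / (g2 - g1)

theta = 2.0                # hypothetical dependence parameter
fx = 1.0 - 0.0717          # F(3.4 ug/L) at TC, from the observed exceedance rate
bands = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]  # quartile ranges

risk = [exceedance_given_band(fx, g1, g2, theta) for g1, g2 in bands]
for (g1, g2), p in zip(bands, risk):
    print(f"Chl-a(others) in [{g1:.2f}, {g2:.2f}) quantile band: {100 * p:.2f}%")
```

A useful sanity check on any such calculation is the law of total probability: because the four bands have equal weight, the mean of the four conditional probabilities must equal the unconditional exceedance probability 1 − F(x).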

Ethical approval

All authors have read, understood, and have complied as applicable with the statement on "Ethical responsibilities of Authors" as found in the Instructions for Authors.

Results

Water quality characteristics

Table 1 presents a statistical summary of the water quality indicators and Chl-a concentrations at the six monitoring stations in the DJKR. Kruskal–Wallis tests indicate that all indicators varied significantly among the monitoring stations (P < 0.05). During the monitoring period, WT and DO exhibited significant seasonal characteristics (Fig. S1). DO concentrations were significantly higher in spring and winter than in summer and autumn, with a minimum of 6.10 mg/L and a maximum of 14.40 mg/L. The average pH values at all stations exceeded 8.0, with a minimum average of 8.08 (at TC station) and a maximum average of 8.21 (at MD station), indicating that the water in the DJKR was predominantly neutral to alkaline. In addition, the seasonal variations in pH differed significantly (Fig. S1). The average Tur value at the QS station (4.21 NTU) was higher than that at the other stations. The minimum and maximum Tur values were also observed at the QS station (0.00 and 35.10 NTU, respectively). The concentrations of NH3-N and CODMn were low, all meeting the Chinese surface water quality standards for Class I or II. TP concentrations ranged from 0.003 to 0.041 mg/L among all stations. Significant seasonal differences in TP concentrations were observed at all stations except the MD station (Fig. S1).

Table 1 Statistical water quality parameters at the six monitoring stations in the DJKR from 2018 to 2023 (Avg.: Average; S.D.: Standard deviation).

The lowest or highest average values are used below to illustrate the significant spatial and temporal differences for each indicator. The statistical analysis on the annual scale is provided in Table S3. Spatially, although WT passed the Kruskal–Wallis test (P = 0.41), the adjusted significance in pairwise comparisons was > 0.5. Annually, WT in 2020 was significantly lower than in other years. The remaining indicators exhibited significant differences both spatially and temporally (P < 0.05). DO was significantly lower at BQ compared to the other four stations except for LK, and significantly higher in 2020 than in other years. pH was significantly higher at MD compared to other stations, and significantly higher in 2022 than in other years. Tur was significantly higher at QS compared to other stations, and significantly higher in 2021 than in other years. NH3-N was significantly higher at LK compared to the other four stations except for BQ, and significantly higher in 2022 than in all other years except for 2021. TP was significantly higher at QS compared to the other five stations, and significantly higher in 2019 than in other years. TN was significantly lower at BQ compared to the other four stations except for MD, and significantly higher in 2022 than in all other years except for 2018. CODMn was significantly higher at MD compared to other stations, and significantly higher in 2021 than in other years.

Spatiotemporal distributions of Chl-a

Chl-a exhibited significant spatial and temporal variations (P < 0.05), with MD showing significantly higher values than other stations, and 2022 showing significantly higher values than other years. The station-average Chl-a concentrations ranged from 2.07 to 2.38 μg/L in the entire reservoir, from 2.12 to 2.38 μg/L in the DR, and from 2.07 to 2.23 μg/L in the HR (Fig. 3a). The Chl-a concentrations in the DR were low (maximum of only 8.20 μg/L at MD station), whereas higher Chl-a concentrations were more often observed in the HR (QS and BQ stations reached 11.60 μg/L and 11.38 μg/L, respectively). Except for the QS station, significant seasonal differences in Chl-a concentrations were observed at all stations (Fig. 3b). Chl-a concentrations were significantly higher in spring than in other seasons, except at QS and MD stations. In addition, outlying high Chl-a concentrations were observed in the winter at the QS station and in the spring at the BQ station. Chl-a concentrations gradually increased from 2018 to 2023, and more outliers were observed in 2022 (Fig. 3c). The average spring Chl-a concentrations in 2018, 2019, 2020, and 2022 were higher than those during the rest of the year (Fig. 3d).

Figure 3

Spatial (a, b) and seasonal (c, d) variations in chlorophyll-a (Chl-a) in the DJKR from 2018 to 2023.

Correlations between Chl-a and environmental factors

The Spearman correlation analysis (Fig. 4) indicates that WT was positively correlated with Chl-a only at the MD station in the DR (r = 0.18, P < 0.001) and did not show a significant correlation with Chl-a at QJ and TC stations. DO was less correlated with Chl-a, as it was negatively correlated at the MD station (r = − 0.08, P < 0.01), positively correlated at the QJ station (r = 0.14, P < 0.001), and uncorrelated at TC station. At all DR stations, both pH and Tur were highly and significantly positively correlated with Chl-a. Similarly, NH3-N, TN, and CODMn were all positively correlated with Chl-a but had smaller correlation coefficients (0.10 to 0.37) compared with those of pH value and Tur. In contrast, TP and Chl-a were significantly negatively correlated (r = − 0.29 to − 0.07, P < 0.01).

Figure 4

Spearman correlation matrix of six water quality monitoring stations in the DJKR from 2018 to 2023.

In the HR, WT was negatively correlated with Chl-a only at BQ station (r = − 0.07, P < 0.01), and was not directly correlated with Chl-a at QS and LK stations. DO was weakly correlated with Chl-a, with positive correlations at LK and BQ stations (r = 0.10 to 0.24, P < 0.001) and no direct correlation at the QS station. Tur was highly positively correlated with Chl-a at all HR stations. Similarly, NH3-N and CODMn were both positively correlated with Chl-a but had smaller correlation coefficients (r = 0.06 to 0.28) than those between Tur and Chl-a. TP was significantly positively correlated with Chl-a at the QS station (r = 0.11, P < 0.001), negatively correlated at the LK station (r = − 0.12, P < 0.001), and not directly correlated at the BQ station. TN and Chl-a were significantly positively correlated at BQ and LK stations (r = 0.13, P < 0.001) and not correlated at the QS station.

Overall, the correlations between Chl-a and the water quality factors varied among the monitoring stations. The Chl-a concentrations exhibited low correlations with the individual factors. However, the correlations with most of the water quality factors suggest that changes in Chl-a concentrations were the result of the combined effects of multiple factors.
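
For readers who want to reproduce this kind of screening, Spearman's coefficient is simply the Pearson correlation of the ranks, with tied observations sharing their average rank. The following Python sketch shows the computation on made-up numbers; the function names are hypothetical and no significance test is included.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Made-up example: a monotonic but nonlinear relationship still gives r = 1.
turbidity = [1.0, 2.0, 3.0, 4.0, 5.0]
chl_a = [0.5, 0.9, 2.0, 4.5, 9.0]
r_monotonic = spearman(turbidity, chl_a)
r_with_ties = spearman([1, 2, 2, 3], [1, 2, 3, 4])
print(round(r_monotonic, 3), round(r_with_ties, 3))
```

Because the coefficient depends only on ranks, it captures monotonic association but says nothing about which variable drives the other, which is why pathway models are used in the next steps.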

Key factor determination and downscaling

The stepwise multiple regression analysis results indicate that at least five key driving factors at each station explained the Chl-a concentration variations (Table S4). These results verify that the changes in Chl-a concentrations were synergistically driven by multiple water quality parameters. For instance, the key factors at the MD station were CODMn, WT, TP, DO, and pH, whereas those at the QJ station were Tur, DO, TN, TP, WT, NH3-N, and CODMn.

To reduce the number of variables, the key drivers were downscaled using PCA prior to the SEM calculation (Table S5 and Table 2). The results of the KMO and Bartlett’s sphericity tests are presented in Table S5. Some variables had KMO values below 0.5, whereas a KMO value above 0.5 is typically required for PCA. However, the variables with low KMO values all passed Bartlett’s test of sphericity (P < 0.05), indicating that the correlation matrix deviates significantly from the identity matrix and that the data are still suitable for PCA. Therefore, even though some variables had low KMO values, the principal components obtained through PCA are still meaningful.
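
Both suitability checks can be computed directly from the correlation matrix, as the Python sketch below illustrates on synthetic data. It uses the standard textbook formulas (Bartlett's χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and KMO as the ratio of squared correlations to squared correlations plus squared partial correlations); it is not the SPSS output, and the function names are hypothetical.

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom for H0: R is the identity."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Synthetic example: three indicators sharing one common factor.
rng = np.random.default_rng(2)
n = 200
common = rng.normal(size=n)
data = np.column_stack([common + 0.5 * rng.normal(size=n) for _ in range(3)])

chi2, dof = bartlett_sphericity(data)
kmo_value = kmo(data)
CHI2_CRIT_DF3 = 7.815   # 5% critical value of the chi-square distribution, 3 d.f.
print(round(chi2, 1), dof, chi2 > CHI2_CRIT_DF3, round(kmo_value, 3))
```

One property worth noting is that with only two variables the partial correlation equals the ordinary correlation, so the overall KMO is exactly 0.5 regardless of the data; low KMO values for very small variable groups are therefore not surprising.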

Table 2 Principal component analysis results for the monitoring stations and sub-reservoirs in the DJKR.

The results of the downscaling are presented in Table 2. At the MD station, the water physicochemical indicators were determined to be PC1. At the QJ station, the water physicochemical indicators were identified as PC1, and the nutrients were selected as PC1 and PC2. At the TC station, the water physicochemical indicators were identified as PC1. At the QS station, the water physicochemical indicators and nutrients were selected as PC1. At the LK station, the water physicochemical indicators were selected as PC1 and PC2. At the BQ station, the water physicochemical indicators were selected as PC1 and PC2, and the nutrients were identified as PC1.

Driving pathway identification of algal proliferation based on SEM

Different SEMs were constructed and their fits were evaluated using Chl-a and driving factors after downscaling. The evaluation criteria are presented in Table S6. The goodness-of-fit results indicate that all of the SEMs passed the corresponding criteria evaluations (χ2/df < 3, CFI > 0.9, GFI > 0.9, AGFI > 0.9, RMSEA < 0.05, and SRMR < 0.05). Therefore, the models were well-fitted and met the requirements of the study. The SEMs for each station are shown in Fig. 5. All insignificant pathways were removed from the SEMs, and only the significant relationships (P < 0.05) were retained. The factor loading of a pathway between two variables is the direct effect, whereas the indirect effect is the product of the factor loadings on all associated pathways (Table 3 and Fig. 6). The total effect between two variables is the sum of the direct and indirect effects (Fig. 7).
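
The effect decomposition described above reduces to simple arithmetic on the path diagram, sketched here in Python. The path coefficients are hypothetical placeholders, not values from Fig. 5 or Table 3; the structure (PChm and CODMn acting on Chl-a directly and through the mediator TP) follows the pattern described for the stations.

```python
def total_effect(paths, source, target):
    """Sum, over every directed path from source to target, of the product of
    the standardized coefficients along that path (assumes an acyclic diagram)."""
    if source == target:
        return 1.0
    return sum(coef * total_effect(paths, nxt, target)
               for (start, nxt), coef in paths.items() if start == source)

def decompose(paths, source, target):
    """Return (direct, indirect, total) effects of source on target."""
    direct = paths.get((source, target), 0.0)
    total = total_effect(paths, source, target)
    return direct, total - direct, total

# Hypothetical standardized path coefficients (illustrative only).
paths = {
    ("PChm", "Chl-a"): 0.20,
    ("PChm", "TP"): 0.30,
    ("CODMn", "Chl-a"): 0.25,
    ("CODMn", "TP"): -0.40,
    ("TP", "Chl-a"): -0.12,
}

for factor in ("PChm", "CODMn", "TP"):
    direct, indirect, total = decompose(paths, factor, "Chl-a")
    print(f"{factor}: direct={direct:+.3f} indirect={indirect:+.3f} total={total:+.3f}")
```

In this toy diagram the indirect effect of PChm is 0.30 × (−0.12), so a positive direct effect is partly offset by the mediated pathway; the same mechanism can make a total effect smaller than, or opposite in sign to, the direct one.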

Figure 5

Environmental factors affecting Chl-a variations at different water quality monitoring stations and sub-reservoirs based on structural equation models (SEMs) (Note: The variable at the beginning of each arrow is the independent variable, while the variable at the arrow endpoint is the response variable for that independent variable. Data on the arrows are the factor loadings for the variable pairs. (a) MD, (b) QJ, and (c) TC stations are located in the Dan Reservoir (DR). (d) Overall conditions in the DR. (e) QS, (f) LK, and (g) BQ stations are located in the Han Reservoir (HR). (h) Overall conditions in the HR).

Table 3 Driving effects of Chl-a variations at different monitoring stations and sub-reservoirs in the DJKR.
Figure 6

Direct, indirect, and total effects of driving factors on Chl-a variations at the water quality monitoring stations and sub-reservoirs based on the SEMs (Note: direct, indirect, and total effects were calculated using standardized path coefficients and are represented by red, blue, and green bars, respectively).

Figure 7

Relative contributions of different environmental driving factors to Chl-a variations at the water quality monitoring stations and sub-reservoirs (Note: a, b, c, e, f, and g are monitoring stations. d and h are sub-reservoirs).

In the DR, the SEMs for MD and TC stations had the same driving pathways and key factors, indicating that PChm and CODMn directly affected Chl-a concentrations and simultaneously had indirect effects on Chl-a via the mediator variable TP. However, the environmental driving pathway at QJ station was relatively complex, wherein CODMn had both a direct effect on Chl-a and indirect effects through two nutrient variables. In the HR, the relationships between the driving factors and Chl-a were similar to those in the DR (i.e., PChm and CODMn had both direct and indirect effects on Chl-a by influencing mediator nutrient variables). Figure 5d and h show the overall scenarios in the DR and HR, respectively. The main difference between the sub-reservoirs was that CODMn in the DR had a direct effect on Chl-a, whereas that in the HR did not directly affect Chl-a.

In the DR (Fig. 6), the total effects of PChm, TP, and CODMn on Chl-a at the MD station were 0.16, − 0.12, and 0.31, respectively. The total effects of PChm, Nutr 1, Nutr 2, and CODMn on Chl-a at the QJ station were − 0.01, 0.11, 0.25, and 0.16, respectively. The total effects of PChm, TP, and CODMn on Chl-a at the TC station were 0.33, − 0.38, and 0.12, respectively. In the HR, the total effects of PChm, Nutr, and CODMn on Chl-a at the QS station were 0.11, 0.26, and − 0.11, respectively. The total effects of PChm 1, PChm 2, TP, and CODMn on Chl-a at the LK station were 0.02, − 0.41, − 0.11, and 0.17, respectively. The total effects of PChm 1, PChm 2, Nutr, and CODMn on Chl-a at the BQ station were − 0.23, 0.02, 0.20, and 0.22, respectively.

In the DR, the factors driving Chl-a were PChm 1, PChm 2, Nutr, and CODMn, which contributed 16.81%, 43.36%, 23.89%, and 15.93% to Chl-a variations, respectively. Among them, water physicochemical indicators (PChm 1 and PChm 2) were the dominant factors (total contribution of 60.18%) driving Chl-a variations in the DR. In the HR, the driving factors of Chl-a were PChm 1, PChm 2, Nutr 1, Nutr 2, and CODMn, which contributed 16.98%, 1.89%, 22.64%, 50.94%, and 7.55% to Chl-a variations, respectively. Among these, nutrient factors (Nutr 1 and Nutr 2) were the dominant factors (total contribution of 73.58%) driving Chl-a variations in the HR.
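
The formula behind these percentages is not stated in this section. One plausible reading, shown in the Python sketch below, is that each factor's relative contribution is its absolute total effect divided by the sum of absolute total effects; this is an assumption, not the confirmed method. The example uses the total effects reported above for the MD station, and the function name is hypothetical.

```python
def relative_contributions(total_effects):
    """Share (%) of each factor in the sum of absolute total effects.
    Assumed definition; the source does not spell out its formula."""
    magnitude = {name: abs(value) for name, value in total_effects.items()}
    scale = sum(magnitude.values())
    return {name: 100.0 * m / scale for name, m in magnitude.items()}

# Total effects on Chl-a at the MD station, as reported in the text.
md_effects = {"PChm": 0.16, "TP": -0.12, "CODMn": 0.31}
shares = relative_contributions(md_effects)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Under this definition the sign of an effect is ignored, so a strongly negative driver counts as much as an equally strong positive one, and the shares always sum to 100%.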

Chl-a variation risks at TC station

The different sub-reservoirs affected Chl-aTC concentrations in the same way, i.e., the probability of exceeding a specific Chl-aTC concentration increased with increasing Chl-a concentrations in other parts of the DJKR (Fig. 8). Chl-a at MD station ranged from 0.21 to 8.20 μg/L, and the probabilities of Chl-aTC exceeding 3.4 μg/L in the four quartile ranges were 0.72%, 2.53%, 8.40%, and 22.92%, respectively. The probabilities of Chl-aTC exceeding 3.4 μg/L when the QJ station Chl-a was under the same conditions were 0.53%, 2.25%, 8.30%, and 23.77%, respectively. The probabilities of Chl-aTC exceeding 3.4 μg/L when the QS station Chl-a was under the same conditions were 0.72%, 2.70%, 8.79%, and 23.65%, respectively. The probabilities of Chl-aTC exceeding 3.4 μg/L when the LK station Chl-a was under the same conditions were 0.58%, 2.29%, 8.50%, and 24.32%, respectively. The probabilities of Chl-aTC exceeding 3.4 μg/L when the BQ station Chl-a was under the same conditions were 2.09%, 4.91%, 9.43%, and 17.56%, respectively.

Figure 8

Conditional risk probability of Chl-a concentration variations in the TC station under the influence of other stations (Note: a, b, c, d, and e represent the effects of Chl-a concentration variations in MD, QJ, QS, LK, and BQ stations on TC station respectively).

The probabilities that TC station would be affected by an algal bloom were largest when Chl-a in the sub-reservoirs was between the upper quartile and the maximum, ranging from 17 to 25%. For the other concentration ranges, the probability of algal bloom risk at TC station did not exceed 10%. The observed probability of Chl-a concentrations exceeding 3.4 μg/L at TC station during 2018–2023 was 7.17%, which is similar to the predicted probabilities when Chl-a in the sub-reservoirs was in the 50%–75% quartile range (8.30–9.43%), all of which were less than 10%.
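Conditional probabilities of this kind can be written in copula form as P(Y > y | x1 < X ≤ x2) = 1 − [C(u2, v) − C(u1, v)] / (u2 − u1), where u1 = F_X(x1), u2 = F_X(x2), and v = F_Y(y) are marginal non-exceedance probabilities. The following is a minimal sketch, assuming a Gumbel copula with an arbitrary dependence parameter θ = 2. The copula family, the value of θ, and the function names are assumptions for illustration and are not the fitted model of this study; only the 7.17% unconditional exceedance probability is taken from the text.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v); theta >= 1, with theta = 1 meaning independence."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def conditional_exceedance(u_low, u_high, v, theta):
    """P(Y > y | u_low < F_X(X) <= u_high), where v = F_Y(y)."""
    joint = gumbel_copula(u_high, v, theta) - gumbel_copula(u_low, v, theta)
    return 1.0 - joint / (u_high - u_low)


# Non-exceedance probability of the 3.4 ug/L threshold at the TC station,
# taken from the observed 7.17% exceedance frequency.
v_threshold = 1.0 - 0.0717

# Quartile ranges of Chl-a at the conditioning station, in probability space.
quartile_bounds = [0.0, 0.25, 0.5, 0.75, 1.0]
risks = [
    conditional_exceedance(low, high, v_threshold, theta=2.0)  # theta is assumed
    for low, high in zip(quartile_bounds, quartile_bounds[1:])
]
for index, risk in enumerate(risks, start=1):
    print(f"Quartile range {index}: {100.0 * risk:.2f}%")
```

In this sketch, positive dependence makes the conditional risk rise from the lowest to the highest quartile range, and the four conditional values average to the unconditional exceedance probability, which offers a simple consistency check on any fitted copula.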

Discussion

Chl-a concentrations were relatively low in the DR, particularly at TC station, whereas higher concentrations were more frequently observed in the HR, which is consistent with the findings of previous studies46. Algal bloom events signify that the planktonic algal biomass has peaked, and the Chl-a concentration is an important indicator of the degree of algal bloom and eutrophication. Therefore, comparing Chl-a maxima can reflect the intensity and scale of potential algal blooms in the two sub-reservoirs. Maximum Chl-a concentrations at QS and BQ stations exceeded 10 μg/L (11.60 and 11.38 μg/L, respectively). Thus, the algae were more aggregated in the HR and the risk of algal blooms was higher than that in the DR. This spatial difference in Chl-a concentration may be related to the differing lengths of the reservoir backwaters and the reservoir shapes. The DR is a lake-type reservoir with an expansive water surface and short shoreline, whereas the HR is a canyon-type reservoir with a long and narrow water surface, many tributaries, and zigzag-shaped banks that receive more exogenous nutrients from surface runoff14. Therefore, water in the HR is more prone to algal proliferation and blooms. The high Chl-a concentrations observed at QS station may be related to higher TP and NH3-N inputs from the Han River46. Chl-a concentrations at BQ station were also higher, which can be attributed to nutrient retention at the dam and lower flow rates that resulted in Chl-a enrichment47. Therefore, the tributary inflow and dam in the HR should be considered when conducting algal risk management in the DJKR.

Algal proliferation requires a suitable environment, including optimal temperature, pH, and light/radiation conditions, as well as appropriate dissolved oxygen concentrations, organic pollutant concentrations, and primary nutrient (carbon, nitrogen, and phosphorus) loadings48,49,50,51.

The composite variable Nutr reflected information from multiple nutrient indicators52. Nitrogen and phosphorus are key nutrients that affect algal proliferation53. Different TN and TP levels affect spatiotemporal phytoplankton dynamics and species succession54, and the phytoplankton biomass exhibits spatiotemporal heterogeneity with changing TN and TP concentrations55. A previous case study showed that changes in nitrogen and phosphorus affected Microcystis blooms in Isahaya Bay Reservoir, Japan, and additional nutrients provided by precipitation events promoted algal proliferation and led to algal blooms in the reservoir56. The water quality in the DJKR is generally good13; however, the TN exceeds the standard and is between Class III and IV57. TN concentrations recorded during the monitoring period were consistent with those obtained by previous investigations, suggesting that the nitrogen salts in the DJKR have always been adequate. Higher TN or TP concentrations lead to higher algal densities when they exceed the optimal range for algal development58. Thus, the adequate TN concentrations were responsible for the significant positive correlation observed between TN and Chl-a in this study. Additionally, algal biomass may be limited by TN concentrations less than 1.0 mg/L or TP concentrations less than 0.025 mg/L59. TP concentrations were below 0.025 mg/L during the monitoring period, which may have limited algal proliferation. Previous studies have determined that phosphorus is one of the main factors that influences planktonic algae in the DJKR10. Among the nutrient factors at MD, TC, and LK stations, the only selected factor was TP. Based on the SEMs for these stations, TP had a negative effect on Chl-a, whereas the SEMs for the other stations indicate that multiple trophic factors had a positive effect on Chl-a. This indicates that the synergistic effects of TP, TN, and NH3-N in this study were beneficial for algal proliferation. NH3-N is the preferred form of inorganic nitrogen utilized by planktonic algae60. A combination of NH3-N and phosphorus also promotes phytoplankton growth61,62. The synergistic effects of TN, TP, and NH3-N also showed positive impacts on Chl-a in Xiannv Lake, China63, which is consistent with what we observed in the HR.

Nutrient concentrations influence algal proliferation, but interactions also occur with water physicochemical indicators64. The correlation analysis indicates that WT was only weakly correlated with Chl-a; WT may instead affect Chl-a indirectly by modulating other factors65. However, the stepwise multiple regression analysis results indicate that WT was a selected driving factor at all stations except for QS station. Therefore, we hypothesized that other factors may have more significant direct effects on Chl-a than WT66. Previous studies have shown that DO affects phytoplankton biomass in the DJKR67. The correlation analysis indicates that WT and DO are highly negatively correlated, but together they reflect the primary environmental conditions required for algal proliferation. Therefore, DO was considered to be a key driver of algal proliferation in this study. Contrary to commonly reported patterns, this study found a low correlation between DO and Chl-a, while TP was generally negatively correlated with Chl-a. This may be related to the phytoplankton community in the DJKR. Previous studies have shown that the dominant groups in the DJKR are Cyanophyta, Chlorophyta, and Bacillariophyta10. DO and TP are significantly negatively correlated with Cyanophyta and Chlorophyta, significantly positively correlated with Bacillariophyta, and exhibit a negative and weaker correlation with the phytoplankton diversity index10. Chl-a is an indicator of the biomass of all planktonic algae in the water. When the entire phytoplankton community is considered, the effects on individual groups may offset one another, resulting in the correlations between DO, TP, and Chl-a observed in this study.

Tur also affects algal proliferation, as low turbidity can provide suitable light conditions68. When an algal bloom event occurs, it reduces the water clarity and increases turbidity, as occurred at the QS station in the autumn (both Chl-a and Tur were relatively high). Tur in this study was highly positively correlated with Chl-a and was selected in the stepwise multiple regression models at all monitoring stations except MD, generally as the first or second factor. Therefore, among the physicochemical factors affecting Chl-a concentration variations in the DJKR, Tur was one of the key driving factors69.

As one of the driving factors, pH is a component of the water physicochemical factor (PChm) in the SEMs established for the MD, TC, and QS stations. The PChm arrow in the model points to Chl-a, which reflects the significant positive role of pH in influencing Chl-a. During the monitoring period, the mean pH ranged from 8.08 to 8.21, which is within the optimal range. The relationship between pH and algae is reciprocal in that suitable pH facilitates algal blooms, while photosynthesis by the algae absorbs CO2 in the water, thereby changing the pH70,71. Therefore, pH was included as a key driving factor.

The SEM results in this study showed that CODMn was one of the key factors driving algal proliferation in the DJKR. Previous studies showed that the water quality of the MRSNWDPC was good, with low CODMn concentrations and a significant correlation between CODMn and planktonic algal cell density10,72. The proliferation of some algae decreases with increasing organic pollutant concentrations, whereas at lower concentrations some algae survive and degrade the pollutants, yielding degradation products that serve as nutrient sources and promote algal growth73. Specifically, at low organic matter concentrations, organic matter can be used as a direct nutrient source and degradation product to promote algal growth by increasing enzymatic activity and lipid peroxidation in algal cells74. At higher organic matter concentrations, algal growth is inhibited by damage to cell membranes and reduced enzymatic activity75. In addition, planktonic algal growth releases organic matter, which affects organic pollutant concentrations76.

Hydrodynamic conditions are likely to be a significant factor influencing algal growth. Research indicates that the DJKR experiences notable thermal stratification, with the thermocline situated between 2 and 15 m below the surface, averaging around 10 m in thickness, and the stratification influences the distribution of nitrogen and phosphorus, thereby impacting the community structure of phytoplankton77. Heavy rainfall can severely disrupt this stratification, resulting in increased dissolved oxygen levels that inhibit nutrient release from sediments and consequently reduce phytoplankton biomass78. Additionally, the reservoir’s phytoplankton diversity index is higher in spring and winter and lowest in autumn28, likely due to variations in flow velocity and volume. Historical data reveal that the monthly average of DJKR inflow is highest from July to October, peaking at 2410 m3/s. The intake flow velocity and volume are greatest in autumn and lowest in winter, with annual averages of 0.76 m/s and 203.48 m3/h, respectively28. High flow velocity conditions negatively affect algal growth79. If hydrodynamic conditions are incorporated as a factor in constructing a structural equation model, the initial model should illustrate that hydrodynamic factors directly influence physicochemical factors, nutrient factors, organic pollution factors, and phytoplankton, and indirectly influence phytoplankton through physicochemical, nutrient, and organic pollution factors. The causal pathways among the remaining three factors should remain consistent with those observed in this study. A thorough elucidation of the driving mechanisms of planktonic algae necessitates considering additional factors, which will be addressed in future research.
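The extended model structure described above can be written as a set of directed paths, from which direct, indirect, and total effects follow by path tracing. The following is a minimal sketch, assuming a recursive (acyclic) model with standardized coefficients in which PChm and CODMn act on Chl-a both directly and through Nutr. All coefficient values, the `Hydro` label, and the function names are hypothetical placeholders, not estimates from this study.

```python
# Hypothetical standardized path coefficients (illustrative only).
PATHS = {
    ("Hydro", "PChm"): 0.30,
    ("Hydro", "Nutr"): 0.20,
    ("Hydro", "CODMn"): 0.10,
    ("Hydro", "Chl-a"): -0.25,
    ("PChm", "Nutr"): 0.40,
    ("PChm", "Chl-a"): 0.15,
    ("CODMn", "Nutr"): 0.50,
    ("CODMn", "Chl-a"): 0.10,
    ("Nutr", "Chl-a"): 0.35,
}


def total_effect(source, target, paths=PATHS):
    """Sum, over every directed path, of the product of its coefficients."""
    if source == target:
        return 1.0
    return sum(
        coefficient * total_effect(middle, target, paths)
        for (start, middle), coefficient in paths.items()
        if start == source
    )


def direct_effect(source, target, paths=PATHS):
    """Coefficient on the single arrow from source to target, if present."""
    return paths.get((source, target), 0.0)


def indirect_effect(source, target, paths=PATHS):
    """Effect transmitted through mediators: total minus direct."""
    return total_effect(source, target, paths) - direct_effect(source, target, paths)


for factor in ("Hydro", "PChm", "CODMn", "Nutr"):
    print(
        f"{factor} -> Chl-a: "
        f"direct={direct_effect(factor, 'Chl-a'):+.4f}, "
        f"indirect={indirect_effect(factor, 'Chl-a'):+.4f}, "
        f"total={total_effect(factor, 'Chl-a'):+.4f}"
    )
```

With these placeholder values, a negative direct hydrodynamic effect is partly offset by positive indirect effects through the other factors, which illustrates why direct and total effects should be reported separately in such a model.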

Conclusions

This study developed a methodological framework for analyzing spatiotemporal distributions of algal proliferation and quantifying the driving mechanism using multivariate statistical techniques and structural equation models. The Danjiangkou Reservoir, i.e., the core water source area of the South-to-North Water Diversion Project of China, was taken as a study case. The risks of Chl-a concentration changes at the TC station were analyzed using the Copula function and Bayesian theory. The main conclusions are as follows:

(1) Chl-a concentration distributions exhibited significant spatiotemporal variations in the DJKR, and Chl-a was relatively higher in spring than in other seasons. The factors driving algal proliferation differed spatially. Water physicochemical factors and CODMn had both direct and indirect effects on Chl-a by influencing mediator nutrient variables, whereas nutrient factors had direct effects on Chl-a variations.

(2) In the DR, the dominant factors driving Chl-a were the water physicochemical factors, whereas nutrients were the dominant factors in the HR. When Chl-a concentrations in other areas of the DJKR ranged from the minimum to the upper quartile, the probabilities of the TC station Chl-a exceeding 3.4 μg/L (owing to the influences of other parts of the DJKR) were all less than 10%. Thus, the risk of planktonic algal proliferation at the MRSNWDPC intake is currently low.

Due to limitations in data and numerical models, this study does not account for the effects of climate and hydrodynamic factors on algae, nor the impacts of pollutant input and biological migration under different hydrological conditions. Future research should comprehensively consider the influence of climate and hydrodynamic factors on algal proliferation, establish a mechanistic mathematical model, and quantitatively analyze the environmental impact of pollutant input and algal migration under different hydrological conditions, so as to enable a more reliable and informative comparison with the relevant statistical models and results.