Background

Global studies have shown that variations in precipitation and temperature are crucial drivers of lake water-level fluctuations (Motiee and McBean 2009; Mekonnen et al. 2012; Kiani et al. 2017). Climate studies in Ethiopia by the National Meteorological Agency (NMA) and others have reported consistently rising temperatures and variable precipitation trends across most of the country (Conway 2000; Seleshi and Zanke 2004; Bewket and Conway 2007; NMA 2007; McSweeney et al. 2008; Ayalew et al. 2012; Alemayehu and Bewket 2017). This pattern of climate change in Ethiopia has reduced surface runoff and over-lake precipitation and increased evapotranspiration, lowering the water levels of lakes in the country (Kebede et al. 2006; Olana 2014; Gebeyehu 2017).

In Ethiopia, climate is largely governed by seasonal shifts in the atmospheric pressure systems that control the prevailing winds (NMA 2007) and by the country's complex topography, which affects climate through local heating and orographic effects (Camberlin 1997). Consequently, precipitation in Ethiopia is strongly seasonal, and most parts of the country experience three seasons: kiremt, the long rainy season (June–September); belg, the short rainy season (March–May); and bega, the dry season (October–February) (NMA 2007). Failures of the kiremt and belg rains associated with the El Niño/Southern Oscillation (ENSO), and the droughts that follow, have been common in the country (NMA 2007).

Local-level climate information is critical for the sustainable management of climate hazards affecting individual lakes in developing countries such as Ethiopia, where the capacity to adapt to a changing climate is limited. The Lake Hayk basin is a closed basin on the northeastern edge of the Ethiopian highlands. This region is climate sensitive because of its proximity to the northernmost limit of the Inter Tropical Convergence Zone (ITCZ), where a slight shift in the position or strength of the monsoon system can switch conditions between aridity and moisture excess (Loakes et al. 2018). Because it is closed and lies in one of Ethiopia's most drought-prone areas, the endorheic Lake Hayk basin is highly sensitive and vulnerable to climate change (Conway 2000; Philip et al. 2018).

Although limited in number, several studies on Lake Hayk have been directly or indirectly linked to fluctuations in its water level. These include a limnological study (Baxter and Golobitsch 1970), bathymetric studies (Demlie et al. 2007; Yesuf et al. 2013), a study of Late Quaternary climate change (Loakes et al. 2018) and a study of the basin's surface runoff generation (Mewded et al. 2021). Bathymetric surveys of Lake Hayk have been conducted at various times. The first were made in the 1930s by Italian limnologists: Morandini's survey of May 1938, published in 1941 (Baxter and Golobitsch 1970), reported a maximum lake depth of 88.2 m, whereas the most recent survey, conducted in 2009 (Yesuf et al. 2013), found a maximum depth of 81.44 m. The maximum depth of endorheic Lake Hayk therefore dropped from 88.2 m in 1938 to 81.44 m in 2009, a total decline of 6.76 m in 71 years. Similarly, the lake's surface area shrank from 2320 ha in 1938 (Baxter and Golobitsch 1970) to 2156.76 ha in 2015 (Mewded et al. 2021), a loss of 163.24 ha in 77 years.
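As a quick check of the arithmetic above, the snippet below recomputes the two totals and converts them to average annual rates. It assumes, purely for illustration, a uniform linear decline between surveys; the surveys themselves say nothing about the path of the decline in between.

```python
# Worked check of the bathymetric and areal changes reported in the text.
# Assumption (illustration only): a uniform, linear decline between surveys.

depth_1938_m, depth_2009_m = 88.2, 81.44       # maximum depth, 1938 vs 2009 survey
area_1938_ha, area_2015_ha = 2320.0, 2156.76   # lake surface area, 1938 vs 2015

depth_drop_m = round(depth_1938_m - depth_2009_m, 2)  # total drop in maximum depth (m)
area_loss_ha = round(area_1938_ha - area_2015_ha, 2)  # total loss of surface area (ha)

depth_rate = depth_drop_m / (2009 - 1938)  # average m per year
area_rate = area_loss_ha / (2015 - 1938)   # average ha per year

print(f"Depth drop: {depth_drop_m} m over {2009 - 1938} years (~{depth_rate:.3f} m/yr)")
print(f"Area loss: {area_loss_ha} ha over {2015 - 1938} years (~{area_rate:.2f} ha/yr)")
```

Under this assumption the maximum depth fell by roughly 0.1 m per year and the surface area by roughly 2 ha per year.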

Importantly, the literature shows that the lake's water level has been declining continuously since the 1970s, posing a severe threat to the lake (Baxter and Golobitsch 1970; Demlie et al. 2007; Yesuf et al. 2013; Loakes et al. 2018; Mewded et al. 2021). However, none of these studies has examined the decline in relation to climate variability and change. This study therefore investigated local variability and trends in precipitation, temperature and lake level over 1986–2015 at monthly, annual and seasonal scales. The aim was to establish the implications of climate variability and change for Lake Hayk water level variations in the endorheic Lake Hayk basin and thereby support local, climate-change-oriented water management decisions to save Lake Hayk from extinction.

Materials and methods

Description of the study area

Ethiopia is situated in the Horn of Africa in Eastern Africa, between latitudes 3°N and 15°N and longitudes 33°E and 48°E. The Lake Hayk basin is a naturally closed (endorheic) drainage basin located in one of Ethiopia's zones most vulnerable to climate change and variability. It extends from 39.68°E to 39.81°E and from 11.24°N to 11.39°N and covers 8592.68 ha, including a lake surface area of 2156.76 ha (Fig. 1).

Fig. 1

Map depicting Ethiopia’s diverse topography (upper) and a map depicting the location of the Endorheic Lake Hayk basin in Ethiopia, as well as the location of the meteorological station and lake level measuring gauge in the Lake Hayk basin (lower)

The Lake Hayk basin has a subhumid tropical climate with a bimodal precipitation regime (kiremt and belg). Over 1986–2015, the basin received a mean annual precipitation of 1192.31 mm, and the mean annual surface temperature was 17.58 °C.

Data sources

The hydroclimate variability and trends in the endorheic Lake Hayk basin were analyzed using historical monthly datasets of precipitation, mean temperature (Tmean) and Lake Hayk water level (LWL) for 1986–2015. Precipitation data from a single station, the Hayk meteorological station (11.31°N, 39.68°E; 1984 m amsl), located just outside the study area (Fig. 1), were used to examine the response of Lake Hayk's water level to climate change and variability. The spatial variability of precipitation over an area is an important climatic characteristic that affects lake hydrology. With this in mind, we consider the Hayk meteorological station to be the only relevant station whose precipitation data are adequate for evaluating the impacts of climate on Lake Hayk water levels, for two reasons. First, although it lies outside the basin, the station is very close to Lake Hayk (less than 6 km away), closer than several points within the basin, so the precipitation it records is nearly identical to what the lake and its surroundings receive. Second, the basin is small (85.93 km²), so a single station gives a density of one station per 85.93 km², which exceeds the minimum density recommended by the World Meteorological Organization (WMO) of one station per 300–1000 km² for temperate, Mediterranean and tropical zones (Dingman 2002). For these reasons, the authors used only the Hayk meteorological station rather than interpolating data from other stations, which could compromise data quality. The Ethiopian National Meteorological Agency provided monthly precipitation (1986–2015) and mean temperature (1994–2015) data for the station.
Because station temperature records were not available for the full period, reanalysis temperature products (RTPs) for the station location covering 1986–2015 were retrieved from the KNMI Climate Explorer portal (https://climexp.knmi.nl/).

LWL is measured with a water level gauge on the southwest shore of Lake Hayk (Fig. 1). The average daily lake water level series from 1986 to 2015 was provided by Ethiopia's Ministry of Water Resources. However, the LWL series contained extensive gaps. Following Seleshi and Zanke (2004), daily data with more than 10% missing values were excluded from the analysis. Complete daily LWL data were available only for 1999–2005 and 2011–2015. We therefore fused the water level observations from these periods with remotely sensed water extents to bridge the gap between 2005 and 2011. Cloud-free (cloud cover ≤ 10%) Landsat 5 Thematic Mapper (TM) images for 2009–2011 and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images for 2005 and 2008 were retrieved from the EarthExplorer archive (http://earthexplorer.usgs.gov/). To improve interpretation accuracy, all Landsat images were acquired in months of the dry (bega) season.
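The 10% screening rule can be sketched as below. This is a minimal illustration, assuming the rule is applied per calendar year and that the gauge series is held as a mapping from dates to levels with `None` for missing readings; the function name `complete_years` and this data layout are hypothetical, not those of the original processing.

```python
from datetime import date, timedelta

def complete_years(daily_levels, max_missing_frac=0.10):
    """Years whose share of missing daily values does not exceed max_missing_frac.

    daily_levels maps datetime.date -> water level (m), with None marking a missing
    reading. Calendar days absent from the mapping also count as missing.
    """
    kept = []
    for year in sorted({d.year for d in daily_levels}):
        n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 365 or 366
        observed = sum(1 for d, v in daily_levels.items() if d.year == year and v is not None)
        if 1 - observed / n_days <= max_missing_frac:
            kept.append(year)
    return kept

# Hypothetical example: 2000 is complete, 2001 lacks its first 60 days (about 16%).
levels = {}
day = date(2000, 1, 1)
while day < date(2002, 1, 1):
    levels[day] = None if (day.year == 2001 and day < date(2001, 3, 2)) else 1.0
    day += timedelta(days=1)

print(complete_years(levels))  # [2000]
```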

Data analysis

This study combined hydroclimate data from a gauging station, gridded reanalysis data and remotely sensed satellite data to analyze climate variability and change and their implications for changes in the water level of endorheic Lake Hayk at the local level, using statistical approaches integrated with remote sensing and geographic information systems. The overall methodology is shown schematically in Fig. 2.

Fig. 2

A schematic illustration of the study's methodology

Evaluating the reanalysis temperature data

Reanalysis products are useful where meteorological stations are sparse or unevenly distributed and where records are missing or short (Dee et al. 2011). Widely used climate reanalyses include the European Centre for Medium-Range Weather Forecasts (ECMWF) ReAnalysis 5th generation (ERA5) (Hersbach et al. 2020), the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) (Rienecker et al. 2011) and the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis (Kalnay et al. 1996). Because historical station temperature records are scarce in the endorheic Lake Hayk basin, we relied on reanalysis products (historical estimates produced by combining a numerical weather prediction model with satellite and ground observations) as the best available alternative; their performance, however, must be evaluated. The ERA5, MERRA-2 and NCEP/NCAR RTPs for the Lake Hayk basin were therefore evaluated quantitatively against station temperature data for 1994–2015 at annual and seasonal scales using the coefficient of determination (R2), root mean square error (RMSE) and relative bias (Alemseged and Tom 2015; Nkiaka et al. 2017).

$$R^{2} = \left[\frac{\sum_{t=1}^{n}\left(T_{r}-\overline{T_{r}}\right)\left(T_{s}-\overline{T_{s}}\right)}{\sqrt{\sum_{t=1}^{n}\left(T_{r}-\overline{T_{r}}\right)^{2}\,\sum_{t=1}^{n}\left(T_{s}-\overline{T_{s}}\right)^{2}}}\right]^{2}$$
(1)
$$RMSE\,\, = \,\,\,\sqrt {{{\sum\limits_{t = 1}^{n} {\left( {T_{r} - T_{s} } \right)^{2} } } \mathord{\left/ {\vphantom {{\sum\limits_{t = 1}^{n} {\left( {T_{r} - T_{s} } \right)^{2} } } n}} \right. \kern-\nulldelimiterspace} n}}$$
(2)
$$Bias\,\, = \,\,\,\left( {{{\sum\limits_{t = 1}^{n} {\left( {T_{r} - T_{s} } \right)} } \mathord{\left/ {\vphantom {{\sum\limits_{t = 1}^{n} {\left( {T_{r} - T_{s} } \right)} } {\sum\limits_{t = 1}^{n} {T_{s} } }}} \right. \kern-\nulldelimiterspace} {\sum\limits_{t = 1}^{n} {T_{s} } }}} \right)\,\,\times\,100\%$$
(3)

where Tr and Ts denote the reanalysis and station temperature records, respectively, and n is the number of records. R2 ranges from 0 to 1: R2 = 0 indicates no correlation and R2 = 1 indicates perfect correlation between the reanalysis product and the station record. Bias measures systematic error: zero bias indicates no systematic error, whereas negative and positive biases indicate underestimation and overestimation, respectively (Alemseged and Tom 2015). RMSE measures the typical magnitude of the differences between reanalysis and station values; values closer to zero indicate a better fit.
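Eqs. 1–3 can be sketched in plain Python as below. The function name `evaluate` and the input lists are hypothetical; R2 is computed as the squared Pearson correlation (with squared deviations in the denominator), and the bias is returned in percent.

```python
import math

def evaluate(tr, ts):
    """Compare reanalysis (tr) with station (ts) temperatures: R2, RMSE, bias (%)."""
    n = len(ts)
    mr, ms = sum(tr) / n, sum(ts) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(tr, ts))
    var_r = sum((r - mr) ** 2 for r in tr)
    var_s = sum((s - ms) ** 2 for s in ts)
    r2 = (cov / math.sqrt(var_r * var_s)) ** 2                     # Eq. 1
    rmse = math.sqrt(sum((r - s) ** 2 for r, s in zip(tr, ts)) / n)  # Eq. 2
    bias_pct = sum(r - s for r, s in zip(tr, ts)) / sum(ts) * 100    # Eq. 3
    return r2, rmse, bias_pct

# Hypothetical series: the reanalysis is uniformly 0.1 degC colder than the station.
print(evaluate([17.4, 17.9, 18.3], [17.5, 18.0, 18.4]))
```

For this hypothetical pair, R2 is 1, RMSE is 0.1 and the bias is slightly negative, which illustrates why R2 alone cannot reveal a systematic offset and is complemented by the bias.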

Variability and trends analysis of hydroclimate time series

Several statistical approaches were used to examine variability and trends in the hydroclimate time series of the endorheic Lake Hayk basin from 1986 to 2015. Variability was assessed with the coefficient of variation (CV), the standardized rainfall anomaly (SRA) and the precipitation concentration index (PCI). The modified Mann–Kendall (MK) trend test and Sen's slope estimator, run in XLSTAT, were used to assess the significance and magnitude of trends, respectively. CV expresses the level of variability in a dataset and is defined as the ratio of the standard deviation (SD) to the mean (μ) (Hare 2003).

$$CV\, = \,\,\,\left( {{{SD} \mathord{\left/ {\vphantom {{SD} \mu }} \right. \kern-\nulldelimiterspace} \mu }} \right)\,\times100$$
(4)
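Eq. 4 and the Hare (2003) classes described in the next paragraph can be sketched as follows. The sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the function names are hypothetical.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation (Eq. 4): sample SD / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def hare_class(cv):
    """Variability class after Hare (2003): < 20 low, 20-30 moderate, > 30 high."""
    if cv < 20:
        return "low"
    if cv <= 30:
        return "moderate"
    return "high"

print(round(cv_percent([10, 12, 14]), 2), hare_class(cv_percent([10, 12, 14])))
```

Applied to the CVs reported in the results, `hare_class` returns "low" for annual precipitation (16.66%), "moderate" for kiremt (23.80%) and "high" for belg (43.94%).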

Hare (2003) classifies variability as low for CV < 20%, moderate for CV between 20% and 30% and high for CV > 30%. PCI describes the heterogeneity (concentration) of monthly precipitation. With Pi the precipitation of the ith month, Oliver (1980) defines PCI as follows:

$$PCI\,\, = \,\,\,\,\,\left[ {{{\left( {\sum\limits_{i\, = 1}^{12} {P_{i}^{2} } } \right)} \mathord{\left/ {\vphantom {{\left( {\sum\limits_{i\, = 1}^{12} {P_{i}^{2} } } \right)} {\left( {\sum\limits_{i\, = 1}^{12} {P_{i} } } \right)^{2} }}} \right. \kern-\nulldelimiterspace} {\left( {\sum\limits_{i\, = 1}^{12} {P_{i} } } \right)^{2} }}} \right]\,\,\times\,100$$
(5)
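Eq. 5 can be sketched as below for one year of 12 monthly totals. The class limits follow Oliver (1980) as quoted in the next paragraph; because those limits leave small gaps (10–11 and 20–21), the sketch closes them by assumption, and the function names are hypothetical.

```python
def pci(monthly):
    """Precipitation Concentration Index (Eq. 5) for 12 monthly totals."""
    if len(monthly) != 12:
        raise ValueError("PCI needs exactly 12 monthly values")
    total = sum(monthly)
    return sum(p * p for p in monthly) / total ** 2 * 100

def pci_class(value):
    # Limits from the text (< 10 low, 11-20 high, > 21 very high); the gaps
    # 10-11 and 20-21 are assigned here by assumption.
    if value < 10:
        return "low (uniform)"
    if value <= 20:
        return "high"
    return "very high"

print(pci_class(pci([200] * 6 + [0] * 6)))  # rain spread over six months
```

A perfectly uniform year gives the minimum possible PCI of 100/12 ≈ 8.3, while a year with all rain in a single month gives 100.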

Precipitation concentration is classified as low (uniform distribution of precipitation) for PCI values below 10, high for values from 11 to 20 and very high for values above 21 (Oliver 1980). SRA provides insight into the occurrence and severity of drought periods. Standardized rainfall and temperature anomalies are dimensionless: the anomaly (the deviation of each observation from the mean) is divided by the standard deviation to remove the influence of dispersion and express the relative magnitude of the anomaly. With Pt the annual precipitation in year t and Pm the mean annual precipitation over the study period, SRA is estimated according to Agnew and Chappell (1999) as:

$$SRA\, = \,\left( {P_{t} - P_{m} } \right)\,/\,SD\,\,\,\,$$
(6)

Then, severity of drought can be categorized as extreme drought (SRA <  − 1.65), severe drought (− 1.28 > SRA >  − 1.65), moderate drought (− 0.84 > SRA >  − 1.28) and no drought (SRA >  − 0.84) (Agnew and Chappell 1999).
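Eq. 6 and the Agnew and Chappell (1999) categories can be sketched as below. The sketch assumes the sample standard deviation and assigns values falling exactly on a class limit to the less severe class; the function names are hypothetical.

```python
import statistics

def sra(annual):
    """Standardized rainfall anomaly (Eq. 6) for each year of an annual series."""
    mean = statistics.mean(annual)
    sd = statistics.stdev(annual)
    return [(p - mean) / sd for p in annual]

def drought_class(x):
    """Drought severity after Agnew and Chappell (1999)."""
    if x < -1.65:
        return "extreme drought"
    if x < -1.28:
        return "severe drought"
    if x < -0.84:
        return "moderate drought"
    return "no drought"

print([drought_class(x) for x in (-1.84, -1.5, -1.0, 3.24)])
```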

The modified MK trend test was used to examine monotonic trends in the hydroclimatic time series of the endorheic Lake Hayk basin from 1986 to 2015 at the 5% significance level on monthly, annual and seasonal scales. It was chosen because it is rank-based, and therefore less affected by low-quality hydroclimatic data (data with missing values and/or outliers), and nonparametric, and therefore less sensitive to skewed data and applicable to any distribution (Hirsch and Slack 1984). The MK test evaluates the null hypothesis (H0) of no trend against the alternative hypothesis (Ha) of a monotonic trend using either the S statistic (n < 10) or the standard normal Z statistic (n ≥ 10) (Hirsch and Slack 1984; Yue et al. 2002). The MK S statistic is calculated using Eqs. 7 and 8:

$$S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\operatorname{sgn}\left(x_{j}-x_{i}\right)$$
(7)
$$\operatorname{sgn}\left(x_{j}-x_{i}\right) = \begin{cases} +1 & \text{if } x_{j}-x_{i} > 0 \\ 0 & \text{if } x_{j}-x_{i} = 0 \\ -1 & \text{if } x_{j}-x_{i} < 0 \end{cases}$$
(8)

where n is the number of data points, xi and xj are the data values at times i and j, respectively, i = 1, 2, …, n − 1 and j = i + 1, i + 2, …, n. Each value in the chronologically ordered series is compared with every later value, giving n(n − 1)/2 pairs in total. S is the sum of all these increases (+1) and decreases (−1) (Yue et al. 2002): a positive S indicates an increasing trend and a negative S a decreasing trend.

When n ≥ 10, S is approximately normally distributed with a mean of zero and a variance computed using Eq. 9 (Kendall 1975):

$$V\left(S\right) = \frac{1}{18}\left[n\left(n-1\right)\left(2n+5\right) - \sum_{i=1}^{m} t_{i}\left(t_{i}-1\right)\left(2t_{i}+5\right)\right]$$
(9)

where V(S) is the variance of S, m is the number of tied groups (groups of identical values) and ti is the number of data points in the ith tied group. The standardized test statistic Z is then calculated from S and V(S) using Eq. 10:

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{V\left(S\right)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sqrt{V\left(S\right)}} & \text{if } S < 0 \end{cases}$$
(10)

The sign of Z indicates the direction of a trend (positive for increasing, negative for decreasing), and its magnitude is used to assess significance. In a two-tailed test at significance level α, H0 is rejected if \(\left| Z \right| \ge Z_{\alpha /2}\). For example, at the 5% significance level, H0 is rejected when \(\left| Z \right| \ge 1.96\), or equivalently P ≤ 0.05, indicating that a trend exists (i.e., the series is significantly correlated with time). P is the probability of obtaining a test statistic at least as extreme as the one observed if H0 is true, that is, the risk of wrongly rejecting H0.
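A minimal sketch of the original MK test (Eqs. 7–10) follows, using the standard tie correction t(t − 1)(2t + 5) in the variance and the normal approximation for the two-sided P value. It is not the XLSTAT implementation used in the study: it always uses the Z statistic (even for n < 10) and applies no autocorrelation correction. The function name is hypothetical.

```python
import math
from collections import Counter

def mann_kendall(x, alpha=0.05):
    """Original Mann-Kendall test; returns S, V(S), Z, two-sided P, significant?"""
    n = len(x)
    # Eqs. 7-8: sum of signs over all n(n - 1)/2 ordered pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Eq. 9: variance with tie correction (groups of size 1 contribute zero)
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5) - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    # Eq. 10: continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided P value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, var_s, z, p, p <= alpha

print(mann_kendall(list(range(10))))
```

For a strictly increasing series of 10 values, S = 45, V(S) = 125 and Z ≈ 3.94, so H0 is rejected at the 5% level.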

Before trend testing, it is critical to examine autocorrelation (serial correlation), a step frequently overlooked in trend detection studies. The original MK test should be applied only to datasets without seasonality or significant autocorrelation; to account for autocorrelation, Hamed and Rao (1998) proposed a modified Mann–Kendall test. Significant autocorrelation alters the variance of S: positive autocorrelation inflates the actual variance relative to that given by Eq. 9 (so the original test detects trends too often), and negative autocorrelation deflates it. Hence, when the data are autocorrelated, the modified MK test uses a modified variance calculated with Eqs. 11 and 12:

$$V^{*} \left( S \right)\,\,\, = \,\,\,\frac{1}{18}\left[ {n\,\,\left( {n - 1} \right)\,\,\left( {2n + 5} \right)} \right]\frac{n}{ns*}\,\,\,\,\,\,\,\,\,\,$$
(11)
$$\frac{n}{ns*}\,\, = \,\,\,1 + \,\frac{2}{{n\,\,\left( {n - 1} \right)\,\left( {n - 2} \right)}}\,\,\,\sum\limits_{i = 1}^{p} {\left( {n - i} \right)\,} \,\left( {n - i - 1} \right)\,\,\left( {n - i - 2} \right)\,\,\,p_{s} \,\left( i \right)\,\,$$
(12)

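where ns* is the effective sample size, n/ns* is the correction factor for autocorrelation, ps(i) is the autocorrelation function of the ranks of the observations at lag i (computed after removing the trend estimated with Sen's slope) and p is the maximum lag considered. In this study, the Hamed and Rao (1998) modified MK trend test facility of XLSTAT was applied (at a significance level of 10%) to account for the autocorrelation effect.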
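The variance correction of Eqs. 11 and 12 can be sketched as below. It follows the general recipe of Hamed and Rao (1998) under stated assumptions: the series is detrended with Sen's slope, ranked (ties share their average rank), and only lags whose rank autocorrelation lies outside ±z_crit/√n are retained. The 5% default for `z_crit` is an assumption (1.645 would correspond to 10%), the helper names are hypothetical, and the sketch is not the XLSTAT implementation.

```python
import math
import statistics

def sen_slope(x):
    """Median of all pairwise slopes, with time index t = 0, 1, ..., n - 1."""
    n = len(x)
    return statistics.median((x[j] - x[i]) / (j - i) for i in range(n) for j in range(i + 1, n))

def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def autocorr(values, lag):
    """Sample autocorrelation at a given lag (0.0 for a constant series)."""
    m = statistics.mean(values)
    den = sum((v - m) ** 2 for v in values)
    if den == 0:
        return 0.0
    num = sum((values[t] - m) * (values[t + lag] - m) for t in range(len(values) - lag))
    return num / den

def hamed_rao_factor(x, z_crit=1.96):
    """Correction factor n/ns* (Eq. 12), keeping only significant rank autocorrelations."""
    n = len(x)
    q = sen_slope(x)
    ranks = average_ranks([v - q * t for t, v in enumerate(x)])
    total = 0.0
    for i in range(1, n - 2):  # terms with i >= n - 2 vanish
        rho = autocorr(ranks, i)
        if abs(rho) > z_crit / math.sqrt(n):
            total += (n - i) * (n - i - 1) * (n - i - 2) * rho
    return 1 + 2 / (n * (n - 1) * (n - 2)) * total

def modified_variance(n, factor):
    """Eq. 11: V*(S) = n(n - 1)(2n + 5) / 18 x n/ns*."""
    return n * (n - 1) * (2 * n + 5) / 18 * factor
```

A strongly alternating (negatively autocorrelated) series such as `[0, 1] * 10` yields a factor below 1, shrinking V*(S), whereas a series with no significant rank autocorrelation leaves V(S) unchanged (factor 1).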

Sen's slope estimator gives the linear annual rate and direction of change (Sen 1968). It is a nonparametric approach that is robust to skewed data and outliers. The linear model f(t) is defined by Eqs. 13 and 14 (Sen 1968), where Q is the slope and β is the intercept:

$$f\left( t \right)\,\, = \,\,Qt\, + \,\beta \,\,\,\,\,\,\,\,\,\,\,$$
(13)
$$Q\,\,\, = \,\,Median\,\,\,\,\,\,{{\left( {X_{i} \, - \,X_{j} } \right)} \mathord{\left/ {\vphantom {{\left( {X_{i} \, - \,X_{j} } \right)} {\left( {i - j} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {i - j} \right)}},\,\,\,\,\,\forall j\,\, < \,\,i\,\,\,\,\,$$
(14)
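Eqs. 13 and 14 can be sketched as below, assuming equally spaced observations indexed t = 0, 1, …, n − 1 (e.g., years since 1986). The intercept β is taken as the median of x_t − Qt, a common convention that the text does not specify; the function name is hypothetical.

```python
import statistics

def sens_slope(x):
    """Sen's slope Q (Eq. 14) and an intercept beta for f(t) = Q t + beta (Eq. 13)."""
    n = len(x)
    # Eq. 14: median of (X_i - X_j) / (i - j) over all pairs with j < i
    q = statistics.median((x[i] - x[j]) / (i - j) for i in range(n) for j in range(i))
    # Intercept as the median of x_t - Q t (assumed convention)
    beta = statistics.median(x[t] - q * t for t in range(n))
    return q, beta

print(sens_slope([1, 2, 3, 4, 100]))
```

Because Q is a median, a single outlier barely affects it: for the series [1, 2, 3, 4, 100], the sketch returns Q = 1.0 and β = 1.0.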

Lake Hayk water level response to climate change/variability

Because the Lake Hayk basin is endorheic (closed), the main hydrological processes are surface runoff and evapotranspiration, and precipitation and temperature are the most prominent climatic factors. Under such conditions, water level is the primary response variable and the best indicator of the effects of climate change and variability on lake storage. It can also be measured at gauging stations, allowing lake level changes to be monitored easily, accurately and continuously (Tan et al. 2017). Where lake level data are patchy, as for Lake Hayk, water extents derived from Landsat images can be used to bridge gaps in the water level time series (McFeeters 1996; Xu 2006). Water bodies are extracted from Landsat images with spectral water indices, typically by computing the normalized difference between two image bands and then applying a threshold that segments the result into water and nonwater features. The Modified Normalized Difference Water Index (MNDWI) uses the shortwave infrared (SWIR) band in place of the near infrared (NIR) band of its predecessor, the Normalized Difference Water Index (NDWI); this lets it extract lake water efficiently while suppressing noise from vegetation, built-up areas and shadows (Xu 2006). The MNDWI is calculated as:

$$MNDWI = \,\,{{\left( {Green\, - \,SWIR} \right)} \mathord{\left/ {\vphantom {{\left( {Green\, - \,SWIR} \right)} {\left( {Green\,\, + \,SWIR} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {Green\,\, + \,SWIR} \right)}}$$
(15)

For Landsat 5 Thematic Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images, MNDWI becomes:

$$MNDWI = \,\,{{\left( {band2 - \,band5} \right)} \mathord{\left/ {\vphantom {{\left( {band2 - \,band5} \right)} {\left( {band2\, + \,band5} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {band2\, + \,band5} \right)}}$$
(16)

The MNDWI values of water and nonwater features are determined by their reflectance in band 2 and band 5 of the Landsat 5 TM and Landsat 7 ETM+ images (Xu 2006). Water reflects more in band 2 than in band 5 and thus has positive MNDWI values, whereas nonwater features have negative values (the index ranges from −1 to +1). A standard threshold of zero is used: a pixel is classified as water if MNDWI > 0 and as nonwater if MNDWI ≤ 0 (Xu 2006). With this threshold, the MNDWI can accurately locate shorelines at the land–water boundary and extract them from multi-temporal Landsat TM and ETM+ images, because MNDWI values remain positive even for the shallowest parts of water bodies and for waterlogged (inundated) areas. Changes in MNDWI values over time reflect spatio-temporal changes in nonwater features (land use and land cover changes within the basin) and/or changes in the depth and quality of water.
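The thresholding step can be sketched on small reflectance grids as below. The sketch assumes reflectance inputs for TM/ETM+ band 2 (green) and band 5 (SWIR) at 30 m pixels, treats pixels with a zero denominator as nonwater, and uses hypothetical function names; it is not the ArcGIS 10.1 workflow used in the study.

```python
def mndwi(green, swir):
    """Per-pixel MNDWI (Eqs. 15-16) from band 2 (green) and band 5 (SWIR) grids."""
    return [
        [(g - s) / (g + s) if (g + s) != 0 else 0.0 for g, s in zip(g_row, s_row)]
        for g_row, s_row in zip(green, swir)
    ]

def water_area_ha(index, pixel_size_m=30.0, threshold=0.0):
    """Water area: pixels with MNDWI > threshold count as water."""
    water_pixels = sum(1 for row in index for v in row if v > threshold)
    return water_pixels * pixel_size_m ** 2 / 10_000  # m^2 -> ha

# Hypothetical 2 x 2 scene: two water pixels (top-left, bottom-right)
idx = mndwi([[0.10, 0.08], [0.03, 0.12]], [[0.02, 0.20], [0.15, 0.01]])
print(water_area_ha(idx))
```

Multiplying the water-pixel count by the 900 m² footprint of a 30 m pixel converts the classified grid to an area in hectares.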

Therefore, to bridge the gap between 2005 and 2011, water level observations from 1999–2005 and 2011–2015 were fused with water areas extracted from Landsat images using the MNDWI in ArcGIS 10.1. The analysis used cloud-free (cloud cover ≤ 10%) Landsat 7 ETM+ images from 2005 and 2008 and Landsat 5 TM images from 2009 to 2011, obtained from http://earthexplorer.usgs.gov/. The Landsat images (Level 1 Terrain Corrected (L1T) products) were already georeferenced to the Universal Transverse Mercator (UTM) zone 37N projection on the World Geodetic System 1984 (WGS84) datum. A single Landsat scene covers the entire 8592.68 ha Lake Hayk basin. The specifications of the Landsat 5 TM and Landsat 7 ETM+ images are given in Table 1.

Table 1 Specifications of Landsat 5 TM and Landsat 7 ETM+ images

The MNDWI index was validated by correlating remotely sensed water areas to water level observations both obtained on the same date in each year by using the Pearson’s (parametric) and Kendall's tau (nonparametric) correlations at a 0.01 significance level with the Statistical Package for the Social Sciences (SPSS) version 20 software. For n sample sizes of X and Y variables, the Pearson's coefficient (r) can be computed as:

$$r\,\, = \,\,{{\sum\limits_{i = 1}^{n} {\left[ {\left( {X_{i} \, - \,\overline{X} } \right)\,\left( {Y_{i} \, - \,\overline{Y} } \right)} \right]\,} } \mathord{\left/ {\vphantom {{\sum\limits_{i = 1}^{n} {\left[ {\left( {X_{i} \, - \,\overline{X} } \right)\,\left( {Y_{i} \, - \,\overline{Y} } \right)} \right]\,} } {\,\left( {\sqrt {\sum\limits_{i = 1}^{n} {\left( {X_{i} \, - \,\overline{X} } \right)^{2} \,\sum\limits_{i = 1}^{n} {\left( {Y_{i} \, - \,\overline{Y} } \right)^{2} } } } } \right)}}} \right. \kern-\nulldelimiterspace} {\,\left( {\sqrt {\sum\limits_{i = 1}^{n} {\left( {X_{i} \, - \,\overline{X} } \right)^{2} \,\sum\limits_{i = 1}^{n} {\left( {Y_{i} \, - \,\overline{Y} } \right)^{2} } } } } \right)}}$$
(17)

The Pearson result can be checked with the nonparametric Kendall correlation, which minimizes the effects of extreme values and of violations of the normality and linearity assumptions (Kendall 1938). Kendall's tau (τ) is calculated from the signs of paired differences as:

$$\tau = \,\,\frac{2}{{n\,\left( {n - 1} \right)}}\,\sum {_{i\,\, < \,\,j} \,sign\,\left[ {\left( {x_{i} \, - \,x_{j} } \right)\,\,\left( {y_{i} \, - \,y_{j} } \right)} \right]}$$
(18)

In both cases, the correlation coefficient lies between −1 and +1, and values close to ±1 indicate strong relationships. For interpretation, the two-tailed hypotheses at a given significance level are H0: r, τ = 0 versus Ha: r, τ ≠ 0.
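Eqs. 17 and 18 can be sketched as below. Eq. 18, as written, is Kendall's tau-a, which makes no adjustment for ties; SPSS reports tau-b, which coincides with tau-a when there are no ties. The function names and sample data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient (Eq. 17)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def kendall_tau(x, y):
    """Kendall's tau as written in Eq. 18 (tau-a, no tie adjustment)."""
    n = len(x)
    total = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            total += (prod > 0) - (prod < 0)
    return 2 * total / (n * (n - 1))

# Hypothetical paired water areas and water levels
x, y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
print(pearson_r(x, y), kendall_tau(x, y))
```

For these hypothetical pairs, r = 0.8 and τ = 0.6: two of the ten pairs are discordant, which lowers τ more than r.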

Results and discussion

Performance evaluation of the reanalysis temperature data

The evaluation (Table 2) shows that ERA5 represents annual temperature better than the other RTPs, with the highest R2 (0.965), the lowest RMSE (0.07) and the smallest bias (−0.233%). At the seasonal scale, ERA5 is also clearly superior, with the highest R2, lowest RMSE and lowest bias, whereas MERRA-2 and NCEP/NCAR perform poorly. The small ERA5 biases imply that its estimates are very close to the observed temperatures in the basin. ERA5 is therefore the most robust reanalysis for representing the temperature time series of the Lake Hayk basin, and its temperature estimates are reliable for analyzing the basin's climate variability and trends.

Table 2 Evaluating reanalysis temperature products in the lake basin (1994–2015)

ERA5's better representation of the temperature time series of the endorheic Lake Hayk basin may be attributed to its much finer spatial and temporal resolution, which captures temperature at a more local scale, as well as to its estimation of uncertainty and its bias correction.

ERA5, the most recent ECMWF reanalysis, has a horizontal resolution of 0.25° × 0.25° with hourly output (Hersbach et al. 2020), whereas MERRA-2 has a resolution of 0.5° latitude × 0.625° longitude with 3 h intervals (Rienecker et al. 2011) and the NCEP/NCAR reanalysis has a resolution of about 2.5° × 2.5° with 6 h intervals (Kalnay et al. 1996). ERA5 is therefore best able to provide fine-scale temperature data for the Lake Hayk basin that agree well with station observations at both annual and seasonal scales (Table 2).
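To put these resolutions in perspective relative to the 85.93 km² basin, the approximate area of a single grid cell near the station latitude (11.31°N) can be computed. This is a rough sketch assuming a spherical Earth of radius 6371 km and regular latitude–longitude cells; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def cell_area_km2(dlat_deg, dlon_deg, lat_deg):
    """Approximate area of a dlat x dlon grid cell centred at lat_deg."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180  # length of one degree of latitude
    height = dlat_deg * km_per_deg
    width = dlon_deg * km_per_deg * math.cos(math.radians(lat_deg))
    return height * width

basin_km2 = 85.93
for name, dlat, dlon in [("ERA5", 0.25, 0.25), ("MERRA-2", 0.5, 0.625), ("NCEP/NCAR", 2.5, 2.5)]:
    area = cell_area_km2(dlat, dlon, 11.31)
    print(f"{name}: ~{area:.0f} km2 per cell, ~{area / basin_km2:.0f} x the basin area")
```

Under these assumptions, an ERA5 cell covers roughly 760 km², about nine times the basin area, and the MERRA-2 and NCEP/NCAR cells are larger still.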

Furthermore, ERA5 uses four-dimensional variational (4D-Var) data assimilation in Cycle 41r2 of the Integrated Forecasting System (IFS), which was operational at ECMWF in 2016 (Hersbach et al. 2020). ERA5 estimates uncertainty with an Ensemble of Data Assimilations (EDA) of 10 members with 3-hourly output at 63 km grid spacing, i.e., at a lower spatio-temporal resolution than ERA5 itself. The uncertainty estimated by the EDA, which varies with season, region, period, level and variable, provides flow-dependent background-error estimates for the 4D-Var analysis. In each 12 h assimilation window, the short-range background forecast valid at the start of the window is adjusted to fit the observations falling within that window, and the next forecast is then run from this analysis, yielding more accurate reanalysis fields. In contrast, MERRA-2 and NCEP/NCAR lack this capability to capture the errors of the day (flow-dependent errors arising from instabilities of the background flow).

ERA5 is therefore the reanalysis that best represents temperature in the Lake Hayk basin, and ERA5 temperature was used for the climate variability and trend analysis of the basin. Similarly, Gleixner et al. (2020) showed that ERA5 provides improved reanalysis temperature and precipitation products over East Africa, including Ethiopia, for climate impact research.

Finally, we believe that adopting climate reanalysis products is vital in developing countries such as Ethiopia, where scarce gauge data are a major challenge for hydroclimate studies of lake basins and other areas at both small and large scales.

Precipitation variability and trends

Table 3 presents the variability and trends in precipitation in the Lake Hayk basin for 1986–2015, computed using the MK and Sen's methods. The mean annual precipitation was 1192.31 mm. The highest annual precipitation (1835.30 mm), in 2010, and the lowest (827.10 mm), in 2015, made these the wettest and driest years of the study period, respectively. Kiremt (June–September), the main rainy season, contributed about 63.15% of the annual rainfall. About 50% of the annual precipitation fell in July and August alone, while June and September contributed 2.93% and 10.37%, respectively, a clear indication of strongly concentrated kiremt precipitation in the basin. The short rainy (belg) season (March–May) also contributed a significant amount, about 22.93% of the annual total. Bewket and Conway (2007) and Ayalew et al. (2012) likewise reported that 55–85% and 8–24% of the mean annual precipitation of the Amhara National Regional State (where the Lake Hayk basin is located) falls in the kiremt and belg seasons, respectively.
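The seasonal shares above can be cross-checked with simple arithmetic. Reading the June and September figures as shares of the annual total (an interpretation, since the text does not state the base explicitly), July and August account for the remainder of kiremt, and bega for the remainder of the year.

```python
# Shares of mean annual precipitation (%) reported in the text
kiremt, belg = 63.15, 22.93
june, september = 2.93, 10.37
annual_mm = 1192.31

# Assumption: June and September shares are percentages of the annual total
july_august = kiremt - june - september  # % of annual falling in July + August
bega = 100 - kiremt - belg                # % of annual falling in bega (Oct-Feb)

print(f"July+August: {july_august:.2f}% of annual "
      f"({july_august / kiremt * 100:.1f}% of kiremt, ~{july_august / 100 * annual_mm:.0f} mm)")
print(f"Bega: {bega:.2f}% of annual")
```

This gives 49.85% of the annual total for July and August (about 79% of kiremt rainfall, roughly 594 mm) and 13.92% for bega.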

Table 3 Variability and trend tests of precipitation (1986–2015)

The CV showed marked monthly precipitation variability, ranging from 31.70% in August to 133.68% in December. By contrast, annual precipitation in the Lake Hayk basin had low variability (CV = 16.66%), and seasonal precipitation had moderate to high variability according to the classification of Hare (2003). Belg (CV = 43.94%) and bega (CV = 47.46%) precipitation were almost twice as variable as kiremt precipitation (CV = 23.80%). Similar findings were reported by Ayalew et al. (2012) and Alemayehu and Bewket (2017). In terms of trend, all months except February, September and November showed statistically nonsignificant trends: February and September showed significant decreasing trends, and November alone showed a significant increasing trend. Annual and belg precipitation showed nonsignificant decreasing trends, whereas kiremt precipitation showed a nonsignificant increasing trend. This agrees with Bewket and Conway (2007) and Ayalew et al. (2012), who reported nonsignificant precipitation trends at several locations in Ethiopia.

The precipitation concentration index (PCI), another measure of variability, indicated a high to very high concentration of monthly precipitation in the lake basin (Table 4). Similarly, Alemayehu and Bewket (2017) reported high precipitation concentration in the highland areas of central Ethiopia.
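
The PCI is not defined in this section. The following sketch therefore assumes the widely used annual form, PCI = 100 × Σp²/(Σp)² over the 12 monthly totals p. It also assumes the usual class boundaries: below 10 uniform, up to 15 moderate, up to 20 irregular (high), above 20 strongly irregular (very high). Both the formula choice and the function names are assumptions, not the authors' code.

```python
def pci(monthly_mm):
    """Annual precipitation concentration index from 12 monthly totals (mm)."""
    total = sum(monthly_mm)
    return 100.0 * sum(p * p for p in monthly_mm) / (total * total)


def pci_class(value):
    """Concentration class for a PCI value (boundaries assumed, see lead-in)."""
    if value < 10:
        return "low (uniform)"
    if value <= 15:
        return "moderate"
    if value <= 20:
        return "high (irregular)"
    return "very high (strongly irregular)"
```

Perfectly even rainfall gives the minimum PCI of 100/12 ≈ 8.3. All rain falling in a single month gives the maximum of 100.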

Table 4 Precipitation concentration indices (1986–2015)

According to the standardized rainfall anomaly (SRA), the Lake Hayk basin experienced considerable interannual precipitation variability: 57% of years showed negative anomalies and 43% positive anomalies. Annual anomalies ranged from −1.84 in 2015 (the driest year) to +3.24 in 2010 (the wettest year) (Fig. 3). In other words, annual precipitation in the driest and wettest years was 1.84 and 3.24 standard deviations below and above the long-term (1986–2015) mean respectively. Very low anomaly values corresponded to severe drought periods, and 2015 falls in the extreme drought category (SRA < −1.65). Philip et al. (2018) also showed that 2015, a strong El Niño year, was the worst drought year in recent decades across large parts of Ethiopia, including the Lake Hayk basin. Seasonal and annual anomaly patterns were broadly similar, although some years that were dry in kiremt were wet in belg, and vice versa.
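
The SRA computation and the drought classification can be sketched as follows. The sketch assumes the anomaly is (P − mean)/SD using the sample standard deviation. Only the extreme drought threshold (−1.65) comes from the text. The severe (−1.28) and moderate (−0.84) thresholds are commonly used values and are assumed here. Function names are hypothetical.

```python
import statistics


def standardized_anomalies(values):
    """SRA for each year: (P_t - long-term mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def drought_class(sra):
    """Drought category for one SRA value (-1.28 and -0.84 are assumed thresholds)."""
    if sra < -1.65:
        return "extreme drought"
    if sra < -1.28:
        return "severe drought"
    if sra < -0.84:
        return "moderate drought"
    return "no drought"
```

Under these thresholds, the reported 2015 anomaly (−1.84) classifies as extreme drought. The 2008 anomaly (−1.33) classifies as severe drought.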

Fig. 3 Standardized rainfall anomalies (1986–2015)

Temperature variability and trends

Table 5 presents the variability and trend analysis of the temperature time series (1986–2015), computed using the MK and Sen's methods. The highest and lowest mean monthly temperatures were 22.06 °C in April and 13.91 °C in December respectively. Likewise, the highest seasonal mean temperature occurred in belg and the lowest in bega. Annual Tmean ranged from 16.65 °C in 1989 to 18.54 °C in 2015, with a mean of 17.58 °C over the study period. The monthly, seasonal and annual CV values placed Tmean in the less variable category. The standardized anomalies of Tmean show that most years of the 2000s were warmer than the long-term average (Fig. 4), and 2015 was the hottest year of the study period in the lake basin.

Table 5 Variability and trend tests of mean temperature (1986–2015)

Fig. 4 Mean annual and seasonal Tmean standardized anomalies (1986–2015)

Concerning trends in Tmean, statistically significant upward trends were detected in March, April and August through October. All other months except December, which showed a nonsignificant decreasing trend, showed nonsignificant increasing trends. The rate of change in monthly mean temperature ranged from +0.07 to +0.60 °C per decade. The most rapid increases occurred in March and April, at +0.60 and +0.50 °C per decade respectively, and the months from August to November warmed at similar rates (about +0.30 °C per decade). The annual and seasonal Tmean series showed statistically significant upward trends. Seasonal rates ranged from +0.20 to +0.45 °C per decade, the highest being in belg, and annual Tmean increased at +0.26 °C per decade. Similarly, McSweeney et al. (2008) reported an increase of 0.28 °C per decade in mean annual temperature in Ethiopia for 1960–2006. This warming tends to enhance evaporation from the lake in the endorheic Lake Hayk basin, which could have negative implications for the water level of Lake Hayk.
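
The MK test and Sen's slope estimator referred to throughout this section can be sketched in plain Python. This is a minimal, standard formulation under stated assumptions: no correction for tied values or serial correlation, and a normal approximation with continuity correction for the two-sided p value. The authors' software and settings are not described here, so treat this as illustrative only. The per-decade rates quoted above correspond to Sen's slope of an annual series multiplied by 10.

```python
import math
import statistics


def mann_kendall(x):
    """Two-sided Mann-Kendall trend test without tie correction.

    Returns (S, Z, p_value) using the normal approximation.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value from the standard normal
    return s, z, p


def sens_slope(x):
    """Sen's slope: median of all pairwise slopes, in units per time step."""
    slopes = [
        (x[j] - x[i]) / (j - i)
        for i in range(len(x) - 1)
        for j in range(i + 1, len(x))
    ]
    return statistics.median(slopes)
```

For example, a synthetic 30-year annual series rising by exactly 0.026 °C per year gives a Sen's slope of 0.26 °C per decade and a p value far below 0.05. `sens_slope` returns units per time step, so an annual series must be multiplied by 10 to express the rate per decade.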

Lake Hayk water level response to climate change/variability

Lake water level change during 1999–2005

As shown in Table 6, monthly CV values ranged from 15.81% in June to 39.26% in December, with most months falling in the moderate to high variability categories. Annual and seasonal LWL, however, were less variable. The MK test detected nonsignificant positive trends in LWL at the monthly, annual and seasonal scales.

Table 6 Variability and trends of mean water level of Lake Hayk (1999–2005)

Precipitation and temperature are the most important climatic elements influencing lake level variations in endorheic lake basins (Tan et al. 2017). An increase in annual precipitation raises overlake precipitation and surface runoff into the lake, whereas an increase in annual temperature raises evaporation losses from the lake. During 1999–2005, both precipitation and temperature showed substantial interannual variability, with positive and negative anomalies alternating every one to two years (Figs. 3 and 4). The positive effect of precipitation on LWL was only partly offset by the negative effect of rising temperatures, resulting in a statistically nonsignificant upward trend in the water level of Lake Hayk (Table 6 and Fig. 5). Here, lake water level (LWL) refers to the free water surface reading on a gauge located on the southwest shore of Lake Hayk. Water level variation refers to the deviation of each observation from the mean; it can be positive or negative and is expressed in the units of the observations.
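
The water level variation defined above is the deviation of each gauge reading from the series mean. A minimal sketch with a hypothetical function name:

```python
def level_variations(levels):
    """Deviation of each gauge reading from the series mean, in the readings' units."""
    mean_level = sum(levels) / len(levels)
    return [x - mean_level for x in levels]
```

The deviations always sum to (approximately) zero, so readings above the mean appear as positive variations and readings below it as negative ones.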

Fig. 5 The mean annual water level (free water surface gauge reading) and level variations (deviations of each observation from the mean) from 1999 to 2005

Changes in lake water extent (2005–2011)

Landsat images of the Lake Hayk basin from 2005 to 2011 were classified into two categories, water and nonwater features, using the MNDWI. Extraction accuracy was evaluated using Pearson and Kendall correlations between the extracted water areas and the gauge station lake levels measured on each image acquisition date (Table 7). Both correlation coefficients had p values below 0.01 (reported as 0.000 for Kendall's tau after rounding to three decimal places), indicating a highly significant correlation between the two variables. The null hypothesis (H0) of no correlation was therefore rejected: LWL and lake area are correlated. The high Pearson (r = 0.980) and Kendall (tau = 0.983) coefficients confirmed that the MNDWI-extracted water areas accurately represented the extent of Lake Hayk and were adequate for water area change detection.
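
The validation step compares two paired series: gauge lake levels and extracted water areas. The following is a standard-library sketch of the two coefficients. It assumes the Kendall variant is tau-b, which handles ties. Function names are hypothetical, and p values are omitted. In practice, `scipy.stats.pearsonr` and `scipy.stats.kendalltau` return both the coefficient and a p value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant, discordant and tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from tau-b
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    nc_nd = concordant + discordant
    return (concordant - discordant) / math.sqrt((nc_nd + ties_x) * (nc_nd + ties_y))
```

Values close to +1 for both coefficients, as reported in Table 7, mean that larger extracted areas consistently coincide with higher gauge readings.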

Table 7 The Correlation between gauge station lake levels and MNDWI index extracted remotely sensed water areas (2005–2011)

The validated lake areas were then used to detect changes in the water area of Lake Hayk during 2005–2011 (Table 8 and Fig. 6). The water surface area of Lake Hayk was 2241.33 ha in 2005 and decreased to 2158.58 ha in 2008. It then rose to 2268.83 ha in 2010 before falling to 2165 ha in 2011. The smallest and largest areas thus occurred in 2008 and 2010 respectively, and the area fluctuated within a range of 110.25 ha between 2005 and 2011. The lake areas in 2008 and 2010 may therefore be prominent signs of drought and flood hazards, respectively, in the lake basin. This pattern of change in lake area could be related to the patterns of precipitation and temperature. Annual precipitation showed negative anomalies in most years of this period (the exceptions being 2005 and 2010, which showed positive anomalies). The largest negative anomaly (−1.33) occurred in 2008 and the largest positive anomaly (+3.24) in 2010 (Fig. 3), contributing to the 2008 drought and the 2010 flood events respectively.
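
The area computation behind Table 8 can be sketched as a per-pixel MNDWI classification followed by a pixel count. The sketch assumes MNDWI = (Green − SWIR)/(Green + SWIR), with Landsat TM/ETM+ bands 2 and 5 as Green and SWIR. It also assumes a water threshold of 0 and 30 m pixels (0.09 ha each). The threshold actually used by the authors is not stated in this section. Function names are hypothetical. The range check uses the areas reported above.

```python
PIXEL_AREA_HA = 30 * 30 / 10_000  # one 30 m Landsat TM/ETM+ pixel = 0.09 ha


def mndwi(green, swir):
    """Modified normalized difference water index for one pixel."""
    total = green + swir
    return 0.0 if total == 0 else (green - swir) / total  # treat empty pixels as nonwater


def water_area_ha(green_band, swir_band, threshold=0.0):
    """Count pixels whose MNDWI exceeds the threshold and convert to hectares."""
    water_pixels = sum(
        1
        for g_row, s_row in zip(green_band, swir_band)
        for g, s in zip(g_row, s_row)
        if mndwi(g, s) > threshold
    )
    return water_pixels * PIXEL_AREA_HA


# Lake Hayk areas (ha) reported in the text; the range is about 110.25 ha.
areas = {2005: 2241.33, 2008: 2158.58, 2010: 2268.83, 2011: 2165.00}
area_range = max(areas.values()) - min(areas.values())
```

On a toy 2 × 2 scene where two pixels are brighter in green than in SWIR, `water_area_ha` returns 0.18 ha.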

Table 8 Lake Hayk water area variations (2005–2011)

Fig. 6 Maps of water and nonwater features extracted from Landsat 7 ETM+ (2005 and 2008) and Landsat 5 TM (2009–2011) using the MNDWI index

Lake Hayk water level changes (2011–2015)

As shown in Table 9, the mean monthly LWL ranged from 330 mm in February to 1910 mm in August. Seasonal values ranged from 580 mm in bega to 2090 mm in kiremt, while annual values ranged from 570 mm in 2015 to 1337.5 mm in 2011. LWL showed high variability at the monthly, seasonal and annual scales. The MK test revealed statistically significant downward trends at all three scales. The kiremt season had the greatest seasonal decline (−340 mm/year) and the bega season the least (−250 mm/year). From 2011 to 2015, mean annual LWL fell at a rate of 280 mm/year.

Table 9 Variability and trends in mean water level of Lake Hayk (2011–2015)

Unlike the preceding two periods (1999–2005 and 2005–2011), precipitation and temperature showed no discernible interannual fluctuations between 2011 and 2015 (Figs. 3 and 4). The consistent reduction in precipitation, together with rising temperature, reduced surface runoff into the lake and overlake precipitation and increased evaporation from the lake. As a result, a statistically significant declining trend in the water level of Lake Hayk was identified (Table 9 and Fig. 7). Similarly, WMO (2016) reported that 2011–2015 was the warmest five-year period and 2015 the warmest year since modern observations began in the late 1800s. Philip et al. (2018) also confirmed that 2015 was the worst El Niño-driven drought year in most parts of Ethiopia, including the Lake Hayk basin. This drought caused the sharpest decline in the water level of Lake Hayk during that time.

Fig. 7 The mean annual water level (free water surface gauge readings) and level variations (deviations of each observation from the mean) from 2011 to 2015

These findings are consistent with studies elsewhere in the world and in Ethiopia. Globally, Motiee and McBean (2009), Mekonnen et al. (2012) and Kiani et al. (2017) reported similar results, as did Kebede et al. (2006), Olana (2014) and Gebeyehu (2017) in Ethiopia. These studies found that regional variability and declining trends in precipitation, combined with increased evapotranspiration due to persistent warming, led to significant changes in lake water levels, with lakes disappearing or at risk of disappearing.

Conclusions and recommendations

This study can be viewed as a starting point for improving knowledge on coupling hydroclimate time series from gauge stations, gridded datasets (reanalysis products) and remotely sensed Landsat images. The aim is to analyze climate change/variability in the endorheic Lake Hayk basin and its implications for the water level of Lake Hayk at the local scale, using statistical approaches integrated with remote sensing and geographical information systems. To our knowledge, fusing patchy gauge station water level observations with remotely sensed water areas for this purpose is a new approach that has not been explored in previous Ethiopian climate studies.

This study found that the endorheic Lake Hayk basin experienced variable and declining precipitation as well as a consistent warming trend. Over the same period, the water level of Lake Hayk fluctuated in response to changes in precipitation and temperature. More recently, the water level has been dropping steadily due to the combined effects of declining precipitation and rising temperature. This suggests that climate change/variability in the lake basin has direct implications for the water level of Lake Hayk. Climate-change-oriented water management strategies are therefore urgently needed to save the lake from extinction.

From these perspectives, we believe that this study makes significant contributions to the field of hydroclimatology. Its contributions in terms of methodology, findings and applications are discussed below.

In terms of methodology

This study used a spectral water index (MNDWI) to extract lake water areas from Landsat images, filling the lake water level data gap in the endorheic Lake Hayk basin for a local-scale climate change impact study. This can be regarded as a methodological advance in this area of study, particularly in Ethiopia, where the approach has not yet been adopted in climate-related studies.

In terms of findings

This study examined hydroclimate variability and trends at the local scale and identified the implications of climate change/variability for water level fluctuations of Lake Hayk. The findings, together with the methodological approach, therefore contribute to documenting and assessing the local-scale hydroclimate history of the Lake Hayk basin. This can guide effective local water management decisions to save Lake Hayk from extinction.

In terms of applications

The distinctive feature of this study in terms of applications may be its use of both innovative and traditional dissemination methods to support the application of the research findings. Innovative methods we used in previous studies, and will continue to use in future studies, include social media accounts (Facebook, Twitter and LinkedIn), a professional academic social network (ResearchGate), preprints posted prior to journal submission, and publication in open access journals. These are excellent opportunities to share our findings with larger audiences, engage with our community and possibly generate new ideas and partnerships. The innovative methods are complemented by traditional forms to make the findings more applicable. One strategy is publication in the journal Environmental Systems Research, which is also an innovative method. Others include workshops and brochures in English and the indigenous Amharic language. These are used to raise awareness among stakeholders in the lake basin at all levels, from farmers to policymakers. They also help in formulating appropriate water management strategies to reverse the negative impact of climate change/variability on the falling water level of Lake Hayk.

However, the findings of this study should be viewed in light of some limitations that could be addressed in future research. The main findings have implications for the long-term management of the water resource of Lake Hayk, which is under stress from climate change/variability. Future research could therefore investigate appropriate climate adaptation methods to manage the water of Lake Hayk sustainably. This study used ERA5 reanalysis temperature and MNDWI-extracted lake water areas and obtained valid results. This suggests that adapting gridded (reanalysis) and Landsat satellite data is vital in developing countries like Ethiopia, where a scarcity of gauge station data is a key barrier to hydroclimate studies. Adapting climate reanalysis products and Landsat imagery for climate impact studies in the Lake Hayk basin and elsewhere, at small and large scales, is thus a research topic worth investigating further. The main findings of this study can also be extended to a water balance analysis of Lake Hayk. The relevant water balance components of endorheic Lake Hayk are precipitation, runoff and evapotranspiration. Assessing how these components respond to a changing climate would help determine whether inflows can sustain the future storage of the lake, allowing timely water management decisions. The authors are currently working on this issue.
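
As a rough illustration of the proposed water balance analysis, the storage change of a closed lake over a time step can be written in terms of the components listed above. This is a generic sketch, not the authors' formulation. The net groundwater term and all symbols are assumptions introduced here. There is no surface outflow term because the basin is endorheic.

```latex
\Delta S = P_{\mathrm{lake}} + R_{\mathrm{in}} - E_{\mathrm{lake}} + G_{\mathrm{net}},
\qquad
\Delta S \approx A\,\Delta h
```

Here $P_{\mathrm{lake}}$ is overlake precipitation, $R_{\mathrm{in}}$ is surface runoff from the catchment (itself reduced by catchment evapotranspiration), and $E_{\mathrm{lake}}$ is open-water evaporation. $G_{\mathrm{net}}$ is the assumed net groundwater exchange, all expressed as volumes per time step. $A$ is the lake surface area and $\Delta h$ the water level change. A falling $\Delta h$, as observed in 2011–2015, corresponds to $E_{\mathrm{lake}}$ exceeding the combined inflows.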