Introduction

Drought is a natural phenomenon that occurs when precipitation is significantly lower than normal (Belayneh et al. 2014; Saadat et al. 2011). These deficits may cause low crop yields, reduced flows for ecological systems, loss of biodiversity and other environmental problems, in addition to adversely affecting hydroelectric production and the drinking water supply on which local populations depend. The less predictable characteristics of droughts, such as their initiation, termination, frequency and severity, can make drought both a hazard and a disaster. Drought is characterized as a hazard because it is a natural accident of unpredictable occurrence, but of recognizable recurrence (Mishra and Singh 2010). Drought is also characterized as a disaster because it corresponds to the failure of the precipitation regime, causing the disruption of the water supply to natural and agricultural ecosystems as well as to other human activities (Mishra and Singh 2010). Droughts account for 22 % of the global damage caused by natural disasters (Keshavarz et al. 2013).

Droughts have also had a great impact in Africa. The Sahel has experienced droughts of unprecedented severity in recorded history (Mishra and Singh 2010). The impacts in sub-Saharan Africa are more severe because rain-fed agriculture accounts for 95 % of all agriculture in the region (Keshavarz et al. 2013). In 2009, reduced rainfall led to an increase in the frequency of droughts and resulted in an additional 53 million food-insecure people in the region (Husak et al. 2013). Given the changes in climate in Africa and around the world (Adamowski et al. 2009, 2010; Nalley et al. 2012, 2013; Pingale et al. 2014), it is likely that droughts will become more severe in the future.

Because droughts evolve slowly in time, a significant amount of time may pass between their inception and the moment their consequences are perceived by ecological and socio-economic systems. For this reason, effective mitigation of the most adverse drought impacts is possible, more so than for other extreme events such as floods, earthquakes or hurricanes, provided that a drought monitoring system able to promptly warn of the onset of a drought and to follow its evolution in space and time is in operation (Rossi et al. 2007). A careful selection of indices for drought identification, providing a synthetic and objective description of current and future drought conditions, is a key element in the implementation of an efficient drought warning system (Cacciamani et al. 2007). Such a system can help local stakeholders adapt to the effects of droughts in an effective and sustainable manner (Halbe et al. 2013; Kolinjivadi et al. 2014a, b; Straith et al. 2014; Inam et al. 2015; Butler and Adamowski 2015).

In this study, the drought index chosen to forecast drought is the Standardized Precipitation Index (SPI), which was developed to quantify a precipitation deficit over different time scales (Guttman 1999). The SPI was used to forecast drought in the Awash River Basin, the study area, mainly because it requires precipitation as its only input. Furthermore, it has been determined that precipitation alone can explain most of the variability of East African droughts and that the SPI is an appropriate index for monitoring droughts in East Africa (Ntale and Gan 2003).

In hydrologic drought forecasting, stochastic methods have been traditionally used to forecast drought indices. Markov Chain models (Paulo et al. 2005; Paulo and Pereira 2008) and autoregressive integrated moving average models (ARIMA) (Mishra and Desai 2005, 2006; Mishra et al. 2007; Han et al. 2010) have been the most widely used stochastic models for hydrologic drought forecasting. The major limitation of these models is that they are linear models and they are not very effective in forecasting non-linearities, a common characteristic of hydrologic data (Tiwari and Adamowski 2014; Campisi et al. 2012; Adamowski et al. 2012; Haidary et al. 2013).

In response to non-linear data, researchers in the last two decades have increasingly begun to forecast hydrological data using artificial neural networks (ANNs). ANNs have been used to forecast droughts in several studies (Mishra and Desai 2006; Morid et al. 2007; Bacanli et al. 2008; Barros and Bowden 2008; Cutore et al. 2009; Karamouz et al. 2009; Marj and Meijerink 2011). However, ANNs are limited in their ability to deal with non-stationarities in the data, a weakness also shared by ARIMA and other stochastic models.

Support vector regression (SVR) is a relatively new form of machine learning developed by Vapnik (1995) that has recently been used in the field of hydrological forecasting. There are several studies where SVRs were used in hydrological forecasting. Khan and Coulibaly (2006) found that an SVR model was more effective at predicting 3–12 month lake water levels than ANN models. Kisi and Cimen (2009) used SVRs to estimate daily evaporation. SVRs have also been used successfully to predict hourly streamflow (Asefa et al. 2006), and have been shown to perform better than ANN and ARIMA models for monthly streamflow prediction (Wang et al. 2009; Maity et al. 2010). SVRs have also been applied in drought forecasting (Belayneh and Adamowski 2012).

Wavelet analysis, an effective tool for dealing with non-stationary data, is an emerging approach in hydrologic forecasting and has recently been applied to examine the rainfall–runoff relationship in a karstic watershed (Labat et al. 1999), to characterize daily streamflow (Saco and Kumar 2000) and monthly reservoir inflow (Coulibaly et al. 2000), to evaluate rainfall–runoff models (Lane 2007), to forecast river flow (Adamowski 2008; Adamowski and Sun 2010; Ozger et al. 2012; Rathinasamy et al. 2014; Nourani et al. 2014), to forecast groundwater levels (Adamowski and Chan 2011), to forecast future precipitation values (Partal and Kisi 2007) and to forecast droughts (Kim and Valdes 2003; Ozger et al. 2012; Mishra and Singh 2012; Belayneh and Adamowski 2012).

The effectiveness of these data-driven models and wavelet analysis, coupled with ANN and SVR models, has been shown in a variety of study locations. Kim and Valdes (2003) used WA-ANN models to forecast drought in the semi-arid climate of the Conchos River Basin of Mexico. Mishra and Desai (2006) used ANN models to forecast drought in the Kansabati River Basin of India. Bacanli et al. (2008) forecast the SPI in Central Anatolia where the precipitation was concentrated in the spring and winter and where the temperature difference between summer and winter was extremely high. Ozger et al. (2012) coupled wavelet analysis with artificial intelligence models to forecast long term drought in Texas, while Mishra and Singh (2012) investigated the relationship between meteorological drought and hydrological drought using wavelet analysis in different regions of the United States. While the principal reason for the use of these models in the aforementioned areas is the susceptibility of these regions to drought, the variability in climatic conditions highlights how versatile and effective these new forecasting methods are.

Belayneh and Adamowski (2012) forecast SPI 3 and SPI 12 in the Awash River Basin using ANN, SVR and WA-ANN models. The present study complements that work by forecasting the SPI over a larger selection of stations in the same area and by coupling SVR models with wavelet transforms. The main objective of the present study was to compare a traditional drought forecasting method, the ARIMA model, with machine learning techniques such as ANNs and SVR, as well as with ANN and SVR models whose inputs were pre-processed using wavelet transforms (WA-ANN and WA-SVR), for short-term drought forecasting. SPI 3 and SPI 6 were forecast using the above mentioned methods for lead times of 1 and 3 months in the Awash River Basin of Ethiopia. Both SPI 3 and SPI 6 are short-term drought indicators, and forecast lead times of 1 and 3 months represent the shortest possible monthly lead time and a short seasonal lead time, respectively.

Theoretical development

Development of SPI series

The SPI was developed by McKee et al. (1993). A number of advantages arise from the use of the SPI. First, the index is based on precipitation alone, making its evaluation relatively easy (Cacciamani et al. 2007). Second, the index makes it possible to describe drought on multiple time scales (Tsakiris and Vangelis 2004; Mishra and Desai 2006; Cacciamani et al. 2007). A third advantage of the SPI is its standardization, which makes it particularly well suited to comparing drought conditions among different time periods and regions with different climates (Cacciamani et al. 2007). A drought event occurs when the SPI is continuously negative and ends when the SPI becomes positive. Table 1 provides a drought classification based on the SPI. Details regarding the computation of the SPI can be found in Belayneh and Adamowski (2012) and Mishra and Desai (2006).

Table 1 Drought classification based on SPI (McKee et al. 1993)
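To make the standardization step concrete, the following is a minimal sketch (not the implementation used in this study) of how an SPI series can be derived from a monthly rainfall record: precipitation is aggregated over the chosen time scale, a gamma distribution is fitted to the aggregated sums, and the resulting cumulative probabilities are mapped to the standard normal variate. All function names, the synthetic rainfall series and the pooled (rather than per-calendar-month) fitting are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def spi(precip, scale=3):
    """Sketch of an SPI computation for a monthly precipitation series.

    precip : 1-D array of monthly precipitation totals (mm)
    scale  : aggregation window in months (e.g. 3 for SPI 3, 6 for SPI 6)
    """
    p = np.asarray(precip, dtype=float)
    # Aggregate precipitation over the chosen time scale (rolling sums).
    sums = np.convolve(p, np.ones(scale), mode="valid")

    # Fit a gamma distribution to the non-zero sums (a full implementation
    # fits a separate distribution for each calendar month).
    a, loc, b = stats.gamma.fit(sums[sums > 0], floc=0)

    # Mixed distribution: the probability of a zero sum is handled separately.
    q = np.mean(sums == 0)
    cdf = q + (1.0 - q) * stats.gamma.cdf(sums, a, loc=loc, scale=b)

    # Equiprobability transform of the cumulative probabilities to the
    # standard normal variate gives the SPI values.
    z = stats.norm.ppf(np.clip(cdf, 1e-6, 1.0 - 1e-6))
    # The first (scale - 1) months have no complete aggregation window.
    return np.concatenate([np.full(scale - 1, np.nan), z])

# Illustrative call on synthetic rainfall (mm); station records would be used instead.
rain = np.random.default_rng(0).gamma(shape=2.0, scale=40.0, size=432)
print(spi(rain, scale=3)[:12])
```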

Autoregressive integrated moving average (ARIMA) models

Autoregressive integrated moving average models are amongst the most commonly used stochastic models for drought forecasting (Mishra and Desai 2005, 2006; Mishra et al. 2007; Cancelliere et al. 2007; Han et al. 2010).

The general non-seasonal ARIMA model may be written as (Box and Jenkins 1976):

$$z_{t} = \frac{{\theta (B)a_{t} }}{{\phi (B)\nabla^{d} }}$$
(1)
$$\phi (B) = (1 - \phi_{1} B - \phi_{2} B^{2} - \cdots - \phi_{p} B^{p} )$$
(2)

and

$$\theta (B) = (1 - \theta_{1} B - \theta_{2} B^{2} - \cdots - \theta_{q} B^{q} )$$
(3)

where \(z_{t}\) is the observed time series and B is the backshift operator. \(\phi (B)\) and \(\theta (B)\) are polynomials of order p and q, respectively, where p is the order of non-seasonal auto-regression and q is the order of the non-seasonal moving average. The random errors \(a_{t}\) are assumed to be independently and identically distributed with a mean of zero and a constant variance. \(\nabla^{d}\) denotes the differencing operation applied to the data series to make it stationary, and d is the number of regular differencing operations.

The time series model development consists of three stages: identification, estimation and diagnostic check (Box et al. 1994). In the identification stage, data transformation is often needed to make the time series stationary. Stationarity is a necessary condition in building an ARIMA model that is useful for forecasting (Zhang 2003). The estimation stage of model development consists of the estimation of model parameters. The last stage of model building is the diagnostic checking of model adequacy. This stage checks if the model assumptions about the errors are satisfied. Several diagnostic statistics and plots of the residuals can be used to examine the goodness of fit of the tentative model to the observed data. If the model is inadequate, a new tentative model should be identified, which is subsequently followed, again, by the stages of estimation and diagnostic checking.
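As an illustration of this identification–estimation–diagnostic cycle (a sketch only, using the statsmodels package rather than the software employed in this study), a candidate ARIMA(p, d, q) model can be fitted to an SPI series and its residuals checked as follows; the model order and input file name are placeholders.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

# Hypothetical input: a single-column text file holding an SPI time series.
spi_series = np.loadtxt("spi3_station.txt")

# Estimation: fit a candidate ARIMA(p, d, q); the order (2, 0, 1) is a placeholder.
result = sm.tsa.ARIMA(spi_series, order=(2, 0, 1)).fit()

# Diagnostic checking: the residuals of an adequate model behave like white noise.
print(result.summary())
print(acorr_ljungbox(result.resid, lags=[12]))  # large p-values support adequacy

# Forecasting at 1- and 3-month lead times.
print(result.forecast(steps=3))
```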

Artificial neural network models

ANNs are flexible computing frameworks for modeling a broad range of nonlinear problems. They have many features which are attractive for forecasting such as their rapid development, rapid execution time and their ability to handle large amounts of data without very detailed knowledge of the underlying physical characteristics (ASCE 2000a, b).

The ANN models used in this study have a feed forward multi-layer perceptron (MLP) architecture trained with the Levenberg–Marquardt (LM) back propagation algorithm. MLPs have often been used in hydrologic forecasting because of their simplicity. MLPs consist of an input layer, one or more hidden layers, and an output layer. The hidden layer contains the neuron-like processing elements that connect the input and output layers, and the network output is given by (Belayneh and Adamowski 2012):

$$y_{k}^{\prime } (t) = f_{0} \left[ \sum\limits_{j = 1}^{m} {w_{kj} } \cdot f_{n} \left( \sum\limits_{i = 1}^{N} w_{ji} x_{i} (t) + w_{j0} \right) + w_{k0} \right]$$
(4)

where N is the number of input variables, m is the number of hidden neurons, \(x_{i} (t)\) is the ith input variable at time step t, \(w_{ji}\) is the weight that connects the ith neuron in the input layer and the jth neuron in the hidden layer, \(w_{j0}\) is the bias for the jth hidden neuron, \(f_{n}\) is the activation function of the hidden neurons, \(w_{kj}\) is the weight that connects the jth neuron in the hidden layer and the kth neuron in the output layer, \(w_{k0}\) is the bias for the kth output neuron, \(f_{0}\) is the activation function of the output neuron, and \(y_{k}^{\prime } (t)\) is the forecasted kth output at time step t (Kim and Valdes 2003).

The LM algorithm combines the steepest gradient descent method and the Gauss–Newton iteration. In the learning process, the interconnection weights are adjusted using an error convergence technique to obtain the desired output for a given input. The error at the output layer propagates backwards to the input layer through the hidden layer of the network, and the gradient descent step adjusts the interconnection weights so as to minimize the output error.
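To make Eq. (4) concrete, the sketch below evaluates the forward pass of a single-hidden-layer MLP in numpy, with a hyperbolic tangent hidden activation and a linear output as used in this study; the dimensions and weights are random placeholders rather than trained values.

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass of Eq. (4): tanh hidden layer, linear output layer."""
    hidden = np.tanh(W_hidden @ x + b_hidden)   # f_n(sum_i w_ji x_i(t) + w_j0)
    return W_out @ hidden + b_out               # f_0 is linear for the output neuron

# Placeholder dimensions: 4 input neurons (lagged SPI values), 7 hidden neurons, 1 output.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)                      # placeholder input vector
W_hidden, b_hidden = rng.standard_normal((7, 4)), rng.standard_normal(7)
W_out, b_out = rng.standard_normal((1, 7)), rng.standard_normal(1)

print(mlp_forward(x, W_hidden, b_hidden, W_out, b_out))
```

Training with the LM algorithm then adjusts these weight matrices and biases to minimize the squared forecast error over the training set.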

Support vector regression models

Support vector regression (SVR) was introduced by Vapnik (1995) in an effort to characterize the properties of learning machines so that they can generalize well to unseen data (Kisi and Cimen 2011). SVR embodies the structural risk minimization principle, unlike conventional neural networks, which adhere to the empirical risk minimization principle. As a result, SVR seeks to minimize the generalization error, while ANNs seek to minimize the training error.

In regression estimation with SVR, the purpose is to estimate a functional dependency \(f(\vec{x})\) between a set of sampled points \(X = \{ \vec{x}_{1} ,\vec{x}_{2} , \ldots ,\vec{x}_{l} \}\) taken from \(R^{n}\) and target values \(Y = \{ y_{1} ,y_{2} , \ldots ,y_{l} \}\) with \(y_{i} \in R\) (here, the input and target vectors \(\vec{x}_{i}\) and \(y_{i}\) refer to the monthly records of the SPI). Detailed descriptions of SVR model development can be found in Cimen (2008).
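As a minimal illustration of this regression estimation (a sketch using scikit-learn rather than the OnlineSVR software employed later in this study), an ε-insensitive SVR with an RBF kernel can be fitted to lagged SPI values; the stand-in series, lag structure and parameter values are all placeholders.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical example: forecast SPI(t) from the three preceding months.
spi = np.sin(np.linspace(0, 20, 200))                    # stand-in for an SPI series
X = np.column_stack([spi[i:-3 + i] for i in range(3)])   # lags t-3, t-2, t-1
y = spi[3:]                                              # target at time t

model = SVR(kernel="rbf", C=1.0, gamma=0.5, epsilon=0.1)
model.fit(X[:180], y[:180])                              # calibration portion
print(model.predict(X[180:])[:5])                        # validation forecasts
```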

Wavelet transforms

The first step in wavelet analysis is to choose a mother wavelet (\(\psi\)). The continuous wavelet transform (CWT) is defined as the sum over all time of the signal multiplied by scaled and shifted versions of the wavelet function ψ (Nason and Von Sachs 1999):

$$W(\tau ,s) = \frac{1}{{\sqrt {\left| s \right|} }}\int\limits_{ - \infty }^{\infty } {x(t)\psi^{*} \left( {\frac{t - \tau }{s}} \right){\text{d}}t}$$
(5)

where s is the scale parameter; \(\tau\) is the translation and * corresponds to the complex conjugate (Kim and Valdes 2003). The CWT produces a continuum of all scales as the output. Each scale corresponds to the width of the wavelet; hence, a larger scale means that more of the time series is used in the calculation of the coefficient than at smaller scales. The CWT is useful for processing images and signals; however, it is not often used for forecasting because of its complexity and the time required to compute it. Instead, the wavelet transform is usually discretized in forecasting applications to simplify the numerical calculations. The discrete wavelet transform (DWT) requires less computation time and is simpler to implement. DWT scales and positions are usually based on powers of two (dyadic scales and positions). This is achieved by modifying the wavelet representation to (Cannas et al. 2006):

$$\psi_{j,k} (t) = \frac{1}{{\sqrt {\left| {s_{0}^{j} } \right|} }}\psi \left( {\frac{{t - k\tau_{0} s_{0}^{j} }}{{s_{0}^{j} }}} \right)$$
(6)

where j and k are integers that control the scale and translation, respectively, while \(s_{0} > 1\) is a fixed dilation step (Cannas et al. 2006) and \(\tau_{0}\) is a translation factor that depends on the dilation step. The effect of discretizing the wavelet is that the time–scale space is now sampled at discrete levels. The DWT operates with two sets of functions: high-pass and low-pass filters. The original time series is passed through the high-pass and low-pass filters, yielding detail coefficients and an approximation series.

One of the inherent challenges of using the DWT for forecasting applications is that it is not shift invariant (i.e. if we change values at the beginning of our time series, all of the wavelet coefficients will change). To overcome this problem, a redundant algorithm, known as the à trous algorithm can be used, given by (Mallat 1998):

$$C_{i + 1} (k) = \sum\limits_{l = - \infty }^{ + \infty } {h(l)c_{i} (k + 2^{i} l)}$$
(7)

where h is the low-pass filter and the finest scale is the original time series. To extract the details \(w_{i} (k)\) that were removed in Eq. (7), the smoother approximation is subtracted from the finer approximation that preceded it, as given by (Murtagh et al. 2003):

$$w_{i} (k) = c_{i - 1} (k) - c_{i} (k)$$
(8)

where \(c_{i} (k)\) is the smoother approximation of the signal and \(c_{i - 1} (k)\) is the finer approximation that precedes it. Each application of Eqs. (7) and (8) produces a smoother approximation and extracts a higher level of detail. Finally, the non-symmetric Haar wavelet can be used as the low-pass filter to prevent any future information from being used during the decomposition (Renaud et al. 2002).
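A minimal sketch of Eqs. (7) and (8) with a causal (non-symmetric) Haar low-pass filter is given below. The treatment of the series boundary, here by repeating the first observation, is one of several possible conventions and is an assumption of this sketch rather than a detail reported in the study.

```python
import numpy as np

def a_trous(series, levels=3):
    """Shift-invariant à trous decomposition with a causal Haar low-pass filter.

    Returns (approximation, [detail_1, ..., detail_levels]) such that
    approximation + sum(details) reconstructs the original series exactly.
    """
    c = np.asarray(series, dtype=float)
    details = []
    for i in range(levels):
        step = 2 ** i
        # Eq. (7): smooth using only the current and past values (causal Haar filter);
        # the first `step` values are padded by repeating the first observation.
        shifted = np.concatenate([np.repeat(c[0], step), c[:-step]])
        c_next = 0.5 * (c + shifted)
        # Eq. (8): detail = preceding (finer) approximation minus the smoother one.
        details.append(c - c_next)
        c = c_next
    return c, details

# Example: decompose a stand-in SPI series to three levels.
x = np.random.default_rng(1).standard_normal(240)
approx, details = a_trous(x, levels=3)
print(np.allclose(x, approx + sum(details)))   # True: the decomposition is additive
```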

Study areas

In this study, the SPI was forecast for the Awash River Basin in Ethiopia (Fig. 1). For the purpose of this study, the Awash River Basin was separated into three smaller basins on the basis of factors such as location, altitude, climate, topography and agricultural development. The statistics of each station are shown in Table 2. Drought is a common occurrence in the Awash River Basin (Edossa et al. 2010). The heavy dependence of the population on rain-fed agriculture has made the people and the country's economy extremely vulnerable to the impacts of droughts. The mean annual rainfall of the basin varies from about 1600 mm in the highlands to 160 mm at the northern point of the basin. The total amount of rainfall also varies greatly from year to year, resulting in severe droughts in some years and flooding in others. The total annual surface runoff in the Awash Basin amounts to some 4900 × 10⁶ m³ (Edossa et al. 2010). Effective forecasts of the SPI can be used to mitigate the impacts of droughts that result from rainfall shortages in the area. Rainfall records from 1970 to 2005 were used to generate the SPI 3 and SPI 6 time series. The normal ratio method, recommended by Linsley et al. (1988), was used to estimate missing rainfall records at some stations.

Fig. 1 Awash River Basin (Source: Ministry of Water Resources, Ethiopia, Agricultural Water Management Information System. http://www.mowr.gov.et/AWMISET/images/Awash_agroecologyv3.pdf. Accessed 6 June 2013)

Table 2 Descriptive statistics of the Awash River Basin

Methodology

ARIMA model development

Following the Box and Jenkins approach, ARIMA models for the SPI time series were developed in three steps: model identification, parameter estimation and diagnostic checking. Details on the development of ARIMA models for SPI time series can be found in the works of Mishra and Desai (2005) and Mishra et al. (2007).

In an ARIMA model, the value of a given time series is a linear aggregation of p previous values and a weighted sum of q previous deviations (Mishra and Desai 2006). These models are autoregressive of order p and moving average of order q, and operate on the dth difference of the given time series. Hence, an ARIMA model is characterized by three parameters (p, d, q), each of which can take a positive integer value or a value of zero.

Wavelet transformation

When conducting wavelet analysis, the number of decomposition levels appropriate for the data must be chosen. The number of decomposition levels is often chosen according to the signal length (Tiwari and Chatterjee 2010), given by L = int[log(N)], where L is the level of decomposition and N is the number of samples. According to this approach, the optimal number of decomposition levels for the SPI time series in this study would have been 3. In this study, each SPI time series was instead decomposed to between 1 and 9 levels, and the best results at each level were compared to determine the appropriate level; the optimal decomposition level varied between models. Once a time series was decomposed to the appropriate level, the subsequent approximation series was either used on its own, used in combination with the relevant detail series, or the relevant detail series were summed without the approximation series. For most SPI time series, using only the approximation series produced the best forecast results; in some cases, adding a decomposed detail series to the approximation series yielded the best results. The appropriate approximation was then used as an input to the ANN and SVR models. As discussed in "Wavelet transforms", the à trous wavelet algorithm with a low-pass Haar filter was used.

ANN models

The ANN models used to forecast the SPI were recursive models. The input layer of each model was composed of the SPI values computed from each rainfall gauge in each sub-basin. The input data were normalized between 0 and 1.

All ANN models, without wavelet decomposed inputs, were created with the MATLAB (R.2010a) ANN toolbox. The hyperbolic tangent sigmoid transfer function was the activation function for the hidden layer, while the activation function for the output layer was a linear function. All the ANN models in this study were trained using the LM back propagation algorithm. The LM back propagation algorithm was chosen because of its efficiency and reduced computational time in training models (Adamowski and Chan 2011).

There were between 3 and 5 inputs for each ANN model. The optimal number of input neurons was determined by trial and error, selecting the number of neurons that exhibited the lowest root mean square error (RMSE) in the training set. The inputs and outputs were normalized between 0 and 1. Traditionally, the number of hidden neurons for ANN models is selected by trial and error. However, Wanas et al. (1998) empirically determined that the best performance of a neural network occurs when the number of hidden nodes is equal to log(N), where N is the number of training samples, while Mishra and Desai (2006) determined that the optimal number of hidden neurons is 2n + 1, where n is the number of input neurons. In this study, the optimal number of hidden neurons was taken to lie between log(N) and 2n + 1. For example, if the method proposed by Wanas et al. (1998) gave four hidden neurons and the method proposed by Mishra and Desai (2006) gave seven, the optimal number of hidden neurons was sought between 4 and 7 by trial and error. These two methods thus established lower and upper bounds for the number of hidden neurons, as illustrated in the sketch below.
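A small worked example of these bounds follows; the sample counts are hypothetical and the logarithm in the Wanas et al. (1998) rule is assumed here to be base 10, which is not stated explicitly in the text.

```python
import math

n_train = 346    # hypothetical number of training samples (80 % of a 36-year monthly record)
n_inputs = 3     # hypothetical number of input neurons

lower = max(1, round(math.log10(n_train)))  # Wanas et al. (1998): log(N) hidden nodes (base assumed)
upper = 2 * n_inputs + 1                    # Mishra and Desai (2006): 2n + 1 hidden nodes
print(lower, upper)                         # trial-and-error search is run between these bounds
```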

For all the ANN models, 80 % of the data was used to train the models, while the remaining 20 % of the data was divided into a testing and validation set with each set comprising 10 % of the data.

WA-ANN models

The WA-ANN models were trained in the same way as the ANN models, with the exception that the inputs consisted of either the approximation series or a combination of the approximation and detail series, after the appropriate wavelet decomposition was selected. The model architecture for the WA-ANN models consists of 3–5 neurons in the input layer, 4–7 neurons in the hidden layer and one neuron in the output layer. The optimal number of neurons in both the input and hidden layers was selected in the same way as for the ANN models, and the data was partitioned into training, testing and validation sets in the same manner as for the ANN models.

Support vector regression models

All SVR models were created using the OnlineSVR software developed by Parrella (2007), which can be used to build support vector machines for regression. The data was partitioned into two sets: a calibration set containing 90 % of the data and a validation set containing the final 10 %. Unlike neural networks, the data could only be partitioned into two sets, with the calibration set being equivalent to the combined training and testing sets used for the neural networks. All inputs and outputs were normalized between 0 and 1.

All SVR models used the nonlinear radial basis function (RBF) kernel. As a result, each SVR model required the selection of three parameters: gamma (γ), cost (C) and epsilon (ε). The γ parameter is a constant that reduces the model space and controls the complexity of the solution, C is a positive constant that acts as a capacity control parameter, and ε defines the loss function that describes the regression vector without all the input data (Kisi and Cimen 2011). These three parameters were selected by trial and error: the combination of parameters that produced the lowest RMSE on the calibration data set was selected, as sketched below.
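A sketch of this selection procedure, written here as a simple grid search with scikit-learn rather than the OnlineSVR software used in the study, is shown below; the parameter grids and function names are placeholders.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def select_svr_parameters(X_cal, y_cal,
                          Cs=(0.1, 1.0, 10.0),
                          gammas=(0.01, 0.1, 1.0),
                          epsilons=(0.01, 0.1)):
    """Return the (C, gamma, epsilon) triple giving the lowest calibration RMSE."""
    best_params, best_rmse = None, np.inf
    for C in Cs:
        for gamma in gammas:
            for epsilon in epsilons:
                model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon)
                pred = model.fit(X_cal, y_cal).predict(X_cal)
                rmse = mean_squared_error(y_cal, pred) ** 0.5
                if rmse < best_rmse:
                    best_params, best_rmse = (C, gamma, epsilon), rmse
    return best_params, best_rmse
```

An exhaustive grid is shown only for clarity; as noted in the Discussion, the computation time of the SVR models limited the number of trials that could be carried out in practice.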

WA-SVR models

The WA-SVR models were trained in exactly the same way as the SVR models, using the OnlineSVR software (Parrella 2007), with the exception that the inputs were wavelet decomposed.

The data for WA-SVR models was partitioned exactly like the data for SVR. The optimal parameters for the WA-SVR models were chosen using the same procedure used to find the parameters for SVR models.

Performance measures

To evaluate the performance of the aforementioned data-driven models, the following measures of goodness of fit were used:

$${\text{The}}\;{\text{coefficient}}\;{\text{of}}\;{\text{determination}}\; (R^{ 2} )= \frac{{\sum\nolimits_{i = 1}^{N} {(\hat{y}_{i} - \bar{y}_{i} )^{2} } }}{{\sum\nolimits_{i = 1}^{N} {(y_{i} - \bar{y}_{i} )^{2} } }}$$
(9)
$${\text{where}}\;\bar{y}_{i} = \frac{1}{N}\sum\limits_{i = 1}^{N} {y_{i} }$$
(10)

where \(\bar{y}_{i}\) is the mean value taken over N, \(y_{i}\) is the observed value, \(\hat{y}_{i}\) is the forecasted value and N is the number of samples. The coefficient of determination measures the degree of correlation between the observed and predicted values and reflects the strength of the model in relating the input and output variables. The higher the value of R² (with 1 being the highest possible value), the better the performance of the model.

$${\text{The Root Mean Squared Error }}\left( {\text{RMSE}} \right) \, = \sqrt {\frac{\text{SSE}}{N}}$$
(11)

where SSE is the sum of squared errors, and N is the number of data points used. SSE is given by:

$${\text{SSE}} = \sum\limits_{i = 1}^{N} {(\hat{y}_{i} - y_{i} )^{2} }$$
(12)

with the variables already having been defined. The RMSE evaluates the variance of errors independently of the sample size.

$${\text{The Mean Absolute Error }}\left( {\text{MAE}} \right) \, = \sum\limits_{i = 1}^{N} {\frac{{\left| {\hat{y}_{i} - y_{i} } \right|}}{N}}$$
(13)

The MAE is used to measure how close forecasted values are to the observed values. It is the average of the absolute errors.
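For reference, the three measures in Eqs. (9)–(13) can be computed directly as follows; this is a sketch, and the example values are arbitrary.

```python
import numpy as np

def performance_measures(observed, forecast):
    """R², RMSE and MAE for a forecast series, following Eqs. (9)-(13)."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(forecast, dtype=float)
    y_bar = y.mean()                                                  # Eq. (10)
    r2 = np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)      # Eq. (9)
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))                         # Eqs. (11)-(12)
    mae = np.mean(np.abs(y_hat - y))                                  # Eq. (13)
    return r2, rmse, mae

print(performance_measures([0.2, -0.5, 1.1, 0.0], [0.1, -0.4, 0.9, 0.2]))
```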

Results and discussion

In the following sections, the forecast results for the best data driven models at each sub-basin are presented. The forecasts presented are from the validation data sets for time series of SPI 3 and SPI 6, which are mostly used to describe short-term drought (agricultural drought).

SPI 3 forecasts

The SPI 3 forecast results for all data-driven models are presented in Tables 3 and 4. As the forecast lead time increases, the forecast accuracy deteriorates for all stations. In the Upper Awash basin, the best data-driven model for SPI 3 forecasts at a 1 month lead time was a WA-ANN model. The WA-ANN model at the Ziquala station had the best results in terms of RMSE and MAE, with values of 0.4072 and 0.3918, respectively, while the Ginchi station had the best WA-ANN model in terms of R², with a value of 0.8808. When the forecast lead time is increased to 3 months, the best models remain WA-ANN models. The Bantu Liben station had the model with the lowest RMSE and MAE values, 0.5098 and 0.4941, respectively, and the Sebeta station had the best results in terms of R², with a value of 0.7301.

Table 3 The best ARIMA, ANN and SVR models for 1 and 3 month forecasts of SPI 3
Table 4 The best WA-ANN and WA-SVR models for 1 and 3 month forecasts of SPI 3

In the Middle Awash basin, for forecasts of 1 month lead time, WA-ANN and WA-SVR models had the best forecast results. The WA-ANN model at the Modjo station had the best results in terms of R², with a value of 0.8564. However, unlike the Upper Awash basin, the best forecast results in terms of RMSE and MAE were from a WA-SVR model. The WA-SVR model at the Modjo station had the lowest RMSE and MAE values of 0.4309 and 0.4018, respectively. For forecasts of 3 months lead time, WA-ANN models had the best results across all performance measures, with the Modjo station having the highest R² value of 0.6808 and the Gelemsso station having the lowest RMSE and MAE values of 0.5448 and 0.5334, respectively.

In the Lower Awash basin, for forecasts of 1 month lead time, the best results were from WA-ANN and WA-SVR models, similar to the Middle Awash basin. The highest value of R² was 0.7723, from the WA-ANN model at the Eliwuha station, and the lowest values of RMSE and MAE were 0.4048 and 0.3873, from the WA-SVR model at the Eliwuha station. For forecasts of 3 months lead time, the best results in terms of R² were observed at the Bati station, where the WA-SVR model had the highest R² value of 0.5915, while the WA-SVR model at the Eliwuha station had the lowest RMSE and MAE values of 0.5159 and 0.5129, respectively.

SPI 6 forecasts

The SPI 6 forecast data are presented in Tables 5 and 6. The forecast results for SPI 6 are significantly better than the SPI 3 forecasts according to all three performance measures. In the Upper Awash Basin, a WA-ANN model at the Bantu Liben station had the best forecast result in terms of RMSE, with a value of 0.3438, and another WA-ANN model at the Ginchi station provided the lowest MAE value of 0.3212. The best forecast result in terms of R², 0.9163, was from a WA-SVR model at the Ziquala station. The forecast results in Tables 5 and 6 show that WA-ANN and WA-SVR models provide the best SPI 6 forecasts; neither method is significantly better than the other, as the two provide very similar results according to the performance measures used.

Table 5 The best ARIMA, ANN and SVR models for 1 and 3 month forecasts of SPI 6
Table 6 The best WA-ANN and WA-SVR models for 1 and 3 month forecasts of SPI 6

As the forecast lead time is increased, the forecast accuracy of all the models declines. This decline is most evident in the ARIMA, ANN and SVR models; the forecast accuracy at a 3 month lead time remains significantly better for the WA-ANN and WA-SVR models. In the Upper Awash Basin, the best model in terms of R² is a WA-ANN model at the Sebeta station, with a value of 0.7723, while the best model in terms of RMSE and MAE is a WA-SVR model at the Ginchi station, with values of 0.4224 and 0.3864, respectively. In the Middle Awash Basin, the best model in terms of R² is also a WA-ANN model, at the Modjo station, with a value of 0.7414, and the best model in terms of RMSE and MAE is a WA-SVR model at the Dire Dawa station, with values of 0.4049 and 0.3897, respectively. In the Lower Awash Basin, the best results in terms of R² and MAE are from WA-SVR models at the Eliwuha and Dubti stations, respectively, and the best model in terms of RMSE is a WA-ANN model at the Dubti station.

Discussion

As shown in the forecast results for both SPI 3 and SPI 6, the use of wavelet analysis increased forecast accuracy for both the 1 and 3 month forecast lead times. A similar pattern was observed in SPI forecasts within the Awash River Basin for lead times of 1 and 6 months in Belayneh and Adamowski (2012). Once the original SPI time series was decomposed using wavelet analysis, it was found that the approximation series of the signal was disproportionately more important for forecasting than the wavelet detail series. Irrespective of the number of decomposition levels, omitting the approximation series resulted in poor forecasts. In most models, adding the approximation series to the wavelet details did not noticeably improve the forecast results compared to using the approximation series on its own. Traditionally, the number of wavelet decomposition levels is either determined by trial and error or by using the formula L = int[log(N)], with N being the number of samples; using this formula, the optimal number of decomposition levels would be L = 3. In this study, the decomposition was instead repeated for levels 1 through 9, and the appropriate level was determined using the aforementioned performance measures.

In general, WA-ANN and WA-SVR models were the best forecast models in each of the sub-basins. Wavelet-neural networks were also shown to be the best method for forecasting the SPI at 1 and 6 month lead times in Belayneh and Adamowski (2012). Unlike that study, the present study also coupled wavelet transforms with SVR models. The coupled WA-SVR models had improved results compared to the SVR models and outperformed the WA-ANN models according to some of the performance measures at some stations.

While both the WA-ANN and WA-SVR models were effective in forecasting SPI 3, most WA-ANN models produced more accurate forecasts. In addition, as shown in Figs. 2 and 3, the WA-ANN model appears to be more effective at forecasting the extreme SPI values, whether indicative of severe drought or heavy precipitation. While the WA-SVR model closely mirrors the observed SPI trends, it seems to underestimate the extreme events, especially the extreme drought event around month 170.

Fig. 2 SPI 3 forecast results for the best WA-ANN model at the Bati station for 1 month lead time

Fig. 3 SPI 3 forecast results for the best WA-SVR model at the Bati station for 1 month lead time

The reason why WA-ANN models seem to be slightly more effective than WA-SVR models for forecasts of SPI 3, and seem to be more effective in forecasting extreme events, is likely the inherent advantages of ANNs over SVR models, such as their simplicity of development and their reduced computation time, since the wavelet analysis used for both machine learning techniques is the same. This observation is further supported by the fact that most ANN forecasts have better results than the SVR models, as shown in Table 3. In theory, SVR models should perform better than ANN models because they adhere to the structural risk minimization principle instead of the empirical risk minimization principle and should therefore be less susceptible to local minima or maxima. However, the performance of SVR models is highly dependent on the selection of an appropriate kernel and its three parameters. Given that there are no prior studies on the selection of these parameters for forecasts of the SPI, the selection was done by trial and error. This process is made even more difficult by the size of the data set (monthly data from 1970 to 2005), which contributes to the long computation time of the SVR models. The uncertainty regarding the three SVR parameters increases the number of trials required to obtain the optimal model, and because of the long computation time of SVR models, the same number of trials cannot be carried out as for ANN models. For ANN models, even in complex systems, the relationship between input and output variables does not need to be fully understood; effective models can be obtained by varying the number of neurons within the hidden layer, and producing several models with varying architectures is not computationally intensive, allowing a larger selection pool for the optimal model. In addition, the ability of wavelet analysis to capture local discontinuities likely reduces this susceptibility in the ANN models when the two are coupled.

This study also shows that the à trous algorithm is an effective tool for forecasting SPI time series: it de-noises a given time series and improves the performance of both the ANN and SVR models. The à trous algorithm is also shift invariant, making it well suited to forecasting applications, including drought forecasting. The fact that the wavelet-based models had the best results is likely because the wavelet decomposition was able to capture the non-stationary features of the data.

Conclusion

This study explored forecasting short-term drought conditions in the Awash River Basin using five different data-driven models, including the newly proposed SVR and WA-SVR methods. With respect to wavelet analysis, this study found, for the first time, that using only the approximation series was effective in de-noising a given SPI time series. SPI 3 and SPI 6 were forecast over lead times of 1 and 3 months using ARIMA, ANN, SVR, WA-SVR and WA-ANN models. Forecast results for SPI 3 were lower in terms of the coefficient of determination, likely a result of the lower levels of autocorrelation in these data sets compared to SPI 6. Overall, the WA-ANN method, combined with a new method for determining the optimal number of neurons in the hidden layer, had the best forecast results, with the WA-SVR models also performing very well. Wavelet-coupled models consistently showed lower RMSE and MAE values than the other data-driven models, possibly because wavelet decomposition de-noises a given time series, subsequently allowing the ANN or SVR models to forecast the main signal rather than the main signal with noise.

Two of the three Awash River sub-basins have semi-arid climates. The effectiveness of the WA-ANN and WA-SVR models indicates that these models may be effective forecasting tools in semi-arid regions. Future studies should also focus on different regions and compare the effectiveness of data-driven methods in forecasting different drought indices. The forecasts did not show a particular trend with respect to any sub-basin, and the climatology of a given sub-basin did not significantly affect the forecast results for any particular station; further studies in different climates are therefore needed to determine whether there is a significant link between forecast accuracy and climate. The coupling of these data-driven models with uncertainty analysis techniques such as bootstrapping should be investigated, and coupling SVR models with genetic algorithms to make parameter estimation more efficient could also be explored.