Rainfall and runoff time-series trend analysis using LSTM recurrent neural network and wavelet neural network with satellite-based meteorological data: case study of Nzoia hydrologic basin

This study compares the LSTM neural network and the wavelet neural network (WNN) for spatio-temporal prediction of rainfall and runoff time-series trends in scarcely gauged hydrologic basins. Using long-term in situ observed data for 30 years (1980–2009) from ten rain gauge stations and three discharge measurement stations, the rainfall and runoff trends in the Nzoia River basin are predicted through satellite-based meteorological data comprising precipitation, mean temperature, relative humidity, wind speed and solar radiation. The prediction modelling was carried out in three sub-basins corresponding to the three discharge stations. LSTM and WNN were implemented with the same deep learning topological structure consisting of 4 hidden layers, each with 30 neurons. In the prediction of the basin runoff with the five meteorological parameters, both models performed well, with R² values of 0.8967 for LSTM and 0.8820 for WNN. The MAE and RMSE measures for LSTM and WNN predictions ranged between 11 and 13 m³/s for the mean monthly runoff prediction. With the satellite-based meteorological data, LSTM predicted the mean monthly rainfall within the basin with R² = 0.8610, compared to R² = 0.7825 using WNN. The MAE for mean monthly rainfall trend prediction was between 9 and 11 mm, while the RMSE varied between 15 and 21 mm. The performance of the models improved with an increase in the number of input parameters, which corresponded to the size of the sub-basin. In terms of computational time, both models converged at the lowest RMSE at nearly the same number of epochs, with WNN taking slightly longer to attain the minimum RMSE. The study shows that in hydrologic basins with scarce meteorological and hydrological monitoring networks, the use of satellite-based meteorological data in deep learning neural network models is suitable for spatial and temporal analysis of rainfall and runoff trends.


Introduction
In sustainable water resources management, the accurate modelling of hydrological processes at watershed scales is a significant contributing factor. In particular, predictions of rainfall and runoff trends are important for different water resource planning tasks such as irrigation, flood control, structural design, and eco-hydrological services [1]. In most countries, catchment basins are sparsely gauged without accurate and adequate rainfall and runoff measurements [2]. This contributes to higher uncertainty in attempts to predict hydrological responses in such areas [3]. Rainfall and runoff can be characterized as random stochastic processes related to complex physical factors within catchments. Because of the spatial and temporal variabilities within watersheds, the patterns and number of variables required for the modelling of rainfall and runoff present a complex hydrologic problem [4,5].
Generally, the forecasting of time-series data depends on the sequence being modelled and can have different dimensional spatial dependencies. In the prediction of rainfall and runoff time-series trends, physical and conceptual models are traditionally utilized [6]. While conceptual models are considered to be suitable for daily timescale analysis, physical models can be used for daily and sub-daily timescale predictions. Because of the timescale dependencies, the physical and conceptual models are considered unsuitable for accurate prediction of rainfall and runoff, particularly where high-resolution spatial data are lacking [7,9]. Furthermore, these models require physical parameters, which limits their application in the prediction of sequence data with unknown or limited quasi-periodic dynamics [10][11][12][50].
Among the statistical methods, the Autoregressive Moving Average (ARMA) model and its variants, such as the Autoregressive Integrated Moving Average (ARIMA), and nearest-neighbor methods have been used for rainfall and runoff predictions. Nevertheless, the accuracy of the statistical methods depends on the quality of the input data, and these methods can only satisfactorily describe time-series data that exhibit non-stationary behaviors within and across seasons [45]. To improve on the rainfall and runoff prediction results, data-driven approaches have been proposed. This is attributed to their ability to approximate the inherent patterns and dynamics of series data without knowledge of the parameters, and to take into consideration the stochasticity in observation and system noise. In particular, ANNs have been explored and preferred for rainfall and runoff time-series trend analysis [13].
To predict rainfall and runoff, [14] proposed to use ANN and fuzzy logic, while ANN and ARIMA models were adopted in [15]. Other studies, e.g. [16], utilized Radial Basis Function (RBF) networks and empirical model decompositions in the prediction of rainfall. The above studies reported that the approaches did not adequately capture the seasonal decomposition and the inherent cyclical fluctuations in rainfall and runoff time sequence data. Towards improving ANN performance in hydrological predictions for the Kentucky River catchment, [17] developed an ANN-based training using genetic algorithms for streamflow magnitude predictions.
For the case study of the data-sparse Malaprabha River Basin in India, [18] developed a modular neural network to capture variable intensities in rainfall and runoff simulations. The results from [18] were superior to those of the methods in [6,17,19].
Despite the aforementioned results, the main drawback of the conventional feedforward ANNs is their tendency to lose significant information on the sequential order of the input data during training. This is attributed to the vanishing gradient effect, which occurs as the number of layers increases [13]. Further, a single ANN applied to hydrological phenomena modelling may not reliably capture the localized temporal and seasonal dynamics of rainfall and runoff [18,20]. Thus, because of the inherent seasonality and non-linear characteristics of rainfall and runoff time-series data, hybrid models have been recommended for their simulation and prediction, such as: wavelets and least squares Support Vector Machines [40]; wavelet transform and artificial neural network hybrid models [46]; wavelet-artificial neural network and comparison with adaptive neuro-fuzzy inference system [48]; and singular spectral analysis and discrete wavelet transform in hybrid models [55].
The problem of capturing long-term dependencies in time-series data implies that the desired output at time $t$ depends on an input value that occurred at an earlier time $\tau \ll t$. As such, the dynamical neural system for such a task should be able to learn to store information for an arbitrary duration (memory) for the minimization of noise corruption. Because the typical feedforward network is not sufficiently powerful to discover contingencies spanning long temporal distances, it easily suffers from the vanishing gradient effect as the number of layers increases. RNNs are most suited to store long-term time-series data with different temporal scales. However, simple RNNs that depend on the largest eigenvalue of the state-update matrix may have gradients which either increase or decrease exponentially over time. The long short-term memory (LSTM) RNN [25] was developed to improve on the conventional RNN models. LSTM-RNN uses input, output and forget gates to achieve a network that can maintain state and propagate gradients in a stable fashion over long time spans. These networks have been shown to outperform deep feedforward neural networks on a variety of tasks [57]. Due to this capability, LSTM has been applied for rainfall and runoff predictions in [13,21].
To take into account the non-stationarity in the assimilation of rainfall and runoff time-series data, this study proposes to compare the wavelet neural network (WNN) and the LSTM recurrent neural network. The comparison is based on the fact that in data showing persistence structure within the series, data-driven models can be considered to be more appropriate in accounting for their sequence dependency, non-stationarity and non-linearity. Recent investigations have demonstrated that WNN [23] and LSTM [13,21], as data-driven models, can overcome the constraints of time-series modelling and are suitable for taking into account the quasi-periodicities in rainfall and runoff predictions.
Specifically, wavelet transform (WT) can be used to analyze the data signal details through signal decomposition into time-frequency domains. Adopting discrete wavelet transformation (DWT), the rainfall and runoff series data can be estimated into independent data with periodicity [24,25,39]. Further, in temporal sequence predictions, DWT can infer the normally hidden time-frequency information in time-series data. This study thus proposes the wavelet-coupled ANN towards improving the rainfall and runoff prediction model performance, as demonstrated in forecasting streamflows with more reliable results [40,41]. The wavelet-neural network model has been reported to be superior to the conventional ANN and statistical regression models in rainfall and runoff prediction in different case studies by [48,51,52].
Compared to WNN, LSTM is capable of dynamically incorporating predecessor or past learning experience due to internal recurrence. LSTM is also considered to be more powerful computationally and topologically more reasonable as compared to the conventional feedforward neural networks without internal states [25]. With this ability, LSTM can automatically project the inherent properties in time-series data for accurate simulation and approximation of the chaotic series. LSTM is also suitable where there is a long delay, and accounts for time-series signals with low and high frequencies. Compared to the conventional ANN and statistical models such as ARMA and ARIMA, LSTM is capable of robustly learning information contained in time-series data, and can effectively capture the variability of time-series data [23,42,43].
To understand the significance of the two neural network models in rainfall and runoff trend analysis, this study explores the implementation of WNN and LSTM in rainfall and runoff trend characterization and prediction within a hydrologic basin with a scarce meteorological and hydrological monitoring network. From the literature, comparisons of the advantages of WNN and LSTM have not been carried out, especially in rainfall and runoff hydrological applications in data-scarce basins. Because of the lack of observed meteorological data, this study also evaluates the significance of satellite-based meteorological data, including rainfall, temperature, relative humidity, wind speed and solar radiation, as input data in rainfall and runoff trend characterization and prediction in the Nzoia River basin in Kenya.

Study area characterization
The Nzoia River Basin forms part of the larger Lake Victoria basin. The basin is situated within latitudes 1°30′ N and 0°30′ S and longitudes 34°00′ E and 35°45′ E, with an approximate area of 12,700 km². The elevation ranges between 1100 m and 4000 m AMSL (Fig. 1) [44]. The lower parts of the basin have a flat terrain with a slope mainly ranging between 2 and 6%, whereas the upper part is hilly with more rugged terrain, as depicted in Fig. 1. Further details on the study area land-use and climatic characteristics can be found in [27,44].
In the prediction modelling, the rainfall and streamflow stations are independently modelled with respect to the containing sub-basin, as depicted in Fig. 1. Each discharge station was treated as a pour point, and sub-basins were delineated to include all the streams draining to each discharge station. In Fig. 1, the spatial distributions of the discharge stations and the ground and satellite meteorological stations are shown.

Data
The basin was divided into upper (1BC01), middle (1DA02) and lower (1EF01) sub-basins according to the locations of the discharge measurement stations (Fig. 1). The in situ rainfall data and satellite-based meteorological data were aggregated to monthly averages for the 30 years (1980–2009) of study.
The satellite meteorological data were downloaded from the National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR) website (https://climatedataguide.ucar.edu/climate-data/climate-forecast-system-reanalysis-cfsr). The variability of the mean monthly satellite-based and measured meteorological parameters and streamflow is presented graphically in Fig. 2. Notably, from the meteorological data, a linear trend analysis shows an increase in temperature and solar radiation, with a corresponding decrease in relative humidity, wind speed and received precipitation within the basin (Fig. 2a-d). The trends in the precipitation from the in situ measurements and from satellite data have a marginal difference, with the satellite data overestimating the observed precipitation (Fig. 2e).
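The linear trend analysis mentioned above can be illustrated with an ordinary least-squares fit of a monthly series against its time index. This is a minimal sketch, not the procedure used in the study: the function name `linear_trend` and the sample temperatures are hypothetical, and the sign of the slope is taken as the direction of the trend.

```python
def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through `values`,
    using the time index 0, 1, 2, ... as the independent variable."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    # Covariance and variance terms of the normal equations.
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = s_ty / s_tt
    return slope, y_mean - slope * t_mean


# Hypothetical monthly mean temperatures (deg C); NOT data from the study.
temps = [20.0, 20.5, 21.0, 21.5]
slope, intercept = linear_trend(temps)
# A positive slope indicates an increasing trend, a negative one a decrease.
```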
Table 1 presents a summary of the mean monthly statistical descriptions of the satellite-based meteorological data and the observed rainfall and streamflow data. Comparing the observed and satellite-based meteorological data, it is noted that the satellite-based rainfall overestimated the measured mean monthly rainfall by approximately 36 mm, and with twice the standard deviation. The streamflow volume is seen to vary according to the location of the pour point and the size of the sub-basin.

Methods
This section introduces the structure of the neural network models, their implementation and the validation approach. Figure 3 presents a summary of the implementation strategy and processing flow for the prediction of rainfall and runoff trends with the proposed WNN and LSTM models. In Fig. 3, the missing rainfall data (RF) were interpolated using the Inverse Distance Weighting (IDW) method. The deterministic IDW was used to interpolate the rainfall points since the missing data were less than 5%.
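As an illustration of the IDW gap-filling step, the sketch below estimates a missing rainfall value from neighbouring stations. It assumes planar station coordinates and a distance exponent of 2, neither of which is stated in the text; the function name `idw_estimate` and the sample values are hypothetical.

```python
import math


def idw_estimate(target, neighbours, power=2.0):
    """Inverse Distance Weighting estimate at `target` = (x, y).

    `neighbours` is a list of ((x, y), value) pairs from stations that have
    an observation for the month being filled. `power` is the distance
    exponent (assumed 2 here; the study does not state the value used).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbours:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # a coincident station: use its value directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one neighbour is required")
    return weighted_sum / weight_total


# Hypothetical stations (coordinates in km, rainfall in mm).
stations = [((1.0, 0.0), 100.0), ((2.0, 0.0), 40.0)]
filled = idw_estimate((0.0, 0.0), stations)
```

With these values the nearer station carries four times the weight of the farther one, so the estimate lies closer to 100 mm than to 40 mm.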
In general, time-series data forecasting can be represented as a 2D problem, for example as a $P \times Q$ phenomenon tensor $\mathcal{Y} \in \mathbb{R}^{P \times Q \times k}$ with $k$ measurements. The spatial observations can be represented in the time dimension with $T$ time steps as $Y_{1:T}$. In the forecasting modelling, with $Y_{1:T}$ as the previous observations, the future $\Delta T$ sequence is defined by the time interval $T(1 + \Delta T)$, which is estimated as $\hat{Y}_{T+1:T+\Delta T}$. The forecasting task in this case is defined as finding the most likely predicted sequence.
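A common way of writing the "most likely predicted sequence" is as the maximizer of the conditional probability of the future observations given the past ones. The expression below is this standard form, offered as an assumption about the intended definition rather than as the authors' exact equation:

```latex
\hat{Y}_{T+1:T+\Delta T}
  = \underset{Y_{T+1:T+\Delta T}}{\arg\max}\;
    p\left(Y_{T+1:T+\Delta T} \mid Y_{1:T}\right)
```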

LSTM recurrent neural networks
RNNs are a special class of neural networks designed for understanding the temporal and dynamic sequences in series data [28][29][30][31]. As compared to feedforward networks that pass the data forward from input to output, RNNs have feedback loops where output data can feed back into the input at some point before being fed forward again for further processing and final output. The advantage of the RNN model is in the ability to model sequence time-series data such that each sample can be assumed to be dependent on previous ones. The feedback connections in RNNs provide the memory of previous activations, thus allowing the model to iteratively learn the dynamics of the sequential data in time-steps [42]. Though RNNs exhibit powerful capability for modelling complex and non-linear time-series data [32], conventional RNNs suffer from diminishing gradients, especially in the backpropagation iterative learning process. Thus, conventional RNNs may not adequately learn from longer time-lag data with dependencies [32]. To solve this problem, [25] proposed the LSTM algorithm (Fig. 3). LSTM has been shown to be more suitable in the simulation of sequence-based problems with long-term dependencies [34,41].
In the LSTM-RNN model, the memory blocks contain the input gate, output gate and forget gate that replace the hidden units. The gates are responsible for controlling the internal operations of the network. Despite different LSTM variants being proposed, a comparative analysis shows that the standard LSTM is still the most significant [35,42], and it is thus adopted and evaluated in this study. Figure 4 shows a sample representation of a single LSTM architecture as adopted in the current study. In addition to the memory gates, the fundamental components of the LSTM network comprise the cell state and the sigmoid and tanh activation functions, as depicted in Fig. 4. The inclusion and removal of information to and from the cell state are regulated by the gates. The sigmoid activation functions in the gates multiply inputs by values in the range [0, 1], and so determine the data to be included or removed.
Let $c_t$ be the input sum at time step $t$. The LSTM updates for time step $t$, given inputs $x_t$, $h_{t-1}$ and $c_{t-1}$, are given in Eqs. 1-5 [25], which describe the operations of a typical LSTM layer,
where: $\sigma$ = non-linear sigmoid function, $i_t$ = input gate, $W$ = weight matrix, $x_t$ = input at time $t$, $c_t$ = cell state, $h_t$ = output at time $t$, $f_t$ = forget gate [1], $o_t$ = output gate, $h_{t-1}$ = hidden state vector of the previous time step, and $b_i$ = input bias vector [25].
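For reference, the widely used standard LSTM formulation can be written with the symbols defined above. This is a sketch of that standard form; the ordering of the equations, the per-gate subscripts on $W$ and $b$, and the concatenation $[h_{t-1}, x_t]$ are assumptions and may differ in detail from Eqs. 1-5 of [25]:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Here $\odot$ denotes element-wise multiplication. The additive update of $c_t$ is what allows gradients to propagate over long time spans without vanishing as quickly as in a simple RNN.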
During model training, the LSTM network is optimized using the backpropagation algorithm, and the structure of the LSTM hidden layer unit comprises addition and multiplication operations, and several active layers. This minimizes the RNN drawback of vanishing gradients. The implementation of the LSTM is detailed in [13,21].

Wavelet neural networks
Wavelet transform (WT)
The WT of a signal is the representation of the data in the time-frequency domain. Through the transformation, the noise components are removed and the signals are decomposed into high- and low-frequency components through high-pass and low-pass operations. In the trend analysis of time-series data, the wavelet transform is useful for the effective capture of the inherent and hidden characteristics and trends, as well as for detecting localized and non-stationary events. It is proposed in this study that WT is capable of detecting the non-stationarity phenomena in rainfall and runoff data by representing the original signal in low- and high-frequency data components.
As detailed in [25,36], a mother wavelet function is constructed for the wavelet function. If $\psi(t)$ is a square-integrable function, with $\psi(t) \in L^2(\mathbb{R})$, and if its Fourier transform $\Psi(\omega)$ satisfies the compatibility condition (Eq. 6), then $\psi(t)$ is the mother wavelet. Translation ($\tau$) and scale ($a$) factors of the WT are introduced to obtain the function $\psi_{a,\tau}(t)$ as in Eq. 7, where $\psi_{a,\tau}(t)$ is the continuous wavelet. The inner product of the input signal $x(t)$ and $\psi(t)$ is calculated as in Eq. 8, and its Fourier-transform counterpart is given in Eq. 9. In the implementation of WT, the success is based on the selected mother wavelet. In hydrological time-series modelling, the Daubechies (DAUB-N) wavelets have proven to be more effective due to their orthonormality properties and the balance between information quality conservation and abundance [37,38,46]. DWT was used in this study to decompose the rainfall and runoff data using the Daubechies level 4 (DB4) wavelet, since DB4 is able to minimize the noise without over-smoothing the signal [37]. More details on the development and implementation of DWT are presented in our previous studies [25,36,47].
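The textbook forms of the continuous wavelet relations referred to as Eqs. 6-9 are sketched below. They are given as a reference under stated assumptions, not as the authors' exact equations: $\omega$ denotes angular frequency, $\tau$ the translation, $a > 0$ the scale, $X(\omega)$ the Fourier transform of $x(t)$ under the convention $X(\omega) = \int x(t)\,e^{-j\omega t}\,dt$, and ${}^{*}$ the complex conjugate.

```latex
\begin{aligned}
C_\psi &= \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^{2}}{|\omega|}\, d\omega < \infty \\
\psi_{a,\tau}(t) &= \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-\tau}{a}\right),
  \qquad a > 0,\ \tau \in \mathbb{R} \\
W_x(a,\tau) &= \left\langle x, \psi_{a,\tau} \right\rangle
  = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,
    \psi^{*}\!\left(\frac{t-\tau}{a}\right) dt \\
W_x(a,\tau) &= \frac{\sqrt{a}}{2\pi} \int_{-\infty}^{\infty} X(\omega)\,
    \Psi^{*}(a\omega)\, e^{j\omega\tau}\, d\omega
\end{aligned}
```

The first line is the admissibility condition on the mother wavelet, the second the scaled and translated wavelet family, and the last two the wavelet coefficient in the time domain and in the frequency domain, respectively.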
To accomplish the characterization and detection of localized phenomena in non-stationary time-series data, the first step is the decomposition of the measured discharge $\{D_{d1}(t), D_{d2}(t), \ldots, D_{di}(t), D_{a}(t)\}$ and rainfall $\{R_{d1}(t), R_{d2}(t), \ldots, R_{di}(t), R_{a}(t)\}$ into multi-frequency data, where $D_{di}(t)$ and $R_{di}(t)$ are the resulting DWT details, and $D_{ai}(t)$ and $R_{ai}(t)$ represent the approximations of the time-series discharge ($D$) and rainfall ($R$). The detail index $i$ defines the $i$th approximation level of the decomposed data.
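The structure of a multi-level DWT decomposition into one approximation series and several detail series can be sketched in a few lines. Note the substitution: for brevity this example uses the Haar wavelet, not the DB4 wavelet adopted in the study, so the coefficient values differ from a DB4 decomposition while the layout (approximation plus details $d_1, \ldots, d_n$) is the same. The function names and sample data are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail): scaled pairwise sums (low-pass) and
    scaled pairwise differences (high-pass), each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Decompose `signal` into (approx_n, [detail_1, ..., detail_n]).

    Each level re-applies the transform to the previous approximation, so
    the signal length must be divisible by 2 ** levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical short rainfall series (mm); NOT data from the study.
rain = [4.0, 2.0, 6.0, 8.0]
approx, details = haar_decompose(rain, levels=2)
# approx holds the level-2 approximation; details[0] and details[1]
# hold the level-1 and level-2 detail coefficients.
```

Because the transform is orthonormal, the sum of squares of all coefficients equals that of the original series, which is a convenient sanity check on any implementation.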

WT-neural network (WNN) model
Inheriting the properties of wavelets and ANN, the topology of WNN is based on the feedforward backpropagation (multilayer perceptron) network, with the mother wavelet acting as the hidden-layer transfer function. The WNN network topology is shown in Fig. 5a and the implementation strategy in Fig. 5b. The feedforward multilayer perceptron with input layer, hidden layers and output layer was adopted for the rainfall and runoff signals decomposed by the wavelet transform into approximation [$D_{ai}(t)$ and $R_{ai}(t)$] and detail [$D_{di}(t)$ and $R_{di}(t)$] coefficients.
Before selecting the DB4, levels 1-10 were compared by trial-and-error, and level 4 was adopted on the basis of the size of the validation data [53][54][55]. From the wavelet decomposition, several sub-series from the original data were obtained as input variables to the feedforward MLP (Fig. 5a). Figure 5c, d shows sample inputs following DB4 decomposition of the original maximum temperature and rainfall for station 8934023. Sample station 8934023 is chosen as it represents the middle elevation of the catchment area. The input includes the level 4 approximation of the original signal and the 4-level details (D1-D4).
From the input vector, the hidden layer output is determined as in Eq. 10, where $h(j)$ = output of hidden node $j$; $j$ = hidden layer nodes; $h_j$ = mother wavelet; $w_{ij}$ = input and hidden layer connecting weight; $b_j$ = translation factor; and $a_j$ = scaling factor for $h_j$.
Adopting the DB4 [25,36], the output layer is determined as in Eq. 11. The updating of the WNN weights and the wavelet function parameters follows these steps:
Step 1: computation of the WNN prediction error $E(W)$, where $y(k)$ = prediction output value and $y_t(k)$ = target output.
Step 2: update of the WNN weights and variation of the wavelet parameters according to the prediction error $e$.
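A commonly used formulation of these WNN steps is sketched below for reference. It may differ in detail from the equations of the study. The index ranges ($I$ inputs, $J$ hidden nodes, $K$ outputs), the hidden-to-output weights $w_{jk}$, the squared-error form of $E(W)$, the iteration index $n$ and the learning rate $\eta$ are all assumptions introduced for illustration:

```latex
\begin{aligned}
h(j) &= h_j\!\left(\frac{\sum_{i=1}^{I} w_{ij}\, x_i - b_j}{a_j}\right),
  \qquad j = 1, \dots, J \\
y(k) &= \sum_{j=1}^{J} w_{jk}\, h(j), \qquad k = 1, \dots, K \\
E(W) &= \frac{1}{2} \sum_{k=1}^{K} \left(y_t(k) - y(k)\right)^{2} \\
\theta^{(n+1)} &= \theta^{(n)} - \eta\, \frac{\partial E(W)}{\partial \theta^{(n)}},
  \qquad \theta \in \{\, w_{ij},\ w_{jk},\ a_j,\ b_j \,\}
\end{aligned}
```

The last line expresses Step 2 as a gradient-descent update applied to both the connection weights and the wavelet scale and translation parameters.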

Input data quantification and normalization
The [0, 1] normalization based on the min-max predefined boundary method (Eq. 15) was used to linearly transform the original data and to maintain the inherent relationships within the respective datasets [56]. After the test output, denormalization is carried out to relate the prediction output to the observed data.
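The min-max scaling and its inverse can be sketched as follows. This assumes the usual form $x' = (x - x_{min}) / (x_{max} - x_{min})$ with the bounds taken from the data being scaled; the function names and the sample runoff values are hypothetical.

```python
def minmax_bounds(values):
    """Return (lowest, highest) of the data used to define the scaling."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("constant series cannot be min-max scaled")
    return lowest, highest


def normalize(values, lowest, highest):
    """Linearly map values from [lowest, highest] to [0, 1]."""
    return [(v - lowest) / (highest - lowest) for v in values]


def denormalize(scaled, lowest, highest):
    """Map model outputs in [0, 1] back to the original units."""
    return [s * (highest - lowest) + lowest for s in scaled]


# Hypothetical monthly runoff values (m^3/s); NOT data from the study.
runoff = [10.0, 20.0, 40.0]
lowest, highest = minmax_bounds(runoff)
scaled = normalize(runoff, lowest, highest)
restored = denormalize(scaled, lowest, highest)
```

In practice the bounds would be computed on the training data only and reused for validation and test data, so that the denormalized predictions remain comparable with the observations.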

Metrics for model performance evaluation
To compare and evaluate the models, the following statistical measures were used: R², RMSE and MAE (Eqs. 16-18), where $P_i$ = observed data; $P'_i$ = simulated data; $\bar{P}$ = mean observed data; $\bar{P}'$ = mean simulated data; and $e$ = model errors.
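The three measures can be computed as in the sketch below. Because the definitions above involve both the mean observed and the mean simulated data, R² is assumed here to be the squared Pearson correlation coefficient; if the study used a different definition, the `r_squared` function would need to change. Function names and sample values are hypothetical.

```python
import math


def mae(observed, simulated):
    """Mean absolute error."""
    return sum(abs(p - q) for p, q in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root mean square error."""
    squared = sum((p - q) ** 2 for p, q in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated data
    (an assumed definition, see the note above)."""
    n = len(observed)
    p_mean = sum(observed) / n
    q_mean = sum(simulated) / n
    s_pq = sum((p - p_mean) * (q - q_mean) for p, q in zip(observed, simulated))
    s_pp = sum((p - p_mean) ** 2 for p in observed)
    s_qq = sum((q - q_mean) ** 2 for q in simulated)
    return s_pq ** 2 / (s_pp * s_qq)


# Hypothetical values; the simulation is the observation shifted by +1.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
scores = (mae(obs, sim), rmse(obs, sim), r_squared(obs, sim))
```

In this example the MAE and RMSE are both 1 while the correlation-based R² is 1, which shows why R² of this form should be read together with the error measures: it is insensitive to a constant bias.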

Neural network optimum architecture for rainfall and runoff prediction
The construction of the LSTM and WNN model architectures comprises the creation of the topology of the deep learning network, which is significantly determined by the hidden layer neurons and the selection of the optimal training parameters. Through trial-and-error input parameter combinations, the output performance of the hidden layers is determined using R², MAE and RMSE (Table 2). To determine the optimal architecture in rainfall and runoff predictions, the hidden layers were varied from 1 to 5 layers, with 30 neurons in each layer and for each sub-basin. The training and validation results from the LSTM and WNN models are summarized in Table 2, and are based on all the five input parameters in the entire basin.
The prediction results in Table 2, evaluated using R², show that the performance of the LSTM and WNN models improved with an increase in the number of hidden layers and corresponding neurons. Notably, after the fourth hidden layer, a consistent decrease in the prediction performance was observed for the models. The LSTM prediction of rainfall as measured with R² increased from 0.6448 with 1 hidden layer of 30 neurons, to a maximum of 0.8610 with a topology of 4 hidden layers with 30 neurons each. At one hidden layer, WNN marginally outperformed the LSTM by about 6%; however, with four hidden layers LSTM performed better than the WNN by 8% as measured in terms of R². Similar patterns in the results were also observed for runoff prediction with the tested network topological structures, as in Table 2. The MAE and RMSE performance metrics showed that with an increase in neurons, WNN tended to marginally minimize the prediction errors in comparison with the LSTM neural network model. It is considered that both WNN and LSTM are capable of predicting the rainfall and runoff trends; however, with the deep learning structure, LSTM marginally outperformed the WNN.
Notably in Table 2, the MAE for rainfall prediction using LSTM with three hidden layers yielded the lowest error, while the opposite observations are obtained when using RMSE evaluation and when using WNN with two hidden layers. Similarly, for runoff predictions using LSTM with four hidden layers and WNN with three hidden layers, inverse variabilities in MAE and RMSE with an increase in hidden layers are observed. These observations could be attributed in part to the fact that, for the same data, differences between the observed RMSE and MAE may arise when the error distributions in the data are biased or non-Gaussian. Further, while the MAE gives the same weight to all errors, the RMSE penalizes variance, as it gives errors with larger absolute values more weight than errors with smaller absolute values. This could be the cause of the varied prediction results in Table 2 for rainfall and runoff using LSTM and WNN.
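The different behaviour of the two error measures can be seen in a small worked example. The two error sets below are hypothetical and were chosen to have the same MAE: one spreads the error evenly, the other concentrates it in a single large miss.

```python
import math


def mae_of(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_of(errors):
    """Root mean square error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical residuals (mm); NOT results from the study.
even_errors = [2.0, 2.0, 2.0, 2.0]     # same miss every month
spiky_errors = [0.0, 0.0, 0.0, 8.0]    # one large miss

even_scores = (mae_of(even_errors), rmse_of(even_errors))      # (2.0, 2.0)
spiky_scores = (mae_of(spiky_errors), rmse_of(spiky_errors))   # (2.0, 4.0)
```

Both sets have an MAE of 2 mm, but the RMSE doubles for the set with one large error. A model ranking based on MAE can therefore differ from one based on RMSE when a few months are predicted poorly.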
To further assess the difference in performance between LSTM and WNN in rainfall and runoff trend prediction in terms of computing time, the variations of the RMSE, as the standard deviation of the prediction residuals, with the number of epochs are presented in Fig. 6. In the prediction of rainfall, RMSE converged to a minimum of 14.55 mm at the 31st epoch for LSTM and to 15.17 mm at the 37th epoch using WNN (Fig. 6a). Further training of the networks towards loss minimization and possible higher accuracy resulted in an increase in RMSE, which reached a saturation point after 43 epochs with WNN.
In the prediction of runoff, WNN is observed to consistently overestimate the mean runoff within the basin, with a minimum RMSE of 15.17 m³/s at the 34th epoch (Fig. 6b). Using LSTM, a minimum RMSE of approximately 13.12 m³/s was achieved between the 25th and 35th epochs. The results further confirm that both models are suitable for the prediction of rainfall and runoff trends with reasonable computing time.
Figure 6c, d shows the accuracy performance of the LSTM and WNN models calculated for 50 iteration epochs using training and validation datasets for the prediction of rainfall and runoff for the entire basin. In general, the model accuracies on the training and validation datasets increase after each iteration with fluctuations, which could be attributed to some randomness in the network. As the model trains during the first pass through the data, both training and validation accuracy increase, indicating that the model is learning the structure of the rainfall and runoff prediction data as well as the temporal correlations of the time-series. In the first and consecutive iterations, the validation accuracy did not increase significantly and was always higher than the training accuracy. This indicates that the network did not overfit the training data and accurately generalized to the unseen validation data. The optimal accuracy was obtained between 30 and 40 epochs, with LSTM being higher than WNN in predicting rainfall and runoff by approximately 10% and 5%, respectively. The prediction of runoff was consistently recorded with a higher R² accuracy as compared to rainfall using both WNN and LSTM (Fig. 6c, d).

Runoff prediction results using the LSTM and WNN models
Adopting the optimal four hidden-layer configuration, the runoff prediction results at the three discharge stations (1BC01, 1DA02 and 1EF01) using the LSTM model are presented in Fig. 7. Similarly, the runoff prediction results using WNN for the three discharge measurement stations are shown in Fig. 8. For both models, the input consisted of the mean monthly satellite-based meteorological datasets. A statistical comparison of the runoff prediction results from LSTM and WNN is presented in Table 3 in terms of R², MAE and RMSE. Except for station 1BC01, where WNN predicted the runoff with an R² of 0.7820, both models predicted the runoff at the three stations with R² greater than 0.80 using the five parameters as input (Fig. 9). This is evidenced in the 30-year prediction accuracy, with the MAE and RMSE less than 13 m³/s for both models. Overall, for the entire basin, it is observed that LSTM marginally outperformed the WNN model, with MAE = 11.1452 m³/s and RMSE = 12.1933 m³/s at the basin outlet 1EF01. In practical applications, it can be concluded that both models can be used in the prediction of runoff in data-scarce basins.
A comparison of the goodness-of-fit for the prediction of streamflow runoff with the two models, as presented in Fig. 9, shows that for the three stations the use of satellite data to predict streamflow is acceptable, as the R values were more than 85%. The accuracy of streamflow prediction is observed to increase with an increase in the number of prediction stations within the sub-basins.

Performance of individual meteorological factors in runoff prediction
To determine the significance and accuracy of the contributions of the satellite-based meteorological parameters in the prediction of runoff, each of the five parameters was used as an independent input with runoff as output. The comparative output results are presented in Fig. 10, representing the runoff predictions for the three discharge stations.
The results in Fig. 10 show that rainfall is the highest contributing indicator variable in runoff prediction, with R² > 0.8 for the three discharge stations. This confirms the fact that the amount of rainfall that remains after storage, infiltration, interception, evaporation and transpiration contributes to runoff. The least contributing meteorological factor is the relative humidity, with R² ranging between 0.6 and 0.65 for the three discharge stations using LSTM and WNN. The rest of the parameters (average temperature, wind speed and solar radiation) estimated the runoff in the three stations with R² ranging between 0.675 and 0.80, with temperature performing better than wind speed and solar radiation. In conclusion, as the input increases from rainfall to all the datasets (1-5) and the hidden layers increase from 1 to 4, the accuracy of runoff prediction is observed to increase for the model training, testing and validation by up to 10%. Further investigation of the basis of the individual predictions, by comparing the contribution of each feature to each prediction using a unified approach such as SHapley Additive exPlanations (SHAP) [57], is recommended.

Performance of individual meteorological parameters for rainfall prediction
In evaluating the significance of the satellite-based meteorological parameters in rainfall trend prediction, Table 4 presents the performance results of basin mean monthly rainfall prediction using the different meteorological parameters. Satellite-based precipitation is observed to be the most significant predictor in estimating the measured rainfall, with R² > 0.8 and the lowest MAE and RMSE error measures. This is attributed to the fact that for medium-sized and climatically homogeneous basins like the Nzoia Basin, the climate factors tend to be replicated throughout the catchment area with minimal variabilities. As such, the occurrence of rainfall at one station is generally an indication of rainfall also being recorded at a nearby station within the basin.
Temperature is the second best predictor for rainfall prediction, and the effect of temperature on rainfall arises from the fact that increased temperature leads to increased evaporation, an accelerated rate of the hydrological cycle and more precipitation, especially during the wet season. Humidity, wind speed and solar radiation are consecutively ranked as in Table 4 with nearly the same contribution effects on rainfall prediction, implying that they are highly correlated within the basin in terms of their contribution to rainfall prediction. This is also attributed to the size of the basin and the fact that the climate factors are nearly similar within the basin.

Rainfall prediction with combined satellite-based meteorological data
Results of the mean monthly predicted rainfall for the four stations using the five meteorological datasets are presented in Figs. 11, 12 and 13. The comparative performance of the two models implies that LSTM predicted the basin mean rainfall with a higher R² = 0.8610 for the ten stations, while WNN's prediction was at R² = 0.7825. The higher prediction accuracy at individual rainfall stations could be attributed to continuous and accurate gauge data.
In addition to the statistical performance evaluations, graphical comparisons of the observed and modelled runoff and rainfall with LSTM and WNN in Figs. 7, 8 and 11, 12, respectively, show that the LSTM results matched the observed data more closely in spatial position and trend than the WNN results. Further, the regression lines for runoff and rainfall in Figs. 9 and 13, respectively, show that LSTM modelled the parameters closer to the 45° line of fit than WNN. The slightly lower performance of WNN could be attributed to the feedforward ANN used in training the input signals. The study results on the prediction of rainfall and runoff trends show that LSTM and wavelet-based neural networks are able to overcome the timescale conversion problems in time-series data analysis for accurate forecasting [8, 9], as they are capable of capturing the quasi-periodic signals in long-term rainfall and runoff data, which are also characterized by cyclical fluctuations with inherent noise [10–12]. LSTM and WNN are considered superior to conventional ANN models since they appear to conserve the crucial information in the order of the input data sequence because of the deep learning process in the hidden layers [13, 21–23, 45].
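As a sketch of how a wavelet front end separates a quasi-periodic series into detail (D) and approximation (A) coefficients before they are passed to a feedforward network, the following implements a multilevel discrete wavelet transform. It assumes the Haar wavelet purely for simplicity; the mother wavelet and decomposition level used in the study are not assumed here, and the input series is a made-up example.

```python
from math import sqrt

SQRT2 = sqrt(2.0)


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    approx = [(signal[k] + signal[k + 1]) / SQRT2 for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / SQRT2 for k in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the details from finest to coarsest."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2 != 0:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding the series from coarse to fine."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current


# Hypothetical eight-month series (illustrative values only).
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(series, levels=2)
print("approximation:", approx)
print("details      :", details)
```

The approximation carries the slow trend and the details carry the faster fluctuations, so the network receives the series already split by timescale. The transform is lossless, which `haar_reconstruct` makes explicit.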
According to [24], WNN as a data-driven model is able to take into account the non-stationarity in rainfall and runoff time-series data, as it accounts for sequence dependency, periodicity and non-linearity in such data [24, 25, 39–41]. LSTM, on the other hand, can dynamically incorporate past learning experience due to internal recurrence, thus presenting a powerful internal state for accurate learning and prediction in data with long delays and mixed frequencies [23, 26, 42, 43].
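The internal recurrence referred to above can be illustrated with the standard LSTM cell equations. The sketch below is a single-unit cell with scalar input, which is far smaller than the 4-layer, 30-neuron network used in the study; the parameter names and the values in `DEMO_PARAMS` are hypothetical and untrained.

```python
from math import exp, tanh

PARAM_KEYS = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell.

    w* weights act on the input x, u* weights on the previous hidden state.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])  # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])  # output gate
    g = tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])     # candidate cell value
    c = f * c_prev + i * g   # cell state mixes retained memory with new input
    h = o * tanh(c)          # hidden state exposed to the next layer or step
    return h, c


def run_sequence(inputs, p):
    """Carry the hidden and cell states across a whole input sequence."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
    return h, c


# Hypothetical, untrained parameters (illustrative only).
DEMO_PARAMS = {key: 0.1 for key in PARAM_KEYS}
DEMO_PARAMS["bf"] = 1.0  # a positive forget bias favours keeping past memory

h_final, c_final = run_sequence([0.2, 0.5, 0.9, 0.4], DEMO_PARAMS)
print("final hidden state:", h_final)
print("final cell state  :", c_final)
```

The cell state `c` is updated additively, scaled by the forget gate, which is what lets information from earlier months persist over long delays instead of being overwritten at every step.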
The distribution of the measured and predicted rainfall from the results in Figs. 11 and 12 for the year 1999 was spatially interpolated using ordinary Kriging [33], and the results are presented in Fig. 14. The year 1999 was chosen because it had the most continuous measured precipitation at all the gauge stations within the basin, making it suitable for comparative analysis. Figure 14a presents the observed mean monthly precipitation in 1999, and the results from LSTM and WNN are presented in Fig. 14b, c, respectively. It is observed that LSTM has the ability to accurately infer the long-term patterns in the 30-year rainfall data in most parts of the basin. Compared with LSTM, WNN tended to overestimate the higher precipitation values and underestimate the lower ones. The results in Fig. 14 also illustrate that, despite the good statistical evaluation results, the spatial representation of the phenomenon gives a more insightful area-based comparison of the results.
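A minimal sketch of ordinary Kriging at a single target location is given below. It assumes an exponential semivariogram with zero nugget and arbitrary sill and length parameters; the variogram actually fitted in the study is not assumed. The station coordinates and rainfall values are hypothetical.

```python
from math import exp, hypot


def variogram(h, sill=1.0, length=20.0):
    """Exponential semivariogram with zero nugget (assumed model and parameters)."""
    return sill * (1.0 - exp(-h / length))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target):
    """Return the Kriging estimate at target and the station weights."""
    n = len(points)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between stations, bordered by the unbiasedness constraint.
    system = [[variogram(dist(points[r], points[c])) for c in range(n)] + [1.0]
              for r in range(n)]
    system.append([1.0] * n + [0.0])
    rhs = [variogram(dist(p, target)) for p in points] + [1.0]

    solution = solve(system, rhs)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights


# Hypothetical station coordinates (km) and mean monthly rainfall (mm).
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
rainfall_mm = [120.0, 150.0, 100.0, 180.0]

estimate, weights = ordinary_kriging(stations, rainfall_mm, (5.0, 5.0))
print("estimate at centre:", round(estimate, 2))
print("weights           :", [round(w, 3) for w in weights])
```

The constraint row forces the weights to sum to one, so the estimate is an unbiased weighted mean of the station values. With a zero nugget the interpolator also reproduces each station value exactly, so the mapped surface passes through the gauge data.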
In the prediction of rainfall and runoff in hydrologic basins with scarce data, the LSTM model performed marginally better than the wavelet-based neural network model. Both models displayed the capability to learn the inherent temporal dynamics in the time-series data, and also to capture the seasonality in the quasi-periodic rainfall and runoff data. The results show that, when effectively optimized, LSTM and WNN can resolve the non-stationarity and non-linearity problems associated with trend analysis of rainfall and runoff data. With the same optimal neural network topological structure of 4 hidden layers, each consisting of 30 neurons, both LSTM and WNN models predicted runoff with an average R² value of 0.8 for all 3 stations, except station 1BC01 using WNN. The RMSE and MAE from both models in runoff prediction were less than 13 m³/s for the 30-year study period. The evaluation of the significance of each meteorological parameter in the prediction of runoff showed rainfall as the most significant input parameter, followed by temperature, and solar radiation as the least contributing factor. The best results were obtained by including all the parameters in the prediction model. In the forecasting of rainfall, LSTM gave the best predictive results with R² = 0.8610 for the average monthly basin rainfall from the ten stations, with satellite-based precipitation being the best rainfall predictor. WNN estimated the mean basin rainfall with R² = 0.7825 using the five satellite datasets. At the sub-basin scale, it was observed that the performance of the models improved with an increase in the number of input parameters and the number of data stations.
The study results show that for catchments with scarce and low-quality hydrological and meteorological observations, the use of satellite data in optimized LSTM and wavelet neural network models can be relied upon for the prediction of rainfall and runoff trends. It is recommended that similar studies be carried out with the inclusion of basin physical characteristics such as elevation, slope and flow accumulation as training inputs, to determine the significance of the physical watershed characteristics in rainfall and runoff predictions.

Fig. 1 Location of Nzoia River basin within the Lake Victoria basin and the rainfall and discharge measurement stations. The main pour point is station 1EF01 and sub-basin discharge stations are 1BC01 and 1DA02

Fig. 2 Mean monthly satellite-based: a-d temperature, humidity, wind speed and solar radiation; e ground-observed and satellite-measured rainfall, and f streamflow for stations 1EF01, 1DA02 and 1BC01

Fig. 3 Processing flow for the prediction of rainfall and runoff using LSTM and WNN

Fig. 4 RNN layer structure for LSTM implementation memory cell at the time step t

where $\Delta\omega_{n,k}^{(i+1)}$, $\Delta a_k^{(i+1)}$ and $\Delta b_k^{(i+1)}$ are calculated from the prediction error of the network, with $\eta$ as the network learning rate. The training of the network comprises the following steps [48]:

1. Data pre-processing: normalized data division into training (70%), testing (15%) and validation (15%) datasets. The validation is part of training the model and updating the parameters. It utilizes part of the datasets to validate and update the model parameters after each training epoch.
2. Network initialization: random initialization of the weights, the translation and scale factors, and the learning rate {$\omega_{ij}$ and $\omega_{jk}$, $b_k$, $a_k$, $\eta$}.
3. Network training: training, prediction and estimation of the prediction error e between the output and the expected value.
4. Weights updating: parameter and network weights update depending on the magnitude of e.
5. Network testing: use the test dataset for network reliability testing, else iterate to Step 3.
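The data pre-processing in Step 1 can be sketched as follows. This assumes min-max scaling to [0, 1] and a chronological (unshuffled) split in the order training, testing, validation; neither the scaling method nor the ordering is confirmed by the text. The monthly series is a synthetic placeholder with 360 values, matching 30 years of monthly data.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1] (assumed normalization method)."""
    low, high = min(series), max(series)
    return [(value - low) / (high - low) for value in series]


def split_70_15_15(series):
    """Split a series into 70% training, 15% testing and 15% validation.

    Integer arithmetic avoids floating-point rounding of the split sizes.
    The chronological ordering of the three blocks is an assumption.
    """
    n = len(series)
    n_train = n * 70 // 100
    n_test = n * 15 // 100
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    validation = series[n_train + n_test:]
    return train, test, validation


# Synthetic placeholder: 30 years x 12 months of a repeating seasonal pattern.
monthly_series = [float(month % 12) for month in range(360)]

normalized = min_max_normalize(monthly_series)
train, test, validation = split_70_15_15(normalized)
print(len(train), len(test), len(validation))
```

A common design choice, not stated in the text, is to compute the scaling bounds from the training block only, so that no information from the testing or validation periods leaks into training.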

Fig. 5 a Wavelet-based feedforward multilayer perceptron (MLP) neural network layer structure. $x_n$ is the input vector data, which comprises the wavelet details (D) and approximation (A) as: $x_1 = D_{i,N-j}, \ldots, x_n = A_{I,N-J}$. b Model development for wavelet coupled artificial

Fig. 6 a, b RMSE vs epoch for model learning with LSTM and WNN models in predicting the mean monthly rainfall and runoff.c, d Accuracy performance for the prediction of rainfall and runoff using LSTM and WNN models

Fig. 9 Comparative regression models for the prediction of runoff with LSTM and WNN at the discharge stations 1BC01, 1DA02 and 1EF01

Fig. 11 Observed and LSTM predicted rainfall for four gauge stations within the basin

Fig. 13 Regressions between the observed and predicted rainfall with WNN for four stations within the basin

Fig. 14 Spatial distribution of a observed mean monthly rainfall in 1999, and the corresponding predicted rainfall from the b LSTM and c WNN models

Table 1 Descriptive statistics of the mean monthly observed and satellite-based meteorological data and streamflow. SD standard deviation, CV coefficient of variation, SE standard error

Table 3 Performance