Introduction

With increasing demands on freshwater, the depletion of this valuable resource is becoming a growing concern, especially in urban areas where it is already widespread (Zubaidi et al. 2018a, b). Consequently, the water distribution network constitutes an essential component of critical infrastructure crucial for society's functioning (Mekonnen 2022).

The urban water infrastructure is facing significant strain due to extreme weather events associated with global warming. These phenomena, including floods, droughts, and storms, are increasingly placing a heavy burden on existing water systems. Moreover, there is still a lot of uncertainty regarding how water systems' capacity would be affected by a sudden surge in water demand due to acute weather events on a seasonal or yearly basis (Mohd Azlan et al. 2021). Hence, predicting the demand for water is essential for efficient design, operation, and management of urban water supply infrastructure. Policymakers in the water industry also consider water demand prediction to be a critical topic. Different methods for estimating water demand are explored by Panagopoulos et al. (2012).

By accurately forecasting water demand, stakeholders can ensure that water supply is adequate and effectively meets the needs of urban areas. Furthermore, such forecasting can help policymakers anticipate and plan for changes in water demand that may arise due to factors such as population growth, climate change, and other environmental impacts (Arbués et al. 2003; Baroudy et al. 2015).

Accurate short-term prediction of municipal water demand can be critical in the water industry. For instance, it can address the issue of uncertainty by proactively optimizing the operation of water pumps, leading to improved water quality for customers and reduced power consumption (Zubaidi et al. 2018a, b). Nevertheless, obtaining the desired level of accuracy in predicting trends remains an extremely challenging task (Malik et al. 2020). Many researchers have explored the correlation between water consumption and meteorological variables using traditional models (Arbues and Villanua 2006; Ashoori et al. 2016). However, one disadvantage of statistical models is that they require a predefined structure, making it challenging to identify a specific mathematical function that performs well across diverse datasets. In addition, statistical models can struggle to handle intricate data relationships, and their predictive accuracy tends to decline as the size of the dataset grows.

Several recent studies have analyzed and compared traditional and machine learning (ML) models for predicting water demand. The results show that ML techniques are more effective in predicting water demand than traditional models (Shuang and Zhao 2021). One of the most popular ML techniques is the artificial neural network (ANN), which has proven highly effective in a wide range of applications (Cemek et al. 2023; El-Shafie et al. 2012; Yafouz et al. 2021a). However, standalone ML models are generally limited in their ability to learn complex patterns and relationships in data, especially when the data are high-dimensional and nonlinear (Ibrahim et al. 2022; Rezaie-Balf et al. 2020; Wee et al. 2022). This can result in lower accuracy and reduced performance compared with more advanced machine learning models (Ehteram et al. 2020, 2018). To overcome these limitations, optimization techniques have been introduced (Pham et al. 2020; Yafouz et al. 2021b).

Meta-heuristic algorithms are a class of algorithms used to solve optimization problems. Their advantages include the ability to solve complex problems, to find globally optimal solutions, and to handle a variety of constraints. Many studies have integrated different optimization algorithms with machine learning models. For instance, Zubaidi et al. (2022) integrated an ANN with different nature-inspired algorithms to predict water demand in Iraq. Another study by Zubaidi et al. (2018a, b) found that when particle swarm optimization (PSO) is used to optimize an ANN model, the model exhibits a high level of accuracy in simulating municipal water demand. A similar finding is reported by Zubaidi et al. (2020), where a hybrid model outperformed a standalone ANN model in forecasting urban water demand.

However, these algorithms also have several limitations. For example, meta-heuristic algorithms are often based on stochastic processes and do not guarantee that they will find the global optimal solution in a finite amount of time (Ibrahim et al. 2022). They may converge to a local optimum instead of the global optimum, or they may get stuck in a suboptimal solution. Some meta-heuristic algorithms can be computationally expensive and may not scale well to large problems, especially when the number of variables or dimensions is large. Many meta-heuristic algorithms also have a number of parameters that need to be set before they can be used, and these parameters can significantly affect the performance of the algorithm. Improperly setting these parameters can result in poor performance or even failure of the algorithm (Lai et al. 2022).

Adaptive guided differential evolution (AGDE) is an optimization algorithm based on differential evolution (DE) that solves nonlinear, multimodal, and difficult optimization problems. AGDE has several advantages over other meta-heuristic algorithms, including improved convergence, an adaptive strategy that guides the search toward the global optimum, simplicity of implementation, and adaptability to real-time problem characteristics (Mohamed and Mohamed 2017). AGDE is computationally efficient and easily parallelizable, which makes it suitable for large-scale, real-world optimization problems with limited computational resources. AGDE has shown good performance in complex optimization problems with many variables and constraints (Biswas and Sharma 2023; Liao et al. 2023). As far as the authors are aware, no research has been conducted to evaluate the efficacy of integrating the AGDE technique with an artificial neural network (ANN) model for water demand prediction.

Choosing the right learning algorithm for a neural network is crucial since it can affect various aspects of the network's performance, including convergence, accuracy, and generalization ability (Essam et al. 2022a, b; Essam et al. 2021). Different algorithms may have varying performance levels, complexity, and scalability. The ideal algorithm can decrease training time, computational resources, and overfitting risk while improving transparency and accountability. Selecting an algorithm that suits the complexity of the problem can minimize resource usage and reduce training time. Scalable learning algorithms are preferred for handling large-scale problems. Some algorithms are more interpretable than others, which can help in understanding the network's behavior and validating the results (Essam et al. 2022a, b; Sami et al. 2021). Therefore, this study tests a wide range of learning algorithms to determine which one performs best for predicting water demand.

Building a prediction model involves two crucial stages: explanatory variable selection and model development. Explanatory variables, also referred to as model features, are chosen based on their ability to provide insights into water demand patterns. During model development, an algorithm is employed to establish a relationship between the selected features and the prediction target, which, in this case, is water demand. This process is critical in accurately forecasting water demand, as it enables analysts to identify the key drivers of water usage and create a more precise model that captures the complex relationship between these variables and the prediction target. A multitude of factors, including the economy, society, climate, and environment, have been shown in numerous studies to impact urban water demand (Shuang and Zhao 2021). Lu et al. (2020) highlighted that water demand has a strong correlation with monthly average temperature, population, and monthly average humidity. Furthermore, many studies have found water demand to be associated with climatic factors, including precipitation (Chen et al. 2020; Tiwari and Adamowski 2015; Zubaidi et al. 2022). These variables can significantly influence patterns of water consumption and usage, highlighting the importance of understanding their interplay to forecast demand more accurately. By recognizing the complex and interconnected nature of these factors, water managers and policymakers can more effectively develop strategies to manage water supply and meet the needs of urban communities.

The objectives of this study are:

  • To integrate the ANN model with AGDE to select the optimum hyperparameter values.

  • To train the model using eleven of the most widely used learning algorithms.

  • To identify which algorithm works best for the specific problem at hand, and optimize the model accordingly. This step is critical in ensuring that the developed model is robust, accurate, and generalizes well to new, unseen data. To reduce the effect of randomness, the developed model will be run multiple times.

  • To compare the developed models with other models previously developed and reported in the literature.

The ultimate objective is to provide decision-makers with a scientific perspective on how climatic factors impact water demand, in order to promote sustainability in a country that is grappling with a distinctive context of climate change and water scarcity.

Methodology

Dataset and pre-processing

This study used the dataset previously employed by Zubaidi et al. (2022), which covers ten years from 2006 to 2015. The data include the monthly urban water consumption time series (megalitre, ML) for South East Water, which is a retail potable water utility in Melbourne City, Australia. This region has faced numerous droughts in the past, and based on climate models, it is projected that the area will encounter an even drier climate in the future. As a result, this area is likely to experience heightened water stress and greater challenges related to water security. The obtained dataset contains minimum temperature (Tmin) (°C), maximum temperature (Tmax) (°C), mean temperature (Tmean) (°C), evaporation (Eva) (mm), rainfall (Rain) (mm), solar radiation (Srad) (MJ/m2), maximum relative humidity (RHmax) (%), vapor pressure (VP) (hPa), and potential evapotranspiration (FAO56) (mm).

The data preparation followed the technique of Zubaidi et al. (2022). The data were normalized using the natural logarithm and then denoised using the discrete wavelet transform (DWT). Principal component analysis (PCA) was then used to determine which predictors were most reliable. Accordingly, four weather factors (i.e., Tmax, Rain, Eva, and RHmax) were used to simulate urban water consumption. More details can be found in Zubaidi et al. (2022).

Adaptive guided differential evolution algorithm (AGDE)

The AGDE is a novel algorithm developed to overcome the drawbacks of differential evolution (DE) by introducing a new mutation rule that improves the convergence and accuracy of the basic DE. The basic DE contains four main parts: initialization, mutation, crossover, and selection. In the initialization step, the optimization process starts by establishing an initial population of individuals, each generated within given lower and upper bounds (Mohamed and Mohamed 2017).

$${x}_{j,i}^{0}={x}_{j,L}+\mathrm{rand}\left(0,1\right)\cdot \left({x}_{j,U}-{x}_{j,L}\right)$$
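The initialization rule above can be illustrated with a minimal Python sketch. The function name `init_population` and the bounds for the three hyperparameters (Lr, N1, N2) are hypothetical values chosen for illustration; the source does not report the search ranges.

```python
import random


def init_population(pop_size, lower, upper, rng=None):
    """Draw each component j of each individual i uniformly in [x_jL, x_jU)."""
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]


# Hypothetical search bounds for (Lr, N1, N2); not taken from the study.
lower = [0.001, 1.0, 1.0]
upper = [0.5, 20.0, 20.0]
population = init_population(50, lower, upper, random.Random(0))
```

In such a setup, the neuron counts N1 and N2 would need to be rounded to integers before an ANN is built, which is a design choice left outside this sketch.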

After establishing the initial population, the next generation is produced according to the following equation:

$${v}_{i}^{G+1}={x}_{r}^{G}+F\cdot \left({x}_{p\_\mathrm{best}}^{G}-{x}_{p\_\mathrm{worst}}^{G}\right)$$

where G refers to the generation index and \({x}_{r}^{G}\) refers to a vector selected randomly from the middle of the population (i.e., apart from the top and bottom individuals). The \({x}_{p\_\mathrm{best}}^{G}\) and \({x}_{p\_\mathrm{worst}}^{G}\) are randomly selected from the top and bottom 100p% individuals, respectively. F refers to a random factor selected from the range 0.1–1 to control the mutation process.
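The mutation rule can be sketched in Python as follows. This is an illustrative reading of the equation rather than the authors' implementation: the function name `agde_mutation`, the value p = 0.1, the assumption of a minimization problem, and drawing a fresh F for every mutant are all assumptions made for the example.

```python
import random


def agde_mutation(population, fitness, p=0.1, rng=None):
    """Build one mutant per individual: v = x_r + F * (x_pbest - x_pworst)."""
    rng = rng or random.Random()
    n = len(population)
    # Rank indices from best to worst (assumes lower fitness is better).
    order = sorted(range(n), key=lambda i: fitness[i])
    k = max(1, int(round(p * n)))
    top, bottom, middle = order[:k], order[n - k:], order[k:n - k]
    if not middle:
        raise ValueError("population too small for the chosen p")
    mutants = []
    for _ in range(n):
        x_best = population[rng.choice(top)]      # one of the top 100p%
        x_worst = population[rng.choice(bottom)]  # one of the bottom 100p%
        x_r = population[rng.choice(middle)]      # one of the remaining individuals
        f = rng.uniform(0.1, 1.0)                 # mutation factor F in [0.1, 1]
        mutants.append([r + f * (b - w) for r, b, w in zip(x_r, x_best, x_worst)])
    return mutants
```

Because the difference vector always points from a poor individual toward a good one, the mutant is pushed in a direction that has already proved beneficial, which is the guiding idea behind this rule.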

For the present adaptive DE, a new adaptation process is used to update the crossover rate (CR), which controls the diversity of the population. At each generation, the CR is drawn from a uniform distribution over one of two ranges (i.e., [0.05, 0.15] and [0.9, 1]), as recommended by Wang et al. (2011) and Ronkkonen et al. (2005). In AGDE, the CR is adapted at each generation (G) using these two sets, based on the experience gained from the new solutions generated during the optimization process, as follows:

$$\begin{array}{*{20}l} {\quad \quad \quad \quad {\text{If}}\;\;G = {1}} \hfill \\ {{\text{CR}}_{i}^{1} = \left\{ {\begin{array}{*{20}l} {{\text{CR}}_{1} ,} \hfill & { \quad {\text{if}}\;\; u\left( {0,1} \right) \le 1/2} \hfill \\ {{\text{CR}}_{2} ,} \hfill & {\quad {\text{otherwise}}} \hfill \\ \end{array} ,} \right.} \hfill \\ {\quad \quad \quad \quad {\text{Else}}} \hfill \\ {{\text{CR}}_{i}^{G} = \left\{ {\begin{array}{*{20}l} {{\text{CR}}_{1} ,} \hfill & { \quad {\text{if}}\;\; u\left( {0,1} \right) \le P_{1} } \hfill \\ {{\text{CR}}_{2} ,} \hfill & { \quad {\text{if}}\;\; P_{1} < u\left( {0,1} \right) \le P_{1} + P_{2} } \hfill \\ \end{array} ,} \right.} \hfill \\ {\quad \quad \quad \quad {\text{End}}} \hfill \\ \end{array}$$

Denote \({P}_{j},j=1,2,\dots , m\) as the probability of selecting the jth set, where \(m\) is the total number of sets, which is set to 2, and the sum of \({P}_{j}\) is one. \({P}_{j}\) is initialized as 1/\(m\). The set used for each target vector is chosen according to \({P}_{j}\) using the roulette wheel selection technique. \({P}_{j}\) is updated continuously at each generation, as follows:

$$P_{j}^{G + 1} = \left( {\left( {G - 1} \right) \times P_{j}^{G - 1} + Ps_{j}^{G} } \right)/G,$$
$$Ps_{j}^{G} = \frac{{s_{j}^{G} }}{{\mathop \sum \nolimits_{j = 1}^{m} s_{j}^{G} }},$$

where

$$s_{j}^{G} = \frac{{ns_{j}^{G} }}{{\mathop \sum \nolimits_{G = 1}^{G} ns_{j}^{G} + \mathop \sum \nolimits_{G = 1}^{G} nf_{j}^{G} }} + \varepsilon ,$$

where \(G\) is the generation number, and \({ns}_{j}^{G}\) and \({nf}_{j}^{G}\) are the numbers of offspring vectors generated using the jth set that survive or fail, respectively, in the selection operation over the previous G generations. \({s}_{j}^{G}\) is the success ratio of the trial vectors, \({Ps}_{j}^{G}\) refers to the probability of choosing the jth set at the current generation, and \(\varepsilon\) is a small constant, set to 0.01, that prevents null success ratios. These crossover strategies have two benefits: first, they increase the exploration ability in the initial stage; then, the diversity of the population is decreased in order to boost the convergence speed.
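The crossover-rate adaptation can be sketched in Python as follows. The function names and argument layout are hypothetical. The sketch takes the numerator of the success ratio as the current-generation success count and the denominator as cumulative counts, which is one possible reading of the equations, and it passes the stored probabilities in as the previous-generation term.

```python
import random

CR_SETS = [(0.05, 0.15), (0.9, 1.0)]  # the two CR ranges
EPS = 0.01                            # keeps success ratios away from zero


def sample_cr(probs, rng):
    """Roulette-wheel choice between the two sets, then a uniform CR draw."""
    j = 0 if rng.random() <= probs[0] else 1
    lo, hi = CR_SETS[j]
    return j, rng.uniform(lo, hi)


def update_probs(prev_probs, generation, ns_total, nf_total, ns_current):
    """Return updated selection probabilities for the CR sets.

    ns_total[j], nf_total[j]: cumulative successes and failures of set j.
    ns_current[j]: successes of set j in the current generation.
    """
    s = []
    for j in range(len(prev_probs)):
        trials = ns_total[j] + nf_total[j]
        ratio = ns_current[j] / trials if trials else 0.0
        s.append(ratio + EPS)
    total = sum(s)
    ps = [sj / total for sj in s]  # normalized success ratios Ps_j
    return [((generation - 1) * p + q) / generation
            for p, q in zip(prev_probs, ps)]
```

Since both the previous probabilities and the normalized success ratios sum to one, the running average keeps the updated probabilities summing to one as well.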

The last part is selection, which follows a greedy strategy. The best fitness value is updated along with the location of its solution, as follows:

$$x_{i}^{G + 1} = \left\{ {\begin{array}{*{20}c} {u_{i}^{G} ,} & {f\left( {u_{i}^{G} } \right) \le f\left( {x_{i}^{G} } \right) } \\ {x_{i}^{G} ,} & {{\text{otherwise}}} \\ \end{array} } \right.$$

These steps are repeated until the predefined stopping criteria are met or the best fitness value is determined.
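The greedy selection rule can be written as a short Python sketch. The function name `greedy_selection` and the `objective` callable are hypothetical; in this study the objective would be an error measure of the ANN, whereas the usage lines below use a simple sphere function as a stand-in.

```python
def greedy_selection(population, fitness, trials, objective):
    """Keep trial u_i when f(u_i) <= f(x_i); otherwise keep the parent x_i."""
    new_population, new_fitness = [], []
    for x, fx, u in zip(population, fitness, trials):
        fu = objective(u)
        if fu <= fx:
            new_population.append(u)
            new_fitness.append(fu)
        else:
            new_population.append(x)
            new_fitness.append(fx)
    return new_population, new_fitness


# Stand-in objective for illustration only.
def sphere(v):
    return sum(c * c for c in v)


parents = [[1.0], [2.0]]
parent_fitness = [sphere(v) for v in parents]
next_pop, next_fit = greedy_selection(parents, parent_fitness, [[0.5], [3.0]], sphere)
```

Here the first trial replaces its parent because it has a lower objective value, while the second parent survives.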

Artificial Neural Network (ANN)

In the field of hydrology, artificial neural networks (ANNs) and, more specifically, feedforward back-propagation (FFBP) learning are currently the most widely used machine learning (ML) techniques. The FFBP was utilized to map the nonlinear behavior of water data to generate accurate simulations of urban water requirements across multiple spatial and temporal scales (Shirkoohi et al. 2021; Zounemat-Kermani et al. 2020).

ANNs with two hidden layers have been successfully implemented by a variety of researchers across a wide range of applications, such as Thomas et al. (2017) and Ghorbani et al. (2017); their results confirmed that these networks adequately modeled the nonlinear relationship between independent variables and outcomes. Accordingly, the neurons in the ANN were organized into four distinct layers: an input layer for the independent variables (i.e., climate variables), two hidden layers for processing the data, and an output layer for the final prediction (i.e., urban water consumption).

The tansigmoidal activation function is utilized in both the 1st and 2nd hidden layers, while the linear activation function is utilized in the final output layer. Following previous research (Zubaidi et al. 2023; Zubaidi et al. 2020), the dataset was split into three parts: a training set (consisting of 70% of the data), a testing set (15%), and a validation set (15%).
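The architecture and data split described above can be sketched in plain Python. The tan-sigmoid function is mathematically equal to the hyperbolic tangent, so `math.tanh` is used. The layer sizes, the weight initialization range, and the chronological (unshuffled) split are assumptions made for illustration; the source does not state them.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Two tanh hidden layers followed by a single linear output neuron."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = dense(x, w1, b1, math.tanh)
    h2 = dense(h1, w2, b2, math.tanh)
    return dense(h2, w3, b3, lambda z: z)[0]


def split_indices(n, train=0.70, test=0.15):
    """Index ranges for a 70/15/15 training/testing/validation split."""
    n_train = int(round(n * train))
    n_test = int(round(n * test))
    return (range(0, n_train),
            range(n_train, n_train + n_test),
            range(n_train + n_test, n))
```

For a ten-year monthly series (120 records), this split yields 84 training, 18 testing, and 18 validation records.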

In this study, the ANN model was integrated with the AGDE algorithm, which selects the best values for the ANN’s hyper-parameters (i.e., the learning rate coefficient (Lr) and the number of neurons in the 1st and 2nd hidden layers (N1 and N2)).

Since there is no single performance criterion, identifying the proper criteria for a particular application is vital (Abed et al. 2023). It is also common to employ several performance criteria because each criterion has advantages and disadvantages (Seo et al. 2018). This research used several tests to estimate the prediction models’ performance. Four types of performance measures are employed in this analysis: absolute, relative, and dimensionless errors, and graphical tests.

ANN training consists of determining the ANN coefficients (weights and biases between layers). Accordingly, eleven training algorithms are applied to understand how they regulate the training process to offer an accurate solution (Ahmad and Chen 2020; Khalid and Javaid 2020). These algorithms are Bayesian regularization (BR), conjugate gradient with Powell/Beale restarts (CGB), Fletcher-Powell conjugate gradient (CGF), Polak-Ribière conjugate gradient (CGP), gradient descent with adaptive learning rate (GDA), gradient descent method (GDM), variable learning rate backpropagation (GDX), Levenberg–Marquardt (LM), one-step secant (OSS), resilient backpropagation (RP), and scaled conjugate gradient (SCG).

Results

This section presents the results of the hybrid ANN models optimized with AGDE and trained with eleven different learning algorithms. Figure 1a shows the training progress of the RP algorithm, assessed using RMSE. The legend 50-1 denotes the swarm number (50) of the AGDE optimization and the number of times the model is trained to achieve better results. As presented in Fig. 1, the third model has the lowest RMSE. A similar procedure was applied for all training algorithms. As a result, all the proposed model architectures successfully completed the training procedure and achieved the performance goal. After training, the best model architecture for each training algorithm was chosen based on the minimum RMSE attained, taking into account the swarm number of the AGDE optimizer and the number of hidden layers.

Fig. 1 Fitness function versus iteration: (a) BR algorithm and (b) CGB algorithm

The performance of the models is assessed using mean absolute error (MAE), mean absolute relative error (MARE), root-mean-square error (RMSE), maximum error, and the coefficient of determination (R2). Tables 1 and 2 present the testing and validation performance of the ANN models corresponding to the different learning algorithms. The ANN models were trained several times to find the configuration (nodes in the hidden layers) that provides optimum results, and Tables 1 and 2 contain the configurations that achieved these results. The first hidden layer is denoted as H1, and the second hidden layer is denoted as H2. All the learning algorithms provided comparable accuracy; hence, a Taylor diagram was used to find the learning algorithm that provided the best accuracy for the prediction of water demand.
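The performance measures listed above can be computed as in the following Python sketch. The function name `performance_metrics` is hypothetical, and R2 is taken here as one minus the ratio of residual to total sum of squares, which is an assumption because the source does not give the formula. MARE also assumes that no observed value is zero.

```python
import math


def performance_metrics(observed, predicted):
    """Return MAE, MARE, RMSE, maximum absolute error, and R2 as a dict."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MAE": sum(abs_errors) / n,
        "MARE": sum(a / abs(o) for a, o in zip(abs_errors, observed)) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MaxError": max(abs_errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy values for illustration only; not data from the study.
metrics = performance_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Reporting absolute, relative, and dimensionless measures together helps because each one emphasizes a different aspect of model error.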

Table 1 Testing performance of ANN models
Table 2 Validation performance of ANN models

Figure 2 shows the Taylor diagram containing all models with the 11 learning algorithms. Models are labeled from B to L in the order presented in Tables 1 and 2, and the letter A denotes the actual data point. As presented in Fig. 2, the learning algorithms E, I, K and G (i.e., CGP, LM, RP and GDM, respectively) are approximately at the same distance from point A, which means these models have the potential to provide more accurate results than the other learning algorithms investigated in this study. Comparing the values of the performance parameters for these four models from Tables 1 and 2, it was inferred that the RP algorithm provides better results. Among these four models, RP has the lowest maximum error in the validation (0.0668) and testing (0.0876) datasets and better validation R2 (0.9579) and testing R2 (0.9554) values.

Fig. 2 Model comparison using Taylor diagram

Discussion

Comparison of the models based on their performance parameters, as mentioned above, indicates that RP provides better accuracy. In this section, the models’ prediction capability is analyzed and discussed. To substantiate the findings from the above analysis, the predicted versus the observed values of the water demand are shown in Fig. 3. The major observation from Fig. 3 is that all the models provided predictions of the water demand lower than the actual values; in other words, all the models underestimate the actual water demand. However, a remarkable positive point about the achieved results is that the models could mimic the trend of the water demand pattern.

Fig. 3 Actual versus predicted water demand pattern from the proposed model

The comparison of the current study's results with the literature indicates that applying the AGDE optimization method improved the prediction accuracy in comparison to standalone ANN models. AGDE optimized the ANN coefficients (its weights and biases), which ultimately helped the ANN models achieve greater accuracy. For this study, the swarm number in the AGDE was fixed at 50; however, future research should investigate the effect of different swarm numbers on the accuracy of the ANN model. To evaluate the predictive performance of the proposed models in estimating water demand, we calculated the relative error percentage, as shown in Fig. 4.

Fig. 4 Relative error between the actual and predicted water demand pattern from the proposed models

Investigating eleven different learning algorithms provided insight into which learning algorithm may provide better accuracy. According to the results, for this water demand dataset, the RP algorithm provided better accuracy. However, since different datasets require different ANN configurations to achieve better accuracy, other learning algorithms may perform better on other datasets.

Conclusion

Water demand is one of the most fluctuating variables, making it hard to infer. Predicting short-term water demand is crucial for decision-makers to align it with freshwater availability and manage it in extreme situations such as floods, droughts, and storms. In this study, a hybrid model is developed for the prediction of monthly water demand based on the monthly urban water consumption time series of Melbourne, Australia. The dataset consisted of minimum, maximum, and mean temperature (°C), evaporation (mm), rainfall (mm), solar radiation (MJ/m2), maximum relative humidity (%), vapor pressure (hPa), and potential evapotranspiration (mm). Before the data were fed to the model, they were normalized using the natural logarithm and then denoised using the discrete wavelet transform, followed by principal component analysis to determine which predictors were most reliable. Hybrid model development included the optimization of the weights and biases of the ANN model using the adaptive guided differential evolution algorithm with a swarm size of 50. After optimization, the ANN model was trained using eleven different learning algorithms: Bayesian regularization, conjugate gradient with Powell/Beale restarts, Fletcher-Powell conjugate gradient, Polak-Ribière conjugate gradient, gradient descent with adaptive learning rate, gradient descent method, variable learning rate backpropagation, Levenberg–Marquardt, one-step secant, resilient backpropagation, and scaled conjugate gradient. According to the results, for this monthly water demand dataset, resilient backpropagation provided better accuracy, with validation and testing R2 values of 0.9579 and 0.9554, respectively.