To highlight the need for a neural network, we first present the results obtained with MLR.
Multiple regression
Note that the data used in this section are those of the test-phase subset. The MLR equation, based on the regression coefficients obtained, is:
$$ {\text{ET}}_{0} = 0.17{\text{T}} - 0.06{\text{Rh}} + 1.31{\text{Ws}} + 0.26{\text{I}} - 0.05 $$
(10)
where ET0 is reference evapotranspiration (mm/day), T is average daily temperature (°C), Rh is relative humidity (%), Ws is average daily wind speed (m/s), and I is sunshine duration (h/day).
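As a quick illustration, Eq. 10 can be evaluated directly. The sketch below is generic, and the input values are hypothetical, not taken from the study's dataset:

```python
def et0_mlr(T, Rh, Ws, I):
    """Reference evapotranspiration (mm/day) from the MLR model (Eq. 10)."""
    return 0.17 * T - 0.06 * Rh + 1.31 * Ws + 0.26 * I - 0.05

# Hypothetical day: 25 degC, 50% humidity, 2 m/s wind, 9 h of sunshine
print(round(et0_mlr(T=25.0, Rh=50.0, Ws=2.0, I=9.0), 2))  # 6.16
```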
Statistical analysis of the data shows a close relationship between the observed and simulated series; the determination coefficient R² reached 0.97. Overall, all the parameters used in the models contributed significantly to estimating ET0. At the 0.05 significance level, the marginal contribution of each variable is significant, and the observed F statistic (1,445.14) exceeds the critical F value (3.9).
The t-statistic of relative humidity is −11.40, reflecting an inverse relationship with evapotranspiration and crop water requirements, whereas the effects of air temperature, wind speed and sunshine hours were found to be positive.
Meteorological factors naturally act in concert, so it is pertinent to take into account their combined influence on evapotranspiration. As for the significance of individual meteorological parameters, this study revealed that the highest correlation coefficient was obtained between evapotranspiration and air temperature, followed by wind speed and relative humidity.
Figure 6 indicates that the observed and simulated series follow almost the same course, crossing each other several times. Nevertheless, the two series diverge occasionally, especially at the peaks of small values, so the observed values exceed the simulated values in some cases and vice versa in others. The comparison of the ET0 predicted by MLR with the observed values shows good agreement, with R² = 0.97 (Fig. 7).
The efficiency E, R², RMSE, and MSE statistics of this model for the test-phase dataset are given in Table 2. All these performance criteria are very satisfactory, which underlines the factors influencing ET0 (since the model considered all the variables) and indicates a very strong relationship between the two series.
Table 2 Performance criteria obtained by multiple linear regression (MLR) and a network with a single hidden layer containing four neurons. R² determination coefficient, E Nash-Sutcliffe efficiency, MSE mean squared error, RMSE root mean squared error
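For reference, the performance criteria reported in the tables can be computed from their standard definitions. The sketch below is generic (not code from the study):

```python
import math

def performance_criteria(obs, sim):
    """Compute R^2, Nash-Sutcliffe E, MSE and RMSE for paired series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Squared Pearson correlation -> determination coefficient R^2
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_o = sum((o - mean_obs) ** 2 for o in obs)
    var_s = sum((s - mean_sim) ** 2 for s in sim)
    r2 = cov ** 2 / (var_o * var_s)
    # Nash-Sutcliffe efficiency: 1 - (sum of squared errors / variance of obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    e = 1.0 - sse / var_o
    mse = sse / n
    return {"R2": r2, "E": e, "MSE": mse, "RMSE": math.sqrt(mse)}
```

A perfect simulation gives R² = E = 1 and MSE = RMSE = 0, which is why the text speaks of maximizing the first two while minimizing the last two.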
Neural networks
Using a simple neural network architecture, we obtained very satisfactory results. Indeed, when we compared the performance criteria of each modeling phase with those of MLR, we found that those of a single-hidden-layer architecture with four neurons were better. All the statistical parameters used showed that the ANN model outperforms the MLR model (Table 2).
In this context, Tabari et al. (2010) have also noted from comparisons of model performances that ANN was more suitable than MLR. Also, Izadifar (2010) found that, using a single hidden layer and five neurons, the MLR model is better than the ANN model.
The results presented above are very satisfactory, and we could stop at this simple architecture. In this context, Tabari et al. (2009) noted that, among several tested architectures, a single hidden layer with five neurons was the best. We can therefore say that an ANN with only one hidden layer is enough to represent the nonlinear relationship between the climatic elements and the corresponding ET0. However, an advantage of the neural method is the possibility of improving the performance criteria by modifying the network architecture. Koleyni (2010) observed that the performance of a neural network is very often related to its architecture; for lack of theory, this performance is usually determined through experiments. The choice of network capacity fundamentally reflects its ability to learn and generalize: if the network is too small, it will be unable to fit the desired function, while if it is too complex, it will be unable to generalize.
Throughout the various architectures tested, we sought to (1) maximize the determination coefficient R², and (2) bring the Nash criterion as close as possible to 1. In practice, we applied a trial-and-error technique, increasing the number of neurons in the first hidden layer until improvement ceased and then adding another hidden layer.
The improvement in model performance from adding neurons to the single hidden layer stopped at 13 neurons; beyond that, performance decreased. The MSE values obtained in the test phase were 0.029, 0.019 and 0.027 (mm/day)² with 12, 13 and 14 neurons, respectively. Even the best single-hidden-layer value remained larger than the 0.0047 (mm/day)² obtained by the network architecture chosen in this study.
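The trial-and-error procedure for a single hidden layer can be sketched as a simple search loop. Here `train_and_score` is a hypothetical stand-in for training an ANN with the given number of hidden neurons and returning its test-phase MSE:

```python
def search_single_layer(train_and_score, max_neurons=20):
    """Grow the hidden layer one neuron at a time, keeping the best test MSE."""
    best_n, best_mse = None, float("inf")
    for n in range(1, max_neurons + 1):
        mse = train_and_score(n)
        if mse < best_mse:
            best_n, best_mse = n, mse
    return best_n, best_mse
```

With a score function whose minimum falls at 13 neurons, as observed in the study, the loop returns 13 and its associated MSE; the same idea extends to a second hidden layer once single-layer improvement stalls.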
We found that with one hidden layer, R² values fluctuated, whereas with two hidden layers, R² values increased quickly and monotonically. Moreover, the values of the Nash criterion (E) progressed significantly, reaching a value of 1 in the test phase (Fig. 8).
Furthermore, the addition of other nodes may not improve model performance.
Another parameter that must be taken into consideration is the number of epochs. The different combinations show that 1,000 epochs are enough to obtain the best results; adding more epochs is useless and may even degrade performance.
Increasing the number of hidden layers does not automatically improve model performance; it may even affect all performance criteria negatively, as with the architecture (c = 2, n = 4). However, as neurons are added to each hidden layer, performance improves very quickly.
As these two criteria approach 1, the criteria that measure the errors between observed and simulated values gradually decrease toward their minimum values (Fig. 9).
Extensive test experiments were conducted in order to select the optimal network architecture. Consequently, these tests led to a network of two hidden layers, each of eight neurons.
We should mention that, as the network architecture becomes more complex, the learning process becomes increasingly difficult and the time required grows accordingly; modeling can therefore take a long time, and the search for a better architecture requires considerable processor time. The most suitable architecture in our case remains a network of two hidden layers with eight neurons each.
Neural networks also require setting a learning rate and a number of iterations. After testing different combinations, we chose a learning rate of 0.2 and 1,000 iterations.
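As an illustration of the selected 4–8–8–1 architecture (four climatic inputs, two hidden layers of eight neurons, one output), a forward pass with sigmoid activations can be sketched as follows. The weights below are random placeholders, not trained values; in the study they would be learned by backpropagation with a learning rate of 0.2 over 1,000 iterations:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    # Each neuron: one weight per input plus a bias (random placeholders)
    return [([rng.uniform(-1, 1) for _ in range(n_in)], rng.uniform(-1, 1))
            for _ in range(n_out)]

def forward(layers, x):
    for layer in layers:
        x = [sigmoid(sum(w * v for w, v in zip(ws, x)) + b)
             for ws, b in layer]
    return x

rng = random.Random(0)
# 4 inputs (T, Rh, Ws, I) -> 8 neurons -> 8 neurons -> 1 output (scaled ET0)
net = [make_layer(4, 8, rng), make_layer(8, 8, rng), make_layer(8, 1, rng)]
output = forward(net, [25.0, 50.0, 2.0, 9.0])
```

With sigmoid outputs, the final value lies in (0, 1), so in practice the target ET0 series would be normalized before training and rescaled afterwards.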
At first glance, MLR showed a remarkably satisfactory performance. Nevertheless, the neural network model outperformed MLR overall, as shown in Table 3.
Table 3 Comparison of performance criteria obtained by MLR and a neural network model. R² determination coefficient, E Nash-Sutcliffe efficiency, MSE mean squared error, RMSE root mean squared error, MARE mean absolute relative error
Comparing the performance criteria obtained during the different stages of the neural model with those obtained by MLR for the various datasets shows the value of neural network modeling. The MARE (%), i.e., the mean percentage error between the real and simulated values of ET0, indicates the superior performance of the neural networks over MLR.
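As a reminder of the definition used here, the MARE can be computed as follows; this is a generic sketch, and the example values are hypothetical:

```python
def mare_percent(observed, simulated):
    """Mean absolute relative error, expressed as a percentage."""
    n = len(observed)
    return 100.0 * sum(abs(o - s) / abs(o)
                       for o, s in zip(observed, simulated)) / n

# Hypothetical ET0 values (mm/day)
print(round(mare_percent([2.0, 4.0], [2.2, 3.6]), 2))  # 10.0
```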
Table 3 shows the absence of overfitting, because the difference between the errors (MSE) of the learning and testing phases is not significant. These errors increase from the training phase to the test phase and then decrease in the validation phase; such errors arise from the nature of the data. In the case of MLR, however, the error rate is higher than for the neural network model. The absence of overfitting is due mainly to the procedure adopted to avoid it and, at the same time, confirms the correct choice of neural network architecture.
To evaluate the correlation between the observed and simulated values of ET0, we plotted them against each other (Fig. 10). The points are scattered statistically around the line y = x, showing a very close resemblance that explains the high correlation coefficient across the learning, test and validation phases; most of the values predicted by the ANN lie near the y = x line. This study thus concludes that the combination of mean air temperature, wind speed, sunshine hours and mean relative humidity provides good performance in predicting ET0.
In addition, the statistical parameters show close resemblance between the three modeling phases. These results further confirm the high performance of the model (Table 4).
Table 4 Statistical parameters of observed and simulated ET0 series (mm/day). ET0 Reference evapotranspiration, ET0o observed evapotranspiration, ET0s simulated evapotranspiration, STDEV standard deviation
The comparison between the observed and simulated series of ET0 values reveals a high resemblance (Fig. 11).
If we compare the results obtained by ANNs in the validation phase (ET0a) with those obtained by the multiple regression method (ET0m) for the same dataset, we can see clearly that the neural networks series is a better fit (Fig. 12). The difference becomes greater at extreme values; this adds further justification to the choice of neural networks. In this context, Deswal and Pal (2008) noted that, of the two regression analysis approaches that have been used, ANNs provide better results in terms of predicting evaporation due to a higher correlation coefficient with a lower RMSE.
Finally, we must acknowledge that the performance of the models varies with the number of inputs as well as the predicted time step. Hence, Wang et al. (2010) noted that wind velocity and relative humidity improved the accuracy of temperature-based backpropagation models when incorporated into the network input sets.
Indeed, this performance would be even better if we modeled a longer time step. With a simple architecture, we can obtain a very strong correlation, i.e., R² close to 1. Nevertheless, performance decreases when the number of inputs is reduced.