Climate change is a pressing global concern. Modern climate change is heavily influenced by human activities (Karl and Trenberth 2003). Human activities that emit various gases into the atmosphere have increased the rate of climate change. Energy-related emissions are among the largest contributors and can account for up to 80% of total greenhouse gas emissions (Quadrelli and Peterson 2007). Carbon dioxide (CO2) emissions to the atmosphere reached the highest level ever recorded in the recent decade. However, CO2 is not the only gas that contributes to the greenhouse effect and thus to climate change; water vapor, methane and ozone are a few other examples (Karl and Trenberth 2003).

Climate change has affected not only human life (Barnett and Adger 2007; Haines et al. 2006; Patz et al. 2005; Vorosmarty 2000) but also other living species, including plants (Harvell et al. 2009; Hughes 2003; Mawdsley et al. 2009; Root et al. 2003). The World Health Organization estimated around 150,000 deaths attributable to the climate change of the preceding 30 years (Patz et al. 2005), many of them linked to climate fluctuations. Climate change has had even more severe impacts on animal species threatened with extinction (Malcolm et al. 2006; McDonald and Brown 1992; Pounds et al. 2006; Sekercioglu et al. 2007).

Atmospheric temperature is one of the climatic factors that the human body senses most directly. Even a small change can therefore affect people's daily routines (Kalkstein and Smoyer 1993). Consequently, the analysis of atmospheric temperature has attracted significant interest from researchers throughout the world (Mears and Wentz 2017; Rotstayn et al. 2014; Santer et al. 2012; Simmons et al. 2014). Trend analysis of atmospheric temperature is one of the most active research areas. Many researchers have performed temperature trend analyses and made predictions for future scenarios (Forster et al. 2013; Lee et al. 2013; Marotzke and Forster 2015; Shaltout and Omstedt 2014). These studies project increasing surface temperatures for most parts of the world. Atmospheric temperature trend analysis has also been carried out in the Middle East, including Saudi Arabia (Alghamdi and Moore 2014; Almazroui et al. 2013; Athar 2013; Krishna 2014). However, this kind of research has not been extended to the Tabuk area of Saudi Arabia. Therefore, there is a clear research gap regarding temperature trends in the Tabuk area.

In addition, gene expression programming (GEP) and artificial neural networks (ANN) are widely used for estimating various climatic parameters. Many researchers have used these techniques to estimate incoming solar radiation (Landeras et al. 2012), daily reference evapotranspiration (Guven et al. 2008; Izadifar and Elshorbagy 2010; Yassin et al. 2016), evaporation (Shiri et al. 2012), dew point temperature (Shiri et al. 2014), etc. In each case, an unknown or unmeasured climatic parameter was estimated from other climatic parameters using GEP and ANN. However, there is no literature on the application of these techniques to Tabuk, Saudi Arabia. This is a clear research gap, since Tabuk is an important agricultural area in Saudi Arabia. This paper presents the results from a GEP model that estimates the atmospheric temperature in the Tabuk area from the other climatic parameters. In addition, an ANN model is developed to estimate the atmospheric temperature in the same area, and its results are compared against the GEP estimates and the recorded data.

Overview of ANN and GEP

ANN and GEP are two of the most widely used soft computing techniques in hydraulic engineering. ANNs have been reported to provide reasonably acceptable solutions for problems in water resources and hydraulic engineering; in particular, they produce results for highly nonlinear and complex water resource problems (Azamathulla et al. 2006; Bilhan et al. 2010; Haghiabi et al. 2017; Kisi et al. 2008; Parsaie and Haghiabi 2015, 2018; Parsaie et al. 2018a). During the last three decades, researchers have noticed that using soft computing techniques as an alternative to conventional statistical methods based on controlled laboratory or field data yields significantly better results. For example, Azamathulla et al. (2005, 2006, 2008, 2016), Guven and Gunal (2008a) and Kisi et al. (2008) have shown that predictive approaches such as ANNs yield effective estimates of the scour around hydraulic structures. In addition, genetic programming (GP) has produced good estimates for similar problems (Azamathulla et al. 2010; Guven and Gunal 2008a, b; Guven and Aytek 2009).

Artificial neural networks are essentially a collection of simple processing elements (PEs) arranged into an input layer, an output layer and usually one or more hidden layers. The multilayer perceptron (MLP) is a class of feedforward ANN and usually uses a sigmoid-type function for each PE (Parsaie et al. 2018b). Sigmoid-type functions are bounded, monotonic and non-decreasing functions with a nonlinear response, which makes them well suited to simulating nonlinear processes.
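As a minimal sketch of a single processing element, the following Python snippet uses the logistic function as one common sigmoid-type activation. The names `sigmoid` and `processing_element` and the example weights are illustrative assumptions; they are not taken from the model developed in this study.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: bounded in (0, 1), monotonic and non-decreasing."""
    return 1.0 / (1.0 + math.exp(-x))


def processing_element(inputs, weights, bias):
    """One PE: weighted sum of the inputs passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical inputs and weights, for illustration only.
output = processing_element([0.5, -1.0], [0.8, 0.3], bias=0.1)
```

An MLP stacks many such elements into layers, so that the outputs of one layer become the inputs of the next.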

On the other hand, gene expression programs (GEPs) are evolutionary algorithms that extend genetic programming (GP) (Koza 1999; Parsaie et al. 2017). The computer programs of GEP are all encoded in linear chromosomes, which are then expressed or translated into expression trees (ETs). ETs are sophisticated computer programs that are usually evolved to solve a particular problem and are selected according to their fitness at solving that problem. GEP is a full-fledged genotype–phenotype system in which the genotype is separated from the phenotype. In contrast, GP is a replicator system. Therefore, the solving power of GEP is greatly enhanced compared to GP (Ferreira 2001). GEP starts the solution search by initializing a population in which the chromosomes of each individual are randomly generated. These chromosomes are individually evaluated with a fitness function and then reproduce with modifications. The reproduction process is repeated until a predefined number of generations is reached or until a solution is found.
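The translation of a linear chromosome into an expression tree can be illustrated with a short Python sketch. It assumes a breadth-first reading of a single gene, single-character symbols and a function set restricted to the four two-argument arithmetic operators; the names `decode` and `evaluate` and the example gene are hypothetical and do not come from the model in this paper.

```python
from collections import deque
import operator

# Assumed function set (all of arity two); any other symbol is a terminal.
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}


def decode(gene):
    """Translate a linear gene into an expression tree, level by level.

    Each node is a list [symbol, children].
    """
    root = [gene[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        if node[0] in FUNCTIONS:  # a function node consumes two symbols
            for _ in range(2):
                child = [gene[pos], []]
                pos += 1
                node[1].append(child)
                queue.append(child)
    return root


def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        left, right = (evaluate(child, values) for child in children)
        return FUNCTIONS[symbol](left, right)
    return values[symbol]


# The gene "+*cabab" is read as (a * b) + c; the trailing "ab" is not expressed.
tree = decode("+*cabab")
result = evaluate(tree, {"a": 2.0, "b": 3.0, "c": 4.0})
```

Because unused trailing symbols are simply ignored, any modification of the linear string still decodes to a valid tree, which is the practical benefit of separating genotype from phenotype.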

Study area

As stated in the introduction, the study of climate change is crucial because it helps identify the factors that affect weather. This knowledge can be useful in the quest to limit the negative effects of climate change. The emission of greenhouse gases such as carbon dioxide and methane increases temperature, which raises the water evaporation rate and ultimately decreases the groundwater stock. Tabuk is located in the northwestern part of Saudi Arabia. It has an area of 139,000 km2 (Fig. 1) and is bounded to the north by the Saudi–Jordan border, to the south and west by the Red Sea and to the east by the Hufa depression. The average elevation of the city area is 770 m above mean sea level. Tabuk is classified as a hyper-arid catchment (Abushandi and Alatawi 2015). It has a high evaporation rate and a low vegetation cover. In addition, the area is prone to flash floods due to unexpected rainfall. Tabuk has two seasons: a shorter winter season and a longer summer season. The average annual rainfall is 33 mm and occurs in the winter season from October to April.

Fig. 1 Study area (source: Al-Harbi 2010)

Despite its arid climate, Tabuk is an agricultural area that relies heavily on the little rainwater it receives and on extracted groundwater. Therefore, any change in the water resources would be devastating for the agricultural industry. Al-Harbi (2010) reported that the extent of agricultural land in Tabuk increased by 10% over the 20 years from 1988 to 2008.


Monthly weather data for 30 years (1986–2015) were collected from the Saudi General Authority of Meteorology and Environmental Protection. Rainfall, wind speed, air pressure, humidity and atmospheric temperature were among the collected data. These data were used to calibrate and validate the ANN and GEP models. The summary of the gathered data is given in Table 1.

Table 1 Monthly average climatic data for 30 years (1986–2015)

The overall objective of this research work is to estimate the atmospheric temperature of Tabuk from the other climatic factors. The minimum, maximum and mean values of monthly rainfall, relative humidity, wind speed, atmospheric temperature and atmospheric pressure were fed as the inputs of the ANN model. Each data set was randomly partitioned into two sets: 80% of the 30 years of data were used for model calibration (training), while the other 20% were used for validation (testing). We developed an ANN model based on the multilayer perceptron neural network architecture. The ANN model was trained with the Neural Network Toolbox in MATLAB using the Levenberg–Marquardt algorithm. The coefficient of determination (R2) and the mean squared error (MSE) were used to test the developed ANN model.
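The data partition and the two test statistics can be sketched in plain Python as follows. This is only an illustration under stated assumptions: 30 years of monthly data are taken as 360 records, the helper names `split_80_20`, `mse` and `r_squared` and the fixed seed are hypothetical, and R2 is computed as one minus the ratio of residual to total sum of squares, which may differ from the definition used by the MATLAB toolbox.

```python
import random


def split_80_20(records, seed=0):
    """Randomly partition records into 80% training and 20% testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * 4 // 5           # 80% of the records
    return shuffled[:cut], shuffled[cut:]


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# 30 years x 12 months = 360 monthly records, represented here by their indices.
train, test = split_80_20(range(360))
```

With 360 records, this split yields 288 training records and 72 testing records.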

In addition to the ANN model, we developed a GEP model to predict the atmospheric temperature in Tabuk, so that the results and accuracy of the two models can be compared. GEP was also used to estimate the atmospheric temperature from the other climatic factors. As in the ANN modeling, 80% of the collected climatic data were used as the training set and the remaining 20% as the testing set. The training set defines the learning environment of the GEP system. First, the fitness function was chosen. The fitness function chosen for this problem is given in the following equation (Eq. 1).

$$f_{i} = \sum\limits_{j = 1}^{{C_{t} }} {\left( {M - \left| {C_{i,j} - T_{j} } \right|} \right)}$$

where M is the range of selection, Ci,j is the value returned by the individual chromosome i for fitness case j, Tj is the target value for fitness case j and Ct is the number of fitness cases. |Ci,j − Tj| is the precision of the fitness function. If it is less than 0.01, then the precision is considered zero and fi reaches its maximum, fmax = Ct × M. In this problem M = 100 was used. In this case, fmax = 1000. The fitness function in Eq. 1 allows the system to find the optimal solution by itself, which is one of the main advantages of such a fitness function. In addition, the simulation continues until the fitness function reaches its maximum value. Next, the set of terminals (T) and the set of functions (F) were chosen to create the chromosomes. This problem has a single dependent variable, the atmospheric temperature, which is expressed in terms of the other climatic parameters in the following equation (Eq. 2).

$${\text{TEMP}} = f\left( {{\text{WS}},\,P,\,{\text{RH}},\,{\text{RF}}} \right)$$

where TEMP is the atmospheric temperature and WS, P, RH and RF are the wind speed, atmospheric pressure, relative humidity and rainfall, respectively. The choice of an appropriate function set for Eq. 2 is not straightforward. However, a good guess can always be helpful. We used five basic arithmetic operators (+, −, ×, /, power) to connect the parameters in the function for TEMP.
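Returning to the fitness function of Eq. 1, its behavior can be sketched in a few lines of Python. The function name `fitness` and its argument names are assumptions made for illustration; the default values M = 100 and a precision threshold of 0.01 are those stated for this problem.

```python
def fitness(predicted, targets, M=100.0, precision=0.01):
    """Fitness of one individual following Eq. 1.

    predicted: values returned by the individual for each fitness case
    targets:   target values for the same fitness cases
    """
    total = 0.0
    for c, t in zip(predicted, targets):
        error = abs(c - t)
        if error < precision:  # errors below the threshold count as zero
            error = 0.0
        total += M - error
    return total


# An individual that matches every target reaches the maximum, Ct * M.
perfect = fitness([5.0] * 10, [5.0] * 10)
```

In this sketch, ten fitness cases with M = 100 give a maximum fitness of 1000, and every unit of absolute error on a fitness case lowers the fitness by one.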

Then, the chromosomal architecture was selected. The chromosomal architecture consists of the length of the head and the number of genes. Chromosomes with three genes (length of 9, t = 9) and a head length of 8 (h = 8) were selected. The length of the chromosome is, therefore, 30. The sub-functions are linked using the addition operator, because the literature shows that addition gives better results (Azamathulla et al. 2005). The selected functions are given in the following equations (Eqs. 3–6).

$$T = \left( {\frac{{4.667 \times {\text{RH}}^{3} }}{P}} \right)^{4} + \left( {\frac{\text{RH}}{10}} \right)^{5}$$
$$T = T + \left( {6.32 \times {\text{RH}} \times \left( { - {\text{RF}}} \right) \times {\text{WS}} \times \sqrt P } \right)$$
$$T = T + \left( {{\text{RH}} - P - {\text{WS}} - 472\left( {{\text{RH}} \times {\text{RF}} \times {\text{WS}} \times P^{4} } \right)} \right)$$
$$T = T + \left( {1.01549\left( {{\text{RH}} + P} \right) + \frac{2P}{3} + 0.516 + 4.62\left( {{\text{RF}} \times P} \right)^{4} } \right)$$

Therefore, Eq. 2 can be expanded by combining Eqs. 3–6, which gives Eq. 7. This equation is obtained from the developed GEP model.

$$\begin{aligned} {\text{TEMP}} & = \left( {\frac{{4.667 \times {\text{RH}}^{3} }}{P}} \right)^{4} + \left( {\frac{\text{RH}}{10}} \right)^{5} \\ & \quad + \,\left( {6.32 \times {\text{RH}} \times \left( { - {\text{RF}}} \right) \times {\text{WS}} \times \sqrt P } \right) \\ & \quad + \,\left( {{\text{RH}} - P - {\text{WS}} - 472\left( {{\text{RH}} \times {\text{RF}} \times {\text{WS}} \times P^{4} } \right)} \right) \\ & \quad + \,\left( {1.01549\left( {{\text{RH}} + P} \right) + \frac{2P}{3} + 0.516 + 4.62\left( {{\text{RF}} \times P} \right)^{4} } \right) \\ \end{aligned}$$
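For readers who want to evaluate Eq. 7 numerically, a direct Python transcription is sketched below. The function name `gep_temperature` is hypothetical, and the units or scaling of the inputs are not stated here, so they are treated as an assumption: values must be supplied in whatever form was used to fit the model, and P must be positive.

```python
import math


def gep_temperature(ws, p, rh, rf):
    """Evaluate Eq. 7 term by term (input units and scaling are assumed).

    ws: wind speed, p: atmospheric pressure (p > 0),
    rh: relative humidity, rf: rainfall
    """
    term1 = (4.667 * rh**3 / p) ** 4 + (rh / 10) ** 5
    term2 = 6.32 * rh * (-rf) * ws * math.sqrt(p)
    term3 = rh - p - ws - 472 * (rh * rf * ws * p**4)
    term4 = 1.01549 * (rh + p) + 2 * p / 3 + 0.516 + 4.62 * (rf * p) ** 4
    # The four sub-expressions are linked by addition.
    return term1 + term2 + term3 + term4
```

Keeping the four terms separate mirrors the way the sub-expressions are linked by addition and makes it easy to inspect the contribution of each one.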

A combination of genetic operators (mutation, crossover and transposition) was used to introduce variation into the population. As stated earlier, 288 of the 360 monthly records (80%) were used to train the GEP model and the remaining 72 (20%) were used to test it.

Results and discussion

Figure 2 shows the results obtained from the developed artificial neural network model. The observed atmospheric temperature data were compared to the atmospheric temperature predicted by the ANN model. The best fit line is the 45° line (Y = X, or predicted temperature = observed temperature). The results show that the predictions are biased toward the −25% side of the best fit line. Therefore, the ANN model developed for Tabuk, Saudi Arabia, underpredicts the atmospheric temperature. This is reflected in the R2 value of the results (R2 = 0.67).

Fig. 2 Predicted temperature using ANN against the observed temperature

In contrast, Fig. 3 presents the atmospheric temperature predicted by the GEP model, compared against the observed atmospheric temperature. Unlike the predicted values in Fig. 2, those in Fig. 3 show little scatter. They lie closer to the best fit line (the 45° line), where predicted values are equal to the observed values. The R2 value (0.91) confirms the better fit. Therefore, our GEP model predicts the atmospheric temperature from the other climatic parameters more accurately.

Fig. 3 Predicted temperature using GEP against the observed temperature

The performances of the developed ANN and GEP models in predicting the atmospheric temperature were compared on the test data sets. The comparison can be seen in Fig. 4.

Fig. 4 Comparison of results obtained from ANN and GEP models

Figure 4 clearly shows that the GEP model gives better predictions than the ANN model. The ANN model produced a lower coefficient of determination and a higher RMSE than the GEP model (R2 = 0.67 and RMSE = 20.179). In contrast, the RMSE of the atmospheric temperature predicted by the GEP model is 0.44. Therefore, it can be concluded that the GEP model gives more accurate predictions. In addition, the ANN model does not provide any governing equation, which is one of its drawbacks.

Summary and conclusions

We developed two different generic models (ANN based and GEP based) to predict the atmospheric temperature in Tabuk, Saudi Arabia. Climatic parameters including atmospheric pressure, wind speed, relative humidity and rainfall were used as predictors of the atmospheric temperature. The results clearly show that the GEP model outperforms the ANN model in predicting the atmospheric temperature and suggest that the proposed GEP model is robust and useful for practitioners. Therefore, the GEP model is proposed for estimating temperatures in the agricultural areas of Tabuk under climate change scenarios. Future research could treat the winter and summer months separately in the predictions and would benefit from a data record longer than 30 years.