1 Introduction

Climate change, particularly changes in temperature and rainfall, is one of the most important issues in the environmental sciences. Its importance is growing because of its scientific and practical aspects (environmental, economic, and social impacts). The Intergovernmental Panel on Climate Change reported that the Earth's temperature rose by between 0.3 and 0.6 °C from 1900 to 1995 (Intergovernmental Panel on Climate Change, 2001). Most climatologists believe that the main cause of this warming is human activity, which has raised the temperature of the planet over the last century through an excessive increase in greenhouse gases (Alijani 2011).

Shumway et al. (1988) showed that temperature plays a major role in the death rate in Los Angeles. The stressful effect of climate on mortality has been demonstrated: the farther the temperature departs from the human comfort zone, the greater the stress and the higher the death rate (Marmor 1975; Ramlow et al., 1990). A significant relationship between temperature and death rate has been established for a number of cities around the world. Deaths that are directly related to temperature, such as those caused by increased body temperature, can result from cardio-respiratory diseases or from poor functioning of the vessels that carry blood and nutrients through the body (Kalkstein 1991; Martens 1998).

Because climate has a profound effect on people's social and individual lives, weather and climate forecasts are produced from current and predicted values of atmospheric parameters (Teshnehlab & Monshi, 2003). Maximum temperature clearly contributes to increased evaporation and transpiration, the depletion of surface water and groundwater, the spread of various diseases, forest fires, the melting of glaciers, and drought and water shortages in other areas (Hosseini, 2009). Significant changes in global temperature, or global warming, are considered among the most important aspects of climate change in the present century. If estimation and prediction methods are accurate enough, they can be used in planning and management (Karamooz et al. 2006). High temperature can also cause problems in road transportation, including direct effects on the vehicle through evaporation of gasoline and water (Keay and Simonda 2005) and driver fatigue (Eriksoon and Lindqvist 2003). Artificial neural networks are now one of the efficient methods widely used in climate science. According to researchers, the main reason for their acceptance and growing use is their power and speed in simulating processes that are poorly understood or that are time-consuming and difficult to study with other methods. In general, the artificial neural network is a robust, highly capable model that can be viewed positively for climate and hydrological problems. In particular, it can extract the law governing the data, even when the data are noisy (Dehghani and Ahmadi 2008).

Jain (2003) used an artificial neural network to predict the temperature of South Georgia one to twelve hours ahead. Rahman and Mohandas (2008) showed that a neural network could estimate solar radiation in Saudi Arabia from temperature and relative humidity for the period 1998 to 2000. Cadenas and Rivera (2009) predicted wind speed in La Venta, Oaxaca, Mexico, over short periods with good precision. Ahmadi (2005) classified rainfall changes in Boushehr using an artificial neural network based on the self-organizing map (SOM) model. SoheiliKhah and Teshnehlab (2004) predicted daily maximum temperature using a new dynamic structure of neuro-fuzzy network and increased the accuracy of prediction with their own proposed algorithm. Rahmani and Teshnehlab (2005) proposed a new structure based on artificial neural networks, named TD-CAMC, which extends the cerebellar CAMC model, and used it to predict daily maximum and minimum values. Alijani and Ghavidel Rahimi (2005) predicted the annual temperature changes of Tabriz and compared them with global temperature anomalies using an artificial neural network and linear regression. Khosravi et al. (2010) examined the use of artificial neural networks in the atmospheric sciences and in calculating climatological parameters. They used relative humidity, average wind speed, average hours of sunshine, and the difference between the average minimum and average maximum temperatures as inputs to a multilayer perceptron neural network.

Artificial neural networks are effective tools for modeling nonlinear systems because they do not require an explicit mathematical relationship for complex phenomena (Kumar et al. 2002). Researchers looking for ways beyond the usual methods of understanding and forecasting important meteorological parameters have developed intelligent methods, which are flexible and powerful tools. Artificial neural networks are one such method and are able to compute arithmetic and logical functions (Sharma and Bose 2014). Modeling temperature variables is important in water resources management and agriculture, especially in arid and semi-arid regions. Given climate change and global warming in recent decades, the importance of simulating and forecasting climate parameters, and the power and speed of artificial neural networks, this article estimates the winter maximum temperature of Tehran using a neural network model.

2 Materials and Methods

2.1 Study Area

Tehran is located at 35° N and 51° E in the southern foothills of the Alborz Mountains (Fig. 1). The city covers about 800 km². Its elevation is 1200 m in the south, at Mehrabad airport, and 2000 m in the north. Monthly maximum temperature data from the Tehran Mehrabad synoptic station for 1951 to 2010 are used to predict the winter maximum temperature. Because only this synoptic station has a record longer than 50 years, data from other stations are not used. In terms of general climate divisions, the region has a semi-arid climate: June to August are almost dry, and the remaining months of the year are wet. Over a period of 55 years, the average annual rainfall and average annual temperature are 333 mm and 17.2 °C, respectively. A large part of the rainfall occurs from December to May. The average temperature of January, the coldest month of the year, is 3.3 °C.

Fig. 1 Position of the studied station in Tehran city

In the present paper, the monthly averages of maximum temperature (Tmax), the difference between maximum and minimum temperatures (Rtempt), sunny hours (n), wind speed (U2), mean relative humidity (RH mean), and mean temperature (T mean) at the Tehran Mehrabad synoptic station for the statistical period 1951 to 2010 have been used to predict the winter maximum temperature of Tehran. As shown in Fig. 2, the winter maximum temperature series of Tehran is nonlinear. Statistical models designed for linear processes are therefore not suitable, and it is better to use models that can predict nonlinear processes.

Fig. 2 Tehran maximum temperature time series in winter in the statistical period of 1951 to 2010

Given the length of the record, 70% of the statistical period (43 years) was used for network training and 30% (17 years) for testing. Qnet 2000 software was used to design the artificial neural network (ANN). Cross-validation was used to perform a sensitivity analysis on the model results.
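The split described above can be sketched in a few lines. This is a minimal illustration only: it assumes the split is chronological (earliest years for training, latest for testing), which the text does not state explicitly, and the function name `chronological_split` and the list of years are hypothetical stand-ins rather than the station data.

```python
def chronological_split(records, n_train):
    """Split a time-ordered sequence into a leading training part
    and a trailing testing part."""
    if not 0 < n_train < len(records):
        raise ValueError("n_train must be between 1 and len(records) - 1")
    return records[:n_train], records[n_train:]


# Hypothetical stand-in for the 60 winters of 1951-2010 (years only, no real data).
years = list(range(1951, 2011))

# Assumption: the 43 training years are the earliest ones in the record.
train_years, test_years = chronological_split(years, n_train=43)

print(len(train_years), len(test_years))  # 43 17
print(train_years[-1], test_years[0])     # 1993 1994
```

Keeping the test years at the end of the record means the network is always evaluated on winters it has not seen during training.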

Artificial neural networks are one class of models for forecasting nonlinear processes. The neural network technique includes two phases, a training phase and a testing phase. For this purpose, 70% of the data were allocated to network training and 30% to testing and validation. What percentage of the data is sufficient for training, and whether there is a threshold below which the network is not useful, depends on the complexity of the problem and the quality of the data. In addition, the error rate during training depends on the number of examples used to train the network. If the number of patterns or the number of neurons in the middle (hidden) layer is too low, the network cannot learn the relationship between inputs and outputs. If the number of neurons in the middle layer is larger than necessary, the network memorizes the training patterns; a network that performs well in the training phase then performs poorly in the testing phase and does not generalize. In this study, the hyperbolic tangent activation function is used in both the hidden layer and the output layer. Finally, cross-validation was used to assess the accuracy of the selected model. The number of input parameters was gradually reduced across the designed models to evaluate the ability of neural networks when climate data are lacking. For this purpose, 5 input vector combinations were considered, as described in Table 1.

Table 1 The combination of input variables in the designed models

To obtain the best network layout, model design began with one hidden layer and then moved to two and three hidden layers. Three momentum values, 0.6, 0.7, and 0.8, were considered to study the effect of momentum. Since the number of hidden layer nodes can have a significant impact on the accuracy of the network output, it was varied between 1 and 10 to determine the best value. In total, 450 models (5 input vector combinations × 1–3 hidden layers × 3 momentum values × 1–10 nodes in the hidden layer) were designed to find the most suitable combination and arrangement of the network for estimating the maximum temperature.
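The count of 450 candidate models follows directly from the four design choices listed above. The short sketch below enumerates that search grid; the variable names are illustrative, and the input combinations are represented only by their index in Table 1, not by their actual variables.

```python
from itertools import product

input_sets = [1, 2, 3, 4, 5]      # indices of the five input combinations (Table 1)
hidden_layers = [1, 2, 3]         # number of hidden layers tried
momenta = [0.6, 0.7, 0.8]         # momentum values tried
nodes = list(range(1, 11))        # 1 to 10 nodes in the hidden layer

# Every candidate configuration is one tuple of the Cartesian product.
grid = list(product(input_sets, hidden_layers, momenta, nodes))

print(len(grid))  # 450
print(grid[0])    # (1, 1, 0.6, 1)
```

Each tuple corresponds to one network that would be trained and scored; the best structure is the tuple with the lowest test error.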

2.2 Artificial Neural Network

An artificial neural network, like a natural neural network, is composed of elements called neurons. In a natural neural network, some cells receive external stimuli, some process information, and some transmit the response to the target organ. Similarly, in an artificial neural network, some cells receive the details of the problem, some process information, and some provide the answer. Every artificial neural network consists, in sequence, of an input layer, one or more hidden layers, and an output layer. In the mathematical model of a neuron, a set of values (which may be the outputs of other neurons) is applied as inputs to the neuron. Each input is multiplied by its corresponding weight, and all the weighted inputs are then added to determine the level of neuron activity. Figure 3 shows the model of a multi-input neuron.

Fig. 3 The model of a multi-input neural network (Ramirez et al. 2005)

In Fig. 3, the inputs x1, x2, x3, …, xn enter the neuron and are represented compactly by the vector x. Each neuron input corresponds to one of the input signals. Each signal is multiplied by its corresponding weight wi1, wi2, wi3, …, win, represented compactly by the vector w, where i denotes the neuron. The weighted values are added inside the neuron to give the net input:

$$ {\displaystyle \begin{array}{l} NET=x\cdot w\\ {} NET={x}_1{w}_{i1}+{x}_2{w}_{i2}+\dots +{x}_n{w}_{in}=\sum \limits_{j=1}^n{x}_j{w}_{ij}\end{array}} $$
(1)

The sum obtained is then evaluated by a function called the activation function, which determines the output of the neuron:

$$ {y}_i=g(NET) $$
(2)

Here g is the activation function. The way the neurons in different layers are connected gives rise to different structures of artificial neural network.
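Equations (1) and (2) can be illustrated with a single neuron. The sketch below is a minimal example, assuming the hyperbolic tangent as the activation function g (the function this study uses in its hidden and output layers); the function name `neuron_output` and the input and weight values are made up for illustration and are not taken from the trained network.

```python
import math


def neuron_output(x, w, g=math.tanh):
    """Return g(NET) for one neuron, where NET is the weighted sum of the inputs."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    net = sum(xj * wj for xj, wj in zip(x, w))  # Eq. (1): NET = sum_j x_j * w_ij
    return g(net)                               # Eq. (2): y_i = g(NET)


# Illustrative values only (three inputs, as in a three-input model).
x = [0.5, -1.0, 0.25]
w = [0.4, 0.1, -0.8]

# NET = 0.2 - 0.1 - 0.2 = -0.1, so the output is tanh(-0.1), roughly -0.0997.
print(neuron_output(x, w))
```

Because tanh maps any real NET into the interval (-1, 1), inputs and targets are normally scaled to a comparable range before training.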

The data used to evaluate and compare the performance of the models were not involved in the training process. The Root Mean Square Error (RMSE), the correlation coefficient (R), whose square is the coefficient of determination (R2), and the Mean Absolute Error (MAE) were used to determine the best network arrangement. The most appropriate model is the one with the lowest RMSE and MAE: the smaller these values, the closer the values calculated by the model are to the true values.

$$ RMSE=\sqrt{\frac{\sum \limits_{i=1}^N{\left({O}_i-{E}_i\right)}^2}{N}} $$
(3)
$$ R=\frac{\sum \limits_{i=1}^N\left({O}_i-\overline{O}\right)\left({E}_i-\overline{E}\right)}{\sqrt{\sum \limits_{i=1}^N{\left({O}_i-\overline{O}\right)}^2\sum \limits_{i=1}^N{\left({E}_i-\overline{E}\right)}^2}} $$
(4)
$$ MAE=\frac{\sum \limits_{i=1}^N\left|{O}_i-{E}_i\right|}{N} $$
(5)

Here Oi is the observed value and Ei is the predicted maximum temperature for observation i (i = 1, …, N), the overbarred O and E are the averages of the observed and predicted values, respectively, and N is the number of observations.
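The three indices in Eqs. (3) to (5) can be computed directly from paired observed and predicted values. The following sketch uses only the Python standard library; the function names and the three-point sample are illustrative and are not the station data.

```python
import math


def rmse(obs, est):
    """Root mean square error, Eq. (3)."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def correlation(obs, est):
    """Correlation coefficient R, Eq. (4)."""
    o_bar = sum(obs) / len(obs)
    e_bar = sum(est) / len(est)
    num = sum((o - o_bar) * (e - e_bar) for o, e in zip(obs, est))
    den = math.sqrt(sum((o - o_bar) ** 2 for o in obs)
                    * sum((e - e_bar) ** 2 for e in est))
    return num / den


def mae(obs, est):
    """Mean absolute error, Eq. (5)."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


# Illustrative values only.
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]

print(rmse(observed, predicted))         # sqrt(2/3), about 0.816
print(correlation(observed, predicted))  # 1.0 (perfectly linear relation)
print(mae(observed, predicted))          # 2/3, about 0.667
```

The example also shows why R alone is not enough: the predictions are perfectly correlated with the observations yet still have non-zero RMSE and MAE.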

3 Results and Discussion

The results for the 5 basic combinations are listed in Table 2. As the table indicates, Model 4 gives the best model, with three neurons in the input layer, one hidden layer with 9 neurons, the hyperbolic tangent function in the hidden and output layers, a learning coefficient of 0.01, a momentum of 0.8, and regular arrangement of the samples. Network performance evaluation (Table 2) showed that the root mean square error and mean absolute error of Model 4 were 0.001 and 0.61 °C, respectively, the best performance in the training phase. The closer these values are to zero, the better the performance of the network. The coefficient of determination and the correlation coefficient for both the training and testing phases are equal to 0.99, and the correlation is significant at the 1% level (Fig. 4). Given that the other network assessment criteria were also within acceptable limits and that the estimated maximum temperature is close to the actual values (Fig. 5), it can be said that the designed network performs well.

Table 2 Network performance evaluation for the designed models
Fig. 4 The regression equation between the observed and predicted maximum temperature in the training phase by the model 4

Fig. 5 Comparison of observed and predicted maximum temperature by the model 4

To assess the accuracy of the selected model, the cross-validation method is used (Table 3). This method is used to assess the goodness of parameter estimates and forecasts, to select an appropriate model for the data, and to detect outliers. The basic idea is to remove one data point and use the remaining data to predict it. The removed value is then returned to its place, and the procedure is repeated for the next data point. In this way every data point is forecast, and the accuracy of the model can be calculated.
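The remove-one, predict-one procedure described above is commonly called leave-one-out cross-validation. The sketch below shows the loop itself; to stay short and self-contained it uses a least-squares straight line as a stand-in model, which is an assumption for illustration only and not the neural network of this study. All function names and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; a stand-in for the real model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def leave_one_out(xs, ys, fit, predict):
    """Predict each point from a model fitted to all the other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # remove point i
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)   # fit on the remaining data
        preds.append(predict(model, xs[i]))
    return preds                        # point i is "returned" on the next pass


# Illustrative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
preds = leave_one_out(xs, ys, fit_line, predict_line)

# Mean bias error, taken here as mean(predicted - observed); sign conventions vary.
mbe = sum(p - y for p, y in zip(preds, ys)) / len(ys)
print(preds)
print(mbe)
```

For a neural network, `fit` would retrain the network on the remaining samples at every pass, so the procedure costs one training run per data point.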

Table 3 Validation of the training data by cross-validation method

To evaluate the methods used in this study, the correlation coefficient, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) of the models were calculated using the relationships given by Cadenas and Rivera (2009), Chauhan and Shrivastava (2009), and Dehghani and Ahmadi (2008). The RMSE and MAE values for the validation period are 0.01 and 0.127, respectively, and the Mean Bias Error is −0.005. These results suggest that the averages of the observed and estimated values differ only slightly, so the model predicts the maximum temperature of the study area with little deviation.

Changing the combination of input neurons to create 5 different models makes it possible to select the best network structure for each model and then, by comparing the best structures of the 5 models, to select the model with the least error.

The results generally show that increasing the momentum term decreases the error up to a momentum of 0.8. In the above models, increasing the momentum term from 0.7 to 0.8 decreases the network error, but increasing it from 0.8 to 0.9 increases the error of each model. The results of changing the number of hidden layers show that increasing the number of hidden layers from one to two increases the network error. In many studies, only one hidden layer has been used because of its higher efficiency and faster performance (Sudheer et al. 2002; Wang et al., 2008; Chauhan and Shrivastava 2009).
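The role of the momentum term can be made concrete with the standard back-propagation weight update with momentum. This is the textbook form, given here as an assumption, because the exact rule implemented in Qnet 2000 is not stated in this paper. The symbols η (learning coefficient), α (momentum), E (network error), and Δw(t) (change of a weight w at iteration t) are notation introduced for this illustration:

$$ \Delta w(t)=-\eta \frac{\partial E}{\partial w}+\alpha \Delta w\left(t-1\right) $$

With α = 0 the update reduces to plain gradient descent. A larger α carries more of the previous weight change into the current step, which can speed up convergence but may cause overshooting when α is too large, in line with the behavior of the error reported above.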

The assessment of network performance indicates that the Root Mean Square Error and the Mean Absolute Error of Model 4 are 0.10 °C and 0.275 °C, respectively, the best performance in the testing phase. The closer these values are to zero, the better the performance of the network. The coefficient of determination and the correlation coefficient for both the training and testing phases are 0.99, and the correlation coefficient is significant at the 1% level (Fig. 6). Because the other assessment criteria of the network were also acceptable and the maximum temperature values predicted in the testing phase (Table 4) differ little from the true values (Fig. 7), it can be said that the designed network performs very well. The residual is the difference between the observed value and the predicted value. A scatter plot of the residuals against the independent variable can be drawn to examine the fitted model further. The residuals should be scattered around the zero axis. As shown in Fig. 8, the residuals are randomly distributed, which is the desired condition.

Fig. 6 The regression equation between the simulated and observed value of maximum temperature in the test period by the model 4

Table 4 The comparison of observed and predicted maximum temperature during testing phase
Fig. 7 Comparison of the simulated and observed value of maximum temperature in the test period by the model 4

Fig. 8 The scatter plot of residuals in the testing period

Qnet 2000 can calculate the relative importance of the variables present in each model. The importance of each input parameter can be determined from these values (Table 5).

Table 5 The relative importance of the input variables in the best structure of each combination (percent)

Of the 5 input climate parameters, two, namely the difference between maximum and minimum temperature and the mean temperature, have the most influence on the model output (the maximum temperature). The combination and number of parameters were changed in designing each model. The results show that a larger number of input parameters does not increase the accuracy of the model.

4 Final Results

Temperature is one of the important climate factors, a nonlinear, spatio-temporal phenomenon influenced by many climatic and geographical factors, and estimating it is of great importance. In this research, a neural network has been used as a powerful tool for modeling nonlinear and undetermined processes to predict the winter maximum temperature of Tehran. Examining the influence of the climate variables used as model inputs shows that Model 4, with three variables (mean temperature, sunny hours, and the difference between maximum and minimum temperature), is the most accurate model because it predicts the winter maximum temperature of Tehran with the least error and the highest correlation coefficient. The most suitable structure has three neurons in the input layer, one hidden layer with nine neurons, and the hyperbolic tangent function in the hidden layer, that is, a 3-9-1 arrangement, for which the Root Mean Square Error, correlation coefficient, and Mean Absolute Error are respectively 0.61, 0.997, and 0.001 for the training phase and 0.104, 0.997, and 0.311 for the testing phase. The coefficient of determination and the correlation coefficient for both the training and testing phases equal 0.99, and the correlation coefficient is significant at the 1% level. Given that the other network assessment criteria are acceptable and the estimated maximum temperature is close to the true values (Fig. 7), it can be said that the designed network performs very well. Neural networks have an advantage over classical models because they can represent nonlinear and undetermined processes. It should also be mentioned that neural networks contain linear models as a special case, which makes them more comprehensive than other classical methods.

In general, it can be concluded that the artificial neural network is a powerful model that can be viewed positively for predicting climatic and hydrological variables. In particular, its ability to extract the law governing the data, even noisy data, is one of its outstanding characteristics in comparison with other models. The results of predicting maximum temperature can be used in environmental planning, such as pest and disease control, water resources management, and ecological studies.