By inducing oxidative stress, air pollutants may lead to allergic inflammation and induce acute asthma exacerbations (Sfetsos and Vlachogiannis 2010). Because air pollutants can harm human health (Hong et al. 2011), the forecasting of air pollutant concentrations has received much attention. Air pollution forecasting is a complex issue that is closely related to human health and the environment (Wang et al. 2015). Nitrogen oxides (NOx = NO + NO2) are emitted into the urban atmosphere primarily from vehicle exhausts. Primary NOx emissions are mostly in the form of nitric oxide (NO), which then reacts with ozone (O3) to form nitrogen dioxide (NO2) (Gardner and Dorling 1999). Many studies have sought to determine the factors that control NOx and NO2 concentrations in order to enable the development of tools for forecasting pollutant concentrations. One approach to predicting future concentrations is to use a detailed atmospheric diffusion model. Such models aim to resolve the underlying physical and chemical equations controlling pollutant concentrations and therefore require detailed emissions data and meteorological fields. Collett and Oduyemi (1997) provide a detailed review of these types of models. The second approach is to devise statistical models that attempt to determine the underlying relationship between a set of input data (predictors) and targets (predictands). Regression modeling is an example of such a statistical approach and has been applied to air quality modeling and prediction in a number of studies (Shi and Harrison 1997; Ziomass et al. 1995). One limitation of linear regression models is that they underperform when used to model nonlinear systems (Gardner and Dorling 1998). The highly nonlinear processes governing the formation and dynamics of pollutant concentrations are only partially understood, and reliable prediction requires complex computer modeling and simulation (Brunelli et al. 2008).

Artificial neural network (ANN) modeling has proven to be a reliable tool for modeling air pollution time series (Gardner and Dorling 1998; Niska et al. 2004). It can be used to derive nonlinear functions relating the concentrations of pollutants to meteorological and source characteristics. The accuracy of a trained neural network model in predicting the concentrations of an unseen data set reflects the extent to which the input parameters have captured the emission and dispersion patterns that produced the observed concentrations (Elangasinghe et al. 2014). ANNs can model nonlinear systems and have been employed for this purpose (Chelani et al. 2002). Gardner and Dorling (1999) used a multilayer perceptron (MLP) approach to model NO and NO2 concentrations in London. They found that the temporal variation of emissions could be represented by using time of day and day of week as input variables. In addition, they used simple meteorological input variables, which provide some indication of atmospheric stability without the need to process the measured meteorological data. In air quality forecasting, the selection of an optimal input subset (Jiang et al. 2004) is an especially tedious task because of the large number of measurements from heterogeneous sources and their nonlinear interactions. Applications of ANNs in the atmospheric sciences generally give better results than linear methods (Gardner and Dorling 1998).

The findings of numerous studies also show that the performance of ANNs is generally superior to that of traditional statistical methods, such as multiple regression, classification and regression trees, and autoregressive models (Gardner and Dorling 2000; Chaloulakou et al. 2003a; Grivas and Chaloulakou 2006; Palani et al. 2008; Elangasinghe et al. 2014). In this paper, we used an ANN to forecast air pollution in a new geographic location (Tabriz) with different climatic conditions in order to confirm the findings of previous studies. To this end, we developed an ANN model to make accurate short-term (hourly) predictions of NOx and NO2 concentrations and to capture their relationship with meteorology in Tabriz.


Study area

The study was carried out in the city of Tabriz in northwestern Iran. Tabriz is the center of East Azerbaijan province and is located at 46.17° east longitude and 35.05° north latitude (Mojtabazadeh 2005). In the mid-twentieth century, Tabriz was selected as one of the industrial poles of Iran. The establishment of heavy industrial centers in the west and southwest is the main factor in Tabriz air pollution (Sadr Mousavi and Rahimi 2008). The city faces increasing development and population growth (Breuste and Rahimi 2015) and is the most densely populated city in northwestern Iran (Fig. 1).

Fig. 1 The study area map (author’s illustration)

Artificial neural network

Artificial neural networks (ANNs) are able to accurately approximate complicated nonlinear input–output relationships. Like their physics-based numerical model counterparts, ANNs require training or calibration. After training, each application of the trained ANN is an evaluation of a simple algebraic expression with known coefficients and is executed practically instantaneously. The ANN technique is flexible enough to accommodate additional constraints that may arise in the application (Palani et al. 2008). ANNs need a considerable amount of historical data to be trained; upon satisfactory training, an ANN should be able to provide output for previously “unseen” inputs (Palani et al. 2008; Antanasijević et al. 2013). The selection of input variables for an ANN forecasting model is a key issue, since irrelevant or noisy variables may have negative effects on the training process, resulting in an unnecessarily complex model structure and poor generalization power (Voukantsis et al. 2011).

ANNs represent complex, nonlinear functions with many parameters that are adjusted (calibrated or trained) in such a way that the ANN’s output becomes similar to the measured output on a known data set. The main differences between the various types of ANNs involve the network architecture and the method for determining the weights and functions for inputs and neurodes (training). The multilayer perceptron (MLP) neural network is designed to perform well in modeling nonlinear phenomena. A feed-forward MLP network consists of an input layer and an output layer with one or more hidden layers in between. Each layer contains a certain number of artificial neurons (Palani et al. 2008).

The general procedure for the ANN simulation includes the following steps:

  1. Representation of input and output vectors.

  2. Representation of the transfer function.

  3. Selection of the network structure.

  4. Selection of the random weights.

  5. Selection of the learning procedure.

  6. Presentation of the test pattern and prediction or validation set of data for generalization.

The multilayer perceptron (MLP) is a feed-forward layered network with one input layer, one output layer, and one or more hidden layers. Figure 2 shows an MLP with one hidden layer. Each node computes a weighted sum of its inputs and passes the sum through a soft nonlinearity. This soft nonlinearity, or activation function, of the neurons should be nondecreasing and differentiable. The most popular function is the unipolar sigmoid, Eq. (1):

Fig. 2 The structure of a three-layer MLP (author’s illustration)

$$ f\left(\theta \right)=\frac{1}{1+{e}^{-\theta }} $$

The task of the network is vector mapping, i.e., when the input vector \( X^q \) is presented, the network responds with the vector \( Z^q \) at its output (for q = 1,…,Q). The goal is to adapt the parameters of the network so as to bring the actual output \( Z^q \) close to the corresponding desired output \( d^q \) (for q = 1,…,Q). The most popular method for training an MLP is the back propagation algorithm, which is based on minimization of a suitable error or cost function. Here, the total sum squared error (TSSE) is taken as the cost function, Eq. (2):

$$ \mathrm{TSSE}=\sum_{q=1}^{Q}\sum_{k}{\left({d}_k^q-{z}_k^q\right)}^2 $$

where \( {d}_k^q \) and \( {z}_k^q \) are the components of the desired and actual output vectors, respectively. Training can be carried out in two modes: pattern mode and batch mode. Pattern mode is preferred because it is easier to implement and places less demand on memory. In pattern mode, the weights are corrected immediately after the error for each pattern is computed; in batch mode, the individual errors for all patterns are accumulated, and the accumulated error for the entire training set is then used to correct the weights. In the forward pass, the network outputs are computed by proceeding forward through the network, layer by layer, according to Eqs. (3) and (4):

$$ \left\{\begin{array}{l}{\mathrm{net}}_j=\sum_{i}{x}_i{w}_{ij}\\ {y}_j=\frac{1}{1+{e}^{-{\mathrm{net}}_j}}\end{array}\right.,\quad j=1,\dots ,{l}_2 $$
$$ \left\{\begin{array}{l}{\mathrm{net}}_k=\sum_{j}{y}_j{u}_{jk}\\ {z}_k=\frac{1}{1+{e}^{-{\mathrm{net}}_k}}\end{array}\right.,\quad k=1,\dots ,{l}_3 $$

where \( {w}_{ij} \) is the connection weight between input node i and hidden node j, \( {u}_{jk} \) is the connection weight between hidden node j and output node k, and \( {l}_2 \) and \( {l}_3 \) are the numbers of neurons in the hidden and output layers, respectively. In the backward pass, the gradients of the error E with respect to the weights, i.e., \( \frac{\partial E}{\partial {w}_{ij}} \) (for i = 1,…,\( {l}_1 \), j = 1,…,\( {l}_2 \)) and \( \frac{\partial E}{\partial {u}_{jk}} \) (for j = 1,…,\( {l}_2 \), k = 1,…,\( {l}_3 \)), are computed layer by layer, starting from the output layer and proceeding backwards. The connection weights between nodes of different layers are then updated by Eqs. (5) and (6):

$$ {u}_{jk}\left(n+1\right)={u}_{jk}(n)-\eta \frac{\partial E}{\partial {u}_{jk}}+\alpha \left({u}_{jk}(n)-{u}_{jk}\left(n-1\right)\right) $$
$$ {w}_{ij}\left(n+1\right)={w}_{ij}(n)-\eta \frac{\partial E}{\partial {w}_{ij}}+\alpha \left({w}_{ij}(n)-{w}_{ij}\left(n-1\right)\right) $$

where η is the learning rate, set between 0 and 1, and α is the momentum factor in the interval [0, 1], which is used to speed up convergence and to alleviate the local minima problem. The decision to stop training is based on a test of the network, which is carried out every N epochs after the TSSE becomes smaller than a threshold value (Vakil-Baghmisheh and Pavešic 2003; Rahimi 2016).
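The forward pass, cost function, and momentum update described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the study: the function names, the initial weight range, the values of η and α, and the toy OR-like data set are all assumptions, and a constant input of 1 is appended to act as a bias because Eqs. (3) and (4) contain no explicit bias term.

```python
import numpy as np


def sigmoid(theta):
    """Unipolar sigmoid of Eq. (1)."""
    return 1.0 / (1.0 + np.exp(-theta))


def forward(x, w, u):
    """Forward pass of Eqs. (3) and (4): input -> hidden -> output."""
    y = sigmoid(x @ w)  # hidden activations, shape (l2,)
    z = sigmoid(y @ u)  # network outputs, shape (l3,)
    return y, z


def tsse(X, D, w, u):
    """Total sum squared error of Eq. (2) over all patterns."""
    return sum(np.sum((d - forward(x, w, u)[1]) ** 2) for x, d in zip(X, D))


def train_pattern_mode(X, D, l2, eta=0.5, alpha=0.5, epochs=2000, seed=0):
    """Pattern-mode back propagation with momentum, Eqs. (5) and (6)."""
    rng = np.random.default_rng(seed)
    # Random initial weights; the range is an arbitrary illustrative choice.
    w = rng.uniform(-0.5, 0.5, size=(X.shape[1], l2))
    u = rng.uniform(-0.5, 0.5, size=(l2, D.shape[1]))
    dw_prev = np.zeros_like(w)  # w(n) - w(n-1)
    du_prev = np.zeros_like(u)  # u(n) - u(n-1)
    for _ in range(epochs):
        for x, d in zip(X, D):  # weights are corrected after every pattern
            y, z = forward(x, w, u)
            # Gradients of E = sum_k (d_k - z_k)^2, output layer first.
            delta_k = -2.0 * (d - z) * z * (1.0 - z)
            delta_j = (u @ delta_k) * y * (1.0 - y)
            grad_u = np.outer(y, delta_k)
            grad_w = np.outer(x, delta_j)
            # Gradient step plus momentum term.
            du = -eta * grad_u + alpha * du_prev
            dw = -eta * grad_w + alpha * dw_prev
            u += du
            w += dw
            du_prev, dw_prev = du, dw
    return w, u


# Toy data (hypothetical): two binary inputs plus a constant 1 acting as bias,
# with targets already scaled to the 0.2-0.8 range.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0.2], [0.8], [0.8], [0.8]])

w0, u0 = train_pattern_mode(X, D, l2=3, epochs=0)  # untrained weights
w1, u1 = train_pattern_mode(X, D, l2=3)            # trained weights
print("TSSE before training:", tsse(X, D, w0, u0))
print("TSSE after training: ", tsse(X, D, w1, u1))
```

Keeping the previous weight change in `du_prev` and `dw_prev` is what implements the α term of Eqs. (5) and (6); setting `alpha=0` reduces the update to plain gradient descent.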


Hourly air pollution concentration data were collected from the automatic monitoring network of the Department of the Environment during October and November 2012 for two monitoring sites (Abrasan and Farmandari) in Tabriz. Both sites are located in the busiest part of Tabriz and represent the most polluted parts of the city. The data from both sites were combined to produce a single data series.

Hourly meteorological data for the same period were obtained from the Tabriz Met. Office. These data were selected for this study because they are the most representative of the urban area as a whole and also contain relevant derived atmospheric turbulence parameters. The meteorological variables in this work were as follows:

  • Wind speed (m/s)

  • Wind direction (degree)

  • Precipitation (mm)

  • Vapor pressure (mbar)

  • Air temperature (°C)

  • Relative humidity (percent)

  • Total radiation (J)

  • Barometric pressure (mbar)

For use with the MLP, the meteorological data were normalized to the range 0–1 and the concentration data to the range 0.2–0.8. This was carried out by determining the maximum and minimum values of each variable over the whole data period and calculating normalized variables (Gardner and Dorling 1999). The available data set was separated into 745 training, 405 validation, and 232 testing patterns.
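The min–max normalization described above can be illustrated as follows. This is a minimal sketch that assumes a plain linear rescaling of each variable between its minimum and maximum; the function name `minmax_scale` and the sample values are hypothetical and are not data from the study.

```python
import numpy as np


def minmax_scale(values, low=0.0, high=1.0):
    """Linearly map each column of `values` from [min, max] onto [low, high]."""
    values = np.asarray(values, dtype=float)
    vmin = values.min(axis=0)
    vmax = values.max(axis=0)
    span = np.where(vmax > vmin, vmax - vmin, 1.0)  # guard constant columns
    return low + (high - low) * (values - vmin) / span


# Hypothetical sample values, not measurements from the study.
wind_speed = np.array([[0.5], [2.0], [3.5]])  # m/s, meteorological input
no2 = np.array([[10.0], [25.0], [40.0]])      # concentration target

x_scaled = minmax_scale(wind_speed, 0.0, 1.0)  # inputs mapped to 0-1
d_scaled = minmax_scale(no2, 0.2, 0.8)         # targets mapped to 0.2-0.8
print(x_scaled.ravel(), d_scaled.ravel())
```

Keeping the targets inside 0.2–0.8 leaves a margin at both ends of the sigmoid output range, which only approaches 0 and 1 asymptotically.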

Results and discussion

Choice of network structure

Feed-forward neural networks were used in this study. The architecture of a network is determined by the numbers of neurons in the input and output layers, while the number of hidden layers and the number of neurons in each hidden layer depend on the system being modeled. The design of the network architecture was based on the theory of Kolmogorov (Kolmogorov 1957). According to this theory, a feed-forward neural network containing at least one hidden layer with (2N + 1) neurons is able to approximate any continuous function mapping an N-dimensional input vector to an M-dimensional output vector. The theory does not prescribe a precise network architecture; rather, it provides a starting point for the optimization procedure.

In order to determine the optimum number of hidden nodes, a series of topologies was tested in which the number of hidden nodes was varied from 21 to 35. The starting point for the number of nodes was based on the Kolmogorov theory.

Each topology was trained three times to avoid chance results due to the random initialization of the weights. Figure 3 illustrates the relation between the network error and the number of neurons in the hidden layer. The root mean square error (RMSE) was used as the error function. The \( R^2 \) of each output was calculated by Eq. (7):

Fig. 3 Effect of the number of neurons in the hidden layer on the performance of the neural network in predicting NO2 and NOx concentrations for the test set (author’s illustration)

$$ {R}^2={\left(\frac{\sum_i\left({x}_i-\overline{x}\right)\left({y}_i-\overline{y}\right)}{{\left[\sum_i{\left({x}_i-\overline{x}\right)}^2\sum_i{\left({y}_i-\overline{y}\right)}^2\right]}^{1/2}}\right)}^2 $$

where \( {x}_i \) is the ith component of the original target vector, \( \overline{x} \) is the mean of the target vector, \( {y}_i \) is the ith component of the predicted vector, \( \overline{y} \) is the mean of the predicted vector, and i is the data index (Zupan and Gasteiger 1999).
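For illustration, the two performance measures can be computed as below. The sketch takes \( R^2 \) to be the squared correlation coefficient between target and predicted values, which is an assumption about how Eq. (7) is meant to be read; the function names are hypothetical.

```python
import numpy as np


def rmse(target, predicted):
    """Root mean square error between target and predicted values."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((target - predicted) ** 2)))


def r_squared(target, predicted):
    """Squared correlation coefficient between target and predicted values."""
    x = np.asarray(target, dtype=float)
    y = np.asarray(predicted, dtype=float)
    dx = x - x.mean()  # deviations of the targets from their mean
    dy = y - y.mean()  # deviations of the predictions from their mean
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return float(r ** 2)
```

Note that a measure of this kind is insensitive to a constant offset or scale error in the predictions, which is one reason to report it together with the RMSE.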

Neural model development

The selection of input variables is particularly crucial for the robustness and accuracy of the developed neural model. The following procedure was carried out for the selection of input variables:

In the first model configuration, meteorological variables were tested as inputs. Wind speed, wind direction, precipitation, vapor pressure, air temperature, relative humidity, barometric pressure, and total radiation were each used one at a time as the network input, with the NO2 and NOx concentrations as outputs. The number of input variables was then increased progressively, thereby increasing the number of model parameters. The criteria for adding a variable were the values of \( R^2 \) and RMSE: if adding a given input variable decreased the RMSE and increased \( R^2 \), the variable was retained in the model; if instead the RMSE increased and \( R^2 \) decreased, the variable was discarded and the procedure was repeated with another variable, because the selection of input variables has a significant effect on network performance. The selected network structure was 8-30-1. Good agreement was found between the predictions and the measured data.
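One way to read this procedure is as a greedy forward selection, sketched below. The sketch is an interpretation rather than the authors' code: `evaluate` is a hypothetical callable that would train a network on a given input subset and return its RMSE and \( R^2 \), and the toy scoring function and its numbers are invented purely so that the example runs.

```python
def forward_select(candidates, evaluate):
    """Greedy forward selection of input variables.

    `evaluate(subset)` is a hypothetical callable returning (rmse, r2)
    for a model trained on the given list of input variables.
    """
    selected = []
    best_rmse, best_r2 = float("inf"), float("-inf")
    for name in candidates:
        trial = selected + [name]
        rmse, r2 = evaluate(trial)
        # Keep the variable only if RMSE drops and R^2 rises.
        if rmse < best_rmse and r2 > best_r2:
            selected = trial
            best_rmse, best_r2 = rmse, r2
    return selected, best_rmse, best_r2


def toy_evaluate(subset):
    """Stand-in for network training; the scores are invented."""
    useful = {"relative_humidity": 0.3, "air_temperature": 0.2}
    gain = sum(useful.get(name, -0.05) for name in subset)
    return 1.0 - gain, gain  # (rmse, r2)


candidates = ["relative_humidity", "precipitation", "air_temperature"]
selected, best_rmse, best_r2 = forward_select(candidates, toy_evaluate)
print(selected)
```

In this sketch the result depends on the order in which candidates are tried, and the first candidate is always accepted; the text does not say how the original study ordered the variables.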

For the prediction of NO2 and NOx concentrations, we added NO (ANN-predicted) and O3 concentrations to the set of input variables. With this modification, a significant increase in \( R^2 \) was observed. This may be due to the interaction between these species, which can be described by the reaction set of Eqs. (8) to (10) (Gürmen and Fogler 2006):

$$ {\mathrm{NO}}_2+h\nu \to \mathrm{NO}+\mathrm{O} $$
$$ \mathrm{O}+{\mathrm{O}}_2\to {\mathrm{O}}_3 $$
$$ {\mathrm{O}}_3+\mathrm{NO}\to {\mathrm{NO}}_2+{\mathrm{O}}_2 $$

For the selected network, the RMSE is 0.0046 and 0.0038 and \( R^2 \) is 0.92 and 0.94 for NO2 and NOx, respectively. Table 1 shows the effect of input variable selection on network performance.

Table 1 The effect of different inputs on optimized network performance

Figures 4 and 5 compare the predictions with the testing data from October and November 2012 for NO2 and NOx concentrations, respectively. The predictions generated by the MLP model closely follow the measured data. Multiple linear regression (MLR) models were also developed in this work for comparison. The best MLR model gave RMSE values of 3.6 and 2.94 and \( R^2 \) values of 0.41 and 0.44 for NO2 and NOx concentrations, respectively. These results show that, for air pollution modeling, the performance of ANNs is generally superior to that of MLR as a traditional statistical method.
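An MLR baseline of this kind can be fitted by ordinary least squares, as in the sketch below. The paper does not state how its MLR models were fitted, so the use of `numpy.linalg.lstsq`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new inputs."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


# Synthetic, noise-free data (hypothetical): y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]

coef = fit_mlr(X_demo, y_demo)
y_hat = predict_mlr(coef, X_demo)
print(coef)
```

Because the model is linear in its inputs, it cannot represent interactions such as those between NO, O3, and the meteorological variables unless they are added explicitly as extra predictors.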

Fig. 4 Comparison of observed and calculated values for NO2 (author’s illustration)

Fig. 5 Comparison between observed and calculated values of NOx (author’s illustration)

Importance analyses

The Garson method (Garson 1991), as presented by Olden and Jackson (2002), is based on partitioning the connection weights of the hidden and output layers. It determines the relative importance (I) of the jth input neuron for the output neuron. This relative importance is defined as:

$$ {I}_j=\frac{\sum_{m=1}^{N^h}\left(\frac{\left|{W}_{jm}^{ih}\right|}{\sum_{l=1}^{N^i}\left|{W}_{lm}^{ih}\right|}\times \left|{W}_{mn}^{ho}\right|\right)}{\sum_{k=1}^{N^i}\left\{\sum_{m=1}^{N^h}\left(\frac{\left|{W}_{km}^{ih}\right|}{\sum_{l=1}^{N^i}\left|{W}_{lm}^{ih}\right|}\times \left|{W}_{mn}^{ho}\right|\right)\right\}} $$

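where \( N^h \) is the number of neurons in the hidden layer, \( N^i \) is the number of input weights of each hidden neuron (i.e., the number of input neurons), \( {W}_{jm}^{ih} \) is the weight connecting input neuron j to hidden neuron m, and \( {W}_{mn}^{ho} \) is the weight connecting hidden neuron m to output neuron n.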
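The weight-partitioning calculation can be sketched with NumPy as follows. This is a minimal illustration of the Garson calculation for a single output neuron, not the code used in the study; the function name, argument layout, and the small weight matrix are assumptions.

```python
import numpy as np


def garson_importance(w_ih, w_ho):
    """Relative importance of each input for one output neuron.

    w_ih: input-to-hidden weights, shape (n_inputs, n_hidden)
    w_ho: hidden-to-output weights for that output, shape (n_hidden,)
    """
    w_ih = np.abs(np.asarray(w_ih, dtype=float))
    w_ho = np.abs(np.asarray(w_ho, dtype=float))
    # Share of each input in every hidden neuron's incoming weights.
    share = w_ih / w_ih.sum(axis=0, keepdims=True)
    # Weight each share by the hidden neuron's connection to the output.
    contribution = (share * w_ho).sum(axis=1)
    # Normalize so that the importances sum to 1.
    return contribution / contribution.sum()


# Hypothetical weights for a network with 2 inputs and 2 hidden neurons.
w_ih_demo = [[1.0, -1.0], [3.0, 1.0]]
w_ho_demo = [1.0, -1.0]
importance = garson_importance(w_ih_demo, w_ho_demo)
print(importance * 100)  # importance in percent
```

Because only absolute values of the weights enter the calculation, the result indicates the magnitude of each input's influence but not its direction.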

Applying the Garson method in this work reveals that the NO (ANN-predicted) concentration, relative humidity, and air temperature are the most important variables for the prediction of NO2 and NOx concentrations (Fig. 6).

Fig. 6 Calculated importance (%) of each input variable in the prediction of NO2 and NOx concentrations (author’s illustration)

Why is air pollution forecasting important?

Air pollution, which is rapidly increasing due to various human activities, is the introduction into the atmosphere of chemicals, particulates, or biological materials that cause discomfort, disease, or death to humans, damage other living organisms such as food crops, or damage the natural or built environment. Indeed, air pollution is one of the most important environmental problems in metropolitan and industrial cities (Garcia Nieto and Alvarez Antón 2014). Stoves in homes, vehicles, factories, and fires are different sources of air pollution. Both ambient (outdoor) and household (indoor) pollution exert many harmful effects on human health and the environment (Bedoui et al. 2016). Increasing air pollution has become a global problem that is triggering both official anxiety and public concern. As reported in an assessment by the World Health Organization (WHO 2014), air pollution has become the largest single environmental health risk in many parts of the world, and around seven million people died from air pollution exposure in 2012, equivalent to one in eight of total global deaths (Xie et al. 2016).

Air pollution in all major cities of Iran has reached a dangerous and alarming level. Air pollution poses a dire risk to Iranians today. The consequences can be measured in the numbers of pollution-related deaths, the number of school and work days lost to pollution, and additional health challenges experienced by children, the elderly, and people with heart or lung conditions (Khani 2016). These are drastic times for Iran’s big cities such as Tabriz.

The public is informed of the air quality index (AQI), calculated from forecasted air pollutant concentrations, and of the associated health risks through government announcements (Zhang et al. 2012). Therefore, an accurate and reliable model for forecasting air pollutant concentrations is important, since it can provide air pollution information at an early stage and thereby guide air pollution control and public health protection (Bai et al. 2016).

In recent years, many research efforts have been made to develop air quality prediction models. Atmospheric dispersion models used to predict the ground-level concentrations of air pollutants around sources (Kesarkar et al. 2007; Bhaskar et al. 2008; Singh et al. 2012) require precise knowledge of several source parameters and of the meteorological conditions (Collett and Oduyemi 1997; Gardner and Dorling 1998).

Linear and nonlinear methods for air pollution forecasting

In recent decades, air pollution has been considered a serious threat to the environment, the quality of life, and the health of people around the world, and the forecasting of air quality parameters is a common goal of a great many studies because of the diseases caused by the different gaseous pollutants. Recently, there have been many attempts to analyze the concentrations of air pollutants and use them to build short-term concentration forecasts. Both linear and nonlinear models have been developed; however, no significant difference was noted between the nonlinear and linear models (Pires et al. 2008a; Pires et al. 2008b).

Statistical models attempt to determine the underlying relationship between a set of input data and targets. Several linear (multiple linear regression, principal component regression, partial least squares regression) and nonlinear (multivariate polynomial regression, artificial neural networks, support vector machines) regression models are now available that can relate the input and output variables (Singh et al. 2012). Although linear regression modeling finds some applications in air quality prediction (Shi and Harrison 1997), it generally does not allow for the complexity and nonlinearity in the data (Gardner and Dorling 1998). Partial least squares (PLS) is a multivariate regression method that projects the input–output data down into a latent space, extracting a number of principal factors with an orthogonal structure, while capturing most of the variance in the original data. Multivariate polynomial regression (MPR) captures nonlinearities in data to some extent and is considered a low-order nonlinear method (Singh et al. 2010). The ANN, which has the capabilities of nonlinear mapping, self-adaption, and robustness, has proved its superiority and is widely used in forecasting fields. Recently, various ANN structures have been developed to improve the forecasting of air pollutant concentrations (Bai et al. 2016). The results of this work confirm that ANNs in air pollution forecasting generally give better results than linear methods.

Tabriz air pollution sources and the role of meteorological parameters in exacerbating pollution

Environmental pollution is a challenge to modern society, especially in developing countries such as Iran. At the beginning of the century, industrialization and the expansion of factories became a main concern in Iran’s big cities. The city of Tabriz, which could support a large population, became one of the industrial poles of the country. In the years 1967–1975, Tabriz underwent new changes and developments. However, in the process of industrialization and the siting of manufacturing plants and factories, some decision makers and executive managers did not take the geographical and topographical conditions of the city into consideration; as a result, Tabriz became more and more polluted, and public health and hygiene were endangered (Mojtabazadeh 2005). Heavy industries such as chemical and petrochemical plants, a thermal power plant, and an oil refinery were established in the west and southwest of Tabriz, and winds blowing from the west and southwest carry their pollution into the inner city (Sadr Mousavi and Rahimi 2008). In the recent decade, changing patterns of vehicle use, particularly in urban areas, and the increasing use of private cars instead of public transport have made vehicles a significant source of atmospheric emissions and of Tabriz air pollution. Industrial factories and vehicles are therefore now the main sources of air pollution in Tabriz.

Previous studies of Tabriz air pollution indicate that meteorological parameters, and wind in particular, are the main variables in the intensification and alteration of Tabriz air pollution (Sadr Mousavi and Rahimi 2008, 2010; Mojtabazadeh 2005). However, the results of this paper show that relative humidity and temperature are the main meteorological variables in the prediction of NO2 and NOx concentrations, and that the importance of wind for NO2 and NOx modeling is approximately 7%. By contrast, in the study of Sadr Mousavi and Rahimi (2008), the importances of wind speed and wind direction in CO concentration modeling were 19.17 and 14.12%, respectively. Based on the results of this work, therefore, we cannot conclude, as Mojtabazadeh (2005) did, that wind is the main meteorological variable in Tabriz air pollution.


Fluctuations of hourly NO2 and NOx concentrations in the Tabriz atmosphere during October and November 2012 were studied. The ANN was found to be a useful tool for the short-term prediction of NO2 and NOx concentrations. An ANN trained with the scaled-conjugate-gradient (trainscg) training algorithm was implemented to model NO2 and NOx concentrations. The optimum structure of the ANN was determined by obtaining the minimum RMSE for the test set. The ANN structure with 30 neurons in the hidden layer was found to have the best performance. It was also demonstrated that MLP neural networks had advantages over traditional MLR models.

This work shows that MLP neural networks can accurately model the relationship between local meteorological data and NO2 and NOx concentrations in an urban environment, and that the performance of ANNs is generally superior to that of traditional statistical methods such as multiple regression. This paper thus confirms the findings of earlier studies (Gardner and Dorling 2000; Chaloulakou et al. 2003b; Grivas and Chaloulakou 2006; Palani et al. 2008; Elangasinghe et al. 2014) in a new geographic location. The fluctuations of NO2 and NOx concentrations in this work could be influenced by local meteorological factors such as relative humidity and temperature.