Comparative study on the effects of meteorological and pollutant parameters on ANN modelling for prediction of SO_{2}
Abstract
Variations in meteorological parameters, different transportation mechanisms, complex reaction mechanisms and insufficient control measures have made the control of ambient air pollution a challenge. Mechanisms that predict pollutant concentrations in advance have become necessary to keep these concentrations within acceptable limits. Artificial neural network (ANN) modelling can be used for predicting the concentrations of pollutants by establishing functional relationships between complex and nonlinear predictor variables and outputs. The application of time-series ANN models to predict air quality parameters was investigated by using six pollutant variables, five meteorological parameters and three time parameters to predict the concentration of SO_{2}. An industrial belt in the southern part of India was selected as the study area. Two years of input data and a feed-forward back propagation algorithm were used to construct the ANN model. Input parameters were optimized using forward selection and backward elimination techniques. Mean squared error (MSE) and coefficient of determination (R^{2}) were used to evaluate the models. The developed models exhibited very promising performance, with the best model resulting in an MSE of 0.0115 and an R^{2} value of 0.8979. The hybrid ANN models employing input parameter optimization techniques performed better than conventional models. Every model exhibited at least a 9% reduction in MSE and a 3% improvement in correlation. Pollutant parameters were found to affect the ANN models more than the meteorological parameters. When compared with the National Ambient Air Quality Standards, the maximum predicted value was only 55% of the maximum allowable concentration in ambient air.
Keywords
Artificial neural network (ANN), Air quality modelling, Prediction, Backpropagation, SO_{2} concentration, Feedforward, Time-series
Mathematics Subject Classification
92B20
JEL Classification
C15 C45 Q35
1 Introduction
Ambient air pollution is one of the serious problems encountered by developing countries like India. In 2018, India was ranked 177 among 180 countries in the Environmental Performance Index (EPI) [1]. The study area selected is the industrial area of Ambalamugal in the South Indian state of Kerala. It is ranked 24th amongst the critically polluted areas (CPA) in India [2]. The major industries located in the area comprise a petroleum refinery, a phosphatic fertiliser unit, a petrochemical plant and a carbon black manufacturing industry. In the urban environment, various types of pollutants are released into the atmosphere at varying concentrations and at different heights. Meteorological parameters also determine the transportation, dilution, dispersion, transformation, deposition and absorption of pollutants [3].
The main air pollutants in the atmosphere are carbon monoxide (CO), nitrogen oxides (NO_{x}), particulate matter (PM), sulphur dioxide (SO_{2}) and ozone (O_{3}) [4]. Exposure to air pollutants escalates the risk of contracting respiratory diseases, such as asthma, respiratory infections and chronic obstructive pulmonary disease, in children and adults alike [5]. Considering the area of study, sulphur dioxide (SO_{2}) and sulphur trioxide (SO_{3}) are the major oxides of sulphur responsible for pollution, along with sulphate-containing compounds (SO_{4}^{2−}). They are mainly produced by the combustion of fuel containing sulphur and by refining, manufacturing processes, municipal incineration and metal extraction processes. SO_{2} can cause acid rain, corrosion and damage to human, plant and animal health. These pollutants also pose heightened danger due to their synergistic effects, as they are likely to interact when present in the same medium [6].
It is important that sufficient tools are available to predict and forecast the concentrations of these harmful pollutants well in advance. Air pollution models currently being adopted are of two types: (i) deterministic or mechanistic models and (ii) statistical or data-driven models [7, 8]. The former approach relies on a mathematical representation of the various pollutant transportation mechanisms and their chemical reactions. Hence, such models tend to be strenuous, exhaustive and computationally expensive. The latter, on the other hand, can identify the relationships between the input and output variables without having to evaluate all transportation mechanisms or chemical transformations of the pollutants. The artificial neural network (ANN) is a statistical model, and its capability of learning, training and predicting with parameters that have nonlinear relationships with each other is utilized in this work [9]. An ANN model is designed to replicate the simple function of a biological network and is used for solving complex nonlinear functions. ANNs can recognize nonlinear relationships between variables and complex patterns in data sets that may not be adequately described by simple mathematical equations [6]. They can also model relationships without making any prior assumptions regarding the data distributions [10]. If sufficiently trained, statistical models like ANN are found to be more appropriate than deterministic models in determining dependencies between pollutant concentration and predictor parameters [11].
The most common network used for prediction modelling is the multilayer perceptron feed-forward network, where the information flows in one direction from input to output nodes [16]. The aim of this study was to develop an optimized feed-forward ANN model, employing the back propagation algorithm, suitable for predicting the air quality parameters for the South Indian industrial area of Ambalamugal. A multilayer feed-forward ANN model is characterized by one input layer, at least one hidden layer and one output layer, each composed of neurons [13]. ANN models with the Levenberg–Marquardt backpropagation algorithm and one hidden layer are usually used for prediction models [12, 15, 17]. The backpropagation algorithm feeds back the error produced by the neural network to the nodes to modify the connection weights and thresholds [12]. For this particular study, which was intended to predict air quality parameters, time-series modelling was utilized. Time-series modelling analyses historical data and extrapolates future values from it [18]. It employs a type of dynamic filtering, i.e. the past values of one or more time series are utilized to predict future values. Such networks, referred to as dynamic neural networks, utilize tapped delay lines for nonlinear filtering and prediction. ANN models have been effectively used to determine nonlinear solutions to prediction problems. They can approximate any quantifiable function if they are designed appropriately and trained with appropriate historical data. An ANN can predict values even in the presence of noisy data, and it learns the patterns and generalizations in the available data.
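To illustrate the tapped delay line idea described above, the following minimal Python sketch turns a single series into input/target pairs, where each input is a window of past values and the target is the next value. The function name `make_lagged_samples`, the window length and the sample readings are hypothetical and chosen only for illustration; the study itself used the MATLAB toolbox, whose internals are not reproduced here.

```python
def make_lagged_samples(series, n_lags):
    """Build (past_window, next_value) pairs from a 1-D series.

    Each input holds the n_lags values that precede the target,
    which is how a tapped delay line presents history to a network.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    samples = []
    for t in range(n_lags, len(series)):
        past_window = list(series[t - n_lags:t])  # values at t-n_lags ... t-1
        samples.append((past_window, series[t]))  # target is the value at t
    return samples


# Hypothetical daily SO2 readings (illustrative numbers, not study data)
so2 = [12.0, 14.5, 13.2, 15.8, 16.1, 14.9]
pairs = make_lagged_samples(so2, n_lags=2)
```

Each pair could then be fed to a feed-forward network as one training example; longer windows give the network more history at the cost of more inputs.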
If seasonal variations are duly considered, a full year of data is sufficient for developing statistical models, provided the emissions are present throughout the year [19]. The data sets have to be partitioned into training, validation and testing data sets such that each subset represents the entire data set [15]. The data sets used in the modelling have to be cleaned, randomized and normalized. Removal of unusual values, errors and outliers is necessary before modelling, as the network learns according to the input data sets. If introduced to an erroneous data set, the network will learn accordingly and ultimately produce erroneous results. The data sets have to be normalized to ensure that all the input values fall into a comparable range. If not normalized, inputs of higher numerical magnitude tend to mask inputs of numerically smaller values [16].
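The normalization and partitioning steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions: scaling to the range [0, 1], the 70/15/15 fractions as defaults, and the helper names `min_max_normalize` and `split_indices` are all choices made for this example, not details confirmed by the study.

```python
import random


def min_max_normalize(column):
    """Scale one input column to [0, 1] (the target range is an assumption)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def split_indices(n_obs, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle observation indices and cut them into three disjoint subsets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # randomize before partitioning
    n_train = round(train_frac * n_obs)
    n_val = round(val_frac * n_obs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


scaled = min_max_normalize([10.0, 20.0, 30.0])
train_idx, val_idx, test_idx = split_indices(20)
```

Shuffling before the cut is what lets each subset stand in for the whole data set; with strongly seasonal data, a stratified split may be a better choice.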
As the number of input parameters increases, the complexity of the developed ANN model also increases, which results in degraded model performance. Hence it is imperative that input optimization techniques are applied to narrow the parameters down to an optimum set by eliminating those variables that have the least effect on the model’s efficiency [15]. Forward selection starts with no predictor variable in the model and successively adds those variables that have the highest correlation with the target output. Backward elimination begins with all predictor input variables and successively eliminates those parameters that provide the least increase in the squared error [15].
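A greedy version of the two procedures described above can be sketched as below. Note the assumptions: both functions rank candidate inputs with a caller-supplied `score` function (lower is better, for example the validation MSE of a network trained on that subset), which is a common wrapper-style variant and not necessarily the exact criterion used in the study. The `toy_score` function and its weights are purely hypothetical stand-ins for training a network.

```python
def forward_selection(candidates, score):
    """Start empty; repeatedly add the input that lowers the score most."""
    selected, best = [], float("inf")
    remaining = list(candidates)
    while remaining:
        trial = {name: score(selected + [name]) for name in remaining}
        winner = min(trial, key=trial.get)
        if trial[winner] >= best:
            break  # no remaining input improves the model
        selected.append(winner)
        remaining.remove(winner)
        best = trial[winner]
    return selected, best


def backward_elimination(candidates, score):
    """Start full; repeatedly drop the input whose removal hurts least."""
    selected = list(candidates)
    best = score(selected)
    while len(selected) > 1:
        trial = {name: score([s for s in selected if s != name])
                 for name in selected}
        loser = min(trial, key=trial.get)
        if trial[loser] > best:
            break  # every removal makes the model worse
        selected.remove(loser)
        best = trial[loser]
    return selected, best


# Hypothetical scoring: useful inputs reduce error, every input adds a
# small complexity penalty. A real score would train and validate an ANN.
USEFUL = {"SO2": 0.5, "NH3": 0.2, "DW": 0.1}


def toy_score(subset):
    gain = sum(USEFUL.get(name, 0.0) for name in subset)
    return 1.0 - gain + 0.01 * len(subset)


inputs = ["SO2", "NOx", "NH3", "DW", "RH"]
fwd, fwd_score = forward_selection(inputs, toy_score)
bwd, bwd_score = backward_elimination(inputs, toy_score)
```

On this toy score both procedures settle on the same three inputs; with a real network the two searches can end on different subsets, which is why both are usually tried.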
The number of hidden layers, the number of neurons constituting the hidden layers, the transfer functions and the training algorithm also affect the performance of the ANN models [13]. Training is the process of determining connection weights and thresholds to minimize the difference between the actual and predicted outputs [12]. Mean squared error (MSE) and coefficient of determination (R^{2}) are some of the parameters used for evaluation of developed networks. The closer R^{2} is to 1 and the closer MSE is to 0, the better the network’s performance is considered to be [6]. The work also optimizes input parameters based on the performance of the ANN model using forward selection and backward elimination. The optimized models are then compared using these performance criteria. The effects of meteorological and pollutant parameters on the prediction capability of the ANN model are also compared.
There is a crucial need to monitor the concentrations of various pollutants and to understand the role of various meteorological and pollutant parameters in the prediction of ambient air quality parameters. Even though studies are available on the prediction of ambient air quality parameters, air quality models for monitoring pollutant concentrations are not popular in the developing world. In particular, in a critically polluted area like Ambalamugal in Kerala, where the study was conducted, continuous ambient air quality monitoring facilities are available, but additional tools like ANN models are needed to predict the concentrations of air quality parameters when the equipment is taken out for maintenance or calibration. During the severe floods which affected the state of Kerala on 8 August 2018, caused by unusually high rainfall during the monsoon season, the monitoring equipment was found to be non-functional. It is imperative that auxiliary mechanisms or techniques that can predict the air quality parameters are developed, in addition to the direct measurement devices installed in these areas. Additionally, the effects of pollutant and meteorological parameters on the prediction capabilities of ANN-based modelling need to be compared. This study aims to determine the capability of ANNs in predicting air quality parameters in an industrial area, and this is demonstrated by assessing how well the concentration of sulphur dioxide (SO_{2}) can be predicted using various input parameters. Owing to site specificity, ANN models are individually constructed for each location. A neural network constructed and trained for a particular monitoring location will not be effective in predicting the pollutant concentrations at a different location, as the predictor variables vary between sites [11, 17]. Hence the applicability of the developed model is limited to the site under study.
2 Materials and methods
Artificial neural networks are statistical models used for training and prediction of outputs that have nonlinear relationships with their corresponding inputs. The topology, ANN parameters and learning algorithm were selected so as to predict the outputs with the required level of performance.
2.1 Study area
The study area selected was Ambalamugal, an industrial suburb in the South Indian state of Kerala. It is located 14 km east of Ernakulam in Kerala and lies at an altitude of 12 m above sea level.
2.2 Software
The Neural Network Toolbox of MATLAB R2018a (The MathWorks Inc. USA) was used for constructing the air quality prediction model. The Neural Network Toolbox allowed selection of various parameters for configuring the desired network architecture.
2.3 Data pre-processing
The prediction model was developed using data sets collected from September 2016 to September 2018, a period of 2 years. Pollutant data were collected from the three continuous ambient air quality monitoring stations maintained by the petroleum refinery located in the study area. The stations are capable of measuring six air quality parameters, namely sulphur dioxide (SO_{2}), nitrogen oxides (NO_{x}), ammonia (NH_{3}), carbon monoxide (CO), particulate matter of size less than 10 µm in diameter (PM_{10}) and particulate matter of size less than 2.5 µm in diameter (PM_{2.5}). Meteorological data were collected from the Indian Meteorological Department (IMD). The meteorological parameters considered were surface temperature (T), rainfall (RF), relative humidity (RH), wind direction (WD) and wind velocity (WV; listed as WS in the results table). The time parameters considered for the study were month of the year (MY), day of the month (DM) and day of the week (DW). These were encoded as values from 1 to 12 for January to December, values from 1 to 31 for the day of the month, and values from 1 to 7 for the day of the week (1 for Sunday to 7 for Saturday).
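The time-parameter encoding described above can be sketched with Python's standard `datetime` module. The function name `time_parameters` is hypothetical; the only non-obvious step is converting Python's Monday-based weekday numbering to the Sunday-based numbering stated in the text.

```python
from datetime import date


def time_parameters(day):
    """Return (MY, DM, DW) for a calendar date, with DW running 1=Sunday ... 7=Saturday."""
    month_of_year = day.month  # 1 (January) ... 12 (December)
    day_of_month = day.day     # 1 ... 31
    # isoweekday() gives Monday=1 ... Sunday=7; shift so that Sunday becomes 1
    day_of_week = day.isoweekday() % 7 + 1
    return month_of_year, day_of_month, day_of_week


example = time_parameters(date(2018, 8, 8))  # a Wednesday
```

Encoding the calendar fields as plain integers keeps the inputs simple, although it treats December (12) and January (1) as far apart; a cyclic encoding would be an alternative design choice.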
2.4 ANN model
The models were built from the pollutant, meteorological and time parameters. Accordingly, three different types of architectures were considered. Model A was constructed using all 14 input parameters. Model B was constructed using only SO_{2}, meteorological and time parameters; the other pollutant parameters were excluded. Model C was constructed using only the pollutant and time parameters; the meteorological parameters were excluded.
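As a compact restatement of the three architectures, the input sets can be written as a small Python mapping. The variable names are hypothetical, and the abbreviations follow those defined in the data pre-processing section.

```python
# Parameter groups as defined in the data pre-processing section
POLLUTANT = ["SO2", "NOx", "NH3", "CO", "PM10", "PM2.5"]
METEOROLOGICAL = ["T", "RF", "RH", "WD", "WV"]
TIME = ["MY", "DM", "DW"]

# Inputs offered to each model family before any input optimization
MODEL_INPUTS = {
    "A": TIME + POLLUTANT + METEOROLOGICAL,  # all parameters
    "B": TIME + ["SO2"] + METEOROLOGICAL,    # other pollutants excluded
    "C": TIME + POLLUTANT,                   # meteorology excluded
}

input_counts = {name: len(inputs) for name, inputs in MODEL_INPUTS.items()}
```

The counts come out as 14, 9 and 9 inputs for Models A, B and C, matching the totals quoted in the model descriptions.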
2.5 Input parameter optimization
List of ANN models considered in the study
ANN model | Model description |
---|---|
A1 | ANN model constructed with all 14 input parameters (six pollutant + five meteorological + three time parameters) |
A2 | ANN model constructed with all 14 input parameters and Forward Selection (six pollutant + five meteorological + three time parameters) |
A3 | ANN model constructed with all 14 input parameters and Backward Elimination (six pollutant + five meteorological + three time parameters) |
A4 | ANN model constructed with the input parameters after optimization (six pollutant + five meteorological + three time parameters) |
B1 | ANN model constructed with all nine input parameters (five meteorological + three time parameters + SO_{2}) |
B2 | ANN model constructed with all nine input parameters and Forward Selection (five meteorological + three time parameters + SO_{2}) |
B3 | ANN model constructed with all nine input parameters and Backward Elimination (five meteorological + three time parameters + SO_{2}) |
C1 | ANN model constructed with all nine input parameters (six pollutant + three time parameters) |
C2 | ANN model constructed with all nine input parameters and Forward Selection (six pollutant + three time parameters) |
C3 | ANN model constructed with all nine input parameters and Backward Elimination (six pollutant + three time parameters) |
2.6 Model performance evaluation criteria
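As a minimal sketch of the two criteria used in this study, MSE and R^{2} can be computed as below. The formulas are the standard textbook definitions and are an assumption here, since the exact expressions used by the authors are not shown; the function names and sample values are illustrative only.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted outputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not study data
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

A perfect model gives MSE = 0 and R^{2} = 1. MSE depends on the scale of the target, so values are comparable only between models trained on identically normalized data.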
3 Results and discussion
Model A was developed using 9184 input data sets (14 parameters × 656 observations) collected from September 2016 to September 2018, a period of 2 years. The data sets were partitioned such that 70% of the data (6440 data sets) was used for training, 15% (1372 data sets) for validation and 15% (1372 data sets) for testing. Similarly, Model B was developed using 5904 input data sets (9 parameters × 656 observations). Model C was developed using 9 input parameters, which consisted of pollutant and time variables. This prediction model was likewise developed using 5904 input data sets (9 parameters × 656 observations).
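The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes, as an inference from the reported figures rather than a stated fact, that each "data set" is one parameter value and that the partition was made over whole observations (rows of 14 values for Model A); the helper name is hypothetical.

```python
N_OBSERVATIONS = 656
N_PARAMETERS_A = 14


def values_to_observations(n_values, n_parameters):
    """Convert a count of individual values into whole observations (rows)."""
    observations, remainder = divmod(n_values, n_parameters)
    if remainder:
        raise ValueError("count is not a whole number of observations")
    return observations


total_values = N_PARAMETERS_A * N_OBSERVATIONS
train_obs = values_to_observations(6440, N_PARAMETERS_A)
val_obs = values_to_observations(1372, N_PARAMETERS_A)
test_obs = values_to_observations(1372, N_PARAMETERS_A)
train_share = train_obs / N_OBSERVATIONS
```

Under that reading, the split corresponds to 460, 98 and 98 observations, which sum to 656 and give a training share of roughly 70%.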
Performance evaluation of ANN models
Model | Model parameters | Inputs excluded from the model | No. of iterations | MSE | R^{2} |
---|---|---|---|---|---|
A1 | MY, DM, DW, SO_{2}, NO_{x}, NH_{3}, CO, PM_{10}, PM_{2.5}, T, RF, RH, WD, WS | – | 11 | 0.0188 | 0.8325 |
A2 | DM, DW, SO_{2}, NH_{3}, CO, WS | MY, NO_{x}, PM_{10}, PM_{2.5}, T, RF, RH, WD | 11 | 0.0129 | 0.8679 |
A3 | DM, DW, SO_{2}, NH_{3}, CO, PM_{10}, RF, WD | MY, NO_{x}, PM_{2.5}, T, RH, WD | 11 | 0.0158 | 0.8617 |
A4 | DM, DW, SO_{2}, NH_{3}, CO, PM_{10}, RH, WD, WS | MY, NO_{x}, PM_{2.5}, T, RH | 10 | 0.0124 | 0.8727 |
B1 | MY, DM, DW, SO_{2}, T, RF, RH, WD, WS | – | 15 | 0.0162 | 0.8675 |
B2 | MY, DW, SO_{2}, T, RF, WD | DM, RH, WS | 13 | 0.0152 | 0.8914 |
B3 | MY, DW, SO_{2}, T, RF, WD | DM, RH, WS | 10 | 0.0124 | 0.8935 |
C1 | MY, DM, DW, SO_{2}, NO_{x}, NH_{3}, CO, PM_{10}, PM_{2.5} | – | 9 | 0.0171 | 0.8559 |
C2 | MY, DW, SO_{2}, NH_{3}, PM_{2.5} | DM, NO_{x}, CO, PM_{10} | 14 | 0.0130 | 0.8944 |
C3 | MY, DW, SO_{2}, NH_{3}, PM_{2.5} | DM, NO_{x}, CO, PM_{10} | 15 | 0.0115 | 0.8979 |
Among the Model A variants, Model A4, which was constructed using input parameters chosen through the input optimization techniques, provided better results than the remaining models. Among the Model B variants, the ANN model combined with backward elimination provided the best results, and Model C followed a similar pattern. Out of all 10 models evaluated, the best result was exhibited by Model C3, which was constructed using pollutant and time parameters only. According to this model, five parameters, namely month of the year, day of the week and the concentrations of SO_{2}, NH_{3} and PM_{2.5}, were found to contribute the most to the prediction capability of the model.
The predicted SO_{2} concentrations were also compared with the National Ambient Air Quality Standards values [21] and were all below the allowable limit of 80 µg/m^{3} (24 h TWA value). The maximum predicted value of SO_{2} was only 55% of the maximum allowable concentration in ambient air, i.e. about 44 µg/m^{3}.
4 Conclusions
Out of all the models, the conventional model with all parameters exhibited the weakest performance, which implies that input optimization and the subsequent parameter selection improved performance. The other models exhibited an average reduction in MSE of 26% and an average improvement in correlation of 4%.
Comparison of the best performing variants of Models A, B and C showed that the best performance was exhibited by the ANN model having pollutant and time parameters only. The minimum MSE obtained in any model was 0.0115 and the maximum R^{2} value was 0.8979.
The best result was provided by models incorporating pollutant variables, since the quantity being predicted in the study was itself a pollutant concentration (SO_{2}). Additionally, every model trained and evaluated showed at least a 9% reduction in MSE and a 3% improvement in correlation.
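The minimum improvements quoted above can be reproduced from the performance table with the short sketch below. It assumes that the conventional all-parameter model A1 is the baseline and that the improvements are relative changes with respect to it; both are assumptions made for this illustration, and the function name is hypothetical.

```python
# (MSE, R^2) pairs copied from the performance evaluation table
RESULTS = {
    "A1": (0.0188, 0.8325), "A2": (0.0129, 0.8679),
    "A3": (0.0158, 0.8617), "A4": (0.0124, 0.8727),
    "B1": (0.0162, 0.8675), "B2": (0.0152, 0.8914),
    "B3": (0.0124, 0.8935), "C1": (0.0171, 0.8559),
    "C2": (0.0130, 0.8944), "C3": (0.0115, 0.8979),
}


def relative_changes(results, baseline="A1"):
    """Percent MSE reduction and percent R^2 gain of each model versus the baseline."""
    base_mse, base_r2 = results[baseline]
    changes = {}
    for name, (mse, r2) in results.items():
        if name == baseline:
            continue
        mse_reduction = 100.0 * (base_mse - mse) / base_mse
        r2_gain = 100.0 * (r2 - base_r2) / base_r2
        changes[name] = (mse_reduction, r2_gain)
    return changes


changes = relative_changes(RESULTS)
smallest_mse_reduction = min(c[0] for c in changes.values())
smallest_r2_gain = min(c[1] for c in changes.values())
weakest_model = min(changes, key=lambda name: changes[name][0])
```

Under that assumption, the smallest MSE reduction and the smallest R^{2} gain both come from Model C1 and round to the 9% and 3% figures quoted; the largest improvements belong to Model C3.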
When compared with the National Ambient Air Quality Standards values, the predicted concentrations of SO_{2} were found to fall well below the allowable limit, with the maximum predicted value being only 55% of the maximum allowable concentration in ambient air. The rate of gaseous SO_{2} emissions into the ambient air and the control measures in place could be monitored by similar comparisons, and these were found to be in agreement.
The results revealed that ANN models can be effectively utilized for predicting the concentration of pollutants. As far as the prediction capacity of the ANN models for primary air pollutants was concerned, the models exhibited very promising results. However, the need to include additional parameters when predicting secondary air pollutants, where other reaction mechanisms also have to be considered, remains to be studied.
Comparing the results, it was found that, in general, all the models showed values of MSE close to zero and R^{2} close to unity. Input optimization and the choice of whether to include pollutant or meteorological parameters both helped in fine-tuning the predicted results and reduced the disparity between the actual and predicted outputs.
Further studies could also be conducted to understand the mechanism by which the ANN develops relationships between input and output variables. Hybrid ANN models can be effectively utilized to predict the concentration of SO_{2} and similar air quality parameters in an industrial area.
Notes
Acknowledgements
The authors wish to thank M/s BPCL-Kochi Refineries Limited, Kochi, India, and M/s Indian Meteorological Department, Thiruvananthapuram, India, for providing pollutant and meteorological data, respectively.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
The present research does not involve human participants and/or animals.
References
- 1. Yale (2018) Environmental Performance Index. Global metrics for the environment: ranking country performance on high-priority environmental issues, pp 1–4
- 2. Kerala State Pollution Control Board (2010) Action Plan for Greater Kochi Area, Executive Summary, pp 1–89
- 3. Goyal SK, Chalapati Rao CV (2007) Assessment of atmospheric assimilation potential for industrial development in an urban environment: Kochi (India). Sci Total Environ 376:27–39. https://doi.org/10.1016/j.scitotenv.2007.01.067
- 4. Oduber F, Calvo AI, Blanco-Alegr C, Castro A, Vega-Maray AM, Valencia-Barrera RM, Fernández-González D, Fraile R (2019) Links between recent trends in airborne pollen concentration, meteorological parameters and air pollutants. Agric For Meteorol 264:16–26. https://doi.org/10.1016/j.agrformet.2018.09.023
- 5. Kim D, Chen Z, Zhou LF, Huang SX (2018) Air pollutants and early origins of respiratory diseases. Chronic Dis Transl Med 4:75–94. https://doi.org/10.1016/j.cdtm.2018.03.003
- 6. Fraile R, Monsalve F, Tomás C (2013) Influence of meteorological parameters and air pollutants onto the morbidity due to respiratory diseases in Castilla-La Mancha, Spain. Aerosol Air Qual 13:1297–1312. https://doi.org/10.4209/aaqr.2012.12.0348
- 7. Athira V, Geetha P, Vinayakumar R, Soman KP (2018) DeepAirNet: applying recurrent networks for air quality prediction. Procedia Comput Sci 132:1394–1403. https://doi.org/10.1016/j.procs.2018.05.068
- 8. Azid A, Zain SM, Latif MT, Juahir H, Osman MR (2013) Feed-forward artificial neural network model for air pollutant index prediction in the southern region of Peninsular Malaysia. J Environ Prot (Irvine, Calif) 4:1–10. https://doi.org/10.4236/jep.2013.412a1001
- 9. Samarasinghe S (2010) Neural networks for nonlinear pattern recognition. In: Neural networks for applied sciences and engineering. https://doi.org/10.1201/9781420013061.ch3
- 10. Ghazali S, Ismail LH (2013) Air quality prediction using artificial neural network. Int J Adv Comput Eng Appl 3–5:1–5
- 11. Feng X, Li Q, Zhu Y, Hou J, Jin L, Wang J (2015) Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmos Environ 107:118–128. https://doi.org/10.1016/j.atmosenv.2015.02.030
- 12. Md Kamal M, Jailani R, Shauri R L A (2006) Prediction of ambient air quality based on neural network technique. In: SCOReD 2006: proceedings on 2006 4th student conference on research and development. Towards enhancing res. excell. reg., pp 115–119, 27–28. https://doi.org/10.1109/SCORED.2006.4339321
- 13. George J, Arun P, Muraleedharan C (2018) Assessment of producer gas composition in air gasification of biomass using artificial neural network model. Int J Hydrogen Energy 43:9558–9568. https://doi.org/10.1016/j.ijhydene.2018.04.007
- 14. Fausett L (1994) Fundamentals of neural networks: architectures, algorithms and applications. Pearson Publications, London, pp 21–44
- 15. Cabaneros SMLS, Calautit JKS, Hughes BR (2017) Hybrid artificial neural network models for effective prediction and mitigation of urban roadside NO2 pollution. Energy Procedia 142:3524–3530. https://doi.org/10.1016/j.egypro.2017.12.240
- 16. Shakerkhatibi M, Mohammadi N, Benis KZ, Sarand AB, Fatehifar E, Hashemi A (2015) Using ANN and EPR models to predict carbon monoxide concentrations in urban area of Tabriz. Environ Health Eng Manag J 2:117–122
- 17. Elangasinghe MA, Singhal N, Dirks KN, Salmond JA (2014) Development of an ANN-based air pollution forecasting system with explicit knowledge through sensitivity analysis. Atmos Pollut Res 5:696–708. https://doi.org/10.5094/apr.2014.079
- 18. Benkachcha S, Benhra J, El Hassani H (2015) Seasonal time series forecasting models based on artificial neural network. Int J Comput Appl 116:0975-8887
- 19. Carslaw DC, Carslaw N (2007) Detecting and characterising small changes in urban nitrogen dioxide concentrations. Atmos Environ 41:4723–4733. https://doi.org/10.1016/j.atmosenv.2007.03.034
- 20. Shen J, Chen J, Zhang X, Zou S, Gao Z (2017) Outdoor and indoor ozone concentration estimation based on artificial neural network and single zone mass balance model. Procedia Eng 205:1835–1842. https://doi.org/10.1016/j.proeng.2017.10.253
- 21. National Ambient Air Quality Standards (2009) Central Pollution Control Board Notification in the Gazette of India, Extraordinary, New Delhi