Data Fusion from Multiple Stations for Estimation of PM2.5 in Specific Geographical Location
Air quality has decreased markedly in recent years because of contamination levels that can change the natural composition of the air. This is a problem not only for the environment but also for public health. This paper therefore compares approaches based on the Adaptive Neural Fuzzy Inference System (ANFIS) and Support Vector Regression (SVR) for estimating the level of PM2.5 (particulate matter 2.5) at specific geographic locations from nearby stations. The systems were validated on an environmental database from the air quality network of the Valle de Aburrá (AMVA) in Medellín, Colombia, which records 5 meteorological variables and 2 pollutants at 3 nearby measurement stations. The relevance of the features obtained at each station for estimating PM2.5 at the target station was analysed using four different selectors based on Rough Set Feature Selection (RSFS) algorithms. In addition, five systems for estimating PM2.5 were compared, three based on ANFIS and two based on SVR, in order to obtain an objective and efficient mechanism for estimating PM2.5 levels at specific geographic locations by fusing data from nearby monitoring stations.
Keywords: ANFIS, PM2.5 estimation, Support Vector Regression
The World Health Organization (WHO) has studied the harmful effects of polluted air on human health, showing the need to monitor different types of pollutants, including PM2.5 particles, in the cities of developed and developing countries. There is evidence that these particles are strongly associated with cancer, heart disease, lung disease, and lower respiratory tract infections, and that they considerably increase morbidity and mortality in the population. Measuring or estimating PM2.5 effectively at any location, in order to determine air quality, is therefore very important for establishing preventive measures and reducing the risk to the health of the population.
Three types of models are widely used for predicting air quality: probabilistic, autoregressive, and hybrid. Probabilistic models use different techniques to assess the relationship between air quality and meteorological factors. These are usually computationally cheap models, such as Hidden Markov Models (HMM), which allow accurate statistical prediction from relatively coarse data in many application areas [2, 3, 4]. However, these techniques have problems when the sampling data are limited: the model may not work as expected, which limits its ability to predict future events. The nonlinear relationship between PM2.5 and meteorological data poses a nonlinear regression problem between predictors, so artificial neural network (ANN) models are used, since they can detect underlying nonlinear relationships between the responses and the predictors. They can be trained with several algorithms and require far fewer computational resources [4, 5, 6, 7, 8]. Autoregressive models have been widely used to predict air quality, but with variable precision, depending on how well they apply to nonlinear processes and on the quality of the input data [9, 10]. Support vector machines (SVM) were designed to solve nonlinear classification problems, but because of their ability to generalize they have also been applied to regression problems and time series forecasting. An SVM relies on a kernel function, to which its strong generalization is attributed even when the training set is small; that is, neither generalization nor the training process necessarily depends on the number of attributes, which allows excellent performance in high-dimensional problems.
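For reference, the kernel-based regression idea described above can be written in the standard textbook form of ε-insensitive support vector regression. This is a general formulation, not one taken from this paper; the symbols w, b, ξ, ξ*, C, ε, γ and φ are introduced here for illustration, and the radial basis function kernel is an assumption, since the text does not say which kernel was used.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.} \quad
  & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, n,
\end{aligned}
\qquad
K(x, x') = \langle \phi(x), \phi(x') \rangle
         = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right)
```

Here x_i would be the fused feature vector for sample i and y_i the PM2.5 value at the target station. Errors smaller than ε are not penalised, and C trades off flatness of the function against larger deviations.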
Hybrid models such as ANFIS perform better because they can predict peak PM2.5 concentrations, which are considered a critical factor in air pollution prediction systems; they also give adequate solutions to nonlinearity, ambiguity, and randomness in the data. Most other reported studies on the prediction of meteorological variables and pollutants are limited in prediction horizon and generality. Moreover, they do not focus on estimating variables at specific geographic locations in cities with a limited number of measurement stations and/or a limited set of measured variables, as is the case in Medellín [13, 14, 15, 16, 17, 18, 19]. Addressing this gap is the principal contribution of this work.
In this study, three ANFIS approaches and two SVM approaches for estimating PM2.5 were compared, together with a relevance analysis based on Rough Set Feature Selection. The relevance analysis aimed to reduce the number of features obtained from the fused measurements of nearby monitoring stations, in order to provide an objective and accurate mechanism for more reliable estimation of the PM2.5 concentration at a specific geographical location. The estimate is obtained from meteorological data and air quality variables, so that the community has adequate information about air quality when making decisions about environmental exposure.
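The fusion step described above can be pictured as concatenating the variables of each nearby station into a single feature row per timestamp. The following is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `fuse_stations`, the station labels, the variable names, and all values are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: station -> timestamp -> variable -> value.
Readings = Dict[str, Dict[str, Dict[str, float]]]


def fuse_stations(
    readings: Readings, stations: List[str], variables: List[str]
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Concatenate the variables of several stations into one row per timestamp."""
    # Keep only timestamps for which every station has a record.
    common = set.intersection(*(set(readings[s]) for s in stations))
    # Feature names record which station each value came from.
    names = [f"{s}:{v}" for s in stations for v in variables]
    rows = {
        ts: [readings[s][ts][v] for s in stations for v in variables]
        for ts in sorted(common)
    }
    return names, rows


# Invented example values, for illustration only.
example: Readings = {
    "st1": {
        "t0": {"temperature": 22.0, "humidity": 60.0},
        "t1": {"temperature": 23.0, "humidity": 58.0},
    },
    "st2": {
        "t1": {"temperature": 21.5, "humidity": 65.0},
    },
}

names, rows = fuse_stations(example, ["st1", "st2"], ["temperature", "humidity"])
print(names)
print(rows)
```

Each fused row would then go through feature selection before reaching a regressor; the sketch drops timestamps that are missing at any station, which is one simple choice among several.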
2 Materials and Methods
2.2 Theoretical Background
Adaptive Neural Fuzzy Inference System (ANFIS).
Support Vector Machines.
3 Results and Discussion
Table: PM2.5 estimation with ANFIS and SVR (columns: St-2 (%), St-3 (%))
In this study, meteorological and pollutant variables from three nearby measurement stations were fused, and a relevance analysis was carried out using feature selection techniques based on rough sets (RS). This yielded objective and accurate mechanisms for estimating PM2.5. Five regression systems were also compared, with similar results. The best results were obtained with the three ANFIS systems and one of the SVM systems, showing that the selected variables, together with data fusion from nearby stations, can estimate PM2.5 at a specific geographical location and demonstrating the effectiveness of the proposed procedure. However, the parameters of the regression systems could be tuned with metaheuristic algorithms in order to obtain better accuracy. In addition, this study should be extended to databases from other geographical locations in order to increase its generality.
This work was supported by the research project identified with code 267 at the “Institución Universitaria Salazar y Herrera” of Medellín, Colombia, by the CALAIRE Laboratory of the “Universidad Nacional de Colombia”, and by the Area Metropolitana de Medellín, which supplied the database.
- 1. OMS | Calidad del aire (exterior) y salud, WHO. http://www.who.int/mediacentre/factsheets/fs313/es/. Accessed 24 Oct 2015
- 11. Velásquez, J.D., Olaya, Y., Franco, C.J.: Time series prediction using support vector machines. Ingeniare, 64–75 (2010)
- 18. Pai, T.-Y., Hanaki, K., Su, H.-C., Yu, L.-F.: A 24-h forecast of oxidant concentration in Tokyo using neural network and fuzzy learning approach. CLEAN – Soil, Air, Water 41(8), 729–736 (2013)
- 19. Polat, K.: A novel data preprocessing method to estimate the air pollution (SO2): neighbor-based feature scaling (NBFS). Neural Comput. Appl. 21(8), 1–8 (2001)
- 23. Orrego, D.A., Becerra, M.A., Delgado-Trejos, E.: Dimensionality reduction based on fuzzy rough sets oriented to ischemia detection. In: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5282–5285 (2012)
- 24. Chiu, S.L.: Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2(3), 267–278 (1994)