1 Introduction

Air pollution is a major threat in today's world, according to the World Health Organization (WHO). The European Directive 2008/50/EC regulates several key atmospheric pollutants, including particulate matter (PM), nitrogen dioxide (NO2), sulphur dioxide (SO2), ozone (O3), and carbon monoxide (CO). Vessel-related atmospheric pollutants comprise sulphur dioxide (SO2), nitrogen oxides (NOx), and particulate matter (PM). Exposure to hazardous air pollutant emissions can lead to a range of human health problems, including respiratory disorders, cardiovascular disease, and an increased risk of stroke. Manisalidis et al. (2020) provided an overview of the effects of air pollution on human health. A large body of scientific work has demonstrated that particulate matter directly affects human health by reducing air quality (Adeyemi et al. 2022). Air pollution in urban areas is a complex mixture of toxic components that has unhealthy effects on residents, especially sensitive populations such as children and people with cardiac and respiratory diseases (Kolehmainen et al. 2001). From an environmental point of view, studying the prediction of air pollutant levels or concentrations (immissions) is crucial for the protection of human health and the environment. This research aims to provide valuable insights into the factors influencing the distribution, temporal variations, and potential exposure risks associated with ambient pollutants. Accurate prediction models can be developed to forecast pollution levels, identify pollution hotspots, and assess compliance with regulatory standards. These predictive models play a vital role in urban planning, industrial siting, and the formulation of effective emission control strategies.
By proactively predicting and mitigating high pollution episodes, air pollution forecasting research contributes to protecting public health, reducing environmental impact, and promoting sustainable communities (Pope and Dockery 2006; Stieb et al. 2009; Kloog et al. 2013). A review of models to forecast air pollution health outcomes is presented by Oliveri et al. (2017), in which several large cities were compared with respect to different pollutants. In addition, artificial intelligence has been applied to forecast air pollution related to human health in Savouré et al. (2021), Subramaniam et al. (2022), and Traina et al. (2022).

Different studies describe the air pollutants related to vessel traffic (Miola and Ciuffo 2011; Moreno-Gutiérrez et al. 2015; Ekmekçioğlu et al. 2020) and estimate the amount of pollution associated with ships in port areas (Lu et al. 2006; Liu et al. 2014; Fameli et al. 2020). These pollutants are sulphur dioxide (SO2), nitrogen oxides (NOx), and particulate matter (PM). Marine pollution is regulated by the International Maritime Organisation (IMO) through the International Convention for the Prevention of Pollution from Ships (MARPOL). Decarbonisation and the reduction of greenhouse gas emissions are the main goals of the IMO in this regard. An energy efficiency index is applied to vessels to indicate their rating (A, B, C, D, E) (MARPOL, Annex VI), and the IMO aims to achieve zero emissions by 2050 (IMO 2021). The air pollutants responsible for acid rain are sulphur dioxide (SO2) and nitrogen oxides (NOx), which react in the atmosphere with water, oxygen, and other chemicals to form sulphuric and nitric acid. NO2 is primarily responsible for the formation of smog and acid rain in urban areas, causing both acute and chronic effects (Menezes and Popowicz 2022). These pollutants are emitted from the combustion of fossil fuels in industrial processes, power generation, and transport. The main pollutants associated with port activity are presented in Yang et al. (2022), Yeh et al. (2022), and Mueller et al. (2023).

In recent decades, artificial neural networks (ANNs) have been applied to air quality forecasting in a wide range of studies (Kukkonen et al. 2003; Fernando et al. 2012; Hu et al. 2021; Muruganandam and Arumugam 2023). Numerous studies have used artificial intelligence (AI) and machine learning techniques to monitor air quality (Bai et al. 2018; Mclean et al. 2019; Baklanov and Zhang 2020; Liu et al. 2021; Masood and Ahmad 2021). Bai et al. (2018) analysed the three classical approaches to forecasting air pollution (statistical, artificial intelligence, and numerical prediction methods). There is also literature on air quality in urban areas using different statistical methods to forecast air quality (Mavroidis et al. 2007; Ilacqua et al. 2007; Lu et al. 2014). Considering meteorological aspects, Mavroidis et al. (2007) suggested a successful methodology for assessing the impact of different emission reduction scenarios on the attainment of air quality standards for CO and NO2 in the Athens area. Furthermore, Ribeiro and Gonçalves (2022) classified NO2 in Portugal as a binary objective using a benchmark model. In Durão et al. (2016), classification and regression tree techniques were successfully used to predict ozone in Sines (Portugal). For NO2, Prati et al. (2015) provided insight into the relevance of spatial data analysis for understanding how ship emissions affect the air in a port city. To forecast air quality in urban areas, Lu et al. (2014) proposed different semi-parametric regression models. Particulate matter (PM) sources in three European cities (Athens, Basle, and Helsinki) were described and analysed using structural equation modelling in parallel with traditional principal components (Ilacqua et al. 2007). Similar machine learning techniques are used by Lakra and Avishek (2022) to forecast fog, which is also related to meteorological factors.
Other techniques have been used to construct air quality models. In García-Nieto et al. (2015), air quality in Oviedo (Spain) was modelled using multivariate adaptive regression splines (MARS); subsequently, support vector regression (SVR) and multilayer perceptron (MLP) models were used to forecast PM10 concentrations in the same city (García-Nieto et al. 2018). In addition, meteorological variables were considered by Luna et al. (2019), where low-cost electrochemical sensors were used to quantify air pollution exposure and ANNs were applied to predict and control CO2 and SO2 concentrations. The most relevant finding of that study was that pollution prediction is sensitive to humidity, wind speed, and temperature. Therefore, ANNs can be used to predict and impute missing values or to re-evaluate doubtful ones. A method for predicting SO2 emissions in several cities is presented by Ju et al. (2023), which is of great help for the accurate control of this pollutant. Applied to megacities, He et al. (2014) provided an ANN-based method, in particular a multilayer perceptron (MLP), that predicts fine particles, suggesting that particulate matter concentrations are generated by traffic and controlled by weather conditions.

Air quality assessment, from an operational point of view, requires the characterisation of atmospheric quality (Corani and Scanagatta 2016; Méndez et al. 2023). The aim of this work is to predict future values of the levels of each pollutant. Machine learning methods based on classification models have been used for this purpose, and a comprehensive comparison of classification models was carried out. The classifiers tested were trees, support vector machines (SVMs), artificial neural networks (ANNs), ensembles, K-nearest neighbours (KNNs), discriminant analysis, and naïve Bayes. Most of them have already been successfully used by the authors in different papers (Turias et al. 2008; Ruiz-Aguilar et al. 2020; Song and Fu 2020; González-Enrique et al. 2021; Moscoso-López et al. 2022). Regarding local studies, the impact of ship propulsion systems on air pollution in the Strait of Gibraltar in 2017 is presented in Durán-Grados et al. (2022); this study is based on an inventory of ships crossing the Strait and calling at the ports of Algeciras, Tarifa, and Ceuta. In Martín et al. (2008), air pollution in the Bay of Algeciras (Spain) was modelled with classification techniques. Additionally, Rodríguez-García et al. (2022) conducted an extensive analysis of statistics, risks, and trends in the Bay of Algeciras area from 2010 to 2015. Furthermore, due to the large number of inputs used to build the models, the curse of dimensionality (Bishop 2006) could arise. Therefore, a feature selection stage was applied using the Minimum Redundancy Maximum Relevance (mRMR) method, which the authors have previously tested successfully in air pollution forecasting problems (González-Enrique et al. 2021).

The main motivation of this manuscript is to provide citizens with reliable information on air pollution forecasts. This challenge is addressed through a data-driven approach using historical data and machine learning techniques, which is explained in more detail in the following sections. Improving air quality in populated cities is another main motivation for this study, which is carried out in the Bay of Algeciras (southern Spain), home to the most important port in Spain and the fourth in Europe in terms of cargo traffic. Maritime traffic in Algeciras has increased massively in the last ten years, and this increase in the number of vessels in the port may affect air quality in the area and in the nearest city (Algeciras). Since there have been few studies on air pollution in this strategic port area, this research can make a specific contribution.

Another main contribution of this work is the use of a classification-based machine learning scheme to predict the next level of a pollutant, including an analysis of the most relevant variables (using mRMR) for each of the pollutants and sites studied. In addition, many different classification methods were used and compared. This research has allowed us to develop a procedure for predicting future pollution levels, on an hourly basis for nitrogen oxides (NO2, NOx, and NO) and on a daily basis for SO2 and PM10. The results obtained are suitable for the design of an air pollution forecasting system that can be used by citizens or institutions to support decision making.

The rest of this article is organised as follows: Sect. 2 describes the database, the site, the case study, and the regulations to be applied; Sect. 3 presents the methodology, including the classification models tested in the study, together with the feature selection process and the experimental procedure used to achieve the objectives; Sect. 4 presents and discusses the results; and, finally, Sect. 5 draws the main conclusions.

2 Materials

The importance of environmental studies in this area stems from the fact that the Port of Algeciras, which has handled more than 100 million tonnes of goods per year since 2017, is located in an area with special meteorological and orographic conditions, the Strait of Gibraltar. It also lies in a highly industrialised region where the port coexists with numerous industries (a refinery, several chemical and thermal power plants, a stainless steel factory, etc.), together with several motorways and Gibraltar airport, all of which contribute to a very complex air pollution scenario. Maritime traffic in Algeciras has increased dramatically over the last decade, and it is reasonable to expect that the increase in the number of vessels in the Port of Algeciras could affect the air quality in the area.

In order to develop this study, the main pollutants related to port activities were selected, as shown in Yang et al. (2022), Yeh et al. (2022), and Mueller et al. (2023). Immission data of SO2, NO2, NOX, NO, and PM10 concentrations and meteorological data (relative humidity, solar radiation, temperature, atmospheric pressure, wind speed, wind direction, and rainfall) were provided through the Andalusian Government's monitoring network, and the vessel gross tonnage (GT) database was provided by the Algeciras Bay Port Authority, all for the years 2017 to 2019. Similar studies, such as López-Aparicio et al. (2017), analysed all these pollutants in a Nordic port and concluded that the main emission contributions come from berthed vessels and manoeuvres.

The Andalusian Government's sensor network in the Bay of Algeciras includes a total of sixteen air pollutant monitoring stations and five specialised meteorological sensors (W1–W5) distributed throughout the bay (see Fig. 1), which record hourly values of each pollutant and meteorological variable over a three-year period, from 1st January 2017 to 31st December 2019 (see Table 1). The meteorological sensors W3, W4, and W5 are located in the chimney of a refinery at different heights (10 m, 15 m, and 60 m). The data analysed were recorded at stations in the towns of Algeciras and La Línea and in the Alcornocales Park, in order to compare three distant locations. Algeciras and La Línea are important because they are coastal areas close to the huge port of Algeciras, with massive truck traffic, whereas Alcornocales Park is an unspoilt area far from anthropogenic activity. In addition, La Línea and Algeciras are two cities located opposite each other, so studying both can shed more light on air pollution immissions. Algeciras is the most populated city in the bay, with 122,982 inhabitants in 2021, and La Línea is the second most populated, with 63,365 inhabitants. The entire database consists of 131 variables. In each experiment, the output variable is the concentration of a pollutant at one of the monitoring stations, predicted from the rest of the study variables described in Table 1 (pollutant concentrations at the other monitoring stations, meteorological information, and vessel data).

Fig. 1
figure 1

Location of the study area: Spain, Andalusia, and the Bay of Algeciras in the Strait of Gibraltar. The three studied monitoring stations in the cities of Algeciras and La Línea and in the Alcornocales Park, and the rest of the sensors over the bay

Table 1 Monitoring station codes, meteorological variable codes, and pollutant variables

This study was developed in three stages: preprocessing of the data, a classification stage, and a feature selection stage to reduce the number of variables. Among the wide range of feature selection methods, the mRMR method was used in this work to rank the variables considered as inputs; the method is described in Sect. 3.2. The ten most relevant features were selected as inputs to the different models in order to test whether there are significant differences with respect to using all variables.

3 Methodology

The main objective of this work is to predict the future air quality levels of the main maritime pollutants in the Bay of Algeciras as a function of other pollutants, meteorological variables, and vessel data. In order to achieve this objective, the time series were considered according to the limits marked in the European Directive 2008/50/EC (Table 2), and the outputs were transformed into disjoint quartiles (Q1–Q4).

Table 2 Simulation scenarios and Directive 2008/50/EC limit values for pollutants of the study

The predictions are calculated using the pollutant concentrations at each station (Algeciras, Alcornocales, and La Línea) as outputs and the rest of the variables as inputs (pollutants at the other stations, meteorological parameters, and vessel data). Different classification techniques, including ANN models, are compared in order to find the best model. The performance of the tested models is calculated using hourly and daily mean data time series.

$${\widehat{y}}_{q}\left(t+1\right)={f}_{classification}(\widetilde{x}\left(t\right),y(t))$$
(1)

Equation 1 expresses the prediction approach mathematically, where \(t\) is the current time step and \(t+1\) is the one-step-ahead instant to be predicted. In the case of hourly data, the next hourly mean concentration value is predicted, and in the case of daily data, the next daily mean concentration value is predicted. The inputs \(\widetilde{x}\left(t\right)\) consist of all the other pollutants measured at the monitoring stations, together with the meteorological variables and vessel time series. The scheme of the process is shown in Fig. 2.

Fig. 2
figure 2

Methodology scheme. The output data was transformed into quartiles (Q1–Q4). The inputs and output at the timestamp t are the predictors of the quartile at timestamp t + 1
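For illustration only, the one-step-ahead setup of Eq. 1 can be sketched in Python as follows (the study itself was implemented in Matlab; the function and array names are hypothetical). Each input row pairs the exogenous variables \(\widetilde{x}(t)\) and the current output \(y(t)\) with the target \(y(t+1)\):

```python
import numpy as np

def make_one_step_dataset(X, y):
    """Pair the exogenous inputs x(t) and the current output y(t)
    with the next-step target y(t+1), as in Eq. 1."""
    features = np.column_stack([X[:-1], y[:-1]])  # (x(t), y(t))
    target = y[1:]                                # y(t+1)
    return features, target

# toy example: 5 time steps, 3 exogenous input variables
X = np.arange(15).reshape(5, 3)
y = np.array([0, 1, 2, 3, 0])
F, t = make_one_step_dataset(X, y)
```

The resulting pairs (F, t) can then be fed to any of the classifiers described in Sect. 3.1.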

Three stages were developed. The first stage is data preprocessing. On the one hand, missing values were imputed using an algorithm previously proposed and successfully applied by the authors (González-Enrique et al. 2019a, 2019b; Rodríguez-García et al. 2022); on the other hand, the database was standardised. The vessel data, given as incoming and outgoing vessels in the bay, were also transformed into hourly data. Once the databases were transformed and unified, the data consist of 26,280 hourly records × 131 variables (130 inputs and 1 output) in a single database, where each row is a record of hourly data for the three years from 2017 to 2019. The database was normalised and the output was divided into disjoint quartiles. The second stage, classification, is described in Sect. 3.1, and the third stage is a feature selection procedure using the mRMR approach (Peng et al. 2005), described in Sect. 3.2.
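The quartile transformation of the output series can be sketched as follows (an illustrative Python version, not the Matlab code actually used; the function name is hypothetical):

```python
import numpy as np

def to_quartiles(y):
    """Map a continuous concentration series to disjoint quartile
    classes Q1-Q4 (labels 1..4), each holding ~25% of the data."""
    q = np.quantile(y, [0.25, 0.50, 0.75])  # the three quartile cut points
    return np.digitize(y, q) + 1            # class labels 1..4

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
labels = to_quartiles(y)
```

The resulting labels are the four classes used as targets in the classification stage.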

3.1 Classification

In this stage, 29 classification models (Table 3) were tested to select the best classifier. Classification is a type of supervised machine learning in which an algorithm learns to classify new observations from labelled data samples. In this work, the output is labelled in quartiles, and the classification models tested are listed in Table 3. The different classification schemes are briefly explained below.

Table 3 Classification models

3.1.1 Trees

A decision tree is a hierarchical, non-parametric supervised learning model consisting of a root node, branches, internal nodes, and leaf nodes, which can be used for both classification and regression tasks (Breiman et al. 1984). Three types of trees were used depending on the maximum number of splits (100, 20, 4). A maximum of 100 splits produces many leaves and allows many fine distinctions between classes, whereas a maximum of 4 splits produces few leaves and only coarse distinctions.
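As an illustrative sketch (using scikit-learn and the classic Iris data rather than the study's own database), the three split limits can be reproduced with `max_leaf_nodes`, since a binary tree with k + 1 leaves performs k splits; the exact correspondence with Matlab's "maximum number of splits" setting is an assumption here:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

trees = {}
for n_splits in (100, 20, 4):
    # a binary tree with (n_splits + 1) leaves performs n_splits splits
    trees[n_splits] = DecisionTreeClassifier(
        max_leaf_nodes=n_splits + 1, random_state=0).fit(X, y)
```

The coarse tree (4 splits) trades accuracy for interpretability, while the fine tree (100 splits) can model many small class distinctions.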

3.1.2 Discriminant analysis

Discriminant analysis is a statistical technique that produces a function capable of classifying phenomena (Fisher 1936). The objective is to maximise the between-group variance and minimise the within-group variance through linear (or quadratic) combinations of the predictors. The procedure finds the eigenvalues and eigenvectors of the quotient of the between-class and within-class scatter matrices. In linear discriminant analysis, the model assumes the same covariance matrix for each class and only the means vary; in quadratic discriminant analysis, both the means and the covariances of each class vary.
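A minimal sketch of the two variants (again with scikit-learn and Iris data for illustration, not the study's own implementation):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance matrix
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariances
```

QDA yields curved decision boundaries at the cost of estimating one covariance matrix per class.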

3.1.3 Naïve Bayes

Naïve Bayes models assume that, given the class, the predictors or features that make up an observation are independent. This framework can accommodate a full set of features, so that an observation can be treated, for example, as a set of multinomial counts (Mitchell 1997). The normal (Gaussian) distribution is appropriate for predictors that have normal distributions in each class; the naïve Bayes classifier then estimates a separate normal distribution for each class by calculating the mean and standard deviation of the training data in that class. The kernel distribution is suitable for predictors that have a continuous distribution. It does not require a strong assumption such as normality, and it can be used in cases where the distribution of a predictor is skewed or has multiple peaks or modes.

3.1.4 Support Vector Machines (SVMs)

The goal of an SVM is to find the hyperplane that best separates two classes of data points with the widest margin between them. The algorithm can only find such a hyperplane in linearly separable problems; in most practical problems, it maximises a soft margin, allowing a small number of misclassifications. The support vectors are the subset of training observations that determine the location of the separating hyperplane. SVMs can use a kernel function to transform the features: kernel functions map the data into a different, usually higher-dimensional space, with the expectation that the classes become easier to separate after this transformation (Vapnik and Chervonenkis 1971; Cortes and Vapnik 1995). The types tested were linear SVM (which makes a simple linear separation between classes), quadratic SVM, cubic SVM, and three categories of Gaussian SVM (fine, with the kernel scale set to \(\sqrt{P}/4\); medium, with the kernel scale set to \(\sqrt{P}\); and coarse, with the kernel scale set to \(\sqrt{P}\cdot 4\), where P is the number of predictors).
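The three Gaussian kernel scales can be sketched with scikit-learn as follows. This assumes the Gaussian kernel is defined as exp(-||x-z||²/s²) for kernel scale s (Matlab's convention), so that scikit-learn's `gamma` corresponds to 1/s²; the Iris data stand in for the study's database:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
P = X.shape[1]  # number of predictors

# fine / medium / coarse Gaussian kernel scales, as in the text
scales = {"fine": np.sqrt(P) / 4, "medium": np.sqrt(P), "coarse": np.sqrt(P) * 4}
models = {}
for name, s in scales.items():
    # assuming kernel exp(-||x-z||^2 / s^2), gamma = 1 / s^2
    models[name] = SVC(kernel="rbf", gamma=1.0 / s**2).fit(X, y)
```

Smaller kernel scales (larger gamma) make finer, more local distinctions between classes.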

3.1.5 KNN

The k-nearest neighbour algorithm, also known as KNN or k-NN, is a non-parametric supervised learning classifier that uses proximity to classify or predict the group of an individual data point. While it can be used for regression or classification problems, it is generally used as a classification algorithm, based on the assumption that similar points are found close together. Usually, the number k is odd (1, 3, 5, …) (Silverman and Jones 1989). The KNN classifiers tested were: fine KNN (number of neighbours set to 1), medium KNN (10 neighbours), coarse KNN (100 neighbours), cosine KNN, using a cosine distance metric (10 neighbours), cubic KNN, using a cubic distance metric (10 neighbours), and weighted KNN, using distance weighting (10 neighbours).
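An illustrative mapping of these six variants onto scikit-learn (the correspondence with the Matlab presets is an assumption; cubic distance is taken as the Minkowski metric with p = 3):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
variants = {
    "fine":     KNeighborsClassifier(n_neighbors=1),
    "medium":   KNeighborsClassifier(n_neighbors=10),
    "coarse":   KNeighborsClassifier(n_neighbors=100),
    "cosine":   KNeighborsClassifier(n_neighbors=10, metric="cosine"),
    "cubic":    KNeighborsClassifier(n_neighbors=10, p=3),  # Minkowski, p=3
    "weighted": KNeighborsClassifier(n_neighbors=10, weights="distance"),
}
for model in variants.values():
    model.fit(X, y)
```

The fine variant (k = 1) memorises the training set, while the coarse variant (k = 100) averages over large neighbourhoods and gives much smoother decision boundaries.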

3.1.6 Ensemble learning

Ensemble learning for classification uses multiple learning algorithms to obtain a better predictive model, formed as a weighted combination of several classification models. In general, combining several classification models increases the predictive power. The types of ensembles tested were: subspace ensembles with discriminant learners, subspace ensembles with nearest-neighbour learners, and RUSBoost, bagged (random forest), and AdaBoost ensembles with decision tree learners (Breiman 1996, 2001; Hastie et al. 2008; Freund 2009).
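Three of the tree-based ensembles have direct scikit-learn analogues, sketched below for illustration (subspace ensembles and RUSBoost have no built-in scikit-learn equivalent and are omitted; the Iris data are a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
ensembles = {
    # bootstrap-aggregated trees (Breiman 1996)
    "bagged_trees": BaggingClassifier(DecisionTreeClassifier(),
                                      n_estimators=30, random_state=0),
    # random forest: bagging plus random feature subsets (Breiman 2001)
    "random_forest": RandomForestClassifier(n_estimators=30, random_state=0),
    # boosting of shallow trees (Freund 2009)
    "adaboost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                   n_estimators=30, random_state=0),
}
for model in ensembles.values():
    model.fit(X, y)
```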

3.1.7 Artificial neural networks (ANNs)

ANNs were also included in the second stage. A feedforward fully connected ANN can approximate multidimensional mappings arbitrarily well, given consistent data and enough neurons in its hidden layer (Hornik et al. 1989). The authors have successfully used ANNs in similar prediction problems (González-Enrique et al. 2019b; Ruiz-Aguilar et al. 2020; Moscoso-López et al. 2022). The ANNs were trained with the backpropagation algorithm (Rumelhart et al. 1986) using the Levenberg–Marquardt optimisation procedure. Finally, the results obtained were statistically analysed and compared using a resampling procedure in order to select the model with the best generalisation capabilities. ANN models with different numbers of hidden units were compared to determine the effect of adding non-linear processing capacity on model performance. Each model is a feedforward fully connected neural network with a different number of fully connected layers and hidden units. A ReLU activation function was used in each model. The rectified linear activation function, or ReLU, is a piecewise linear function that outputs the input directly if it is positive and zero otherwise (Glorot et al. 2011). It has been the most commonly used activation function in neural networks since 2017 (Ramachandran et al. 2017). The types of ANNs tested were: one hidden layer with 10, 25, or 100 neurons; two hidden layers with 10 × 10 neurons; and three hidden layers with 10 × 10 × 10 neurons.

3.2 Feature selection

The third stage is a feature selection procedure. The Minimum Redundancy Maximum Relevance (mRMR) approach (Peng et al. 2005) is a feature selection algorithm that ranks a set of features according to their relevance to the target variable. It also penalises redundant features. The best features are those with the highest trade-off between maximum relevance with the target variable and minimum redundancy with the remaining features.

Among the wide range of feature selection methods, the mRMR method was used in this work to rank the variables considered as inputs. This method has been successfully used by the authors in other studies related to air pollution (González-Enrique et al. 2021). Feature selection, one of the fundamental problems in pattern recognition and machine learning, involves identifying the subset of features that is most relevant to the target, usually referred to as maximum relevance. Such subsets often contain features that are relevant but mutually redundant, and mRMR addresses this problem by penalising redundancy. In this paper, the ten most relevant features were selected as inputs to the different models in order to test whether there are significant differences with respect to using all variables in the models.
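A greedy mRMR ranking can be sketched as follows. This is an illustrative mutual-information version using the difference criterion (Peng et al. also propose a quotient form), not the authors' actual implementation; the toy data and function name are hypothetical:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_rank(X, y, n_select=10):
    """Greedy mRMR: at each step select the feature maximising
    relevance(f, y) minus the mean redundancy with the already
    selected features (difference criterion)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for f in remaining:
            red = np.mean([mutual_info_regression(X[:, [f]], X[:, s],
                                                  random_state=0)[0]
                           for s in selected]) if selected else 0.0
            scores.append(relevance[f] - red)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# toy check: feature 1 duplicates feature 0, features 2-4 are noise
rng = np.random.RandomState(0)
X = rng.randn(300, 5)
X[:, 1] = X[:, 0] + 0.01 * rng.randn(300)
y = (X[:, 0] > 0).astype(int)
ranking = mrmr_rank(X, y, n_select=3)
```

In the toy check, one of the two duplicated features is selected first for its relevance, but its near-copy is then heavily penalised for redundancy and excluded.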

3.3 Experimental procedure

A resampling procedure was used to estimate the prediction error on a test set and to reduce the effects of overfitting. The strategy randomly divided the database into three parts (training 70%, validation 10%, and test 20%), and the performance results were collected only for the test set in order to estimate the generalisation error of each model on unseen data, as the authors have successfully done in other papers (Turias et al. 2008; González-Enrique et al. 2019a; Ruiz-Aguilar et al. 2020; Moscoso-López et al. 2022). In this research, all the simulations were developed and tested in Matlab© software.
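The resampling protocol can be sketched in Python as follows (the study itself used Matlab; here the 70/10/20 split is obtained with two successive hold-out splits, and the function name is hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def resampled_splits(X, y, n_runs=20, seed=0):
    """Random 70/10/20 train/validation/test splits, repeated n_runs
    times; performance is averaged over the test sets only."""
    rng = np.random.RandomState(seed)
    for _ in range(n_runs):
        X_tr, X_tmp, y_tr, y_tmp = train_test_split(
            X, y, test_size=0.30, random_state=rng)
        # split the remaining 30% into validation (10%) and test (20%)
        X_val, X_te, y_val, y_te = train_test_split(
            X_tmp, y_tmp, test_size=2 / 3, random_state=rng)
        yield (X_tr, y_tr), (X_val, y_val), (X_te, y_te)

X = np.arange(100).reshape(100, 1)
y = np.zeros(100)
first = next(resampled_splits(X, y))
```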

The whole system can be seen as a mapping from a set of input features to an output variable, whose mathematical form is determined by the training data. Of course, the system must be capable of making good predictions on unseen data. In order to measure this generalisation ability, an additional set of samples (the test set) is used. We adopted a repeated resampling scheme to select the best model based on the generalisation performance of each one. The available data were divided into three different groups (training, validation, and test sets). The parameters of each model were estimated using the training set. The validation set was used for early stopping and to avoid overfitting. Finally, the test set was used to compute the classification quality indexes (sensitivity, specificity, accuracy, and precision), simulating the real performance of the model. This process was repeated 20 times and the results were averaged over these runs. To visualise the results obtained with a classification model, the confusion matrix is used (Ting 2010): each row (i) of the matrix C represents the number of predicted values for each class and each column (j) represents the number of real values for each class (C(i,j)). In this case, four classes are considered, one for each quartile of the output. Once an air pollutant has been selected, its values are divided into four quartiles, each containing 25% of the total distribution. The confusion matrix is calculated, and then the quality indexes of sensitivity, specificity, accuracy, and precision are derived from it. The Euclidean distance (d1) to a perfect classifier in terms of the quality indexes (sensitivity = 1, specificity = 1, accuracy = 1, precision = 1) is also calculated (Eq. 2).

$${d}_{1}=\sqrt{{\left(1-sensitivity\right)}^{2}+{\left(1-specificity\right)}^{2}+{\left(1-accuracy\right)}^{2}+{\left(1-precision\right)}^{2}}$$
(2)

In this case, the confusion matrix has a 4 × 4 dimension because the data are divided into four disjoint quartiles (classes). In order to obtain individual classification results for each quartile, the matrix was sequentially transformed, quartile by quartile, into an equivalent 2 × 2 confusion matrix (Table 4), which was used to calculate the well-known classification measures mentioned above (sensitivity, specificity, accuracy, and precision; see Eqs. 3–6). The model with the lowest d1 distance is chosen as the best classification model for each quartile. Quartiles are the statistical values that divide the dataset into four equal parts, each containing 25% of the data, resulting in lower, lower-middle, middle-upper, and upper divisions.

Table 4 Equivalent multi-class confusion matrix
$$Accuracy=\frac{TP+TN}{TP+TN +FP+ FN}$$
(3)
$$Precision=\frac{TP}{TP+FP}$$
(4)
$$Sensitivity=\frac{TP}{TP+FN}$$
(5)
$$Specificity=\frac{TN}{TN+FP}$$
(6)

True-positive (TP) and true-negative (TN) results are correctly classified, while false-negative (FN) and false-positive (FP) results are two types of errors calculated according to the literature (Ting 2010).
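The per-quartile quality indexes (Eqs. 3–6) and the d1 distance (Eq. 2) can be computed from the 4 × 4 confusion matrix as sketched below (an illustrative Python version following the row = predicted, column = actual convention stated above; the function name is hypothetical):

```python
import numpy as np

def quartile_metrics(C, q):
    """Collapse a 4x4 confusion matrix C (rows = predicted,
    columns = actual) into a 2x2 matrix for quartile q and
    compute the four quality indexes and d1 (Eq. 2)."""
    TP = C[q, q]
    FP = C[q, :].sum() - TP   # predicted q, actually another class
    FN = C[:, q].sum() - TP   # actually q, predicted otherwise
    TN = C.sum() - TP - FP - FN
    sens = TP / (TP + FN)
    spec = TN / (TN + FP)
    acc = (TP + TN) / C.sum()
    prec = TP / (TP + FP)
    d1 = np.sqrt((1 - sens)**2 + (1 - spec)**2
                 + (1 - acc)**2 + (1 - prec)**2)
    return sens, spec, acc, prec, d1

C = 10 * np.eye(4)  # a perfect classifier over the four quartiles
sens, spec, acc, prec, d1 = quartile_metrics(C, q=0)
```

For a perfect classifier all four indexes equal 1 and d1 equals 0, which is the reference point used to rank the models.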

All the calculations were performed separately for each air pollutant (SO2, PM10, NO2, NOX, and NO) as output at the three locations (Algeciras, Alcornocales, and La Línea), using either all the variables or only the ten most relevant ones, giving a total of 30 scenarios, each repeated 20 times following the resampling procedure explained above. The time series of SO2 and PM10 concentrations are computed as daily averages, and those of NO2, NOX, and NO on an hourly basis. The results of these experiments are presented in the next section.

4 Results

Simulations and prediction experiments were computed for five pollutants directly related to maritime traffic: SO2, PM10, NO2, NOX, and NO. The models were tested at three different locations: in the cities of Algeciras and La Línea, and at a third site some distance away in the remote area of the Alcornocales Park. As indicated in Table 2, the averages were calculated on an hourly or daily basis to comply with the European Directive 2008/50/EC. Figures 3 and 4 show the time series of the pollutants analysed, measured in µg/m3, with their upper assessment thresholds on an hourly or daily basis according to the Directive. These graphs show average concentrations, and it is worth noting that in 2017 the average SO2 concentrations in La Línea, where a refinery is located, were very high compared to the following years, which seems to be due to the installation of a desulphurisation unit in this refinery in 2018. Considering particulate matter, the lowest concentrations are found at the Alcornocales station and the highest at Algeciras, although overall concentrations are very similar in Algeciras and La Línea. On the other hand, NO2 (and nitrogen oxides in general) clearly shows much higher average concentrations in Algeciras than in La Línea and Alcornocales, which are quite similar to each other. This difference could indicate a high presence of diesel engines in Algeciras, which is consistent with the heavy truck traffic in and out of the port, the ships berthed in the Port of Algeciras, and the higher traffic density, since it is the most densely populated city in the bay.

Fig. 3

Daily mean time series of SO2 and PM10 from 2017 to 2019 with the Directive 2008/50/EC limit thresholds

Fig. 4

Hourly mean time series of NO2, NOX and NO from 2017 to 2019 with the Directive 2008/50/EC limit thresholds

Since the pollutant thresholds are defined in the regulations in terms of hourly and daily values, and in order to better understand the behaviour of each pollutant, weekly average diagrams of each air pollutant at the different stations have been calculated (Fig. 5). SO2 shows a higher concentration in La Línea, probably because the prevailing winds carry the pollution from ships and the surrounding industries towards La Línea (westerly situations), while in easterly situations SO2 seems to move towards Los Alcornocales, the remote area 30 km from the bay, which paradoxically shows a higher concentration than Algeciras. In the case of the PM10 averages, concentrations decrease during the night and, from the early hours of the morning, when anthropogenic activity begins, the values increase until late in the day. At weekends there is not much difference compared to the rest of the week. In the case of nitrogen oxides, there is a daily decrease in the early hours of the morning, then an increase to a maximum around midday, followed by a downward trend with a slowdown around mid-afternoon, which coincides with the pace of human activity and therefore traffic, especially vehicle traffic. The trend is more pronounced in the cities of La Línea and Algeciras; at the remote Los Alcornocales station there is only a slight increase at midday. In terms of daily averages, maximum values are observed on Tuesdays and Fridays, with a significant decrease at weekends. It should be noted that NO2 is higher in Algeciras than in La Línea, probably due to road traffic. Nitrogen oxides show two peaks per day, which suggests that they are related to human activity, and especially to diesel engines, whereas particulate matter and SO2 show only one peak per day.
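Assuming each station's measurements are available as a timestamp-indexed series, the weekly diagrams of Fig. 5 reduce to simple grouped averages. The sketch below uses synthetic data and pandas; both are hypothetical choices, not the authors' actual data or toolchain:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one pollutant at one station
# (the real data come from the monitoring network, 2017-2019).
rng = np.random.default_rng(0)
idx = pd.date_range("2017-01-01", "2019-12-31 23:00", freq="h")
s = pd.Series(rng.gamma(2.0, 10.0, len(idx)), index=idx, name="NO2")

# Hourly mean week diagram: average concentration for each
# (day-of-week, hour-of-day) slot across the whole period.
weekly_hourly = s.groupby([s.index.dayofweek, s.index.hour]).mean()

# Daily mean week diagram: first resample to daily means, then
# average by day of week (0 = Monday ... 6 = Sunday).
daily = s.resample("D").mean()
weekly_daily = daily.groupby(daily.index.dayofweek).mean()

print(weekly_hourly.shape)  # (168,) -> 7 days x 24 hours
print(weekly_daily.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```

Plotting `weekly_hourly` reveals the kind of diurnal double peak described above for nitrogen oxides when real traffic-driven data are used.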

Fig. 5

Hourly and daily mean week diagrams for pollutants from 2017 to 2019

As explained above, one-step-ahead prediction models have been developed with the aim of predicting the next value of a time series of quartile concentrations, in order to contrast it with exceedances of the thresholds set in the Directive. Several classification models, including ANNs, were tested and their performance compared using the resampling procedure explained in Sect. 3. In each case, two experiments were conducted: one using all variables as inputs and another using only the ten most relevant variables. It should be noted that the results shown in Tables 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 are always calculated on the test sets (unseen data). In general, the results obtained are quite adequate, with high values of the classification quality indexes. Values of around 90% indicate that the prediction of the next daily/hourly mean one step ahead is quite accurate and represents a very reliable forecast. The results are reported separately for each quartile to give a more detailed picture of the prediction.
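The construction of the classification target can be sketched as follows. The series, names, and lag structure below are illustrative assumptions; the real targets are built from the monitoring data described in Sect. 3:

```python
import numpy as np
import pandas as pd

# Hypothetical daily SO2 series standing in for the real data.
rng = np.random.default_rng(1)
so2 = pd.Series(rng.gamma(2.0, 5.0, 1000), name="SO2")

# One-step-ahead classification target: the quartile (Q1..Q4)
# of the NEXT value, predicted from the variables at time t.
quartile = pd.qcut(so2, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
y = quartile.shift(-1)          # quartile level at t + 1
X = so2.to_frame().iloc[:-1]    # predictors at t (here just the series itself)
y = y.iloc[:-1]                 # drop the last row, which has no t + 1 label

print(y.value_counts().sort_index())
```

Any classifier (tree, ensemble, SVM, ANN) can then be fitted on `(X, y)`; exceedances of the Directive thresholds concentrate in the upper quartile, Q4.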

Table 5 Best prediction model results for daily SO2 (t + 1) concentrations using all variables at t
Table 6 Best prediction model results for daily SO2 (t + 1) concentrations using top ten relevant features at t
Table 7 Best prediction model results for daily PM10 (t + 1) concentrations using all variables at t
Table 8 Best prediction model results for daily PM10 (t + 1) concentrations using top ten relevant features at t
Table 9 Best prediction model results for hourly NO2 (t + 1) concentrations using all variables at t
Table 10 Best prediction model results for hourly NO2 (t + 1) concentrations using top ten relevant features at t
Table 11 Best prediction model results for hourly NOX (t + 1) concentrations using all variables at t
Table 12 Best prediction model for hourly NOX (t + 1) concentrations using top ten relevant features at t
Table 13 Best prediction model results for hourly NO (t + 1) concentrations using all variables at t
Table 14 Best prediction model results for hourly NO (t + 1) concentrations using top ten relevant features at t

In Tables 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14, the best prediction model for each air pollutant, location, and quartile is shaded and underlined (the combination with the smallest distance d1). The best counterpart model (same model, location, quality index, and quartile) is shown in bold when comparing Tables 5, 7, 9, 11, and 13, which present the results of the models using all variables, with Tables 6, 8, 10, 12, and 14, which use only the ten most relevant variables. The distance d1 was used to compare models at the same location and to select the best model in each case.
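Model selection by d1 can be sketched in a few lines. The Euclidean form below, measuring the distance of the quality-index vector (sensitivity, specificity, accuracy, precision) from the ideal point (1, 1, 1, 1), is an assumption on our part; the exact definition of d1 is given in Sect. 3, and the index values here are made up for illustration:

```python
import math

def d1(sensitivity, specificity, accuracy, precision):
    """Distance of a model's quality-index vector from the ideal
    classifier (1, 1, 1, 1); smaller is better.  The Euclidean
    form is assumed here -- the exact definition is in Sect. 3."""
    return math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2
                     + (1 - accuracy) ** 2 + (1 - precision) ** 2)

# Model selection: the candidate with the smallest d1 wins.
# Index values below are hypothetical, not taken from the tables.
candidates = {
    "tree_top10": (0.92, 0.95, 0.94, 0.91),
    "ann_all":    (0.88, 0.93, 0.91, 0.87),
}
best = min(candidates, key=lambda name: d1(*candidates[name]))
print(best)  # tree_top10
```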

Comparing Table 5 (prediction models using all variables) with Table 6 (models using only the relevant variables) for the pollutant SO2, it can be seen that in all cases the ANN models significantly improve their prediction performance when only the relevant variables are used, although the tree classifiers predict better than the ANNs, as their distance d1 is the smallest. For this pollutant, tree classifiers are the best predictors in all cases. For SO2 in Algeciras, quartiles Q1 and Q2 are best predicted by the tree classifiers using only the ten most relevant variables, whereas quartile Q3 is best predicted by tree classifiers using all 130 variables and Q4 by ANN models using the ten most relevant variables. The best performing quartile (with the lowest d1) in Algeciras is Q1, with sensitivity, specificity, accuracy, and precision above 0.90. For SO2 in Alcornocales, better predictions are obtained in quartiles Q1 and Q2 with tree-type classifiers using only the ten most relevant variables. In the case of quartiles Q3 and Q4, better predictions are obtained with ensemble classifiers using all variables. The best results for Alcornocales are obtained for Q1, Q3, and Q4, all with values up to 0.97. For SO2 in La Línea, better predictions were obtained for quartiles Q1 and Q4 using relevant-variable models with tree classifiers, while Q2 and Q3 were better predicted with ensemble classifiers using all variables. In La Línea, Q4 shows the best results, with quality indexes above 0.98.

For the PM10 pollutant, Tables 7 and 8 show that the use of relevant variables improves the results of the classification models in almost all cases. The best models for each quartile, shaded and underlined regardless of whether they include all variables or only the relevant ones, turn out to be the tree models, with slightly better results than the ANNs. The best models for PM10 achieve the highest quality indexes. For instance, in Algeciras, Q1 is best predicted with tree classifiers using relevant variables, obtaining quality indexes above 0.95. In Alcornocales, quartile Q4 is also best predicted with tree classifiers using the top ten relevant variables, with quality indexes above 0.97. In La Línea, the best predicted quartile is Q4, with quality indexes up to 0.96.

The results for NO2 are shown in Tables 9 and 10. In this case, the relevant variables give better results only for Alcornocales and for quartile Q1 of La Línea. Tree-type models are also the best predictors for Alcornocales and La Línea, especially when all the variables are used; only for quartiles Q3 and Q4 of Alcornocales do neural network models perform better when the relevant variables are used. In the case of Algeciras, all quartiles are predicted equally well by SVM classifiers using all variables. For NO2, the best results are obtained for quartile Q1 in Algeciras with all quality indexes above 0.82, for Q1 in Alcornocales with ensemble models using all variables and quality indexes above 0.82, and for Q1 in La Línea with quality indexes above 0.82 with ensemble tree classifiers using the top ten variables.

In the case of NOX, reasonably equivalent behaviour is observed between models using all variables and models using only the relevant variables. By using fewer but more relevant variables, a large number of models improve their overall performance. In La Línea, the best models are ANNs using the relevant variables for quartiles Q2-Q4. In Algeciras, the performance of the ANNs is similar for quartiles Q2 and Q4, and in Alcornocales for Q1. The rest of the best models use all variables and correspond to SVMs and ensembles. The values are somewhat lower than for the other pollutants, with sensitivities above 80% and specificities above 93%. In the case of NO in Alcornocales (Tables 13, 14), no values have been obtained for Q2 because most of the data available in the database for this pollutant take such low values that they fall for the most part into Q1, except for some peaks of exceedances found in quartiles Q3 and Q4. For NO, ANNs seem to be the models that best predict the quartiles using the relevant variables. In fact, the best result is obtained for quartile Q1, with more than 94% precision for Alcornocales.

Tables 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 show that ensemble boosted trees and tree classifiers produce better results than ANN models in most cases, but when the number of variables is reduced to the best ten, the ANNs improve considerably. Tables 15, 16, 17, 18, and 19 have also been included, highlighting the most relevant variables selected for each prediction model by the mRMR method. In these tables, only the ten most relevant variables are shown. Using these variables, prediction results similar to those shown in Tables 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 for the models using all the variables were obtained. Therefore, using only these top ten variables, a more efficient monitoring system could be designed, saving economic and time resources in the sensor network by measuring, storing, and transmitting fewer variables, and thus designing a more energy-sustainable system with a lower carbon footprint. Tables 15, 16, 17, 18, and 19 show the ten most relevant variables for each pollutant (SO2, PM10, NO2, NOX, and NO) and monitoring station (Algeciras, La Línea, and Alcornocales). In these tables, the meteorological variables for each pollutant are marked in yellow, and the remaining relevant pollutants, different from those analysed and repeated in at least two stations, are marked in other colours. In Tables 15, 16, 17, 18, and 19, each pollutant's own time series (SO2(t), PM10(t), NO2(t), NOX(t), and NO(t)) is expected always to appear, and this is indeed the case. For instance, Table 15 shows that the most relevant meteorological variables for SO2 are wind direction (WD) and rainfall (RF). For SO2, the most relevant air pollutants are O3 and nitrogen oxides, as expected. Table 16 shows the relevant variables for PM10, indicating that the most relevant meteorological variables are wind speed (WS), rainfall (RF), and relative pressure (RP), and the most relevant pollutants are nitrogen oxides.
Similarly, Table 17 for NO2 indicates that the most relevant meteorological variables are those related to wind (wind direction (WD) and wind speed (WS)) and rainfall (RF), and the most relevant pollutants are particulate matter (PM10 and PM2.5), O3, and SO2. Table 18 shows, for each NOX case, the same relevant meteorological variables as for NO2 plus relative humidity (RH), and the same relevant pollutants except SO2. In the case of NO, Table 19 shows that the relevant meteorological variables are those related to wind (wind direction (WD) and wind speed (WS)), solar radiation (SR), and rainfall (RF).
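The greedy logic behind an mRMR-style selection can be sketched as follows. This is not the paper's implementation: absolute Pearson correlation is used here as a simple stand-in for the mutual information of the actual mRMR method, and the data are synthetic:

```python
import numpy as np

def mrmr_rank(X, y, k=10):
    """Greedy mRMR-style ranking: at each step pick the feature with
    the highest relevance to the target minus its mean redundancy
    with the already-selected features.  Absolute correlation is a
    proxy for the mutual information of the real mRMR method."""
    n_features = X.shape[1]
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        scores = []
        for j in remaining:
            if selected:
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                      for s in selected])
            else:
                redundancy = 0.0
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: feature 0 drives the target; feature 1 is a near copy
# of feature 0, so redundancy should push it down the ranking.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=500)   # redundant near-duplicate
y = X[:, 0] + 0.1 * rng.normal(size=500)
top = mrmr_rank(X, y, k=3)
print(top)
```

The redundancy penalty is what makes the selected top-ten lists in Tables 15, 16, 17, 18, and 19 compact: a near-duplicate of an already-selected variable adds relevance but little new information.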

The best models for each pollutant and location for the fourth quartile are shown in italics. Results are given for all quartiles, but we assume that the fourth quartile is the most important for prediction as it represents the most dangerous concentration levels.

Table 15 The ten most relevant variables for each SO2 (t + 1) level prediction
Table 16 The ten most relevant variables for each PM10 (t + 1) level prediction
Table 17 The ten most relevant variables for each NO2 (t + 1) level prediction
Table 18 The ten most relevant variables for each NOX (t + 1) level prediction
Table 19 The ten most relevant variables for each NO (t + 1) level prediction

5 Conclusions

In this work, an experimental procedure using a resampling strategy with five-fold cross-validation allowed the statistical comparison of the different classification models tested. The proposed approach is based on classification modelling, since the desired output is the next level (quartile at t + 1) of an air pollutant as a function of the other variables at a given time t. Two approaches have been used, one with hourly mean data for nitrogen oxides (NO2, NOX, and NO) and another one with daily mean data (for SO2 and PM10), due to the thresholds established in the European Directive 2008/50/EC, in order to obtain more reliable information in the study area. The approaches were developed in three different and separate locations: the main city of Algeciras, the city of La Línea, and the unspoilt remote area of Alcornocales, in order to contrast them and obtain more details on the behaviour of the air pollutants.

The main conclusions of this study are as follows:

  • The classification models can be used to provide very good air quality prediction results, with quality indexes of around 90% in most cases.

  • The use of the ten most relevant variables improves the results in most cases.

  • Ensemble boosted trees, SVM, trees, and ANNs classifiers tend to be the best prediction models in most cases.

  • The results obtained with ANNs are always improved by reducing the number of variables to the ten relevant ones.

  • Variable selection methods can be used to rank the variables by their importance.

  • By selecting fewer variables, it is possible to design a more energy sustainable system with a lower carbon footprint.

  • All forecasts can be useful to citizens, institutions, businesses in the port area, and the cities surrounding the port.

  • There is a background level (averages that are constantly repeated) that does not provide useful or accurate information about the ships' contribution. The conclusion that can be drawn from the data is that more sensors are needed close to the dock area where the ships are berthed in order to isolate the direct effect of the pollutants coming from the ships.

The logistical activity of a port has an impact on air quality. It is therefore necessary to implement predictive models that provide reliable forecasts to help citizens, companies, and institutions make decisions and drive policy changes, ensuring a healthier and cleaner environment for present and future generations.