Introduction

Human life depends on air (Nemerow et al. 2009; Borhani et al. 2017). In recent years, there has been growing concern about air pollution, which is one of the leading environmental and environmental health problems (Borrego et al. 2006). The ambient air quality around industrial plants (such as cement plants and bitumen insulation production plants) is affected by the dispersion of short-lived climate pollutants and by emissions from other sources, which can impair human health and contribute to air pollution and climate change (Merenu et al. 2007; Borhani et al. 2016, 2021a; Cheraghi and Borhani 2016a, b; Hoveidi et al. 2017; Mousavi et al. 2017; Borhani and Noorpoor 2020; Mousavi and Falahatkar 2020; Maddah et al. 2022). The cement industry is one of the major, strategic industries in Iran. As a basis of the country's development, it plays a key role in housing, dam construction projects, industrial plants, buildings, road development and other construction. As it has grown, however, the cement industry has become one of the largest sources of air pollution (Humphreys and Mahasenan 2002; Ali et al. 2011; Assegaf and Jayadipraja 2015). Particulate matter is one of the most important pollutants released by the cement industry. Its main sources in this industry include crushing, dry grinding, raw material transport, rotary kiln operation, clinker cooling, packaging and vehicle movement (Chehregani 2004). Monitoring and prediction studies of short-lived climate pollutants, with special emphasis on PM2.5 and PM10, are therefore of great importance: they provide early warning about emission sources and help improve ambient air quality control regulations around industrial plants (Baldasano et al. 2003; Özden et al. 2008).

Several studies have assessed the impacts of industrial plants on the surrounding environment. For example, Sharma and Pervez (2003) studied seasonal changes in PM10 and suspended particulate matter (SPM) levels in the ambient air around a cement plant and found a positive correlation between PM10 and SPM. Agrawal and Khanam (1997) investigated particulate matter concentrations around a cement plant. Concentrations often exceeded permissible limits up to 2 km southeast (SE) of the source, and the dust fall rate was significant up to 5 km SE. The dust fall rate was highest in winter, whereas total suspended particle (TSP) levels were highest in summer. Mohebi and Baroutian (2006) measured PM10 dispersion at the Kerman cement plant, predicted PM10 concentrations using an Eulerian model, a Gaussian dispersion model and artificial neural networks (ANNs), and compared the predictions with the measured data. Borhani and Noorpoor (2017) assessed the cancer risk of volatile organic compounds (VOCs) released during the production of bituminous insulation, and Borhani et al. (2019) studied and modeled the environmental effects of NOx and CO emissions around the bituminous insulation plant.

Several researchers have also forecast particulate matter (PM10 and PM2.5) concentrations using machine learning techniques. Jeong et al. (2020) used a random forest algorithm to predict PM10 concentrations from meteorological parameters (wind speed, relative humidity and air temperature). Masood and Ahmad (2020) presented a model for estimating PM2.5 concentrations based on machine learning approaches such as artificial neural networks (ANN) and support vector machines (SVM). Vidnerová and Neruda (2021) modeled air pollution with machine learning methods including kernel methods, regularization networks, regularization networks with composite kernels and deep neural networks. Díaz-Robles et al. (2008) predicted particulate matter concentrations in Temuco using Box–Jenkins time-series (ARIMA) and multilinear regression (MLR) models. In another study, an ARMA/ARIMA time-series model was used to predict short-term PM10 concentration series (Zhang et al. 2017).

The purpose of this research is to investigate changes in particulate matter concentrations around the Kerman cement plant from 2016 to 2020. Because of several constraints on measuring pollutants around the plant, measurements were taken only once per season at four sampling stations, chosen to provide relative spatial and temporal coverage of the plant area. This number of measurements is not sufficient for the goal of this study, which is to examine changes in mean particulate matter concentrations around the plant and to forecast them for the following year (2021). The novelty of this research lies in its sampling-resampling approach, which combines actual measurements of particulate matter with statistical analysis performed with the Pandas library in Python. We therefore investigate the use of resampling strategies for imbalanced time series. We then use interpolation methods in Pandas to fill the missing values in the series. Next, the correlations between particulate matter and meteorological parameters are investigated. Finally, a SARIMA time-series model is used to predict particulate matter (PM) concentrations. Figure 1 shows a flowchart of the steps followed in the research.

Fig. 1

Flowchart of research methodology

Materials and methods

Study area

Kerman city

Kerman, the capital of the largest province of Iran, is located in the southeast of the country. The city covers an area of 238.80 km² and lies at 30°17′30″ N, 57°05′00″ E. At about 1755 m (5758 ft) above sea level, it is the third-highest provincial capital in Iran. Kerman has a moderate climate, with a mean annual rainfall of 142 mm, minimum and maximum temperatures of −7 and 39.6 °C, respectively, and a mean relative humidity of 36% (Kerman-met 2020). The city is exposed to increasing levels of air pollution as a result of concentrated industrial activity, urbanization and transportation. Figure 2 shows the location of Kerman in Iran.

Fig. 2

Map of the study area and sampling stations in Kerman cement plant

Kerman cement plant

The plant is located southwest of Kerman, near Bandar Abbas city (Fig. 2). Established in 1965, the Kerman cement plant was the first cement production plant in the southeast. It produces Portland cement types 2 and 5, Pozzolanic Portland cement and class G oil well cement according to market demand, with a nominal capacity of 3600 tons of clinker per day (Kcig 2020). The plant is also located about 17 km along the Kerman-Tehran highway, and particulate matter in the exhaust of vehicles traveling on the highway adds to the pollution in the area. The plant can therefore be regarded as one of the most important sources of the main air pollutants in Kerman.

Sampling

In the present study, suspended particles in the ambient air of the Kerman cement plant were measured during 2016–2020. A portable aerosol monitor (DustTrak, TSI Model 8534, Germany), based on photometry and following the standard method BS EN 12341, with a measuring range of 0.001–150 mg/m³ and an accuracy of ±0.001 mg/m³, was used to measure ambient concentrations of particulate matter with aerodynamic diameters below 2.5 µm (PM2.5) and 10 µm (PM10). Sampling was carried out in spring, summer, autumn and winter at the four corners of the plant, a few hundred meters from the stacks. The device was first calibrated and then placed 1.5 m above the ground (De Nevers 2000) on the north side (in front of the waste depot), the east side (opposite the clinker depot), the south side (adjacent to the entrance door) and the west side (in front of the mine) (Table 1). The pump flow was set to 0.0017 m³/min and each sample was collected over 60 min, corresponding to a suction volume of 0.102 m³. The device reports the average mass concentration of particulate matter (PM) per unit volume of air (McMurry 2000; Gokhale 2009). Meteorological data (wind speed, relative humidity, air temperature and rainfall) were obtained from the Kerman Meteorological Station (Kerman-met 2020).
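The sampling settings are internally consistent: the suction volume equals the pump flow multiplied by the sampling time. The short Python sketch below makes that arithmetic, and the mg/m³ to μg m−3 conversion used when reporting results, explicit. The helper names are hypothetical; only the numbers come from the text.

```python
# Sampling settings stated in the text (60-min sample).
PUMP_FLOW_M3_PER_MIN = 0.0017
SAMPLING_TIME_MIN = 60


def sampled_volume_m3(flow_m3_per_min: float, minutes: float) -> float:
    """Air volume drawn through the monitor: flow rate times sampling time."""
    return flow_m3_per_min * minutes


def mg_to_ug_per_m3(concentration_mg_m3: float) -> float:
    """Convert a reading in mg/m³ to μg/m³ (1 mg = 1000 μg)."""
    return concentration_mg_m3 * 1000.0


volume = sampled_volume_m3(PUMP_FLOW_M3_PER_MIN, SAMPLING_TIME_MIN)
print(f"Sampled volume: {volume:.3f} m³")  # prints 0.102, the stated suction volume

# The stated measuring range, 0.001-150 mg/m³, expressed in μg/m³.
low, high = mg_to_ug_per_m3(0.001), mg_to_ug_per_m3(150)
print(f"Measuring range: {low:g} to {high:g} μg/m³")
```

Expressing the range in μg/m³ shows that the instrument's 0.001 mg/m³ resolution corresponds to 1 μg m−3, the unit used for the results.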

Table 1 Location information of sampling in Kerman cement plant

Resampling

Resampling changes the frequency of the observations in a time series, here the Kerman cement plant data. There are two types of resampling: upsampling and downsampling (Brownlee 2019). We used upsampling in Pandas (Python) to increase the frequency of the samples from seasonal to monthly. To do so, we assigned each season's measurement to the first month of that season and filled the other 2 months with NaN values.
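A minimal sketch of this upsampling step with pandas, assuming seasons that begin in March, June, September and December and using made-up readings rather than the study's measurements. `resample("MS").asfreq()` keeps each seasonal value on the first month of its season and inserts NaN for the two months in between.

```python
import pandas as pd

# Hypothetical seasonal PM10 readings (μg m−3), one per season; NOT study data.
# Assumption: seasons start in March, June, September and December.
seasonal = pd.Series(
    [62.0, 55.0, 48.0, 70.0],
    index=pd.date_range("2019-03-01", periods=4, freq="3MS"),
    name="PM10",
)

# Upsample from seasonal to monthly (month-start) frequency. asfreq() does not
# invent values: months without a measurement are left as NaN for the
# interpolation step that follows.
monthly = seasonal.resample("MS").asfreq()
print(monthly)
```

Using `asfreq()` rather than an aggregating method such as `mean()` keeps the upsampling step separate from the choice of interpolation method.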

Interpolation

Next, we evaluated the accuracy of four interpolation techniques for filling the missing values in the Kerman cement plant data. Interpolation estimates new data points within the range of a discrete set of known, neighboring values (Dan et al. 2020). We applied spline, polynomial, PCHIP (piecewise cubic Hermite interpolating polynomial) and Akima interpolation (Chapra and Canale 1988; Cheney and Kincaid 2012). Because the spline and polynomial methods can be used with different orders, orders 1, 2 and 3 were tested for both. These are the simplest and most common types of interpolation (Jung and Chong 2017).
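The eight candidate configurations (spline and polynomial of orders 1 to 3, PCHIP and Akima) map onto pandas' `Series.interpolate`, which delegates these methods to SciPy. The sketch below is an illustration under stated assumptions, not the authors' code: the series is hypothetical, it uses an integer month index, and it passes `s=0` so that the `spline` method (a SciPy `UnivariateSpline`) interpolates through the observations instead of smoothing them.

```python
import numpy as np
import pandas as pd

# Hypothetical upsampled monthly series (μg m−3): one observed value per
# season at months 0, 3, 6, 9, 12 and NaN in between. Illustrative only.
series = pd.Series(np.nan, index=range(13), dtype=float)
series.iloc[::3] = [40.0, 52.0, 47.0, 61.0, 44.0]

# The eight candidates: spline and polynomial of orders 1-3, plus PCHIP and
# Akima. s=0 is an assumption that forces an interpolating (not smoothing) spline.
candidates = {
    **{f"spline{k}": dict(method="spline", order=k, s=0) for k in (1, 2, 3)},
    **{f"polynomial{k}": dict(method="polynomial", order=k) for k in (1, 2, 3)},
    "pchip": dict(method="pchip"),
    "akima": dict(method="akima"),
}

# interpolate() only fills the NaN positions; observed values are unchanged.
filled = {name: series.interpolate(**kw) for name, kw in candidates.items()}
print(pd.DataFrame(filled).round(2))
```

Orders 1 for both spline and polynomial reduce to straight-line interpolation between neighboring seasons; higher orders and PCHIP/Akima differ mainly in how much they bend between observations.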

Prediction

To predict the data, we used the seasonal autoregressive integrated moving average model (SARIMA, or seasonal ARIMA). SARIMA models are based on autoregressive (AR) and moving average (MA) components, combined with differencing (I) and their seasonal counterparts (Suhartono 2011). We used partial autocorrelation function (PACF) and autocorrelation function (ACF) plots to determine the p, d and q parameters of the ARIMA model (Agrawal et al. 2017). In our code, we first defined a simple AR process and identified its order from the ACF and PACF plots (Salvi 2019; Dettling 2013). A time series is autoregressive when its present value can be obtained from previous values of the same series, i.e., the present value is a weighted sum of past values plus a random error. An AR process of order p can be written as

$$y_{t} = C + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \cdots + \phi_{p} y_{t-p} + \varepsilon_{t}$$
(1)
$$\varepsilon_{t} = y_{t} - \left(C + \sum_{i=1}^{p} \phi_{i} y_{t-i}\right)$$
(2)

where \(C\) is the intercept, \(\phi_{i}\) (\(i = 1, 2, \ldots, p\)) are the autoregressive model parameters, \(y_{t}\) is the current value of the time series, \(y_{t-1}, y_{t-2}, \ldots, y_{t-p}\) are its past values and \(\varepsilon_{t}\) is the random error. The root mean square error (RMSE) is a good measure of accuracy, but only for comparing the forecasting errors of different models or model configurations for a particular variable, not across variables, because it is scale-dependent. The RMSE is given by the following equation (Beckerman et al. 2013; Elavarasan et al. 2018; Goap et al. 2018).

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2}}$$
(3)

where \(N\) denotes the number of data points in the set, \(y_{i}\) is the actual value and \(\hat{y}_{i}\) is the model-predicted value. Finally, the best interpolation method was selected as the one with the minimum root mean square error (RMSE) under cross-validation.
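The text does not specify the cross-validation scheme. The sketch below assumes one plausible choice, leave-one-out over the observed seasonal values: each interior observation is hidden in turn, the gap is re-interpolated, and the RMSE of Eq. (3) is computed over the hidden points. The data, the helper names `rmse` and `loo_rmse`, and the candidate settings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pandas as pd


def rmse(actual, predicted):
    """Root mean square error, Eq. (3): sqrt(mean((y_i - y_hat_i)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def loo_rmse(series, **interp_kwargs):
    """Leave-one-out score: hide each interior observation, re-interpolate,
    and compare the reconstructed value with the hidden one."""
    observed = series.dropna().index
    actual, predicted = [], []
    for idx in observed[1:-1]:  # keep both end points so nothing is extrapolated
        masked = series.copy()
        masked.loc[idx] = np.nan
        filled = masked.interpolate(**interp_kwargs)
        actual.append(series.loc[idx])
        predicted.append(filled.loc[idx])
    return rmse(actual, predicted)


# Hypothetical upsampled series: 7 seasonal observations, NaN in between.
series = pd.Series(np.nan, index=range(19), dtype=float)
series.iloc[::3] = [40.0, 52.0, 47.0, 61.0, 44.0, 55.0, 50.0]

candidates = {
    **{f"spline{k}": dict(method="spline", order=k, s=0) for k in (1, 2, 3)},
    **{f"polynomial{k}": dict(method="polynomial", order=k) for k in (1, 2, 3)},
    "pchip": dict(method="pchip"),
    "akima": dict(method="akima"),
}

scores = {name: loo_rmse(series, **kw) for name, kw in candidates.items()}
best = min(scores, key=scores.get)  # lowest RMSE wins
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:12s} RMSE = {score:.2f}")
print("Selected method:", best)
```

Scoring only on hidden observed values avoids judging a method against points that were themselves produced by interpolation.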

Results and discussion

Sampling data analysis

The PM2.5 and PM10 concentrations are presented in Table 2. The highest annual means of 24-h PM2.5 concentrations were recorded on the east side (opposite the clinker depot) in 2019 (31.50 μg m−3) and on the west side (in front of the mine) in 2020 (31.00 μg m−3), both above the PM2.5 quality standard of 25.00 μg m−3. Likewise, the highest annual means of 24-h PM10 concentrations were recorded on the west side (in front of the mine) in 2020 (121.00 μg m−3) and on the east side (opposite the clinker depot) in 2020 (120.75 μg m−3), well above the allowable limit of 50.00 μg m−3 (WHO 2013). Alizadeh et al. (2010) reported an average particulate matter concentration of 380 μg m−3 at the Kerman cement plant. The results also showed that PM10 and PM2.5 concentrations varied among the sampling locations at the Kerman cement plant. The highest average concentrations of PM10 and PM2.5 were found on the east side (opposite the clinker depot), followed by the west side (in front of the mine), the north side (in front of the waste depot) and the south side (adjacent to the entrance door). These findings agree with previous studies (Abu-Allaban and Abu-Qudais 2011; Ahmad et al. 2013; Aghamolaie et al. 2015; Jayadipraja et al. 2016). PM10 concentrations in 2018, 2019 and 2020 exceeded the standard limits, but in general, PM2.5 concentrations could be considered acceptable (Table 3). The highest concentrations of PM2.5 and PM10 were observed in winter (24.45 and 64.30 μg m−3, respectively), followed by spring and summer, and the lowest in autumn (Table 2). These findings are similar to previous research (Leone et al. 2016; Shahri et al. 2019; Ciobanu et al. 2021; Borhani et al. 2022a).
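The comparison with the limit values is simple arithmetic. The pandas sketch below uses only the maxima and limits reported in this paragraph to express each maximum as a multiple of its limit; the column names are our own.

```python
import pandas as pd

# Highest annual means of 24-h concentrations reported in the text (μg m−3).
maxima = pd.DataFrame(
    {
        "pollutant": ["PM2.5", "PM2.5", "PM10", "PM10"],
        "site": ["east (clinker depot)", "west (mine)", "west (mine)", "east (clinker depot)"],
        "year": [2019, 2020, 2020, 2020],
        "concentration": [31.50, 31.00, 121.00, 120.75],
    }
)
# Limit values the text compares against.
limits = {"PM2.5": 25.00, "PM10": 50.00}

maxima["limit"] = maxima["pollutant"].map(limits)
maxima["ratio_to_limit"] = maxima["concentration"] / maxima["limit"]
maxima["exceeds"] = maxima["concentration"] > maxima["limit"]
print(maxima.round(2))
```

The ratios make the contrast explicit: the PM2.5 maxima are about 1.25 times their limit, whereas the PM10 maxima are roughly 2.4 times theirs.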

Table 2 The average concentrations of particulate matter for the period of 2016 to 2020 in Kerman cement plant
Table 3 Annual average of particulate matter and meteorological parameters recorded from 2016 to 2020 around the cement plant, in Kerman, Iran

Correlation analysis

Table 4 presents the correlations between particulate matter concentrations and meteorological parameters, tested using Spearman's rank correlation coefficients (r). A p-value below 0.05 was considered statistically significant (Field and Miles 2009; Carslaw and Ropkins 2012; Borhani et al. 2021b). The results indicated that particulate matter concentrations increase with increasing relative humidity and rainfall (p-value > 0.05). This agrees with Wang et al. (2010), who reported that increasing relative humidity leads to higher particulate matter concentrations. Particulate matter was inversely related to wind speed and temperature (p-value < 0.05), consistent with previous research (Nazif et al. 2019; Akbal and Ünlü 2022; Bañuelos Gimeno et al. 2022; Borhani et al. 2022b). In addition, a significant positive correlation was found between PM2.5 and PM10 (r = 0.907), suggesting that they originate from the same sources.
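A sketch of this correlation test with SciPy, computing both Pearson's r and Spearman's rank coefficient with their p-values and flagging p < 0.05 as significant. The data frame is hypothetical (not the values behind Tables 3 and 4), and the helper `correlation_report` is an illustrative name.

```python
import pandas as pd
from scipy import stats

# Hypothetical seasonal observations (illustrative only, NOT the study's data).
# Units: PM in μg m−3, wind speed in m/s, temperature in °C, humidity in %.
df = pd.DataFrame({
    "PM2.5": [18.0, 15.5, 12.0, 24.0, 19.0, 16.0, 13.5, 25.5],
    "PM10": [55.0, 48.0, 40.0, 66.0, 58.0, 50.0, 43.0, 70.0],
    "wind_speed": [3.1, 3.6, 4.2, 2.5, 3.0, 3.4, 4.0, 2.4],
    "temperature": [18.0, 27.0, 15.0, 6.0, 19.0, 28.0, 14.0, 5.0],
    "rel_humidity": [30.0, 20.0, 28.0, 45.0, 31.0, 19.0, 27.0, 47.0],
})


def correlation_report(data, pollutants, parameters, alpha=0.05):
    """Pearson's r and Spearman's rho (with p-values) for each pollutant/parameter pair."""
    rows = []
    for pm in pollutants:
        for met in parameters:
            r, p_r = stats.pearsonr(data[pm], data[met])
            rho, p_rho = stats.spearmanr(data[pm], data[met])
            rows.append({
                "pollutant": pm,
                "parameter": met,
                "pearson_r": r,
                "pearson_p": p_r,
                "spearman_rho": rho,
                "spearman_p": p_rho,
                "significant": p_rho < alpha,  # judged on the Spearman p-value here
            })
    return pd.DataFrame(rows)


report = correlation_report(df, ["PM2.5", "PM10"], ["wind_speed", "temperature", "rel_humidity"])
print(report.round(3))
```

Spearman's coefficient works on ranks, so it captures monotonic but nonlinear relationships and is less sensitive to the occasional extreme reading than Pearson's r, which matters with only a few measurements per year.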

Table 4 Spearman's rank correlations between particulate matter and meteorological parameters data from 2016 to 2020 around the cement plant, in Kerman, Iran

Results of the machine learning techniques in Python

Figures 5 and 6 show the measured and predicted trends of PM2.5 and PM10 concentrations at each location from 2016 to 2021, including the 2020–2021 predictions for the Kerman cement plant obtained with all interpolation methods. Particulate matter concentrations were forecast for 2 years (2020 and 2021) using the SARIMA statistical model. Table 5 presents the lowest RMSE values obtained with the eight interpolation methods. It shows that first-order spline interpolation generally performed better than the other methods, while Akima interpolation predicted PM2.5 concentrations at location D with high accuracy (RMSE = 0.55).

Table 5 Root mean square error (RMSE) values of the competing interpolation methods

The RMSE values for PM2.5 concentrations were lower than those for PM10 concentrations in the competing interpolation methods (Table 5). This suggests that PM10 concentrations are more difficult to forecast than PM2.5 concentrations, although, because RMSE is scale-dependent, part of this difference reflects the higher magnitude of PM10 concentrations. Figures 3 and 4 show the ACF and PACF plots used to predict PM2.5 and PM10 concentrations from 2020 to 2021 at the Kerman cement plant; the shaded region represents the 95% confidence interval. Table 6 lists the predicted monthly PM2.5 and PM10 concentrations, with 95% confidence intervals, for the Kerman cement plant from January to December 2021 (Figs. 5 and 6). According to the forecast model, PM2.5 and PM10 concentrations decrease by about 14.20 and 25.44%, respectively, in 2021 compared with 2020 (Tables 3 and 6). This decrease may be partly attributable to the effects of the COVID-19 pandemic lockdown in Iran on air pollutant concentrations.
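The order-identification idea behind the ACF/PACF plots can be illustrated numerically without plotting. The sketch below is a NumPy-only illustration, not the authors' code: it simulates a hypothetical AR(2) process following Eq. (1), computes the sample ACF and the PACF (via the Durbin-Levinson recursion), and flags the lags whose partial autocorrelation falls outside ±1.96/√n, a common large-sample approximation of the 95% band shaded in such plots. For an AR(p) process, the PACF should cut off after lag p, while the ACF decays gradually.

```python
import numpy as np


def simulate_ar(phi, n, c=0.0, seed=0, burn_in=200):
    """Simulate y_t = c + sum_i phi_i * y_{t-i} + eps_t with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    total = n + burn_in
    y = np.zeros(total)
    eps = rng.standard_normal(total)
    for t in range(p, total):
        # y[t - p:t][::-1] is [y_{t-1}, ..., y_{t-p}]
        y[t] = c + np.dot(phi, y[t - p:t][::-1]) + eps[t]
    return y[burn_in:]  # drop the start-up transient


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(nlags + 1)])


def sample_pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    prev = np.zeros(0)  # phi_{k-1, 1..k-1} from the previous step
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(prev, r[1:k])
        phi_kk = num / den
        prev = np.append(prev - phi_kk * prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


# Hypothetical AR(2) coefficients (phi_1 = 0.6, phi_2 = -0.3); not fitted to study data.
y = simulate_ar([0.6, -0.3], n=500, seed=42)
nlags = 10
acf = sample_acf(y, nlags)
pacf = sample_pacf(y, nlags)
bound = 1.96 / np.sqrt(len(y))  # approximate 95% band
significant = [k for k in range(1, nlags + 1) if abs(pacf[k]) > bound]
print("ACF :", np.round(acf[1:], 2))
print("PACF:", np.round(pacf[1:], 2))
print("PACF lags outside the 95% band:", significant)
```

With a few dozen monthly points instead of 500, the band ±1.96/√n is much wider, so only strong partial autocorrelations stand out; in practice the candidate orders read from the plots are then compared by forecast error.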

Fig. 3

The auto-correlation function (ACF) and partial auto-correlation function (PACF) plots for prediction of PM2.5 concentrations, a north side, b east side, c south side, d west side

Fig. 4

The auto-correlation function (ACF) and partial auto-correlation function (PACF) plots for prediction of PM10 concentrations, a north side, b east side, c south side, d west side

Table 6 Forecast monthly average concentrations of PM2.5 and PM10 from January 2021 to December 2021 based on minimum RMSE values for each interpolation method
Fig. 5

Time-series analysis (Train set, Forecast set, Algorithm forecast, Test set and Algorithm test) model of PM2.5 concentrations in sampling stations in Kerman cement plant from 2016 to 2020, a A, b B, c C, d D

Fig. 6

Time-series analysis (Train set, Forecast set, Algorithm forecast, Test set and Algorithm test) model of PM10 concentrations in sampling stations in Kerman cement plant from 2016 to 2020, a A, b B, c C, d D

Conclusion

This paper presents a comparative analysis of the monthly trends in PM10 and PM2.5 concentrations over a 5-year time series (2016–2020) at the Kerman cement plant and predicts their trends for the following year (2021). We showed how to define and efficiently compute the marginal likelihood of a SARIMA model with missing observations, how to predict and interpolate missing observations, and how to obtain the mean squared error of the estimate. To this end, sampling, resampling, interpolation and prediction methods were implemented in Python. The results indicated that the highest annual means of 24-h PM2.5 concentrations were recorded on the east side (opposite the clinker depot) in 2019 (31.50 μg m−3) and on the west side (in front of the mine) in 2020 (31.00 μg m−3). The highest annual means of 24-h PM10 concentrations were recorded on the west side (in front of the mine) in 2020 (121.00 μg m−3) and on the east side (opposite the clinker depot) in 2020 (120.75 μg m−3). According to the forecast model, PM2.5 and PM10 concentrations decrease by about 14.20 and 25.44%, respectively, in 2021 compared with 2020. The correlations of PM2.5 and PM10 with meteorological parameters (temperature, wind speed, relative humidity and rainfall) were then examined using Spearman's rank correlations. PM2.5 and PM10 were positively and significantly correlated with relative humidity and rainfall and negatively correlated with temperature and wind speed.
Based on our predictions, we recommend that Kerman cement plant officials and administrators make timely decisions on the supply of equipment and materials needed to control and reduce pollution from the cement production units. The ambient air quality around the Kerman cement plant is affected by the dispersion of short-lived climate pollutants from industrial emissions, by meteorological conditions and by other sources, all of which can worsen air pollution and harm human health.