1 Introduction

The atmosphere is being polluted by air, water, and land for various reasons [1]. Particularly, polluted air affects human health to a greater extent [2]. Inhaling poor-quality of air damages both respiratory and cardiovascular organs [3,4,5]. Along with that, the pollutants also affect plants and other living beings (Kumar). Such air pollution is mainly caused by burning fossil fuels, which release an excess of sulphur dioxides and nitrogen oxides that gets mixed up with air [6]. In addition, the emission of smoke from vehicles and industries also becomes a source of air pollution. To protect the environment from danger, AQI (Air Quality Index) is being calculated to measure the emission level of gases which determines the highly polluted areas [7]. Based on the information collected from the index, appropriate actions can be taken to protect the health of humans.

Also, most countries utilize AQI to provide policies for better air quality [8, 9]. Identifying particular hotspots experiencing greater levels of air pollution is being controlled by targeted actions and considerably improves the future well-being of the environment [10].Due to the serious impacts of air pollution, it is found to be very sensitive in predicting air quality. For this purpose, many AI (Artificial Intelligence) based algorithms have been used in prediction. Recently, with the progress of computing algorithm technology, deep learning models have begun to be used to analyze and forecast nonlinear relationships between data variables. In a similar way, deep learning models have significantly improved data analysis performance because they provide highly reliable results [11]. Correspondingly, RFR (Random Forest Regression) and SVR (Support Vector Regression) have been utilized in [12] for building the regression structure in the prediction of AQI. The concentration of nitrogen oxides in an Italian city from a public dataset has been analyzed. Coefficient correlation, RMSE, along with R2 calculation, has measured the performance of both models and exhibited that both methods have achieved better prediction efficiency. Similarly, the suggested research [13] has utilized an optimal and hybrid method that has fused the benefits of SD (Secondary Decomposition) with an AI optimization algorithm. AQI data collected from china between 2016 and 2018 has been considered to verify the prediction model. Wavelet decomposition has been generated with higher frequency, lower frequency, and variational mode, along with sample entropy for smoothening the decomposition. Followed by that, LSTM (Long Short Term Memory) based NN (Neural Network) has been adopted for the prediction of AQI [14]. In addition, parameters of least-square SVM (Support Vector Machine) have been optimized by the BAT technique by considering the factors of air pollutants. Outcomes have revealed that it has captured characteristics of AQI data efficiently.

Recognizing the influence of pollution in air through the AQI algorithm has been performed through the considered model [15] depending on certain meteorological attributes. The collection of such parameters has been done through the inverse-distance-weighted statistical method. Analysis outcome has been achieved through measurements obtained from three different stations in Kuala Lumpur during the time interval between June and August 2018. In addition, OLS (Ordinary Least Squares) process in GIS (Geographical Information System) based on statistical analysis in recommended research [16] has been adopted for three months’ time period, and the predicted value of AQI has been compared with the observed AQI. Outcomes revealed that the considered analysis [17, 18] had provided efficient monitoring and prediction of AQI with the expected accuracy level.

Moreover, the evaluation of the prediction of pollutant concentration on an hourly basis has been performed with the ANN (Artificial Neural Network) in Iran for the complete year from 2009 to 2010. The suggested algorithm [19] neurons in the hidden layer and a single output layer at the final stage. Six pollutants, such as O3 PM 10, SO2, CO, NO2, and PM 2.5, have been considered in [20, 21] AQI measurements from R2 are significant in all three areas. RMSE value obtained through comparison has achieved better prediction on air quality which has assisted in making effective decisions. The earlier prediction of air quality aids in controlling pollution considerably [22, 23]. Equivalently, recommended research [24] has established multiple site quality prediction structures using the DL model based on a clustering strategy. The performance of the prediction of BPNN for four hours has been ranked in ascending order.

Additionally, CNN (Convolutional Neural Network) integrated with LSTM has been performed for multiple-hour analysis [25]. Performance outcome has indicated that CNN with LSTM has gained a better level of prediction than with CNN and BPNN. The emission of air pollutants has been monitored through the heuristic optimization method in [9], which has conducted pre-processing techniques in detecting outliers and has excavated significant characteristics in optimizing extreme learning parameters. Experimental analysis has emphasized the significance of outlier detection and provided a feasible solution for predicting AQI measures [26, 27]. It has also been investigated that the prediction methods have faced difficulty with time consumption and complexity. To tackle such issues, the adoption of SVM in the recommended study [28] has utilized three significant parameters such as regularization, penalty factor, and kernel function, in which the attributes of kernel function have been improved to enhance the performance of the algorithm. Outcomes from the considered model have been analyzed with error metrics. Efficiency has been addressed through data screening techniques in the outlier-detection process.

Although existing studies have used numerous algorithms and methods in the prediction analysis of air quality in smart cities, accurate prediction of AQI with minimum error rate in the prediction process using regression algorithm still requires improvement. Moreover, the regression analysis on quality prediction alone has not assisted in minimizing the pollution rate, the missing and unstable conditions in input data also require an efficient data-pre-processing technique to prevent backdrops in AQI assessment. Further, AQI assist in predicting the health risks associated with air pollution as AQI plays a critical role as tool for assessing and monitoring the quality of air for protecting public health and also understand the root cause of pollution. Therefore, precise model is needed for effective AQI, hence accurate measurement of AQI and suitable processing of missing values together will reduce the backdrops considerably. To this purpose, the DL based regression method in AQI prediction with effective pre-processing of input data is proposed in the present study for controlling air pollution in the future. By adopting the effective integration of DL algorithms, the proposed research addresses the challenges of existing studies in attaining effective predictions on AQI measures.

1.1 Gaps identified

The evaluation of the existing works has used numerous intelligent models for prediction with AI (Artificial Intelligence) oriented algorithms for effective AQI prediction, however, there are significant gaps that has been identified such as imprecise AQI model, inability of the models to work with massive data, monitoring AQI effectively. Thus, effective models need to be used for overcoming these limitations. Hence, the objective of the present research is projected as follows,

  • To pre-process the data using proposed Deep GAN (General Adversarial Network) by imputing the missing inputs in the dataset by effectively monitoring the quality of air.

  • To perform regression with the modified Stacked Attention GRU (Gated Recurrent Unit) with modified KL (Kullback–Leibler) divergence to predict AQI (Air Quality Index) in three different cities like Ahmedabad, Chennai and Ernakulam with minimized regression loss.

  • To evaluate the present system with performance metrics such as MSE, RMSE, R2, MAE, and MAPE internally and performs a comparative assessment with existing algorithms for validating the efficacy of the proposed model.

1.2 Paper organization

The research paper is organized as follows. Section 2 shows the review made on several existing studies related to pollution estimation of AQI in smart cities. Section 3 sketches the proposed methodology with the regression algorithm used. Section 4 discusses the system's performance through internal and comparative analysis with other techniques. Finally, the conclusion part is provided in Sect. 5.

2 Literature review

The following section discusses various techniques and methodologies used to identify AQI prediction.

Many existing prediction methods have lacked efficiency because of locally optimal solutions trapped with feature selection along with over-fitting complications. The suggested model [29] tackled overcoming the optimal local problem with the BSMO (Balanced Spider Monkey Optimization) in selecting the best features. Air quality data has been acquired from the CPCB (Central Pollution Control Board) from four different cities: Chennai, Bangalore, Cochin, and Hyderabad. The Min–Max normalization method has been performed to fill up missing values in the dataset. CNN model has been applied in the deeper representation of the input. BSMO methodology has selected features based on balancing attributes and passed the features to BI-LSTM (Bidirectional LSTM), which has predicted air quality with index measure. Suggested hybrid BSMO with BI-LSTM has attained effective performance in predicting AQI. Similarly, AQI for Delhi has predicted using LSTM-GRU model [30]. Implementation of the model has resulted in considerable outcome for AQI prediction, in which the R Square value obtained by the model was 0.84. Likewise, AQI for Bangladesh [31] has been notified using LSTM-GRU model. The outcome of the model suggested that, hybrid model has delivered better outcome than standalone models.

To protect society with a pollution-free environment, the considered model [32] has used an attention-based AQI predictor, following sequence-to-sequence modeling. Moreover, it has exploited historical quality information in predicting future indexes. To overcome the minimum training speed [33], the encoder in RNN has been replaced with a fully connected encoder to enhance the training. Position-level embedding has been used to predict the relationship between the sequences. Recurrent levels of prediction have caused an increase in error, and it has been resolved through the n-step process of RNN. Experimental outcomes with the air quality predictor and n-step recurrent process have attained reduced training time in the prediction of the index and decreased rate of error. Prediction of AQI acts as the alarm when the concentration level of gases has exceeded a certain limit and contributes to taking effective measures to the appropriate departments. Traditional methods have faced difficulty utilizing temporal features from correlations acquired from various monitoring regions and have attained poor predictions above 24 h. Such issues have been handled through the recommended research [34] using the DL model in the prediction of quality levels in the air all through the day. TS-LSTME (Temporal Sliding Long Short Term Memory Extended) model. It has integrated optimized time lag for the realization of prediction with the multilayer involved with the hourly-based concentration of temporal, PM 2.5, and meteorological information. Outcomes exhibited a better level of stability and obtained a higher correlation coefficient. The considered model has predicted the quality index with an O3 level of concentration.

Effective decisions on protecting the atmosphere have been performed through significant index measurements of PM 2.5 (Particulate Matter), which acts as the indicator of controlling the frequency of air pollution. Semisupervised learning model has been designed in [35], which has included EMD (Empirical Mode Decomposition) with BI-LSTM networks. It has taken the level of PM 2.5 as the input data, which has been considered the signal through the adoption of EMD as a feature-learning technique that has decomposed data and has extracted both amplitude and frequency features. Certain sudden variations have been recorded and have improved short-term predictions. Bi-LSTM has been applied during the supervised learning phase. Likewise, study [36] has compared DL algorithm like LSTM and ML algorithms like ARIMA, DT, KNN, Dummy regressor, Adaptive Boosting, gradient and extreme gradient boosting, Huber regressor. Analytical outcome has depicted that, DL based LSTM has delivered better outcome than ML models for forecasting the quality of air. Similarly, GA-KELM (Genetic Algorithm Based Improved Extreme Learning Machine) [37] has used in the paper for predicting the quality of the air, along with other models like SVM and DBN-BP. However, GA-KELM model has delivered rapid outcome and precise outcome than existing model.

The rapid development of technologies in smart cities leads to different kinds of pollution in the environment. More specifically, air pollution has increased due to transportation and the excretion of smoke in manufacturing industries [38]. It is highly recommended by the government to measure AQI in polluted areas, and the suggested model [39] has provided an ML algorithm with intelligent based data –analysis in the prediction of air quality. It has utilized the integration of the DT (Decision Tree) algorithm with the NN for an accurate level of prediction. Similarly, integrating LSTM with DAE (Deep Auto Encoder) is considered research [40] have predicted PM concentration from 2015 to 2018 in South Korea, analyzing the quality of air hourly. On average, the concentration level of PM has been predicted from continuous 10 days with the optimal learning rate with 100 epochs for 0.01 for the size of batch in LSTM as 32. The best performance of DAE has been achieved with a batch size of 64. Experimental outcomes with the considered methodology have exhibited better performance in prediction.

Air pollution has been one of the issues in recent decades, thus, forecasting the quality of air play a vital role in controlling the air pollution. Therefore, SVR (Support Vector Regression) and RFR (Random Forest Regression) has used for building the regression model for predicting the AQI. Despite the performance of the model, SVR model is not suitable for processing huge number of samples [12]. The development of IoT (Internet of Things) in the prediction of quality measures in the air has assisted in improving the quality of life of humans. In the same way, the recommended research [41] has utilized sensors for the monitoring process and predicts the emission gas level of concentration, which affects the purity of air. The initial process has been handled with sensors, values stored in a cloud-based server have been measured using LSTM, and obtained results have been compared with SVR. It has measured the AQI value based on the concentration of harmful substances present in the air. With the assistance of DL algorithms, the concentration of pollutants has been predicted for two years. Intelligent prediction models in [42] have used RNN (Recurrent NN) with PSO (Particle Swarm Optimization). The Computation of the model has been performed through the LSTM structure. It has considered six different kinds of pollutants mixed in the air for the prediction of AQI. It has collected data from multiple stations and handled missing values in pre-processing using normalization with the MinMax-scalar approach and evaluated the AQI for every hour in 30 days.

3 Proposed methodology

The proposed regression model employs the DL algorithm for analyzing the concentration level of pollutants mixed in the air to predict the pure quality of air through AQI measures. Continuous monitoring of air quality in different cities in India assists the government in taking appropriate quality measures to control pollution. Existing literature on quality prediction has suffered with maximum loss, and multiple missing values in the dataset have created biased outcomes and unstable prediction results. Moreover, it has undertaken limited areas for analysis and lacked generalized prediction. Hence, the present study is motivated by all such factors and intends to enhance the efficiency of prediction through an effective regression technique with the enhancements performed at the pre-processing process using Deep GAN and regression process accomplished using Modified Stacked Attention GRU with Modified KL Divergence. The architecture diagram of the proposed model is depicted in Fig. 1.

Fig. 1
figure 1

Overall architectural diagram of the proposed method

As illustrated in Fig. 1, the proposed working mechanism is divided into two stages: pre-processing and regression. The input dataset considered for computation is Air-quality-data in India dataset. The pollutant data are taken from different cities in India concerning the information about the concentration level of various pollutants mixed in the air. With that inputs, the possibilities of missing values are greater in range. To make an effective outcome on prediction, missing values are handled through Deep GAN, and feature scaling is processed to minimize the complexity of the model. Independent features are standardized to a fixed range in the dataset during the scaling process. Higher variances in the magnitude of values are handled with Deep GAN. With the assistance of generators and discriminators, it generates data samples from the statistical distribution of information. It outputs samples in the dataset which are closely associated with the real distribution of values in the dataset. If missing values are untreated, it creates complications in real-time datasets and produces biased outcomes. After that, the pre-processed data are splitted up into 80% training and 20% testing of the model, which then passes on to the regression phase. Finally, effective prediction is being performed with the modified Stacked Attention GRU with KL divergence for efficient prediction of AQI. It is mainly used for mapping selected pollutant attributes with the collected pollution data due to the self-feedback connections and improved training methods. To manage longer sequences of input data, KL divergence is incorporated, which minimizes the parameters and enhances the model's efficiency. Since Stacked Attention GRU operates with less memory consumption and a faster training method, the proposed system can perform effective prediction, and the complexity of the model is reduced considerably through the proposed regression algorithm. The illustrative diagram of the proposed working structure is illustrated in Fig. 2.

Fig. 2
figure 2

Illustrative diagram of the proposed model

From Fig. 2, the overall process is very clearly represented in which the pollutant sources from different cities in India are being collected and stored in a dataset, namely Air Quality data in India. AQI values are generally categorized as good, moderate, or severe base on the concentration level of pollutants. The dataset holds the data of AQI on an hourly and daily range of various stations over multiple cities in India. The concentration of harmful gases and the AQI index are stored in the dataset. Input data is being processed by the proposed model. Missing values are imputed using GAN, and after filling up input values, the process of training and testing is performed. Followed by that, the regression model plays its role in predicting AQI using the modified stacked attention GRU with KL divergence. From the regression calculation of the higher, medium, and low levels of AQI measured in different cities with the proposed algorithm, it is capable of predicting the three cities such as Ahmedabad with high AQI, Chennai with medium level, and Ernakulum with lower AQI which is being evaluated through error metrics such as MAPE, MSE, R2, RMSE, and MAE. It can provide a lower error in the difference between predicted and actual values.

3.1 Dataset description

The dataset preferred for the proposed algorithm analysis is the Air Quality Data in India which possess the data of AQI calculated for hourly and daily level in multiple cities and various stations in India. It measures data between 2015 and 2020. Values of AQI occurred in data at a Particular city containing two different stations are being compared based on hour, city, day, and station level for confirming that the calculations are valid. Calculation of AQI utilizes seven different measures such as NH3, SO2, CO, PM 2.5, O3, NO2, and PM10. Average AQI values over the previous 24 h are used with the criteria of holding 16 different values. Specifically for O3 and CO, the maximum concentration level in the last eight hours is being considered. Every measure is converted into a sub-category index depending on the pre-defined clusters. At times, measures many measures are not available due to the absence of measuring or might be due to a lack of specified data points. The final level of AQI amounts to the maximum level of the index with the criteria of presenting at least the concentration of three factors from the seven attributes. The AQI level categorizes the measure as good for (0–50), which shows minimal impact, satisfactory for (51–100), which depicts a minor level of breathing discomfort in old people, moderate (101–200), which affects lungs, children, and leads to heart diseases, poor (201–300) exhibits high breathing difficulties, very poor (301–400) leads to respiratory illness on prolonged exposure and severe (> 400) range indicates that it affects the respiratory system entirely even for healthy and young people. The dataset link is https://www.kaggle.com/datasets/rohanrao/air-quality-data-in-india.

3.2 Pre-processing with deep GAN imputation

Data preprocessing is performed by checking missing values and feature scaling. Pollution data are stored as discrete values in a dataset in which the occurrence of missing values is higher. It is being handled through an effective missing data imputation strategy. Scaling of features minimizes the complexity. The proposed GAN model in data pre-processing imputes the missing information directly by considering spatial and temporal relationships with the remaining-observed input values already collected. Table 1 shows the Attribute, missing values of the dataset.

Table 1 Attributes, missing values

Table 1 depicts the attributes, missing values and the total % values present in the dataset. As the missing values of the dataset is collectively in large number, it is important to replace the missing values in order to the improve the performance of the model. GAN is a kind of unsupervised generative structure for summarizing data distribution. It has better generalization capability with limited parameters for processing. The two sections in GAN are the generator and discriminator, in which the generator is trained to generate real data illustrations.

In contrast, the discriminator is trained to differentiate the generated examples from real values. Following that, the generator segment takes the original distribution with a random mask and noise matrix to indicate the real and fake values and then performs the imputation process. The next section of the discriminator obtains the generated information, which determines the components imputed. The generator tries its best to impute the lost information in such a way as to confuse the discriminator.

Similarly, the discriminator puts maximum effort into identifying the imputed information from real values.

Finally, the imputed values are successfully generated when the discriminator is inefficient in finding the imputed data from real data. Considering 15 random variables in the dataset represented as DAT with attributes P1 to P15 as \(DAT=\left({P}_{1},{P}_{2},\dots .,{P}_{15}\right)\) which is being arranged in the form of \(DAT={\left({P}_{1},{P}_{2},\dots .,{P}_{15}\right)}^{K}\) where k is referred to as a total number of sampling points and if suppose \({M}_{vec}=\left({M}_{vec1},{M}_{vec2},{\dots ,M}_{vec15}\right)\) is 15 random attributes with a masking vector representing the missing values of 0 or 1. The incomplete dataset is being defined as \(\widetilde{D}={\left({\widetilde{P}}_{1},{\widetilde{P}}_{2},\dots .,{\widetilde{P}}_{n}\right)}^{K}\) in which \(\widetilde{P}=\left({\widetilde{P}}_{1},{\widetilde{P}}_{2},\dots .,{\widetilde{P}}_{15}\right)\) is considered as the random variable masked by the \({M}_{vec}\) and denoted as

$$\widetilde{{P}_{i}}=\left\{\begin{array}{c}{P}_{i}, if\quad {M}_{veci}=1 \\ nan, if \quad {M}_{veci}=0\end{array}\right.$$
(1)

where \(nan\) denotes the missing data. The generator takes samples of \(\widetilde{{P}_{i}},N\), and \({M}_{vec}\) represented as \(N=({N}_{1},{N}_{2},\dots .,{N}_{15}\) in which randomized noise is being included, and defining of complete and imputed data are given as

$$\overline{P }=G(\overline{P },{M}_{vec},(1-{M}_{vec})\otimes N)$$
(2)
$$\widehat{P}={M}_{vec}\otimes \overline{P }+(1-{M}_{vec})\otimes \widehat{P}$$
(3)

\(\overline{P }\) is the imputed value vector, and G denotes computation operation in a generative network. Finally retrieved complete value is referred to as \(\widehat{P}.\) Particularly, the data observed directly are outputted as given without changes, and missing values are processed with the generator and then outputted. The operator \(\otimes\) determines the element level multiplication. The architecture of GAN is self-possessed with two fully connected layers. The activation functions for both hidden and output layers are Relu (Rectified Linear Unit) and the sigmoid functions, respectively. The operations of both sigmoid and Relu are mathematically computed as follows:

$$Relu:f\left(x\right)=\mathit{max}\left(0,x\right)=\left\{\begin{array}{c}{p}_{i}, if \quad{p}_{i}\ge 0\\ 0, if \quad{p}_{i}<0\end{array}\right.$$
(4)
$$Sigmoid:f\left(x\right)=\frac{1}{1+{exp}^{-x}}$$
(5)

To ensure the imputation quality, the hint method is used for delivering partial information on loss values to the discriminator, which takes both imputed data and hint-matrix for predicting the mask-matrix. The features of real data are captured using the hint mechanism. After obtaining information from the mask-matrix, it provides a hint as 0 as missing data and 1 as observed data. The discriminator tries to output the mask matrix to distinguish the missing value 0 from the real value 1. Satisfactory outcomes are achieved even with more missing values in the dataset. In the imputation process, the discriminator maximizes the level of accuracy in output, whereas the generator minimizes the accuracy of discriminator prediction. The adversarial process of training ensures the quality of imputation. Hence, binary cross-entropy obtains the distance between the actual and predicted possibilities representing 0 or 1. With the help of the hint matrix, the estimated value of the mask matrix is outputted from the discriminator, and the objective function is defined as

$$\begin{array}{cc}min& max\\ Gen& Dis\end{array} {\mathbb{E}}_{\widehat{p},{M}_{vec},H}[{M}_{vec}^{T}\mathit{log}\widehat{{M}_{vec}}+{\left(1+{M}_{vec}\right)}^{T}\mathit{log}\left(1-log\widehat{{M}_{vec}}\right)]$$
(6)

where \(log\) denotes the element − wise logarithm function, and \(\widehat{{M}_{vec}}\) is considered as the projected mask matrix, which is represented as \(\widehat{{M}_{vec}}=D(\widehat{P}, H)\). In addition to that, binary cross − entropy based loss − function is defined as a negative expectation of a log of corrected − predicted possibilities, which is in the form of

$$\mathcal{L}\left(x,y\right)= \sum_{i=1}^{n}[{x}_{i}\mathit{log}\left({y}_{i}\right)+\left(1-{x}_{i}\right)\mathit{log}\left(1-{y}_{i}\right)]$$
(7)

Followed by that, Eq. (6) is being simplified and rewritten as

$$\begin{array}{cc}min& max\\ Gen& Dis\end{array} {\mathbb{E}}[\mathcal{L}\left({M}_{vec},\widehat{{M}_{vec}}\right)]$$
(8)

Through the efficiency of GAN, the incomplete data in the dataset are handled through imputation, and with the pre-processed data, regression is processed. Table 2 shows before and after pre-processing.

Table 2 Before and after pre-processing

Before and after pre-processing has been depicted in Table 2 where the undefined numbers are replaced by using GAN model, which helps in precise prediction of AQI.

3.3 Modified regression using stacked attention GRU with KL divergence

The regression process integrates KL (Kullback–Leibler) with the Stacked Attention-based GRU. KL divergence reduces the loss of information while approximating the distribution. Combining the divergence with NN assists in learning the complex approximate distribution of data. It measures the difference between the probability distribution over a similar variable. Specific functions for assessment of similarity between the two different distributions are referred to as relative entropy and called KL divergence, in which two specific probability-mass functions are specified as

$$MKL\left(p\left(x\right)\parallel \left(x\right)\right)=\sum_{x\in X}p(x)\, log\frac{p(x)}{q(x)}$$
(9)

The above Eq. (9) determines 0 if both points of distribution, such as \(p(x)\) and \(q(x)\), have the same quantity of data. The function \({\text{MKL}}({\text{p}}({\text{x}}) \parallel \mathrm{ q}({\text{x}}))\) provides whether the data is lost and \(q(x)\) approximates the value of \(p(x).\) When both distribution value outputs a positive value, the KL- divergence value is always a positive number. When the value of p(x) yields 0 value, then the equation attains an infinity value. Similarly, when \(q(x)\) is zero, it provides an undefined value. Minimization of KL divergence is equivalent to a reduction of negative-log likelihood.

$$MKL= KL\left(p\left(x\right)\parallel \left(x\right)\right)={\mathbb{E}}_{p}(log\frac{p\left(x\right)}{q\left(x\right)})$$
(10)

The above Eq. (10) determines the relationship ratio with likelihood, which is projected as the expected date of log-likelihood, which assist in estimating the data distribution. GRU is a kind of RNN (Recurrent Neural Network) which solves the complication of vanishing gradient. Both the encoder and decoder are designed with GRU. The encoder section combines the bidirectional stack of GRU, and the decoder section combines unidirectional GRU. Moreover, the attention method is involved with the recurrent-based sequence generator. The attention process learns the alignment within the input and output sequence. In the case of input mathematical denotion consisting of a sequence of points with variable N represented as

$$\left\{\left[{p}_{1},{q}_{1}, {r}_{1}\right],\left[{p}_{2},{q}_{2},{r}_{2}\right],\dots . , \left[{p}_{L},{q}_{L},{r}_{l}\right]\right\}$$
(11)

In which p1 and q1 are p-q coordinates, and si determines the stroke at which the i-th value belongs and addresses the problem of non-uniform based sampling by variations in size and speed of coordinates in which the feature-vectors for every point is represented as

$$[{p}_{k},{q}_{k},\Delta {p}_{k}, \Delta {q}_{k}, {\Delta }^{2}{p}_{k},{\Delta }^{2}{q}_{k}, \delta \left({s}_{i}={s}_{i}+1\right),\updelta \left({{\text{s}}}_{{\text{i}}}\ne {{\text{s}}}_{{\text{i}}}+1\right)]$$
(12)

where \(\Delta {p}_{k}={p}_{k+1}-{p}_{k},\) and \(\Delta {q}_{k}={q}_{k+1}-{q}_{k}\) The attributes of the equation is represented as \({\Delta }^{2}{p}_{k}={p}_{k+2}-{p}_{k}\) and \({\Delta }^{2}{q}_{k}={q}_{k+2}-{q}_{k}\) k with δ(.) = 1 when condition is 0 or 1. The final terms represent flags denoting the status. The mathematical representation is denoted by \(P=\left({P}_{1},{P}_{2},\dots ., {P}_{L}\right)\) representing the input-sequence of the encoder as \({P}_{i}\epsilon {\mathbb{R}}^{d} \left(d=8\right)\) The sequences are represented as \(\left({P}_{1},{P}_{2},\dots ., {P}_{L}\right)\) in which RNN is applied as the encoder for computing the hidden sequences such as \(({h}_{1},{h}_{2},\dots ..,{h}_{L})\) such that.

$${h}_{t}={\text{tanh}}\left({W}_{ph}{P}_{t}+{W}_{hh} {h}_{t-1}\right)$$
(13)

In which \({W}_{ph}\) is represented as the connection weight of the network matrix within the hidden and input layers, and \({W}_{hh}\) is denoted as the weight matrix of recurrent level connections within hidden layers. It performs RNN mapping from the previous input data to every output. The encoder segment in the hidden state is computed using the following equation as

$${z}_{t}=\sigma ({W}_{pz} {p}_{t}+{W}_{hz}{h}_{t-1})$$
(14)
$${m}_{t}=\sigma ({W}_{pm} {p}_{t}+{W}_{hm}{h}_{t-1})$$
(15)
$$\widetilde{{h}_{t}}={\text{tanh}}\left({W}_{ph}{P}_{t}+{U}_{mh} ({m}_{t}\otimes {h}_{t-1})\right)$$
(16)
$${h}_{t}=\left(1-{z}_{t}\right)\otimes {h}_{t-1}+{z}_{t}\otimes \widetilde{{h}_{t}}*MKL$$
(17)

In which σ is represented as the sigmoid function and \(\otimes\) is represented as an element-level operator for multiplication.\({z}_{t}\), \({r}_{t}\to {m}_{t}\), and \(\widetilde{{h}_{t}}\) are the update gate, reset gate, and candidate activation. The decoder is equipped with an attention process which is generating a sequence with formulations, and accordingly, the output is encoded as vectors which are signified as

$$Y=\left\{{q}_{1},\dots .,{q}_{n}\right\}, {q}_{i}\in {\mathbb{R}}^{K}$$
(18)

where the parameter \(k\) is represented as the total number of attributes in AQI and L is considered as the length with \(D\) as the dimensional vector and an as the matrix–vector in which the bi-directional encoder in GRU creates annotation presented as

$$A=\left\{{a}_{1},\dots .,{a}_{len}\right\}, {a}_{i}\in {\mathbb{R}}^{D}$$
(19)

To estimate the learning of variable length and associate the length with the computation of intermediate fixed size vectors, the probability of every prediction is denoted as the following equation.

$$p\left( {q\_t|P, q\_\left( {t - 1} \right)} \right) = s \left( {W_{o} \left( {Eq_{{\left( {t - 1} \right)}} + W_{s } c_{t} + W_{c} sta_{t} } \right)} \right)$$
(20)

The variable s represents the softmax-activation function, ct explores the present status of hidden states, and stat denotes the stacked vector. The decoder also considers the previous stage target as input and adopts the single directional GRU for calculating the hidden state with the representation as follows

$$z_{t} ^{\prime } = \sigma (W_{{qm}} Eq_{{t - 1}} + U_{{sm}} c_{{t - 1}} + C_{{cm}} sta_{t} )$$
(21)
$${m}_{t}^{\prime }= \sigma ({W}_{qm}E{q}_{t-1}+{U}_{sm}{c}_{t-1}+{C}_{cm}st{a}_{t})$$
(22)
$$\widetilde{{c}_{t}}={\text{tanh}}\,({W}_{qz}E{q}_{t-1}+{U}_{mz}({{m}_{t}^{\prime }\otimes c}_{t-1})+{C}_{cz}st{a}_{t})$$
(23)
$${c}_{t}=\left(1-{m}_{t}^{\prime }\right) \otimes {c}_{t-1}+{m}_{t}^{\prime }\otimes \widetilde{{c}_{t}}$$
(24)

where \({m}_{t}{\prime}\), \({a}_{t}{\prime}\), and \(\widetilde{{c}_{t}}\) are considered as he updates, reset along with candidate-activation parameters, respectively. \({W}_{qm}\), \({W}_{qz}\), \({U}_{sm}\), \({U}_{zm}\), \({C}_{cm}\), and \({C}_{cz}\) are weighted matrices. For every prediction from the decoder section, a subset of adjacent points is contributed to the computation of vectors at every time. Hence, the decoder adopts the attention model in linking the related sequence and assigns a greater weight to the appropriate annotation-based vector. The parameterization of attention is being trained with an encoder and decoder as

$${W}_{ti}={V}_{att}^{T} {\text{tanh}}\,({W}_{att}{c}_{t-1}+{U}_{att}{a}_{i})$$
(25)
$${\propto }_{ti}= \frac{{\text{exp}}\left({W}_{ti}\right)}{\sum_{k=1}^{len}{\text{exp}}\left({w}_{tk}\right)}$$
(26)

where \({\propto }_{ti}\) represents the attention probability with n as the attention dimension in which\(Vatt\epsilon n{\prime}\), \({\mathbb{R}}n{\prime}X n, Uatt\epsilon {\mathbb{R}}n{\prime}X D\). With weights as \({w}_{ti}\) and stack vector stat is being calculated as

$$st{a}_{t}= {\sum }_{i}^{len}{\propto }_{ti} {a}_{i}$$
(27)

The attention model is being used for providing specific concentrations to input pollutants to determine AQI accurately. Computation on coverage-related attention method with the sum of previous attention-probabilities is defined as

$${\rho }_{t}= {\sum }_{l}^{t-1}{\propto }_{l}$$
(28)
$$cov=Q* {\rho }_{t}$$
(29)
$${W}_{ti}={V}_{att}^{T}{\text{tanh}}\,({W}_{att}{S}_{t-1}+{U}_{att}{a}_{i}+{U}_{f} {f}_{i}$$
(30)

In which \(\propto_{l}\) is considered the attention probability at the initial step, and the variable ρt is also initialized as a zero vector. Such vectors are produced with a convolutional layer with relative attention-based probabilities. The decoding process is accomplished throughout the entire input by assigning a higher level of attention. The proposed stacked attention-based GRU integrating the KL divergence method effectively predicts the AQI measure in a dataset. It determines the high, medium, and low-level polluted cities in India.

4 Results and discussion

The regression performance attained from the computation of the proposed algorithm is evaluated with the rate of prediction and loss obtained. The experimental outcomes are described in the succeeding section and perform a comparison of values internally and externally with several other algorithms to identify the effectiveness of the enhanced algorithm used in present research concerning evaluation metrics like RMSE, MAPE, MSE, MAE, and R2.

4.1 Experimental results

The competence estimation of the proposed system is being evaluated with various performance metrics, and this section demonstrates the exploration made with the experimentation of the model along with analyzing comparative assessments are being performed. From regression outcomes, it is found that high, medium and low level polluted cities in India are found. Simulation outcomes achieved through the Ahmedabad, Chennai and Ernakulam are projected in subsequent section.

4.1.1 Ahmedabad

Figure 3a shows the output obtained ‘without imputation’ and Fig. 3b AQI with imputed data for Ahmedabad.

Fig. 3
figure 3

a Without Imputation. b AQI with Imputed Data

Figure 3a indicates AQI prediction with the proposed mechanism yielding efficient outcomes. Figure 3b depicts that the proposed imputation by Deep GAN for filling missing values exhibits continuous data distribution and leads to attaining greater prediction efficiency. The highest AQI measure in the Ahmedabad city is given in Fig. 4.

Fig. 4
figure 4

a Prediction Validation. b Proposed Predicted data in Ahmedabad

From Fig. 4a, b, the AQI predicted for Ahmedabad city is being validated. It found that the predicted values using the proposed regression algorithm are much similar to the observed values. The green line shows higher volatility and momentous fluctuations compared to the red line. Notable underestimation occurs when the green line peaks while the red line remains lower at the same observation point, indicating that the red line underestimates the values compared to the green line’s data. The red line exhibits a smoother progression and gradual decrease in values over time, consistently lagging behind the green line. Notable overestimation instances, where the red line surpasses the green line, are minimal or non-existent based on the provided image.

The reasons behind the prominent underestimation observed in the Fig. 4b may stem from differences in data sources, measurement techniques, or inherent variability in the datasets represented by the green and red lines. Further analysis and comparison of the data sources and methodologies used to generate the green and red lines could provide insights into the reasons behind the observed underestimation. Figure 5 shows the estimation of pollutant contents in the air.

Fig. 5
figure 5

Estimation of pollutant contents in the air

From Fig. 5, it is obvious that the prediction of AQI is based on the number of pollutants present in the air, which data collected from Ahmadabad assists in predicting AQI. Table 3 depicts the ‘after imputation’ using 0, mean and GAN model.

Table 3 Imputation for Ahmedabad

MSE, RMSE, MAE, MAPE and R Square obtained by using after imputation technique using 0, mean and GAN is depicted in Table 3, where the error rate using proposed technique is less than the other ‘after imputation’ approaches.

4.1.2 Chennai

The concentration level of harmful gases obtained in Chennai is projected in Fig. 6.

Fig. 6
figure 6

Prediction of Pollutant Contents in Chennai City

From Fig. 6, it is evident that the amount of pollutant present in the air is being predicted by estimating the AQI level, which indicates the efficiency of the regression model in determining the degree of polluted air present in Chennai. The experimental results in Chennai city are illustrated in Fig. 7.

Fig. 7
figure 7

a AQI Prediction with Dat. b AQI versus Time with Imputed Data

From Fig. 7a, b, the prediction of AQI measure in Chennai city is being calculated at different dates and times with imputed data to provide effective prediction outcomes. The observation from the prediction is presented in Fig. 8.

Fig. 8
figure 8

a Validation with Chennai data. b Proposed Evaluation with Chennai

The proposed validation on Chennai city observed from Fig. 8a, b indicates that the observed outcomes are similar to the predicted values, which shows better prediction performance experienced in Chennai city. Similarly, Table 4 illustrates the ‘After imputation technique’ for Chennai city.

Table 4 Imputation for Chennai city

RMSE, MSE, MAE, MAPE, R Square values obtained using GAN model are 0.0977, 0.0095, 0.0795, 0.1219 and 0.9064, which is better than the ‘After imputation technique using 0 and mean’.

4.1.3 Ernakulam

The concentration level of pollutants in Ernakulam is depicted in Fig. 9.

Fig. 9
figure 9

Estimation of pollutant contents in the air

From Fig. 9, it is obvious that the prediction of AQI is based on the number of pollutants present in the air, which data collected from Ernakulam assists in predicting AQI. The simulation results of the proposed model are illustrated in Fig. 10.

Fig. 10
figure 10

a AQI Prediction. b AQI vs time in Ernakulam with Imputed data

From Fig. 10a, b, prediction with imputed data determines that Ernakulam shows a low level of pollutant content with the measure of AQI. Imputation with Deep GAN in the proposed work achieves greater efficiency.

The prediction competence is projected in Fig. 11.

Fig. 11
figure 11

Proposed Prediction Efficiency

From Fig. 11, it is evident that the predicted values with the utilization of Stacked Attention GRU with KL divergence are very similar to observed values, which exhibits the efficiency of the proposed regression algorithm. ‘After imputation’ outcome for Ernakulam city is depicted in Table 5.

Table 5 Imputation for Ernakulam City

Table 5 shows that MSE, MAPE, R square, RMSE and MAE values obtained using GAN model is better than ‘After imputation’ technique using 0 and Mean. This is because, when a missing value is simply replaced with 0 and mean values, significant information and variability in the data maybe lost, which leads to biased and inaccurate predictions, however, imputation using GAN model is more effective as it can learn the underlying structure of the data and generate realistic values in order to fill the missing data.

4.2 Performance analysis

The proposed model is evaluated with performance metrics such as RMSE, MAPE, MSE, R-Square, and MAE for estimating the effectiveness of regression. The performance of the proposed approach in terms of loss values for AQI predicted in the city of Ahmedabad is tabulated in Table 6.

Table 6 Performance evaluation for Ahmedabad city

From Table 6, it is clear that the proposed algorithm is being evaluated with regression error and estimated concerning MAE of 0.0979, MSE with 0.0161, RMSE of 0.1271, MAPE with 0.0587 and R-Square with 0.8942, which exhibits an efficiency of the regression method through minimum loss acquired from the analysis. The medium level of AQI in Chennai city is being predicted and tabulated in Table 7.

Table 7 Performance Efficiency of AQI Predicted in Chennai

From Table 7, it is clear that the prediction made on AQI in Chennai city acquires minimum loss values calculated with MSE as 0.0095, MAE with 0.0795, MAPE with 0.1219, R-Square with 0.9064 and RMSE with 0.0977. The minimum values of loss calculated from Chennai prediction data indicated the efficiency of the proposed regression method. Along with that, the regression analysis of the city of Ernakulam is tabulated in Table 8.

Table 8 Performance evaluation of AQI Measure in Ernakulam

From Table 8, inference from loss values predicted in terms of MSE with 0.0134, MAE with 0.1013, MAPE with 0.1152, R-Square with 0.9479, and RMSE with 0.1156. Figure 12 depicts the graphical representation of the performance metrics for different cities.

Fig. 12
figure 12

Comparative Plot

Figure 12 showcases the performance metrics of obtained for different cities. Different metrics are used for assessing the efficacy of the model for precise and effective AQI.

4.3 Comparative analysis

The efficiency of the proposed methodology is exhibited by comparing the calculated loss values with the existing techniques. A comparison of different existing algorithms with the proposed algorithm is illustrated in the upcoming section.

4.3.1 Ernakulum

Different existing work has been compared with the proposed model for observing the loss rate in terms of AQI. Therefore, Table 8 depicts the MSE, RMSE and MAE value obtained by the existing SVR, ESVR, CNN and ECNN model along with proposed model.

From Table 9, comparing the loss obtained with regression calculation on lower AQI in Ernakulam by the proposed technique is found to be lower than considered SVR, CNN, ESVR, and ECNN. It demonstrates that the proposed regression algorithm is efficient in AQI prediction. The graphical representation of the comparison of algorithms is given in Fig. 13.

Table 9 Comparative analysis of Loss values of Stacked Attention GRU with KL Divergence [43]
Fig. 13
figure 13

Comparison of Loss Efficiency from Proposed with Existing Regression Algorithms [43]

From comparing the loss acquired from the proposed with existing methods in Fig. 13, the proposed method is identified to be efficient in regression and yields effective prediction output.

4.3.2 Chennai

The existing research obtained AQI prediction for only Chennai city, which is being compared with the prediction values obtained in Chennai by the present research and tabulated in Table 10.

Table 10 Comparing the Performance of AQI predicted in Chennai [44]

From Table 10, the inferred loss values on the prediction made from Chennai city with both proposed and existing regression methods exhibit that the proposed Stacked Attention-based GRU model with KL divergence has acquired minimum loss measured in terms of R2, MAE, and RMSE. The graphical analysis is depicted in Fig. 14.

Fig. 14
figure 14

Comparative analysis of Regression Loss Obtained in Chennai by Proposed and Existing models [44]

From Fig. 14, comparing loss calculated from existing and proposed regression algorithms in predicting AQI measures signifies that the proposed regression methodology is efficient in prediction with minimum loss values. Likewise, Table 11 shows the AQI value obtained using MAE, MSE, RMSE and R Square for Chennai.

Table 11 Comparative analysis for Chennai [45]

From Table 11 it can be observed that, MAE, MSE, R square and RMSE obtained using the proposed model is better than the existing model for AQI prediction and the graphical representation is shown the Fig. 15.

Fig. 15
figure 15

Comparative analysis

4.3.3 Ahmedabad

Similarly, RMSE and R Square value for Ahmedabad is depicted in Table 12 for existing works like SARIMA, SVM-Linear, SVM-RBF, SVM-polynomial and LSTM.

Table 12 Comparative analysis for Ahmedabad [46]

From Table 12, it can be identified that, RMSE, R-Square value attained for Ahmedabad city is better than the existing model as the RMSE value obtained by the proposed model is 0.1271 and R square value obtained is 0.8942. This primarily due to the incorporation of Deep GAN imputation and regression using Stacked Attention GRU with KL divergence in the proposed mechanism. Figure 16 shows the graphical depiction of table.

Fig. 16
figure 16

Comparative analysis

With the assistance of proposed Deep GAN imputation and regression using Stacked Attention GRU with KL divergence, efficient prediction on AQI with minimum loss is obtained. The better level of prediction efficiency with lower loss is the advantage of the proposed system. From AQI measured using the proposed model, both government and private institutions can focus on initiating activities to improve air quality.

5 Discussion

Analytical outcome of the proposed mechanism shows that proposed work has delivered better performance than the existing works. Table 13 shows the overall values obtained by the proposed model using different loss rate metrics for different cities.

Table 13 Values of proposed model

RMSE value obtained by the proposed model is 83.1143% difference lower than SVR, 76.5617% than ESVR, 79.4578 than CNN, 53.4854% difference than ECNN. Likewise, MSE values of the proposed model is lower than existing models, thereby implying the efficacy of the proposed mechanism. Similarly, MAE value of the proposed model is, RMSE and R square obtained by the proposed model is 164.064% lower than existing model, likewise, RMSE value of the proposed model obtained is 161.425% lower than the prevailing model, eventually, R Square value of the proposed model is 0.7085%. Moreover, R square obtained by LSTM-GRU model [30] for AQI prediction is 0.87. However, R square attained by the model for AQI for Chennai city is 0.9064, Ernakulam 0.9479 and Ahmedabad is 0.8942, this shows that, R square of the proposed work is better than the existing LSTM-GRU for AQI. Further, SVR employed [12] in the study possess the limitation of not working effectively for large data, however, this limitation is primarily overcome by proposed work as it employs Deep GAN model and modified stacked attention GRU with KL divergence. Though different cities are covered in the proposed work, Chennai, Ahmedabad, Ernakulam have not covered in other studies, thereby making the paper distinct and diverse. Further, in order to depict the overall work of the proposed mechanism, a SWOT analysis has been provided in Table 14.

Table 14 SWOT analysis

6 Conclusion

The proposed regression methodology utilized Deep GAN for imputing missing values in the dataset during preprocessing. With the processed input data, regression was performed using modified Stacked Attention GRU with KL divergence for predicting AQI in multiple cities of India. The regression process performed with the combined approach was evaluated to estimate the model’s efficiency. From the experimental outcome, it was found that Ahmedabad was possessed with higher AQI in terms of MSE, RMSE, R Square, MAE, and MAPE with values 0.0161, 0.1271, 0.0152, 0.0979, and 1.3862, respectively and Ernakulam city was predicted with lowest AQI in terms of 0.0134, 0.1156, 0.2946, 0.1013 and 0.4839 respectively. The regression loss obtained was also compared with other traditional algorithms. As a result, the proposed AQI prediction was considered efficient since it achieved reduced error values than other existing regression algorithms. Despite the advantages of the proposed model, there are few limitations of the model which includes complexity of the model, as the model complexity can pose the challenges in terms of computational resources, besides, proposed model only focuses on predicting AQI in specific cities in India and may not be applicable to other regions with different environmental factors, this generalizing the findings top broader geographic area requires careful consideration of local factors and validation against relevant datasets. As future work, the present research can focus on predicting AQI after COVID 19 in India with the involvement of additional factors after air pollution, further associating diseases through air pollution will be focused on future.