Introduction

The consistent rise in global temperatures over the twenty-first century continues to pose a serious challenge to humanity. The principal drivers of climate change are rising solar energy and global warming caused by the intensification of the greenhouse effect (McCarthy et al. 2001; Griggs and Noguer 2002; Asadi and Karami 2021). Climate change has adversely affected various aspects of human life, such as agriculture, the economy, food security, and public health (Dogru et al. 2019; Kogo et al. 2021; Pasquini et al. 2020).

Relative humidity (RH) is the amount of water vapor in the air, expressed as a percentage of the maximum amount of moisture that the atmosphere can hold at the same temperature and pressure (Khatibi et al. 2013; Ghadiri et al. 2020). Relative humidity is an important climate variable that directly influences major sectors such as human health, hydrological studies, the pharmaceutical industry, agriculture, irrigation scheduling, floods, and hydropower (Yu 2009; Gunawardhana et al. 2017). Extreme values of RH (\(< 40\%\) or \(> 60\%\)) can harm human health, contributing to cold and flu, nasal bleeding, vomiting, asthma attacks, and allergies (Falagas et al. 2008; Zhang et al. 2016). The human body is more susceptible to respiratory diseases such as COVID-19 under low relative humidity conditions (Mangla et al. 2021). High relative humidity, in turn, causes an increase in precipitation, which can be dangerous for the economy of any country (Silveira 2002). Moreover, accurate weather information and accurate humidity forecasts are frequently essential for warning about natural disasters induced by sudden changes in climatic conditions (Adnan et al. 2021). These considerations underline the urgency of monitoring and predicting relative humidity throughout the year in developing countries such as India.

Many studies have employed time series models, such as autoregressive integrated moving average (ARIMA) models, fuzzy time series (FTS) models, and artificial neural network (ANN) models, for prediction in various fields, such as hydrology, earth sciences, and economics (Nilashi et al. 2012; Singh 2018; Chen et al. 2019; Luk et al. 2001). Sarraf et al. (2011) implemented an ARIMA model to forecast the relative humidity and monthly mean temperature during 2009–2011 at Ahwaz Station in Iran. Arzu et al. (2020) used multiple linear regression (MLR), ARIMA, and ANN models for the prediction of wind speed in Suva, Fiji, and demonstrated that MLR performed better than the ARIMA and ANN models. Li et al. (2019) employed ARIMA and long short-term memory (LSTM) models to forecast the daily average relative humidity in Gansu Linxia, China. Their study showed that the ARIMA model provides better accuracy than the LSTM model. Masngut et al. (2020) employed ARIMA and ANN models to forecast the rainfall of Simpang Ampat, Pulau Pinang in Malaysia. The study used daily rainfall data from January 2016 to December 2018 and revealed that the ANN model provides better prediction of rainfall than the ARIMA model. Shi et al. (2018) developed a prediction model based on backpropagation neural networks to forecast indoor air temperature and relative humidity every 10 min and 6–72 h in advance in Chongqing, China. The study demonstrated that the proposed model is more effective in predicting temperature. Wanishsakpong and Owusu (2020) implemented ARIMA and ARIMAX models on average monthly temperature data for the period 2006 to 2016 in the southwestern region of Thailand and demonstrated that the ARIMA model provided a better forecast of temperature than the ARIMAX model. Eymen and Köylü (2019) utilized the Mann–Kendall rank test for trend analysis of relative humidity and wind speed at Yamula Dam, Turkey. The Theil–Sen slope method was used to estimate the magnitude of the trend, and an ARIMA model was then implemented to forecast the relative humidity. Zhang et al. (2017) proposed a wavelet-ARMA/ARIMA model for forecasting particulate matter \(PM_{10}\) and compared its accuracy with the ARMA/ARIMA model. The results revealed that the proposed model outperformed the ARMA/ARIMA model. Casallas et al. (2021) implemented an LSTM model to forecast \(PM_{2.5}\) and meteorological variables (temperature, radiation, humidity, wind speed) for Bogotá, Colombia. The results revealed that the LSTM performs particularly well for \(PM_{2.5}\) and wind speed. Based on daily rainfall and streamflow datasets from 1991 to 2014, Ali and Shahbaz (2020) employed ANN models to forecast the daily river streamflow of Lahore, Pakistan. The proposed ANN model was assessed using criteria such as root mean square error (RMSE), correlation coefficient (R), and the coefficient of determination (\(R^2\)). The results showed that the composition of the input patterns presented to the model significantly affects the learning ability, training time, and functioning of the ANN model. Tao et al. (2022) developed random forest and multivariate adaptive regression spline models for the prediction of relative humidity in Iraq and validated their performance against support vector regression. Astsatryan et al. (2021) utilized a neural network technique for the prediction of hourly temperature in the Ararat valley, Armenia. The results revealed that the suggested model provided \(87.31\%\) and \(75.57\%\) accuracy in the prediction of temperature 3 h and 24 h ahead, respectively.

Several studies have forecasted relative humidity and related environmental variables in India. Namratha V (2020) employed the ARIMA model for the prediction of relative humidity in Bangalore. Kamath and Kamat (2018) forecasted monthly rainfall for Idukki district, Kerala using ARIMA, ANN, and exponential smoothing state space (ETS) models. They showed that the ARIMA model produced more reliable results than the ANN and ETS models. Kumar et al. (2021) used ARIMA and machine learning (ML) algorithms for the prediction of air pollution in Assam. The results revealed that the ARIMA model performed better than the machine learning algorithms. Kulkarni et al. (2018) employed the ARIMA model for the prediction of air pollution in Nanded, Maharashtra. This study showed that the level of air pollution is increasing in Nanded city. Other studies employed SARIMA models to predict temperature and precipitation in India and showed that the predicted values were consistent with the trend in the observed data (Dimri et al. 2020; Dabral and Tabing 2020). On the basis of monthly solar insolation data during 1984–2017, Shadab et al. (2020) implemented a seasonal ARIMA model to forecast solar radiation in Delhi. The seasonal ARIMA model forecasted the maximum insolation in May and the minimum in January and December.

Litta et al. (2013) employed an ANN model to predict the temperature and relative humidity in Kolkata during pre-monsoon thunderstorms in 2009 and examined the utility of ANN for estimating hourly surface temperature and relative humidity. The study showed that the ANN model provides a better prediction of hourly temperature and relative humidity during thunderstorm hours. Rajendra et al. (2019) employed artificial neural network models, namely multilayer perceptron (MLP) and radial basis function (RBF), to predict the meteorological variables of two stations situated in India. The study demonstrated that the MLP and RBF models provided 91–96\(\%\) accuracy for predictions of meteorological variables. Kapadia and Jariwala (2021) developed a model for the prediction of ozone in Surat city using an ANN with feature selection techniques, namely sensitivity analysis, the Boruta algorithm, and the recursive feature elimination (RFE) algorithm. The efficiency of the proposed model was found to be 79.4\(\%\). Biswas and Sinha (2021) employed a long short-term memory (LSTM) model and a bidirectional long short-term memory (BiLSTM) model to forecast the Indian Ocean wind speed. The study used daily wind speed data from 2006 to 2017 and demonstrated that the BiLSTM model performs much better than the LSTM model. Ramesh and Iyengar (2016) implemented an artificial neural network (ANN) model on the monthly Indian monsoon rainfall data over the course of the twentieth century. The proposed ANN model was found to explain more than \(90\%\) of the underlying variance of the data. Lama et al. (2021) used the SARIMA model in conjunction with the exponential autoregressive (EXPAR) and time-delayed neural network (TDNN) models to predict the changes in the monthly rainfall in the Himalayan region of India. The study demonstrated that TDNN has stronger pattern prediction ability and higher forecast accuracy than the SARIMA and EXPAR models.

To the best of our knowledge, no study has compared the performance of the seasonal autoregressive integrated moving average (SARIMA) model and the artificial neural network (ANN) with MLP model. Thus, the purpose of this study is to examine the forecasting accuracy and pattern prediction ability of the SARIMA and ANN with MLP models for predicting monthly relative humidity in Delhi, India during 2017–2025.

Data and methods

Data

The monthly average relative humidity data (2000–2016) collected by the India Meteorological Department (IMD), Pune were used to fulfill the objectives of the study. The data consist of the average relative humidity in percentage (%) per month.

Methods

The SARIMA and ANN with MLP models were used to predict the monthly average relative humidity in Delhi, India. The Box–Jenkins (B–J) methodology (Box et al. 2015), which relies on stationary stochastic processes, was applied to fit the SARIMA model. A multilayer perceptron algorithm was used to fit the ANN model.

SARIMA \((p, d, q)\times (P, D, Q)_S\) model

In the SARIMA \((p, d, q)\times (P, D, Q)_S\) model, p and q represent the orders of the non-seasonal autoregressive and moving average terms, respectively, and d is the order of non-seasonal differencing. Similarly, P and Q represent the orders of the seasonal autoregressive and moving average terms, respectively, and D represents the order of seasonal differencing. The model can be expressed as

$$\begin{aligned} \Psi (B^S)\psi (B)\nabla ^d\nabla _S^D Y_t = \varphi (B)\Phi (B^S)\epsilon _t, \end{aligned}$$
(1)

where \(\psi (B)\) and \(\varphi (B)\) are the non-seasonal autoregressive and moving average polynomials of orders p and q, respectively.

$$\begin{aligned} \psi (B)= & {} 1-\psi _1(B)-\psi _2(B^2)-\cdots -\psi _p(B^p), \end{aligned}$$
(2)
$$\begin{aligned} \varphi (B)= & {} 1-\varphi _1(B)-\varphi _2(B^2)-\cdots -\varphi _q(B^q), \end{aligned}$$
(3)

Similarly, \(\Psi (B^S)\) and \(\Phi (B^S)\) represent the seasonal autoregressive and moving average polynomials of orders P and Q, respectively.

$$\begin{aligned} \Psi (B^S)= & {} 1-\Psi _1(B^S)-\Psi _2(B^{2S})-\cdots -\Psi _P(B^{PS}), \end{aligned}$$
(4)
$$\begin{aligned} \Phi (B^S)= & {} 1-\Phi _1(B^S)-\Phi _2(B^{2S})-\cdots -\Phi _Q(B^{QS}). \end{aligned}$$
(5)

B is the back-shift operator, defined by \(B^d(Y_t)\) = \(Y_{t-d}\), so that \(\nabla ^d\) = \((1-B)^d\) and \(\nabla _S^D\) = \((1-B^S)^D\); \(\epsilon _t\) denotes the error term, which behaves as a white noise process. S represents the seasonal period (S = 12 for monthly data).
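The differencing operators defined above can be illustrated with a short sketch. It is not the code used in this study; the function name `difference` and the toy series (a linear trend plus a pattern that repeats every 4 steps) are assumptions chosen only to make the effect of \((1-B^S)\) and \((1-B)\) visible.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a list of numbers.

    lag=1 gives the ordinary difference; lag=S gives the seasonal difference.
    Each pass shortens the series by `lag` observations.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Hypothetical toy data: trend of 0.5 per step plus a season of length S = 4.
S = 4
season = [5.0, -2.0, 1.0, -4.0]
y = [0.5 * t + season[t % S] for t in range(16)]

seasonal_diff = difference(y, lag=S, order=1)     # (1 - B^S) y_t
both = difference(seasonal_diff, lag=1, order=1)  # (1 - B)(1 - B^S) y_t

print(seasonal_diff)  # the seasonal pattern cancels, leaving the constant 0.5 * S
print(both)           # the remaining constant is removed by the ordinary difference
```

For monthly data the same call would use `lag=12`, which is the seasonal difference with S = 12.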

The SARIMA model was fitted using the Box–Jenkins approach (1976), which consists of four steps: identification of the model’s order, parameter estimation, diagnostic checking, and forecasting. In the first step, the stationarity of the time series must be checked. If the data are not stationary, they are differenced to make them stationary. In this study, the augmented Dickey–Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests were applied to check the stationarity of the time series data. In the ADF test, whose null hypothesis is the presence of a unit root, a p value < 0.05 indicates that the series is stationary (Said and Dickey 1985), whereas in the KPSS test, whose null hypothesis is stationarity, a p value > 0.05 indicates that the series is stationary. To obtain the orders p, q, P, and Q, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used. Furthermore, the Akaike information criterion (AIC) (Akaike 1974) and diagnostic analysis were used to select the best-fitted model. After identification of the model, the maximum likelihood estimation method was used to estimate the parameters of the fitted model. The Ljung–Box test was used to verify that the residuals of the fitted model follow a white noise process. The Ljung–Box test examines the null hypothesis that no significant autocorrelation remains in the model’s residuals and thus shows whether the model is adequately specified. A p value > 0.05 indicates that the model adequately describes the correlation structure of the time series (Ljung and Box 1978). Finally, the fitted model was employed to forecast the average monthly relative humidity in Delhi during 2017–2025.
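To make the diagnostic step concrete, the following sketch computes the Ljung–Box statistic \(Q = n(n+2)\sum _{k=1}^{h} r_k^2/(n-k)\) from a residual series. It is an illustration under stated assumptions, not the procedure run in this study: the function names are hypothetical, the residuals are artificial, and the p value is not computed (Q would be compared with a chi-square quantile whose degrees of freedom depend on the number of fitted parameters).

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Artificial residuals that alternate in sign, i.e. clearly not white noise.
bad_residuals = [1.0, -1.0] * 10
q = ljung_box_q(bad_residuals, max_lag=1)

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree
# of freedom; a Q above it rejects the white-noise hypothesis at that level.
CHI2_CRIT_1DF_5PCT = 3.841
print(q, q > CHI2_CRIT_1DF_5PCT)
```

A well-specified model would be expected to give a small Q, corresponding to the p value > 0.05 criterion described above.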

Artificial neural network (ANN) model

The concept of ANN was first developed by McCulloch (1943). It is an effective method of information processing that resembles a biological neural network in its characteristics. In the last few decades, ANN has been widely employed for prediction, pattern recognition, and feature extraction (Khan et al. 2016).

In the present study, a multilayer perceptron (MLP), which is a class of feed-forward artificial neural network (ANN), was used to forecast the average monthly relative humidity. The MLP is the most widely used ANN technique in modeling hydrological activities (Traore et al. 2010). An MLP is organized in layers: the input layer is the first layer and receives the data, the output layer is the final layer and generates the output, and one or more hidden layers lie between them. The MLP model with one hidden layer is represented by I:Hs:Ol, where I denotes the number of nodes in the input layer, H the number of nodes in the hidden layer, O the number of nodes in the output layer, s the logistic sigmoid transfer function, and l the linear transfer function. It is presented in Fig. 1.

Fig. 1 Structure of MLP for monthly relative humidity data

Figure 1 shows the structure of an MLP for average monthly relative humidity data, which establishes the relationship between the input and output layers. Each unit in the input and hidden layers processes the signal it receives with an activation function and passes the result forward until it reaches the output layer. The relationship between the input and hidden layers of the MLP can be written as

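$$\begin{aligned} Y_j = W_{0j}+\displaystyle \sum _{i=1}^{I} W_{ij} X_i, \end{aligned}$$
(6)

where \(Y_j\) is the net input to the jth hidden node, \(X_i\) is the value of the ith input node, \(W_{ij}\) is the weight connecting input node i to hidden node j, \(W_{0j}\) is the bias term, and I is the number of input nodes.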
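A forward pass through an I:Hs:Ol network with a single output node can be sketched as follows: each hidden node forms the weighted sum of Eq. 6, applies the logistic sigmoid, and the output node combines the hidden activations linearly. This is a minimal sketch, not the fitted network of this study; the function names, the weights, and the tiny 2:2s:1l example are all hypothetical.

```python
import math


def logistic(z):
    """Logistic sigmoid transfer function (the 's' in I:Hs:Ol)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with a single linear output.

    hidden_weights has one row of input weights per hidden node.
    """
    hidden = []
    for w_row, bias in zip(hidden_weights, hidden_bias):
        net = bias + sum(w * xi for w, xi in zip(w_row, x))  # weighted sum (Eq. 6)
        hidden.append(logistic(net))
    # Linear transfer function at the output node (the 'l' in I:Hs:Ol).
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))


# Hypothetical 2:2s:1l network with all-zero hidden weights, so that every
# hidden activation equals logistic(0) = 0.5 and the result is easy to check.
y_hat = mlp_forward(
    x=[0.3, 0.7],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_bias=[0.0, 0.0],
    output_weights=[1.0, 3.0],
    output_bias=0.25,
)
print(y_hat)  # 0.25 + 1.0 * 0.5 + 3.0 * 0.5
```

In practice the weights would be learned by a training algorithm such as backpropagation rather than set by hand.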

Results and discussion

Figure 2 shows the time series graph of monthly average relative humidity in Delhi during 2000–2016. The graph suggests that the observed series is stationary. The ADF and KPSS tests were applied to confirm the stationarity of the data. In the ADF test, the p value was less than 0.05, which indicates that the relative humidity data during 2000–2016 are stationary. In the KPSS test, the p value was greater than 0.05 (Table 1), which also indicates that the relative humidity data during 2000–2016 are stationary.
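Because the two tests have opposite null hypotheses, their p values are read in opposite directions. The helper below is a sketch of that decision rule only; the function name and the p values passed to it are hypothetical placeholders, not the values reported in Table 1.

```python
def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p values into a single verdict.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_says_stationary = adf_p < alpha    # unit-root null rejected
    kpss_says_stationary = kpss_p > alpha  # stationarity null not rejected
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "inconclusive"


# Placeholder p values chosen only to exercise the three outcomes.
print(stationarity_verdict(adf_p=0.01, kpss_p=0.10))  # stationary
print(stationarity_verdict(adf_p=0.40, kpss_p=0.01))  # non-stationary
print(stationarity_verdict(adf_p=0.01, kpss_p=0.01))  # inconclusive
```

When the two tests disagree, differencing or detrending is commonly tried before the tests are repeated.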

Fig. 2 Graph of time series data for monthly relative humidity (2000–2016)

Table 1 Test of stationarity for relative humidity data

The autocorrelation function (ACF) and the partial autocorrelation function (PACF) are illustrated in Figs. 3 and 4, respectively. In the ACF plot, significant spikes were found at lag 1 and lag 12. This plot therefore suggests a non-seasonal moving average term of order q = 1 and a seasonal moving average term of order Q = 1. Similarly, the PACF plot showed significant spikes at lags 1 to 4 and at lag 12, which suggested a non-seasonal autoregressive term of order p = 4 and a seasonal autoregressive term of order P = 1.
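The way significant spikes are read from an ACF plot can be sketched as follows: the sample autocorrelation at each lag is compared with the approximate 95% bounds \(\pm 1.96/\sqrt{n}\). The code is an illustration only; the function names are hypothetical and the series is a synthetic 12-month cosine of length 204 (17 years of monthly values), not the observed humidity data.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the +/- 1.96 / sqrt(n) bounds."""
    bound = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]


# Synthetic monthly series with a pure 12-month cycle (assumed values).
y = [60.0 + 15.0 * math.cos(2.0 * math.pi * t / 12.0) for t in range(204)]

print(significant_lags(y, max_lag=20))
```

For such a strongly seasonal series, lag 12 falls well outside the bounds, which is the pattern that motivates a seasonal term with S = 12.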

Fig. 3 Plot of autocorrelation function (ACF) for 20 lags of relative humidity data

Fig. 4 Plot of partial autocorrelation function (PACF) for 20 lags of relative humidity data

On the basis of the minimum AIC value and diagnostic analysis, we selected SARIMA\((1,0,0)\times (0,1,1)_{12}\) as the best-fitted model for predicting the relative humidity. The final model is defined by Eq. 1 and has the form \((1-0.33B)\nabla _{12}^1 Y_t = (1-0.86B^{12})\epsilon _t\). Table 2 presents the estimated parameter values of the SARIMA\((1,0,0)\times (0,1,1)_{12}\) model. The best-fitted SARIMA\((1,0,0)\times (0,1,1)_{12}\) model had a low AIC score of 1269.93, and its well-behaved residuals are evident from Figs. 5, 6 and 7. The ACF and PACF plots of the residuals highlight the absence of any autocorrelation among the residuals. Further, the Ljung–Box test was applied to the residuals, and the p value was found to be greater than 0.05 (Table 3), which indicates that the residuals of the fitted model follow a white noise process. Hence, the residual plots and the Ljung–Box test showed that the SARIMA\((1,0,0)\times (0,1,1)_{12}\) model can be used to forecast the relative humidity in Delhi.
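As an illustration of how Eq. 1 unfolds for the selected orders, the following expansion keeps the coefficients symbolic (\(\psi _1\) for the non-seasonal autoregressive term and \(\Phi _1\) for the seasonal moving average term, in the sign convention of Eqs. 2 and 5) rather than substituting the estimates of Table 2:

$$\begin{aligned} (1-\psi _1 B)(1-B^{12})Y_t&= (1-\Phi _1 B^{12})\epsilon _t, \\ Y_t&= Y_{t-12}+\psi _1(Y_{t-1}-Y_{t-13})+\epsilon _t-\Phi _1\epsilon _{t-12}. \end{aligned}$$

Under this reading, each value is built from the value observed 12 months earlier, adjusted by the most recent year-on-year change and by the error made 12 months before.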

Table 2 Estimated parameters value of SARIMA\((1,0,0)\times (0,1,1)_{12}\) model
Table 3 Test statistic for the fitted model from the Ljung–Box test
Fig. 5 Residuals plot of the fitted SARIMA\((1,0,0)\times (0,1,1)_{12}\) model for relative humidity data

Fig. 6 Residual autocorrelation function (ACF) plot of fitted SARIMA\((1,0,0)\times (0,1,1)_{12}\) model

Fig. 7 Residual partial autocorrelation function (PACF) plot of fitted SARIMA\((1,0,0)\times (0,1,1)_{12}\) model

Thus, the SARIMA\((1,0,0)\times (0,1,1)_{12}\) model was employed to forecast the monthly relative humidity in Delhi during 2017–2025. The forecasted values are given in the supplementary table (Table S1), and Fig. 8 shows them graphically. From Fig. 8, we observed that the forecasted relative humidity decreases every year from January to May and from September to October, and increases from June to August and from November to December. The relative humidity attained low values of \(43.62\%\) (30.57–56.67) in April and \(43.43\%\) (30.37–56.48) in May 2017. Similarly, the relative humidity will be \(43.58\%\) (29.58–57.57) in April and \(43.41\%\) (29.42–57.41) in May 2025. The maximum relative humidity of \(79.60\%\) (67.31–91.89) was obtained in January 2017, and the maximum will again occur in January 2025 at \(75.02\%\) (61.02–89.01). The relative humidity dropped by \(45.4\%\) between January and May of 2017, and by \(6.78\%\) between September and October of 2017. It increased by \(37.23\%\) from June to August 2017 and by \(9.21\%\) from November to December 2017. Again, relative humidity will decrease by \(44.70\%\) from January to April 2025 and by \(6.78\%\) from September to October 2025. The relative humidity will increase by \(37.23\%\) from May to August 2025 and by \(9.21\%\) from November to December 2025.
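The percentage drops and rises quoted above are relative changes between two monthly forecasts, not differences in percentage points. The short check below reproduces the January to May 2017 figure from the two forecast values given in the text; the function name `percent_change` is a hypothetical helper.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent of the starting value."""
    return (end - start) / start * 100.0


# SARIMA point forecasts quoted in the text for 2017 (in % relative humidity).
jan_2017 = 79.60
may_2017 = 43.43

drop = -percent_change(jan_2017, may_2017)
points = jan_2017 - may_2017

print(round(drop, 1))    # relative drop, in percent of the January value
print(round(points, 2))  # the same change expressed in percentage points
```

The relative drop of about 45.4% corresponds to a fall of 36.17 percentage points of relative humidity.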

Fig. 8 Forecasted relative humidity data during 2017–2025 in Delhi, India from SARIMA

Further, we forecasted the monthly relative humidity using the MLP. The MLP model consisted of 12 input nodes, 5 hidden layer nodes, and 1 node in the output layer, and is therefore denoted by 12:5s:1l. The results of the MLP model are presented in the supplementary table (Table S2). Figure 9 shows that the relative humidity will decrease every year from January to April and from September to October during 2017–2025, and will increase from May to August and from November to December. The minimum relative humidity of 49.17\(\%\) was found in April 2017, and the maximum of 73.33\(\%\) was reached in August 2017. The MLP results revealed that relative humidity decreased by 31.86\(\%\) from January to April and by 5.69\(\%\) from September to October in 2017. In addition, it increased by 49.07\(\%\) from May to August and by 3.99\(\%\) from November to December of 2017. In 2025, relative humidity will fall by 31.86\(\%\) from January to May and by 5.69\(\%\) from September to October, while increasing by 49.07\(\%\) from June to August and by 3.99\(\%\) between November and December.
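One common way to feed a monthly series to a network with 12 input nodes is to use the previous 12 monthly values as inputs and the next month as the target, and then to forecast several steps ahead by feeding each prediction back into the input window. The text does not state how the 12 inputs were constructed, so this layout is an assumption, and the function names and the stand-in predictor below are hypothetical.

```python
def make_windows(series, n_lags=12):
    """Build (inputs, targets): n_lags consecutive values predict the next one."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets


def recursive_forecast(history, predict_one, steps, n_lags=12):
    """Multi-step forecast that feeds each prediction back into the window."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the newest
    return forecasts


# 204 placeholder observations stand in for 17 years of monthly data.
series = [float(v) for v in range(204)]
inputs, targets = make_windows(series)
print(len(inputs), len(inputs[0]))  # number of training patterns, inputs per pattern


# Stand-in for a trained network: repeat the value from 12 months earlier.
def seasonal_naive(window):
    return window[0]


print(recursive_forecast(list(range(24)), seasonal_naive, steps=3))
```

Replacing `seasonal_naive` with the forward pass of a trained 12:5s:1l network would give the kind of multi-year forecast described here, again under the assumed input layout.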

Table 4 shows the accuracy of the SARIMA and MLP models, measured in terms of RMSE and mean absolute error (MAE). The RMSE and MAE of the MLP model were 4.65 and 3.42, respectively, which were lower than the RMSE of 6.04 and MAE of 4.56 of the SARIMA model. Thus, the results revealed that the MLP model provides better accuracy than the SARIMA model.
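For reference, the two accuracy measures can be computed as in the sketch below. The observed and predicted values are made-up numbers used only to show the calculation; they are not the data behind Table 4.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Made-up observed and predicted relative humidity values (%).
observed = [60.0, 50.0, 70.0, 80.0]
forecast = [62.0, 47.0, 70.0, 84.0]

print(rmse(observed, forecast))
print(mae(observed, forecast))
```

Because RMSE squares the errors before averaging, it penalizes large errors more heavily and is never smaller than MAE, which is consistent with the values reported for both models.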

Fig. 9 Forecasted relative humidity data during 2017–2025 in Delhi, India from MLP

Table 4 Comparison of results for SARIMA and MLP models

Conclusion

In this study, SARIMA and ANN with MLP models were employed to forecast the monthly relative humidity in Delhi, India, during 2017–2025, and their forecasting accuracy was compared in terms of RMSE and MAE. The results showed that the relative humidity will decrease from January to April and from September to October, and will increase from May to August and from November to December, during 2017–2025. The results also indicated that the MLP model achieved lower RMSE and MAE than the SARIMA model. Therefore, this research suggests that the ANN with MLP model forecasts monthly relative humidity with a smaller error than the SARIMA model.