Introduction

Though globalization, industrialization, and urbanization have escalated the economic growth of nations, these activities have played foul on the environment. Better understanding of ill effects of these activities on environment and human health and taking appropriate control measures in advance are the need of the hour. For this the relevant data could be analyzed through time series. Time series forecasting is an extensive quantitative technique involving collection and analyzation of historical observations for the development of an appropriate model. Theoretically, analysis of time series basically contains three steps: characterization, modeling, and forecasting. While forecasting calculates short-term progression of the system, the modeling component establishes long-term behavioral features of the system. The first step determines the fundamental properties such as measure of randomness or degrees of freedom. This analysis can be employed to build predictive models with minimum errors for forewarning. It is important to underline that the modeling choice in any temporal, or more generally, in any spatio-temporal prediction, is relevant and must be suitably faced (De Iaco and Posa 2018; Cappello et al. 2018; De Iaco et al. 20132015).

Autoregressive integrated moving average (ARIMA) model is the most popular accepted time series models (Shahwan and Odening 2007; Singh et al. 2020b). The Box-Jenkins methodology (Box and Jenkins 1970), high mathematical precision, and reliability are what makes ARIMA models very popular (Singh et al. 2021b). ARIMA models have large number of applications. The model is applied to forecast various things like commodity prices (Weiss 2000); for load forecasting in the power system (Nicolaisen et al. 2000; Hippert et al. 2001); future of energy resources, such as oil (Morana 2001) or natural gas (Buchananan et al. 2001); daily environmental factors such as ozone levels ( Robeson and Steyn 1990; Prybutok et al. 2000); forecasting various air pollutant (Kulkarni et al. 2018; Chaudhuri and Dutta 2014), noise pollution data (Garg et al. 2015), water quality time series data (Faruk 2010), for water quality management (Parmar and Bhardwaj 2014); etc. Thus, we find large applications of ARIMA almost in every field.

ARIMA and environment are highly correlated. Forecasting is needed in all the fields of environment such as air pollution, noise pollution, fossil fuels, rainfall data, and underground waters, as all these factors have direct relationship with health. Forecasting in these fields can result in forewarning which can be highly beneficial. Though there are many research papers on application of ARIMA in various fields including environment, there is no substantial work that reviews the building stages of ARIMA. In this regard, the present work attempts to present three different stages through which ARIMA was evolved. It needs to be highlighted that detailed information about the process of evolution is equally significant as knowledge about its application. In this review paper, we have tried to review more than 100 papers based on pure ARIMA and its hybrid modeling. The first part of this paper deals with general introduction to ARIMA, followed by its historical overview, and the last part deals with its application review. The last section is divided into two categories, where major emphasis is laid on application of ARIMA in the field of atmosphere/environment/health or factors influencing air quality such as air pollutants, noise pollution, and rainfall, and the last section deals with application of ARIMA in other fields such as financial data and load forecasting data. The last categories are again divided into two further subsections: one involving purely ARIMA models and the other development of hybrid models with ARIMA.

ARIMA model analysis

The classes of autoregressive moving average (ARMA) models are frequently used while modeling linear and stationary time series due to their outstanding results and effectiveness (Al-Saba and El-Amin 1999). In 1921, Yule presented pure moving average process, whereas he introduced pure autoregressive process in 1927. Box and Jenkins (1970), Hannan (1970), and Anderson (1971) have been pioneers in building various techniques using autoregressive moving average models. Hipel et al. (1977) have given the theoretical and practical approaches for the model building stages of ARIMA.

ARIMA processes are a kind of stochastic techniques which are used to investigate behavioral pattern of time series. ARIMA models are quite flexible in nature as they can represent pure autoregressive (AR), pure moving average (MA), and mixed AR and MA (ARMA) series. Unfortunately, ARIMA models are unable to capture nonlinear pattern of the series and thus are not suitable for approximating complex real-world problems. Time series which has either a trend or seasonal patterns do not exhibit stationary behaviors. ARIMA (p, d, q) model only captures trends and not seasonal behavior of time series. To model a seasonal pattern, we have \(ARIMA\) (p,d,q)(P,D,Q) model.

Mathematical formulation

Moving average \(({\varvec{M}}{\varvec{A}})\) process

A process {\({z}_{t}\)} is said to be a moving average process of order \(q\) if.

\({z}_{t}={a}_{t}-{\theta }_{t-1}{-}_{\dots .}-{\theta }_{q}{a}_{t-q}\) where \({\theta }_{i}\); i = 1,2,3…,q are constants and {\({a}_{t}\}\) is a purely random process with mean zero and variance \({\sigma }^{2}\)(Box et al. 1994).

Autoregressive \(({\varvec{A}}{\varvec{R}})\) process

A process { \({z}_{t}\)} is regarded as an autoregressive process of order \(p\) if.

\({z}_{t}={\phi }_{1}{z}_{t-1}+{\phi }_{2}{z}_{t-2}+\dots +{\phi }_{p}{z}_{t-p}+{a}_{t}\) where \({\phi }_{j}\) (j = 1,…,p) are constants and {\({a}_{t}\}\) is a purely random process with mean zero and variance \({\sigma }^{2}\) (Box et al. 1994). This model works like multiple regression model.

Autoregressive moving average \(({\varvec{A}}{\varvec{R}}{\varvec{M}}{\varvec{A}})\) process

Both the autoregressive \((AR)\) and the moving average \((MA)\) are combined to build the autoregressive moving average \((ARMA)\) model. The \((ARMA)(p,q)\) model for a time series which contains \(p\) \((AR)\) terms and \(q\) \((MA)\) terms can be expressed as.

$$\begin{array}{cc}{x}_{t}=\sum_{i=1}^{p}{\phi }_{i}{x}_{t-i}-\sum_{j=1}^{q}{\theta }_{j}{a}_{t-j}+{a}_{t}& ;t=1, \end{array}\dots ,T$$

Here, \({a}_{t}\) is known as normal white noise process. It has zero mean and variance \({\sigma }^{2}\). T is the amount of data in the time series (Box et al. 1994). AR parameters should satisfy the condition for stationarity and MA parameters should satisfy the conditions for invertibility.

Seasonal ARIMA

ARIMA as such does not support seasonal data, i.e., time series with repeating cycles. However, such time series are expressly modeled by ARIMA extension known as seasonal ARIMA, i.e., SARIMA. Seasonal ARIMA models are defined by 7 parameters \(p, d, q, P, Q, D, and s\) and mathematically defined as

$$\left(1- {\upphi }_{1}\mathrm{B}-{\upphi }_{2}{B}^{2}-\bullet \bullet \bullet \bullet -{\upphi }_{p}{B}^{p}\right)\left(1- {\upbeta }_{1}{B}^{s}-{\upbeta }_{2}{B}^{2s}-\bullet \bullet \bullet -{\upbeta }_{P}{B}^{Ps}\right){\left(1-B\right)}^{d}{\left(1-{B}^{s}\right)}^{D} {\mathrm{y}}_{t}=\mathrm{c}+\left(1-{\Psi }_{1}\mathrm{B}-{\Psi }_{2}{B}^{2}-\bullet \bullet \bullet \bullet -{\Psi }_{q}{B}^{q}\right)\left(1- {\uptheta }_{1}{B}^{s}-{\uptheta }_{2}{B}^{2s}-\bullet \bullet \bullet -{\uptheta }_{Q}{B}^{Qs}\right){\epsilon }_{t}$$
  • p and P are non-seasonal and seasonal autoregressive polynomial orders, respectively.

  • q and Q are non-seasonal and seasonal moving average polynomial orders, respectively.

  • d and D are order of normal and seasonal differencing, respectively.

  • s is the period of the seasonal pattern appearing.

The ARMA models work only on stationary data, but in reality, data of the various fields is non-stationary, making ARMA unfit for these problems. With the help of differencing, the data can be made stationary, and this step leads to the development of ARMA. ARMA is in fact generalization of ARMA processes.

Historical overview

Based on the principle of parsimony, statisticians Box and Jenkins (1970) gave a practical approach to build ARMA model. The method uses a three-step iterative approach of model identification, parameter estimation, and diagnostic checking to build the best parsimonious model from a general class of ARMA models (Fig. 1). The process is repeated until a satisfactory model is obtained which can be then used for prediction (Singh et al. 2020a, b, 2021a, b). In this section, historical evolution of various steps involved in ARMA modeling is discussed in detail.

Fig. 1
figure 1

Methodology of ARIMA model

Model identification

Model identification is a herculean task in any ARMA modeling. In this regard, autocorrelation function (ACF) and partial autocorrelation function (PACF) are vital statistics in determining the order of the model. While ACF explains correlation, PACF describes partial correlation between the series and lags of itself. Box and Jenkins (1970) introduced the concept of degree of differencing “d” also. Generally, d is 0, 1, or 2. After d is selected, p and q are calculated from the overall trend of ACF and PACF of the appropriately differenced series. Cleveland (1972) presented inverse autocorrelation function (IACF) for this step. But the method was not suitable for mixed models.

Sometimes by using visualization tools such as ACF and PACF, it is not possible to identify the parameters p, d, q and P, D, Q. In that case, the model with the lowest BIC (Bayesian information criterion) or AIC (Akaike information criterion) is selected. Based on different information theoretic techniques such as AIC or minimum description length (MDL), various methods have been proposed by researchers Hurvich and Tsai (1989), Ljung (1987), and Shibata (1976) for order selection.

Astrom and Eykhoff (1971), Van Boom and Enden (1974), and Unbehauen and Göhring (1974) acquainted on time series model identification in the engineering field. Since the beginning of the 1970s, various estimation-type identification methods have come into limelight. Akaike (1969) introduced final prediction error (FPE) to determine the order \(p\) of the AR models which is defined as

$${FPE}_{\left(P\right)}=(1+ \frac{2p}{n}){\sigma }_{p}^{2}$$
(1)

where n is the number of observations and \({\widehat{\sigma }}^{2}\)(P) is an estimate of white noise variance.

For the mixed model, the criteria are expressed as

$$\delta \left(p,q\right)=\mathit{log}\left({\widehat{\sigma }}^{2}\right)+\frac{\left(p+q\right)g\left(n\right)}{n}$$
(2)

Here, \({\widehat{\sigma }}^{2}\) represents the maximum likelihood estimate for the residual variance \({\sigma }^{2}\). The values \(\widehat{p}\) and \(\widehat{q}\) minimizing δ (p, q) are the best approximations for p and q. Taking different values of g(n) leads to different criteria (Table 1).

Table 1 Model identification criterions depending upon values of g(n) along with their mathematical expressions

Another procedure known as criterion of autoregressive transfer function (CAT) was introduced by Parzen (1974) where the actual model is presumed to have an AR (\(\propto\)) representation. The order selected \(\widehat{p}\) is interpreted as best finite order AR approximation to the true AR (\(\propto\)) process. Extending the work of Woodward and Gray (1981), Glasbey (1982) introduced generalization of partial autocorrelations (GPAC) as crucial techniques in ARMA model identification. To estimate d, Janacek (1982) proposed a method using log of the power spectrum. For order estimation of a finite moving average process, Bhansali (1983) gave autoregressive and window estimates of the inverse correlation function. Monahan (1983) used Bayesian approach for determining the order (p, q) of the ARMA model. Pattern identification techniques using extended Yule-Walker equations for ARMA order identification was employed during early 1980s. Tucker (1982) replaced R and S arrays with RS array for the ARMA model identification. By integrating order identification approach for mixed stationary and non-stationary ARMA models, Tsay and Tiao (1984) eliminated the need of differencing required for producing stationarity in the time series. Augmenting their study, Tsay and Tiao (1985) introduced the smallest canonical correlation (SCAN) method. Broersen (1985) presented weak parameter criterion (WPC) for model order selection. Poskitt (1987) modified Hannan and Rissanen (1982) criterion of order selection. The order of ARMA model was determined using white noise test by Pukkila and Krishnaiah (1988). Koehler and Murphree (1988) preferred Schwarz information criterion (SIC) over Akaike information criterion (AIC) for order selection. Hurvich and Tsai (1989) developed a bias-corrected method AlCc for better model order choices. Koreisha and Pukkila (1993) introduced iterative procedures for determining the degree of differencing required to make time series data stationary. Zhang and Zhang (1993) developed an algorithm involving only autocorrelations for determining order of MA processes. Liang et al. (1993) gave a new approach based on the Eigen values of covariance matrix for ARMA model order determination. Sreenivasan and Sumathi (1997) formulated a new generalized parameters technique for both seasonal and non-seasonal ARMA model identification.

Model estimation

The estimation of AR parameters is very crucial in time series analysis for the adequate information about the model. Maximum likelihood methods, ordinary least squares (OLS), and method of moments are some of the extensively used techniques for parameter estimation in time series analysis. Pure autoregressive models are either estimated by OLS method or by using Levinson (1947)-Durbin (1960) algorithm. Durbin (1960) showed that method of OLS leads to optimum estimates of the model provided errors are normally distributed. Burg (1975) came with an algorithm similar to forward backward prediction especially for short records and Huzii (1981) proposed a method based on higher order moment for estimating AR coefficients. Proposing an alternative to the Burg’s estimates Pukkila and Krishnaiah (1988) calculated true correlation matrix of the lagged variables. Basu and Das (1992) consider optimality of the maximum likelihood estimator under a general set-up of roots of the characteristic equation of the \(p\) th order of autoregressive process.

In comparison to the parameter estimation methods available for AR model, the number of techniques to approximate MA parameters is fewer. Durbin (1959) is pioneer in the estimation of MA parameters with a simple estimation procedure. A quadratically convergent algorithm was formulated by Wilson (1969) to estimate the parameters of a MA process. Whereas Gauss–Newton method was used to estimate the parameters of a non-linear function, Fuller (1976) utilized it for the MA process. Godolphin (1977, 1978) proposed an alluring computational procedure augmenting the previous studies. 

For ARMA cases, myriad studies are available. Wilson (1969) and Marquardt (1963) have developed an algorithm to estimate the ARMA parameters which was employed by Box and Jenkins (1970) to lay the foundation of all the ARMA, AR and MA processes. The research done by the latter is regarded to be a milestone in the field of modeling. McLeod (1977) gave an easier implementing modified version of Box and Jenkins (1970) which approximated exact ML estimators very precisely. Tuan (1984) derived many recursive relations which worked as a tool in identifying the order and parameters of ARMA at its preliminary stages. Using autocovariance function, Choi (1986) presented an algorithm for the parameter estimation of stationary ARMA process. The iterative algorithm proposed by Choi in 1986 was not convergent; therefore, he developed a convergent Newton–Raphson solution for MA parameters in the subsequent year. Saikkonen (1986) derived two-step estimators which were asymptotically efficient.

Recently, different studies have focused on maximum likelihood (ML) procedures as a tool to estimate ARMA models. Basu and Das (1991) analyzed the asymptotic properties of least-squares estimation procedures of the parameters of an ARMA (p, q) process in the stable case. The SCA System can be used to estimate the parameters of the model. For best results, a conditional likelihood function is selected along with the detection and adjustment of outliers. Mikosch et al. (1995) derived the asymptotic properties of the estimators. The major drawback of ML estimates is that it often lies outside the invertibility region. However, the generalized least-squares method for the estimation of pure MA and ARMA model presented by Pukkila et al. (1990) has succeeded in overcoming it.

Model diagnostic checking

Before the remarkable advent of Box and Jenkins (1970) procedure, hypothesis testing methods were largely used for ARMA model identification. McLeod (1974) examined the need to check whiteness and homoscedasticity of the residuals in the diagnostic checking step. Godfrey (1979) proposed a new approach based upon Silvey’s (1959) Lagrange multiplier method to check the adequacy of the ARMA model. Hokstad (1983) has proposed a diagnostic test with the estimated cross correlation function (CCF) between the observed values and the residuals. The CCF can also be used as an indicator for the required improvement in the model.

Model forecasting

Among the wide applications of time series, the most popular is forecasting. Forecasting is easily attainable with state space framework of which ARIMA models are special case. With the state space framework, observational vectors are brought into a system with one element at a time (Durbin and Koopman 2012). State space models have great contribution in the science of environment (Harvey et al. 2004) and also play a pivotal in the analytical handling of time series models (Harvey 1989).

For the practical computation of forecasting, the simplest and most elegant method is of difference equation from which minimum mean square error forecasts can be directly generated (Box and Jenkins 1970). Also, the probability limits for the forecasts can be obtained by solving recursively. Makridakis and Wheelwright (1977) concluded that the adaptive filtering technique can be applied to time series forecasting dealing with real data. Cartwright (1985) has assessed the forecasting performance of Priestley’s model and concluded that forecasting errors can be significantly reduced with the use of broader classes of time series. Ray (1988) concluded that forecasting performance of Bilinear model is more than Box-Jenkins model and threshold autoregression model. Tiao and Tsay (1994) studied developments of time series in both the linear and non-linear domain. They were of the opinion that when the parameters involved are estimated adaptively, linear models provide more accurate forecasts. Thus, we can see the model building process is quite laborious and needs great human expertise.

Wrapping up all the above steps, the general statistical methodology scheme of ARIMA is as presented by Fig. 1.

Modeling and forecasting work flow of ARIMA

As observed ARIMA finds large applications in various fields due its high mathematical precision and great reliability. The next section of the paper deals with review of papers employing ARIMA exclusively and hybrid ARIMA models published in world’s best-class journals. The rising disastrous threats to the atmosphere or air quality are turning to be great concerns of the twenty-first century. Its hazardous effects are felt by human beings as well as by the entire wildlife. Thus, forecasting in this field can be a great contributor to governments and public at large in taking all the required precautionary steps in advance. There are various environmental factors affecting the air quality such as air pollutants, water pollutants, noise pollution due to traffic congestions, rainfalls, and surface erosions. Also, the environment exhibits close connections with energy resources. Thus, our coming section deals with all these environmental factors first forecasted with ARIMA alone followed by hybrid models involving ARIMA. The data set undertaken, methodology adopted, and results obtained, all are discussed elaborately for the complete understanding of the readers.

Survey of air quality/environment/health data involving application of ARIMA models

Kulkarni et al. (2018) studied the variation of SO2, RSPM, NOx, and SPM parameters present in the atmosphere of Nanded City (Maharashtra, India). The randomness, trend, and seasonality present in the data were also analyzed. The forecasted result revealed that RSPM and SPM are exceeding the permitted limits. Jaiswal et al. (2018) observed the statistical trend of CO, NO2, SO2, PM2.5, and PM10 concentrations for the time span January 2013 to December 2016 for the city Varanasi (India) using Mann–Kendall and Sen’s slope estimator approach. Different ARIMA models, namely, ARIMA (1,0,0), (1,0,1), and (1,1,1), were fitted on the three data sets of summer, monsoon, and winter season and their results compared. ARIMA (1,1,1) was chosen as the best fit model for forecasting all the pollutants. Pohoata and Lungu (2017) tried to analyze the air quality of the city Ploiesti in Romania for the pollutants O3, CO, NO2, NOx, and PM10. While ARIMA (3,1,3) provided good results for NOx, NO2 and O3, it failed to give satisfactory results for CO and PM10. Kumar et al. (2004) forecasted one-day advance O3 concentration in Brunei Darussalam using ARIMA modeling approach. ARIMA (1,0,1) was the best fitted method. Liu et al. (2018) used ARIMA with numerical forecasts (ARIMAX) with the vision to improve forecast of O3, PM2.5, and NO2 for Xingtai (China). Significant reduction in RMSE, viz., 47.8–49.7%,14.3–21.0%, and 41.2–46.3%, respectively, for O3, PM2.5, and NO2, was seen by employing CMAQ-ARIMA for the daily 1-h and 8-h forecasting values at all the stations. Dynamic hourly forecast shows that ARIMAX can also be successfully applied to forecast of 7- to 72-h PM2.5, 4- to 72-h NO2, and 4- to 6-h O3. Zhang et al. (2018) forecasted PM2.5 concentrations using AQI and meteorological parametric data for Fuzhou (China) with the help of ARIMA. It was observed that PM2.5 concentrations were positively correlated with PM10, NO2, and SO2 concentrations and negatively correlated with meteorological parameters. Average PM2.5 concentrations were 52% higher in cold periods in comparison to warm periods. ARIMA (6,1,1) was found to be best model for the data.

This is the first applicability of ARIMA model on the dataset of VC. Prior to model simulation of VC over the region of Delhi, trends and variations in the data set were analyzed. 12 ARIMA models ARIMA (0,0,1), (1,0,2), (0,0,5), (1,0,0), (0,0,1), (1,0,0), (0,0,1), (0,0,1), (1,0,0), (0,0,2), (0,0,1), and (0,0,3), respectively, for each month from January to December were developed separately. The result revealed that past continued to impact future values of VC.

Kumar and Goyal (2011) thrived to build a forecasting AQI model. Three models are assessed, i.e., ARIMA (1,1,1), PCR, and ARIMA combined with PCR with respect to 4 seasons of the year in Delhi. The hybrid model performed better in predicting AQI one-day advance. The hourly and monthly concentrations of CO spread over 7 years of data of Hong Kong were analyzed by Lau et al. (2009) using ARIMA modeling. Association of hourly concentration of CO with different days of the week was examined. The hourly data of CO was like traffic data. This strong association was proven by SARIMA (0, 1, 1) (0, 1, 1)24 model. It was also shown that data possessed long-term memory features. Kumar and De Ridder (2010) took the daily maximum O3 concentration data of four sites of London and Brussels for study. They studied GARCH modeling technique in association with FFT-ARIMA to make forecasts of ozone episodes at these sites. In the study of Slini et al. (2002), ARIMA (1,1,1) and ARIMA (1,1,0) were used over the data of maximum daily O3 concentration data of Athens (Greece) from 1990 to 1998. The forecasting performance of these two models were observed under the three categories of alarm limit greater than 180,170 and 160 μg/m3. Robeson and Steyn (1990) tried to develop three predictive models, i.e., D/S model, ARIMA (1,1,0), and TEMPER model for the temporal variability of ozone in the lower Fraser Valley of British Columbia by taking 8 years data for the period 1978–1985 from two monitoring stations T9 and T11. On comparison of forecasting ability for ozone concentrations, TEMPER model outperformed the other two models. Siew et al. (2008) compared the performance of ARMA (3, 1, 3) and integrated ARFIMA (0, − 0.5, 2) models for the data of O3, PM10, NO2, SO2, and CO concentration from March 1998 to December 2003 of Selangor Malaysia. Though both models could not forecast all the values completely, ARFIMA performed slightly better than ARIMA.

Ahn (2000) applied second order differencing ARIMA models to daily groundwater head time series (1985–1990) from 7 monitoring wells located in Collier County, Florida. Variance and autocovariance equations were derived for the second-order time series models using ARIMA (0,2,1), (0,2,2), (1,2,0), (2,2,0), and (1,2,1) as function of parameters of the model under study. Mirzavand et al. (2014) worked on AR, MA, ARMA, ARIMA, and SARIMA models in analogous to forecast groundwater levels up to 60 mo in plain expanses of Kashan aquifer in Iran. The seasonality and stationarity of the data were checked. Taheri Tizro et al. (2014) analyzed several water parameters of Hor Rood River at Kakareza station with the help of ARIMA modeling. Based on R2, AIC, RMSE, and MAPE, ARIMA (2,1,3), (2,1,3), (1,1,3), (1,1,3), (2,1,1), (2,1,1), (1,1,1), and (2,1,2) were found to be best suitable for parameters TDS, EC, HCO3-, SO42−, Ca+, Na+, ph, and SAR generation, respectively. The increasing trend of majority of parameters showed a picture of deteriorating water quality conditions in the region.

Garg et al. (2015) in their paper simulated daily mean LDay (06–22 h) and LNight (22–06 h) in A- and C-weightings in conjunction with single-noise metrics, day-night average sound level (DNL) for a period of 6 months for the station East Arjun Nagar in Delhi using ARIMA methodology. ARIMA (0,0,14), (0,1,1), (7,0,0), (1,0,0), and (0,1,14) are chosen to be fit models for LDay dBA, LNight dBA, LDay DB(C), LNight dBC, and DNL dBA, respectively. Augmenting their work, the authors in 2016 compared ARIMA and ANN on the same problem and found ANN model gave better results than ARIMA model. Guarnaccia et al. (2017) presented two methods for the acoustic data set of Nice (France) international airport for the year 2000. The two methods utilized were DD-TSA (deterministic decomposition model) and SARIMA (0,1,1) (0,1,1)24. While the former method captured long-term behavior of the data set, the latter captures short-term behavior. To quantify the forecasting errors, residual analysis was carried out. The DD-TSA gave slightly better results in terms of low standard deviation. Williams and Hoel (2003) applied SARIMA (1,0,1)(0,1,1)672,(1,0,2)(0,1,1)672, (2,0,1)(0,1,1)672 for M25 station and \({(\mathrm{1,0},1)(\mathrm{0,1},1)}_{672}\) \({,(\mathrm{3,0},0)(\mathrm{0,1},1)}_{672}\), \({(\mathrm{1,0},2)(\mathrm{0,1},1)}_{672}\) for I-75 stations on the vehicular traffic data. The authors concluded that one-step seasonal ARIMA predictions constantly outperformed ARIMA and random walk forecast results.

Ab Razak et al. (2018) using Mann–Kendall trend analysis found ARIMA (0, 1, 2) is the best suitable model for the daily and monthly rainfall and stream flow data of stations of Malaysia for the period 2000–2010. Zakaria et al. (2012) developed four ARIMA models (3,0,2)(2,1,1)30, (1,0,1)(1,1,3)30, (1,1,2)(3,0,1)30, and (1,1,1)(0,0,1)30 for the weekly rainfall data for the stations Sinjar, Mosul, Rabeaa, and Talafar in north west Iraq for the period 1990–2011 stations.

Benvenuto et al. (2020) found ARIMA (1, 0, 4) and ARIMA (1, 0, 3) to be the best model for determining the prevalence and incidence of COVID-19, respectively, from January 20, 2020, to February 10, 2020. Logarithmic transformation was done out to check the seasonality influence on the prediction.

Suresh and Priya (2011) took 57 years data from (1950–195 l) to (2007–2008) of sugarcane area, production, and productivity for Tamil Nadu for analyzing. Various ARIMA models with \(p\) and \(q\) varying from 0, 1, and 2 were fitted. While ARIMA (1, 1, 1) model was found suitable for sugarcane area and productivity, ARIMA (2,1, 2) was appropriate for sugarcane production.

Li and Li (2017) applied GM (1,1), ARIMA (1,2,0), metabolism GM (1,1), and GM-ARIMA to forecast future energy needs of Shandong province using energy data for the span 1995–2015. Employing histogram judgment method, it was shown that data contains non-linear sequencing. This led to the development of GM (1,1) model on the data. Unfortunately, this method was also precluded because of 21 data points in the data set. The optimal requirement of data points for application of gray model is 5–10. In the next step, gray metabolic forecast model was tried. After the successful application of gray metabolic model on the basis of relative error, a series of residuals was obtained. Residual corrections are done with the help of ARIMA model. This process enhanced the prediction accuracy significantly, and this was the major innovation of this study. Aamir and Shabri (2016) employed ARIMA, GARCH, and ARIMA-Kalman to predict crude oil rates in Pakistan by undertaking average monthly prices of crude oil for the time February 1986 to March 2015. The ARIMA Kalman filter technique proved to be best approach as MAE and RMSE were minimum in this case as compared to ARIMA and GARCH.

It is very much evident from the above discussion that ARIMA is highly significant in the field of environment and health. ARIMA being simple and reliable has tremendous potential to forecast in these areas. Thus, it was important to review contribution of ARIMA in these domains.

Hybrid modeling

The real-world problems are usually complex rather than being simple. Thus, linear predictive methods do not perform as desired when used to process data from a nonlinear system (Patil 1990). To find out whether the series is linear or non-linear requires enormous efforts of the forecasters. The researchers try different combination of models based on different theoretical and practical approaches and various other factors such as sampling variation, model structure and uncertainty to develop a model which yields more accurate results and enhanced forecasting (Jenkins 1982; Makridakis 1989). Bates and Granger (1969) are considered to be pioneer in introducing combining forecasts as an alternative to use one single forecast. The literature indicates that performance of time series forecasting increases through combining forecasts (Makridakis et al. 1982).

The next part of our study deals with review of papers based on ARIMA hybrid modeling in the field of environment/atmosphere/health.

Analyzing ARIMA hybrid-based studies

Mani and Volety (2021) forecasted three air pollutants, namely, CO, NH3, and O3 of Vijayawada station by employing both ARIMA and the LSTM models. Kalman filters are also used to enhance the performance and forecasting abilities. The latter model proved to have higher accuracy using RMSE and MAE as performance indices. Wang et al. (2017) came up with hybrid-GARCH model to overcome conditional heteroscedasticity almost present in every hybrid model. The authors used ARIMA and SVM models to explain the linear and non-linear components, respectively, of the AQI data comprising PM2.5, SO2, NO2, CO, and O3 concentrations for six stations from the Shenzhen air quality monitoring network (China) for the time period 01 Sep 2013 to 10 Sep 2013. For estimating the coefficients of individual models, GARCH model is introduced. The accuracy level of hybrid-GARCH model in terms of metrices MAE and RMSE were higher than the individual models. Samia et al. (2012) attempted to foresee one day in advance the max 24-h ma PM concentrations in the region of Sfax Southern Suburbs using MLP, ARIMAX, and the hybrid model. The study revealed that hybrid ARIMAX-ANN outperformed the individual models since it can explore both linear and non-linear patterns. Zhu et al. (2017) stated that the most challenging problem while forecasting AQI is of data being highly complex and non-stationary. Thus, they presented two hybrid models EMD-SVR-Hybrid and EMD-IMFs-Hybrid for AQI data of Xingtai, China, collected from June 2014 to August 2015. To obtain smooth IMF, EMD technique is employed. Then, SVR is used to predict total sum of IMF’s and finally S-ARIMA is applied for analyzing residual sequences obtained from the two proposed models. In this paper where ARIMA is used for modeling AQI data, S-ARIMA is kept for forecasting IMF5 as well as for analysis of the residues. The predicted outcomes of AQI are sum of EMD-IMFs and S-ARIMA. The IMF4, IMF5, IMF6 and IMF7 chosen from IMFs are forecasted by Holt-Winters (0.9,0.2,0.3), S-ARIMA (1,1,1) (0,1,1), Holt Winters (0.1,0.2,0.5), and GM (1,1) respectively. Chelani and Devotta (2006) examined the effectivity of their proposed hybrid model based on chaos theory with ARIMA and nonlinear models for the time series data of \({NO}_{2}\) concentration present in the air from 1999 to 2003 at a site in Delhi. The prediction performance based on MAPE, RMSE, and RE confirmed that the hybrid modeling is more effective than individual models in forecasting the air pollutant concentrations. Díaz-Robles et al. (2008) studied hourly and daily time series of PM10 and meteorological data during 2000–2006 at the Las Encinas monitoring station in Temuco. Hybrid model outperformed ARIMAX and ANN in terms of RMSE, MAE, and BIC. Prybutok et al. (2000) studied a neural network model for forecasting daily maximum ozone levels and showed that the neural network model is superior to the two conventional statistical models, regression and Box-Jenkins ARIMA models. The maximum ozone data for the year 1994 from Houston was the research object.

Ji et al. (2019) forecasted the future carbon prices using the hybrid ARIMA-CNN-LSTM model. The linear features of the data set were modeled by ARIMA. CNN model extracted the spatial features of the residual of ARIMA model. And finally long-term dependencies of these features were captured by LSTM. It was evident from the experimental analysis that the hybrid ARIMA-CNN-LSTM outperformed the individual models. The dataset of carbon future prices from 7 April 2008 to 6 May 2019 was taken. Koutroumanidis et al. (2009) showed that the hybrid ARIMA-ANN model has a better adaptability and can make better predictions of future selling prices of the fuelwood produced by Greek state forest farm as compared to both the ARIMA model and the simple ANN model.

Faruk (2010) observed that the best fit models for water temperature (0C), boron (mgl−1), and DO (mgl−1) are SARIMA (1,1,1) (0,0,1)12, (1,1,1) (0,0,1)12, and (1,1,1) (0,0,1)12, respectively. They further found hybrid modeling approach of combining SARIMA with NNBP can give more reliable predictions for these parameters of a river than the Neural Network and traditional baseline ARIMA modeling approach individually.

Chattopadhyay and Chattopadhyay (2010) and Somvanshi et al. (2006) examined the rainfall data of India and found that hybrid models outperformed individual models in terms of forecasting efficiencies. Wei et al. (2016) found that the most appropriate model for the morbidity data for hepatitis from the Heng County CDC from January 2005 to December 2012 is ARIMA (0,1,2) (1,1,1)12. (Table 2).

Table 2 Concise results of application of ARIMA and its hybrid methodology in the field of environment/health/air quality

From Table 3, it can be clearly seen that errors are reduced when hybrid models are implemented on the various data sets. Thus, both theoretical and empirical findings are in strong favor of combining different methods to achieve effective and efficient forecasts (Newbold and Granger 1974). Consequently, hybridizing different models has become the latest research objectives in the field of modeling and forecasting. Table 2 provides concise results of sections “Survey of air quality/environment/health data involving application of ARIMA models,” “Analyzing ARIMA hybrid-based studies,” “Reviewing application of ARIMA alone,” and “Investigating ARIMA with hybrid methodology” and some of the other papers involving ARIMA in tabular form which would be easier to comprehend.

Table 3 Application of performance evaluation measures on various ARIMA and its hybrid models

Examining role of ARIMA in studies other than air quality/environment/health concerns

ARIMA and its hybrid models have got such wide utilization that they cannot be restricted to merely one or two arenas of study. Therefore, in this section, few research studies in the field of financial data or load forecasting or some other fields are briefly reviewed to provide a glimpse of its varied applications. Following the previous chronology, we review papers with ARIMA alone and then followed by review of papers involving ARIMA and its hybrid methodology.

Reviewing application of ARIMA alone

CONTRERAS et al. (2003) proposed two ARIMA models to predict hourly prices in the electricity markets of Spain and California. The Spanish data showed volatility. Jadevicius and Huston (2015) generated around twenty ARIMA models ranging from ARIMA (1, 0, 0) to ARIMA (4, 0, 4). ARIMA (3, 0, 3) model outperformed all the models to model Lithuanian house price index. Garcia et al. (2005) applied ARIMA and GARCH model to forecast one-day ahead electricity for mainland Spain and California with the conclusion that GARCH model is more accurate than ARIMA. Yaziz et al. (2016) surveyed the modeling and forecasting performances of gold prices using ARIMA-TGARCH with Gaussian, Student’s t skewed, Student’s t, GED, and skewed GED innovations. Hybrid ARIMA (0,1,0)-TGARCH (1,1) with t-innovation was chosen as the best model.

Ariyo et al. (2014) and Wadi et al. (2018) applied various ARIMA models on the stock exchange data sets to find ARIMA (2,1,0), ARIMA (1,0,1), and ARIMA (2,1,1) best fits Nokia stock index, Zenith bank stock index, and Amman Stock Exchange, respectively. Siregar et al. (2017) based on ACF and PACF concluded ARIMA (3.0, 2) is the most appropriate model for predicting the sales of the factory in the city of Bandung. Wabomba et al. (2016) found ARIMA (2,2,2) as the best model for Kenyan GDP data. ARIMA (1,1,1) was found the best suitable method on the monthly gold price data by Guha and Bandyopadhyay (2016). In his research work, Lin (2018) gave an extensive review on GARCH models as well as detailed analysis of stock markets in general and particularly in China.

Investigating ARIMA with hybrid methodology

To capture the volatility present in the data, Babu and Reddy (2014) decomposed the series into two sets using MA filter. The proposed hybrid model resulted in improved one-step and multi-step ahead prediction accuracy. Nie et al. (2012) compared hybrid of ARIMA model and SVMs for short-term load forecasting with individual models via simulation for the electric load data of power company in Heilongjiang of China from March 1 to May 31, 1999. Hybrid ARIMA-SVM model performed better than the two separate models alone. Chen and Wang (2007) observed that the values of NMSE, R (correlation coefficient), and MAPE were lowest for hybrid model (SARIMASVM2) for the production data of the Taiwanese machinery industry from January 1991 to December 1996. Zhang (2003) compared ARIMA and ANN for the Wolf’s sunspot data from 1700 to 1987, the Canadian lynx data, and the British pound spanning from 1821 to 1934 and the US dollar exchange rate data extending from 1980 to 1993 and concluded combined model has greater forecasting accuracy. Khashei and Bijari analyzed the same data as used by Zhang (2003) in the year 2011. A new hybrid model better than Zhang’s model, ARIMA and ANN alone, was developed. Tseng et al. (2002) concluded that SARIMABP model outperformed SARIMA models for the total production revenues of Taiwan machinery industry. Wang (2011) compared ARIMA and fuzzy time series by heuristic models with Taiwan exports data from January 1990 and 30 March 2002. For longer analyzing time period, the MSE of the time series ARIMA model are lower, while for shorter analyzing period, the MSE are more. The heuristic fuzzy time series model is an appropriate tool when information is lacking and an urgent decision is needed. Wang and Leu (1996) used a hybrid model to conclude that ANN provides better results with differenced data than raw data in case of Taiwan stock exchange.

Wang et al. (2012) observed data for monthly closing index of SZII and opening index of DJIAI from China and the USA, respectively, from January 1993 to December 2010. It was seen that hybrid model could more effectively capture various relationships in the data. Ho et al. (2002) applied ARIMA, RNN, and MFNN to the failure time data for a repairable compressor system at a Norwegian process plant. RNN at the optimal weighting factor gives satisfactory performances compared to the ARIMA model. The simple and wide use of ARIMA models led Mondal et al. (2014) to study 56 Indian stock markets spread over different sectors. The high lightening feature of this research was analysis of sector-based ARIMA models, thus covering larger portion of Indian stocks. Different ARIMA models were generated and their AIC compared.

All the above narration can be summarized that in recent past hybrid modeling techniques have substituted single modeling processes. Also, these emerging nonlinear soft computing techniques are robust, parsimonious in their data requirements and provide good long-term forecasting. Though with the advent of new soft computing methods many difficulties while implementing ARIMA have been overcome, still ARIMA remains the benchmark in the field of modeling and forecasting due its high level of simplicity and great level of reliability.

Comprehension of ARIMA models along with its hybrid requires knowledge of performance evaluation criteria also. Thus, our coming section deals with various performance evaluation metrices.

Performance evaluation of hybrid models

Different statistical metrices such as MAD (mean absolute deviation), SSE (sum squared error), RMSE (root mean squared error), MSE (mean squared error), MAPE (mean absolute percentage error) are employed while evaluating the performance of the proposed model. MAD, RMSE and MAPE are defined by

$$\left.\begin{array}{c}\mathrm{MAD}=\frac{1}{n}\sum_{t=1}^{n}\left|y\left(t\right)-\widehat{y}\left(t\right)\right|\\ \mathrm{RMSE}=\sqrt{{\frac{1}{n}\sum_{t=1}^{n}\left|y\left(t\right)-\widehat{y}\left(t\right)\right|}^{2}}\\ \mathrm{MAPE}=\frac{1}{n}\sum_{t=1}^{n}\left|\frac{y\left(t\right)-\widehat{y}\left(t\right)}{y\left(t\right)}\right|\times 100\end{array}\right\}$$

where \(\widehat{y}(t)\) denotes the predicted value of \(y(t)\) and \(n\) is the number of points of the training and testing data sets. It is very much evident from Table 3 that errors are reduced when hybrid models are implemented on the various data sets.

Table 3 represents the various models employed and the criteria chosen for performance evaluation of these various models by various authors.

Conclusion

Though ARIMA finds its application extensively in the field of forecasting, it too has shortcomings. ARMA models are linear models, but the time series involving environmental/atmosphere/air quality/financial data, etc. are rarely pure linear combinations. A great competence, expertise, and experience of a researcher is required while implementing results of the Box-Jenkins procedure. Moreover, results are strongly affected by the path chosen and hence are path dependent (Weigend and Gershenfeld 1994). But almost all the models though different from each other have similar estimated correlation patterns, resulting in the arbitrary choice of the model (Box et al. 1994). Another drawback of Box-Jenkins procedure is that it is sometimes highly time consuming in the model identification step. ARMA models are highly unsuitable for running policy simulations.

After such a massive review of ARIMA and hybrid modeling involving ARIMA in the fields including or excluding environment/health/atmosphere, it can be concluded that the combined models are more robust and have higher ability to capture all the patterns of the series uniformly. Thus, combining several models or using hybrid model has emerged as a routinized custom, though ARIMA still remains the benchmark of many baseline models.