Forecasting volatility in Asian financial markets: evidence from recursive and rolling window methods

The present paper examines the relative out-of-sample predictive ability of GARCH, GARCH-M, EGARCH, TGARCH and PGARCH models for ten Asian markets, using three different time frames and two different methods, and accounting for volatility clustering, the leverage effect and volatility persistence, for all of which evidence is found in the data. Five comparison measures are employed in this research, and a further dimension is investigated based on the classification of the selected models, in order to identify any differences between the recursive and rolling window methods. The empirical results reveal that asymmetric models, led by the EGARCH model, provide better forecasts than symmetric models in higher time frames. However, in lower time frames, symmetric GARCH models tend to outperform their asymmetric counterparts. Furthermore, linear GARCH models are penalized more by the rolling window method, while the recursive method places them amongst the best performers, highlighting the importance of choosing a proper approach. In addition, this study reveals an important controversy: one error statistic may suggest a particular model is the best while another suggests the same model is the worst, indicating that a model's measured performance depends heavily on the loss function used. Finally, the results show that GARCH-type models can appropriately adapt to the volatility of Asian stock indices and provide a satisfactory degree of forecast accuracy in all selected time frames. These results are also supported by the Diebold-Mariano (DM) pairwise comparison test.


Introduction
Volatility is the degree of variation of a trading price series over time, and is usually measured by the standard deviation of logarithmic returns. As an important concern for traders, investors, companies and financial regulatory authorities, volatility forecasts of asset returns have been studied over the years for risk management, security valuation, portfolio diversification and monetary policy making purposes. Furthermore, volatility modelling and forecasting have especially attracted finance professionals and academics following the stock market crash in 1987, since the main reason for the crash was attributed to high volatility (Haugen et al. 1991).
The behavior of stock market volatility is time-varying. The early prominent empirical works of Mandelbrot (1963) and Fama (1965) revealed that small (large) changes in asset prices tend to be followed by small (large) price changes of the same magnitude, a phenomenon known as volatility clustering. Throughout the empirical applications of the last five decades, evidence suggests that volatility changes of return series are predictable, particularly in the long term (Fama and French 1989; Wurgler 2000; Cochrane 2008; Campbell and Thompson 2008). Therefore, numerous empirical models and methods have been developed and applied to identify and accurately predict the volatility behavior of return series. Nevertheless, earlier studies reveal no consensus regarding which model or method can provide the most accurate forecasts of asset returns.
Early studies tried to predict future volatility through simple statistical approaches based on averaging and smoothing methods. However, these simple models had limited prediction capacity, as financial time series tend to harbor certain special characteristics, such as volatility clustering. In order to deal with this issue, Engle (1982) developed the first generation of heteroscedasticity models with the seminal idea of the ARCH model. Bollerslev (1986) took another step and put forward a generalized version, called the GARCH model. Although the ARCH and GARCH models received considerable attention from researchers and practitioners and proved their empirical success, these models were still not able to capture the stylized fact of volatility asymmetry, which was later named the leverage effect by Black (1976). This limitation was addressed by the development of more adaptable and advanced versions. Noteworthy and popular examples of this new model class are Nelson's (1991) Exponential GARCH (EGARCH) model, Ding et al.'s (1993) Power GARCH (PGARCH) model, and Zakoian's (1994) Threshold GARCH (TGARCH) model. A number of studies have been devoted to reviewing the important GARCH family models, such as Poon and Granger (2003), Bauwens et al. (2006), Silvennoinen and Teräsvirta (2009), and Bhowmik and Wang (2020).
The aim of the present paper is to investigate and evaluate the relative out-of-sample forecasting ability of linear and non-linear GARCH models by comparing daily, weekly, and monthly frequencies, using recursive and rolling window methods. However, evaluating the estimated models is not an easy task, and one of the major issues is that the "true" volatility series is not observed. To overcome this problem, the squared return series is used as a proxy for the unobserved volatility process, since squared returns are an unbiased gauge of volatility, as shown by Andersen and Bollerslev (1998). With the use of squared returns, proper evaluation of the estimated models is ensured in terms of the selected error statistics.
Another important aspect of the paper is its coverage of a broad range of Asian markets, including those of emerging economies. Although there are a significant number of papers on forecasting stock market volatility, there are limited studies examining the Asian markets, particularly emerging markets. The review of Poon and Granger (2003) reports that only five of the 93 papers on volatility forecasting covered Asian markets, namely New Zealand, Australia and Japan, and none at all covered emerging Asian markets. Some recent papers have individually examined stock market volatility in Asian markets, including Ibrahim et al. (2020) for Asia-Pacific markets, Pati et al. (2018) for India, Australia and Hong Kong, and Duan et al. (2021) for Taiwan. However, the stock markets of emerging countries such as Indonesia, Thailand, Malaysia, and the Philippines, which together constitute 66% of the market capitalization of the ASEAN economies as of 2016 (Ganbold 2021), tend to be ignored in volatility exercises. In addition, volatility dynamics in the emerging stock markets of Asia are expected to influence the global stock markets through the "leverage effect" and idiosyncratic risk factors (Atanasov 2018; Bouri et al. 2020), further indicating the importance of generating more accurate and comprehensive forecasts for this bloc. Therefore, this paper aims to extend the literature on volatility forecasting by selecting ten Asian markets with up-to-date data, covering periods of both financial crisis and recent developments.
It is broadly acknowledged in the financial literature that an increase in data frequency is accompanied by excess kurtosis, which challenges the capabilities of forecasting models due to the fat-tailed distribution of return series (Mandelbrot 1963). Under the assumption of normally distributed errors, the results of the models would be biased. Therefore, the present paper adopts the Student's t-distribution in all selected time frames to capture anomalies in the return series. Furthermore, it aims to contribute to the ongoing debate on the best model between linear (symmetric) and non-linear (asymmetric) GARCH family models for producing the most accurate volatility forecasts.
This research adds to the current academic literature in three ways. First, it finds that GARCH-type models can appropriately adapt to the volatility behaviour of Asian stock indices and provide a satisfactory degree of forecast accuracy in all selected time frames. The superiority of asymmetric models is more evident for higher time frames, while symmetric models tend to outperform asymmetric ones in lower time frames. Second, given the level of risk associated with investment in stock markets, day traders, investors, financial analysts, and empirical finance professionals should consider alternative error distributions when specifying a predictive volatility model, since a poorly fitting error distribution implies incorrect specification, which could lead to a loss of efficiency in the model. Investors should also not ignore the impact of news when forming expectations about investments. Finally, the obtained results show that the frequency of the data and the choice of forecast method have a strong effect on the performance of the models; therefore, depending on the investment horizon and risk sensitivity, the correct method and time frame should be applied.
The remainder of the paper is organized as follows. Section 2 provides a literature review of volatility forecasting applications on various markets with the emphasis on Asian markets. Section 3 reports the methodology used and Sect. 4 provides the data. Section 5 presents the empirical analysis and results. Finally, Sect. 6 discusses the study findings and concludes.

Literature review
Numerous studies in the existing literature have applied various approaches to the question of a superior forecasting model, yet a consensus has still not been reached. Since the stock market incidents of the early 1990s, triggered by the Japanese asset price bubble and Hong Kong's stock market collapse in 1992, a significant amount of research has been undertaken to examine the uncertainty of stock markets in Asia. As Franses and McAleer (2002) state, researchers are committed to modelling stock market volatility better, in order to forecast stock market movements more accurately and possibly foresee such shocks. In light of the prominent studies by Engle (1982), French et al. (1987), and Bollerslev (1987), the accumulated financial econometrics literature indicates that, in addition to the set of economic variables suggested by Chen et al. (1986), stock market volatility has mainly been examined and estimated with time series volatility models. Mandelbrot (1963) and Fama (1965) revealed that stock market volatility shows the volatility clustering property, a phenomenon which has been modeled by Engle's (1982) ARCH model and its extension, Bollerslev's (1986) GARCH model. For example, Bera and Higgins (1993) highlighted that the main contribution of the ARCH family models is capturing how the variance of financial time series changes over time. On the other hand, Engle and Patton (2001) argued that "despite the success of GARCH models in capturing the salient features of conditional volatility, they have some undesirable characteristics" (p. 244). The drawbacks of these models triggered the development of alternative specifications. As a result, options that consider asymmetric effects, such as EGARCH (Nelson 1991), PGARCH (Ding et al. 1993), and TGARCH (Zakoian 1994), have been introduced by researchers over the years. Furthermore, models that consider the long memory phenomenon have also been developed, such as FIGARCH (Baillie et al. 
1996), FIEGARCH, CGARCH (Engle and Lee 1999), and HYGARCH (Davidson 2004). Although the success of the above models changes depending on the selected markets and time frames, it can be concluded that GARCH family models are powerful in estimating stock market volatility, confirming the studies of Chiang et al. (2000), Hung (2009), and Ahmed and Suliman (2011). Some Asian markets have been studied in depth over the years using various models. Among these markets, Japan and China took the lead due to their rapid economic progress and explosive investment. Lux and Kaizoji (2007) studied the NIKKEI 225 Index from 1975 to 2001, and the findings showed that GARCH family models present good forecast performance compared to naïve sample variance models, leading the authors to conclude that time series models are well suited for predicting large realizations of volatility. Ishida and Watanabe (2009) extended the research into the Japanese stock market by focusing on minute-to-minute data with a sample period spanning 1996 to 2007. They combined the GARCH model with ARFIMA and successfully predicted realized variance. On the other hand, Gu and Cen (2011) expanded the models for China's stock market, and the results revealed that the GARCH and CGARCH models are preferred for more accurate prediction of volatility, while TGARCH and EGARCH are better at capturing the asymmetric effects of volatility behavior in China's stock markets. They also suggested that GARCH-type models are more accurate and provide better forecasting compared to SV models for China's capital markets. Meanwhile, Lin (2018) compared the adaptability of the GARCH models on the SSE Index and SX Index using daily returns from 2013 to 2017. Through empirical analysis and forecast evaluation, he discovered that the EGARCH model outperforms the ARCH, TARCH, GARCH and ARIMA models and is more capable of predicting volatility behavior in the selected indices.
For further research, see Guidi (2010), Chen and Wu (2011), Wei et al. (2018), Chaudhary et al. (2020), and Bhowmik and Wang (2020).
The ongoing argument over the performance of forecasting models has also reached the emerging economies of Asia. The early findings about volatility behavior in the ASEAN nations are fairly mixed. Wong and Kok (2005) compared the forecasting capabilities of six different models using daily returns from the ASEAN-5 equity markets (Indonesia, Malaysia, Singapore, Thailand and the Philippines), covering data from 2 January 1992 to 12 August 2002. They separated the results into pre-crisis, crisis and post-crisis periods. The findings suggest that the forecast results are most reliable for the pre-crisis and post-crisis periods and least reliable for the crisis period. Furthermore, the TARCH and ARCH-M models were found superior for the pre-crisis period, the ARCH-M and Random Walk models outperformed for the crisis period, while the TARCH and EGARCH models were best for the post-crisis period in the selected ASEAN countries.
Likewise, Evans and McMillan (2007) examined volatility forecasts of equity returns with a focus on asymmetric and long memory dynamics in more than 30 economies, including the ASEAN-5 countries. The daily data for this study covered 11 years, from 1994 to 2005. Comparing five GARCH family models and four simple pre-ARCH models, they found that the HYGARCH model performs best for Singapore, the CGARCH model for Thailand, and the EGARCH model for Indonesia, based on the RMSE error statistic. On the other hand, the moving average method provides the best forecast results for Malaysia, and the exponential smoothing method is the best model for predicting the volatility of the Philippine stock market. Guidi and Gupta (2012) studied the same ASEAN-5 stock markets over the period from 2 January 2002 to 30 January 2012. They deployed the APARCH model under two different distributions to predict the volatility of the returns, and the empirical results revealed that APARCH with the t-distribution is a good prediction model for the selected indices. They concluded that the Indonesian stock market has the largest response to volatility shocks among the ASEAN countries.
More recently, Anggita et al. (2020) investigated the stock market of Indonesia using ARCH/GARCH models for the period 2011-2017. The study concluded that the EGARCH model is superior to linear GARCH models in modelling and forecasting volatility in emerging markets. In a different study, Sharma et al. (2021) analyzed the top five emerging countries among the E7, including China and Indonesia, using linear and non-linear GARCH models over the period 2000 to 2019. Their results revealed that the GARCH model beat the non-linear GARCH models in all selected window periods, which supports the earlier findings of Srinivasan and Ibrahim (2010) but contradicts Anggita et al. (2020). On the other hand, Lin (2018) showed the suitability of non-linear models for China's stock market due to the significant clustering and asymmetric events in the SSE Composite Index.
Although the reviewed literature has considerably enhanced our understanding of the forecasting performance of a variety of models and of volatility behavior in emerging and developed markets, the findings of previous studies remain inconclusive, as they are highly dependent on the selection of countries and the range of the data period. Thus, the current paper is expected to be one of the first empirical works on forecast comparison in ten Asian markets using three different time frames with 24 years of data, covering two major crises that hit the selected economies with different magnitudes. Moreover, this research addresses the true nature of financial market volatility in countries that tend to be ignored, such as the Philippines, Thailand, and Taiwan. In addition, accommodating excess kurtosis by using the Student's t-distribution, and applying both recursive and rolling window methods to the selected GARCH models, is expected to fill a methodological gap in the field of stock market volatility of Asian countries.

Empirical models
There are more than 300 GARCH-type models in the existing literature (Hansen and Lunde 2005). Therefore, for brevity, the current paper is confined to the employed models only. In all selected models, the distributional assumption is the Student's t-distribution. The rationale behind this choice is that asset returns are likely to follow a fat-tailed (Lévy-type) distribution, and the Student's t-distribution is more capable of accommodating fat tails than the normal distribution, which reduces potentially considerable biases in the forecasting results (Andersen and Bollerslev 1998).

GARCH model
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model was developed and proposed by Tim Bollerslev in 1986. ARCH family models are a milestone in regression analysis in terms of estimating variance with a nonlinear model. The GARCH model is based on a weighted average of past squared residuals, with a few improvements compared to ARCH. First, GARCH assigns decaying weights to past squared residuals, and these weights never fall all the way to zero.
Second, it puts greater weight on more recent events. Third, it is superior for handling different sets of data in different frequencies. With these combined benefits, GARCH is an avant-garde model with a wide selection of extensions in predicting conditional volatility.
This model can be expressed with a mean specification and a variance specification. The GARCH(1,1) model can be represented as follows:

Mean specification: $r_t = \mu + \varepsilon_t$

Variance specification: $h_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta h_{t-1}^2$

where $\alpha_0 > 0$, $\alpha_1 \geq 0$ and $\beta \geq 0$, and $r_t$ is the asset return, $\mu$ the average return, and $\varepsilon_t$ the residual return.

The residual return can also be expressed as $\varepsilon_t = z_t h_t$, where $z_t$ is an i.i.d. random variable with zero mean and unit variance, and $h_t$ is the time-dependent standard deviation. For the GARCH(1,1) model, the two conditions $\alpha_1 \geq 0$ and $\beta \geq 0$ ensure that the conditional variance $h_t^2$ is non-negative. Covariance stationarity additionally requires $\alpha_1 + \beta < 1$. The mean specification is simply the sum of the average return and the error term. The process generates a one-period-ahead estimate of the conditional variance $h_t^2$, which is a function of:

• the hypothetical long-run average variance $\alpha_0$ (known as the constant term);
• $\varepsilon_{t-1}^2$, which reflects "news" about previous-period volatility (known as the ARCH term);
• $h_{t-1}^2$, the forecast variance from the previous period (known as the GARCH term).
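As an illustration, the GARCH(1,1) variance recursion above can be sketched in a few lines of NumPy. This is a minimal sketch with made-up parameter values; in practice the parameters are estimated by maximum likelihood, as in this paper, and the function name is purely illustrative.

```python
import numpy as np

def garch11_variance(returns, alpha0, alpha1, beta, mu=0.0):
    """One-period-ahead GARCH(1,1) conditional variances h_t^2.

    Sketch only: alpha0, alpha1 and beta are assumed to be given
    (in practice they are estimated by maximum likelihood).
    """
    eps = np.asarray(returns, dtype=float) - mu   # residual returns
    h2 = np.empty_like(eps)
    h2[0] = eps.var()                             # initialise at the sample variance
    for t in range(1, len(eps)):
        # constant + ARCH term ("news") + GARCH term (previous forecast)
        h2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta * h2[t - 1]
    return h2

# toy series; alpha1 + beta < 1 keeps the model covariance stationary
h2 = garch11_variance([1.0, -2.0, 1.0], alpha0=0.1, alpha1=0.1, beta=0.8)
```

The loop makes the weighting scheme explicit: each new variance is a constant plus a decaying-weight combination of the latest squared residual and the previous variance forecast.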

GARCH-M model
Most models used in finance suppose that investors should be rewarded for taking additional risk by obtaining a higher return (Brooks 2008). Engle et al. (1987) proposed a new model to fit this theory, called GARCH-in-Mean (GARCH-M). This model is another variant of the GARCH class, with an extension that makes the conditional mean a function of the conditional variance. The GARCH-M(1,1) model can be expressed by the two specifications:

Mean specification: $r_t = \mu + \lambda h_t^2 + \varepsilon_t$

Variance specification: $h_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta h_{t-1}^2$

The parameter $\lambda$ in the mean specification is the risk-premium coefficient. A positive $\lambda$ indicates that the conditional variance is positively correlated with the return, and vice versa.
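The risk-premium channel can be sketched by adding the term λ·h_t² to the conditional mean. This is a simplified, hypothetical illustration (the risk-premium term is ignored when forming the residuals, and the parameter values are made up):

```python
import numpy as np

def garchm11(returns, alpha0, alpha1, beta, mu, lam):
    """GARCH-M(1,1) sketch: the conditional variance h_t^2 enters the
    conditional mean through the risk-premium coefficient lam.
    Simplified: residuals are formed from mu only."""
    eps = np.asarray(returns, dtype=float) - mu
    h2 = np.empty_like(eps)
    h2[0] = eps.var()
    for t in range(1, len(eps)):
        h2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta * h2[t - 1]
    cond_mean = mu + lam * h2    # higher conditional risk, higher expected return
    return cond_mean, h2

mean, h2 = garchm11([1.0, -2.0, 1.0], alpha0=0.1, alpha1=0.1, beta=0.8,
                    mu=0.05, lam=0.2)
```

With a positive λ, the expected return rises one-for-one with λ times the conditional variance, which is exactly the positive risk-return trade-off the model is designed to capture.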

EGARCH model
The Exponential GARCH model was proposed by Nelson (1991) and is based on the logarithm of the conditional variance. The benefit of the EGARCH model is that it places no sign restrictions on the parameters, allowing negative coefficients in the model: even if negative parameters appear in the equation, the conditional variance remains positive. The EGARCH(1,1) equation is applied as follows:

$\ln h_t^2 = \alpha_0 + \alpha_1 \left| \dfrac{\varepsilon_{t-1}}{h_{t-1}} \right| + \gamma \dfrac{\varepsilon_{t-1}}{h_{t-1}} + \beta \ln h_{t-1}^2$

where the parameter $\gamma$ indicates the leverage effect, capturing the impact of asymmetric news. A significantly negative $\gamma$ demonstrates that bad news (negative shocks) increases future volatility more than good news (positive shocks) of the same magnitude, which is the leverage effect, while the term $\alpha_1$ captures the volatility clustering effect.
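A sketch of the log-variance recursion, using the standardised-shock parameterisation shown above (function name and parameter values are hypothetical). With γ < 0, a negative shock produces a higher next-period variance than a positive shock of the same size:

```python
import numpy as np

def egarch11_variance(returns, alpha0, alpha1, gamma, beta, mu=0.0):
    """EGARCH(1,1) sketch: the recursion runs on ln(h_t^2), so no sign
    restrictions on the parameters are needed for a positive variance."""
    eps = np.asarray(returns, dtype=float) - mu
    logh2 = np.empty_like(eps)
    logh2[0] = np.log(eps.var())
    for t in range(1, len(eps)):
        z = eps[t - 1] / np.exp(0.5 * logh2[t - 1])   # standardised shock
        # gamma < 0: bad news (z < 0) raises next-period log-variance more
        logh2[t] = alpha0 + alpha1 * abs(z) + gamma * z + beta * logh2[t - 1]
    return np.exp(logh2)                              # back to the variance scale

h2_bad  = egarch11_variance([0.0, -1.0, 0.0], 0.0, 0.1, -0.1, 0.9)
h2_good = egarch11_variance([0.0,  1.0, 0.0], 0.0, 0.1, -0.1, 0.9)
```

Because the recursion is in logs, exponentiating guarantees positivity even when individual coefficients are negative, which is the central design choice of the model.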

TGARCH model
The Threshold GARCH model (also called the GJR model) is one of the best-known and most commonly used asymmetric models for handling possible asymmetries such as leverage effects. This model was developed by Zakoian (1994), and was also studied by Glosten et al. (1993) as the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH). In the TGARCH(1,1) model, the variance equation is defined as follows:

$h_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \gamma \varepsilon_{t-1}^2 D_{t-1} + \beta h_{t-1}^2$

where $D_{t-1}$ is a dummy variable equal to 1 when $\varepsilon_{t-1} < 0$ (bad news) and 0 otherwise, and $\gamma$ is the leverage effect parameter. If $\gamma = 0$, the specification above reduces to the general GARCH(p, q) form. The impact of good news on volatility is $\alpha_1$, while the impact of bad news is $\alpha_1 + \gamma$. Thus, with a positive and significant leverage parameter $\gamma$, bad news has a greater effect than good news on the conditional variance $h_t^2$.
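The dummy-variable mechanics can be sketched as follows (hypothetical parameter values). With γ > 0, a negative shock raises the next-period variance by exactly γ·ε² more than a positive shock of the same size:

```python
import numpy as np

def tgarch11_variance(returns, alpha0, alpha1, gamma, beta, mu=0.0):
    """TGARCH/GJR(1,1) sketch: bad news (eps < 0) switches on the dummy,
    so its impact on variance is alpha1 + gamma instead of alpha1."""
    eps = np.asarray(returns, dtype=float) - mu
    h2 = np.empty_like(eps)
    h2[0] = eps.var()
    for t in range(1, len(eps)):
        D = 1.0 if eps[t - 1] < 0 else 0.0            # leverage dummy
        h2[t] = (alpha0 + (alpha1 + gamma * D) * eps[t - 1] ** 2
                 + beta * h2[t - 1])
    return h2

h2_bad  = tgarch11_variance([0.0, -1.0, 0.0], 0.05, 0.05, 0.10, 0.85)
h2_good = tgarch11_variance([0.0,  1.0, 0.0], 0.05, 0.05, 0.10, 0.85)
```

Setting γ = 0 in the call above would make the two series identical, recovering the symmetric GARCH response described in the text.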

PGARCH model
The Power GARCH (PGARCH) model was developed by Ding, Granger, and Engle in 1993. The PGARCH model differs from the other asymmetric models by modelling a power transformation of the conditional standard deviation instead of the conditional variance: a power parameter $\delta$ is introduced, and $h_t^\delta$ is used instead of $h_t^2$. The PGARCH(1,1) model is defined as follows:

$h_t^\delta = \alpha_0 + \alpha_1 \left( |\varepsilon_{t-1}| - \lambda \varepsilon_{t-1} \right)^\delta + \beta h_{t-1}^\delta$

where $\alpha_1$ is the standard ARCH parameter, $\beta$ the standard GARCH parameter, and $\lambda$ the leverage parameter.
The leverage parameter $\lambda$ captures the asymmetric effects of previous shocks. When the power parameter $\delta = 2$ (with $\lambda = 0$), the equation reduces to a classic GARCH model, and when $\delta = 1$, the model estimates the conditional standard deviation instead of the conditional variance.
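A sketch of the (1,1) power recursion (hypothetical parameter values). The asymmetry enters through the term (|ε| − λε)^δ, which is larger for negative shocks whenever λ > 0:

```python
import numpy as np

def pgarch11(returns, alpha0, alpha1, lam, beta, delta, mu=0.0):
    """PGARCH(1,1) sketch: models h_t^delta. With delta = 2 and lam = 0
    the recursion recovers the classic GARCH variance; delta = 1 models
    the conditional standard deviation directly."""
    eps = np.asarray(returns, dtype=float) - mu
    hd = np.empty_like(eps)                   # stores h_t^delta
    hd[0] = eps.std() ** delta
    for t in range(1, len(eps)):
        news = (abs(eps[t - 1]) - lam * eps[t - 1]) ** delta
        hd[t] = alpha0 + alpha1 * news + beta * hd[t - 1]
    return hd ** (1.0 / delta)                # conditional standard deviation h_t

h_bad  = pgarch11([0.0, -1.0, 0.0], 0.05, 0.05, 0.2, 0.85, delta=1.5)
h_good = pgarch11([0.0,  1.0, 0.0], 0.05, 0.05, 0.2, 0.85, delta=1.5)
```

The power parameter thus nests both the variance-based and standard-deviation-based models in a single specification.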

Forecasting method
Out-of-sample tests are widely considered the "gold standard" of forecast evaluation, and according to the "conventional wisdom", the forecasts of the estimated models should be evaluated on data not used to estimate the model's parameters, rather than on the same set of data, which would constitute an "in-sample" forecast. Bartolomei and Sweet (1989) and Pant and Starbuck (1990) show that even the best in-sample forecasts may not succeed in forecasting post-sample data. Furthermore, throughout the empirical studies, in-sample forecasting performance is found to be less reliable compared to out-of-sample tests, which may be due to vulnerability to outliers and data mining (White, 2000). Therefore, the out-of-sample forecast is seen as the "ultimate test of a forecasting model" by econometricians and forecasters (Stock and Watson 2015, p. 571).
Out-of-sample forecasts can be estimated using two different methods, known as the recursive forecast and the rolling window forecast. The recursive forecast fits the models on an initial sample $t = 1, \ldots, T$ and computes the L-step-ahead forecast from time $T$ onward, with the estimation sample expanding by one observation at each forecast origin, until no more L-step-ahead forecasts can be computed. The rolling window forecast likewise starts from an initial sample $t = 1, \ldots, T$, which also fixes the window length; both the start and the end of the estimation sample then advance by one observation at a time, so the model is re-estimated on $t = 2, \ldots, T+1$, and so on, with the L-step-ahead out-of-sample forecast computed at each origin until no more L-step-ahead forecasts can be computed.
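The difference between the two schemes lies only in how the estimation-sample indices move with each forecast origin, which can be sketched with an illustrative helper (not from the paper; the one-step-ahead case):

```python
def estimation_windows(n_obs, initial, scheme):
    """Return (start, end) index pairs of the estimation sample for each
    one-step-ahead forecast origin. 'recursive' expands the sample;
    'rolling' keeps a fixed window length and drops the oldest point."""
    windows = []
    for origin in range(initial, n_obs):
        if scheme == "recursive":
            windows.append((0, origin))                  # expanding sample
        elif scheme == "rolling":
            windows.append((origin - initial, origin))   # fixed-length window
    return windows

rec = estimation_windows(10, initial=6, scheme="recursive")
rol = estimation_windows(10, initial=6, scheme="rolling")
# rec -> [(0, 6), (0, 7), (0, 8), (0, 9)]
# rol -> [(0, 6), (1, 7), (2, 8), (3, 9)]
```

Both schemes start from the same initial sample; the recursive scheme never discards early observations, while the rolling scheme keeps the sample length constant, which is why the two can penalize the same model differently.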
For each index, forecasting models are estimated using the recursive and rolling window methods and assessed by out-of-sample performance. The maximum likelihood method has been used to estimate the parameters. The choice of window size for out-of-sample forecasting is controversial, since there is no satisfactory solution for the optimal length. However, to keep the estimated parameters robust and avoid non-convergence problems, an adequately large estimation sample is recommended, especially in applications of richly parameterized GARCH family models (Pesaran and Timmermann 2007; Inoue et al. 2014). Therefore, the whole sample period is divided into two samples in each frequency, and the hold-out sample for the out-of-sample forecast is chosen as the second half, with parameters estimated on the first half. In this context, a procedure similar to earlier works has been followed, such as those of Akgiray (1989), Pagan and Schwert (1990), Brailsford and Faff (1996), and Brooks (1998). Sample periods and sample sizes can be seen in Table 1.

Forecast performance evaluation
Great decisions are based on great forecasts. There is a wide selection of procedures available in the literature for identifying the most accurate forecasts. In this study, the most common and important error measures are chosen to evaluate the predictive accuracy of the selected volatility models. Nevertheless, there is no consensus about which error function is most suitable for assessing the models. Therefore, instead of focusing on a single criterion, five different loss functions are used to evaluate the forecasts: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Quasi-Likelihood (QLIKE) and Mean Squared Error (MSE).

Mean absolute error (MAE)
MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight. The mean absolute error is given by:

$\mathrm{MAE} = \dfrac{1}{n} \sum_{t=1}^{n} \left| \sigma_t^2 - \hat{\sigma}_t^2 \right|$

where $n$ denotes the number of forecasts, $\sigma_t^2$ is the true volatility series, obtained as the squared return series, and $\hat{\sigma}_t^2$ is the forecast conditional variance at time $t$ acquired using the GARCH family models.

Mean absolute percentage error (MAPE)
MAPE expresses each absolute error as a percentage of the corresponding actual value and averages these percentages over the forecast sample. The advantage of the MAPE is that it is easy to interpret and helpful for comparing the performance of the estimated volatility models. The mean absolute percentage error is defined as follows:

$\mathrm{MAPE} = \dfrac{100}{n} \sum_{t=1}^{n} \left| \dfrac{\sigma_t^2 - \hat{\sigma}_t^2}{\sigma_t^2} \right|$

Root mean square error (RMSE)
RMSE is the square root of the average of the squared differences between prediction and actual observation. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is most useful when large errors are particularly undesirable. Its value can only be positive, and a value of zero (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSE is better than a higher one. However, comparisons across different types of data would be invalid because the measure depends on the scale of the numbers used. The root mean square error is given by:

$\mathrm{RMSE} = \sqrt{\dfrac{1}{n} \sum_{t=1}^{n} \left( \sigma_t^2 - \hat{\sigma}_t^2 \right)^2}$

Quasi-likelihood loss function (QLIKE)
The term quasi-likelihood function was introduced by Wedderburn (1974) to describe a function that has properties similar to the log-likelihood function. In the QLIKE loss function, the mean and the variance are specified through a variance function that gives the variance as a function of the mean. In the volatility-forecasting context, the QLIKE loss takes the form:

$\mathrm{QLIKE} = \dfrac{1}{n} \sum_{t=1}^{n} \left( \ln \hat{\sigma}_t^2 + \dfrac{\sigma_t^2}{\hat{\sigma}_t^2} \right)$

Patton and Sheppard (2009), Patton (2011), and Conrad and Kleen (2018) revealed that the squared-error loss tends to be more sensitive to extreme observations than QLIKE, which provides further motivation for using QLIKE in volatility forecasting applications.

Mean squared error (MSE)
MSE is another popular accuracy measure in the empirical financial literature, used by Bollerslev et al. (1994) to gauge the forecasting performance of volatility models. As a distinctive feature, it penalizes large forecast errors more heavily than most other loss functions, and it is recognized as one of the most appropriate measures for dealing with an imperfect volatility proxy (Patton 2011). The mean squared error is given as follows:

$\mathrm{MSE} = \dfrac{1}{n} \sum_{t=1}^{n} \left( \sigma_t^2 - \hat{\sigma}_t^2 \right)^2$
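The five loss functions can be computed jointly as below. This is a sketch: σ_t² is proxied by squared returns as in the paper, the QLIKE form follows Patton (2011), and the function name is illustrative.

```python
import numpy as np

def loss_functions(true_var, fcst_var):
    """MAE, MAPE, RMSE, QLIKE and MSE between the volatility proxy
    (squared returns) and the forecast conditional variances."""
    s2 = np.asarray(true_var, dtype=float)    # proxy for sigma_t^2
    f2 = np.asarray(fcst_var, dtype=float)    # forecast sigma_hat_t^2
    e = s2 - f2
    return {
        "MAE":   np.mean(np.abs(e)),
        "MAPE":  np.mean(np.abs(e / s2)) * 100,
        "RMSE":  np.sqrt(np.mean(e ** 2)),
        "QLIKE": np.mean(np.log(f2) + s2 / f2),   # Patton (2011) form
        "MSE":   np.mean(e ** 2),
    }

losses = loss_functions([1.0, 4.0], [2.0, 2.0])
```

Computing all five on the same pair of series makes the controversy noted in the abstract concrete: a model can rank well under MAE yet poorly under QLIKE, because the measures weight large and relative errors differently.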

Forecast comparison test (DM-test)
In order to evaluate the relative predictive accuracy of two competing models, the Diebold-Mariano test (hereafter, the DM test) is employed. Diebold and Mariano (2002) introduced an approach for testing the null hypothesis of equal forecast accuracy between two competing sets of forecasts. The test can be applied with any error criterion, such as straight differences, absolute differences or squared differences, and it is able to accommodate autocorrelation in the loss-differential series. The DM test is widely employed in the empirical finance literature with various adaptations: see Xekalaki and Stavros (2010), Curto and Pinto (2012), Gilleland and Roux (2015), and Coroneo and Iacone (2018). Consider two competing forecast sequences, defined as:

$\{f_{1t}\}_{t=1}^{T}$ and $\{f_{2t}\}_{t=1}^{T}$

and define the forecast error as the difference between the actual value $y_t$, $t = 1, 2, \ldots, T$, and the predicted value $f_{it}$:

$e_{it} = y_t - f_{it}, \quad i = 1, 2$

The accuracy of each forecast is gauged by a loss function $g(e_{it})$. The loss functions adopted for this study are the absolute-error loss function:

$g(e_{it}) = |e_{it}|$

and the squared-error loss function:

$g(e_{it}) = e_{it}^2$

The loss differential between the two forecasts is defined by:

$d_t = g(e_{1t}) - g(e_{2t})$

To assess whether the two competing forecasts have the same predictive ability, the equal-accuracy hypothesis is considered. The null hypothesis of the DM test is:

$H_0: \mathrm{E}[d_t] = 0$

versus the two-sided alternative that one of the two forecasts is more accurate:

$H_1: \mathrm{E}[d_t] \neq 0$

The DM test statistic can then be expressed as:

$\mathrm{DM} = \dfrac{\bar{d}}{\sqrt{\widehat{V}(\bar{d})}}$

where $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t$ is the sample mean loss differential and $\widehat{V}(\bar{d})$ is a consistent estimate of its variance, based on the long-run variance of $d_t$ to account for autocorrelation. Under the null hypothesis, the DM statistic is asymptotically standard normal.
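For one-step-ahead forecasts, the statistic reduces to a t-ratio on the mean loss differential. The following is a minimal sketch of that h = 1 case with no autocorrelation correction (the full test replaces the plain sample variance with a long-run variance estimate; the function name is illustrative):

```python
import numpy as np

def dm_statistic(actual, f1, f2, loss="squared"):
    """Diebold-Mariano statistic, h = 1 sketch: negative values favour
    forecast f1, positive values favour f2; ~N(0,1) under H0."""
    y = np.asarray(actual, dtype=float)
    e1 = y - np.asarray(f1, dtype=float)
    e2 = y - np.asarray(f2, dtype=float)
    if loss == "squared":
        d = e1 ** 2 - e2 ** 2         # squared-error loss differential
    else:
        d = np.abs(e1) - np.abs(e2)   # absolute-error loss differential
    T = len(d)
    var_dbar = d.var(ddof=1) / T      # variance of the mean differential
    return d.mean() / np.sqrt(var_dbar)

dm = dm_statistic([1.0, 2.0, 3.0, 4.0],
                  [1.1, 1.9, 3.1, 3.9],    # uniformly small errors
                  [2.0, 1.0, 5.0, 2.0])    # uniformly large errors
# dm is negative here, favouring the first forecast sequence
```

Swapping the two forecast sequences flips the sign of the statistic, which is why the test is naturally two-sided.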

Data
Asia is divided into two regions: developed and emerging economies. The highly developed countries include Japan and the four Asian Tigers: Hong Kong, South Korea, Taiwan, and Singapore. China and Malaysia are other major economic forces and are considered important powerhouses in the region; however, academics often classify these countries as "developing"; see Johansson and Ljungwall (2009), Luo et al. (2010), Jayasuriya (2011), Zhang et al. (2013), and Li and Giles (2015). Besides this, the Shanghai Stock Exchange was founded in 1990, 99 years after the Hong Kong Stock Exchange, which was founded in 1891. Even today, most mainland Chinese companies are listed in Hong Kong. Therefore, the Chinese stock market is evaluated in the emerging markets category. In this study, ten Asian countries have been selected for investigation and their widely accepted indices have been chosen. The five developed market indices that have been added are as follows: the Nikkei 225 Index (

Daily, weekly and monthly time series data are obtained from the Bloomberg database to ensure the reliability and accuracy of older data. The overall sample period covers 25 years in total, from November 1993 to May 2018. However, one problem was the limitation on accessing older data in higher time frames, and thus the daily and weekly data start from 1994 instead of 1993. Another challenge was non-synchronous holidays in the different markets, which may cause computational difficulties and negatively affect the output of the models. Therefore, the data range has been chosen separately for each market to avoid data loss. The statistical software EViews 10 was used for the quantitative analysis.
The main advantage of daily data is that it provides more information for estimating volatility, since the applied econometric models are more data-intensive than simple regression models. Weekly and monthly frequencies are also estimated, since they provide a broader picture of volatility, and the comparison between different frequencies is crucial to understand. In order to satisfy stationarity, the closing price series are transformed into return series at the daily, weekly and monthly frequencies for each index.
Return series are obtained as shown in the following formula:
$$R_t = \log(P_t / P_{t-1}) \times 100, \tag{24}$$
where $R_t$ denotes the logarithmic return at time $t$, and $P_t$ and $P_{t-1}$ are the closing prices of the index at times $t$ and $t-1$, respectively. Figures 1, 2 and 3 show that the return series fluctuate around zero, which is evidence of the volatility clustering phenomenon. Table 2 reports the descriptive statistics of the in-sample period for each frequency. According to the table, the mean and median are centered around zero in the daily return series, while the tendency to deviate increases as the frequency falls, which is expected. Looking at the skewness of the series, the NIKKEI and STI indices have negative values for all selected time frames, implying asymmetric distributions skewed to the left, while the KLCI, SET and SSE indices report positive skewness for each frequency, suggesting asymmetric distributions skewed to the right. For the remaining five indices, the direction of skewness changes depending on the selected frequency. Where kurtosis is concerned, the values indicate a leptokurtic characteristic, signifying the existence of fatter tails. Lastly, the Jarque-Bera test statistic for normality rejects the null hypothesis that returns follow a normal distribution. Table 3 demonstrates the forecasting performance for the daily return series based on the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), Quasi-Likelihood (QLIKE) and Mean Squared Error (MSE), using the recursive approach. The overall results show that the EGARCH and TGARCH models perform better than the rest of the models in the HANG SENG, STI, SET, JCI, TAIEX, KOSPI and PSE indices. These findings are also in line with the studies of Liu and Morley (2009) and Wei-Chong et al.
(2011), who find that asymmetric models outperform in the stock markets of Hong Kong and Japan, respectively. The results also indicate that the GARCH-M model outperforms in the KLCI index based on the MAE and QLIKE statistics, while the GARCH and GARCH-M models perform equally well in the SSE index, which is consistent with the findings of . The KLCI index is the only index that shows mixed results, since EGARCH has the minimum values for both the RMSE and MSE loss functions, while GARCH-M shows the smallest values under the MAE and QLIKE statistics. Lim and Sek (2013) obtained similar results for the Malaysian stock market, which suggests that the Malaysian market tends to produce more complicated results and requires more detailed examination.
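The return transformation in Eq. (24) and the five error statistics used throughout the comparisons can be sketched as follows. The helper names are illustrative, and the QLIKE expression shown is one common specification, mean(log h + sigma^2/h) for a forecast variance h and a realized-variance proxy sigma^2; the paper's exact definition may differ.

```python
import numpy as np

def log_returns(prices):
    # R_t = log(P_t / P_{t-1}) * 100, as in Eq. (24)
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1]) * 100.0

def loss_statistics(realized, forecast):
    """Compute MAE, MAPE, RMSE, MSE and a common QLIKE form for
    variance forecasts against a realized-variance proxy."""
    s2 = np.asarray(realized, dtype=float)   # realized variance proxy
    h = np.asarray(forecast, dtype=float)    # forecast variance
    e = s2 - h                               # forecast error
    return {
        "MAE":   np.mean(np.abs(e)),
        "MAPE":  np.mean(np.abs(e / s2)) * 100.0,
        "RMSE":  np.sqrt(np.mean(e**2)),
        "MSE":   np.mean(e**2),
        "QLIKE": np.mean(np.log(h) + s2 / h),
    }
```

Note that a perfect forecast drives MAE, MAPE, RMSE and MSE to zero, while QLIKE attains its minimum at a non-zero value, which is one reason the rankings produced by the different loss functions can disagree.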

Empirical results
On the other hand, based on Table 4, which reports the results for the rolling window method, no symmetric model performs better than the asymmetric models. Asymmetric models, led by the EGARCH model, dominate in all the selected markets except for the HANG SENG index, where the PGARCH model has clear superiority based on four out of five statistics. Two issues may explain this. First, by construction, symmetric GARCH models cannot capture the leverage effect of volatility, and Asian stock markets tend to exhibit the volatility asymmetry phenomenon. Second, unlike the recursive method, the rolling window method does not use all available data to generate forecasts, which may lead to estimation problems. However, as Table 3 shows, asymmetric models have superiority in most of the indices as well. The values from the recursive and rolling window methods are highly mixed; regardless of the models, the two methods cannot be compared directly on the error statistics, since each method provides results in its own terms. Therefore, the daily results do not suggest any significant superiority of one method over the other. A general conclusion for the daily forecasting results is that, in most circumstances, the asymmetric models provide smaller loss functions than the symmetric models. Based on the error measures, no specific model emerges as unconditionally best. Yet, led by EGARCH, asymmetric models seem to outperform, especially in the developed markets, which contradicts the findings of  to some extent. As asymmetric models also reduce the forecast errors in emerging markets, the findings are relatively consistent and conclusive that asymmetric models perform best compared to the symmetric models.
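The distinction between the two forecasting schemes can be sketched as follows. The sample variance of the estimation window stands in for a full GARCH refit at each step, so this illustrates only the windowing logic (expanding sample for the recursive method, fixed-width sample for the rolling method), not the models actually estimated in the paper; the function name is an assumption.

```python
import numpy as np

def one_step_variance_forecasts(returns, n_init, window="recursive"):
    """One-step-ahead out-of-sample variance forecasts.

    'recursive' re-estimates on an expanding sample of all data
    up to time t; 'rolling' uses only the most recent n_init
    observations, discarding the oldest point at each step.
    """
    r = np.asarray(returns, dtype=float)
    forecasts = []
    for t in range(n_init, r.size):
        start = 0 if window == "recursive" else t - n_init
        est_sample = r[start:t]               # data available at time t
        forecasts.append(np.var(est_sample))  # placeholder model refit
    return np.array(forecasts)
```

The first forecast is identical under both schemes, since both initially estimate on the same n_init observations; the paths diverge afterwards, as the recursive window grows while the rolling window stays fixed in size.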
The conclusion is that asymmetric models provide smaller loss functions than symmetric models, and symmetric models show no clear superiority for the daily return series in any of the ten Asian markets, except for the recursive GARCH and GARCH-M models in the SSE Index. Therefore, according to these results, asymmetric models should be the first choice for market participants regardless of their degree of risk preference. Table 5 presents the recursive forecasting results for the weekly return series; the loss function values are higher than for the daily forecasts, except for the MAPE, which is expected since it reports percentage errors. For the JCI index, the EGARCH model clearly outperforms the rest based on four out of five loss functions. For the NIKKEI, STI, SSE and PSE indices, the EGARCH model is still favorable, since it provides the smallest errors under the MAE, RMSE and MSE statistics, though not under the QLIKE in all four cases. On the other hand, the HANG SENG Index is dominated by the TGARCH model, which provides the lowest values in all error statistics, consistent with the study by Liu and Morley (2009). The remaining four indices are quite inconclusive, with no single volatility model preferred under all five error statistics. However, focusing on the KLCI index, GARCH-M outperforms the rest under the MAE, RMSE and MSE error functions, with the GARCH model being best under the remaining two: an outcome which contradicts the study by Wong and Kok (2005), yet supports the findings of Brailsford and Faff (1996). The best forecasting model for Thailand's SET index is PGARCH under the RMSE, QLIKE and MSE loss functions, EGARCH under the MAE, and TGARCH under the MAPE, which is in line with the findings of Wong and Kok (2005). The SSE and TAIEX indices are inconclusive, with both symmetric and asymmetric models showing superiority.
Table 6 shows the rolling window forecasts for the weekly series, which differ slightly from the recursive forecast results. Asymmetric models have clear superiority for the NIKKEI, HANG SENG, SET, JCI, TAIEX and PSE indices. These results are consistent with those of Awartani and Corradi (2005) and Evans and McMillan (2007), which provide supportive evidence that asymmetric GARCH models produce more accurate volatility forecasts. The results also indicate that the asymmetry effect should be considered by investors dealing with the Asian markets mentioned above. Furthermore, the STI, SSE and KOSPI indices present mixed results, where volatility can be predicted by employing either symmetric or asymmetric GARCH models. The KLCI index is dominated by the predictions of symmetric models, which does not support the findings of Balaban et al. (2006), who recommended asymmetric models for the Malaysian stock market. According to these results, the Malaysian stock market does not seem to follow an asymmetric volatility pattern, and investors can therefore rely on the predictions of symmetric GARCH models in the medium term.
The tables present results from estimating regressions of volatility for each model and market. Columns indicate the particular loss functions, while the rows show the corresponding volatility models under the selected markets. Numbers in bold demonstrate the minimum forecast error (preferred model).

Table 7 reports the monthly out-of-sample forecasting results based on the recursive method. The statistical values increase with the reducing frequency compared to the daily and weekly time frames, which is expected, except for the percentage-based MAPE. The results are very surprising compared to the daily and weekly outcomes. The only superiority for asymmetric models is reported for the STI index, where the PGARCH and EGARCH models are recommended based on the MAPE and the remaining loss functions, respectively. The NIKKEI, HANG SENG, SSE, KOSPI and PSE indices show mixed results and are fairly inconclusive in terms of the most preferred model; either symmetric or asymmetric models can be used for prediction, depending on the selected loss functions. It can thus be said, based on the estimated results, that these five markets are indecisive and that neither symmetric nor asymmetric models dominate, which supports the earlier work of Ng and McAleer (2004). On the other hand, symmetric models dominate in the SET and TAIEX indices except under the MAPE statistic, which suggests EGARCH superiority. The smallest error values are provided by the symmetric GARCH models under all statistics for the KLCI and JCI indices, which is in line with the findings of Minkah (2007) and Lee et al. (2017). Thus, the GARCH and GARCH-M models can be the best forecast models in these two markets for either econometricians or other market participants.
Based on the rolling window forecast results presented in Table 8, asymmetric models are clearly superior in the NIKKEI and SET indices, while EGARCH models are singly superior based on all statistics in the JCI index. This is very surprising, since the recursive method recommends symmetric GARCH models for the JCI index, whereas the rolling window method does not recommend them at all. In addition, the GARCH and GARCH-M models dominate in the HANG SENG index, which supports the findings of Gokcan (2000), yet contradicts the studies of Liu and Morley (2009) and Sabiruzzaman et al. (2010), which recommend the EGARCH and TGARCH models, respectively, for Hong Kong stock market returns. The remaining indices are indecisive and inconclusive in terms of the dominance of symmetric versus asymmetric GARCH models, yet they support the work of Etac and Ceballos (2018). Tables 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18 report pairwise Diebold-Mariano test results for a further evaluation of the performance of the selected forecasting models for each index. In the tables, DM(A) and DM(S) indicate DM test statistics based on the absolute-error loss and the squared-error loss, respectively. The corresponding p-values are also attached to each statistic to show the level of significance.
The DM test results are mostly in line with the forecasting results, as can be seen from the tables below. A considerable portion of the pairwise comparisons show that the forecasting accuracy of one of the selected models is better based on the value of the error loss. Specifically, the daily results provide more significant values based on the absolute-error loss criterion for both the recursive and rolling window methods. The weekly results are more indecisive than the daily DM test results. The DM statistics for the NIKKEI and HSI indices are less than 1.96, and therefore the null hypothesis cannot be rejected. Thus, the observed difference between the forecasting performances of the selected models is not significant and might be due to stochastic interference, which is in line with the findings of Burda and Bélisle (2019). The STI and KLCI indices also do not provide noteworthy test results based on the squared-error loss criterion. However, the remaining indices show results similar to the empirical forecasting results for both the recursive and rolling window methods.
Finally, the forecasting comparison for the monthly return series reports significant forecasting accuracy for the superior models, especially those based on the absolute-error loss. On the other hand, the DM statistics based on the squared-error criterion provide weaker results due to their smaller values for both the recursive and rolling window methods; that is to say, the null hypothesis cannot be rejected.
Summarizing the results listed in the following tables shows that the DM test results are highly consistent with the empirical volatility forecasts.

Summary and conclusion
The present paper examines the volatility forecasting ability of GARCH-type econometric models based on recursive and rolling window methods for ten Asian stock markets, motivated by the theoretical gap in model accuracy and the practical need for more comprehensive evidence for the selected markets and models. Five GARCH models are considered, namely the GARCH, GARCH-M, EGARCH, TGARCH and PGARCH models, where the first two represent symmetric and the remaining three asymmetric models. Daily, weekly and monthly return series have been used, and the forecasts are evaluated using five different error statistics. Based on the empirical analyses, GARCH-type models can appropriately adapt to the volatility behavior of Asian stock indices and provide a satisfactory degree of forecast accuracy in all selected time frames. The superiority of asymmetric models is more evident for higher time frames, while symmetric models tend to outperform in lower time frames. More precisely, the EGARCH model generates the most accurate volatility forecasts, closely followed by the TGARCH and PGARCH models for the daily and weekly frequencies, indicating that the asymmetric specification of volatility dynamics needs to be taken into account: a finding in line with the study of Anggita (2020). This outcome further implies that asymmetric models might be more appropriate than symmetric models when applying risk management strategies to Asian stock markets. This result contradicts Sharma et al. (2021), who argue that linear GARCH models are superior to non-linear ones.
One potential explanation is that the present paper uses Student's t-distribution to accommodate fat tails and excess kurtosis, which reduces the chance of bias and supports the capture of volatility asymmetries in the non-linear models. However, for the monthly return series, the GARCH-M model gains more attention and the superiority of non-linear models decreases compared to higher time frames. Moreover, the use of three different frequencies shows not only that the ranking differs when applying various error statistics, but also how significantly it can differ. There is an important controversy over the fact that one error statistic suggests that a particular model is the best, while another suggests that the same model is the worst. This highlights the importance of choosing a proper error statistic for the intended purpose of the forecast. For a better visualization of the performance of the employed models and the overall conclusion, Tables 19, 20 and 21 have been created. According to Table 19, the EGARCH model is clearly superior for both methods, followed by the TGARCH model.

Notes to Tables 9-18: 1. The columns labelled DM(A) and DM(S) contain t-statistics based on absolute and squared prediction errors, respectively. 2. The null hypothesis of the DM test is that of equal predictive ability of the two models; a significantly positive (negative) t-statistic indicates that the benchmark model is dominated by (dominates) the corresponding model.

Table 19
Summary of performance ranking of the models for daily return series

Table 20
Summary of performance ranking of the models for weekly return series

Table 21
Summary of performance ranking of the models for monthly return series

The "Best" and "Worst" columns in Tables 19, 20 and 21 indicate the number of times the selected model is ranked as the best or the worst based on the corresponding loss function, and the "TOTAL" column summarizes the total number of times a forecasting model is ranked as the best (or worst). The GARCH, PGARCH and GARCH-M models report double-digit counts in terms of worst overall performance, while the EGARCH model does not appear among the worst performers for either method. This makes the model a clear winner and highlights the asymmetric specification of volatility dynamics in daily return series. The rolling window GARCH and GARCH-M models do not provide accurate forecast values and are therefore the worst performers. These results are consistent with Awartani and Corradi (2005), Hansen and Lunde (2005), and Evans and McMillan (2007). Table 20 indicates that the EGARCH and TGARCH models provide the lowest error statistics in total, which makes them the best performers for weekly return series. Surprisingly, the PGARCH model becomes the worst forecasting model based on the reported values. The GARCH and GARCH-M models gain forecasting power compared to the daily results, which suggests that symmetric models should be considered for risk management purposes in the selected Asian markets for weekly return series. The results are partially in line with those of Ng and McAleer (2004), Liu et al. (2009), Mwita and Nassiuma (2015), and Sharma (2016).
Based on the values reported in Table 21, EGARCH still shows a strong forecast performance record compared to its asymmetric counterparts, while GARCH seems to be the best forecasting model for monthly return series. This may be due to weakening asymmetric volatility dynamics at lower frequencies. Furthermore, the GARCH-M model shows mixed results: it is penalized more heavily by the rolling window method, while the recursive method puts it among the best performers. The PGARCH model is the clear loser, followed by the TGARCH model. These findings are in line with Balaban (2004), but contradict Atoi (2014), which recommends the PGARCH model as the best performer.
From the analyses above, the following conclusions can be drawn.
• Symmetric and asymmetric GARCH models can be applied to Asian stock markets. Although these models were developed and widely used in research on Western financial markets, this does not preclude their use in emerging or developed Asian financial markets.
• From a time series perspective, the volatility behavior of Asian markets shows considerable clustering and time-varying dynamics. This is more evident during turbulent times, such as the 1997-1998 Asian crisis and the 2008 US subprime crisis, due to information shocks on the markets, reflecting the phenomenon whereby large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.
• Given the level of risk associated with investment in stock markets, day traders, investors, financial analysts and empirical finance professionals should consider alternative error distributions when specifying a predictive volatility model, since a poorly fitting error distribution implies incorrect specification, which could lead to a loss of efficiency in the model. Investors should also not ignore the impact of news when forming expectations about investments.
• The frequency of the data and the choice of forecast method have a strong effect on model performance; therefore, depending on the investment perspective and risk sensitivity, the appropriate method and time frame should be applied.
The out-of-sample performance of the compared volatility models, in terms of the different loss functions and across the three data sets, thus poses a challenge: it is far from evident which conditional volatility model outperforms the others. First, the ranking of models based on a specific loss function differs across the three data sets. Second, for the selected markets, the best and worst models depend heavily on which loss function is used. To determine which model has the best out-of-sample performance, one must first consider the specific data set used and then decide which loss function to use as the criterion.
The main limitation of this study is data availability, especially for higher-frequency data in the emerging countries of Asia. Further research could explore a wider sample of financial markets (Vietnam, India, Russia and other countries in Asia) with more up-to-date data, considering the recent COVID-19 crisis and the war between Russia and Ukraine. This would show how news information impacts volatility behavior across Asian stock markets. Another agenda for future research could include a wider set of GARCH family models to test and estimate forecasting accuracy in a wider sample.
Funding Not applicable.

Data availability
The data that support the findings of this study are available from Bloomberg database upon subscription.

Code availability
The codes that support the findings of this study are available from the author on request.

Conflict of interest
The author states that there is no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by the author.

Consent for publication
The author provides consent for publication if accepted.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.