Abstract
Given the unique institutional regulations in the Chinese commodity futures market as well as the characteristics of the data it generates, we utilize contracts with three months to delivery, the most liquid contract series, to systematically explore volatility forecasting for aluminum, copper, fuel oil, and sugar at the daily and three intraday sampling frequencies. We adopt popular volatility models in the literature and assess the forecasts obtained via these models against alternative proxies for the true volatility. Our results suggest that the long memory property is an essential feature in the commodity futures volatility dynamics and that the ARFIMA model consistently produces the best forecasts or forecasts not inferior to the best in statistical terms.
1 Introduction
In this paper, we are concerned with volatility forecasting in the Chinese commodity futures market. Volatility modeling and forecasting is a widely studied area of research, as volatility is considered the "barometer for the vulnerability of financial markets and the economy" (Poon and Granger 2003, p. 479) and is central to asset pricing, derivative valuation, portfolio allocation, and risk management. We are interested in this particular market in part because it has become an important part of the global futures markets with tremendous trading volume.Footnote 1 \(^{,}\) Footnote 2 More importantly, this market is regulated by two unique institutional rules that make it interesting to explore.
The first regulation is the time-dependent margin rate, whereby the margin as a fraction of the contract value increases as contracts move closer to delivery. Take sugar as an example. The margin rate for deposit two months prior to delivery is 6 % of the contract value for an investor. In the month before delivery, it increases to 8 % in the first 10 days, 15 % from the 11th to the 20th day of the month, and 25 % in the final 10 days of the month, culminating in 30 % in the delivery month.Footnote 3 The second regulation is that individual investors, although they represent 97 % of all investors in the futures markets, are not allowed to trade nearby contracts.Footnote 4 Both regulations effectively push market participation and trading volume to more distant contracts, with implications for market liquidity.
Our contribution to the literature is that we take into account unique institutional regulations of this market and design empirical volatility forecasting exercises that are appropriate for the characteristics of the market and the data it generates. Our data on aluminum, copper, and fuel oil consistently show that contracts with three months to delivery enjoy the best liquidity. We are not the first to note this pattern (see Liu et al. 2014; Peck 2008), but we are the first to offer solid and detailed evidence. Using 5-min returns data over long sample periods, we compute three popular liquidity measures that capture different aspects of liquidity, namely the effective spread of Roll (1984), the proportion of zero returns of Lesmond et al. (1999), and the Amihud (2002) illiquidity measure (Goyenko et al. 2009). Our results show that contracts with three months to delivery are the most liquid as they exhibit the lowest effective spread, the lowest percentage of zero returns, and the smallest value for the Amihud (2002) illiquidity measure. This is different from the majority of futures markets and contracts for which the nearby contracts are usually the most liquid (see Baillie et al. 2007; Lee 2009; and the references therein). Crucially, this liquidity pattern results from the unique institutional environment in which trading takes place.
On the other hand, being an emerging market, the Chinese commodity futures market exhibits a large proportion of zero returns (Bekaert et al. 2007) and this is particularly evident in our 5-min return series. Even for the most liquid 3-month to maturity contracts, the fraction of zero returns is as high as 36.27, 23.90, and 31.50 % on average, respectively, for aluminum, copper, and fuel oil. In the existing literature, intraday data are widely adopted for volatility modeling and forecasting as they are shown to contain more information and provide more accurate and efficient forecasts (see Fuertes et al. 2015; Hseu et al. 2007; Shi and Lee 2008; and the references therein). However, the large proportion of zero returns in our data suggests that higher data sampling frequency does not necessarily translate into better forecasting performance due to information loss or noise in the data (Bandi and Russell 2006; Phillips and Yu 2009). Hence we choose to perform volatility forecasting by aggregating 5-min data into 15-, 30-, and 60-min intraday returns and compute daily returns from daily prices so that we can observe and compare how well different models capture the volatility dynamics given the data.
Equally important for the volatility forecast comparison is the choice of the true volatility proxy. While true volatility is a latent variable that cannot be observed in the market, an efficient and accurate representation of it is of great importance for the evaluation of volatility forecasts [see Andersen et al. (2010) for an excellent survey]. In this paper, we adopt three different proxies for the true daily volatility. In addition to the widely adopted realized volatility measure of Andersen and Bollerslev (1998), we also consider the median-based measure of Andersen et al. (2012) and the range-based proxy advocated by Parkinson (1980), both of which are shown to be robust to zero returns, potential jumps in the underlying price dynamics, and other microstructure-related effects.
In terms of volatility models, we begin with the conventional generalized autoregressive conditional heteroskedastic (GARCH) model of Bollerslev (1986, 1990). Our choice of models is also motivated by Baillie et al. (2007), who document strong long memory properties in commodity futures and argue that the fractionally integrated GARCH (FIGARCH) model captures this feature very well. At the same time, a natural alternative that works well at capturing the long memory property in realized volatility is the autoregressive fractionally integrated moving average (ARFIMA) model of Granger (1980) and Granger and Joyeux (1980). The two models differ in the manner in which information is extracted from intraday data: for the ARFIMA model, intraday returns are first aggregated to obtain daily realized volatility, which the model then describes and forecasts at the daily level; for the FIGARCH model, deseasonalized intraday data are fed directly into the model. So it is empirically interesting to compare the performance of the two models using our data.
Our empirical analysis reveals a host of interesting findings. First, in terms of the out-of-sample forecasting performance, the Diebold and Mariano (1995) and West (1996) test applied on a pairwise basis and the superior predictive ability test of Hansen (2005), which tests across alternative models simultaneously, suggest that the ARFIMA model consistently outperforms the GARCH-type models in the out-of-sample tests. It is the best performing model in 11 out of 15 commodity/volatility proxy combinations, and for the remaining four combinations the difference between the forecasting performance of the ARFIMA model and that of the best performing model is statistically insignificant at any conventional level. In other words, the ARFIMA model consistently produces the best forecasts or forecasts not inferior to the best in statistical terms.
This result highlights the importance of incorporating the long memory dimension in volatility modeling, in line with the literature. This finding also contributes to the discussion in the literature of whether the FIGARCH or the ARFIMA model is empirically better at capturing the long memory feature in the volatility dynamics (Chortareas et al. 2011). Given that the intraday Chinese commodity futures data contain a large proportion of zero returns, which are fed directly into the FIGARCH model, it is not surprising that the ARFIMA model performs better.
Second, we show that within the GARCH family of models, the forecasting performance using the daily data is consistently as good as, if not better than, that using the intraday data. This finding suggests that the GARCH-type models may not be very efficient in utilizing the information contained in the intraday data of this particular market for volatility forecasting purposes, due to the high percentage of zero returns.
Finally, it is interesting to note that although sugar contracts with January maturity and November maturity differ massively in terms of trading volume and show different levels of liquidity, the underlying volatility dynamics is nevertheless captured by the same model at the same data sampling frequency. For example, when the median- and range-based proxies are adopted, both futures contracts are best forecasted by the ARFIMA model using daily realized volatility obtained from the 60-min returns. This further suggests that the ARFIMA model is a reliable and robust tool for forecasting volatility regardless of the underlying liquidity level, with practical implications for traders and risk managers.
The rest of the paper is structured as follows. In Sect. 2, we briefly outline the alternative volatility models, the proxies for the true volatility dynamics, and the statistical metrics for the out-of-sample volatility forecasts evaluation. Section 3 describes the data and the model estimates. In Sect. 4, we discuss and analyze main empirical findings. Finally, Sect. 5 concludes. Details of the three liquidity measures are provided in the "Appendix".
2 Models and statistical evaluation
2.1 Volatility models
In this paper, we consider four popular volatility models at four different data sampling frequencies for volatility modeling and out-of-sample forecasting. In particular, we make use of the: (1) intraday GARCH, integrated GARCH (IGARCH), and FIGARCH models at the 15-, 30-, and 60-min intervals; (2) daily GARCH, IGARCH, and FIGARCH models; and (3) ARFIMA model applied to the daily realized volatility computed from the 15-, 30-, and 60-min intervals. The model specifications are briefly outlined below.
2.1.1 GARCH model
The GARCH model is the workhorse in the volatility estimation and forecasting literature (see Bollerslev 1986, 1990; among others). We use an ARMA(1,1) process in the conditional mean equation of the GARCH-type models. To allow for possible fat tails, we model the innovations in the GARCH process as independently and identically distributed draws from a Student’s t-distribution, and we implement the ARMA(1,1)-GARCH(1,1) model using both intraday and daily data. The model specification is given by
where \({\tilde{r}}_{t,n}\) is the deseasonalized logarithmic return on day t for the nth time interval [see Eqs. (10)–(12)], \(\mu \), \(\gamma \), and \(\theta \) are the parameters of the conditional mean equation, and \(\omega \), \(\alpha \), and \(\beta \) are the parameters of the conditional variance equation.Footnote 5 The error term \(\varepsilon _{t,n}\), which is conditional on the information set \(\Omega _{t,n-1}\), follows a Student’s t-distribution (denoted by \(D_v\)) with zero mean, variance \(h_{t,n}\), and v degrees of freedom. The GARCH model requires that \(\alpha +\beta <1\) for the volatility process to be stationary. For the IGARCH model, however, the corresponding requirement is \(\alpha +\beta =1\).
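A standard ARMA(1,1)-GARCH(1,1) specification consistent with these definitions can be sketched as follows. The exact arrangement of terms is an assumption based on the parameter roles described above.

\[
\begin{aligned}
{\tilde{r}}_{t,n} &= \mu + \gamma \,{\tilde{r}}_{t,n-1} + \varepsilon _{t,n} + \theta \,\varepsilon _{t,n-1},\\
\varepsilon _{t,n}\mid \Omega _{t,n-1} &\sim D_v(0,h_{t,n}),\\
h_{t,n} &= \omega + \alpha \,\varepsilon _{t,n-1}^{2} + \beta \,h_{t,n-1}.
\end{aligned}
\]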
2.1.2 FIGARCH model
The FIGARCH model extends the conditional variance equation of the standard GARCH model by adding fractional differences in order to allow for the long memory property of the GARCH volatility process (Baillie et al. 1996; Baillie and Morana 2009). Following Baillie et al. (2000), we implement an ARMA(1,1)-FIGARCH(1,d,1) model given by
where \(\omega \), \(\beta \), and \(\varphi \) are the parameters of the conditional variance equation, d is the order of fractional integration, \(L_1\) is the lag operator on n, and \(D_v\) is the Student’s t-distribution defined above.
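In the commonly used FIGARCH(1,d,1) form of Baillie et al. (1996), and keeping the ARMA(1,1) conditional mean of the GARCH model, the conditional variance equation can be sketched as below. This is an assumed form that matches the listed parameters, and the sign conventions may differ from the authors' own.

\[
(1-\beta L_1)\,h_{t,n} = \omega + \left[ 1-\beta L_1-(1-\varphi L_1)(1-L_1)^{d}\right] \varepsilon _{t,n}^{2},\qquad \varepsilon _{t,n}\mid \Omega _{t,n-1}\sim D_v(0,h_{t,n}).
\]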
2.1.3 ARFIMA model
Granger (1980) and Granger and Joyeux (1980) introduce a flexible class of long memory processes based on realized volatilities not belonging to the ARCH family. It has been widely adopted in the literature when long memory properties are assumed in the data (see Martin and Wilkins 1999; Pong et al. 2003; and the references therein). The ARFIMA (p, d, q) model for a process \(y_t\) is defined as
where d is the order of fractional integration and \(L_2\) is the lag operator on t. The AR and MA polynomial components are given as \(\phi (L_2)=1+\phi _1 L_2+\cdots +\phi _p L_2^p\) and \(\theta (L_2)=1+\theta _1 L_2+\cdots +\theta _q L_2^q\), respectively, and \(\mu \) is the mean of \(y_t\). In the empirical estimation of the ARFIMA (p, d, q) model, we follow Andersen et al. (2003) and replace \(y_{t}\) by the log of the daily realized volatility [denoted as \(\log ({\hat{\sigma}}_{t})\)] obtained from the 15-, 30-, and 60-min returns.
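In its standard form the ARFIMA (p, d, q) model reads \(\phi (L_2)(1-L_2)^{d}(y_t-\mu )=\theta (L_2)\varepsilon _t\), and the long memory enters through the fractional difference operator \((1-L_2)^{d}\). The sketch below expands this operator into its lag coefficients using the binomial series. The function name `frac_diff_weights` is hypothetical and is not taken from the paper.

```python
def frac_diff_weights(d, n_lags):
    """Coefficients pi_k in (1 - L)^d = sum_k pi_k L^k for k = 0..n_lags."""
    weights = [1.0]
    for k in range(1, n_lags + 1):
        # Binomial-series recursion: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights
```

For a fractional d between 0 and 0.5, the coefficients decay slowly (hyperbolically) rather than geometrically, which is what allows distant observations to keep influencing the forecast.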
2.2 True volatility proxies
2.2.1 5-min realized volatility
The most popular proxy for the unobservable true volatility is the realized volatility measure proposed by Andersen and Bollerslev (1998). This is obtained by aggregating the intraday squared returns. We follow this approach and use a realized volatility series constructed from 5-min log price series, which is the highest frequency in our data. The proxy is given by
where \({\hat{\sigma}}_{rv,t}^2\) is the realized variance for day t and \(r_{t,n}^2\) is the squared 5-min (log) return on day t for interval n (\(n=1,2,\ldots ,N)\).
2.2.2 Median-based volatility
The second proxy we exploit for true volatility is the median-based volatility measure introduced by Andersen et al. (2012). The measure is robust to jumps in the underlying return dynamics and to small ("zero") returns. The median-based true volatility proxy is defined as
where \({\hat{\sigma}}_{med,t}^2\) is the median-based variance for day t and \(|\Delta r_n|\) is the absolute return over the nth interval on day t.
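The first two proxies can be sketched in code as follows. The realized variance is the sum of squared intraday returns, \({\hat{\sigma}}_{rv,t}^2=\sum _{n=1}^{N}r_{t,n}^2\). For the median-based measure, the sketch assumes the MedRV estimator of Andersen et al. (2012), which scales the sum of squared medians of three adjacent absolute returns by \(\pi /(6-4\sqrt{3}+\pi )\cdot N/(N-2)\). The function names are hypothetical.

```python
import math
from statistics import median

def realized_variance(returns):
    """Sum of squared intraday returns for one trading day."""
    return sum(r * r for r in returns)

def median_realized_variance(returns):
    """MedRV: scaled sum of squared medians of three adjacent absolute returns."""
    n = len(returns)
    if n < 3:
        raise ValueError("at least three intraday returns are required")
    abs_r = [abs(r) for r in returns]
    scale = math.pi / (6.0 - 4.0 * math.sqrt(3.0) + math.pi) * n / (n - 2)
    # The median over each window (r_{i-1}, r_i, r_{i+1}) discards an isolated jump or zero
    total = sum(median(abs_r[i - 1:i + 2]) ** 2 for i in range(1, n - 1))
    return scale * total
```

Because each window takes a median, a single large return (a jump) or a single zero return inside a window does not determine that window's contribution, which is the robustness property referred to in the text.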
2.2.3 Range-based volatility
The third proxy for true volatility is the range-based measure proposed by Parkinson (1980). It has been further refined and adopted in Garman and Klass (1980), Yang and Zhang (2000), and Li and Hong (2011). By taking into account daily high and low prices, this measure is able to deal with microstructure biases in the market. The proxy is defined as follows:
where \({\hat{\sigma}}_{rng,t}^2\) is the range-based variance for day t, and \(H_t\) and \(L_t\) are the daily high and low prices, respectively.
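As a minimal sketch, the Parkinson (1980) estimator in its usual form is \({\hat{\sigma}}_{rng,t}^2=(\ln H_t-\ln L_t)^2/(4\ln 2)\). The code below assumes this form, and the function name is hypothetical.

```python
import math

def parkinson_variance(high, low):
    """Range-based daily variance from the daily high and low prices."""
    if high <= 0 or low <= 0 or high < low:
        raise ValueError("prices must be positive with high >= low")
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))
```

The estimator depends only on the daily high and low, so it is unaffected by how many intraday intervals record a zero return.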
2.3 Forecasting accuracy
We use three different metrics to evaluate the out-of-sample forecasting accuracy of the volatility models, all of which are commonly adopted statistical measures in the literature (see, for example, Ahmed et al. 2016).
2.3.1 Root mean squared forecast error
The root mean squared forecast error (RMSFE) compares the true volatility with the forecasted volatility from a given model and is computed as
where R is the number of daily observations, \({\hat{h}}_{t+1}\) is the variance forecast, and \({\hat{\sigma}}^2_{t+1}\) is the chosen proxy for true variance in the out-of-sample period.
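With the symbols defined above, the metric takes the form \(\text{RMSFE}=\sqrt{R^{-1}\sum _{t=1}^{R}({\hat{\sigma}}^2_{t+1}-{\hat{h}}_{t+1})^2}\). A minimal sketch of the computation follows, with a hypothetical function name.

```python
import math

def rmsfe(proxy, forecast):
    """Root mean squared forecast error between a variance proxy and variance forecasts."""
    if len(proxy) != len(forecast) or not proxy:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(s - h) ** 2 for s, h in zip(proxy, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```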
2.3.2 Diebold and Mariano (1995) and West (1996) test
The second out-of-sample statistical metric of accuracy is the Diebold and Mariano (1995) and West (1996) MSFE t-statistic, which in our case tests whether a competing volatility model outperforms the benchmark volatility model by generating more accurate variance forecasts. We choose the benchmark model based on the lowest RMSFE. The test statistic is as follows:
where \(\Delta Loss_{t+1}\) is the difference between the squared forecast error loss functions of the benchmark and competing volatility models and \({\hat{\Omega}}\) is the consistent estimate of the asymptotic variance of \(R^{-0.5}\sum _{t=1}^{R}\Delta Loss_{t+1}\). The null hypothesis can be expressed as
Since the volatility models are non-nested, the alternative hypothesis in this case is two-sided. The test statistic in Eq. (12) follows an asymptotic standard normal distribution under the null hypothesis of equal predictive ability. We regress \(\Delta Loss_{t'+1}\) on a constant and obtain the \(\text{MSFE}{\text{-}}t\) statistic for a zero coefficient based on the Andrews and Monahan (1992) estimator. A positive (negative) and statistically significant \(\text{ MSFE}{\text{-}}t\) statistic suggests that the competing model outperforms (is outperformed by) the benchmark volatility model.
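The regression-on-a-constant step can be sketched as below. The sketch swaps in a Bartlett-kernel (Newey-West type) long-run variance with a user-chosen lag length in place of the Andrews and Monahan (1992) estimator, which additionally prewhitens the series, so it illustrates the mechanics rather than reproducing the paper's exact statistic. The function name and the `max_lag` parameter are hypothetical.

```python
import math

def dm_statistic(loss_benchmark, loss_competing, max_lag=0):
    """MSFE-t statistic: mean loss differential over its long-run standard error.

    Differential is benchmark loss minus competing loss, so a positive value
    favors the competing model and a negative value favors the benchmark.
    """
    if len(loss_benchmark) != len(loss_competing) or not loss_benchmark:
        raise ValueError("inputs must be non-empty and of equal length")
    d = [b - c for b, c in zip(loss_benchmark, loss_competing)]
    n_obs = len(d)
    mean_d = sum(d) / n_obs
    dev = [x - mean_d for x in d]
    # Lag-0 variance plus Bartlett-weighted autocovariances (not Andrews-Monahan)
    lrv = sum(v * v for v in dev) / n_obs
    for lag in range(1, max_lag + 1):
        gamma = sum(dev[t] * dev[t - lag] for t in range(lag, n_obs)) / n_obs
        lrv += 2.0 * (1.0 - lag / (max_lag + 1)) * gamma
    if lrv <= 0.0:
        raise ValueError("long-run variance estimate is not positive")
    return mean_d / math.sqrt(lrv / n_obs)
```

The returned value is compared with standard normal critical values in a two-sided test, in line with the non-nested setting described above.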
2.3.3 Superior predictive ability test
To address the multiple-testing problem in the light of data mining, we conduct the superior predictive ability (henceforth SPA) test of Hansen (2005). Under the composite null hypothesis, none of the competing volatility models has superior predictive ability relative to the benchmark. In other words, the null states that the benchmark model is not inferior to any of the alternative models. A rejection of the null hypothesis indicates that at least one competing model produces forecasts more accurate than the benchmark. Once again, we choose the benchmark model based on the lowest RMSFE and evaluate the out-of-sample forecasts based on the MSFE. For inference, we report stationary bootstrap p values obtained using 10,000 replications.
3 Data and estimation
The data come from the GTA Information Technology Company. We obtain contract ID, trading date, trading time, trading venue, contract expiry date, last recorded (Renminbi) price, high and low prices, and volume for 5-min time series on four commodity futures contracts: aluminum, copper, fuel oil, and sugar. The full sample period as well as the in-sample and out-of-sample periods for each commodity are provided in Table 1.Footnote 6 \(^{,}\) Footnote 7 In Panel D, we find seasonality in trading volume for each contract over the full sample period. More precisely, we observe that in terms of average number of contracts traded for each delivery, there is not much variation across the 12 delivery months for aluminum and copper, and there is a slight variation for fuel oil. In other words, the number of contracts traded is relatively stable all-year round. However, with only six delivery months per year, sugar shows a notable variation in the average number of contracts traded across the delivery months. In particular, contracts for January, May, and September exhibit huge trading volumes, while contracts for March, July, and November show the opposite. The trading volume for January delivery is the highest on average with more than 5.6 million contracts, whereas for November delivery the average trading volume is the lowest at 18,418 contracts, about 0.32 % of that for January delivery. This striking yet interesting variation naturally raises the question of how much the volatility dynamics for these two delivery months are different, if they are different at all. Hence, in the empirical exercises, we examine two futures contract series for sugar, one for the very liquid January delivery and the other for the very illiquid November delivery.
In Table 2, we report descriptive statistics of three measures adopted to describe liquidity of futures contracts at the 5-min interval, which is the highest sampling frequency in our data.Footnote 8 For aluminum, the Roll spread measure for nearby contracts averages 0.0006, zero returns account for 61 % of all 5-min returns on average in a trading day, and the scaled Amihud measure is 0.23. Comparing these figures to those for the contracts with 3 months to delivery, we notice a marked improvement. In particular, the Roll spread drops to 0.0004, the percentage of zero returns decreases to 36 %, and the scaled Amihud illiquidity measure drops to 0.03. The liquidity of the futures contract series subsequently worsens with longer time to delivery. For example, aluminum contracts with 3 months to delivery are the most liquid and this liquidity decreases for contracts with longer or shorter time to maturity. The pattern is mirrored in the liquidity estimators for other commodities as well. Hence, in our volatility estimation and forecasting exercises for aluminum, copper, and fuel oil, we use futures contracts with 3 months to delivery, as they are the most liquid among all maturities and the resulting volatility forecasts are expected to be least biased by the large proportion of zero returns. To construct the time series of returns with 3 months to maturity for aluminum, copper, and fuel oil, we use the prices of a contract from the third month prior to its delivery month until the first day of the second month prior to its delivery month. We then switch to the next contract, which matures in 3 months, to form a continuous time series. Hence, for these three commodities, the contract time to maturity is always around 3 months. For sugar futures, however, we are mostly interested in the effect that seasonality in trading volume has on volatility forecasting. Therefore, we take contracts from January to December for next January delivery and from November to October for next November delivery. As a result, the contract time to maturity changes over time. The practice of switching contracts to the next delivery month is common in the literature (see, for example, Baillie et al. 2007).
In our sample, all commodity futures are traded for 4 h on a trading day starting at 9:00 a.m. and closing at 3:00 p.m. with a 2-h break between 11:30 a.m. and 1:30 p.m. As a result, there are 48 5-min returns on any business day. The (log) return \(r_{t,n}\) on a trading day t for the nth interval is computed as
where \(P_{t,n}\) denotes the commodity futures price on day t at the end of the nth interval. The 15-, 30-, and 60-min returns are obtained by taking the logarithmic difference between prices that are 15, 30, and 60 min apart. The daily returns are computed as \(r_{t}=\ln P_{t}-\ln P_{t-1}\).
In Table 3, we provide descriptive statistics of commodity futures contract returns at 5-, 15-, 30-, 60-min and daily intervals. We notice that the average returns are very close to zero irrespective of contracts and data frequencies. Returns are left skewed with fat tails, although the degree of negative skewness and excess kurtosis tends to drop with decreasing sampling frequency. In addition, the percentage of zero returns drops considerably from the 5-min to daily intervals. For fuel oil, for example, it is 31.50 % at the 5-min interval and 17 % at the 15-min interval, but only 3.60 % at the daily level. The trade-off between the improvement in data quality and the loss of information at lower frequencies could be crucial for the outcome of volatility measurement and forecasting exercises. In Fig. 1, we plot the time series of 30-min returns for aluminum, copper, fuel oil, and sugar with January delivery as an example of the data we employ in this paper.
The volatility of intraday returns is known to display periodicity within a trading day, which could contaminate the estimation of conventional volatility models (Andersen and Bollerslev 1997). Following Taylor and Xu (1997), we estimate a simple seasonality term \(S_{t,n}\) by averaging the squared returns for each intraday period as follows:
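The return construction can be sketched as follows, assuming the usual definition \(r_{t,n}=\ln P_{t,n}-\ln P_{t,n-1}\). Because log returns are additive, the log difference between prices that are k intervals apart equals the sum of the k intermediate 5-min returns, so the 48 daily 5-min returns collapse to 16, 8, and 4 returns at the 15-, 30-, and 60-min intervals. The function names are hypothetical, and the treatment of the overnight and lunch-break returns is left aside.

```python
import math

def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def aggregate_returns(returns_5min, k):
    """Sum each block of k consecutive 5-min log returns (k = 3, 6, 12)."""
    if k <= 0 or len(returns_5min) % k != 0:
        raise ValueError("series length must be a positive multiple of k")
    return [sum(returns_5min[i:i + k]) for i in range(0, len(returns_5min), k)]
```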
where T is the number of trading days in the full sample period. The deseasonalized intraday returns are obtained as
We then make use of the deseasonalized returns to estimate the intraday GARCH family of models. In the out-of-sample forecasting, the intraday forecasts are based on the deseasonalized returns and are therefore transformed back into forecasts for the original returns. This is implemented as follows:
where \({\tilde{h}}_{t,n}\) is the intraday variance forecast using the deseasonalized returns and \({\hat{h}}_{t,n}\) is the transformed variance forecast for the original returns. We produce one-step ahead daily volatility forecasts for daily models. For intraday models, however, we produce 16-, 8-, and 4-step ahead forecasts for the 15-, 30-, and 60-min intervals, respectively, and aggregate them into daily forecasts. The ARFIMA model is fitted directly to daily realized volatility aggregated from intraday returns. The out-of-sample forecasts are evaluated against the daily true volatility proxies described earlier. For all sampling frequencies, we use a rolling window forecasting scheme to obtain forecasts from all volatility models.
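The deseasonalization and back-transformation steps can be sketched as below. The sketch assumes that the seasonality term for interval n is the average squared return over all T days, that returns are divided by its square root, and that each intraday variance forecast is multiplied by the same term before the forecasts are summed over the day. The exact normalization used in the paper may differ, and all function names are hypothetical.

```python
import math

def seasonal_variance(returns_by_day):
    """Average squared return for each intraday interval across all days."""
    n_days = len(returns_by_day)
    n_intervals = len(returns_by_day[0])
    return [sum(day[n] ** 2 for day in returns_by_day) / n_days
            for n in range(n_intervals)]

def deseasonalize(returns_by_day, s2):
    """Divide each return by the seasonal standard deviation of its interval."""
    # Assumes every entry of s2 is strictly positive
    return [[r / math.sqrt(s) for r, s in zip(day, s2)] for day in returns_by_day]

def daily_variance_forecast(h_tilde, s2):
    """Rescale intraday variance forecasts and sum them into a daily forecast."""
    return sum(s * h for s, h in zip(s2, h_tilde))
```

Here `h_tilde` would hold, for example, the 16 multi-step forecasts from a 15-min model for the next trading day.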
4 Empirical analysis
4.1 In-sample results
We report the in-sample parameter estimates of the intraday GARCH, FIGARCH, and IGARCH models for five futures contracts at 15-, 30-, and 60-min intervals in Table 4. For the ARMA(1,1)-GARCH(1,1) model specification in Panel A, most of the AR parameter estimates \({\hat{\gamma}}\) are statistically significant at conventional levels. Also, the MA parameter estimate \({\hat{\theta}}\) is significantly negative in most cases, capturing the first order negative autocorrelation in the returns. All the parameters in the conditional variance equations are highly significant at the 1 % level except \({\hat{\alpha}}\) for 15-min copper contracts. The fact that \({\hat{\alpha}}+{\hat{\beta}}<1\) reveals that the GARCH process is stationary, and, since \({\hat{\alpha}}+{\hat{\beta}}\) is close to 1, the volatility process is persistent. For the contract series with return innovations following a Student’s t-distribution, the degrees of freedom parameter is between 2 and 4 and statistically significant at the 1 % level. This indicates a fat tail in the return distributions.
In Panel B, when the volatility process is described by an ARMA(1,1)-FIGARCH(1,d,1) model, we notice that the parameter d, the order of fractional integration, is significantly different from zero at the 1 % level for all futures contract series. This implies that the volatility process exhibits a long memory property and attests to the importance of adding this feature in the volatility dynamics of the commodity futures contract returns under scrutiny. It is also worth noting that, similar to the results in Panel A, the degrees of freedom parameter v is highly significant. Panel C shows the parameter estimates of the ARMA(1,1)-IGARCH(1,1) model specification and the results are qualitatively similar to those in Panel A.
Table 5 shows the in-sample parameter estimation for the daily GARCH, FIGARCH, and IGARCH models. These results are qualitatively similar to those in Table 4. We observe: (1) negative and significant first order autocorrelation in the conditional mean equation for each model and contract except for the daily IGARCH model using the sugar contract with January delivery; (2) statistically significant \(\hat{\beta }\) parameters; (3) highly significant fractional integration parameters \(\hat{d}\); and (4) highly significant degrees of freedom parameters \(\hat{v}\).
We present the in-sample parameter estimates of the ARFIMA model using the daily realized volatility obtained from the 15-, 30-, and 60-min returns in Table 6. For aluminum, copper, and fuel oil, we set the MA term \(q=0\) as it is statistically insignificant at any conventional level. The first order autoregression term \(\hat{p}\) is negative and highly significant and the fractional integration term \(\hat{d}\) hovers around 0.4 for each of these three commodities. In cases of January and November contracts for sugar, the first order autocorrelation \(\hat{p}\) tends to be positive and quite often significant. The MA parameter \(\hat{q}\) is close to \(-0.4\) and significant at the 1 % level. Similar to other commodities, the fractional integration parameter estimate for sugar is in the vicinity of 0.45 and is highly significant.
Overall, the in-sample estimates of the GARCH, FIGARCH, IGARCH, and ARFIMA models reported in Tables 4, 5, and 6 using intraday and daily data reveal that, for the four commodities, the return innovations are generally negatively autocorrelated with fat tails. Moreover, the underlying volatility processes are persistent with clear evidence of long memory properties.
4.2 Out-of-sample predictions
Table 7 reports RMSFEs for all volatility models, where forecast errors are computed in comparison with three alternative true volatility proxies. In Panel A, we use the most widely exploited proxy in the literature, namely, the realized volatility measure constructed from the 5-min returns. It is interesting to notice that for aluminum and copper futures contracts, the IGARCH and FIGARCH models produce the smallest RMSFEs, respectively, and both at the daily level. This preliminary evidence suggests that for this particular true volatility proxy, used in computing forecast errors, information contained in intraday prices does not help in generating more accurate volatility forecasts. For fuel oil, the 30-min FIGARCH model produces the smallest RMSFE. It is also interesting to observe that although the January and November deliveries for sugar contracts differ massively in terms of trading volume (see Table 1), the ARFIMA model utilizing the daily realized volatility obtained from the 15-min returns provides the best forecasts for both futures contracts.
In Panel B, we consider median-based daily volatility as a proxy for true volatility. In this case, the ARFIMA model beats the rest of the competing models by producing the lowest RMSFE. More precisely, the ARFIMA model outperforms the other models for copper, fuel oil, and sugar (both January and November deliveries) when the daily realized volatility is obtained from the 60-min returns. For aluminum, it is the ARFIMA model using the daily realized volatility computed from the 30-min returns. Finally, in Panel C, we make use of range-based volatility as the true volatility proxy. Once again, the ARFIMA model is the best performing model for four out of five commodity futures contracts. In particular, the ARFIMA model applied to the daily realized volatility obtained from the 15-min returns leads to the lowest RMSFE for copper. For aluminum and the January and November deliveries of sugar contracts, however, it is the ARFIMA model applied to the daily realized volatility based on the 60-min returns. Fuel oil is the only exception, for which the daily IGARCH model provides the most accurate out-of-sample variance forecasts.
Taken together, we notice three interesting and consistent patterns from the preliminary results in Table 7. First, the ARFIMA model, with its long memory dimension, dominates the other three volatility models in 11 out of 15 commodity/true volatility proxy combinations. Second, GARCH-type models using daily data outperform similar models using intraday data. Third, the ARFIMA model applied to the daily realized volatility obtained from the higher frequency returns (i.e., 15-min returns) does not always beat the ARFIMA model using the daily realized volatility computed from the lower frequency returns. The latter two observations are novel for our chosen futures market because the literature seems to agree that intraday data enjoy informational advantage over daily data and that forecasting performance of the ARFIMA model improves with sampling frequency (Martens 2001; Martens and Zein 2004). We plot in Fig. 2 the time series of forecast errors between the ARFIMA model and the GARCH model using 30-min returns when the benchmark is the median-based volatility measure. It is quite evident that for the two products depicted in this figure, the ARFIMA model provides smaller forecast errors over time.
In Table 8, we provide pairwise comparisons following the well-known Diebold and Mariano (1995) and West (1996) test based on the Andrews and Monahan (1992) estimator. We choose the benchmark model in each case as the one with the lowest RMSFE in Table 7. The results suggest that the competing model forecasts are either as accurate statistically as the benchmark model, or, in most cases, significantly worse. It is interesting to notice that in Panel A, for aluminum, the ARFIMA model utilizing the daily realized volatility from the 15-, 30-, and 60-min returns produces inferior forecasts but the difference from the benchmark is statistically insignificant. Put differently, the null hypothesis of equal MSFEs cannot be rejected at any conventional level. In fact, for all model/true volatility proxy combinations, whenever the best performing model utilizes daily data, the ARFIMA model provides forecasts just as good statistically. These include the daily IGARCH model for aluminum and the daily FIGARCH model for copper in Panel A, and the daily IGARCH model for fuel oil in Panel C. For other model/true volatility proxy combinations, the competing models tend to produce statistically inferior forecasts, including both sugar contracts in Panels A and C.
As a robustness check, we provide the Diebold and Mariano (1995) and West (1996) test results obtained by sequentially using each volatility model as the benchmark, based on their increasing RMSFEs, against the remaining alternative models in Tables 10, 11 and 12. These additional results corroborate the conclusion in Table 8 that the benchmark, chosen as the one with the lowest RMSFE in Table 7, is indeed the one with the best volatility forecasting ability.
In Table 9, we perform the SPA test of Hansen (2005) to examine out-of-sample forecasting ability across all competing models and compute the stationary bootstrap p values. The null hypothesis is that the benchmark model, the one with the lowest RMSFE, is not inferior to any of the competing models. The test results are resounding. The probability that the benchmark model is at least as good as the competing models in forecasting volatility out of sample is 1 or very close to it. Taken together, the results in Tables 8 and 9 clearly confirm and substantiate the observations in Table 7. In other words, when intraday data are directly used in the GARCH-type models, they are no better than daily data for volatility forecasting even after deseasonalization. Hence, if a model is to be recommended for volatility forecasting in the Chinese futures market, it would be the ARFIMA model, as it is consistently the best performing model or not statistically inferior to the best performing one.
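The resampling step behind stationary bootstrap p values can be sketched as follows. This is only the index-generation part of a stationary bootstrap with geometrically distributed block lengths, not a full implementation of the SPA test; the function name, the `mean_block` parameter, and the wrap-around convention are assumptions made for illustration.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw one stationary-bootstrap resample of the indices 0..n-1.

    Blocks start at a uniformly random position and have geometric length
    with mean `mean_block` (hypothetical parameter); indices wrap around
    the end of the sample.
    """
    p = 1.0 / mean_block                    # probability of starting a new block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)        # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n   # continue the current block
    return idx


def bootstrap_means(loss_diff, n_boot, mean_block, seed=0):
    """Bootstrap distribution of the mean loss differential (sketch)."""
    d = np.asarray(loss_diff, dtype=float)
    rng = np.random.default_rng(seed)
    return np.array([
        d[stationary_bootstrap_indices(d.size, mean_block, rng)].mean()
        for _ in range(n_boot)
    ])
```

In an SPA-type test, resampled means of this kind are recentered and compared with the observed statistic to obtain the p value; that step is omitted here.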
Finally, we note that although sugar contracts for January and November deliveries differ in terms of trading volume and liquidity, the underlying volatility dynamics are very similar. The in-sample parameter estimates are similar between these two series and both are best forecast by the same model. When the 5-min realized volatility is the proxy for true volatility, the ARFIMA model using the realized volatility computed from the 15-min returns produces the most accurate forecast for both series, while the ARFIMA model applied to the realized volatility computed from the 60-min interval outperforms competing models for the other two volatility proxies for both series. In other words, seasonality in trading volume and differences in liquidity do not affect volatility model selection.
5 Conclusion
In this paper, we undertake a comprehensive volatility forecasting exercise in a futures market with unique institutional regulations. In the Chinese commodity futures market, the margin rate is time-dependent and investors face a higher deposit as contracts move closer to maturity. In addition, although individuals account for the majority of investors, they are not allowed to trade nearby contracts. These two regulations result in a liquidity pattern whereby contracts with 3 months to delivery are the most liquid, and we demonstrate this by computing three popular liquidity measures with 5-min intraday data for aluminum, copper, fuel oil, and sugar. Moreover, even these most liquid contract series contain a large percentage of zero returns at the 5-min interval.
We explicitly take these features into account when forecasting volatility and utilize the more distant contracts with 3 months to maturity at the daily and three different intraday sampling frequencies. We demonstrate that the long memory dimension is present in our data in the in-sample volatility modeling. When it comes to out-of-sample forecasting, we show that the ARFIMA model, which aggregates intraday returns to the daily level in generating daily forecasts, is the best-performing model, or equivalent to the best-performing model in statistical terms. The FIGARCH model, which also incorporates the long memory feature in the volatility dynamics, is less efficient in generating forecasts, probably because large proportions of intraday returns are zero and the deseasonalized intraday returns are directly fed into the model.
Furthermore, we show that within the GARCH family of models, the forecasting performance using the daily data is consistently as good as, if not better than, that using the intraday data, which also attests to the trade-off between information and noise in intraday data with many zero returns. Finally, it is interesting to note that even though the January and November contract series for sugar differ massively in terms of trading volume, their underlying volatility dynamics are well captured and forecast by the ARFIMA model at the same data sampling frequency.
Notes
See the Annual Volume Survey Report 2014 published by the Futures Industry Association, the primary industry association for centrally cleared futures and swaps based in Washington D.C., at https://fia.org. The Chinese sugar futures contracts rank 3rd globally in terms of trading volume in the Agricultural Category, while copper ranks 4th in the Metals Category.
See the document entitled White Sugar Futures (April 2009) on the Zhengzhou Commodity Exchange website http://www.czce.com.cn.
By the end of 2013, there were 2.47 million investors trading in the futures market, 2.39 million of whom were individual investors (Chinese Futures Association 2015, p. 211).
In case of daily data, \(r_{t}\), \(h_{t}\), \(\varepsilon _{t}\), and \(\Omega _{t-1}\) replace \({\tilde{r}}_{t,n}\), \({\tilde{h}}_{t,n}\), \(\varepsilon _{t,n}\), and \(\Omega _{t,n-1}\), respectively. Moreover, we do not deseasonalize daily returns used in the empirical analysis.
The starting and ending dates of the four commodities are constrained by data availability.
A brief discussion of the three liquidity measures is contained in the "Appendix".
References
Ahmed S, Liu X, Valente G (2016) Can currency-based risk factors help forecast exchange rates? Int J Forecast 32:75–97
Amihud Y (2002) Illiquidity and stock returns: cross-section and time-series effects. J Financ Mark 5:31–56
Amihud Y, Mendelson H, Pedersen LH (2012) Market liquidity: asset pricing, risk, and crises. Cambridge University Press, Cambridge
Andersen TG, Bollerslev T, Diebold F (2010) Parametric and nonparametric volatility measurement. In: Ait-Sahalia Y, Hansen L (eds) Handbook of financial econometrics: tools and techniques. North-Holland/Elsevier, Amsterdam, pp 67–137
Andersen TG, Bollerslev T (1997) Intraday periodicity and volatility persistence in financial markets. J Empir Finance 4:115–158
Andersen TG, Bollerslev T (1998) Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int Econ Rev 39:885–905
Andersen TG, Bollerslev T, Diebold F, Labys P (2003) Modeling and forecasting realized volatility. Econometrica 71:579–625
Andersen TG, Dobrev D, Schaumburg E (2012) Jump-robust volatility estimation using nearest neighbor truncation. J Econom 169:75–93
Andrews DWK, Monahan CJ (1992) An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60:953–966
Baillie RT, Bollerslev T, Mikkelsen H-O (1996) Fractionally integrated generalized autoregressive conditional heteroskedasticity. J Econom 74:3–30
Baillie RT, Cecen AA, Han YW (2000) High frequency Deutsche mark-US dollar returns: FIGARCH representations and non-linearities. Multinatl Finance J 4:247–267
Baillie RT, Han Y-W, Myers RJ, Song J (2007) Long memory models for daily and high frequency commodity futures returns. J Futures Mark 27:643–668
Baillie RT, Morana C (2009) Modeling long memory and structural breaks in conditional variances: an adaptive FIGARCH approach. J Econ Dyn Control 33:1577–1592
Baker M, Stein JC (2004) Market liquidity as a sentiment indicator. J Financ Mark 7:271–299
Bandi FM, Russell JR (2006) Separating microstructure noise from volatility. J Financ Econ 79:655–692
Bekaert G, Harvey CR, Lundblad C (2007) Liquidity and expected returns: lessons from emerging markets. Rev Financ Stud 20:1783–1831
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econom 31:307–327
Bollerslev T (1990) Modeling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. Rev Econ Stat 72:498–505
Chang CW, Chang SK (1993) An implicit measure of the effective bid–ask spread: a note. J Financ Res 16:71–75
Chinese Futures Association (2015) Convoy in the futures markets: protecting the rights of futures investors (in Chinese). Chinese Finance Press
Chortareas G, Jiang Y, Nankervis JC (2011) Forecasting exchange rate volatility using high-frequency data: is the Euro different? Int J Forecast 27:1089–1107
Diebold F, Mariano RS (1995) Comparing predictive accuracy. J Bus Econ Stat 13:253–263
Fuertes AM, Kalotychou E, Todorovic N (2015) Daily volume, intraday and overnight returns for volatility prediction: profitability or accuracy? Rev Quant Finance Acc 45:251–278
Fung H-G, Leung WK, Xu X (2003) Information flows between the US and China commodity futures trading. Rev Quant Finance Acc 21:267–285
Garman MB, Klass MJ (1980) On the estimation of security price volatilities from historical data. J Bus 53:67–78
George TJ, Kaul G, Nimalendran M (1991) Estimation of the bid–ask spread and its components: a new approach. Rev Financ Stud 4:623–656
Goyenko RY, Holden CW, Trzcinka CA (2009) Do liquidity measures measure liquidity? J Financ Econ 92:153–181
Granger CW (1980) Long memory relationships and the aggregation of dynamic models. J Econom 14:227–238
Granger CW, Joyeux R (1980) An introduction to long memory time series models and fractional differencing. J Time Ser Anal 1:15–39
Hansen PR (2005) A test of superior predictive ability. J Bus Econ Stat 23:365–380
Hseu M-M, Chung H, Sun E-Y (2007) Price discovery across the stock index futures and the ETF markets: intra-day evidence from the S&P 500, Nasdaq-100 and DJIA indices. Rev Pac Basin Financ Mark Pol 10:215–236
Lee H-T (2009) Optimal futures hedging under jump switching dynamics. J Empir Finance 16:446–456
Lesmond DA (2005) The costs of equity trading in emerging markets. J Financ Econ 77:411–452
Lesmond DA, Ogden JP, Trzcinka CA (1999) A new estimate of transaction costs. Rev Financ Stud 12:1113–1141
Li H, Hong Y (2011) Financial volatility forecasting with range-based autoregressive volatility model. Finance Res Lett 8:69–76
Liu Q, Chng M, Xu D (2014) Hedging industrial metals with stochastic volatility models. J Futures Mark 34:704–730
Martens M (2001) Forecasting daily exchange rate volatility using intraday returns. J Int Money Finance 20:1–23
Martens M, Zein J (2004) Predicting financial volatility: high-frequency time-series forecasts vis-a-vis implied volatility. J Futures Mark 24:1005–1028
Martin VL, Wilkins NP (1999) Indirect estimation of ARFIMA and VARFIMA models. J Econom 93:149–175
Parkinson M (1980) The extreme value method for estimating the variance of the rate of return. J Bus 53:61–65
Peck AE (2008) The development of futures markets in China: evidence of some unique trading characteristics. In: Goss B (ed) Debt, risk, and liquidity in futures markets. Routledge, Oxon, pp 46–74
Phillips PCB, Yu J (2009) Information loss in volatility measurement with flat price trading. Unpublished manuscript, Yale University and Singapore Management University
Pong S, Shackleton MB, Taylor SJ, Xu X (2003) Forecasting currency volatility: a comparison of implied volatilities and AR(FI)MA models. J Bank Finance 28:2541–2563
Poon S-H, Granger CW (2003) Forecasting financial market volatility: a review. J Econ Lit 41:478–539
Roll R (1984) A simple implicit measure of the effective bid–ask spread in an efficient market. J Finance 39:1127–1139
Shi W, Lee C-F (2008) Volatility persistence of high-frequency returns in the Japanese government bond futures market. Rev Pac Basin Financ Mark Pol 11:511–530
Taylor SJ, Xu X (1997) The incremental volatility information in one million foreign exchange quotations. J Empir Finance 4:317–340
West KD (1996) Asymptotic inference about predictive ability. Econometrica 64:1067–1084
Yang D, Zhang Q (2000) Drift-independent volatility estimation based on high, low, open, and close prices. J Bus 73:477–492
Acknowledgments
We thank participants at the 2015 Asian Financial Association annual conference and the Workshop on Chinese Commodity Futures Market for their comments and suggestions. Thanks are also due to the seminar audience at Renmin University of China. Jiang and Liu gratefully acknowledge financial support from the Humanities and Social Sciences Research Fund for Young Scientists by the Ministry of Education of China (Grant No. 12YJC790079).
Appendix: Liquidity measures
We use three liquidity estimators widely adopted in the literature to describe the liquidity of the Chinese commodity futures contracts. They are the effective spread of Roll (1984), the proportion of zero returns as in Lesmond et al. (1999), and the Amihud (2002) illiquidity estimator. These measures are shown to perform quite well in capturing different aspects of asset liquidity (Goyenko et al. 2009) (Tables 10, 11, 12).
1.1 Roll spread
In the seminal paper of Roll (1984), a simple serial covariance spread estimation model is developed to capture asset liquidity. The effective spread is derived from the serial covariance properties of transaction price changes. The model has led to a burgeoning research area in the market microstructure literature with many modifications and extensions (see George et al. 1991; Chang and Chang 1993; and the references therein).
To illustrate, let E and \(P_t\) denote the effective spread and the closing price on day t, respectively, and let \(\Delta \) be the change operator. Roll (1984) shows that the serial covariance between changes in prices is
\[ \text{Cov}(\Delta P_t, \Delta P_{t-1}) = -\frac{E^2}{4}. \]
In this paper, we follow Goyenko et al. (2009) and adopt a modified version of the Roll (1984) spread so that we can always obtain a numerical value for this liquidity measure. Denoting the price change over the nth time interval as \(\Delta P_n\), the effective spread can be expressed as follows:
\[ E = \begin{cases} 2\sqrt{-\text{Cov}(\Delta P_n, \Delta P_{n-1})} & \text{if } \text{Cov}(\Delta P_n, \Delta P_{n-1}) < 0, \\ 0 & \text{otherwise.} \end{cases} \]
Hence, the lower the effective spread, the higher the liquidity of the asset.
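As an illustration, the modified Roll spread can be computed from a series of intraday prices along the following lines. This is a minimal sketch assuming the Goyenko et al. (2009) convention that the spread is set to zero whenever the serial covariance is non-negative; the function name and the use of the sample covariance are choices made here, not taken from the paper.

```python
import numpy as np


def roll_spread(prices):
    """Modified Roll (1984) effective spread from a price series (sketch).

    Returns 2 * sqrt(-cov) when the first-order serial covariance of price
    changes is negative, and 0.0 otherwise.
    """
    p = np.asarray(prices, dtype=float)
    dp = np.diff(p)                          # price changes over each interval
    if dp.size < 3:
        return 0.0                           # too few changes to estimate a covariance
    # Sample covariance between consecutive price changes.
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    return float(2.0 * np.sqrt(-cov)) if cov < 0 else 0.0
```

A price series that bounces between two levels yields a strictly positive spread, whereas a steadily trending series yields zero, which is the case the modification is designed to handle.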
1.2 Proportion of zero returns
The second liquidity measure we exploit is proposed in Lesmond et al. (1999) and proves especially useful and effective in studying the liquidity of emerging markets (see, among others, Bekaert et al. 2007; Lesmond 2005). This measure is based on transaction costs: if the value of an information signal is insufficient to outweigh the cost associated with trading, market participants will choose not to trade, resulting in a zero return. The measure is easy to implement since it only requires a time series of transaction data. In this paper, the proportion of zero returns in a trading day is defined as follows:
\[ \text{Zeros} = \frac{\text{number of time intervals with zero returns}}{N}, \]
where N is the total number of time intervals in a trading day (\(n=1,2,\ldots ,N\)). Intuitively, the lower is the proportion of zero returns, the better is the liquidity of the asset.
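For concreteness, the proportion of zero returns within one trading day can be sketched as below. The function name and the optional tolerance `tol` are hypothetical, and the input is assumed to be the sequence of intraday prices for a single day, so that N equals the number of price intervals.

```python
import numpy as np


def zero_return_proportion(prices, tol=0.0):
    """Share of intraday intervals with a zero log return (sketch)."""
    p = np.asarray(prices, dtype=float)
    r = np.diff(np.log(p))                   # log returns over the N intervals
    if r.size == 0:
        raise ValueError("need at least two prices")
    # An interval counts as a zero return when |r| does not exceed tol.
    return float(np.mean(np.abs(r) <= tol))
```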
1.3 Amihud illiquidity measure
The illiquidity measure of Amihud (2002) is another popular estimator in the literature (see, among others, Baker and Stein 2004; Amihud et al. 2012). It is a price impact measure that captures the price response associated with one currency unit of trading volume. Hence, the lower the illiquidity measure, the better the asset liquidity. More precisely, it is defined as the ratio given by
\[ \text{Illiq} = \frac{1}{N}\sum_{n=1}^{N} \frac{|r_n|}{\text{Volume}_n}, \]
where \(r_n\) is the log return of the asset over the nth time interval and \(\text{ Volume }_n\) is the US dollar (in our case, Renminbi) trading volume over the same interval.
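A minimal sketch of this price impact measure follows, assuming that the ratio of absolute return to currency volume is averaged across the intervals of a trading day. The function name is hypothetical, and skipping intervals with zero volume is an assumption made here to avoid division by zero; the paper does not state how such intervals are treated.

```python
import numpy as np


def amihud_illiquidity(returns, volume):
    """Average of |r_n| / Volume_n over intervals with positive volume (sketch)."""
    r = np.asarray(returns, dtype=float)
    v = np.asarray(volume, dtype=float)
    if r.shape != v.shape:
        raise ValueError("returns and volume must have the same length")
    traded = v > 0                           # assumption: drop zero-volume intervals
    if not traded.any():
        return float("nan")
    return float(np.mean(np.abs(r[traded]) / v[traded]))
```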
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Jiang, Y., Ahmed, S. & Liu, X. Volatility forecasting in the Chinese commodity futures market with intraday data. Rev Quant Finan Acc 48, 1123–1173 (2017). https://doi.org/10.1007/s11156-016-0570-4
DOI: https://doi.org/10.1007/s11156-016-0570-4
Keywords
- Out-of-sample predictability
- Long memory time series
- Futures market regulation
- Realized volatility
- Econometric models