Volatility forecasting in the Chinese commodity futures market with intraday data

Abstract
Given the unique institutional regulations in the Chinese commodity futures market as well as the characteristics of the data it generates, we utilize contracts with three months to delivery, the most liquid contract series, to systematically explore volatility forecasting for aluminum, copper, fuel oil, and sugar at the daily and three intraday sampling frequencies. We adopt popular volatility models in the literature and assess the forecasts obtained via these models against alternative proxies for the true volatility. Our results suggest that the long memory property is an essential feature in the commodity futures volatility dynamics and that the ARFIMA model consistently produces the best forecasts or forecasts not inferior to the best in statistical terms.


Introduction
In this paper, we are concerned with volatility forecasting in the Chinese commodity futures market. Volatility modeling and forecasting is an extensively studied area of research, as volatility is considered the "barometer for the vulnerability of financial markets and the economy" (Poon and Granger (2003, p.479)) and is central to asset pricing, derivative valuation, portfolio allocation, and risk management. We are interested in this particular market in part because it has become an important part of the global futures markets with tremendous trading volume. More importantly, this market is regulated by two unique institutional rules that make it interesting to explore.
The first regulation is the time-dependent margin rate, whereby the margin as a fraction of the contract value increases as contracts move closer to delivery. Take Sugar as an example. Two months prior to delivery, the margin rate is 6% of the contract value for an investor. In the month before delivery, it increases to 8% in the first 10 days, to 15% between the 11th and the 20th day of the month, and to 25% in the final 10 days of the month, culminating in 30% in the delivery month. The second regulation is that, although they represent 97% of all investors in the futures markets, individual investors are not allowed to trade nearby contracts. Both regulations effectively push market participation and trading volume to more distant contracts, with implications for market liquidity.
Our contribution to the literature is that we take into account the unique institutional regulations of this market and design empirical volatility forecasting exercises that are appropriate for the characteristics of the market and the data it generates. Our data on Aluminum, Copper, and Fuel Oil consistently show that contracts with three months to delivery enjoy the best liquidity. We are not the first to note this pattern (see Liu et al. (2014) and Peck (2008)), but we are the first to offer solid and detailed evidence. Using five-minute returns data over long sample periods, we compute three popular liquidity measures that capture different aspects of liquidity, namely the effective spread of Roll (1984), the proportion of zero returns of Lesmond et al. (1999), and the Amihud (2002) illiquidity measure (Goyenko et al. (2009)). Our results show that contracts with three months to delivery are the most liquid, as they exhibit the lowest effective spread, the lowest percentage of zero returns, and the smallest value for the Amihud (2002) illiquidity measure. This differs from the majority of futures markets and contracts, for which the nearby contracts are usually the most liquid (see Baillie et al. (2007), Lee (2009), and the references therein). Crucially, this liquidity pattern results from the unique institutional environment in which trading takes place.
On the other hand, being an emerging market, the Chinese commodity futures market exhibits a large proportion of zero returns (Bekaert et al. (2007)), and this is particularly evident in our five-minute return series. Even for the most liquid three-month to maturity contracts, the fraction of zero returns is as high as 36.27%, 23.90%, and 31.50% on average, respectively, for Aluminum, Copper, and Fuel Oil. In the existing literature, intraday data are widely adopted for volatility forecasting as they are shown to contain more information and provide more accurate and efficient forecasts (see, for example, Taylor and Xu (1997), Chortareas et al. (2011), Fuertes et al. (2015), and the references therein). However, the large proportion of zero returns in our data suggests that a higher data sampling frequency does not necessarily translate into better forecasting performance, due to information loss or noise in the data (Phillips and Yu (2009) and Bandi and Russell (2005)). Hence we choose to perform volatility forecasting by aggregating five-minute data into 15-, 30-, and 60-minute intraday returns and computing daily returns from daily prices, so that we can observe and compare how well different models capture the volatility dynamics given the data.
Equally important for the volatility forecast comparison is the choice of the true volatility proxy. While true volatility is a latent variable that cannot be observed in the market, an efficient and accurate representation of it is of great importance for the evaluation of volatility forecasts (see Andersen et al. (2010) for an excellent survey). In this paper, we adopt three different proxies for the true daily volatility. In addition to the widely adopted realized volatility measure of Andersen and Bollerslev (1998), we also consider the median-based measure of Andersen et al. (2012) and the range-based proxy advocated by Parkinson (1980), both of which are shown to be robust to zero returns, potential jumps in the underlying price dynamics, and other microstructure related effects (Alizadeh et al. (2002)).
In terms of volatility models, we begin with the conventional generalized autoregressive conditional heteroskedastic (GARCH) model of Bollerslev (1986, 1990). Our choice of models is also motivated by Baillie et al. (2007), who document strong long memory properties in commodity futures and argue that the fractionally integrated GARCH (FIGARCH) model captures this feature very well. At the same time, a natural alternative that works well at capturing the long memory property in realized volatility is the autoregressive fractionally integrated moving average (ARFIMA) model of Granger (1980) and Granger and Joyeux (1980). The two models differ in the manner in which information is extracted from intraday data: for the ARFIMA model, intraday returns are first aggregated to obtain daily realized volatility, and the model is then used to describe and forecast realized volatility at the daily level; for the FIGARCH model, deseasonalized intraday data are fed directly into the model. It is therefore empirically interesting to compare the performance of the two models using our data.
Our empirical analysis reveals a host of interesting findings. First, in terms of the out-of-sample forecasting performance, the Diebold and Mariano (1995) and West (1996) test applied on a pairwise basis and the superior predictive ability test of Hansen (2005), which tests across alternative models simultaneously, suggest that the ARFIMA model consistently outperforms the GARCH-type models. It is the best performing model in 11 out of 15 commodity/volatility proxy combinations, and for the remaining four combinations the difference between the forecasting performance of the ARFIMA model and that of the best performing model is statistically insignificant at any conventional level. In other words, the ARFIMA model consistently produces the best forecasts or forecasts not inferior to the best in statistical terms.
This result highlights the importance of incorporating the long memory dimension in volatility modeling, in line with the literature. It also contributes to the discussion in the literature of whether the FIGARCH or the ARFIMA model is empirically better at capturing the long memory feature in the volatility dynamics (Chortareas et al. (2011)). Given that the intraday Chinese commodity futures data contain a large proportion of zero returns, which are directly fed into the FIGARCH model, it is not surprising that the ARFIMA model performs better.
Second, we show that within the GARCH family of models, the forecasting performance using the daily data is consistently as good as, if not better than, that using the intraday data. This finding suggests that the GARCH-type models may not be very efficient in utilizing the information contained in the intraday data of this particular market for volatility forecasting purposes, due to the high percentage of zero returns.
Finally, it is interesting to note that although Sugar contracts with January maturity and November maturity differ massively in terms of trading volume and show different levels of liquidity, the underlying volatility dynamics are nevertheless captured by the same model at the same data sampling frequency. For example, when the median- and range-based proxies are adopted, both futures contracts are best forecasted by the ARFIMA model using daily realized volatility obtained from the 60-minute returns. This further suggests that the ARFIMA model is a reliable and robust tool for forecasting volatility regardless of the underlying liquidity level, with practical implications for traders and risk managers.
The rest of the paper is structured as follows. In Section 2, we briefly outline the alternative volatility models, the proxies for the true volatility dynamics, and the statistical metrics for the evaluation of out-of-sample volatility forecasts. Section 3 describes the data and the model estimates. In Section 4, we discuss and analyze the main empirical findings. Finally, Section 5 concludes. Details of the three liquidity measures are provided in the Appendix.

Volatility Models
In this paper, we consider four popular volatility models at four different data sampling frequencies for volatility modeling and out-of-sample forecasting. In particular, we make use of the: (1) intraday GARCH, integrated GARCH (IGARCH), and FIGARCH models at the 15-, 30-, and 60-minute intervals; (2) daily GARCH, IGARCH, and FIGARCH models; and (3) ARFIMA model applied to the daily realized volatility computed from the 15-, 30-, and 60-minute intervals. The model specifications are briefly outlined below.

GARCH Model
The GARCH model is the workhorse in the volatility estimation and forecasting literature (see, among others, Bollerslev (1986, 1990)). We use an ARMA(1,1) process in the conditional mean equation of the GARCH-type models. To allow for possible fat tails, we model the innovations in the GARCH process as independently and identically distributed draws from a Student's t-distribution while implementing the ARMA(1,1)-GARCH(1,1) model using both intraday and daily data. The model specification is given by

$$\tilde{r}_{t,n} = \mu + \gamma \tilde{r}_{t,n-1} + \varepsilon_{t,n} + \theta \varepsilon_{t,n-1}, \qquad \varepsilon_{t,n} \mid \Omega_{t,n-1} \sim D_v(0, h_{t,n}),$$

$$h_{t,n} = \omega + \alpha \varepsilon_{t,n-1}^2 + \beta h_{t,n-1},$$

where $\tilde{r}_{t,n}$ is the deseasonalized logarithmic return on day $t$ for the $n$th time interval (see equations (10)-(12)), $\mu$, $\gamma$, and $\theta$ are the parameters of the conditional mean equation, and $\omega$, $\alpha$, and $\beta$ are the parameters of the conditional variance equation. The error term $\varepsilon_{t,n}$, which is conditional on the information set $\Omega_{t,n-1}$, follows a Student's t-distribution (denoted by $D_v$) with zero mean, variance $h_{t,n}$, and $v$ degrees of freedom. The GARCH model requires that $\alpha + \beta < 1$ for the volatility process to be stationary. For the IGARCH model, however, the corresponding requirement is $\alpha + \beta = 1$.
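As an illustration of the conditional variance recursion, the following Python sketch filters GARCH(1,1) variances from a given residual series and iterates the recursion forward for multi-step forecasts. It is not the estimation code used in the paper: the function names, the initialisation at the unconditional variance, and the assumption that parameters are already estimated are all choices made for this example.

```python
def garch11_filter(residuals, omega, alpha, beta):
    """Return conditional variances h_1..h_T for residuals eps_1..eps_T."""
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("require omega > 0, alpha >= 0, beta >= 0")
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for stationarity")
    # Assumption: start the recursion at the unconditional variance.
    h = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        h.append(omega + alpha * eps ** 2 + beta * h[-1])
    return h


def garch11_forecast(last_eps, last_h, omega, alpha, beta, steps):
    """Return the 1- to steps-ahead conditional variance forecasts."""
    path = [omega + alpha * last_eps ** 2 + beta * last_h]
    for _ in range(steps - 1):
        # Beyond one step, the expected squared innovation equals the
        # previous variance forecast, so the recursion uses alpha + beta.
        path.append(omega + (alpha + beta) * path[-1])
    return path
```

With $\alpha + \beta$ close to one, the forecast path decays slowly towards the unconditional variance, which is the persistence that the stationarity condition above refers to.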

FIGARCH Model
The FIGARCH model extends the conditional variance equation of the standard GARCH model by adding fractional differences in order to allow for the long memory property of the GARCH volatility process (Baillie et al. (1996) and Baillie and Morana (2009)). Following Baillie et al. (2000), we implement an ARMA(1,1)-FIGARCH(1,d,1) model given by

$$\tilde{r}_{t,n} = \mu + \gamma \tilde{r}_{t,n-1} + \varepsilon_{t,n} + \theta \varepsilon_{t,n-1}, \qquad \varepsilon_{t,n} \mid \Omega_{t,n-1} \sim D_v(0, h_{t,n}),$$

$$h_{t,n} = \omega + \beta h_{t,n-1} + \left[1 - \beta L_1 - (1 - \phi L_1)(1 - L_1)^d\right] \varepsilon_{t,n}^2,$$

where $\omega$, $\beta$, and $\phi$ are the parameters of the conditional variance equation, $d$ is the order of fractional integration, $L_1$ is the lag operator on $n$, and $D_v$ is the Student's t-distribution defined above.
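To see how the fractional difference operator spreads the effect of past shocks, the FIGARCH(1,d,1) variance can be written in its ARCH(∞) form, $h = \omega/(1-\beta) + \sum_k \lambda_k \varepsilon^2_{-k}$. The Python sketch below computes the weights $\lambda_k$ using the standard recursion for this model, assuming the $(1 - \phi L)$ sign convention; the function name and the truncation length are choices made for this example.

```python
def figarch_arch_weights(d, phi, beta, n_lags):
    """Return lambda_1..lambda_n_lags of the FIGARCH(1,d,1) ARCH(inf) form."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    # delta_k are the coefficients in (1 - L)^d = 1 - sum_k delta_k L^k.
    delta = d  # delta_1
    weights = [phi - beta + d]  # lambda_1
    for k in range(2, n_lags + 1):
        ratio = (k - 1 - d) / k
        # lambda_k = beta * lambda_{k-1} + (ratio - phi) * delta_{k-1}
        weights.append(beta * weights[-1] + (ratio - phi) * delta)
        delta *= ratio  # advance to delta_k
    return weights
```

Setting `d = 0` collapses the weights to the geometric pattern of a GARCH(1,1) model, while `0 < d < 1` produces the slow hyperbolic decay associated with long memory.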

ARFIMA Model
Granger (1980) and Granger and Joyeux (1980) introduce a flexible class of long memory processes not belonging to the ARCH family, which can be applied to realized volatilities. It has been widely adopted in the literature when long memory properties are assumed in the data (see Martin and Wilkins (1999), Pong et al. (2003), and the references therein). The ARFIMA($p$, $d$, $q$) model for a process $y_t$ is defined as

$$\varphi(L_2)(1 - L_2)^d (y_t - \mu) = \theta(L_2)\varepsilon_t,$$

where $d$ is the order of fractional integration, $L_2$ is the lag operator on $t$, and $\varepsilon_t$ is the innovation term. The AR and MA polynomial components are given as $\varphi(L_2) = 1 + \varphi_1 L_2 + \cdots + \varphi_p L_2^p$ and $\theta(L_2) = 1 + \theta_1 L_2 + \cdots + \theta_q L_2^q$, respectively, and $\mu$ is the mean of $y_t$. In the empirical estimation of the ARFIMA($p$, $d$, $q$) model, we follow Andersen et al. (2003) and replace $y_t$ by the log of the daily realized volatility (denoted as $\log(\sigma_t)$) obtained from the 15-, 30-, and 60-minute returns.
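The fractional difference operator $(1 - L)^d$ in the ARFIMA model has the binomial expansion $\sum_k \pi_k L^k$ with $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k - 1 - d)/k$. The Python sketch below computes these weights and applies the truncated filter to a series; the function names are hypothetical, and truncating the infinite filter at the start of the sample is a simplifying assumption of the example.

```python
def frac_diff_weights(d, n_weights):
    """Return the first n_weights coefficients of (1 - L)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - L)^d to a series, truncating at the first observation."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]
```

For `d = 1` the filter reduces to the ordinary first difference, while for values of `d` around 0.4, as is typical for log realized volatility, the weights decay slowly and distant observations retain some influence.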

5-Minute Realized Volatility
The most popular proxy for the unobservable true volatility is the realized volatility measure proposed by Andersen and Bollerslev (1998). This is obtained by aggregating the intraday squared returns. We follow this approach and use a realized volatility series constructed from the 5-minute log price series, which is the highest frequency in our data. The proxy is given by

$$\hat{\sigma}^2_{rv,t} = \sum_{n=1}^{N} r_{t,n}^2,$$

where $\hat{\sigma}^2_{rv,t}$ is the realized variance for day $t$ and $r_{t,n}^2$ is the squared 5-minute (log) return on day $t$ for interval $n$ ($n = 1, 2, \cdots, N$).
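The realized variance is a sum of squared intraday returns, which the following minimal Python sketch makes explicit. The function names are illustrative, and the input is assumed to be the list of 5-minute log returns for a single trading day.

```python
import math


def realized_variance(intraday_returns):
    """Sum of squared intraday log returns for one trading day."""
    return sum(r * r for r in intraday_returns)


def realized_volatility(intraday_returns):
    """Square root of the realized variance."""
    return math.sqrt(realized_variance(intraday_returns))
```

Note that zero returns contribute nothing to the sum, so a day with many zero 5-minute returns yields a mechanically low realized variance.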

Median-Based Volatility
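The second proxy we exploit for true volatility is the median-based volatility measure introduced by Andersen et al. (2012). The measure is robust to jumps in the underlying return dynamics and to small ("zero") returns. The median-based true volatility proxy is defined as

$$\hat{\sigma}^2_{med,t} = \frac{\pi}{6 - 4\sqrt{3} + \pi}\left(\frac{N}{N-2}\right)\sum_{n=2}^{N-1} \mathrm{med}\left(|\Delta r_{n-1}|, |\Delta r_n|, |\Delta r_{n+1}|\right)^2,$$

where $\hat{\sigma}^2_{med,t}$ is the median-based variance for day $t$ and $|\Delta r_n|$ is the absolute return over the $n$th interval on day $t$.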
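A short Python sketch may help to show why the median-based measure is robust to isolated jumps: each term uses the median of three adjacent absolute returns, so a single extreme return never enters the sum. The sketch assumes the standard median realized variance estimator of Andersen et al. (2012), including its scaling constant and the $N/(N-2)$ finite-sample correction; the function name is hypothetical.

```python
import math
from statistics import median


def median_realized_variance(intraday_returns):
    """Median realized variance from one day of intraday log returns."""
    n = len(intraday_returns)
    if n < 3:
        raise ValueError("need at least three intraday returns")
    abs_r = [abs(r) for r in intraday_returns]
    scale = math.pi / (6.0 - 4.0 * math.sqrt(3.0) + math.pi) * n / (n - 2)
    # Median of each block of three adjacent absolute returns, squared.
    total = sum(median(abs_r[i - 1:i + 2]) ** 2 for i in range(1, n - 1))
    return scale * total
```

In the example `[0.01, 0.01, 0.5, 0.01, 0.01]`, the single large return is discarded by every median, so the estimate equals that of a day with five returns of 0.01.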

Range-Based Volatility
The third proxy for true volatility is the range-based measure proposed by Parkinson (1980). It has been further refined and adopted in Garman and Klass (1980), Yang and Zhang (2000), and Li and Hong (2011). By taking into account the daily high and low prices, this measure is able to deal with microstructure biases in the market. The proxy is defined as follows:

$$\hat{\sigma}^2_{rng,t} = \frac{1}{4\ln 2}\left(\ln H_t - \ln L_t\right)^2,$$

where $\hat{\sigma}^2_{rng,t}$ is the range-based variance for day $t$, and $H_t$ and $L_t$ are the daily high and low prices, respectively.
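The range-based proxy needs only two prices per day. The following Python sketch computes it under the assumption that the standard Parkinson (1980) scaling of $1/(4\ln 2)$ applies; the function name and input checks are choices made for this example.

```python
import math


def parkinson_variance(high, low):
    """Range-based daily variance from the daily high and low prices."""
    if low <= 0 or high < low:
        raise ValueError("require 0 < low <= high")
    log_range = math.log(high) - math.log(low)
    return log_range ** 2 / (4.0 * math.log(2.0))
```

Because the estimator depends on the extremes rather than on interval-by-interval returns, it is unaffected by how many of the intraday returns are exactly zero.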

Forecasting Accuracy
We use three different metrics to evaluate the out-of-sample forecasting accuracy of the volatility models, all of which are commonly adopted statistical measures in the literature (see, for example, Ahmed et al. (2016)).

The root mean squared forecast error (RMSFE) compares the true volatility with the forecasted volatility from a given model and is computed as

$$\mathrm{RMSFE} = \sqrt{\frac{1}{R}\sum_{t=1}^{R}\left(\hat{\sigma}^2_{t+1} - \hat{h}_{t+1}\right)^2},$$

where $R$ is the number of daily observations, $\hat{h}_{t+1}$ is the variance forecast, and $\hat{\sigma}^2_{t+1}$ is the chosen proxy for true variance in the out-of-sample period.
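The RMSFE calculation can be sketched in a few lines of Python. The function name is illustrative, and the inputs are assumed to be aligned lists of daily variance proxies and variance forecasts over the out-of-sample period.

```python
import math


def rmsfe(proxy, forecast):
    """Root mean squared forecast error between proxy and forecast variances."""
    if len(proxy) != len(forecast) or not proxy:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - f) ** 2 for p, f in zip(proxy, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```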

Diebold and Mariano (1995) and West (1996) Test
The second out-of-sample statistical metric of accuracy is the Diebold and Mariano (1995) and West (1996) MSFE t-statistic, which in our case tests whether a competing volatility model outperforms the benchmark volatility model by generating more accurate variance forecasts. We choose the benchmark model based on the lowest RMSFE. The test statistic is as follows:

$$\text{MSFE-}t = \frac{R^{-0.5}\sum_{t=1}^{R}\Delta Loss_{t+1}}{\sqrt{\hat{\Omega}}},$$

where $\Delta Loss_{t+1}$ is the difference between the squared forecast error loss functions of the benchmark and competing volatility models and $\hat{\Omega}$ is the consistent estimate of the asymptotic variance of $R^{-0.5}\sum_{t=1}^{R}\Delta Loss_{t+1}$. The null hypothesis can be expressed as

$$H_0: E\left[\Delta Loss_{t+1}\right] = 0.$$

Since the volatility models are non-nested, the alternative hypothesis in this case is two-sided. The test statistic in equation (12) follows an asymptotic standard normal distribution under the null hypothesis of equal predictive ability. We regress $\Delta Loss_{t+1}$ on a constant and obtain the MSFE-t statistic for a zero coefficient based on the Andrews and Monahan (1992) estimator. A positive (negative) and statistically significant MSFE-t statistic suggests that the competing model outperforms (is outperformed by) the benchmark volatility model.
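The mechanics of the MSFE-t statistic can be sketched in Python as follows. This is a simplified illustration rather than the procedure used in the paper: the long-run variance is estimated with a Bartlett-kernel (Newey-West type) estimator and a user-chosen lag length in place of the Andrews and Monahan (1992) estimator, and the function names are hypothetical.

```python
import math


def dm_statistic(proxy, benchmark, competitor, n_lags=0):
    """MSFE-t statistic; positive values favour the competing model."""
    # Loss differential: benchmark squared error minus competitor squared error.
    d = [(p - b) ** 2 - (p - c) ** 2
         for p, b, c in zip(proxy, benchmark, competitor)]
    r = len(d)
    mean_d = sum(d) / r
    dev = [x - mean_d for x in d]
    # Assumption: Bartlett-kernel long-run variance with n_lags lags.
    lrv = sum(x * x for x in dev) / r
    for lag in range(1, n_lags + 1):
        weight = 1.0 - lag / (n_lags + 1)
        cov = sum(dev[t] * dev[t - lag] for t in range(lag, r)) / r
        lrv += 2.0 * weight * cov
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    return math.sqrt(r) * mean_d / math.sqrt(lrv)


def two_sided_p_value(statistic):
    """Two-sided p-value under the asymptotic standard normal distribution."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))
```

Swapping the benchmark and competitor arguments flips the sign of the statistic, which matches the interpretation of positive and negative values given above.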

Superior Predictive Ability Test
To address the multiple-testing problem in the light of data mining, we conduct the superior predictive ability (henceforth SPA) test of Hansen (2005). Under the composite null hypothesis, none of the competing volatility models has superior predictive ability. In other words, the null states that the benchmark model is not inferior to any of the alternative models. A rejection of the null hypothesis indicates that at least one competing model produces forecasts more accurate than the benchmark. Once again, we choose the benchmark model based on the lowest RMSFE and evaluate the out-of-sample forecasts based on the MSFE. For inference, we report stationary bootstrap p-values obtained using 10,000 replications.
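The SPA p-values rely on the stationary bootstrap, which resamples the loss differentials in blocks of random length so that serial dependence is preserved. The Python sketch below shows only this resampling step, not the full SPA test; the function name, the mean block length parameter, and the use of a seeded generator are assumptions made for the example.

```python
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw n resampled time indices with geometric block lengths."""
    if n < 1 or mean_block < 1:
        raise ValueError("require n >= 1 and mean_block >= 1")
    restart_prob = 1.0 / mean_block
    indices = [rng.randrange(n)]
    while len(indices) < n:
        if rng.random() < restart_prob:
            # Start a new block at a uniformly chosen position.
            indices.append(rng.randrange(n))
        else:
            # Continue the current block, wrapping around the sample end.
            indices.append((indices[-1] + 1) % n)
    return indices
```

Each bootstrap replication would apply one such index draw to the series of loss differentials; repeating this many times (10,000 in the text) gives the bootstrap distribution from which the p-value is read.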

Data and Estimation
The data come from the GTA Information Technology Company. We obtain contract ID,
In Table 2, we report descriptive statistics of three measures adopted to describe the liquidity of futures contracts at the 5-minute interval, which is the highest sampling frequency in our data (a brief discussion of the three liquidity measures is contained in the Appendix). For Aluminum, the Roll spread measure for nearby contracts averages 0.0006, zero returns account for 61% of all 5-minute returns on average in a trading day, and the scaled Amihud measure is 0.23. Comparing these figures to those for the three months to delivery contracts, we notice a marked improvement. In particular, the Roll spread drops to 0.0004, the percentage of zero returns decreases to 36%, and the scaled Amihud illiquidity measure drops to 0.03. The liquidity of the futures contract series subsequently worsens with longer time to delivery. For example, Aluminum contracts with three months to delivery are the most liquid, and liquidity decreases for contracts with longer or shorter time to maturity. The pattern is mirrored in the liquidity estimators for the other commodities as well. Hence, in our volatility estimation and forecasting exercises for Aluminum, Copper, and Fuel Oil, we use futures contracts with three months to delivery, as they are the most liquid among all maturities and their volatility forecasts are the least likely to be biased by the large proportion of zero returns.
While constructing the time series of returns with three months to maturity for Aluminum, Copper, and Fuel Oil, we use the prices of a contract from the third month prior to its delivery month until the contract reaches the first day of the second month prior to its delivery month. We then switch to the next contract, which matures in three months, to form a continuous time series. Hence, for these three commodities, the contract time to maturity is always around three months. For Sugar futures, however, we are mostly interested in the effect that seasonality in trading volume has on volatility forecasting. Therefore, we take contracts from January to December for next January delivery and from November to October for next November delivery. As a result, the contract time to maturity changes over time. The practice of switching contracts to the next delivery month is common in the literature (see, for example, Baillie et al. (2007)).
In our sample, all commodity futures are traded for four hours on a trading day, starting at 9:00 am and closing at 3:00 pm with a two-hour break between 11:30 am and 1:30 pm. As a result, there are 48 5-minute returns on any business day. The (log) return $r_{t,n}$ on a trading day $t$ for the $n$th interval is computed as

$$r_{t,n} = \ln P_{t,n} - \ln P_{t,n-1},$$

where $P_{t,n}$ denotes the commodity futures price on day $t$ at the end of the $n$th interval. The 15-, 30-, and 60-minute returns are obtained by taking the logarithmic difference between prices that are 15, 30, and 60 minutes apart. The daily returns are computed as $r_t = \ln P_t - \ln P_{t-1}$.
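Because log returns are additive, the lower-frequency intraday returns can equivalently be obtained by summing consecutive 5-minute returns. The Python sketch below illustrates this for one trading day with 48 5-minute intervals; the function names are hypothetical.

```python
import math


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def aggregate_returns(returns_5min, k):
    """Sum non-overlapping groups of k five-minute log returns."""
    if k < 1 or len(returns_5min) % k != 0:
        raise ValueError("number of returns must be a multiple of k")
    return [sum(returns_5min[i:i + k])
            for i in range(0, len(returns_5min), k)]
```

With 48 five-minute returns, `k = 3`, `6`, and `12` give 16, 8, and 4 returns per day at the 15-, 30-, and 60-minute frequencies, respectively.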
In Table 3, we provide descriptive statistics of commodity futures contract returns at the 5-, 15-, 30-, 60-minute and daily intervals. We notice that the average returns are very close to zero irrespective of contracts and data frequencies. Returns are left skewed with fat tails, although the degree of negative skewness and excess kurtosis tends to drop with decreasing sampling frequency. In addition, the percentage of zero returns drops considerably from the 5-minute to the daily interval. For example, for Fuel Oil it is 31.50% at the 5-minute interval and 17% at the 15-minute interval, but only 3.60% at the daily level. The trade-off between the improvement in data quality and the loss of information at lower frequencies could be crucial for the outcome of volatility measurement and forecasting exercises.
The volatility of intraday returns is known to display periodicity within a trading day, which could contaminate the estimation of conventional volatility models (Andersen and Bollerslev (1997)). Following Taylor and Xu (1997), we estimate a simple seasonality term $S_{t,n}$ by averaging the squared returns for each intraday period as follows:

$$\hat{S}^2_{t,n} = \frac{1}{T}\sum_{s=1}^{T} r_{s,n}^2,$$

where $T$ is the number of trading days in the full sample period. The deseasonalized intraday returns are obtained as

$$\tilde{r}_{t,n} = \frac{r_{t,n}}{\hat{S}_{t,n}}.$$

We then make use of the deseasonalized returns to estimate the intraday GARCH family of models. In the out-of-sample forecasting, the intraday forecasts are based on the deseasonalized returns and are therefore transformed back to forecasts for the original returns. This is implemented as follows: $\hat{h}_{t,n} = \hat{S}^2_{t,n} \times \tilde{h}_{t,n}$, where $\tilde{h}_{t,n}$ is the intraday variance forecast using the deseasonalized returns and $\hat{h}_{t,n}$ is the transformed variance forecast for the original returns. We produce one-step ahead daily volatility forecasts for the daily models. For the intraday models, we produce 16-, 8-, and 4-step ahead forecasts for the 15-, 30-, and 60-minute intervals, respectively, and aggregate them into daily forecasts. The ARFIMA model is fitted directly to the daily realized volatility aggregated from intraday returns. The out-of-sample forecasts are evaluated against the daily true volatility proxies described earlier. For all sampling frequencies, we use a rolling window forecasting scheme to obtain forecasts from all volatility models.
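The deseasonalization and back-transformation steps can be illustrated with the Python sketch below. It assumes that returns are arranged as one list per trading day with the same number of intraday intervals, and that the daily forecast is the sum of the back-transformed intraday variance forecasts; the function names are hypothetical.

```python
import math


def seasonal_variance(returns_by_day):
    """Average squared return for each intraday interval across all days."""
    n_days = len(returns_by_day)
    n_intervals = len(returns_by_day[0])
    return [sum(day[n] ** 2 for day in returns_by_day) / n_days
            for n in range(n_intervals)]


def deseasonalize(returns_by_day, s2):
    """Divide each return by the seasonal standard deviation of its interval."""
    if any(s <= 0 for s in s2):
        raise ValueError("seasonal variance must be positive in every interval")
    return [[r / math.sqrt(s) for r, s in zip(day, s2)]
            for day in returns_by_day]


def daily_variance_forecast(h_tilde, s2):
    """Rescale intraday variance forecasts and sum them to a daily forecast."""
    return sum(s * h for s, h in zip(s2, h_tilde))
```

An interval in which every return in the sample is zero would have zero seasonal variance, which the sketch rejects explicitly; this is one place where a large share of zero returns can cause practical difficulties.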

In-Sample Results
We report the in-sample parameter estimates of the intraday GARCH, FIGARCH, and IGARCH models for five futures contracts at the 15-, 30-, and 60-minute intervals in Table 4. For the ARMA(1,1)-GARCH(1,1) model specification in Panel A, most of the AR parameter estimates $\hat{\gamma}$ are statistically significant at conventional levels. Also, the MA parameter estimate $\hat{\theta}$ is significantly negative in most cases, capturing the first order negative autocorrelation in the returns. All the parameters in the conditional variance equations are highly significant at the 1% level except $\hat{\alpha}$ for 15-minute Copper contracts. The fact that $\hat{\alpha} + \hat{\beta} < 1$ reveals that the GARCH process is stationary, and, since $\hat{\alpha} + \hat{\beta}$ is close to 1, the volatility process is persistent. For the contract series with return innovations following a Student's t-distribution, the degrees of freedom parameter is between 2 and 4 and statistically significant at the 1% level. This indicates a fat tail in the return distributions.
In Panel B, when the volatility process is described by an ARMA(1,1)-FIGARCH(1,d,1) model, we notice that the parameter $d$, the order of fractional integration, is significantly different from zero at the 1% level for all futures contract series. This implies that the volatility process exhibits a long memory property and attests to the importance of adding this feature in the volatility dynamics of the commodity futures contract returns under scrutiny. It is also worth noting that, similar to the results in Panel A, the degrees of freedom parameter $v$ is highly significant. Panel C shows the parameter estimates of the ARMA(1,1)-IGARCH(1,1) model specification and the results are qualitatively similar to those in Panel A.
Table 5 shows the in-sample parameter estimates for the daily GARCH, FIGARCH, and IGARCH models. These results are qualitatively similar to those in Table 4. We observe: (1) negative and significant first order autocorrelation in the conditional mean equation for each model and contract except for the daily IGARCH model using the Sugar contract with January delivery; (2) statistically significant $\hat{\beta}$ parameters; (3) highly significant fractional integration parameters $\hat{d}$; and (4) highly significant degrees of freedom parameters $\hat{v}$.
We present the in-sample parameter estimates of the ARFIMA model using the daily realized volatility obtained from the 15-, 30-, and 60-minute returns in Table 6. For Aluminum, Copper, and Fuel Oil, we set the MA term $q = 0$ as it is statistically insignificant at any conventional level. The first order autoregression term $\hat{p}$ is negative and highly significant and the fractional integration term $\hat{d}$ hovers around 0.4 for each of these three commodities. In the cases of the January and November contracts for Sugar, the first order autocorrelation $\hat{p}$ tends to be positive and quite often significant. The MA parameter $\hat{q}$ is close to −0.4 and significant at the 1% level. Similar to the other commodities, the fractional integration parameter estimate for Sugar is in the vicinity of 0.45 and is highly significant.
Overall, the in-sample estimates of the GARCH, FIGARCH, IGARCH, and ARFIMA models reported in Tables 4 to 6 using intraday and daily data reveal that, for the four commodities, the return innovations are generally negatively autocorrelated with fat tails. Moreover, the underlying volatility processes are persistent with clear evidence of long memory properties.

Out-of-Sample Predictions
Taken together, we notice three interesting and consistent patterns from the preliminary results in Table 7. First, the ARFIMA model, with its long memory dimension, dominates the other three volatility models in 11 out of 15 commodity/true volatility proxy combinations. Second, GARCH-type models using daily data outperform similar models using intraday data. Third, the ARFIMA model applied to the daily realized volatility obtained from the higher frequency returns (i.e., 15-minute returns) does not always beat the ARFIMA model using the daily realized volatility computed from the lower frequency returns. The latter two observations are novel for our chosen futures market because the literature seems to agree that intraday data enjoy an informational advantage over daily data and that the forecasting performance of the ARFIMA model improves with sampling frequency (Martens (2001) and Martens and Zein (2004)).
In Table 8, we provide pair-wise comparison following the well-known Diebold and Mariano (1995) and West (1996) Tables A1 to A3. These additional results corroborate the conclusion in Table 8 that the benchmark, chosen as the one with the lowest RMSFE in Table 7, is indeed the one with the best volatility forecasting ability.
In Table 9, we perform the SPA test of Hansen (2005) to examine out-of-sample forecasting ability across all competing models and compute the stationary bootstrap p-values. The null hypothesis is that the benchmark model, the one with the lowest RMSFE, is not inferior to any of the competing models. The test results are unambiguous: the bootstrap p-values are equal to 1 or very close to it, so the null that the benchmark model is at least as good as the competing models in out-of-sample volatility forecasting cannot be rejected. Taken together, the results in Tables 8 and 9 clearly confirm and substantiate the observations in Table 7. In other words, when intraday data are directly used in the GARCH-type models, they are no better than daily data for volatility forecasting even after deseasonalization. Hence, if a model is to be recommended for volatility forecasting in the Chinese futures market, it would be the ARFIMA model, as it is consistently the best performing model or not inferior to the best performing one statistically.
Finally, we note that although Sugar contracts for January and November deliveries differ in terms of trading volume and liquidity, the underlying volatility dynamics is very similar. The in-sample parameter estimates are similar between these two series and both are best forecasted by the same model. When the 5-minute realized volatility is the proxy for true volatility, the ARFIMA model using the realized volatility computed from the 15-minute returns produces the most accurate forecast for both series, while the ARFIMA model applied to the realized volatility computed from the 60-minute interval outperforms competing models for the other two volatility proxies for both series. In other words, seasonality in trading volume and differences in liquidity do not affect volatility model selection.

Conclusion
In this paper, we undertake a comprehensive volatility forecasting exercise in a futures market with unique institutional regulations. In the Chinese commodity futures market, the margin rate is time-dependent and investors face a higher deposit as contracts move closer to maturity. In addition, although individuals account for the majority of investors, they are not allowed to trade nearby contracts. These two regulations result in a liquidity pattern whereby contracts with three months to delivery are the most liquid, and we demonstrate this by computing three popular liquidity measures with 5-minute intraday data for Aluminum, Copper, Fuel Oil, and Sugar. In addition, even these most liquid contract series contain a large percentage of zero returns at the 5-minute interval.
We explicitly take these features into account when forecasting volatility and utilize the more distant three months to maturity contracts at the daily and three different intraday sampling frequencies. We demonstrate that the long memory dimension is present in our data in the in-sample volatility modeling. When it comes to out-of-sample forecasting, we show that the ARFIMA model, which aggregates intraday returns to the daily level in generating daily forecasts, is the best-performing model, or equivalent to the best-performing model in statistical terms. The FIGARCH model, which also incorporates the long memory feature in the volatility dynamics, is less efficient in generating forecasts, probably because large proportions of intraday returns are zero and the deseasonalized intraday returns are directly fed into the model.
Furthermore, we show that within the GARCH family of models, the forecasting performance using the daily data is consistently as good as, if not better than, that using the intraday data, which also attests to the trade-off between information and noise in intraday data with many zero returns. Finally, it is interesting to note that even though the January and November contract series for Sugar differ massively in terms of trading volume, their underlying volatility dynamics are well captured and forecasted by the ARFIMA model at the same data sampling frequency.

Acknowledgement
We

Roll Spread
In the seminal paper of Roll (1984), a simple serial covariance spread estimation model is developed to capture asset liquidity. The effective spread is derived from the serial covariance properties of transaction price changes. The model has led to a burgeoning research area in the market microstructure literature with many modifications and extensions (see George et al. (1991), Chang and Chang (1993), and the references therein).
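To illustrate, let $E$ and $P_t$ denote the effective spread and the closing price on day $t$, respectively, and let $\Delta$ be the change operator. Roll (1984) shows that the serial covariance between changes in prices is

$$\mathrm{Cov}(\Delta P_t, \Delta P_{t-1}) = -\frac{E^2}{4}.$$

In this paper, we follow Goyenko et al. (2009) and adopt a modified version of the Roll (1984) spread so that we can always obtain a numerical value for this liquidity measure. Denoting the price change over the $n$th time interval as $\Delta P_n$, the effective spread can be expressed as follows:

$$E = \begin{cases} 2\sqrt{-\mathrm{Cov}(\Delta P_n, \Delta P_{n-1})} & \text{if } \mathrm{Cov}(\Delta P_n, \Delta P_{n-1}) < 0, \\ 0 & \text{if } \mathrm{Cov}(\Delta P_n, \Delta P_{n-1}) \geq 0. \end{cases}$$

Hence, the lower the effective spread, the higher the liquidity of the asset.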
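The modified Roll spread can be sketched in Python as follows. The sketch assumes the Goyenko et al. (2009) convention of setting the spread to zero when the first-order serial covariance of price changes is non-negative, and it uses the sample covariance with an $m - 1$ divisor; the function name is hypothetical.

```python
import math


def roll_spread(price_changes):
    """Modified Roll effective spread from a series of price changes."""
    current = price_changes[1:]
    lagged = price_changes[:-1]
    m = len(current)
    if m < 2:
        raise ValueError("need at least three price changes")
    mean_c = sum(current) / m
    mean_l = sum(lagged) / m
    cov = sum((c - mean_c) * (l - mean_l)
              for c, l in zip(current, lagged)) / (m - 1)
    # A non-negative covariance gives no real-valued spread; report zero.
    return 2.0 * math.sqrt(-cov) if cov < 0 else 0.0
```

Price changes that alternate in sign, as with bid-ask bounce, give a negative covariance and a positive spread, whereas a steadily trending price gives a spread of zero.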

Proportion of Zero Returns
The second liquidity measure we exploit is proposed in Lesmond et al. (1999) and proves especially useful and effective in studying liquidity of emerging markets (see, among others, Bekaert et al. (2007) and Lesmond (2005)). This measure is based on the transaction cost, that is, if the value of an information signal is insufficient to outweigh the cost associated with trading, market participants will choose not to trade, resulting in a zero return. The measure is easy to implement since it only requires a time series on transaction data. In this paper, the proportion of zero returns in a trading day is defined as follows:

$$\text{Zeros} = \frac{\#\ \text{of intraday time intervals with zero returns}}{N},$$

where $N$ is the total number of time intervals in a trading day ($n = 1, 2, \ldots, N$). Intuitively, the lower the proportion of zero returns, the better the liquidity of the asset.
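
The proportion of zero returns is simple enough to compute directly. The following is a minimal sketch, assuming that one trading day is supplied as N + 1 consecutive interval prices, so that N intervals are formed and a zero return corresponds to an unchanged price. The function name `zero_return_proportion` is hypothetical.

```python
def zero_return_proportion(prices):
    """Share of intraday intervals with a zero return (hypothetical helper)."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to form one interval")
    n_intervals = len(prices) - 1  # N in the definition above
    # A log return is zero exactly when the price is unchanged over the interval
    zeros = sum(1 for prev, curr in zip(prices, prices[1:]) if curr == prev)
    return zeros / n_intervals
```

For example, the price path 10, 10, 11, 11, 11 contains four intervals, three of which have an unchanged price, giving a proportion of 0.75.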

Amihud Illiquidity Measure
The illiquidity measure of Amihud (2002) is another popular estimator in the literature (see, among others, Baker and Stein (2004) and Amihud et al. (2012)). It is a price impact measure that captures the price response associated with one unit of currency of trading volume. Hence, the lower the illiquidity measure, the better the asset liquidity. More precisely, it is defined as the ratio $|r_n| / \text{Volume}_n$, where $r_n$ is the log asset return over the $n$th time interval and $\text{Volume}_n$ is the US dollar (in our case, Renminbi) trading volume over the same interval.

The table reports the in-sample parameter estimates of the intraday GARCH, FIGARCH, and IGARCH models. In all panels, estimates are obtained using 15-, 30-, and 60-minute deseasonalized intraday returns. The models are estimated using quasi-maximum likelihood with Student's t-distributed innovations with v degrees of freedom. The exceptions are the GARCH model at the 15-minute interval for Fuel Oil and the GARCH, FIGARCH, and IGARCH models at the 15-, 30-, and 60-minute intervals for Sugar (November), which are estimated assuming a normal distribution. Numbers in parentheses are t-statistics, and ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The in-sample period for each commodity futures contract is reported in Table 1.

The table reports the in-sample parameter estimates of the daily GARCH, FIGARCH, and IGARCH models. The models are estimated using quasi-maximum likelihood with Student's t-distributed innovations with v degrees of freedom. Numbers in parentheses are t-statistics, and ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The in-sample period for each commodity futures contract is reported in Table 1.

The table reports the in-sample parameter estimates of the ARFIMA(p, d, q) model using the daily realized volatility computed from the 15-, 30-, and 60-minute returns. Numbers in parentheses are t-statistics, and ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The in-sample period for each commodity futures contract is reported in Table 1.

This table reports the daily out-of-sample RMSFEs ($\times 10^{-5}$) for all models relative to the true volatility proxies: 5-minute realized volatility (Panel A), median-based volatility (Panel B), and range-based volatility (Panel C). The out-of-sample period for each commodity futures contract is reported in Table 1.

Table 8. Diebold and Mariano (1995) and West (1996) Test Results
The table reports the test statistics of the Diebold and Mariano (1995) and West (1996) test based on the Andrews and Monahan (1992) estimator. The benchmark models are those with the lowest RMSFE in Table 7. The forecast errors are computed relative to 5-minute realized volatility (Panel A), median-based volatility (Panel B), and range-based volatility (Panel C) measures. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The out-of-sample period for each commodity futures contract is reported in Table 1.

Appendix

Table A1. Diebold and Mariano (1995) and West (1996) Test Results: 5-Minute Volatility Proxy
The table reports the test statistics of the Diebold and Mariano (1995) and West (1996) test based on the Andrews and Monahan (1992) estimator. Based on the results of the RMSFE presented in Table 7, the benchmark models are chosen in terms of increasing RMSFE. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The forecast errors for all models are computed relative to the 5-minute measure of true volatility. The out-of-sample period for each commodity futures contract is reported in Table 1.

Table A2. Diebold and Mariano (1995) and West (1996) Test Results: Median-Based Volatility Proxy
The table reports the test statistics of the Diebold and Mariano (1995) and West (1996) test based on the Andrews and Monahan (1992) estimator. Based on the results of the RMSFE presented in Table 7, the benchmark models are chosen in terms of increasing RMSFE. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The forecast errors for all models are computed relative to the median-based measure of true volatility. The out-of-sample period for each commodity futures contract is reported in Table 1.

Table A3. Diebold and Mariano (1995) and West (1996) Test Results: Range-Based Volatility Proxy
The table reports the test statistics of the Diebold and Mariano (1995) and West (1996) test based on the Andrews and Monahan (1992) estimator. Based on the results of the RMSFE presented in Table 7, the benchmark models are chosen in terms of increasing RMSFE. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively. The forecast errors for all models are computed relative to the range-based measure of true volatility. The out-of-sample period for each commodity futures contract is reported in Table 1.