Has the VIX index been manipulated?

Recently, an influential academic study and many lawsuits have claimed that the VIX index has been manipulated since 2008. In this paper, we construct a regression model with explanatory variables that are exogenous to the index and examine the model prediction errors. We find that the movements in the daily levels of the VIX index are explained by market fundamentals and not by manipulation. We also specifically examine the VIX futures expiration days and demonstrate that the VIX closing values and VIX futures settlement prices on those days are consistent with normal market forces and are not artificial.


Introduction
The Chicago Board Options Exchange Volatility Index (VIX) is the most popular measure of the market's expectation of volatility over the near-term future. Introduced in 1993, the VIX is considered to be the premier gauge of investor sentiment. Since the index tends to rise during falling markets, it is often called 'the fear index.' The VIX hit a high of over 80 during the 2008 market meltdown. During the slowly rising market of 2017, the VIX averaged around 11. There has been a growing acceptance of VIX and VIX-linked products (such as VIX futures and options) for use as risk management tools, and trading of these instruments has expanded dramatically over time. Because of their excellent liquidity and their negative correlation with broad stock market movements, VIX-linked products are particularly useful hedging instruments. Portfolio managers can mitigate downward movements in the general level of stock prices by buying volatility, i.e., by buying VIX futures and options.
The market for VIX-related financial instruments, such as futures, options and exchange-traded funds, has grown steadily over the years. While estimates of VIX exposure vary, one analyst suggests it could be as high as $60 billion. 1 Given the size and reach of numerous financial products linked to the VIX, any artificial inflation or deflation of the index can have widespread ramifications, including substantial damages suffered by various parties.

The claims that the VIX has been manipulated
The VIX index gauges the implied 30-day volatility of the market calculated from options on the Standard and Poor's 500 stock index (S&P 500). Futures and options on the VIX itself have a relatively large volume of trading. The value of the VIX is calculated from a wide range of out-of-the-money options, and some of the far out-of-the-money options are thinly traded. In a recent study, Griffin and Shams (2018) argued that a market participant could manipulate the VIX futures settlement prices by trading in the thinly traded, far out-of-the-money options used to calculate the VIX. They found that during settlement periods, volume spikes have occurred in these thinly traded options referencing the VIX. They concluded that such trading patterns are consistent with market manipulation during the period 2008 to the present.
The Griffin and Shams (2018) study has been quite influential and has been cited in many recent lawsuits alleging manipulation of the VIX. The plaintiffs in these suits claim, for example, that 'a select group of financial institutions and trading firms with sophisticated, expensive technology' 2 are engaging in 'rampant manipulations of the VIX index.' 3 News media outlets have also paid considerable attention to this issue. For example, The New York Times cites market experts who believe that traders 'who persistently short the VIX have distorted the market,' 4 and Barron's reported in April 2018 that investors 'suspected that someone was trying to manipulate the VIX, which had spiked suddenly just a few weeks before, roiling financial markets.' 5
The goal of this paper is to take a direct approach to examining the hypothesis that the VIX has been manipulated from 2008 to the present. Surely, the proof of the pudding is whether VIX levels themselves have displayed a different pattern during the period from 2008 to the present than they did in earlier periods. We also examine whether the VIX levels were artificially inflated or deflated on the VIX futures and options expiration days. It is important to note that our analysis uses daily closing levels of the VIX. As a result, in this paper we do not examine whether the effect, if any, of the alleged manipulation of the VIX lasted only for a brief period of time within the futures expiration days. Such an analysis, which requires intraday data, is beyond the scope of this paper.
This paper's analyses have two components. First, we examine the daily closing levels of the VIX over the past two decades. We fit a model that regresses the VIX on a set of regressors using data from the 10-year period 1998-2007 (the 'in sample'); we then use the same set of regressors to predict the VIX over the period at issue: 2008 through April 2018 (the 'out of sample').
If the VIX were manipulated since 2008, then the out-of-sample prediction errors (i.e., the regression residuals) would be expected to be higher than the 'in-sample' errors. This is because, if there was manipulation, then its effects would become evident in the errors, which reflect the portion of the VIX's movement unexplained by the model's regressors, which are free from any manipulation claims. Various tests show that the 'out-of-sample' prediction errors are not higher than the 'in-sample' ones. These results imply that in both periods (the period at issue and the 10 years preceding it) market forces explain the VIX and there is no indication of artificiality in the level of the VIX. 6 The only time period within 2008-2018 where the prediction errors are unusually large is the fourth quarter of 2008. However, as is widely known, these 3 months were characterized by extreme market turbulence and the VIX spiked to record levels. Thus, large prediction errors during this 3-month period are to be expected because a simple linear regression model cannot capture the full impact of the unprecedented market disruptions and the resulting jumps in the VIX index.
The second component of this paper's analyses focuses on the expiration days of VIX futures. Recall that Griffin and Shams found unusual trading patterns on these expiration days; we therefore inspect whether there is evidence of manipulation on these specific days. In particular, we examine how many of the 234 VIX futures and options expiration days in the 2008 to April 2018 period have statistically significant residuals. We find that the number of significant days is no different from what one would expect based on random chance, and thus, our results do not support a claim of manipulation on VIX futures expiration days. We also examine whether the settlement prices of the VIX futures contracts were artificial. Again, we find that the empirical evidence does not support the claim of artificiality of the settlement prices. In sum, we have examined the claim of manipulation for: (a) the period 2008 to the present, (b) the VIX closing values on settlement days and (c) the VIX futures settlement prices. All three analyses suggest that the VIX is explained by market fundamentals 7 and strongly reject the hypothesis of manipulation.
This paper is organized as follows. In the next section, we provide a brief background for the VIX and review the relevant literature. In the third section, we describe the data and methodology used for our analyses; we also set out the regression model and the general framework for the analysis. In the fourth section, we discuss the regression results, focusing on the various parametric and nonparametric tests of the difference between 'in-sample' and 'out-of-sample' regression residuals. In the fifth section, we describe the data on VIX futures expiration days and then examine whether the residuals on those days are statistically significant. In the final section of the paper, we briefly discuss why our results are not necessarily incongruent with the findings of Griffin and Shams.

Relevant background and the literature
The Chicago Board Options Exchange (CBOE 2018) provides an excellent description of the history and mechanics of the VIX. At its inception in 1993, the VIX was based on the implied volatility of at-the-money S&P 100 option prices. The intent was to provide a reliable estimate of short-term stock market volatility and to 'offer a market volatility 'standard' upon which derivative contracts may be written.' 8 The method of calculation of the index was revised in 2003. The new method, currently employed, estimates expected volatility by using the weighted average prices of a wide range of strikes of puts and calls of S&P 500 options expiring in approximately 30 days. The methodology was further refined in 2014 to include S&P 500 weekly options. CBOE (2018) documents that an inverse relationship between the market and the VIX tends to hold roughly 80% of the time. Despite the fact that the VIX is often viewed as a hedge against market downturns or a proxy for investor sentiment, it is important to note that the VIX is simply a formulaic representation of derived 30-day forward volatility expectations based upon S&P 500 option prices. As co-creator Devesh Shah explains, the VIX is akin to measuring 'the temperature outside. If it's the winter it's going to be really [cold], and if it's summer it's going to be really hot. It's not the cold index or the heat index, it's just the temperature.' 9

The prior literature
The prior literature relevant to our study falls into two broad areas: The first has examined the market forces that drive the VIX, and the second relates to the methods for detecting manipulation.
Three recent studies have focused on factors explaining the movement of the VIX. Hait (2016) examined the relationship between the VIX and S&P 500 returns and found that 98.8% of the daily variation in the VIX can be explained by current S&P returns and lagged VIX values. Vodenska and Chambers (2013) compared the movement in the VIX with 22-day realized S&P volatility and found that over 80% of the variation in the VIX is explained by this one variable alone. Ozair (2014) focused on the impact of market shocks on the index. He found that the impacts of a shock persisted in the VIX and these shocks account for nearly 70% of the variation in the VIX.
Other studies have looked at the asymmetric movement of the VIX. For example, Zakamulin (2016) found that the durations for the periods of rising and falling VIX are unequal: The timespan of falling periods exceeds that of rising VIX by a factor of 1.4. This finding suggests that the VIX often experiences sudden leaps, which take a fairly long time to subside. Chow et al. (2018) found that the VIX estimation errors (the differences between realized volatility and the VIX) are considerably larger during volatile markets.
Turning next to the literature on manipulation, we note that investors, academics and regulators generally fail to agree on the definition of manipulation. Furthermore, they disagree on how manipulation can be discerned from transactions, if at all. This is, in large part, because in any market, each order or transaction, particularly larger ones, can affect market prices. As a result, distinguishing manipulative transactions from legitimate ones using price effects can be challenging.
One branch of the literature on market manipulation posits that the intent behind the trading activity is the key to determining whether manipulation occurred. For example, Perdue (1987-1988) focused on whether the conduct of the relevant party involved was reasonable. If the conduct was uneconomical or irrational, then such conduct could indicate manipulative intent. Rather than focusing on the unobservable 'intent' of a trade, Pirrong (1996) proposed an economic model of manipulation based on observable variables and discussed the conditions that facilitate manipulation. In a subsequent article, Pirrong (2004) set out a number of econometric tests to detect price and quantity patterns symptomatic of manipulation.
Abrantes-Metz et al. (2013) used data on prices, bids, quotes, spreads, market shares, and especially volumes to identify patterns that are anomalous or highly improbable, and concluded that such patterns could indicate manipulation. The Griffin and Shams (2018) study has a similar theme. They focused on the 'highly unusual' trading activity in the underlying options used to determine settlement prices on VIX futures expiration dates. The final settlement price for expiring VIX futures is determined during a 30-min period known as the Special Opening Quotation (SOQ) on futures expiration days. Griffin and Shams (2018) observed volume spikes in the S&P 500 option book during the SOQ. They then examined various alternative explanations of these volume spikes and found that none was supported by data. As a result, they concluded that these spikes were consistent with manipulation of the VIX.
8 CBOE (2018).
9 Max Abelson and Joe Weisenthal, 'An Inventor of the VIX: "I Don't Know Why These Products Exist,"' Bloomberg, February 6, 2018.
Most prior empirical studies on manipulation have examined the price movements of the financial instruments or commodity at issue and not the trades of the alleged manipulator. This is because data on an alleged manipulator's trades are proprietary and are virtually impossible to acquire. However, empirical studies related to the Amaranth matter were exceptions. Amaranth LLC was a hedge fund that closed down in 2006 and faced allegations of manipulation of natural gas futures. The Senate investigation yielded data on Amaranth's natural gas futures trades, and a number of studies have relied on this dataset. For example, Marthinsen and Gai (2010a) used a Granger causality model to analyze whether Amaranth's trades affected the prices of natural gas futures in 2006. In a follow-up article, the authors examined whether Amaranth's spread trading affected prices of calendar spreads, particularly the winter-summer spreads (Marthinsen and Gai, 2010b). Saha and Petersen (2012) also used the Amaranth dataset. They proposed a method to examine both whether prices were artificial and whether the alleged manipulator's trades caused the price artificiality. Their methodology involved creating a model to explain futures prices using market fundamentals and then examining the correlation between the 'errors' (that is, the difference between actual and model-predicted prices) and the alleged manipulator's trades.
This article contributes to the existing literature by developing a framework for the examination of VIX manipulation claims using the relationship between the VIX and market fundamentals. Like Saha and Petersen (2012), we examine the pattern of the estimation model's 'errors' to determine whether manipulation of the index occurred. As noted earlier, some prior studies have included a lagged value of the VIX as an explanatory variable in the regression model for the VIX. We do not include this variable in our model because our goal is to examine whether or not manipulation occurred; including a lagged value of a variable that may itself contain the effects of the alleged manipulation would contaminate our analysis.

The data and the general methodology
The CBOE Web site provides historical data on the VIX using the current methodology backfilled to 1990. As noted earlier, CBOE revised its methodology of calculating the VIX in 2003 and in 2014. However, CBOE provides data on the VIX going back to its inception, with the daily closing values of the index calculated using the most current (i.e., 2014) methodology. Throughout this paper, we have used these backfilled data on VIX, which precludes the possibility of the VIX values being affected by changes in the methodology of calculating the index.
Our analysis uses the daily closing values of the index for the roughly 20-year period, 1998 through April 2018. These two decades allow us to use a 'clean' period (i.e., 1998-2007) with a similar length of time to the period in question (i.e., January 2008-April 2018). We also gathered data on the daily closing values of the S&P 500 index and computed its daily log-returns, denoted by 'Spr' in the table below. This variable was then used to compute the two key regressors: the 20-day rolling volatility and the 5-day rolling mean of the S&P daily returns. These two variables are denoted by 'Spv' and 'Spm.' We also created two indicator variables for a day with a positive (sp+) and a negative (sp-) return for the S&P index. The summary statistics for these variables are reported in Table 1. 10
Before undertaking the regression analysis, we examined whether the time series on the VIX index is stationary. In particular, we implemented the augmented Dickey-Fuller (DF) test to determine whether the VIX follows a unit root process. 11 The test results strongly rejected the null hypothesis that the VIX follows a non-stationary process, that is, that it has a unit root. 12 Based largely on these results of the test for stationarity, and as in many prior studies, we have chosen to use a levels model of the VIX. The explained variable in the regression model is the level of VIX rather than the daily changes in the index. However, we observed that the mean level of the VIX in the 2008-2018 period was slightly lower than the preceding 10 years' level. We thus de-meaned the explained variable, that is, for each of the two sub-periods we subtracted the respective means from the daily VIX values.
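The construction of these variables can be illustrated with a minimal sketch. It assumes a plain list of daily closing levels; the function names (`log_returns`, `rolling`, `build_regressors`, `demean`) and the dictionary keys `sp_pos` and `sp_neg` are hypothetical, and the volatility is taken to be the unannualized sample standard deviation of daily log-returns, which the text does not specify.

```python
import math
import statistics


def log_returns(closes):
    """Daily log-returns from a list of closing index levels."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def rolling(values, window, func):
    """Apply func to each trailing window; the first window-1 entries are None."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(func(values[i - window + 1 : i + 1]))
    return out


def build_regressors(closes, vol_window=20, mean_window=5):
    """Return Spr, Spv, Spm and the two sign indicators (names follow the text)."""
    spr = log_returns(closes)
    # Assumption: volatility is the sample standard deviation, not annualized.
    spv = rolling(spr, vol_window, statistics.stdev)
    spm = rolling(spr, mean_window, statistics.fmean)
    sp_pos = [1 if r > 0 else 0 for r in spr]  # positive-return day
    sp_neg = [1 if r < 0 else 0 for r in spr]  # negative-return day
    return {"Spr": spr, "Spv": spv, "Spm": spm, "sp_pos": sp_pos, "sp_neg": sp_neg}


def demean(values):
    """Subtract the sub-period mean, as done separately for each sub-period."""
    m = statistics.fmean(values)
    return [v - m for v in values]
```

In this sketch, `demean` would be applied once to the 1998-2007 VIX closes and once to the 2008-2018 closes, so that each sub-period has a mean of zero.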
The explanatory variables chosen for the regression are generally consistent with the prior studies discussed earlier.
In particular, we posit that the VIX is explained by two key variables: the 20-day realized volatility and the realized average 5-day returns of the S&P 500 index; each of these two variables is lagged by a day and interacted with indicator variables for a positive or a negative return day. The regression equation is shown in (1).
In choosing the regressors in (1), we were careful not to include variables that could have been affected by the alleged manipulation. For example, as indicated earlier, this was our rationale for not including a lagged value of the VIX. 13 Similarly, we chose not to use contemporaneous values of the regressors and used lagged values instead. 14 One might argue that any given day's level of the VIX can potentially affect that day's realized volatility, thereby creating a simultaneity problem in the regression analysis. The inclusion of the lagged values of the regressors avoids this problem because any given day's VIX level cannot affect the preceding day's values of realized volatility.
10 See 'Summary statistics for two sub-periods' of 'Appendix' for the summary statistics for the two separate periods: 1997-2007 and 2008-2018.
11 The augmented DF test was implemented by fitting the model (Dickey and Fuller 1979). Testing β = 0 is equivalent to testing that the process has a unit root, i.e., that it is non-stationary. For the VIX, the null hypothesis H_0: β = 0 was tested for up to 20 lags (i.e., k = 20) and was overwhelmingly rejected in each case. See 'Dickey-Fuller test for unit root' of 'Appendix' for further details.
12 We ran these tests through the entire 20-year time period, as well as the pre-2008 and post-2008 periods, separately. Finally, we also ran these tests by excluding the fourth quarter of 2008. In each case, the results of the analyses were the same: The non-stationarity of VIX was rejected. For details, see 'Dickey-Fuller test for unit root' of 'Appendix.'
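Equation (1) did not survive in this copy of the text. The following is only a sketch of one specification consistent with the verbal description (two regressors, each lagged by a day and interacted with a positive-day and a negative-day indicator, plus a constant, which matches the five coefficients mentioned later). The coefficient symbols and the timing of the indicators (day t-1 rather than day t) are assumptions, not the authors' stated form.

```latex
\mathrm{VIX}_t = \alpha
  + \beta_1\,\mathrm{Spv}_{t-1}\,sp^{+}_{t-1}
  + \beta_2\,\mathrm{Spv}_{t-1}\,sp^{-}_{t-1}
  + \beta_3\,\mathrm{Spm}_{t-1}\,sp^{+}_{t-1}
  + \beta_4\,\mathrm{Spm}_{t-1}\,sp^{-}_{t-1}
  + \varepsilon_t
```

Here VIX_t denotes the de-meaned daily closing level and ε_t the regression error whose in-sample and out-of-sample behavior is compared in the sections that follow.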
The model in (1) was estimated using daily data from the 'clean' period or the in-sample period (i.e., January 1998-December 2007). We then used the estimated coefficients to predict the daily levels of the VIX during the out-of-sample period, that is, the period at issue (i.e., January 2008-April 2018). We then compared the regression errors (i.e., the residuals) between the actual and the predicted levels of the VIX in the two periods to determine whether they are statistically significantly different. If manipulation of the index was evident, then one would expect the residuals (absolute or squared value) to be larger in the post-2008 period. 15 As noted earlier, we addressed the issue of whether VIX levels were different across the two periods by de-meaning VIX in both periods. We have also tested whether the volatility of VIX was different in the post-2008 period relative to the pre-period. For example, if the volatility of VIX (the dependent variable in the regression) was lower in the 2008-2018 period, it could lead to smaller residuals for that period; however, in that case, the smaller residuals would not indicate a better fit of the model, but rather would be an artifact of lower volatility of the dependent variable. However, our tests showed that the volatility of VIX was actually higher in the 2008-2018 period. The details of these tests and the results are contained in the 'Bartlett's test for equal variance of VIX' of 'Appendix' to this paper.
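The estimate-then-predict procedure described above can be sketched as follows. This is a minimal illustration using ordinary least squares solved from the normal equations in pure Python; the helper names (`solve`, `ols_fit`, `residuals`) are hypothetical, and a numerical library would normally be preferred for real data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(X, y):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(coefs, X, y):
    """Actual minus predicted values, using coefficients estimated elsewhere."""
    return [
        yi - (coefs[0] + sum(c * x for c, x in zip(coefs[1:], r)))
        for r, yi in zip(X, y)
    ]
```

Under these assumptions, `ols_fit` would be called once on the 1998-2007 rows; `residuals` would then be called with those same coefficients on both the in-sample rows and the 2008-2018 rows, so that the out-of-sample errors are genuine prediction errors.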

The results of the analyses
The regression results are shown in Table 2. The estimation model's predictions are the focus of our analysis and not the statistical significance of the estimated coefficients. However, as the t statistics in Table 2 indicate, all the coefficients are statistically significant. 16
[Table 2: regression results. Adjusted R^2 = 0.7392]
In Fig. 1, we display the actual and predicted levels of the VIX, for both the in-sample and out-of-sample periods. As is evident from this figure, the model performs well in predicting the daily level of the VIX during both periods. 17 The plot of the regression residuals is shown in the 'Regression residuals' of 'Appendix.'
In order to test the predictive accuracy of the model, we examined several measures. One such widely used measure is Theil's U statistic. 18 A lower value of the statistic indicates higher predictive accuracy. We found that the value of Theil's U was considerably lower in the 2008-2018 period than in the 10 years preceding it. This result shows that the regression model has a higher predictive accuracy during the period at issue than in the pre-2008 period. 19 This finding was also corroborated by other measures of the predictive accuracy [such as mean absolute error (MAE) and root mean squared error (RMSE)] discussed later in the paper.
13 Had we included VIX_{t-1} in the right-hand side of (1), the explanatory power of the model would have been significantly higher (the adjusted R^2 becomes 0.96 versus 0.74 without including VIX_{t-1}).
14 Had we utilized same day (i.e., un-lagged) values of the regressors in the right-hand side of (1), the explanatory power of the model improves somewhat and all the main findings of the paper remain unchanged.
15 Note, because the VIX has been de-meaned, the mean value of the index in the two periods is exactly zero. Thus, the difference in the absolute value or squared value of the residuals from the two periods cannot be explained by the differences in the two periods' average level of the VIX, the explained variable in the model. Furthermore, since the residuals during the period at issue are out-of-sample residuals, a potential manipulator would be able to move the VIX level to the model-predicted value (thereby decreasing the size of the residuals) only if the manipulator could know with certainty the in-sample modeled relationship between the regressors and VIX, and be certain that the same relationship would continue to hold during the period at issue.
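The accuracy measures named above can be sketched in a few lines. Theil's U is written here in the form commonly given in econometrics textbooks, the root of the sum of squared prediction errors divided by the sum of squared actual values; the variant actually computed in the paper's appendix is not shown in this text, so that form is an assumption, and the function names are hypothetical.

```python
import math


def mae(resid):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in resid) / len(resid)


def rmse(resid):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in resid) / len(resid))


def theil_u(actual, predicted):
    """Theil's U in the assumed form: sqrt(sum (y - yhat)^2 / sum y^2)."""
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum(a * a for a in actual)
    return math.sqrt(num / den)
```

In this form a perfect forecast gives U = 0, and a forecast that always predicts zero gives U = 1, which is why lower values indicate higher predictive accuracy.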

Testing for structural break
Before undertaking a comparative analysis of the in-sample and out-of-sample residuals, we tested whether the relationship between the market fundamentals and the VIX shows evidence of structural break in the post-2008 period. In particular, we estimated our regression Eq. (1) using two different periods' data: 1998-2007 and 2008-2018. We then tested whether the estimated coefficients of the regressions using the two periods' data were significantly different. 20 The results of the tests for equality of the estimated coefficients are shown in Table 3. In this table, we report the p values of the test under the null hypothesis that the two periods' estimated coefficients are equal to each other, which would imply an absence of structural break.
16 We also estimated the model using the Newey-West correction for heteroskedasticity and autocorrelation. All coefficients remain statistically significant.
17 For the purposes of depiction of the actual and predicted values of the VIX in Fig. 1, we have used a model that has the actual and not de-meaned value of the index. However, with the exception of this chart, we have consistently used the de-meaned value of VIX for all empirical analyses in this paper.
18 Greene (1997), p. 373.
19 The details of the Theil's U computation are discussed in 'Theil's U statistics' of 'Appendix.'
20 Further details of the tests for structural break are contained in the 'Testing for structural break' of 'Appendix.'
The p values in column [A] of this table are indicative of a structural break: Three of the five coefficients are significantly different from each other. However, further investigation of this issue reveals that the results are driven by data from the fourth quarter of 2008, when the VIX rose to its highest level ever recorded. In column [B], we undertake the same tests of equality of the coefficients, but this time, the data for the first regression remain the same, while the second regression uses data for 2008-2018 period excluding the fourth quarter of 2008 (henceforth 2008 Q4).
The p values in column [B] show that none of the coefficients are significantly different from each other, indicating the absence of a structural break. In other words, the relationship between the explanatory variables and the VIX is essentially unchanged in the two periods, when the second period excludes the 2008 Q4. The importance of these unusually volatile 3 months will also be apparent when we undertake the comparative analysis of the prediction errors, next.

Analysis of the prediction errors
The estimated coefficients from the base 1998-2007 period shown in Table 2 are then used to predict the VIX and to compute the in-sample and out-of-sample residuals. We then run various tests to see whether the two sets of residuals are statistically different from each other. However, before undertaking these tests, it is important to note that tests strongly rejected the hypothesis that the residuals are normally distributed. 21 Consequently, traditional tests of equality of means (t test, Z test, etc.) might be unreliable, since they are based on the distributional assumption of normality. However, in the interest of completeness, we have undertaken both parametric tests (assuming normality) and nonparametric (i.e., distribution-free) tests. The results are shown in Table 4.
Because the out-of-sample period includes 2008 Q4, a period of 3 months marked by large and unprecedented spikes in the VIX, we compared the means (and medians) of the in-sample residuals with two sets of out-of-sample residuals: one that includes 2008 Q4 and the other that does not. As is evident from the results in Table 4, the residuals for 2008 Q4 are generally much larger and their exclusion makes a large difference in the test of means. 22 However, the medians are far less affected by inclusion of 2008 Q4, since the median is less sensitive to outliers than the mean. We compared the means and medians of both measures of the residuals: their absolute and their squared values.
As shown in Table 4, when the absolute value of the residuals is compared, the out-of-sample residuals are in fact smaller than the in-sample ones, and this holds true whether or not 2008 Q4 is included in the out-of-sample period. We also report the p value of the one-sided test of means. 23 In this test, the null hypothesis is that the means of the two periods are equal; it is tested against the alternative that the residuals in the out-of-sample period are larger.
For the squared residuals, we report the widely used measure of forecast accuracy: the root mean squared error (RMSE), which is the square root of the mean of the squared residuals. 24 The out-of-sample RMSE is significantly larger than the in-sample RMSE; the p value of 0.02 rejects the null that the two means are equal. However, this difference in means is driven by the large residuals in 2008 Q4. When that quarter is excluded from the out-of-sample period, the RMSE actually becomes lower than the in-sample period's, although the difference is not statistically significant (the p value is 0.68). 25 Importantly, when one compares the median of the squared residuals, the out-of-sample median is lower, regardless of whether 2008 Q4 is included in the out-of-sample period. These tests show that one cannot find in the data any evidence of unusual values of the VIX during the period of alleged manipulation.
22 [...] Chow et al. (2018), who found that the estimation errors were considerably larger during volatile markets.
23 Each of these tests of means was undertaken using a t test, as the test statistic for the difference of means test is distributed as a t distribution. Mood, Graybill and Boes (1974), p. 435.
24 Greene (1997), p. 372.
Panel B of the table contains the results of the nonparametric test. This test is undertaken through a Monte Carlo (akin to bootstrapping) exercise: Two samples of squared residuals, each with 253 observations, are randomly drawn (with replacement) from the in-sample and the out-of-sample periods. We chose a random sample size of 253 data points because it constitutes approximately 10% of the observations in each period. 26 For each random draw, we compare the means and medians of the two samples. This process is repeated 10,000 times. Then, we calculate the percentage of the draws where the average out-of-sample squared residuals were greater than the average in-sample ones. If this percentage was greater than 50%, this would indicate that more than half of the time the average out-of-sample residuals were larger than the average in-sample ones, that is, on average, the model's out-of-sample forecasts were worse than the in-sample ones. By contrast, a percentage lower than 50% would imply superior forecasting performance of the estimation model in the 2008-2018 period, relative to the estimation period of 1998-2007.
The fact that large residuals are clustered in 2008 Q4 is particularly clear in the Monte Carlo exercise: The percentage of random draws where the mean of the out-of-sample squared residuals is larger drops from 73.6 to 43.5% when 2008 Q4 is excluded from the out-of-sample period. By contrast, when the medians of the randomly drawn samples are compared, less than 8% of the time the median of the out-of-sample residuals is larger, and this is true regardless of whether one includes 2008 Q4 in the out-of-sample period.
Therefore, the nonparametric results are consistent with the traditional test results.
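The Monte Carlo exercise described above can be sketched as follows. The sample size of 253 and the 10,000 repetitions follow the text; the function name `share_out_larger`, the `seed` parameter and the use of Python's `random.choices` for drawing with replacement are choices made only for this illustration.

```python
import random
import statistics


def share_out_larger(in_sq, out_sq, sample_size=253, draws=10_000,
                     stat=statistics.fmean, seed=0):
    """Share of random draws in which the out-of-sample statistic exceeds the in-sample one.

    in_sq and out_sq are lists of squared residuals; stat is the statistic
    compared in each draw (mean by default, statistics.median for medians).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    hits = 0
    for _ in range(draws):
        s_in = rng.choices(in_sq, k=sample_size)    # drawn with replacement
        s_out = rng.choices(out_sq, k=sample_size)  # drawn with replacement
        if stat(s_out) > stat(s_in):
            hits += 1
    return hits / draws
```

A returned share above 0.5 would correspond to worse out-of-sample forecasts on average, and a share below 0.5 to better ones; passing `stat=statistics.median` gives the median comparison.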
In panel B of the table, we also report the results of the nonparametric Wilcoxon signed-rank test, which is used to determine whether two samples were selected from populations having the same distribution. The p value of the test, 0.22, also cannot reject the hypothesis that the residuals in the out-of-sample period come from the same distribution as those of the in-sample period.
Panel C of the table provides further evidence on the clustering of large residuals in 2008 Q4. As expected, roughly 5% of the in-sample residuals are statistically significant. 27 But approximately 44% of the residuals are statistically significant in 2008 Q4, and approximately 5% of the residuals are statistically significant in the remaining out-of-sample period excluding 2008 Q4. These results make intuitive sense: The linear regression model cannot fully capture the large spikes in the VIX, like those in 2008 Q4. Thus, one would expect the residuals (which reflect the portion of the VIX's movement unexplained by the model) to be large in 2008 Q4 when markets were in massive disarray. Furthermore, the regressors in the estimation model are lagged by a day [day t-1]; as a result, they do not capture the changes in the market happening on a given day, which, of course, are reflected in that day's VIX [i.e., on day t], the model's explained variable.
25 We are aware that throwing out the most extreme quarter of the test sample without throwing out the most extreme quarter from the estimation sample could introduce a bias. However, the bias, if any, would work in favor of finding that the residuals in the test sample are larger than in the estimation sample, because throwing out the most extreme quarter from the estimation sample can only reduce the size of its residuals.
26 Our results remain virtually unchanged if somewhat smaller or larger random samples were drawn.
In sum, the comparative analyses of the in-sample and out-of-sample model errors undertaken in this section of the paper do not support the hypothesis of manipulation of the VIX. If one excludes just the 3 months of 2008 Q4, the errors in the period at issue are actually smaller than in 1998-2007. And the larger errors in 2008 Q4 are explained by the unprecedented jump in the VIX precipitated by the financial crisis. Thus, the movements of the VIX throughout the 20-year period analyzed appear to be consistent with normal market forces and do not support the conjecture of artificiality or manipulation of the index.

The analysis of the VIX futures and options expiration days
In the preceding section, we examined a 10-year period for signs of manipulation. One might argue, however, that the effect of the manipulation was confined to the specific days when the VIX futures and options expired. This line of argument is consistent with the Griffin and Shams (2018) study, which found unusual trading activity on VIX futures expiration days and therefore concluded that settlement prices were manipulated.

Testing the statistical significance of the model errors on futures expiration days
There are currently both monthly and weekly VIX futures contracts. Monthly futures expire on the same day (typically the third Wednesday of the month), and weekly futures expire on Wednesdays. Monthly futures started trading in May 2004; weekly futures were introduced in August 2015.
In any month, 1 weekly option expiration day coincides with the monthly option and futures expiration day. There are 274 unique expiration days (monthly and weekly combined) in our data set. Of these, 40 are in the pre-2008 period; so, there are 234 expiration days in the period at issue.
To test the significance of the regression errors on the expiration days, we used the same regression model as in (1), estimated using data for the time period 1998-2018 but excluding the 234 expiration days from the estimation sample. We then generated regression residuals for these 234 days (thus, they are out-of-sample residuals for the 234 days) and tested their statistical significance. Our approach is akin to the widely used event study analysis, where the residual value on a given event day is tested for statistical significance. Details of these tests are contained in the section 'Testing for statistically significant residuals on settlement days' of the 'Appendix.' The results are shown in Table 5.
Panel A in Table 5 shows that 11 days or 4.7% of the 234 expiration days at issue are statistically significant. 28 By random chance alone, one would expect 5% of the expiration days to be statistically significant, and that is confirmed by the Monte Carlo results shown in Panel B of the table. Consistent with the results in the preceding section of the paper, we also observe, if one excludes 2008 Q4, the proportion of statistically significant expiration days falls under 4%. We have tested the significance of the futures expiration days using both a distribution-free nonparametric approach and also using the standard t test; the results under both approaches are identical.
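The comparison with random chance can be checked with a short calculation. The sketch below uses only the counts stated above (11 significant days out of 234) and a 5% significance level; treating the days as independent Bernoulli trials is a simplifying assumption made here for illustration and is not the Monte Carlo procedure reported in Panel B.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_days = 234          # expiration days in the period at issue (from the text)
n_significant = 11    # statistically significant days (from the text)
p_null = 0.05         # share expected by chance at the 5% level

observed_share = n_significant / n_days
expected_count = n_days * p_null

# Probability of seeing at least / at most 11 significant days by chance alone,
# under the independence assumption noted above.
p_at_least = sum(binom_pmf(k, n_days, p_null) for k in range(n_significant, n_days + 1))
p_at_most = sum(binom_pmf(k, n_days, p_null) for k in range(0, n_significant + 1))

print(f"observed share: {observed_share:.1%}, expected count: {expected_count:.1f}")
print(f"P(X >= {n_significant}) = {p_at_least:.3f}, P(X <= {n_significant}) = {p_at_most:.3f}")
```

Neither tail probability is small under this simple null, which is in line with the statement that the observed count is what one would expect by chance.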

Testing of evidence of manipulation in settlement prices
So far in this paper we have focused on the daily closing value of the VIX. As noted earlier, the analysis by Griffin and Shams (2018) was focused specifically on the VIX futures' settlement prices and trading during the auction period during which those prices are determined. In this subsection, we will examine the settlement prices to see if there appears to be any artificiality in VIX futures contracts.
The settlement price for an expiring futures contract is determined using an auction process called the Special Opening Quotation (SOQ). The SOQ takes place on the morning of each expiration day. The CBOE provides the settlement price for each futures contract back to 2013 on its website. We were able to obtain settlement prices for futures contracts prior to 2013 from an alternate data provider and confirmed the accuracy of those data using the period of overlap, i.e., since 2013.
Here we undertake a variant of the foregoing analysis by substituting the VIX futures settlement prices for the VIX closing values on expiration days, i.e., the settlement days. Thus, under this approach, a regression error on a futures expiration day is the difference between the settlement price (as opposed to the VIX closing value) and the model-predicted VIX value. The results of our analysis are shown in Table 6. Table 6 shows that when settlement prices are used, the proportion of statistically significant days drops to 4.3%, which is lower than the approximately 5% one would expect by chance alone, even when we include 2008 Q4. This result provides compelling evidence that the settlement prices of the VIX futures on the expiration days were not artificial.
In sum, the results of our analysis of the VIX futures expiration days do not support the hypothesis of manipulation of the VIX, even on the specific dates of VIX futures expirations. During the period at issue, 2008 through the present, the number of statistically significant expiration dates is consistent with random chance, regardless of whether we use VIX closing values or settlement prices of VIX futures. Thus, our findings imply that the level of the VIX on futures expiration days is explained by normal market forces. 29

Concluding comments
In this paper, we examined the daily level of the VIX index for signs of manipulation, as has been alleged for the period January 2008 to the present. We constructed a model using explanatory variables that are exogenous to the index and found that the results strongly support the movement in the VIX being explained by market fundamentals: the results overwhelmingly do not support a claim of manipulation. We also specifically examined the VIX futures expiration days and found that the VIX closing values as well as the VIX futures settlement prices on these expiration days are consistent with normal market forces and do not show evidence of manipulation.
While these findings strongly support the conclusion that the VIX is not manipulated, it is important to note that our findings are not necessarily incongruent with those of Griffin and Shams (2018). As noted earlier, determining whether any artificiality in the VIX existed for brief periods of time during and after the SOQ would require intraday data, and that analysis is beyond the scope of our study. Our findings, however, do imply that, notwithstanding Griffin and Shams' finding of unusual trading patterns during the SOQ, the effects of these trades do not persist through the close. Our analysis shows that both the closing values of VIX on settlement days and the settlement prices themselves do not show any evidence of

Appendix
Summary statistics for two sub-periods
Below we list the mean and standard deviations for the regression variables for the two sub-periods. The two periods have similar statistics, with the exception of the standard deviation of the VIX, which is higher in the out-of-sample period (Table 7).

Dickey-Fuller test for unit root
The results of the Dickey-Fuller tests are reported in Table 8 for multiple time periods. Leybourne et al. (1988) showed that the existence of a structural break can lead to a false rejection of the null hypothesis in the Dickey-Fuller test. To confirm that the VIX time series was stationary, we tested the two sub-periods separately, as well as the entire time period with and without the fourth quarter of 2008 (since once one excludes 2008 Q4 the data do not show a structural break). The null hypothesis of a non-stationary process (i.e., existence of a unit root) was rejected in every case, as indicated by the p values below.

Bartlett's test for equal variance of VIX
See Table 9. The p values are for the test of the null hypothesis shown in footnote 9 of the paper.

Theil's U statistics
Theil's U statistic for forecast accuracy was computed as follows:

$$U = \sqrt{\frac{\frac{1}{T}\sum_{t=1}^{T}\left(V_t - \hat{V}_t\right)^2}{\frac{1}{T}\sum_{t=1}^{T} V_t^2}}$$

where $V_t$ denotes the actual value of the VIX index and $\hat{V}_t$ denotes the predicted value of the index on day t, and T denotes the total number of observations. Large values indicate a poor forecasting performance. See Greene (1997), p. 373.
As can be seen below, the out-of-sample forecasting performance of the regression model is better than the in-sample performance (Table 10).
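As an illustration of how such a statistic can be computed, the sketch below assumes the common form of Theil's U in which the root mean squared forecast error is divided by the root mean square of the actual values. The function name theil_u and the four data points are hypothetical and are not taken from the paper's data.

```python
import math

def theil_u(actual, predicted):
    """Theil's U: sqrt(mean squared forecast error / mean squared actual value)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    T = len(actual)
    mean_sq_error = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / T
    mean_sq_actual = sum(a * a for a in actual) / T
    return math.sqrt(mean_sq_error / mean_sq_actual)

# Hypothetical index levels and model predictions (illustrative only).
actual = [12.0, 15.0, 20.0, 18.0]
predicted = [11.0, 16.0, 19.0, 18.0]

u_value = theil_u(actual, predicted)
print(f"Theil's U = {u_value:.4f}")
```

A perfect forecast gives a value of zero, and the statistic grows as forecast errors become large relative to the level of the series.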

Testing for structural break
For each regressor, $x_{it}$, in the regression equation given by (1) in the body of the paper, we created two associated regressors by multiplying each by two indicator variables, $D_1$ and $D_2$, where $D_1$ takes a value of one if the observation is in the 1998-2007 period and zero otherwise, and $D_2$ takes a value of one if the observation is in the 2008-2018 period and zero otherwise.
Then, we estimated a single equation using the entire dataset:

$$\mathrm{Vix}_t = \sum_{i=1}^{k} b_{1i} \cdot D_1 x_{it} + \sum_{i=1}^{k} b_{2i} \cdot D_2 x_{it} + e_t \qquad (2)$$

We then tested the null hypothesis $H_0: b_{1i} = b_{2i}$, $i = 1, \ldots, k$, using a t test. Table 3 in the paper contains the p values of these tests. In undertaking the tests excluding 2008 Q4 (panel B in Table 3), we adopted the same procedure as above, except that we dropped the observations from this quarter when estimating Eq. (2).
Estimating the single equation as shown above in (2) and testing the null hypotheses noted above are equivalent to estimating two separate equations, one for each of the two sub-periods (1998-2007 and 2008-2018), and then testing whether the estimated coefficients are the same across the two equations.

Testing for normality of residuals
The Jarque-Bera test (Jarque and Bera 1987) examines whether the skewness and kurtosis of the sample data match those of a normal distribution. The test statistic is defined as follows:

$$JB = \frac{T - k}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right)$$

where T is the number of observations; k is the number of regressors; S is sample skewness; and K is sample kurtosis.
The JB statistic has a Chi-square distribution with 2 degrees of freedom.
In Table 11, we report the p values for the individual tests for skewness being zero and excess kurtosis (i.e., kurtosis in excess of 3) being zero; the final column labeled 'Joint' shows the p value of the JB statistic, noted above. In all three cases, the null hypothesis of normally distributed residuals is rejected.
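A minimal sketch of the joint test follows. It assumes the regression-residual form of the statistic that scales by (T - k)/6, and it uses the fact that the upper-tail probability of a Chi-square distribution with 2 degrees of freedom at x is exp(-x/2). The function name jarque_bera and the sample residuals are hypothetical.

```python
import math

def jarque_bera(residuals, k=0):
    """Return (JB statistic, p value) using the (T - k)/6 scaling.

    k is the number of regressors; S and K are the sample skewness and kurtosis
    computed from central moments.
    """
    T = len(residuals)
    mean = sum(residuals) / T
    m2 = sum((r - mean) ** 2 for r in residuals) / T
    m3 = sum((r - mean) ** 3 for r in residuals) / T
    m4 = sum((r - mean) ** 4 for r in residuals) / T
    S = m3 / m2**1.5
    K = m4 / m2**2
    jb = (T - k) / 6.0 * (S**2 + (K - 3.0) ** 2 / 4.0)
    # Survival function of a Chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical residuals: mostly small values plus a few large spikes (fat tails).
residuals = [(-1) ** i * 0.5 for i in range(196)] + [8.0, -8.0, 9.0, -9.0]

jb_stat, p_value = jarque_bera(residuals, k=0)
print(f"JB = {jb_stat:.1f}, p value = {p_value:.4f}")
```

With a few large spikes in an otherwise quiet series, the kurtosis term dominates and normality is rejected, which is the kind of departure one would expect from a series with occasional sharp jumps.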

Testing for statistically significant residuals on settlement days
We estimated a single equation using the entire dataset:

$$\mathrm{Vix}_t = \sum_{i=1}^{k} b_i \cdot x_{it} + \sum_{j=1}^{L} a_j \cdot S_j + e_t \qquad (3)$$

where $x_{it}$ are the regressors in the regression equation given by (1) in the body of the paper and $S_j$ denotes an indicator variable that takes a value of one on the jth settlement day and zero otherwise. We then tested the statistical significance of the estimated coefficients, $\hat{a}_j$, $j = 1, \ldots, L$, using a t test. This is exactly the approach undertaken in event study analysis.
Note that, in Eq. (3), the estimated coefficients, $\hat{a}_j$, $j = 1, \ldots, L$, are exactly equal to the residuals we would have obtained if we simply estimated the equation $\mathrm{Vix}_t = \sum_{i=1}^{k} b_i \cdot x_{it} + e_t$ by excluding the L settlement days and then predicted the residuals for those days.
When using settlement prices instead of VIX closing values (the results reported in Table 6 of the paper), we estimated (3) but substituted the settlement prices for the actual VIX values on the settlement days. The estimated coefficients, $\hat{a}_j$, $j = 1, \ldots, L$, in this case are equal to the differences (i.e., residuals) between settlement prices and model-predicted VIX values on the settlement days.
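The equivalence between the indicator-variable coefficients and the out-of-sample prediction errors can be verified numerically. The sketch below is illustrative only: it uses synthetic data, a constant plus a single regressor, three hypothetical settlement-day positions, and a small hand-written least-squares solver (the helpers solve and ols are assumptions, not the paper's code).

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic data (assumption): one regressor plus a constant, 60 days.
rng = random.Random(0)
T = 60
settlement_days = [10, 25, 40]  # hypothetical positions of settlement days
x = [rng.uniform(10, 30) for _ in range(T)]
y = [2.0 + 0.5 * xt + rng.gauss(0, 1) for xt in x]

# (a) Full sample with one indicator column per settlement day.
X_full = [[1.0, x[t]] + [1.0 if t == d else 0.0 for d in settlement_days]
          for t in range(T)]
dummy_coefs = ols(X_full, y)[2:]

# (b) Estimate without the settlement days, then compute their prediction errors.
keep = [t for t in range(T) if t not in settlement_days]
b0, b1 = ols([[1.0, x[t]] for t in keep], [y[t] for t in keep])
pred_errors = [y[d] - (b0 + b1 * x[d]) for d in settlement_days]

for a_hat, err in zip(dummy_coefs, pred_errors):
    print(f"indicator coefficient {a_hat:.6f}   prediction error {err:.6f}")
```

Each indicator absorbs its own day's observation completely, so the remaining coefficients are determined only by the other days; that is why the two columns of output agree up to rounding.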