The procedure introduced in the previous section is evaluated by means of a Monte Carlo experiment (Footnote 5). The variance ratio test proposed by Lo and MacKinlay (1988) (Footnote 6) and the Ljung–Box test (1978) are considered when starting from methods that test only a necessary, but not sufficient, condition of the random walk hypothesis; the BDS test (Footnote 7) and the runs test are considered when starting from methods that directly test that hypothesis. If the procedure requires an ARCH effect test to decide between random walks 1 and 3, ARCH models up to order 4 are used.
To conduct this analysis, return series are generated from two different models because the objective is twofold: to evaluate the performance of the procedure in testing the random walk 1 hypothesis against the linear correlation alternative, on the one hand, and against the random walk 3 alternative, on the other (Footnote 8).
Thus, the BDS, runs, variance ratio and Ljung–Box tests are applied to each generated return series. Then, if the RW1 hypothesis is rejected by one of the first two tests (BDS or runs), the variance ratio test is applied to determine whether the series is at least RW3. On the other hand, if the random walk hypothesis is not rejected by one of the last two tests (variance ratio or Ljung–Box), an ARCH effect test is applied to discern between RW1 and RW3. The process is replicated 10,000 times for each sample size T and each value of the parameter involved in the generation of the series (see the whole process in Fig. 4).
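The decision logic can be sketched as a small function of two p-values. This is only an illustrative reading of the procedure (Fig. 4 is not reproduced here): it assumes that the variance ratio follow-up belongs to the tests that address RW1 directly (BDS, runs) and that the ARCH effect follow-up belongs to the tests of a necessary condition (variance ratio, Ljung–Box). The function name, the labels `"direct"` and `"necessary"`, and the returned strings are hypothetical.

```python
def decide(start_test, p_start, p_followup, alpha=0.05):
    """Classify one return series as 'RW1', 'RW3' or 'non-RW'.

    start_test : 'direct' for a test of RW1 itself (BDS, runs), or
                 'necessary' for a test of a necessary condition only
                 (variance ratio, Ljung-Box). Labels are hypothetical.
    p_start    : p-value of the starting test.
    p_followup : p-value of the second-stage test (variance ratio for
                 'direct', ARCH effect test for 'necessary'); it is
                 only consulted when the second stage is reached.
    """
    reject_start = p_start < alpha
    reject_followup = p_followup < alpha

    if start_test == "direct":
        if not reject_start:
            return "RW1"  # RW1 not rejected: stop here
        # RW1 rejected: check whether the series is at least uncorrelated
        return "non-RW" if reject_followup else "RW3"

    if start_test == "necessary":
        if reject_start:
            return "non-RW"  # necessary condition fails
        # uncorrelatedness not rejected: look for ARCH effects
        return "RW3" if reject_followup else "RW1"

    raise ValueError("start_test must be 'direct' or 'necessary'")
```

Under this reading, a version started from a direct test can reject RW1 only through its first test, whereas a version started from a necessary-condition test can reject RW1 at either stage.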
a. Nominal size
Before analysing the Monte Carlo powers of the procedure initiated from the different indicated tests, the corresponding nominal size is estimated; that is, the maximum probability of falsely rejecting the null hypothesis of random walk 1 is calculated in each case. Since the different ways of executing the proposed procedure contemplate the possibility of applying tests sequentially to make a decision, we must not expect, in general, the nominal size of each case to coincide with the significance level α which is fixed in each of the individual tests.
To estimate the mentioned nominal size, return series that follow a random walk 1 are generated
$$r_{t} = \varepsilon_{t} ,\quad t = 1, \ldots ,T$$
(5)
where \(\varepsilon_{t} \sim iid(0,1)\). Specifically, 10,000 series of size T are generated, and the tests required by the specific way in which the procedure is being applied are performed on each data series independently, not sequentially, with significance level α. The reiteration of this process allows us to determine, for each T, the number of acceptances and rejections of the null hypothesis (random walk 1) that occur with the independent application of each test. This makes possible the estimation of the nominal size of the procedure in each case as the quotient of the total rejections of the null hypothesis divided by the total number of replications (10,000 in this case).
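The rejection-count estimate described above can be sketched for a single test as follows. This is a minimal illustration, not the authors' code: it assumes Gaussian errors (the text only specifies iid(0,1)), uses a runs test on the signs of the returns with the usual normal approximation, and all function names are hypothetical.

```python
import math
import random


def runs_test_pvalue(series):
    """Two-sided Wald-Wolfowitz runs test on the signs of the series."""
    signs = [x > 0 for x in series]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 1.0  # degenerate case (one sign only): treated as no rejection
    # A new run starts whenever the sign changes
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        return 1.0
    z = (runs - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def estimate_size(T, alpha, reps=10_000, seed=0):
    """Share of RW1 series (r_t = eps_t) for which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        series = [rng.gauss(0.0, 1.0) for _ in range(T)]  # Gaussian: assumption
        if runs_test_pvalue(series) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for T in (25, 50, 100):
        print(T, estimate_size(T, alpha=0.05, reps=2_000, seed=T))
```

For a version of the procedure that needs a second test, the same loop would count a rejection whenever any of the required tests rejects.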
The process described in the previous paragraph was performed for the sample sizes T = 25, 50, 100, 250, 500 and 1000 and significance levels α = 0.02, 0.05 and 0.1 [application of the process in Fig. 4 for expression (5)]. The results (Table 2) indicate, for a given value T, the (estimated) theoretical size of the procedure when a significance level α is set in the individual tests required by the procedure initiated from a specific method. For example, if for \(T = 100\) the researcher sets \(\alpha = 0.05\) in the individual tests and wishes to apply the procedure initiated from the variance ratio test, they will be working with an (estimated) theoretical size of 0.0975.
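As a rough illustration of why combining tests inflates the size, suppose (an assumption made here for the arithmetic only, not a statement about how Table 2 was obtained) that the procedure applies two tests, each at level α, and that their rejection events are independent under random walk 1. The probability that at least one of them rejects is then

$$1 - (1 - \alpha)^{2} = 2\alpha - \alpha^{2}$$

For \(\alpha = 0.05\) this gives \(1 - 0.95^{2} = 0.0975\), which matches the value quoted above for \(T = 100\); for \(\alpha = 0.02\) and \(\alpha = 0.1\), the same expression gives 0.0396 and 0.19, respectively.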
Table 2 Estimated nominal size for the procedure when starting from a specific test
The estimated nominal size of the procedure when starting from methods that directly test the hypothesis of random walk 1 is much better in the case of the runs test since it practically coincides with the significance level α fixed (in the individual tests) for any sample size T. However, size distortions (estimated values far from the level α) are evident when the procedure is initiated from the BDS test, and the results are clearly affected by T. In effect, the greatest distortions occur for small sample sizes and decrease as T increases (at \(T = 1000\), the estimated nominal size for each α is 0.0566, 0.133244 and 0.2214, respectively, i.e., approximately \(2\alpha\)).
Since the variance ratio test and the Ljung–Box test do not directly test the random walk 1 hypothesis (to estimate the nominal size of the procedure initiated from either of them, it is necessary to apply tests sequentially), the results in Table 2 for these two cases are expected, in the sense that the estimates of the respective nominal sizes for each T are greater than the significance level α. In this context of size distortion, the best results correspond to the variance ratio test, with estimated values very close to the significance level α for small sample sizes (\(T = 25\) and 50) that increase as T increases (note that at \(T = 1000\), for each value of α, the nominal size is approximately double that at \(T = 25\), i.e., approximately \(2\alpha\)). In the case of the Ljung–Box test, where the distortion is greater, the sample size T hardly influences the estimated nominal size since, irrespective of the value of T, it remains approximately 10%, 21% and 37% for levels 0.02, 0.05 and 0.1, respectively.
b. Empirical size and Monte Carlo power
(b1) The performance of the procedure for testing the random walk 1 hypothesis against the only linear correlation alternative (among the variables of the return series generating process) is analysed using the model
$$r_{t} = \phi_{1} r_{t - 1} + \varepsilon_{t} ,\quad t = 1, \ldots ,T$$
(6)
with \(r_{0} = 0\) and \(\varepsilon_{t} \sim iid(0,1)\). By means of (6), 10,000 samples of sizes T = 25, 50, 100, 250, 500 and 1000 of the series \(r_{t}\) are generated for each value of parameter \(\phi_{1}\) considered: −0.9, −0.75, −0.5, −0.25, −0.1, 0, 0.1, 0.25, 0.5, 0.75 and 0.9. In this way, the model yields return series that follow a random walk 1 (the particular case in which \(\phi_{1} = 0\)) and, as an alternative, series with a first-order autoregressive structure (cases in which \(\phi_{1} \ne 0\)), i.e., series generated by a process whose variables are correlated. Therefore, when the null hypothesis is rejected, some degree of predictability is admitted since, by modelling the above autoregressive structure with an ARMA model, it is possible to predict price changes on the basis of historical price changes.
The procedure, starting from each of the considered tests (BDS, runs, Ljung–Box and variance ratio), was applied to the different series generated by the combinations of values of T and \(\phi_{1}\) with a significance level of 5% [application of the process in Fig. 4 for expression (6)]. Then, we calculated the number of times that the different decisions contemplated by the two ways of applying the procedure are made (according to whether we start from a method that does or does not directly test the random walk hypothesis).
From the previous results, we calculate, for each sample size T, the percentage of rejection of the null hypothesis (random walk 1) when starting from each of the four tests considered, depending on the value of parameter \(\phi_{1}\). Since \(\phi_{1} = 0\) implies that the null hypothesis is true, in this particular case, the calculations represent the empirical probability of committing a type I error for the procedure in the four applications, i.e., the empirical size. However, when \(\phi_{1} \ne 0\), the cited calculations represent the Monte Carlo power of each version of the procedure since for these values of \(\phi_{1}\), the null hypothesis is false.
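The distinction between empirical size and Monte Carlo power can be illustrated with Model (6) and a single first-stage test. The sketch below is not the full procedure: it assumes Gaussian errors, uses only a Ljung–Box test at lag 1 (the lag choice is an assumption made here for simplicity), and the function names are hypothetical.

```python
import math
import random


def simulate_ar1(T, phi, rng):
    """Generate r_t = phi * r_{t-1} + eps_t for t = 1..T with r_0 = 0."""
    r_prev = 0.0
    out = []
    for _ in range(T):
        r_prev = phi * r_prev + rng.gauss(0.0, 1.0)  # Gaussian: assumption
        out.append(r_prev)
    return out


def ljung_box_lag1_pvalue(series):
    """Ljung-Box test with one lag: Q = T(T+2) * rho_1^2 / (T-1) ~ chi2(1)."""
    T = len(series)
    mean = sum(series) / T
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    rho1 = sum(a * b for a, b in zip(dev[1:], dev)) / denom
    q = T * (T + 2.0) * rho1 * rho1 / (T - 1.0)
    # Survival function of a chi-square with 1 degree of freedom
    return math.erfc(math.sqrt(q / 2.0))


def rejection_rate(T, phi, alpha=0.05, reps=10_000, seed=0):
    """Share of replications in which the lag-1 Ljung-Box test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        if ljung_box_lag1_pvalue(simulate_ar1(T, phi, rng)) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("phi = 0   (size): ", rejection_rate(100, 0.0, reps=2_000, seed=1))
    print("phi = 0.5 (power):", rejection_rate(100, 0.5, reps=2_000, seed=2))
```

With \(\phi_{1} = 0\) the rejection rate estimates a type I error probability; with \(\phi_{1} \ne 0\) the same quantity estimates power.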
b1.1 Empirical size
The empirical sizes (Table 3) that resulted from the different cases analysed nearly coincide with the corresponding theoretical probabilities calculated for \(\alpha = 0.05\) (see Table 2). Therefore, there is empirical confirmation of the size distortions that appear in the procedure according to the test from which it is started. In effect,
-
When the procedure is initiated from methods that directly test the random walk 1 hypothesis, the results confirm that for the runs test, the size of the procedure remains approximately 5% (the significance level) for all T. Nevertheless, when initiating from the BDS test, a very high size distortion is produced for small sample sizes (0.6806 and 0.5425 at \(T = 25\) and 50, respectively), but the distortion decreases as T increases (it reaches a value of 0.1334 at \(T = 1000\)).
-
The size distortions exhibited by the procedure when starting with methods that test only a necessary, but not sufficient, condition of the random walk hypothesis, are less pronounced when the procedure is applied starting from the variance ratio test than when starting from the Ljung–Box test. Likewise, in the former case, the empirical size increases with the sample size T from values close to the significance level (0.05) to more than double the significance level (from 0.0603 at \(T = 25\) to 0.1287 at \(T = 1000\)). In the latter case (Ljung–Box), the values between which the empirical size oscillates (18% and 22%) do not allow us to affirm that there exists an influence of T.
Table 3 Empirical size of the procedure*
b1.2 Monte Carlo power
Table 4 reports, for each sample size T, the power calculations of the procedure started from each of the four tests considered in this study, i.e., the probability of rejecting the null hypothesis (random walk 1) with each version of the procedure on the assumption that the hypothesis is false. Likewise, since several alternatives to the null hypothesis (values that satisfy \(\phi_{1} \ne 0\)) are considered, the corresponding power functions of the cited versions of the procedure are obtained and plotted in a comparative way for each T (Fig. 5).
Table 4 Monte Carlo powers* of the procedure** started from a specific test against linear correlation-only alternatives
For each sample size T and regardless of the test from which the procedure is started, the corresponding probabilities of rejection of the random walk 1 hypothesis are distributed symmetrically around the value \(\phi_{1} = 0\) (random walk 1 hypothesis). Additionally, these probabilities tend to unity as \(\left| {\phi_{1} } \right|\) increases, reaching 1 for values of \(\left| {\phi_{1} } \right|\) increasingly closer to 0 as the sample size T increases. The speed of this convergence depends on the test from which the procedure is started:
-
For the two smallest sample sizes (\(T = 25\) and 50), a power of 1 is hardly achieved for any of the alternatives. Only at \(T = 50\) is the power approximately 100 percent, with the procedure initiated from any of the four tests, for \(\left| {\phi_{1} } \right| \ge 0.75\). On the other hand, at \(T = 25\), the estimated powers of the procedure initiated from the BDS test for \(\left| {\phi_{1} } \right| \le 0.5\) are much higher than those presented by the other cases. A similar situation occurs at \(T = 50\), but with less pronounced differences between what the procedure with the BDS test and the other cases yield and restricted to the alternatives with \(\left| {\phi_{1} } \right| \le 0.25\).
-
From sample size 100, we observe differences in the convergence to unity of the estimated powers according to the test from which the procedure is initiated. Thus, when starting from the Ljung–Box test and the variance ratio test, a power of approximately 1 is achieved for \(\left| {\phi_{1} } \right| \ge 0.5\) at \(T = 100\), whereas for larger sample sizes, convergence to 1 is nearly reached for \(\left| {\phi_{1} } \right| \ge 0.25\). On the other hand, when the procedure is started from the BDS test, a power of 1 is reached for \(\left| {\phi_{1} } \right| \ge 0.75\) at \(T = 100\) and for \(\left| {\phi_{1} } \right| \ge 0.5\) at \(T \ge 250\) (note that at \(T = 1000\), the estimated power does not exceed 0.89 for \(\left| {\phi_{1} } \right| = 0.25)\). Finally, when the procedure is initiated from the runs test, the value of \(\left| {\phi_{1} } \right|\) for which the powers achieve unity decreases as the sample size T increases beyond 100. Specifically, at \(T = 100\), unity is reached for \(\left| {\phi_{1} } \right| \ge 0.75\); at \(T = 250\), for \(\left| {\phi_{1} } \right| \ge 0.5\); and at \(T = 1000\), for \(\left| {\phi_{1} } \right| \ge 0.25\) (at \(T = 500\), the power is approximately 0.95 for \(\left| {\phi_{1} } \right| = 0.25\)). The plots in Fig. 5 show that the power function of the procedure initiated from the Ljung–Box test is always above the other power functions, i.e., it is uniformly more powerful for \(T \ge 100\).
-
Regardless of the test from which the procedure is started, a power of 1 is not achieved for \(\left| {\phi_{1} } \right| = 0.1\) for any sample size, not even at \(T = 1000\) (the best result corresponds to the Ljung–Box case with an estimated power of approximately 0.91, followed by the variance ratio and runs cases with values close to 0.8 and 0.53, respectively; the BDS case yields the worst result of approximately 0.18).
At this point, we can say that the power of the procedure has been analysed, that is, its capability of rejecting the null hypothesis (random walk 1) when the null hypothesis is false. As already mentioned, for \(\phi_{1} \ne 0\), Model (6) yields a series that does not follow any type of random walk. However, the proposed procedure contemplates random walk 3 among the possible decisions. Therefore, if from the powers calculated for each version of the procedure, we subtract the portion that corresponds to the (wrong) decision of random walk 3, we obtain the power that the procedure initiated from each test actually has, i.e., its capability to reject the null hypothesis in favour of true alternatives when the null hypothesis is false.
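The adjustment described above can be written as a simple calculation on the decisions recorded across replications. The sketch below is illustrative only: the decision labels and the counts in the usage example are hypothetical and are not taken from Table 4.

```python
from collections import Counter


def power_summary(decisions, true_alternative):
    """Return (power, real_power) from a list of per-replication decisions.

    decisions        : iterable of labels 'RW1', 'RW3' or 'non-RW'
                       (hypothetical labels).
    true_alternative : the decision that is correct under the model that
                       generated the data ('non-RW' for Model (6) with
                       phi_1 != 0).
    """
    decisions = list(decisions)
    counts = Counter(decisions)
    total = len(decisions)
    power = 1.0 - counts["RW1"] / total           # any rejection of RW1
    real_power = counts[true_alternative] / total  # rejection in favour of the truth
    return power, real_power


if __name__ == "__main__":
    # Hypothetical outcome of 1,000 replications under phi_1 != 0
    example = ["non-RW"] * 700 + ["RW3"] * 200 + ["RW1"] * 100
    print(power_summary(example, true_alternative="non-RW"))
```

In this made-up example the power is 0.9, but 0.2 of it corresponds to the wrong decision of random walk 3, which leaves a real power of 0.7.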
In this sense, Table 4 and Fig. 6 report, for each sample size T, the power calculations of the procedure initiated from each of the tests considered after subtracting the effect of the (false) alternative of random walk 3. Furthermore, the cited powers and those initially calculated for each version of the procedure are compared for each T in Figs. 9, 10, 11 and 12 (Appendix).
When the procedure is started from the runs test, the variance ratio test or the Ljung–Box test (Appendix Figs. 10, 11, 12), what we call real power hardly differs from that initially calculated for each sample size T (the slight differences occur for \(\left| {\phi_{1} } \right| \le 0.5\) with \(T \le 100\) and for \(\left| {\phi_{1} } \right| = 0.1\) with \(T \ge 250\)). Therefore, all the above-mentioned findings in relation to the power of these three cases are maintained.
Nevertheless, there are considerable differences between the real power and that initially calculated when the procedure is started from the BDS test. In effect, the initial calculations indicated that this version of the procedure was the most powerful for \(\left| {\phi_{1} } \right| \le 0.5\) and \(\left| {\phi_{1} } \right| \le 0.25\) for \(T = 25\) and \(T = 50\), respectively (with all the values greater than 0.5), but the results in Table 4 and Fig. 6 show that the powers in these cases are actually much lower (0.2 is hardly reached in one single case). Although these differences persist for \(T = 100\), also in the context of \(\left| {\phi_{1} } \right| \le 0.25\), they start to decrease once \(T \ge 250\) (we could say that, for \(T \ge 500\), there are minimal differences between the real power and the initially calculated power).
Consequently, in terms of the power referring only to true alternatives (linear correlation in this case), the procedure initiated from the Ljung–Box test is the most powerful.
(b2) The performance of the procedure for testing the random walk 1 hypothesis against only the non-linear alternative (among the variables of the return series generating process) is analysed by means of an ARCH(1) model.
$$\left\{ {\begin{array}{*{20}c} {r_{t} = h_{t} \varepsilon_{t} } \\ {h_{t}^{2} = \alpha_{0} + \alpha_{1} r_{t - 1}^{2} } \\ \end{array} } \right. ,\quad t = 1, \ldots ,T$$
(7)
where \(h_{t}\) and \(\varepsilon_{t}\) are mutually independent processes such that \(h_{t}\) is stationary and \(\varepsilon_{t} \sim iid(0,1)\), with \(\alpha_{0} > 0\) and \(\alpha_{1} \ge 0\). Specifically, taking \(r_{0} = 0\) in (7), 10,000 samples of sizes T = 25, 50, 100, 250, 500 and 1000 of the series \(r_{t}\) are generated for \(\alpha_{0} = 1\) and each value of \(\alpha_{1}\) considered: 0, 0.1, 0.2, 0.3, 0.4 and 0.5 (Footnote 9). In the particular case in which \(\alpha_{1} = 0\), Model (7) yields a return series that follows a random walk 1 and, for \(\alpha_{1} > 0\), series that are identified with a random walk 3, i.e., series generated by a process whose variables are uncorrelated but dependent (there are non-linear relationships among the variables; Footnote 10). Therefore, when random walk 3 is accepted, it is possible to develop models that allow market volatility to be predicted (ARCH- and GARCH-type models).
The procedure, starting from each of the four tests considered in this study, was applied to the different series generated by the combination of values for T and \(\alpha_{1}\) with a significance level of 5% [application of the process in Fig. 4 for expression (7)]. Then, we calculated the number of times that the different decisions contemplated by the two already known ways of applying the procedure were made.
On the basis of the results indicated in the previous paragraph and analogously to that described in Section (b1), we calculate, for each sample size T, the empirical size and the Monte Carlo power of each version of the procedure. In this context, \(\alpha_{1} = 0\) implies that the random walk 1 hypothesis is true, and \(\alpha_{1} > 0\) implies that it is not (it corresponds to a random walk 3).
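Model (7) and an ARCH effect test can be sketched together as follows. This is a simplified illustration under stated assumptions: Gaussian errors, an LM test of order 1 only (the procedure described above uses ARCH models up to order 4), returns that are not demeaned because the model has zero mean, and hypothetical function names.

```python
import math
import random


def simulate_arch1(T, alpha0, alpha1, rng):
    """Generate r_t = h_t * eps_t with h_t^2 = alpha0 + alpha1 * r_{t-1}^2, r_0 = 0."""
    r_prev = 0.0
    out = []
    for _ in range(T):
        h = math.sqrt(alpha0 + alpha1 * r_prev * r_prev)
        r_prev = h * rng.gauss(0.0, 1.0)  # Gaussian: assumption
        out.append(r_prev)
    return out


def arch_lm_order1_pvalue(series):
    """LM test for ARCH(1): n * R^2 from regressing r_t^2 on a constant
    and r_{t-1}^2, compared with a chi-square with 1 degree of freedom."""
    sq = [x * x for x in series]
    y, x = sq[1:], sq[:-1]
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 1.0  # degenerate series: treated as no rejection
    r2 = sxy * sxy / (sxx * syy)  # R^2 of a one-regressor regression
    return math.erfc(math.sqrt(n * r2 / 2.0))


def arch_rejection_rate(T, alpha1, alpha=0.05, reps=10_000, seed=0):
    """Share of replications in which the ARCH(1) LM test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if arch_lm_order1_pvalue(simulate_arch1(T, 1.0, alpha1, rng)) < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for a1 in (0.0, 0.1, 0.3, 0.5):
        print(a1, arch_rejection_rate(500, a1, reps=1_000, seed=3))
```

With \(\alpha_{1} = 0\) the simulated series reduces to \(r_{t} = \varepsilon_{t}\), so the rejection rate plays the role of a size; with \(\alpha_{1} > 0\) it measures how often the ARCH effect is detected.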
b2.1 Empirical size
Since in this case the null hypothesis is again random walk 1, the obtained empirical sizes are nearly identical to those calculated in Section (b1) (the results are available on request).
b2.2 Monte Carlo power
Table 5 and Fig. 7 show, respectively, the power calculations of each version of the procedure and the plots of the corresponding power functions (in terms of parameter \(\alpha_{1}\)) for each sample size T.
Table 5 Monte Carlo powers* of the procedure** started from a specific test against non-linear alternatives only
The estimated power of the procedure when starting from the runs test is approximately 0.05 for all alternatives, irrespective of the value of T. In the other cases, the power is influenced by parameters T and \(\alpha_{1}\); as the values of these parameters increase, the power tends to unity.
-
Fig. 7 shows that the procedure initiated from the BDS test is uniformly more powerful when \(T \le 100\), and the difference between the estimated powers of the procedure with the BDS test and those of the other cases becomes more pronounced as the sample size decreases. When \(T = 25\), the estimated power of the procedure initiated from the BDS test is approximately 0.7 for all alternatives, while the estimated power when starting from the Ljung–Box test and the variance ratio test increases with \(\alpha_{1}\) from 0.2 and 0.08 to 0.35 and 0.23, respectively. The difference in the estimated power in favour of the procedure initiated from the BDS test decreases with increasing sample size T, especially at high values of \(\alpha_{1}\). Likewise, in all three cases, the estimated power improves when the sample size increases, but a power of 1 is not reached in any case (at \(T = 100\), the estimated power for \(\alpha_{1} = 0.5\) is approximately 0.8 in all three cases).
-
For \(T \ge 250\), the estimated power of the procedure initiated from the BDS test, Ljung–Box test and variance ratio test converges to 1 as \(\alpha_{1}\) increases. In all these cases, the value of \(\alpha_{1}\) for which the power achieves unity decreases as the sample size increases. Thus, at \(T = 250\), unity is reached for \(\alpha_{1} = 0.5\); at \(T = 500\), for \(\alpha_{1} \ge 0.3\); and at \(T = 1000\), for \(\alpha_{1} \ge 0.2\). On the other hand, the plots in Fig. 7 show that the power function of the procedure initiated from the Ljung–Box test is always above the other power functions, i.e., it is uniformly more powerful for \(T \ge 250\). However, the difference in the estimated power (in favour of the procedure initiated with the Ljung–Box test) is not pronounced.
-
Finally, regardless of the test from which the procedure is started, a power of 1 is not achieved for \(\alpha_{1} = 0.1\) for any sample size, not even at \(T = 1000\) (the best result corresponds to the Ljung–Box case with an estimated power of approximately 0.83, followed by the variance ratio case with a value of 0.82; leaving aside the runs case, the BDS case yields the worst result, approximately 0.74).
In this case, for the alternative \(\alpha_{1} > 0\), Model (7) yields a series that follows a random walk 3, and the proposed procedure contemplates “non-random walk” among the possible decisions. Therefore, it is interesting to analyse, with each version of the procedure, to what extent the rejection of the random walk 1 hypothesis (when this is false) leads correctly to random walk 3. In other words, we are interested in determining what part of the power calculated in each case corresponds to the acceptance of the random walk 3 hypothesis (under the assumption that the hypothesis of random walk 1 is false). As in Section (b1), we calculate the power that the procedure initiated from each test actually has. Thus, Table 5 and Fig. 8 report, for each sample size T, Monte Carlo estimates of the probability of accepting the random walk 3 hypothesis (given that random walk 1 is false) with the procedure initiated from each of the tests considered (i.e., the real power). Additionally, the cited real powers and those initially calculated for each version of the procedure are compared for each T in Figs. 13, 14, 15 and 16 (Appendix).
As shown (Fig. 13), almost all the powers of the procedure initiated from the BDS test correspond to the acceptance of random walk 3 since the so-called real power hardly differs from that initially calculated for each sample size T. For large sample sizes, the real power tends to stabilize at approximately 0.96 as \(\alpha_{1}\) increases.
Similar behaviour is observed when the procedure is started from the variance ratio test, with the exception that, for \(T \ge 250\), the real powers become lower than those initially calculated as \(\alpha_{1}\) increases (for example, at \(T = 1000\), the estimated power for \(\alpha_{1} = 0.5\) was initially 1, but only 80% corresponds to the acceptance of the random walk 3 hypothesis).
Finally, the results show that an important part of the power initially calculated for the procedure when starting from the Ljung–Box test corresponds to the acceptance of a wrong alternative, i.e., the real power is significantly lower than the initial power, mainly at the small sample sizes (\(T = 25\) and 50). The extent of this loss of power decreases when \(T \ge 100\), and at \(T \ge 250\), the observed behaviour for high values of \(\alpha_{1}\) is the same as that described for the variance ratio case. Regardless, the real powers for the Ljung–Box case are lower than those for the variance ratio case.
Consequently, for the random walk 3 alternative (the only one that is true in this case), the procedure initiated from the BDS test is the most powerful.