A procedure for testing the hypothesis of weak efficiency in financial markets: a Monte Carlo simulation

The weak form of the efficient market hypothesis is identified with the conditions established by the different types of random walks (1–3) on the returns associated with the prices of a financial asset. The methods traditionally applied for testing weak efficiency in a financial market, as stated by the random walk model, test only some necessary, but not sufficient, condition of this model. Thus, a procedure is proposed to detect whether a return series associated with a given price index follows a random walk and, if so, what type it is. The procedure combines methods that test only a necessary, but not sufficient, condition for the fulfilment of the random walk hypothesis with methods that directly test a particular type of random walk. The proposed procedure is evaluated by means of a Monte Carlo experiment, and the results show that it is more powerful against linear-correlation-only alternatives when started from the Ljung–Box test. On the other hand, against the random walk type 3 alternative, the procedure is more powerful when initiated from the BDS test.


Introduction
The hypothesis of financial market efficiency is an analytical approach aimed at explaining movements in prices of financial assets over time and is based on the insight that prices for such assets are determined by the rational behaviour of agents interacting in the market. This hypothesis states that stock prices reflect all the information available for the agents when they are determined. Therefore, if the hypothesis is fulfilled, it would not be possible to anticipate price changes and formulate investment strategies to obtain substantial returns, i.e., predictions about future market behaviour could not be performed.
The validation of the hypothesis of efficiency in a given financial market is important for both investors and trade regulatory institutions. It provides criteria to assess whether the environment favours the state that all agents playing in the market are subject to equal footings in a ''fair game'', where expectations of success and failure are equivalent.
Although the theoretical origin of the efficiency hypothesis arises from the work of Bachelier in 1900, Samuelson provided the theoretical foundations for this hypothesis in 1965. On the other hand, Fama established, for the first time, the concept of an efficient market. A short time later, the concept of the hypothesis of financial market efficiency emerged from the work of Roberts (1967), which also analysed efficiency from an informational outlook, leading to a classification of efficiency into three levels according to the rising availability of information for agents: weak, semi-strong and strong. Thus, weak efficiency means that the information available to the agents is restricted to the historical price series; semi-strong efficiency means that all public information is available to all agents; and strong efficiency means that the set of available information includes the previously described information as well as private data (the use of which is known as insider trading).
The weak form of the efficiency hypothesis has historically been the benchmark of theoretical and empirical approaches. In relation to the theoretical contributions, most link the weak efficiency hypothesis to the fact that financial asset prices follow a random walk (in form 1, 2 or 3) or a martingale. However, since it is necessary to impose additional restrictions on the underlying probability distributions, which lead to one of the forms of random walk, in order to obtain testable hypotheses derived from the martingale model, it seems logical to assume one of these forms as a pricing model.
Specifically, the types of random walks with which the weak efficiency hypothesis is identified are conditions that are established on the returns of a financial asset, which are relaxed from random walk 1 (which is the most restrictive) to random walk 3 (which corresponds to the most plausible in economic terms since it is not as restrictive). This makes it possible to evaluate the degree of weak efficiency.
Although numerous procedures have traditionally been used to test the weak efficiency of a financial market according to the random walk model, many test only some necessary, but not sufficient, condition of the aforementioned model in any of its forms (this is the case, for example, of the so-called linear methods, which test only the uncorrelation that is necessary for the three types of random walk). Consequently, applying only one of these tests can lead to an incorrect conclusion. On the other hand, there are methods that directly test a specific type of random walk.
Through the strategic combination of both types of methods, a procedure that allows us to detect if a time series of financial returns follows a random walk and, if so, its corresponding type, is proposed. The objective is to reduce the effect of the above-mentioned limitation of some traditional methods when studying the weak efficiency hypothesis.
Consequently, the work begins (Sect. 2) by describing how the hypothesis of efficiency in a financial market is evaluated based on the so-called joint-hypothesis problem (Fama 1991). The different methods traditionally applied to test the weak efficiency in the forms that establish the random walk types are detailed in Sect. 3. Next, a procedure is proposed to detect if a return series associated with a given price index follows a random walk and, if so, what type it is. This procedure combines methods that test only a necessary, but not sufficient, condition for the fulfilment of the random walk hypothesis and methods that directly test for a particular type of random walk. The proposed procedure is evaluated by means of a Monte Carlo experiment, and the results are presented in Sect. 4. Finally, Sect. 5 contains the main conclusions of the study.

Evaluation of the efficiency hypothesis
To evaluate the efficiency of a financial market, Bailey (2005) proposes a procedure based on the joint-hypothesis problem of Fama (1991); that is, considering, in addition to the available information, the existence of an underlying model that fixes the prices of financial assets. Specifically, based on the aforementioned model and the cited information set, the criterion that determines the efficiency of the market is established so as to create a testable hypothesis. Then, by means of some method designed to test the hypothesis of efficiency, it is checked whether the collected data (observed prices) support this hypothesis, which determines the efficiency or inefficiency of the market. Figure 1 shows this whole process schematically.
Clearly, in this procedure, the efficiency of a market depends on the pricing model and the information set assumed. Thus, if the conclusion for a market is efficiency (inefficiency) given a pricing model and a specific information set, it is possible that inefficiency (efficiency) would be concluded if another model and/or different set are assumed.
Traditionally, the martingale and the random walk are assumed as models to fix the price P_t of a financial asset, whose continuously compounded return or log return is given by the expression r_t = p_t − p_{t−1} = ln(P_t / P_{t−1}), where p_t = ln(P_t). Samuelson (1965) and Fama (1970), understanding the market as a fair game, raised the idea of efficiency from an informational outlook with the less restrictive model, the martingale. In this case, if X_t is the information set available at time t,

E[|p_t|] < ∞ and E[p_{t+1} | X_t] = p_t. (1)

That is, in an efficient market it is not possible to forecast the future using the available information, so the best forecast for the price of an asset at time t+1 is today's price. The second condition of expression (1) implies

E[p_{t+1} − p_t | X_t] = E[r_{t+1} | X_t] = 0,

which reflects the idea of a fair game and allows us to affirm that the return r_t constitutes a martingale difference sequence, i.e., it satisfies E[r_{t+1} | X_t] = 0 for all t.
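For concreteness, the log-return definition above can be sketched in a few lines of code (the helper name `log_returns` is ours, not from the paper):

```python
import math

def log_returns(prices):
    """Continuously compounded (log) returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 102.0, 101.0, 105.0]
rets = log_returns(prices)

# A convenient property of log returns: they are additive over time,
# so their sum equals the log return over the whole period.
assert abs(sum(rets) - math.log(prices[-1] / prices[0])) < 1e-12
```

This additivity is one reason log returns, rather than simple returns, are the usual object of random walk tests.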

Random walk
The random walk was initially formulated as

p_t = p_{t−1} + r_t, (2)

where r_t is considered an independent and identically distributed process with mean 0 and constant variance, which assumes that changes in prices are unpredictable and random, a fact that is inherent to the first versions of the efficient market hypothesis. Nevertheless, several studies have shown that financial data are inconsistent with these conditions. Campbell et al. (1997) adjusted the idea of random walks based on the formulation

p_t = μ + p_{t−1} + ε_t, (3)

where μ is a constant term. By establishing conditions on the dependency structure of the process {ε_t} (which the authors call increments), they distinguish three types of random walks: 1, 2 and 3. In this case, the change in the price, or return, is r_t = p_t − p_{t−1} = μ + ε_t, so the conditions fixed on the increments {ε_t} can be extrapolated integrally to the returns {r_t}.
a. Random walk 1 (RW1): IID increments/returns. In this first type, ε_t is an independent and identically distributed process with mean 0 and variance σ², or ε_t ~ IID(0, σ²) in abbreviated form, which implies r_t ~ IID(μ, σ²). Thus, formulation (2) is a particular case of this type of random walk for μ = 0. Under these conditions, the constant term μ is the expected price change or drift. If, in addition, normality of ε_t is assumed, then (3) is equivalent to arithmetic Brownian motion. In this case, the independence of ε_t implies that random walk 1 is also a fair game, but in a much stronger sense than the martingale, since the mentioned independence implies not only that increments/returns are uncorrelated but also that any nonlinear functions of them are uncorrelated.

b. Random walk 2 (RW2): independent increments/returns. For this type of random walk, ε_t (and by extension r_t) is an independent but not identically distributed (INID) process. RW2 contains RW1 as a particular case. This version of the random walk accommodates more general price generation processes and, at the same time, is more in line with the reality of the market since, for example, it allows for unconditional heteroskedasticity in r_t, thus taking into account the temporal dependence of volatility that is characteristic of financial series.

c. Random walk 3 (RW3): uncorrelated increments/returns. Under this denomination, ε_t (and therefore r_t) is a process that is neither independent nor identically distributed but is uncorrelated; that is, cases are considered in which

Cov(ε_t, ε_{t−k}) = 0 for all k ≠ 0, but Cov(ε_t², ε_{t−k}²) ≠ 0 for some k ≠ 0,

which means there may be dependence but no correlation. This is the weakest form of the random walk hypothesis and contains RW1 and RW2 as special cases.
As previously mentioned, financial data tend to reject random walk 1, mainly due to non-compliance with the constancy assumption of the variance of r t . In contrast, random walks 2 and 3 are more consistent with financial reality since they allow for heteroskedasticity (conditional or unconditional) in r t . Consequently, we could say that RW2 is the type of random walk closest to the martingale [actually, RW1 and RW2 satisfy the conditions of the martingale, but in a stronger sense (Bailey 2005)].
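The distinction between RW1 and RW3 increments can be illustrated numerically: an ARCH-style series is uncorrelated in levels, but its squares are autocorrelated. A minimal sketch, where the function name and the ARCH(1) parameter values are illustrative choices of ours, not from the paper:

```python
import random

def acf1(x):
    """Lag-1 sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den

random.seed(0)
T = 20000

# RW1-type increments: iid N(0, 1) -- independent, so both the series
# and any nonlinear function of it are uncorrelated.
rw1 = [random.gauss(0.0, 1.0) for _ in range(T)]

# RW3-type increments via an ARCH(1) scheme: uncorrelated in levels but
# dependent through the conditional variance (illustrative parameters).
rw3, prev = [], 0.0
for _ in range(T):
    sigma2 = 0.3 + 0.5 * prev ** 2
    e = random.gauss(0.0, 1.0) * sigma2 ** 0.5
    rw3.append(e)
    prev = e

# Levels look uncorrelated in both cases...
assert abs(acf1(rw1)) < 0.05 and abs(acf1(rw3)) < 0.05
# ...but the squared RW3 increments are clearly autocorrelated.
assert acf1([e * e for e in rw3]) > 0.1
```

This is exactly the "dependence but no correlation" pattern that defines RW3.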

Martingale vs. random walk
The random walk hypothesis, in its three versions, and the martingale hypothesis are captured in an expression that considers the kind of dependence that can exist between the returns of a given asset at two times, t and t+k:

Cov[f(r_t), g(r_{t+k})] = 0, for all t and all k ≠ 0, (4)

where, in principle, f(·) and g(·) are two arbitrary functions; (4) may be interpreted as an orthogonality condition. For appropriately chosen f(·) and g(·), all versions of the random walk hypothesis and the martingale hypothesis are captured by (4). Specifically,

• If condition (4) is satisfied only in the case that f(·) and g(·) are linear functions, then the returns are serially uncorrelated but not independent, which is identified with RW3. In this context, the linear projection of r_{t+k} onto the set of its past values X_t satisfies Proj(r_{t+k} | X_t) = constant, for all t and all k ≥ 1.
• If condition (4) is satisfied only when g(·) is a linear function but f(·) is unrestricted, then the returns are uncorrelated with any function of their past values, which is equivalent to the martingale hypothesis. In this case, E[r_{t+k} | X_t] = 0, for all t and all k ≥ 1.
• If condition (4) holds for any f(·) and g(·), then returns are independent, which corresponds to RW1 and RW2. In this case, d.f.(r_{t+k} | X_t) = d.f.(r_{t+k}), for all t and all k ≥ 1, where d.f. denotes the probability density function.

Table 1 summarizes the hypotheses derived from expression (4). Since, in practice, additional restrictions are usually imposed on the underlying probability distributions to obtain testable hypotheses derived from the martingale model, which results in the conditions of one of the random walk versions 1 (Bailey 2005, pp. 59-60), it is usual to assume the random walk as a pricing model.
Therefore, if the available information set is the historical price series and the pricing model assumed is the random walk, weak efficiency is identified with some type of random walk.

1 The additional restrictions that are usually imposed correspond to the conditions of random walks 1 or 2, which fulfil the martingale hypothesis in a stronger sense (see Sects. 2.2.a and 2.2.b).

Traditional methods
The methods traditionally used to test the weak form of efficiency, as established by some of the random walk types, are classified into two groups depending on whether they make use of formal statistical inference.
RW2 is analysed with methods that do not use formal inference techniques (filter rules and technical analysis 2 ) because this type of random walk requires that the return series is INID. In this case, it would be very complicated to test for independence without assuming identical distributions (particularly in the context of time series) since the sampling distributions of the statistics that would be constructed to carry out the corresponding test could not be obtained (Campbell et al. 1997, p. 41).
On the other hand, the methods that apply formal inference techniques for the analysis can be classified into two groups according to whether they allow direct testing of a type of random walk or only some necessary, but not sufficient, condition for its fulfilment.
The methods of the second group include the Bartlett test, tests based on the Box-Jenkins methodology, the Box-Pierce test, the Ljung-Box test and the variance ratio test. These methods test only the uncorrelation condition on the return series (they are also called linear methods 3), a condition that is necessary for any type of random walk. Since these tests do not detect non-linear relationships 4 that, if they exist, would entail the dependence of the series, rejection of the null hypothesis implies correlation of the series and, consequently, the non-existence of any type of random walk. On the other hand, for tests that try to detect ARCH effects, rejection of the null hypothesis involves only the acceptance of non-linear relationships, which does not necessarily imply that the series is uncorrelated.

[Table 1 (see Sect. 2): classification of the hypotheses as IID vs. INID. Source: own elaboration from Campbell et al. (1997).]
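As a rough illustration of how a linear method works, the Ljung-Box statistic can be computed directly from sample autocorrelations. This is a bare-bones sketch of ours, not the implementation used in the paper:

```python
import math
import random

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: Q = T(T+2) * sum_{k=1}^{m} rho_k^2 / (T-k).
    Under the null of no autocorrelation, Q is asymptotically chi-squared
    with m degrees of freedom."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / den
        q += rho * rho / (n - k)
    return n * (n + 2) * q

random.seed(1)
# Uncorrelated (iid) series vs. an AR(1) series with phi = 0.5.
iid = [random.gauss(0, 1) for _ in range(500)]
ar, r = [], 0.0
for _ in range(500):
    r = 0.5 * r + random.gauss(0, 1)
    ar.append(r)

q_iid = ljung_box_q(iid, 10)
q_ar = ljung_box_q(ar, 10)
# The chi-squared(10) critical value at the 5% level is about 18.31;
# the correlated series should exceed it by a wide margin.
assert q_ar > 18.31
```

Note that such a test only detects correlation: an uncorrelated-but-dependent (RW3-type) series would typically pass it.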
Other methods allow direct determination of whether the return series follows a specific type of random walk. This means that these procedures also take into account the possibility of non-linear relationships in the series, either because they are considered by the null hypothesis itself or because the cited methods have power against alternatives that capture these relationships (they are, therefore, nonlinear methods). These methods include those that allow testing of random walk type 1 (the BDS test, the runs test and the sequences and reversals test) and one that tests for a type 3 random walk (the variance ratio test that considers the heteroskedasticity of the series). Figure 2 shows the classification established in this section for the different methods that are traditionally used to test the hypothesis of weak efficiency.
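A similarly minimal sketch of a runs test, here in the Wald-Wolfowitz form with the series cut at its sample mean (the paper does not specify this exact variant, so the cut point and function name are our assumptions):

```python
def runs_test_z(x):
    """Wald-Wolfowitz runs test on signs of deviations from the mean.
    Returns the standardized run count; |z| > 1.96 rejects randomness
    at the 5% level."""
    m = sum(x) / len(x)
    signs = [v >= m for v in x]
    n1 = sum(signs)            # observations above (or at) the mean
    n2 = len(signs) - n1       # observations below the mean
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mu) / var ** 0.5

# A strictly alternating series has the maximum number of runs (large z);
# a strongly trending series has very few runs (large negative z).
alternating = [1.0, -1.0] * 50
trending = [-1.0] * 50 + [1.0] * 50
assert runs_test_z(alternating) > 1.96
assert runs_test_z(trending) < -1.96
```

Too many or too few runs both signal departures from randomness, which is why the test is two-sided.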
The financial literature shows that the methods described above have traditionally been applied to test the weak efficiency hypothesis in financial markets.
Correlation tests to determine the efficiency of a market were first used when Fama (1965) and Samuelson (1965) laid the foundations of efficient market theory. From these beginnings, the works developed by Moore (1964), Theil and Leenders (1965), Fama (1965) and Fisher (1966), among others, stand out.
These tests were used as almost the only tool to analyse the efficiency of a market until, in the 1970s, seasonal effects and calendar anomalies became relevant for the analysis. Then, new methodologies incorporating these effects emerged, such as the seasonality tests applied by Roseff and Kinney (1976), French (1980) and Gultekin and Gultekin (1983).
In the 1990s, studies that analysed the efficiency hypothesis in financial markets using so-called traditional methods began to appear. This practice has continued to the present day, as evidenced by the most prominent empirical works on financial efficiency in recent years.
Articles using technical analysis to test the efficiency of a financial market include Potvin et al. (2004). On the other hand, among the studies that apply methods that test only a necessary, but not sufficient, condition of the random walk hypothesis, the most numerous are those that use correlation tests. In this sense, we can cite the studies developed by Buguk and Brorsen (2003), DePenya and Gil-Alana (2007), Rossi and Gunardi (2018) and Khanh and Dat (2020).
Regarding methods that directly test a type of random walk, the runs test has been used in works such as Dicle et al. (2010), Jamaani and Roca (2015), Leković (2018), Chu et al. (2019) and Tiwari et al. (2019). Meanwhile, the application of the BDS test can be found in studies such as Yao and Rahaman (2018), Abdullah et al. (2020), Kołatka (2020) and Adaramola and Obisesan (2021).
Therefore, the proposal of a procedure (Sect. 3.2) that reduces the limitations of traditional methods would be a novel contribution to the financial field as far as the analysis of the weak efficiency hypothesis is concerned. Moreover, it would be more accurate than the traditional methods in determining whether a return series follows a random walk.

Proposed procedure
By strategically combining the methods analysed in the previous section, we propose a procedure to test the random walk hypothesis that can be started either from a method that tests only a necessary, but not sufficient, condition or from one that directly tests a specific type of random walk (1, 2 or 3).
On the one hand, if the procedure is started with a method from the first group and correlation is found in the return series, the series does not follow any type of random walk. In the opposite case (uncorrelation), an ARCH effect test is recommended to determine the type of random walk. Thus, if ARCH effects are detected, which implies the existence of non-linear relationships, it should be concluded that the series is RW3. Otherwise, the series will be RW1 or RW2, and a non-formal statistical inference technique can be applied to test for type 2. Finally, if the RW2 hypothesis is rejected, then the series is necessarily RW1.
On the other hand, regarding the methods that directly test a type of random walk, it is proposed to start the procedure with one that tests RW1. If the null hypothesis is rejected with this method, it cannot be ruled out that the series is RW2, RW3 or not a random walk at all. Before concluding that the series is not any type of random walk, it is suggested first to check for type 2 by applying a non-formal statistical inference technique. If the RW2 hypothesis cannot be accepted, then RW3 is tested. In this case, rejection of the RW3 hypothesis implies that the series is not a random walk. Figure 3 schematically shows the described procedure.
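The branching logic of the two ways of running the procedure can be summarized in code. The function names and boolean arguments are ours; each argument stands for the outcome of the corresponding externally run test:

```python
def classify_from_linear_test(correlated, arch_effects, reject_rw2):
    """Path starting from a necessary-condition (linear) test,
    e.g., Ljung-Box or the variance ratio test."""
    if correlated:          # correlation found -> no random walk of any type
        return "not a random walk"
    if arch_effects:        # non-linear dependence without correlation -> RW3
        return "RW3"
    if reject_rw2:          # non-formal check for type 2 fails -> RW1
        return "RW1"
    return "RW2"

def classify_from_rw1_test(reject_rw1, reject_rw2, reject_rw3):
    """Path starting from a direct test of RW1, e.g., the BDS or runs test."""
    if not reject_rw1:
        return "RW1"
    if not reject_rw2:      # non-formal check for type 2
        return "RW2"
    if not reject_rw3:      # e.g., heteroskedasticity-robust variance ratio
        return "RW3"
    return "not a random walk"

# A couple of illustrative paths through the decision tree:
assert classify_from_linear_test(True, False, False) == "not a random walk"
assert classify_from_rw1_test(True, True, False) == "RW3"
```

Encoding the branches this way makes explicit that both paths terminate in one of exactly four decisions: RW1, RW2, RW3 or not a random walk.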
The acceptance of market inefficiency (i.e., that the return series is not a random walk of any type) occurs when the price series analysed shows non-randomness, an identifiable structure, long systematic deviations from its intrinsic value, etc. (recall that even the RW3 hypothesis allows dependence, though not correlation). This indicates a certain degree of predictability, at least in the short run; that is, it is possible to forecast both the asset returns and the volatility of the market using past price changes. These forecasts are constructed on the basis of models reflecting the behaviour of financial asset prices.
Among the models that allow linear structures to be captured, the ARIMA and ARIMAX models stand out. Moreover, ARCH family models are used for modelling and forecasting the conditional volatility of asset returns. On the other hand, when the return series presents non-linear relationships, it is common to use non-parametric and non-linear models, including those based on neural networks and machine learning techniques. Finally, hybrid models (a combination of two or more of the procedures described above) consider all the typical characteristics of financial series.

Monte Carlo experiment
The procedure introduced in the previous section is evaluated by means of a Monte Carlo experiment, 5 considering the variance ratio test proposed by Lo and MacKinlay (1988) 6 and the Ljung-Box test (1978), when started from methods that test only some necessary, but not sufficient, condition of the random walk hypothesis; and the BDS test 7 and the runs test when starting from methods that directly test the mentioned hypothesis. If the procedure requires the application of an ARCH effect test to decide between random walks 1 and 3, ARCH models up to order 4 are used.
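An ARCH effect test of the kind invoked here can be sketched as Engle's LM test; for brevity this sketch of ours uses order 1 rather than the order 4 employed in the paper, and regresses the squared series on a single lag:

```python
import random

def arch_lm_order1(x):
    """Engle's ARCH LM test of order 1: regress x_t^2 on a constant and
    x_{t-1}^2. Under the null of no ARCH effects, T * R^2 is
    asymptotically chi-squared with 1 degree of freedom."""
    s = [v * v for v in x]
    y, z = s[1:], s[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    cov = sum((a - my) * (b - mz) for a, b in zip(y, z))
    vz = sum((b - mz) ** 2 for b in z)
    vy = sum((a - my) ** 2 for a in y)
    r2 = cov * cov / (vz * vy)   # R^2 of the simple regression
    return n * r2

random.seed(2)
iid = [random.gauss(0, 1) for _ in range(2000)]
arch, prev = [], 0.0
for _ in range(2000):
    e = random.gauss(0, 1) * (0.3 + 0.5 * prev ** 2) ** 0.5
    arch.append(e)
    prev = e

# The chi-squared(1) critical value at the 5% level is about 3.84:
# the ARCH series should reject decisively, the iid series should not.
lm_arch = arch_lm_order1(arch)
lm_iid = arch_lm_order1(iid)
assert lm_arch > 3.84
```

In the procedure, this is the step that separates RW1/RW2 (no ARCH effects) from RW3 (ARCH effects present).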
To conduct this analysis, return series are generated from two different models because the objective is twofold: to evaluate the performance of the procedure in the analysis of the random walk 1 hypothesis against the linear correlation alternative, on the one hand, and against that of random walk 3, on the other. 8 Thus, the BDS, runs, variance ratio and Ljung-Box tests are applied to each generated return series. Then, if the RW1 hypothesis is rejected by the first two tests, the variance ratio test is applied to determine whether the series is at least RW3. On the other hand, if the random walk hypothesis is not rejected with the first two tests, an ARCH effect test is applied to discern between RW1 and RW3. The process is replicated 10,000 times for each sample size T and each value of the parameter involved in the generation of the series (see the whole process in Fig. 4).

7 The BDS test was proposed by Brock, Dechert, Scheinkman and LeBaron (1996) for testing the null hypothesis that a series is independent and identically distributed. It is based on the correlation integral developed by Grassberger and Procaccia (1983), which is a measure of spatial correlation between two points of an m-dimensional space. We consider m = 2, 3, 4 and 5 since Monte Carlo experiments have shown that the BDS statistic has good properties for m ≤ 5, regardless of the sample size (Kanzler 1999).

8 All the simulations were performed using routines developed in EViews 8 with the random number generator contained therein.

a. Nominal size
Before analysing the Monte Carlo powers of the procedure initiated from the different indicated tests, the corresponding nominal size is estimated; that is, the maximum probability of falsely rejecting the null hypothesis of random walk 1 is calculated in each case. Since the different ways of executing the proposed procedure contemplate the possibility of applying tests sequentially to make a decision, we should not expect, in general, the nominal size of each case to coincide with the significance level α that is fixed in each of the individual tests.
To estimate the mentioned nominal size, return series that follow a random walk 1 are generated as

r_t = ε_t, where ε_t ~ iid(0, 1). (5)

Specifically, 10,000 series of size T are generated, and the tests required by the specific way in which the procedure is being applied are performed on each data series independently, not sequentially, with significance level α. The reiteration of this process allows us to determine, for each T, the number of acceptances and rejections of the null hypothesis (random walk 1) that occur with the independent application of each test. This makes possible the estimation of the nominal size of the procedure in each case as the quotient of the total rejections of the null hypothesis divided by the total number of replications (10,000 in this case). The process described in the previous paragraph was performed for the sample sizes T = 25, 50, 100, 250, 500 and 1000 and significance levels α = 0.02, 0.05 and 0.1 [application of the process in Fig. 4 for expression (5)]. The results (Table 2) indicate, for a given value of T, the (estimated) theoretical size of the procedure when a significance level α is set in the individual tests required by the cited procedure initiated from a specific method. For example, if for T = 100 the researcher sets a value of α = 0.05 in the individual tests and wishes to apply the procedure initiated from the variance ratio test, he will be working with an (estimated) theoretical size of 0.0975.
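The nominal-size estimation just described can be sketched as a Monte Carlo loop. Here a toy lag-1 autocorrelation test stands in for the actual tests used in the paper, and the replication count is reduced for speed; all names are ours:

```python
import random

def estimate_size(test, T, reps=2000, seed=0):
    """Monte Carlo estimate of a test's size: the fraction of RW1 series
    (r_t = eps_t, eps_t ~ iid N(0, 1)) for which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        series = [rng.gauss(0.0, 1.0) for _ in range(T)]
        if test(series):
            rejections += 1
    return rejections / reps

def toy_autocorr_test(x):
    """Reject when the lag-1 sample autocorrelation leaves the usual
    asymptotic 95% band of +/- 1.96 / sqrt(T)."""
    n = len(x)
    m = sum(x) / n
    rho = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n)) / \
          sum((v - m) ** 2 for v in x)
    return abs(rho) > 1.96 / n ** 0.5

size = estimate_size(toy_autocorr_test, T=250)
# For a well-calibrated individual test, the estimate should be near 5%.
assert 0.02 < size < 0.09
```

For the sequential versions of the procedure, the same loop applies; what changes is that `test` becomes the composite decision rule, which is why its size need not match α.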
The estimated nominal size of the procedure when starting from methods that directly test the hypothesis of random walk 1 is much better in the case of the runs test, since it practically coincides with the significance level α fixed (in the individual tests) for any sample size T. However, size distortions (estimated values far from the level α) are evident when the procedure is initiated from the BDS test, and the results are clearly affected by T. In effect, the greatest distortions occur for small sample sizes and decrease as T increases (at T = 1000, the estimated nominal sizes for the three values of α are 0.0566, 0.1332 and 0.2214, respectively, i.e., approximately 2α).
Since the variance ratio test and the Ljung-Box test do not directly test the random walk 1 hypothesis (to estimate the nominal size of the procedure initiated from either of them, it is necessary to apply tests sequentially), the results that appear in Table 2 for these two cases are expected, in the sense that the estimates of the respective nominal sizes for each T are greater than the significance level α. In this context of size distortion, the best results correspond to the case of the variance ratio test, with estimated values very close to the significance level α for small sample sizes (T = 25 and 50) but increasing as T increases (note that at T = 1000, for each value of α, the nominal size is approximately double that at T = 25, i.e., approximately 2α). In the case of the Ljung-Box test, where the distortion is greater, the sample size T hardly influences the estimated values of the nominal size since, irrespective of the value of T, they remain approximately 10%, 21% and 37% for levels 0.02, 0.05 and 0.1, respectively.

b. Empirical size and Monte Carlo power

(b1) The performance of the procedure for testing the random walk 1 hypothesis against the linear-correlation-only alternative (among the variables of the return series generating process) is analysed using the model

r_t = φ₁ r_{t−1} + ε_t, (6)

with r_0 = 0 and ε_t ~ iid(0, 1). By means of (6), ten thousand samples of sizes T = 25, 50, 100, 250, 500 and 1000 of the series r_t are generated for each value of the parameter φ₁ considered: −0.9, −0.75, −0.5, −0.25, −0.1, 0, 0.1, 0.25, 0.5, 0.75 and 0.9. In this way, the model yields return series that follow a random walk 1 (the particular case in which φ₁ = 0) and, as alternatives, series with a first-order autoregressive structure (cases in which φ₁ ≠ 0), i.e., series generated by a process whose variables are correlated.
Therefore, when the null hypothesis is rejected, some degree of predictability is admitted, since by modelling the above autoregressive structure with an ARMA model it is possible to predict price changes on the basis of historical price changes. The procedure, starting from each of the considered tests (BDS, runs, Ljung-Box and variance ratio), was applied to the different series generated by the combinations of values of T and φ₁ with a significance level of 5% [application of the process in Fig. 4 for expression (6)]. Then, we calculated the number of times that the different decisions contemplated by the two ways of applying the procedure are made (according to whether we start from a method that does or does not directly test the random walk hypothesis).
From the previous results, we calculate, for each sample size T, the percentage of rejection of the null hypothesis (random walk 1) when starting from each of the four tests considered, depending on the value of the parameter φ₁. Since φ₁ = 0 implies that the null hypothesis is true, in this particular case the calculations represent the empirical probability of committing a type I error for the procedure in the four applications, i.e., the empirical size. However, when φ₁ ≠ 0, the cited calculations represent the Monte Carlo power of each version of the procedure, since for these values of φ₁ the null hypothesis is false.
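The power calculation for model (6) can be sketched in the same Monte Carlo spirit. Again, a simple lag-1 autocorrelation test stands in for the four tests of the paper, the replication count is reduced for speed, and all names are ours:

```python
import random

def ar1_series(phi, T, rng):
    """Series from model (6): r_t = phi * r_{t-1} + eps_t, with r_0 = 0
    and eps_t ~ iid N(0, 1)."""
    r, out = 0.0, []
    for _ in range(T):
        r = phi * r + rng.gauss(0.0, 1.0)
        out.append(r)
    return out

def reject_uncorrelated(x):
    """Reject when the lag-1 sample autocorrelation leaves the
    asymptotic 95% band of +/- 1.96 / sqrt(T)."""
    n = len(x)
    m = sum(x) / n
    rho = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n)) / \
          sum((v - m) ** 2 for v in x)
    return abs(rho) > 1.96 / n ** 0.5

def monte_carlo_power(phi, T, reps=1000, seed=0):
    """Fraction of replications in which the null (phi = 0) is rejected."""
    rng = random.Random(seed)
    hits = sum(reject_uncorrelated(ar1_series(phi, T, rng))
               for _ in range(reps))
    return hits / reps

# Power grows with |phi_1| and with T, mirroring the pattern in Table 4.
p_strong = monte_carlo_power(0.5, 250)
p_small_T = monte_carlo_power(0.1, 25)
p_large_T = monte_carlo_power(0.1, 1000)
assert p_strong > 0.99
assert p_small_T < p_large_T
```

When `phi = 0`, the same function instead returns an empirical size estimate, which is exactly the distinction drawn in the paragraph above.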

b1:1 Empirical size
The empirical sizes (Table 3) that resulted from the different cases analysed nearly coincide with the corresponding theoretical probabilities calculated for α = 0.05 (see Table 2). Therefore, there is empirical confirmation of the size distortions that appear in the procedure according to the test from which it is started. In effect,

• When the procedure is initiated from methods that directly test the random walk 1 hypothesis, the results confirm that for the runs test, the size of the procedure remains approximately 5% (the significance level) for all T. Nevertheless, when initiating from the BDS test, a very high size distortion is produced for small sample sizes (0.6806 and 0.5425 at T = 25 and 50, respectively), but the distortion decreases as T increases (it reaches a value of 0.1334 at T = 1000).
• The size distortions exhibited by the procedure when starting with methods that test only a necessary, but not sufficient, condition of the random walk hypothesis are less pronounced when the procedure is applied starting from the variance ratio test than when starting from the Ljung-Box test. Likewise, in the former case, the empirical size increases with the sample size T from values close to the significance level (0.05) to more than double the significance level (from 0.0603 at T = 25 to 0.1287 at T = 1000). In the latter case (Ljung-Box), the values between which the empirical size oscillates (18% and 22%) do not allow us to affirm that T has any influence.
b1:2 Monte Carlo power

Table 4 reports, for each sample size T, the power calculations of the procedure started from each of the four tests considered in this study, i.e., the probability of rejecting the null hypothesis (random walk 1) with each version of the procedure under the assumption that the hypothesis is false. Likewise, since several alternatives to the null hypothesis (values that satisfy φ₁ ≠ 0) are considered, the corresponding power functions of the cited versions of the procedure are obtained and plotted in a comparative way for each T (Fig. 5).
For each sample size T and regardless of the test from which the procedure is started, the corresponding probabilities of rejection of the random walk 1 hypothesis are distributed symmetrically around φ1 = 0 (the random walk 1 hypothesis). Additionally, these probabilities tend to unity as |φ1| increases, reaching 1 for values of |φ1| increasingly closer to 0 as the sample size T increases. The speed of this convergence depends on the test from which the procedure is started:

• For the two smallest sample sizes (T = 25 and 50), a power of 1 is hardly achieved for any of the alternatives. Only at T = 50 is the power approximately 100 percent, with the procedure initiated from any of the four tests, for |φ1| ≥ 0.75. On the other hand, at T = 25, the estimated powers of the procedure initiated from the BDS test for |φ1| ≤ 0.5 are much higher than those presented by the other cases. A similar situation occurs at T = 50, but with less pronounced differences between what the procedure with the BDS test and the other cases yield and restricted to the alternatives with |φ1| ≤ 0.25.

• From sample size 100 onwards, we observe differences in the convergence to unity of the estimated powers according to the test from which the procedure is initiated. Thus, when starting from the Ljung-Box test or the variance ratio test, a power of approximately 1 is achieved for |φ1| ≥ 0.5 at T = 100, whereas for larger sample sizes, convergence to 1 is nearly reached for |φ1| ≥ 0.25. On the other hand, when the procedure is started from the BDS test, a power of 1 is reached for |φ1| ≥ 0.75 at T = 100 and for |φ1| ≥ 0.5 at T ≥ 250 (note that at T = 1000, the estimated power does not exceed 0.89 for |φ1| = 0.25). Finally, when the procedure is initiated from the runs test, the value of |φ1| for which the powers achieve unity decreases as the sample size T increases beyond 100. Specifically, at T = 100, unity is reached for |φ1| ≥ 0.75; at T = 250, for |φ1| ≥ 0.5; and at T = 1000, for |φ1| ≥ 0.25 (at T = 500, the power is approximately 0.95 for |φ1| = 0.25). The plots in Fig. 5 show that the power function of the procedure initiated from the Ljung-Box test is always above the other power functions, i.e., it is uniformly more powerful for T ≥ 100.

• Regardless of the test from which the procedure is started, a power of 1 is not achieved for |φ1| = 0.1 for any sample size, not even at T = 1000 (the best result corresponds to the Ljung-Box case with an estimated power of approximately 0.91, followed by the variance ratio and runs cases with values close to 0.8 and 0.53, respectively; the BDS case yields the worst result of approximately 0.18).
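To illustrate how such power figures are obtained, the following minimal sketch estimates the Monte Carlo power of a single Ljung-Box test against linear correlation alternatives. It assumes that Model (6) is the AR(1) scheme r_t = φ1 r_{t−1} + ε_t with ε_t ~ iid N(0, 1) (an assumption about the model's exact form), uses a reduced replication count rather than the paper's 10,000, and covers only one test rather than the full decision procedure.

```python
import numpy as np
from scipy.stats import chi2

def simulate_ar1(phi, T, rng):
    """Generate an AR(1) return series r_t = phi * r_{t-1} + eps_t (r_0 = eps_0)."""
    eps = rng.standard_normal(T)
    r = np.empty(T)
    r[0] = eps[0]
    for t in range(1, T):
        r[t] = phi * r[t - 1] + eps[t]
    return r

def ljung_box_pvalue(x, m=5):
    """p-value of the Ljung-Box Q statistic on the first m autocorrelations."""
    T = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    rho = np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, m + 1)])
    q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))
    return chi2.sf(q, df=m)

def mc_power(phi, T, reps=1000, alpha=0.05, seed=0):
    """Fraction of replications in which H0 (no autocorrelation) is rejected."""
    rng = np.random.default_rng(seed)
    rejections = sum(ljung_box_pvalue(simulate_ar1(phi, T, rng)) < alpha
                     for _ in range(reps))
    return rejections / reps

print(mc_power(phi=0.0, T=100))  # empirical size, close to the nominal 0.05
print(mc_power(phi=0.5, T=100))  # power against |phi1| = 0.5, near 1
```

Under φ1 = 0 the rejection frequency estimates the empirical size; under φ1 ≠ 0 it estimates the power, and sweeping φ1 over a grid traces out a power function like those in Fig. 5.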
Up to this point, we have analysed the power of the procedure, that is, its capability of rejecting the null hypothesis (random walk 1) when the null hypothesis is false. As already mentioned, for φ1 ≠ 0, Model (6) yields a series that does not follow any type of random walk. However, the proposed procedure contemplates random walk 3 among the possible decisions. Therefore, if from the powers calculated for each version of the procedure we subtract the portion that corresponds to the (wrong) decision of random walk 3, we obtain the power that the procedure initiated from each test actually has, i.e., its capability to reject the null hypothesis in favour of true alternatives when the null hypothesis is false.
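This decomposition amounts to simple counting over the simulation tallies. A minimal sketch (the tallies below are hypothetical numbers chosen only for illustration, not values from the study):

```python
def mc_power_decomposition(n_reps, n_reject, n_rw3_decisions):
    """Split the raw rejection frequency of H0 (random walk 1) into the raw
    power and the real power, i.e., the rejection frequency after removing
    replications that ended in the (wrong) random walk 3 decision."""
    power = n_reject / n_reps
    real_power = (n_reject - n_rw3_decisions) / n_reps
    return power, real_power

# Hypothetical tallies for one (T, phi1) cell of the experiment: 10,000
# replications, 5,200 rejections of RW1, of which 4,100 ended in the
# (false) random walk 3 decision.
power, real_power = mc_power_decomposition(10_000, 5_200, 4_100)
print(power, real_power)  # 0.52 0.11
```

The gap between the two numbers is exactly the distortion discussed below for the BDS-initiated version of the procedure.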
In this sense, Table 4 and Fig. 6 report, for each sample size T, the power calculations of the procedure initiated from each of the tests considered after subtracting the effect of the (false) alternative of random walk 3. Furthermore, the cited powers and those initially calculated for each version of the procedure are compared for each T in Figs. 9, 10, 11 and 12 (Appendix).
When the procedure is started from the runs test, the variance ratio test or the Ljung-Box test (Appendix Figs. 10, 11 and 12), what we call the real power hardly differs from that initially calculated for each sample size T (the slight differences occur for |φ1| ≤ 0.5 with T ≤ 100 and for |φ1| = 0.1 with T ≥ 250). Therefore, all the abovementioned findings in relation to the power of these three cases are maintained.
Nevertheless, there are considerable differences between the real power and that initially calculated when the procedure is started from the BDS test. In effect, the initial calculations indicated that this version of the procedure was the most powerful for |φ1| ≤ 0.5 at T = 25 and for |φ1| ≤ 0.25 at T = 50 (with all the values greater than 0.5), but the results in Table 4 and Fig. 6 show that the powers in these cases are actually much lower (0.2 is hardly reached in one single case). Although these differences persist at T = 100, also in the context of |φ1| ≤ 0.25, they start to decrease as the sample size increases from T ≥ 250 (we could say that, for T ≥ 500, there are minimal differences between the real power and the initially calculated power).

[Table 4 notes: *The values in parentheses indicate the real power of the procedure for the corresponding alternative, i.e., the probability of rejecting the null hypothesis in favour of true alternatives when the null hypothesis is false. **The individual tests required by the different ways of applying the procedure are performed at the 5% level of significance. Source: Own elaboration.]
Consequently, in terms of the power referring only to true alternatives (linear correlation in this case), the procedure initiated from the Ljung-Box test is the most powerful.

[Fig. 5 Monte Carlo power of the procedure when starting from each test against linear correlation-only alternatives. Source: Own elaboration]

(b2) The performance of the procedure for testing the random walk 1 hypothesis against only the non-linear alternative (among the variables of the return series generating process) is analysed by means of an ARCH(1) model:

r_t = h_t e_t,  h_t^2 = a0 + a1 r_{t-1}^2,  (7)
where h_t and e_t are processes independent of each other such that h_t is stationary and e_t ~ iid(0, 1), with a0 > 0 and a1 ≥ 0. Specifically, taking r_0 = 0 in (7), 10,000 samples of sizes T = 25, 50, 100, 250, 500 and 1000 of the series r_t are generated for a0 = 1 and each value of a1 considered: 0, 0.1, 0.2, 0.3, 0.4 and 0.5.⁹ In the particular case in which a1 = 0, Model (7) yields a return series that follows a random walk 1 and, for a1 > 0, series that are identified with a random walk 3, i.e., they would be generated by a process whose variables are uncorrelated but dependent (there are non-linear relationships among the variables¹⁰). Therefore, when random walk 3 is accepted, it is possible to develop models that allow market volatility to be predicted (ARCH- and GARCH-type models). The procedure, starting from each of the four tests considered in this study, was applied to the different series generated by the combination of values for T and a1 with a significance level of 5% [application of the process in Fig. 4 for expression (7)]. Then, we calculated the number of times that the different decisions contemplated by the two already known ways of applying the procedure were made.

[Fig. 6 Real Monte Carlo power of the procedure when starting from each test against linear correlation-only alternatives. Source: Own elaboration]
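The data-generating step just described can be sketched as follows. This assumes Model (7) is the standard ARCH(1) scheme r_t = h_t e_t with h_t^2 = a0 + a1 r_{t−1}^2 and Gaussian innovations (the distribution of e_t beyond iid(0, 1) is an assumption here):

```python
import numpy as np

def simulate_arch1(a0, a1, T, seed=0):
    """Generate r_t = h_t * e_t with h_t^2 = a0 + a1 * r_{t-1}^2,
    e_t ~ iid N(0, 1) and r_0 = 0."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T)
    r = np.empty(T)
    prev = 0.0  # r_0 = 0
    for t in range(T):
        r[t] = np.sqrt(a0 + a1 * prev ** 2) * e[t]
        prev = r[t]
    return r

r = simulate_arch1(a0=1.0, a1=0.3, T=100_000)
# Uncorrelated levels but dependent squares: the random walk 3 pattern.
print(abs(np.corrcoef(r[1:], r[:-1])[0, 1]) < 0.03)       # True
print(np.corrcoef(r[1:] ** 2, r[:-1] ** 2)[0, 1] > 0.1)   # True
```

The two checks at the end display the defining feature of a random walk 3 series: the levels are (approximately) uncorrelated, while the squares are clearly autocorrelated.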
On the basis of the results indicated in the previous paragraph, and analogously to the description in Section (b1), we calculate, for each sample size T, the empirical size and the Monte Carlo power of each version of the procedure. In this context, a1 = 0 implies that the random walk 1 hypothesis is true, and a1 > 0 implies that it is false (the series corresponds to a random walk 3).

b2.1 Empirical size
Since in this case the null hypothesis is again random walk 1, the obtained empirical sizes are nearly identical to those calculated in Section (b1) (the results are available on request).

b2.2 Monte Carlo power

Table 5 and Fig. 7 show, respectively, the power calculations of each version of the procedure and the plots of the corresponding power functions (in terms of the parameter a1) for each sample size T.
The estimated power of the procedure when starting from the runs test is approximately 0.05 for all alternatives, irrespective of the value of T. In the other cases, the power is influenced by both T and a1; as the values of these parameters increase, the power tends to unity.
• Fig. 7 shows that the procedure initiated from the BDS test is uniformly more powerful when T ≤ 100, and the difference between the estimated powers of the procedure with the BDS test and those of the other cases becomes more pronounced as the sample size decreases. When T = 25, the estimated power of the procedure initiated from the BDS test is approximately 0.7 for all alternatives, while the estimated power when starting from the Ljung-Box test and the variance ratio test increases with a1 from 0.2 and 0.08 to 0.35 and 0.23, respectively. The difference in the estimated power in favour of the procedure initiated from the BDS test decreases with increasing sample size T, especially at high values of a1. Likewise, in all three cases, the estimated power improves when the sample size increases, but a power of 1 is not reached in any case (at T = 100, the estimated power for a1 = 0.5 is approximately 0.8 in all three cases).

• For T ≥ 250, the estimated power of the procedure initiated from the BDS test, the Ljung-Box test and the variance ratio test converges to 1 as a1 increases. In all these cases, the value of a1 for which the power achieves unity decreases as the sample size increases. Thus, at T = 250, unity is reached for a1 = 0.5; at T = 500, for a1 ≥ 0.3; and at T = 1000, for a1 ≥ 0.2. On the other hand, the plots in Fig. 7 show that the power function of the procedure initiated from the Ljung-Box test is always above the other power functions, i.e., it is uniformly more powerful for T ≥ 250. However, the difference in the estimated power (in favour of the procedure initiated with the Ljung-Box test) is not pronounced.

• Finally, regardless of the test from which the procedure is started, a power of 1 is not achieved for a1 = 0.1 for any sample size, not even at T = 1000 (the best result corresponds to the Ljung-Box case with an estimated power of approximately 0.83, followed by the variance ratio case with a value of 0.82; the BDS case yields the worst result, without considering the runs case, of approximately 0.74).

In this case, for the alternatives a1 > 0, Model (7) yields series that follow a random walk 3, and the proposed procedure contemplates ''non-random walk'' among the possible decisions. Therefore, it is interesting to analyse, with each version of the procedure, to what extent the rejection of the random walk 1 hypothesis (when this is false) correctly leads to random walk 3. In other words, we are interested in determining what part of the power calculated in each case corresponds to the acceptance of the random walk 3 hypothesis (under the assumption that the hypothesis of random walk 1 is false). As in Section (b1), we calculate the power that the procedure initiated from each test actually has. Thus, Table 5 and Fig. 8 report, for each sample size T, Monte Carlo estimates of the probability of accepting the random walk 3 hypothesis (given that type 1 is false) with the procedure initiated from each of the tests considered (i.e., the real power).

Footnote 9: For an ARCH(1) model such as (7), the 4th-order moment of r_t, E[r_t^4] = 3 a0^2 (1 + a1) / [(1 − a1)(1 − 3 a1^2)], is finite and positive if a1^2 ∈ [0, 1/3).

Footnote 10: From Model (7) and the conditions under which it is defined, the uncorrelatedness of r_t is derived: Cov(r_t, r_{t−k}) = E[(h_t e_t)(h_{t−k} e_{t−k})] = E[(h_t h_{t−k})(e_t e_{t−k})] = E[h_t h_{t−k}] E[e_t e_{t−k}] = 0 for all k ≠ 0 (where it has been taken into account that E[r_t] = E[h_t e_t] = E[h_t] E[e_t] = 0 for all t), just as the non-linear relationship among the variables of the process r_t, since r_t^2 depends on r_{t−1}^2 through (7).

[Table 5 Monte Carlo powers* of the procedure** started from a specific test against non-linear alternatives only]

[Fig. 7 Monte Carlo power of the procedure when starting from each test against non-linear alternatives only. Source: Own elaboration]
Additionally, the cited real powers and those initially calculated for each version of the procedure are compared for each T in Figs. 13, 14, 15 and 16 (Appendix).
As shown in Fig. 13, almost all the power of the procedure initiated from the BDS test corresponds to the acceptance of random walk 3, since the so-called real power hardly differs from that initially calculated for each sample size T. For large sample sizes, the real power tends to stabilize at approximately 0.96 as a1 increases.
Similar behaviour is observed when the procedure is started from the variance ratio test, with the exception that, for T ≥ 250, the real powers become lower than those initially calculated as a1 increases (for example, at T = 1000, the estimated power for a1 = 0.5 was initially 1, but only 80% corresponds to the acceptance of the random walk 3 hypothesis).
Finally, the results show that an important part of the power initially calculated for the procedure when starting from the Ljung-Box test corresponds to the acceptance of a wrong alternative, i.e., the real power is significantly lower than the initial power, mainly at the small sample sizes (T = 25 and 50). The extent of this loss of power decreases when T ≥ 100, and at T ≥ 250, the observed behaviour for high values of a1 is the same as that described for the variance ratio case. Regardless, the real powers for the Ljung-Box case are lower than those for the variance ratio case.
Consequently, for the random walk 3 alternative (the only one that is true in this case), the procedure initiated from the BDS test is the most powerful.
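The BDS statistic itself is laborious to implement, but a much simpler diagnostic that targets the same ARCH-type dependence, and is not part of the paper's procedure, is the McLeod-Li check: a Ljung-Box test applied to the squared returns. The sketch below estimates its Monte Carlo power against an ARCH(1) alternative with a1 = 0.3 (again assuming the standard ARCH(1) form for Model (7), with a reduced replication count):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_pvalue(x, m=5):
    """p-value of the Ljung-Box Q statistic on the first m autocorrelations."""
    T = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    rho = np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, m + 1)])
    q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))
    return chi2.sf(q, df=m)

def simulate_arch1(a0, a1, T, rng):
    """r_t = h_t * e_t with h_t^2 = a0 + a1 * r_{t-1}^2, e_t ~ iid N(0, 1), r_0 = 0."""
    e = rng.standard_normal(T)
    r = np.empty(T)
    prev = 0.0
    for t in range(T):
        r[t] = np.sqrt(a0 + a1 * prev ** 2) * e[t]
        prev = r[t]
    return r

rng = np.random.default_rng(1)
reps, T, alpha = 1000, 500, 0.05
# Power of the McLeod-Li check (Ljung-Box on r_t^2) against ARCH(1), a1 = 0.3:
power = np.mean([ljung_box_pvalue(simulate_arch1(1.0, 0.3, T, rng) ** 2) < alpha
                 for _ in range(reps)])
print(power)  # high: the squared series is autocorrelated under ARCH(1)
```

The check rejects frequently precisely because an ARCH(1) series has uncorrelated levels but autocorrelated squares, which is the signature that leads the procedure to the random walk 3 decision.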

Concluding comments
The methods traditionally applied to test weak efficiency in a financial market, as stated by the random walk model, have serious limitations: they test only one type of random walk or only some necessary, but not sufficient, condition for accepting the random walk hypothesis in one of its forms.
To address these limitations, a procedure that strategically combines traditional methods is proposed to detect whether a return series follows a specific type of random walk (1, 2 or 3). When the random walk hypothesis is rejected, market inefficiency is accepted, i.e., the market is predictable. In this context, future price changes can be predicted from past price changes through a model of asset prices.
The proposed procedure is evaluated in the context of a random walk 1 against linear and non-linear alternatives using a Monte Carlo experiment. The procedure is applied starting from methods that test only a necessary, but not sufficient, condition for the fulfilment of the random walk 1 hypothesis (the variance ratio test and the Ljung-Box test) and from methods that directly test a particular type of random walk (the BDS test and the runs test).
The results allow us to conclude that, against linear correlation-only alternatives, the procedure performs best when starting from the Ljung-Box test. In this case, the real power of the procedure is higher than when starting from any of the other tests, for any sample size, and especially for larger ones (T ≥ 100). In all cases, serious power distortions occur for the alternatives close to the null hypothesis (random walk 1). However, these distortions disappear as the sample size increases, except when the procedure is initiated from the BDS test (where the distortions remain even for large sample sizes).
In contrast, against the random walk 3 alternative, the highest real powers for each sample size occur when the procedure is started from the BDS test. Again, all cases show poor real power for the alternatives close to the null hypothesis (random walk 1). These powers improve as the sample size increases, except when the procedure is initiated from the runs test, which retains very low power against the random walk 3 alternative for all sample sizes (around the significance level α = 0.05).
Regarding the size of the procedure, all the cases analysed present empirical values very similar to the corresponding nominal size (for a significance level of α = 0.05). The procedure initiated from the BDS test exhibits the greatest size distortions, which occur for small samples, whereas no distortions appear when the procedure is started from the runs test; even so, the latter's application is discouraged because its power against the random walk 3 alternative is poor.
The procedure introduced in this paper has been applied to evaluate the degree of fulfilment of the weak efficiency hypothesis in four European financial markets (Spain, Germany, France and Italy) from 1st January 2010 to 15th May 2020 (García-Moreno and Roldán 2021).
Currently, the authors are analysing the performance of the proposed procedure against other alternatives to the random walk hypothesis that are not considered in this work. They are also analysing the performance of the procedure when it combines formal and non-formal statistical inference techniques to accommodate random walk 2.

Availability of data and material Not applicable.
Code availability Not applicable.

Declarations
Conflict of interest We have no conflicts of interest to disclose.
[Fig. 16 Procedure started from the Ljung-Box test: power vs. real power (non-linear alternatives only). Source: own elaboration]

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.