Understanding nonsense correlation between (independent) random walks in finite samples

Consider two independent random walks. By chance, there will be spells of association between them where the two processes move in the same direction, or in opposite directions. We compute the probabilities of the length of the longest spell of such random association for a given sample size, and discuss measures like the mean and mode of the exact distributions. We observe that long spells (relative to small sample sizes) of random association occur frequently, which explains why nonsense correlation between short independent random walks is the rule rather than the exception. The exact figures are compared with approximations. Our finite sample analysis as well as the approximations rely on two older results popularized by Révész (Stat Pap 31:95–101, 1990). Moreover, we consider spells of association between correlated random walks. Approximate probabilities are compared with finite sample Monte Carlo results.


Introduction
The puzzle why "we sometimes get nonsense-correlation between time-series" was first addressed in the seminal paper by Yule (1926). One model that he suggested to explain correlation between independent series was the random walk, called "conjunct series the differences of which are random" by Yule (1926, p. 26). For independent random walks Yule (1926, p. 33) provided experimental evidence, obtained by drawing playing cards from shuffled packs, that "The frequency-distribution of the correlations of samples of 10 observations [...] are much more widely dispersed than the correlations from samples of random series". His findings were complemented by the computer experimental evidence on spurious regressions by Granger and Newbold (1974) for independent random walks of length 50, see also Palm and Sneek (1984) for further Monte Carlo results. Phillips (1986) showed that nonsense correlation between independent random walks is not a finite sample problem only. From Phillips (1986, Thm. 1) the limiting distribution of the sample correlation is available: it converges to a nondegenerate random variable. More recently, Ernst et al. (2017) determined the variance of this limit, and numerical evaluation showed that it equals 0.240522 (Ernst, Shepp and Wyner 2017, p. 1807). Of course, such findings cannot fully explain why nonsense correlation occurs between random walks in small samples. In this note, we return to the finite sample puzzle. Yule (1926, Fig. 14) observed that random walks may trend in the same direction (concordance) or in the opposite direction (discordance) for certain periods of time. This is an intuitive explanation for nonsense correlation: there will be clusters of association between independent random walks. To add some rigour to this intuition, we would like to know: what is the maximum length to be expected for such spells of concordance or discordance given a fixed sample size? How large is the mode of this maximum length?
And how large is the probability to observe values equal to or even larger than the mode? These questions will be answered by means of the corresponding probability distribution given in Corollary 1, building on the little-known Hungarian paper by Székely and Tusnády (1976–1979), see Révész (1990, Thm. 7) and Révész (2013, Thm. 2.7) for a reference. For independent random walks of length n = 25 we learn for instance: The probability that the maximum length of spells with consecutive concordance, or consecutive discordance, is at least equal to 4 amounts to 84.76%. Hence, long spells of random association (relative to the small sample size) are rather likely. The merits of exact results will be demonstrated by comparison with approximations. Asymptotic results in Proposition 2 can be traced back to Földes (1975), which is again a Hungarian paper referenced by Révész (1990, Thm. 6). Further, Gordon et al. (1986) provided approximations that allow for correlated random walks, too, which will be evaluated at the end of our note. Since no exact probabilities are available in that case, we confront the asymptotic results with finite sample Monte Carlo figures.
The rest of this paper is organized as follows. The next section motivates this study with some Monte Carlo results. Section 3 makes the notion of random association precise and provides the exact distributional result under independence. The latter is evaluated numerically in Sect. 4 to shed light on why nonsense correlation is likely between independent random walks in finite samples. Section 5 compares the exact results with approximations. Section 6 is devoted to the extension to correlated random walks. A short summary is contained in the final section.
A word on notation before we begin. Let ⌊x⌋ denote the integer part of x ∈ R, with fractional part {x} := x − ⌊x⌋. Let log_b stand for the logarithm to the base b, while ln denotes the natural logarithm.

Some experimental evidence
Consider a bivariate random walk (X_i, Y_i)_{i=0,1,...,n} defined by

X_i = X_{i−1} + ε_i ,  Y_i = Y_{i−1} + η_i ,  i = 1, . . . , n,  (1)

where (X_0, Y_0) is an arbitrary starting value. Before we begin with the theory, let us collect some experimental evidence. For the computer simulations, the differences (ε_i, η_i) were drawn as independent, identically distributed pairs with correlation ρ := Corr(ε_i, η_i). We simulated random walks with (X_0, Y_0) = (0, 0) and computed the sample correlation

ρ̂ = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / [Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)²]^{1/2} .

Then we took the absolute value |ρ̂| (since it is known that ρ̂ varies symmetrically around zero for ρ = 0). We report the average over 10^5 replications for growing sample size. Clearly, there is massive evidence in favour of nonsense correlation for ρ = 0, and the absolute correlation coefficients are of the same size for small n as for large n, see Table 1. For moderate correlation ρ = 0.2, 0.4, the sample correlation still exaggerates the true values, while ρ = 0.6 results in averages |ρ̂| ≈ 0.6, and ρ = 0.8 yields on average |ρ̂| ≈ 0.77, and these figures are rather robust over the sample size n, too. In this paper we offer the length of random association between independent random walks or between moderately correlated random walks of small and medium sample sizes as an explanation for nonsense correlation or exaggerated correlation as documented in Table 1.
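The simulation exercise behind Table 1 can be sketched as follows. This is a minimal sketch: the Gaussian increments and all function names are illustrative assumptions, since the text fixes only the increment correlation ρ, not the innovation distribution.

```python
# Sketch of the Monte Carlo exercise behind Table 1 (assumptions: Gaussian
# increments, illustrative function names). Two random walks are built from
# increments with correlation rho, and |sample correlation| is averaged.
import numpy as np

def mean_abs_correlation(n, rho, reps=2000, seed=42):
    """Average |sample correlation| of two length-n random walks whose
    iid increment pairs have correlation rho."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(reps):
        eps = rng.standard_normal(n)
        eta = rho * eps + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x, y = np.cumsum(eps), np.cumsum(eta)  # walks started in (0, 0)
        acc += abs(np.corrcoef(x, y)[0, 1])
    return acc / reps

print(mean_abs_correlation(n=50, rho=0.0))  # sizeable despite independence
```

For ρ = 0 the average stays far from zero as n grows, mirroring the pattern documented in Table 1.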

Spells of concordance and discordance
We maintain a bivariate random walk (X_i, Y_i)_{i=0,1,...,n} defined by equation (1), where (X_0, Y_0) is an arbitrary starting value. We now focus on independence (to be relaxed in Assumption 2). More precisely, the differences (ΔX_i, ΔY_i) = (ε_i, η_i) meet the following set of assumptions.
Assumption 1 Let (ε_i, η_i)_{i=1,...,n} be a sequence of independent, identically distributed and continuous random variables with p_ε := P(ε_i > 0) and p_η := P(η_i > 0). Further, ε_i and η_i are independent, and at least one of the probabilities equals 1/2: p_ε = 1/2 or p_η = 1/2.

Remark 1
Note that the asymptotic theory by Phillips (1986) or Ernst et al. (2017) requires E(ε_i) = E(η_i) = 0, which we do not need. For Propositions 1 and 2 we just need that the median of ε_i or η_i equals zero.
We say that the variables from (1) are concordant on the ith interval if X i and Y i move in the same direction; if they move in the opposite direction, they are called discordant. In terms of the usual sign function this provides the following definition.

Definition 1 Concordance on the ith interval means that sgn(ΔX_i) = sgn(ΔY_i).
Note that we rule out ΔX_i = 0 or ΔY_i = 0 with probability 1 by assumption. For convenience, we define C_i as concordance indicator, taking on the value 0 if ΔX_i and ΔY_i have the same sign:

C_i := 0 if sgn(ΔX_i) = sgn(ΔY_i), and C_i := 1 else, i = 1, . . . , n.

By Assumption 1, it holds that P(C_i = 0) = P(C_i = 1) = 1/2. Consider a subsequence of consecutive zeros in (C_i)_{i=1,...,n}, called a zero run. Let Z_n stand for the length of the longest zero run, which corresponds to the length of the longest spell without interruption where X_i and Y_i move in the same direction. The probabilities P(Z_n = k) for given n can be expressed in terms of generalized Fibonacci numbers. We adopt the most convenient definition for our purposes by Spickerman and Joyner (1984, p. 327).
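The concordance indicators and the run statistics are straightforward to compute in practice; the following sketch (with hypothetical helper names) illustrates Definition 1 and the longest-run quantities on a toy sequence.

```python
# Sketch illustrating the concordance indicators C_i and the run lengths:
# Z_n is the longest zero run (consecutive concordance), while the longest
# run of zeros or ones is the longest spell of association altogether.

def concordance_indicators(dx, dy):
    # C_i = 0 if the increments share the same sign (concordance), 1 else;
    # zero increments are ruled out almost surely by assumption
    return [0 if (a > 0) == (b > 0) else 1 for a, b in zip(dx, dy)]

def longest_run(seq, value):
    """Length of the longest uninterrupted run of `value` in seq."""
    best = cur = 0
    for c in seq:
        cur = cur + 1 if c == value else 0
        best = max(best, cur)
    return best

C = [0, 0, 1, 0, 0, 0, 1, 1]
Z = longest_run(C, 0)                           # longest spell of concordance
S = max(longest_run(C, 0), longest_run(C, 1))   # longest spell of association
print(Z, S)  # 3 3
```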
By Proposition 1, the probabilities P(Z_n = k) = P(Z_n < k + 1) − P(Z_n < k) follow immediately. Further, Z_n = 0 corresponds to a sequence of n ones with probability P(Z_n = 0) = 1/2^n. More generally, we are interested in the length of the longest spell of consecutive intervals where X_i and Y_i are concordant or discordant without interruption. This corresponds to the maximum length of zero runs or runs of ones in (C_i). Let S_n stand for this length of the longest spell of consecutive ones or zeros. With Proposition 1, it is straightforward to establish the following distribution.
Corollary 1 Under Assumption 1, it holds for k = 2, . . . , n that P(S_n < k) = 2 N_{n−1}^{(k−1)} / 2^n, where the generalized Fibonacci numbers N_m^{(k)} are given in Definition 2; further, P(S_n < 1) = 0.

Proof See Appendix.

Numerical work
Given the relation in (6), our numerical evaluation will be restricted to the length of the longest spell of consecutive zeros or ones, S_n. The computation requires determining (generalized) Fibonacci numbers. We employ the recursion from Definition 2 and do not bother about explicit solutions. Statistical measures of S_n are given in Table 2, and they are illustrated by the plots in Fig. 1. For the expected values from Table 2 one observes a logarithmic rate: Doubling n adds roughly 1 to E(S_n); an asymptotic explanation for this feature will be given in the next section. While the variance mildly grows with n, the skewness and the kurtosis decrease with the sample size. All in all, we find the spread in S_n rather small.
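A minimal sketch of this exact computation, assuming the recursion and starting values derived in the Appendix (N_m^{(k)} = N_{m−1}^{(k)} + · · · + N_{m−k}^{(k)} with N_m^{(k)} = 2^m for m < k) together with the relation P(S_n < k) = 2 N_{n−1}^{(k−1)}/2^n from the proof of Corollary 1; all function names are illustrative.

```python
# Sketch of the exact distribution of S_n via generalized Fibonacci numbers
# (assumed recursion from the Appendix); Fractions keep the result exact.
from fractions import Fraction

def fib_gen(m, k):
    """N_m^(k): number of binary sequences of length m whose longest
    zero run is shorter than k."""
    N = [2**i for i in range(min(m + 1, k))]   # starting values: 2^m, m < k
    for i in range(k, m + 1):
        N.append(sum(N[-k:]))                  # k-step Fibonacci recursion
    return N[m]

def cdf_S(n, k):
    """P(S_n < k), exact: 2 * N_{n-1}^(k-1) / 2^n for k >= 2."""
    if k <= 1:
        return Fraction(0)                     # S_n >= 1 always
    return Fraction(2 * fib_gen(n - 1, k - 1), 2**n)

def pmf_S(n):
    """Exact probabilities P(S_n = k), k = 1, ..., n."""
    return {k: cdf_S(n, k + 1) - cdf_S(n, k) for k in range(1, n + 1)}

p = pmf_S(25)
print(float(sum(p.values())))        # 1.0, sanity check
print(float(1 - cdf_S(25, 4)))       # P(S_25 >= 4), cf. the 84.76% in Sect. 1
```

For n = 25 the probabilities reproduce, for instance, the exceedance probability of roughly 0.85 quoted in the introduction, and the mode ⌊log_2 25⌋ = 4.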
Looking more closely into the figures behind Fig. 1 reveals that the five outcomes with highest probabilities, including the most probable value (mode), cover roughly 90% of the probability mass. Table 3 looks more closely at the mode, mod_n. While Fig. 1 and Table 2 are restricted to n = 2^s · 25 for s = 0, 1, 2, . . ., we consider now more generally n = 2^s · B and vary B ∈ {25, 30, 35}. From Table 3 we observe a logarithmic rate, mod_n = ⌊log_2 n⌋ = s + ⌊log_2 B⌋; as with the expectation this feature calls for an explanation provided in the next section. As we know from Fig. 1, the maximum probability decreases with n. For large n this probability seems to settle around 0.25 or slightly below, and an approximate explanation will be provided again in the next section. At the same time, it is interesting to look at the probabilities for larger values, say larger than the mode, P(S_n > mod_n). Throughout, the probability for the maximum length of a spell to exceed the mode varies only very little with s given n = 2^s · B, but it depends on B, which will be again clarified in the subsequent section. In any case we observe large probabilities P(S_n > mod_n): Long spells of concordance or discordance relative to the sample size will be the rule rather than the exception. This is in line with the experimental evidence documented in Table 1 for no correlation.

Approximate results
In this section we compare our exact figures from Tables 2 and 3 and Fig. 1 with approximate figures. The following approximation can be traced back to Földes (1975), see Révész (1990, Thm. 6); easier to access is the proof by Földes (1979).

Proposition 2 Under Assumption 1, it holds uniformly for any integer z that

P(Z_n − log_2 n ≤ z − 1) = F_n(z) + o(1), where F_n(z) := exp(−2^{−(z+1−{log_2 n})}).

Proof Földes (1979, Thm. 4).
Now we are equipped to turn to an approximation of S_n with S_n ≈ Z_n + 1, building on P(S_n = k) ≈ P(Z_n = k − 1) for large n according to (6). The distribution of Z_n can be approximated by truncating a Gumbel distribution with distribution function F_n. Let V_n be Gumbel distributed with parameters {log_2 n} − 1 and 1/ln 2, such that E(V_n) = {log_2 n} − 1 + γ/ln 2 and Var(V_n) = (π²/6)(1/ln 2)², where γ ≈ 0.5772 is Euler's constant. It is known that F_n(v) = P(V_n ≤ v), v ∈ R, with F_n given in Proposition 2; the mode is mod(V_n) = {log_2 n} − 1, i.e. the density f_n(v) is maximized at mod(V_n), and the median is med(V_n) = mod(V_n) − ln(ln 2)/ln 2. We then have by Proposition 2 that Z_n − log_2 n ≈ V_n in the sense that

P(Z_n − log_2 n ≤ z − 1) ≈ P(V_n < z) = P(⌊V_n⌋ ≤ z − 1). (7)
Remark 2 Note that the approximation in (7) builds on the convergence result in Proposition 2, which, however, may not be interpreted as a limiting distribution: The approximating random variable V n with the distribution function F n does not converge with n, simply because the fractional part 0 ≤ {log 2 n} < 1 does not.
More loosely speaking, it follows from Proposition 2 that

P(Z_n = k − 1) ≈ F_n(k − ⌊log_2 n⌋) − F_n(k − 1 − ⌊log_2 n⌋) (k = 1, 2, . . .). (8)

Hence, P(S_n = k) can be approximated by P_n(k) defined as follows:

P_n(k) := F_n(k − ⌊log_2 n⌋) − F_n(k − 1 − ⌊log_2 n⌋). (9)

As in Table 3, consider n = 2^s · B such that ⌊log_2 n⌋ = s + ⌊log_2 B⌋ with {log_2(2^s · B)} = {log_2 B}. Obviously, P_n(⌊log_2(2^s · B)⌋) is constant for all s. Since {log_2 B} ∈ [0, 1), it is straightforward to verify that P_n(⌊log_2(2^s · B)⌋) varies only between 0.233 and 0.250, which explains P(S_n = ⌊log_2 n⌋) in Table 3; in particular, P_n(⌊log_2(2^s · 25)⌋) = 0.2482, P_n(⌊log_2(2^s · 30)⌋) = 0.2383, and P_n(⌊log_2(2^s · 35)⌋) = 0.2438. Similarly, P(S_n > mod_n) can be approximated by a value that is constant in s; again, for B ∈ {25, 30, 35} this very well explains the figures from Table 3. Further, Fig. 2 displays selected differences of the exact and the approximate probabilities, P(S_n = k) − P_n(k): (9) does a fairly good job in approximating the single exact probabilities from Corollary 1 for n ≥ 100, while for n = 25 or n = 50 the deviations may be considerable. Using E(V_n) and Var(V_n), we could roughly approximate E(S_n) and Var(S_n), but more elaborate results are available from the literature. Because of (6) we have

E(S_n) = Σ_{k=1}^n k P(S_n = k) = Σ_{k=0}^{n−1} (k + 1) P(Z_{n−1} = k) = E(Z_{n−1}) + 1.

Gordon et al. (1986, Thm. 2) provided E(Z_n) ≈ log_2 n + γ/ln 2 − 3/2. It follows that E(S_n) ≈ log_2(n − 1) + γ/ln 2 − 1/2. More precisely, one has, see Guibas and Odlyzko (1980, Thm. 4.1), that

E(S_n) = μ_n + r(n), (10)

where r(n) does not vanish but is small: |r(x)| ≤ 1.6 · 10^{−6} for all x according to Guibas and Odlyzko (1980, p. 245). Due to r(n), S_n does not converge with n even if demeaned by μ_n, see Remark 2. Still, the mean can be very well approximated, as the evaluation of (10) demonstrates: a look at Table 2 shows the close correspondence with the exact expectation even for small n. Finally, Gordon et al. (1986, Thm. 2) provided an approximation of the variance, too. Since Var(S_n) = Var(Z_{n−1}) we have from their paper that Var(S_n) ≈ π²/(6 ln² 2) + 1/12 ≈ 3.5070, see also Guibas and Odlyzko (1980, Thm. 4.1). This value, independent of n, does not explain well the exact variances for small n given in Table 2.
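The Gumbel-based approximation is cheap to evaluate; a sketch, assuming F_n is the Gumbel distribution function with location {log_2 n} − 1 and scale 1/ln 2 as above, and that the mean approximation uses log_2 n in place of log_2(n − 1):

```python
# Sketch of the Gumbel approximation (9): F_n has location {log2 n} - 1 and
# scale 1/ln 2, and P_approx(n, k) approximates P(S_n = k).
import math

def F(n, z):
    frac = math.log2(n) - math.floor(math.log2(n))  # fractional part {log2 n}
    return math.exp(-2.0 ** -(z + 1.0 - frac))      # Gumbel cdf evaluated at z

def P_approx(n, k):
    j = k - math.floor(math.log2(n))
    return F(n, j) - F(n, j - 1)

# P_approx at the mode is constant along n = 2^s * B, cf. Table 3:
for B in (25, 30, 35):
    n = 8 * B  # i.e. s = 3
    print(B, round(P_approx(n, math.floor(math.log2(n))), 4))

# approximate mean E(S_n) ~ log2 n + gamma/ln 2 - 1/2, cf. Table 2:
gamma = 0.5772156649
print(math.log2(50) + gamma / math.log(2) - 0.5)  # close to the exact 5.9783
```

For B = 25, 30, 35 the loop reproduces the constants 0.2482, 0.2383 and 0.2438 discussed above.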

Correlated random walks
Drawing from the paper by Gordon et al. (1986) we briefly consider an extension of Proposition 2. We now relax Assumption 1 and allow for correlation between the random walks. In terms of the concordance from Definition 1, correlation allows for P(C_i = 0) ≠ P(C_i = 1). Technically, this means we have a Bernoulli process without symmetry, which is the model for tossing a coin that is not fair. The stronger the positive correlation between the two random walks is, the larger is the probability of concordance p, where p := P(C_i = 0) and q := 1 − p = P(C_i = 1).
Assumption 2 Let (ε_i, η_i)_{i=1,...,n} be a sequence of independent, identically distributed and continuous random variables such that the concordance probability p = P(C_i = 0) satisfies 0 < p < 1.

Proposition 3 Let the differences (ε_i, η_i) from equation (1) satisfy Assumption 2. It then holds uniformly for any integer z that

P(Z_n − m_{n,p} ≤ z − 1) = F_{n,p}(z) + o(1),

where m_{n,p} := log_{1/p}(nq) and F_{n,p}(z) := exp(−p^{z−{m_{n,p}}}).
Proof The result follows from Gordon et al. (1986, Thm. 1); details are provided in the Appendix.
Note that m_{n,1/2} = log_2(n) − 1 and {log_2(n) − 1} = {log_2 n}, such that Proposition 2 arises as a special case. Further, Proposition 3 allows us to approximate, in the sense of (8), the probabilities for the longest runs of ones, where μ_{n,q} is defined analogously to m_{n,p} from Proposition 3: μ_{n,q} := log_{1/q}(np). Table 4 formalizes the following intuition: The stronger the correlation between the random walks is, i.e. the larger |p − 0.5| is, the more likely are long runs of zeros or ones in (C_i), depending on the sign of p − 0.5. The approximate results from the first panel of Table 4 are well supported by finite sample Monte Carlo estimates as long as p is not too large; for p = 0.77, however, the approximate figures are too conservative in that the Monte Carlo estimates are sizeably larger.
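A Monte Carlo comparison along these lines can be sketched as follows. Note that F_{n,p} evaluated at an integer run length k reduces to the classical longest-run approximation P(Z_n < k) ≈ exp(−nq p^k), since p^{k−m_{n,p}} = nq p^k; the parameter choices and function names below are illustrative assumptions.

```python
# Sketch comparing the approximation of Proposition 3 with a Monte Carlo
# estimate for asymmetric concordance probability p (correlated walks).
# At integer k, F_{n,p} reduces to exp(-n*q*p^k) since p^(k - m_np) = n*q*p^k.
import math, random

def approx_cdf(n, p, k):
    """Approximate P(Z_n < k) for Bernoulli(p) concordance indicators."""
    q = 1.0 - p
    return math.exp(-n * q * p**k)

def mc_cdf(n, p, k, reps=4000, seed=7):
    """Monte Carlo estimate of P(longest zero run < k)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        best = cur = 0
        for _ in range(n):
            if rng.random() < p:   # C_i = 0 (concordance) with probability p
                cur += 1
                best = max(best, cur)
            else:
                cur = 0
        hits += best < k
    return hits / reps

n, p, k = 500, 0.6, 10
print(approx_cdf(n, p, k), mc_cdf(n, p, k))  # the two should be close
```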

Summary
There exists a well-understood asymptotic theory explaining why one obtains nonsense correlation between long independent random walks, see Phillips (1986, Thm. 1). In this note we focus on finite samples with a special interest in small sizes. What is, for instance, the maximum length of random association (consecutive concordance or consecutive discordance) between two independent random walks of sample size n = 50? Evaluating Corollary 1, one can verify that the probability of the maximum length of random association being equal to 5 amounts to 27.68% (see also Fig. 1). The exact probability that this maximum length is at least equal to 5 amounts to 82.09% (see Table 3), and the expected value is 5.9783 (Table 2). Hence, long episodes (relative to the small sample size) of random association occur frequently, which explains why nonsense correlation arises between short independent random walks. We also included the case of correlated random walks where long episodes of association are of course more likely, see Table 4 for a quantification.
or '0,0,1' followed by sequences from Z_{n−1}(3), Z_{n−2}(3) and Z_{n−3}(3), respectively, and so on. The general recursion becomes

N_m^{(k)} = N_{m−1}^{(k)} + N_{m−2}^{(k)} + · · · + N_{m−k}^{(k)} , m ≥ k.

To initialize this recursion, one requires starting values N_m^{(k)} for m < k. In this case, all zero runs are necessarily shorter than k, i.e. all elements in B_m satisfy Z_m < k, such that N_m^{(k)} = 2^m for 0 ≤ m < k, which formally covers the case N_0^{(k)} = 1.

Proof of Corollary 1
By definition, S_n is the maximum length of a spell (of zeros or ones). Denote by S_n(k) the subset of B_n meeting S_n < k. Obviously, S_n(1) equals the empty set. Generally, a new spell begins exactly when C_{i+1} differs from C_i. Define the corresponding difference indicator

D_i := |C_{i+1} − C_i| , i = 1, . . . , n − 1.

By construction, P(D_i = 1) = P(D_i = 0) = 1/2, and (D_i)_{i=1,...,n−1} is a new BL process. Further, a zero run of length k − 1 in (D_i) is equivalent to a run of zeros or a run of ones of length k in (C_i). Therefore, #(S_n(k)) = 2 #(Z_{n−1}(k − 1)), and with the previous notation this means that #(S_n(k)) = 2 N_{n−1}^{(k−1)}. With P(S_n < k) = #(S_n(k))/2^n, the proof is complete.
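The transformation from (C_i) to (D_i) underlying this proof can be verified exhaustively for small n; a sketch with hypothetical helper names:

```python
# Check of the proof's transformation: with D_i = |C_{i+1} - C_i|, a longest
# run of identical values of length k in (C_i) corresponds to a longest zero
# run of length k - 1 in (D_i). Verified exhaustively for all sequences of
# length 8.
from itertools import product

def longest_equal_run(C):
    """Longest run of identical consecutive values (zeros or ones)."""
    best = cur = 1
    for a, b in zip(C, C[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def longest_zero_run(D):
    """Longest run of consecutive zeros."""
    best = cur = 0
    for d in D:
        cur = cur + 1 if d == 0 else 0
        best = max(best, cur)
    return best

for C in product((0, 1), repeat=8):
    D = [abs(a - b) for a, b in zip(C, C[1:])]
    assert longest_equal_run(C) == longest_zero_run(D) + 1
print("transformation verified for n = 8")
```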

Proof of Proposition 3
Define, as in Gordon et al. (1986), V_{n,p} := W/ln(1/p) + {m_{n,p}}, where W follows a standard Gumbel distribution; note that V_{n,1/2} = V_n from Sect. 5. The corresponding distribution function is known to become

F_{n,p}(z) = P(V_{n,p} ≤ z) = exp(−p^{z−{m_{n,p}}}) , z ∈ R.

From Gordon et al. (1986, Thm. 1) we have that uniformly in z

P(Z_n − ⌊m_{n,p}⌋ ≤ z − 1) = P(⌊V_{n,p}⌋ ≤ z − 1) + o(1) ,

see also Arratia, Gordon and Waterman (1990, Coro. 3). Using

P(Z_n − m_{n,p} ≤ z − 1) = P(Z_n − ⌊m_{n,p}⌋ ≤ z − 1)

together with

P(⌊V_{n,p}⌋ ≤ z − 1) = P(V_{n,p} < z) = F_{n,p}(z) ,

the claim follows.