Inference for time-varying lead–lag relationships from ultra-high-frequency data

A new approach for modeling lead–lag relationships in high-frequency financial markets is proposed. The model accommodates non-synchronous trading and market microstructure noise as well as intraday variations of lead–lag relationships, which are essential for empirical applications. A simple statistical methodology for analyzing the proposed model is presented, as well. The methodology is illustrated by an empirical study to detect lead–lag relationships between the S&P 500 index and its two derivative products.


Introduction
A big challenge in high-frequency financial econometrics is measuring lead-lag relationships wherein one asset is correlated to another asset with a delay. Two assets typically exhibit lead-lag relationships when they reflect new information with different speeds. A prominent example is the lead-lag relationship between the cash and futures markets wherein the latter leads the former (see, e.g., Kawaller et al. 1987;de Jong and Nijman 1997;Huth and Abergel 2014). Lead-lag relationships are also known as a source of the so-called Epps effect (see Renò 2003).

Statistics for High-Frequency Data
There are several attempts to model and analyze lead-lag relationships of highfrequency data. One approach is to utilize classical discrete time-series analysis such as lead-lag regression (Kawaller et al. 1987), cointegration (Hasbrouck 1995), crosscorrelation analysis (de Jong and Nijman 1997), and so on. In the meantime, Hoffmann et al. (2013) have introduced a continuous-time model to describe lead-lag relationships. A related model has also been studied in Robert and Rosenbaum (2010) by utilizing the random matrix theory and Ito and Sakemoto (2020) by multinomial dynamic time warping. Other approaches to investigate lead-lag relationships in a continuous-time framework include Hawkes process-based models (Bacry et al. 2013;Da Fonseca and Zaatour 2015), a wavelet-based method (Hayashi and Koike 2018), and a multi-asset lagged adjustment model (Buccheri et al. 2020). Several empirical approaches have been proposed, as well; see Pomponio and Abergel (2013) and Dobrev and Schaumburg (2016), for example.
In this study, we use Hoffmann et al. (2013)'s model which we call the HRY model as a baseline. In the HRY model, the lead-lag relationship is modeled by a pair of non-synchronously observed semimartingales where one is observed with a delay relative to the other. Empirical applications of the HRY model are found in Alsayed and McGroarty (2014); Huth and Abergel (2014); Ceron et al. (2016); Bollen et al. (2017). We expand this model in three directions. First, in high-frequency financial econometrics, it is well recognized that observed prices are subject to market frictions which cause many problems in statistical inferences as the observation frequency increases (see, e.g., Hansen and Lunde 2006). Therefore, for ultra-high-frequency financial data, the observed prices are typically modeled as semimartingales contaminated by noise (called the microstructure noise) rather than pure semimartingales. We thus introduce microstructure noise into the HRY model. We remark that there are a number of studies on volatility/covariation estimation for noisy semimartingales; see Chapter 7 of Aït-Sahalia and Jacod (2014), Shephard and Xiu (2017) and references therein. Second, it is well documented that intraday seasonal effects are an important factor of high-frequency financial data (see, e.g., Andersen and Bollerslev 1997;Bibinger et al. 2019;Ozturk et al. 2017). This motivates us to suppose that the time-lag is heterogeneous rather than constant over the course of the day. In fact, Huth (2012) has reported the existence of intraday variations of lead-lag relationships in financial markets. By these reasons, we introduce a heterogeneous time-lag into the HRY model. Third, because of the lowlatency responses of high-frequency traders in recent financial markets (cf. Hasbrouck and Saar 2013), we may expect that the time-lag is quite small, so that it is comparable with the sampling frequency. To take account of this fact explicitly in our model, we consider a local asymptotics, such that the time-lag shrinks as the sampling frequency increases. In econometrics, local asymptotics is a standard technique to make asymptotic theories more realistic in finite samples. Primal examples are studies on models with nearly unit roots (e.g., Phillips 1987;Phillips and Magdalinos 2007) and weak identification problems (e.g., Andrews and Cheng 2012). Examples in high-frequency financial econometrics include volatility estimation in the presence of round-off errors (Li and Mykland 2015;Rosenbaum 2009;Robert and Rosenbaum 2011;Li et al. 2018), small jump analysis (Li 2013) and inference under small microstructure noise (Kurisu 2018;Rosenbaum 2011). Under the proposed model, we develop a statistical methodology to investigate the lead-lag relationships. The methodology is established in line with a completely different idea from Hoffmann et al. (2013)'s one and enables us to flexibly analyze time-varying lead-lag relationships. In particular, we establish the asymptotic distribution theory for the proposed methodology, which allows us to discuss the statistical significance of the results obtained by applications of our methodology. We note that the asymptotic distribution theory for the estimator proposed in Hoffmann et al. (2013) is not straightforward; see Sect. 3.3 of Hoffmann et al. (2013) for details. This paper is organized as follows. In Sect. 2, we introduce the model used in this study. In Sects. 3-5, we develop statistical methodologies to analyze the global and local behaviors of the lead-lag relationships over the day. We assess the finite sample performances of the proposed methodologies by a Monte Carlo experiment in Sect. 6, while we provide an empirical illustration of our approach in Sect. 7. All the proofs are collected in the Appendix.

Model
We assume that X = (X 1 , X 2 ) is a bivariate continuous Itô semimartingale defined on a stochastic basis B = (Ω, F, (F t ) t≥0 , P) , which is of the form: where W s is a bivariate standard Wiener process on B , b s is a bivariate càdlàg (F t ) -adapted process, and Σ s is a 2 × 2 positive semidefinite symmetric matrix valued càdlàg (F t )-adapted process. We assume ∫ 1 0 Σ 12 s ds ≠ 0 a.s. We observe X on the interval [0, 1], and denote by (t p i ) i≥0 the observation times for X p for p = 1, 2 . We assume that t p i 's are (F t )-stopping times satisfying t p i ↑ ∞ as i → ∞ . We also assume that they implicitly depend on a parameter n ∈ ℕ representing the observation frequency and satisfy: as n → ∞ for any t > 0 with t p −1 ∶= 0 for p = 1, 2 . Here, the notation → p denotes convergence in probability.
For each p = 1, 2 , the observation Y p i of X p at the observation time t p i is given by: for i = 0, 1, … . Here, p i s are measurement errors which are referred to as the microstructure noise in financial econometrics, while p (t) denotes a latency of incorporating new information on the efficient log price X p into the observed price at the time t. We will assume that ( p (t)) t≥0 is a stochastic process adapted to (F t ) . For high-frequency financial data, we may expect that the latency is small, so that it is comparable with the sampling frequency. For this reason, we consider the local asymptotics, such that p ≡ p n = n − c p for some > 1 2 and some nonnegative-valued process (c p (t)) t≥0 .
We are interested in the process n (t) ∶= 2 n (t) − 1 n (t) , t ∈ [0, 1] , which we refer to as the spot lead-lag time. If n (t) > 0 for some t ∈ [0, 1] , the second asset's latency is larger than the first asset's one, so the first asset leads the second asset at the time t and the size of time-lag is equal to | n (t)| . The converse holds true if n (t) < 0 . Before considering the direct estimation of n (t) , which is discussed in Sect. 5, in the next section, we construct estimators for the following processes: where h n is a tuning parameter introduced in the next section. Note that we have L n,2 1 → ∫ 1 0 Σ 12 s ds a.s. as n → ∞ ; hence, L n,2 1 is asymptotically non-zero, because we assume ∫ 1 0 Σ 12 s ds ≠ 0 a.s. Now, if n (t) does not depend on t, we have n ≡ (h n ∕ ) arctan(L n,1 1 ∕L n,2 1 ) =∶ n , so we can construct an estimator for n by plugging the estimators for L n,1 1 and L n,2 1 into n . Even if n (t) depends on t, the quantity n remains meaningful as an index to capture an averaged behavior of the process n (t) on the interval [0, 1]. We call the variable n the spectral lead-lag index and will use a descriptive statistic for assessing lead-lag relationships in our empirical study.
Since the variables n tend to zero as n → ∞ , the estimation of n is only meaningful under the stronger statement than the usual consistency property. More precisely, since the variables n tend to zero as fast as n − in the sense that n n → p ∫ 1 0 (c 2 (s) − c 1 (s))Σ 12 s ds∕ ∫ 1 0 Σ 12 s ds as n → ∞ , a sequence ̂ n of estimators for n provides a meaningful estimation result if and only if: as n → ∞ . In the next section, we construct estimators for n having the above property.

Construction of the estimators
First, following , we define the spectral statistics as follows: and h n is a positive number, such that h −1 n ∈ ℕ . We assume that the sequence h n satisfies √ nh n → c as n → ∞ for some c > 0 . Namely, S p k is the Fourier sine coefficient of the observed returns of the pth asset on the interval J n k . We cannot directly use the cosine version of S p k s due to end effects, because cos(0) = − cos( ) = 1 ≠ 0 . To deal with this issue, we rely on the same trick as in Bibinger and Winkelmann (2015). Namely, We consider the spectral statistics on the shifted blocks ((k − 1 2 )h n , (k + 1 2 )h n ] , as well, i.e., S k− 1 2 ( k = 1, … , h −1 n − 1 ). Bibinger and Winkelmann (2015) use these statistics to handle jumps in their spectral covariance estimators. The following formula plays a key role: so Φ k−1 and −Φ k behave as the cosine function on the interval J n k−1∕2 . Now, we explain the idea behind the construction of our estimators. For exposition, we assume that Then, noting that | | ≤ h n ∕2 for sufficiently large n and: we have: Now, by applying the identity sin(y − x) = cos(x) sin(y) − sin(x) cos(y) , we obtain: The quantity on the right side of the above equation is equal to the integrand of L n,1 t multiplied by h n , so we naturally consider the following estimator for L n,1 t : An analogous argument suggests the following estimator for L n,2 t : where: Remark 3.1 (Cross-correlated noise) There is some empirical evidence, showing that microstructure noise is cross-correlated across multiple assets; see Voev and Lunde (2007) and Ubukata and Oya (2009). One may expect that the estimator L n,1 t constructed above would be robust against such cross-correlations as long as the serial dependence in the noise process is sufficiently weak in tick time. 2 To see this, note that summation by parts yields the following approximation for the noise part of S p k (cf. the last equation in page 363 of Bibinger and Winkelmann (2015)): This suggests that, even if E[ 1 i 2 i ] ≠ 0 , the expectation of the noise part of n,1 k is negligible by a similar argument to the above. This statement continues to hold true | decays sufficiently fast as |i − j| increases. This is because, in such a situation, we can replace the summand on the right side of (3.1) by a martingale difference due to Gordin's martingale approximation method (cf. Sect. 19 of Billingsley (1999)). In fact, this is exactly what we done in the proof to handle the serial dependence in the noise process; see (A.1) and (A.15). In the meantime, the estimator L n,2 t is biased in the presence of such crosscorrelations. In the equidistant sampling case, it is not difficult to see that the bias is proportional to the long-run cross-covariance of the noise process, which can be estimated with a faster convergence rate than L n,2 t and thus easily corrected; see Bibinger and Reiß (2014) for a serially uncorrelated case. However, the bias correction is not straightforward in the non-synchronous observation case. This is mainly because it is by now not established how to model cross-sectional dependence in microstructure noise in a both mathematically and empirically satisfactory manner (cf. the discussion after Assumption 3 in Bibinger et al. (2019)). For this reason, this paper focuses on the situation where 1 i and 2 j are uncorrelated for all i, j.

Asymptotic theory
In this section, we present an asymptotic theory for the process � L n n,2 1 In the numerical experiments below, we use 2 ) for the finite sample adjustments. This modification does not affect the asymptotic results presented in this paper. 2 Jacod et al. (2017) found empirical evidence that it is reasonable to assume autocorrelations of microstructure noise as a function of tick time.
(ii) r n (t) = o p (n − ) as n → ∞ for any t > 0 and any ∈ (0, 1). (iii) For any n ∈ ℕ and p = 1, 2 , there is a filtration (H n,p t ) t≥0 of F , such that t p i is an (H n,p t )-stopping time for every i. (iv) For each n there is a random subset N n of ℤ + , such that as n → ∞ for every t > 0 and every p = 1, 2.
(v) For any n ∈ ℕ , p = 1, 2 and r = 1, 2 , there is a càdlàg (H n,p Moreover, there is a càdlàg (F t )-adapted positive-valued process G(r) p , such that G(r) n,p → p G(r) p as n → ∞ for the Skorokhod topology. (i) u p is strictly stationary and independent of F ∞ ∶= ⋁ t>0 F t for every p = 1, 2.

Remark 3.2 (Assumptions on observation times) (a)
[A1](i) type assumptions are sometimes called the strong predictability condition in the literature and can be found in, e.g., Hayashi and Yoshida (2011) and Koike (2014Koike ( , 2016. In our situation, this type of condition is necessary to ensure that the "delayed" observation times (t p i − p n (t p i )) + are nearly (F t )-stopping times (see Lemma A.2). Here, we emphasize that in our setting, this type of assumption is required due to the possible existence of lead-lag relationships: in our setting, the process X p is essentially sampled at the times t  i ) are not necessarily stopping times unless we impose a kind of predictability condition. Therefore, if we developed an asymptotic theory without such a condition in our setting, we would presumably need to depart from the framework of Itô calculus and rely on anticipative calculus (e.g., Malliavin calculus), which will be mathematically challenging. In fact, in Hoffmann et al. (2013), they also impose an analogous condition due to the same reason (see Assumption B2 of Hoffmann et al. (2013)). Indeed, their assumption is stronger than ours, because they consider lead-lag times which do not shrink as n tends to infinity. We also remark that our assumption allows, e.g., random sampling times independent of the -field F ∞ , because such ones can be assumed to be F 0 -measurable without loss of generality, so we do not necessarily know transactions completely in advance (i.e., there may still be exogenous randomness).
(b) [A1](ii)-(iv) type assumptions are more or less standard in the literature and found in Barndorff-Nielsen et al. (2011) and Koike (2014Koike ( , 2016 for example. Here, we remark that the introduction of the random set N n is mainly necessary to ensure the stability of Assumption [A1] under the localization procedure used in the proof; see the proof of Lemma 6.3 from Koike (2017b) for details. Of course, one can take the set N n as the empty set, which amounts to a standard situation in the literature. Another reason why we introduce the set N n is because it excludes some trivial exceptions appearing when we set N n = � . For example, if t p 0 = log n∕n and t Jacod et al. (2019Jacod et al. ( , 2017. It allows the noise process to have time-varying variance and serial autocorrelations. Both properties are the stylized facts of ultra-highfrequency financial data (see, e.g., Hansen and Lunde 2006).

Remark 3.3 (Assumptions on microstructure noise) An [N ] type assumption is used in
. We impose such an assumption due to the reason explained in Remark 3.1. We also remark that the existence of all moments of the noise required by [N ] is standard in the literature (see, e.g., Assumption 16.1.1 of Jacod and Protter (2012) and Assumption (N-v) of Jacod et al. (2019)) and not a serious practical restriction as noted in Remark 16.1.2 of Jacod and Protter (2012). It would be possible to state the assumption to require the finite moment up to a suitable order which depends on other parameters such as , and so on.
Let us recall the notion of stable convergence. Given a sequence (Z n ) of random variables taking values in a Polish space S and a sub--field G of F , we say that the variables Z n converge G-stably in law to an S-valued variable Z, which is defined on an extension of (Ω, as n → ∞ for any G-measurable bounded variable U and any bounded continuous function f on S. Then, we write Z n → G−d s Z . In this case, for any variables U n converging in probability to a G-measurable variable U, we have (U n , Z n ) → G−d s (U, Z) as n → ∞ for the product topology on the space ℝ × S. Now, we are ready to state our asymptotic result.
s ds ) as n → ∞ for the Skorokhod topology, where W 1 and W 2 are mutually independent standard Brownian motions independent of F ∞ , and: Remark 3.4 (Mixing condition on the noise process) It might be worth remarking that the mixing condition imposed in Theorem 3.1 is stronger than the usual one ∑ ∞ j=1 p (j) < ∞ in classical time-series analysis, but this is standard in the literature of volatility/covariance estimation from high-frequency data. For example, Jacod et al.
Theorem 3.1 has some important conclusions. First, since L n,1 1 → 0 a.s., L n,2 1 → ∫ 1 0 Σ 12 s ds a.s., and: the property of stable convergence and the continuous mapping theorem imply that: Next, noting L n,1 1 ∕L n,2 1 → 0 a.s., we obtain: as n → ∞ by a similar argument to the proof of the delta method, where: In particular, it holds that: as n → ∞ . Therefore, the estimators ̂ n satisfy property (2.2) as long as < 3∕4.
Remark 3.5 (Necessity of the condition < 3∕4 ) It is worth mentioning that there is no sequence of estimators having property (2.2) if ≥ 3∕4 in the following sense. Let us suppose that the observation data are generated by the following simpler model: Gaussian variables with mean 0 and variance v > 0 , ∈ ℝ is the lead-lag parameter. Note that this model corresponds to a simplified version of our model (2.1), such that b s ≡ 0: ∼ (0, v) : when the time lag process n (t) does not depend on t, we may set 1 n ≡ 0 or 2 n ≡ 0 in accordance with the sign of n (t) , so we have adopted such a specification in the above model. Now, we denote by P n, the law of (Y 1 Then, from Corollary 2.1 of Koike (2017a), (P n, ) ∈ℝ has the LAN property at = 0 with rate n −3∕4 and asymptotic Fisher information 2 v 3∕2 ∕{2( √ 1 + + √ 1 − )} . In particular, by the local asymptotic minimax theorem (see, e.g., Theorem 1 from Ch. 6 of Le Cam and Yang (2000)), we obtain: for any > 0 and any sequence ̂ n of estimators.
Another application of Theorem 3.1 is the construction of confidence intervals for n . For this purpose, we need estimators for the asymptotic variances of L n,1 1 and L n,2 1 , as well, and we address this issue in the next subsection.

Asymptotic variance estimation
Since the analytic expressions of the asymptotic variances of L n,1 t and L n,2 t are complex, we estimate them by subsampling. 3 Since we consider infill-asymptotics, traditional subsampling methods (e.g., Politis and Romano (1994)) should be modified due to incorrect centering. Several subsampling methods for high-frequency data have recently been proposed, cf. Kalnina (2011), Sect. 4 of Christensen et al. (2013), Mykland and Zhang (2017), Christensen et al. (2017), Sect. 3.1 of Ikeda (2016). In this paper, we adopt an "overlapping version" of the method proposed in Sect. 4 of Christensen et al. (2013). Set n k = ( n,1 k , n,2 k ) ⊤ for k = 1, … , h −1 n − 1 , and take a positive integer K n as the number of subsamples. Then, we define: We may expect that adjacent subsampled estimators L n ( + K n ) and L n ( ) would have conditional expectations close to each other. Motivated by this, we introduce the following estimator for the asymptotic covariance matrix of L n t , t ∈ [0, 1]: The validity of this subsampling method is ensured by the following theorem: 4

Theorem 3.2 Suppose that [A1]-[A2] and [N ]
are satisfied for some ∈ (0, 1 4 ) . Suppose also that the paths of Σ 12 are almost surely -Hölder continuous for some Now, under the assumptions of Theorems 3.1-3.2, we can construct confidence intervals of n as follows. Since we have: as n → ∞ , where 5 : a 100(1 − ) % confidence interval of n for ∈ (0, 1) is given by: In the numerical experiments below, we compute for the finite sample adjustment. This modification does not affect the asymptotic result given by Theorem 3.2. 5 Although Theorem 3.1 ensures the asymptotic independence between L n,1 1 and L n,2 1 , we take it into account in the construction of the asymptotic variance estimator V n to improve the finite sample accuracy.
with z ∕2 being the upper ∕2-quantile of the standard normal distribution.
Remark 3.6 (Relation to Mykland and Zhang (2017)) The diagonal entries of V n t are related to rolling quadratic variations of integrated processes introduced in (Mykland and Zhang 2017, Definition 2). In fact, for f = 1, 2 , 2h nV n,ff 1 is almost identical to the quantity QV B,K (Θ) in Mykland and Zhang (2017) with taking n,f , Mykland and Zhang (2017) have shown that, even after appropriately rescaling, QV B,K (Θ) is generally a biased estimator for the asymptotic variance of Θ (0,1] ; see Sect. 3.2 ibidem. However, Mykland and Zhang (2017) have also shown that this is not the case when edge effects therein are "tiny"; see Sect. 3.3 ibidem for details. The proofs of Propositions A.1 and A.3 in the Appendix suggest that our estimator would satisfy this tiny edge effects condition. In fact, n − and K n ΔT n in Mykland and Zhang (2017) corresponds to √ h n and K n h n , respectively, while the average of the squared edge effects in L n,f 1 given by Eq. (21) of Mykland and Zhang (2017) would be of order O p (h 1+a n ) for any a ∈ (0, 1) . Under the assumptions of Theorem 3.2, K n h n has the same order as h b n for some b ∈ ( 1 2 , 1) . All together, Remark 9 of Mykland and Zhang (2017) would imply that V n,ff 1 is a consistent estimator for the asymptotic variance of L n,f 1 .

Testing the absence of the lead-lag relationship
In this section, we present another application of Theorem 3.1 to decide whether the spot lead-lag time n (t) is identical to zero on the interval [0, 1] or not. Noting that n (t) is stochastic, our purpose is formulated as follows (cf. Aït-Sahalia and Jacod (2009)): we decompose the sample space Ω into two disjoint subsets: and we decide whether the realized outcome ∈ Ω belongs to Ω 0 or to Ω 1 . In the following, we set " ∈ Ω 0 " as the null hypothesis and " ∈ Ω 1 " as the alternative.
To construct the test statistic, we borrow the idea from Dette and Podolskij (2008) and Vetter and Dette (2012). We note that ∈ Ω 0 is equivalent to L n,1 ( ) t = 0 for all t ∈ [0, 1] , provided that Σ 12 ( ) t ≠ 0 for all t ∈ [0, 1] . This suggests us to consider the following Kolmogorov-Smirnov type test statistic: We can also consider the Kuiper type test statistic as follows: Then, an application of Theorem 3.1 and the continuous mapping theorem yields the following convergence: ) , respectively. More formally, given a significant level ∈ (0, 1) , we have: as n → ∞ , provided that P(Ω 0 ) > 0 . Moreover, by the construction of the test statistics, we have: as n → ∞ , provided that P(Ω 1 ) > 0 and Σ 12 t ≠ 0 for all t ∈ [0, 1] . Consequently, the hypothesis testing based on T KS n (resp. T

Estimation of the spot lead-lag time
Now, we focus on the direct estimation of the spot lead-lag time n (t) , t ∈ [0, 1] . We remark that n (t) can be rewritten as n (t) = (h n ∕ ) arctan sin h n n (t) ∕ cos h n n (t) . In Sect. 3, we have constructed the estimators for the integrated quantities of sin h n n (t) and cos h n n (t) , so estimators for the latter ones can be naturally constructed by a kernel approach as in the literature on spot volatility estimation (cf. Kristensen (2010), Kanaya and Kristensen (2016)). Specifically, for a function K ∶ ℝ → ℝ and t ∈ (0, 1) , we set: where K H n (s) = K(s∕H n )∕H n for every s ≥ 0 and H n > 0 is a bandwidth parameter.
1 3 as n → ∞ , where w 1 and w 2 are mutually independent standard normal variables independent of F ∞ . Now, the estimator for the spot lead-lag time is constructed as: Under the assumptions of Theorem 5.1, we have: as n → ∞ , provided that Σ 12 t ≠ 0 . In particular, ̂ n (t) is a meaningful estimator for n (t) as long as n h  ) . Therefore, Theorem 5.1, the continuous mapping theorem, and Eq. (5) in Marsaglia (1965) imply that the variables: converge F ∞ -stably in law to the variable = o(n − ) to make ̂ n (t) a meaningful estimator for n (t) , and in this case, we have 1 {Σ 12 t =0}̂ n (t) = o p (n − ) as n → ∞ . Consequently, if Σ 12 t = 0 , ̂ n (t) will behave as if there is no significant time-lag between two assets.
To estimate the asymptotic variances of the estimators, we use a kernel counterpart of the subsampling estimator developed in Sect. 3.3. Namely, we define: As in Sect. 3.3, we can construct confidence intervals of n (t) under the assumptions of Theorems 5.1-5.2 as follows. Since we have: as n → ∞ , where a 100(1 − ) % confidence interval of n (t) for ∈ (0, 1) is given by:

Simulation study
In this section, we conduct a Monte Carlo study to assess the finite sample performance of the proposed methodology. We simulate the efficient log-price process X p ( p = 1, 2 ) from the Rough Fractional Stochastic Volatility (RFSV) model of Gatheral et al. (2018): where (B 1 , B 2 ) is a two-dimensional standard Brownian motion, such that [B 1 , B 2 ] t = Rt and B H,p is a fractional Brownian motion with Hurst parameter H. We assume that (B 1 , B 2 ) , B H,1 , and B H,2 are mutually independent. As in Section 3.4 of Gatheral et al. (2018), we set H = 0.14 , = 0.3 , m = −5 , and = 5 × 10 −4 . Z p 0 is taken from the stationary distribution of Z p . We vary the correlation parameter R as R = 0.5, 0.7, 0.9.
We generate the observation data as follows. First, we simulate the equidistant sampling (X p i∕n− p n (i∕n) ) n i=0 of the time-lagged efficient log-price process by the Euler-Maruyama scheme, where we set n = 23, 400 . We regard the interval [0, 1] as 6.5 h, so that Δ n ∶= 1∕n corresponds to 1 s. Then, for each i, we keep the value X p i∕n as an observation for the p-th asset with probability p , where we set 1 = 2∕3 and 2 = 1∕2 . After that, we add microstructure noise to the observations. We generate the noise process ( p i ) from the following AR(1) process: , where corresponds to the noise ratio of Oomen (2006). We set = 0.77 and = 0.5 as in Sect. 2.3 of Christensen et al. (2014). For the latency processes 1 n (t) and 2 n (t) , we consider the following two scenarios: Here, we vary the parameter as = 0, 1, 2, 3, 4 . 10,000 paths are generated for each scenario. We use h n = 20Δ n to calculate the estimator L n and K n = 20 to compute the subsampling-based asymptotic variance estimators. In addition, we set H n = 2h 1∕3 n and K(t) = 3 4 (1 − t 2 )1 [−1,1] (t) (the Epanechnikov kernel), while we compute the spot lead-lag time estimator ̂ n (t). Table 1 presents the results of estimating the spectral lead-lag index n . We report the bias and the root-mean-square error (RMSE) of the estimator ̂ n in seconds, i.e., the bias and the RMSE of n ⋅̂ n . We see from the table that the biases of the estimates are less than 1% of the true values and the RMSEs are mild, indicating a good finite sample performance of our estimator. The table also reveals that both the biases and the RMSEs decrease as the correlation parameter increases. This is in line with our asymptotic theory and intuitively natural as well, because the lead-lag relationships would be more pronounced with higher correlations. Table 2 shows the results for the Studentization of L n,1 1 ∕L n,2 1 to check the accuracy of the standard normal approximation. We report the sample means, sample standard deviations (SD), as well as 95 and 99% coverages of the studentization of L n,1 1 ∕L n,2 1 , i.e., the variable in the left side of (3.3). The results of the table show that the standard normal approximation works very well.    Table 4 provides the results of estimating the spot lead-lag time n (t) . We report the root mean integrated square error (RMISE) of the estimator ̂ n (t) in seconds, that is: The results show the relatively large errors compared to the estimation of the spectral lead-lag index, which is because of the slower rate of convergence presented by Theorem 5.1. However, for high correlation situations, they still seem acceptable.
In summary, our new methodology exhibits a good finite sample performance to analyze small and time-varying lead-lag relationships, and it gives a promising way to investigate lead-lag relationships in high-frequency financial data.

Empirical illustration
To illustrate how our new methodology works in real data, we analyze lead-lag relationships between the S&P 500 index and its two derivative products: the E-mini S&P 500 futures and the S&P 500 Standard and Poor's Depository Receipt (SPDR) Exchange-Traded Fund (ETF). The sample period is the whole of January 2017, containing 20 trading days. We use intraday transaction data taken from the Bloomberg with the accuracy of the timestamp values being one second. We set h n = 20 seconds , K n = 20 , H n = 2h 1∕3 n , and K(t) = 3 4 (1 − t 2 )1 [−1,1] (t) as in the simulation study. Figure 1 shows the estimates ̂ n of the spectral lead-lag indices and the associated 95% confidence intervals. The red ones are significantly non-zero at the 5% level. The left panel of the figure shows strong evidence that the futures lead the index, which is consistent with the previous studies (see Kawaller et al. (1987) andde Jong andNijman (1997) for instance). The middle panel also reveals strong evidence that the ETF leads the index. This means that the ETF dominates the index in terms of the price discovery process and it is again consistent with the previous studies such as Tse et al. (2006). The right panel indicates that the futures tend to lead the ETF, but it is less pronounced than the relationships between the index and the futures/ETF. A similar finding in terms of the price discovery process is reported by Tse et al. (2006). Figure 2 shows the spot lead-lag time estimates of ̂ n (t) averaged over the sample period and the corresponding 95% confidence bands. We find that U-shape patterns of the intraday variations for the pairs of the index and the futures/ETF. Namely, the spot lead-lag times are shorter at the beginning and the end of the days, while longer at the middle of the days. The peaks of the lead-lag times are located around 14:00 pm. This would be because the market is active at the beginning and the end Fig. 1 Values of ̂ n and their 95% confidence intervals between the S&P 500 index, the E-mini futures and the SPDR ETF on the trading days in January 2017. Note: left panel: Y 1 is the S&P 500 index and Y 2 is the E-mini futures, middle panel: Y 1 is the S&P 500 index and Y 2 is the SPDR ETF, and right panel: Y 1 is the E-mini futures and Y 2 is the SPDR ETF. � n > 0 implies that Y 1 leads Y 2 , and vice versa. Significantly non-zero values are colored red of the days, yielding fast responses of traders and shorter lead-lag times. On the other hand, the spot lead-lag time process between the futures and the ETF exhibits an increasing trend throughout the day, although the confidence bands for the estimates are comparatively wide and, thus, the significance of this trend seems unclear.

Auxiliary results on the observation times
We begin by proving some auxiliary results on the observation times. For each n and each p = 1, 2 , we set N p n (t) = Proof First, by an analogous argument to the proof of Lemma 6.3 form Koike (2017b), it is enough to prove the lemma with an additional assumption that sup i≥0 (t p i − t p i−1 ) ≤ n − holds true for every n with some ∈ ( + 1∕2, 1). . By assumption, we have N p uniformly in t ∈ [0, T] . On the other hand, noting that we can prove N p n (T) = O p (n) in a similar manner to the proof of Lemma 6.1 from Koike (2017b), we obtain: This completes the proof of the lemma. ◻ The following result is an immediate consequence of Lemma A.1: Next, take a number 1 , such that ∨ (1 − ) < 1 < 1 2 and set is Therefore, we obtain {t p i ≤ t} ∈ I t+n − log n . Since n 1 − log n → 0 as n → ∞ , for sufficiently large n, we have I t+n − log n ⊂ F (t−n − 1 ∕2) + . This completes the proof. ◻

Notation
In this subsection, we introduce some notation used throughout the proof. For (random) sequences (x n ) and (y n ) , x n ≲ y n means that there exists a (non-random) constant 1 3 K ∈ [0, ∞) , such that x n ≤ Ky n for large n. For a real-valued function f defined on [0, 1] and > 0 , we denote its modulus of continuity by w(f ; ) , that is: . We set T n k = kh n for all n, k. For each p = 1, 2 and every k = 0, 1, … , h −1 n − 1 , we define the variables S p k (X) and S p k ( ) by: We set L n t = (L n,1 t , L n,2 t ) ⊤ and � L n t = ( � L n,1 t , � L n,2 t ) ⊤ for t ≥ 0.   (1999)). and n k = ( n,1 k , n,2 k ) ⊤ . For every k, we also define the bivariate variable ̄n k = (̄n ,1 k ,̄n ,2 k ) ⊤ by: and Then, we set ̂n -measurable and satisfies E[̂n k |F n T n k−1 ] = 0 (here, we use the independence between 1 and 2 ).

Proof of Theorem 3.1
First of all, for the proof, we may additionally assume the following condition: B The processes b, Σ , v 1 , and v 2 are bounded. .
This is due to a standard localization procedure described in detail in Lemma 4.4.9 of Jacod and Protter (2012) for instance. Moreover, noting Corollary A.1, by an analogous localization argument to Lemma 6.3 from Koike (2017b), we may also assume the following strengthened version of [A1]:

SA1
We have [A1], and for every n, it holds that: and where r n = n − and is a constant satisfying: Now, we turn to the main body of the proof. The proof of the theorem is completed once we show that the following three propositions hold true:

Proof of Proposition A.1
First, using elementary properties of the trigonometric functions, we obtain: for any n ∈ ℕ , k ∈ ℝ , and t, s ≥ 0 as well as: for any n ∈ ℕ , k ∈ ℝ , i ∈ ℤ + , and p = 1, 2.
Next, we prove some auxiliary results.

Lemma A.3 Under the assumptions of Proposition A.1, we have:
Proof Summation by parts yields: Since it holds that ∑ ∞ i=−1 ΔΦ k (i, p) = 0 , we have: √ h n log n) in a similar manner. Also, an analogous argument to the above yields:  for any n ∈ ℕ , k ∈ ℝ , i ∈ ℤ + , and p = 1, 2.
Proof By the independence between (u Now, for any q ≥ r , Lemma 3.102 from Ch. VIII of Jacod and Shiryaev (2003) and [N ] yield: Since we can take q ≥ r , such that 1∕r − 1∕q > by assumption, we obtain the desired result by [N ]. ◻ Lemma A.6 Under the assumptions of Proposition A.1, for any r ∈ [2, −1 ) , there is a constant K r > 0 , such that: for any p = 1, 2 , n ≥ 1 , and k = 0, 1, … , h −1 n . Proof By symmetry, it suffices to prove the first convergence in (A.8).
First, the case of Z = B is obvious, because | � S p k (B)| ≲ h n +r n + n (h n +r n ) ≲ h n uniformly in k.
Next, we consider the case of Z = . Since  (1975)) yields: Since | � S p k (B)| ≲ h n , the desired result follows from Lemma A.6. Finally, we consider the case of Z = M . We decompose ∑⌊th −1 n ⌋−1 k=1 f k (B, M) as: First, we prove: We adopt the same strategy as in Sect. 6.1.3 of Christensen et al. (2013). For any càdlàg process V, we denote by N V (t) the number of jumps of V bigger than > 0 before time t. Furthermore, we define: for , > 0 . We evidently have lim →0 lim sup →0 m , (V) = 0 a.s. Now, we have: for any > 0 and i ∈ ℤ + , such that t p i ∈ (T n k−3∕2 , T n k+1 ] , so we obtain: On the other hand, the Schwarz inequality yields: and by Lemma A.2, we obtain:

Proof of Proposition A.2
The following lemma is an easy consequence of the BDG inequality.
Lemma A.8 Under the assumptions of Proposition A.2, for any r > 0 , there is a constant C r > 0 , such that: for any p = 1, 2 and n, k.