1 Introduction

A major challenge in high-frequency financial econometrics is measuring lead–lag relationships, in which one asset is correlated with another asset with a delay. Two assets typically exhibit lead–lag relationships when they reflect new information at different speeds. A prominent example is the lead–lag relationship between the cash and futures markets, in which the latter leads the former (see, e.g., Kawaller et al. 1987; de Jong and Nijman 1997; Huth and Abergel 2014). Lead–lag relationships are also known to be a source of the so-called Epps effect (see Renò 2003).

There have been several attempts to model and analyze lead–lag relationships in high-frequency data. One approach is to use classical time-series analysis in discrete time, such as lead–lag regression (Kawaller et al. 1987), cointegration (Hasbrouck 1995) and cross-correlation analysis (de Jong and Nijman 1997). Meanwhile, Hoffmann et al. (2013) introduced a continuous-time model to describe lead–lag relationships. A related model has also been studied by Robert and Rosenbaum (2010) using random matrix theory and by Ito and Sakemoto (2020) using multinomial dynamic time warping. Other approaches to investigating lead–lag relationships in a continuous-time framework include Hawkes process-based models (Bacry et al. 2013; Da Fonseca and Zaatour 2015), a wavelet-based method (Hayashi and Koike 2018), and a multi-asset lagged adjustment model (Buccheri et al. 2020). Several empirical approaches have been proposed as well; see, for example, Pomponio and Abergel (2013) and Dobrev and Schaumburg (2016).

In this study, we use the model of Hoffmann et al. (2013), which we call the HRY model, as a baseline. In the HRY model, the lead–lag relationship is modeled by a pair of non-synchronously observed semimartingales, one of which is observed with a delay relative to the other. Empirical applications of the HRY model can be found in Alsayed and McGroarty (2014), Huth and Abergel (2014), Ceron et al. (2016) and Bollen et al. (2017). We extend this model in three directions. First, in high-frequency financial econometrics, it is well recognized that observed prices are subject to market frictions, which cause many problems for statistical inference as the observation frequency increases (see, e.g., Hansen and Lunde 2006). Therefore, for ultra-high-frequency financial data, observed prices are typically modeled as semimartingales contaminated by noise (called microstructure noise) rather than as pure semimartingales. We thus introduce microstructure noise into the HRY model. We remark that there are a number of studies on volatility/covariation estimation for noisy semimartingales; see Chapter 7 of Aït-Sahalia and Jacod (2014), Shephard and Xiu (2017) and the references therein. Second, it is well documented that intraday seasonal effects are an important feature of high-frequency financial data (see, e.g., Andersen and Bollerslev 1997; Bibinger et al. 2019; Ozturk et al. 2017). This motivates us to suppose that the time-lag is heterogeneous rather than constant over the course of the day. In fact, Huth (2012) has reported intraday variations in lead–lag relationships in financial markets. For these reasons, we introduce a heterogeneous time-lag into the HRY model. Third, because of the low-latency responses of high-frequency traders in today's financial markets (cf. Hasbrouck and Saar 2013), we may expect the time-lag to be quite small, of the same order as the sampling interval.
To take this fact into account explicitly in our model, we consider local asymptotics under which the time-lag shrinks as the sampling frequency increases. In econometrics, local asymptotics is a standard technique for making asymptotic theory more realistic in finite samples. Prime examples are studies of models with nearly unit roots (e.g., Phillips 1987; Phillips and Magdalinos 2007) and of weak identification problems (e.g., Andrews and Cheng 2012). Examples in high-frequency financial econometrics include volatility estimation in the presence of round-off errors (Li and Mykland 2015; Rosenbaum 2009; Robert and Rosenbaum 2011; Li et al. 2018), small jump analysis (Li 2013) and inference under small microstructure noise (Kurisu 2018; Rosenbaum 2011).

Under the proposed model, we develop a statistical methodology to investigate lead–lag relationships. The methodology is based on an idea completely different from that of Hoffmann et al. (2013) and enables us to analyze time-varying lead–lag relationships flexibly. In particular, we establish an asymptotic distribution theory for the proposed methodology, which allows us to assess the statistical significance of the results obtained by applying it. We note that an asymptotic distribution theory for the estimator proposed in Hoffmann et al. (2013) is not straightforward; see Sect. 3.3 of Hoffmann et al. (2013) for details.

This paper is organized as follows. In Sect. 2, we introduce the model used in this study. In Sects. 3–5, we develop statistical methodologies to analyze the global and local behavior of lead–lag relationships over the day. We assess the finite-sample performance of the proposed methodologies by a Monte Carlo experiment in Sect. 6, and we provide an empirical illustration of our approach in Sect. 7. All proofs are collected in the Appendix.

2 Model

We assume that \(X=(X^1,X^2)\) is a bivariate continuous Itô semimartingale defined on a stochastic basis \(\mathcal {B}=(\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\ge 0},P)\), which is of the form:

$$\begin{aligned} X_t=X_{0}+\int _{0}^tb_s\mathrm {d}s+\int _{0}^t\Sigma _s^{1/2}\mathrm {d}W_s, \end{aligned}$$

where \(W_s\) is a bivariate standard Wiener process on \(\mathcal {B}\), \(b_s\) is a bivariate càdlàg \((\mathcal {F}_t)\)-adapted process, and \(\Sigma _s\) is a \(2\times 2\) positive semidefinite symmetric matrix-valued càdlàg \((\mathcal {F}_t)\)-adapted process. We assume \(\int _0^1\Sigma ^{12}_s\mathrm {d}s\ne 0\) a.s. We observe X on the interval [0, 1] and denote by \((t^p_i)_{i\ge 0}\) the observation times for \(X^p\) for \(p=1,2\). We assume that the \(t^p_i\) are \((\mathcal {F}_t)\)-stopping times satisfying \(t^p_i\uparrow \infty\) as \(i\rightarrow \infty\). We also assume that they implicitly depend on a parameter \(n\in \mathbb {N}\) representing the observation frequency and satisfy:

$$\begin{aligned} r_n(t):=\max _{p=1,2}\max _{i\ge 0:t^p_i\le t}(t^p_i-t^p_{i-1})\rightarrow ^p0 \end{aligned}$$

as \(n\rightarrow \infty\) for any \(t>0\) with \(t^p_{-1}:=0\) for \(p=1,2\). Here, the notation \(\rightarrow ^p\) denotes convergence in probability.
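The quantity \(r_n(t)\) is simply the largest gap between consecutive observation times up to t, taken over both assets. A minimal pure-Python sketch, assuming each asset's observation times are supplied as an increasing list of floats (the function name `max_duration` is ours, not from the paper):

```python
def max_duration(times_by_asset, t):
    """Compute r_n(t) = max_p max_{i: t^p_i <= t} (t^p_i - t^p_{i-1}), with t^p_{-1} := 0.

    times_by_asset: one increasing list of observation times per asset.
    """
    r = 0.0
    for times in times_by_asset:
        prev = 0.0  # convention t^p_{-1} := 0
        for ti in times:
            if ti > t:  # only observation times up to t enter the maximum
                break
            r = max(r, ti - prev)
            prev = ti
    return r


# Equidistant sampling t_i = i/n gives r_n(t) = 1/n (up to floating-point rounding).
n = 100
grid = [i / n for i in range(n + 1)]
print(max_duration([grid, grid], 1.0))
```

Under assumption [A1](ii) below, this maximal gap must vanish faster than any power \(n^{-\xi }\) with \(\xi <1\).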

For each \(p=1,2\), the observation \(Y^p_i\) of \(X^p\) at the observation time \(t^p_i\) is given by:

$$\begin{aligned} Y^p_{i}=X^p_{(t^p_i-\vartheta ^p(t^p_i))_+}+\epsilon ^p_{i} \end{aligned}\tag{2.1}$$

for \(i=0,1,\dots\). Here, the \(\epsilon ^p_i\) are measurement errors, referred to as microstructure noise in financial econometrics, while \(\vartheta ^p(t)\) denotes the latency with which new information on the efficient log price \(X^p\) is incorporated into the observed price at time t. We assume that \((\vartheta ^p(t))_{t\ge 0}\) is a stochastic process adapted to \((\mathcal {F}_t)\). For high-frequency financial data, we may expect the latency to be small, of the same order as the sampling interval. For this reason, we consider local asymptotics under which \(\vartheta ^p\equiv \vartheta ^p_n=n^{-\alpha }c^p_\vartheta\) for some \(\alpha >\frac{1}{2}\) and some nonnegative-valued process \((c^p_\vartheta (t))_{t\ge 0}\).

We are interested in the process \(\vartheta _n(t):=\vartheta _n^2(t)-\vartheta _n^1(t)\), \(t\in [0,1]\), which we refer to as the spot lead–lag time. If \(\vartheta _n(t)>0\) for some \(t\in [0,1]\), the latency of the second asset is larger than that of the first, so the first asset leads the second at time t, and the size of the time-lag is \(|\vartheta _n(t)|\). The converse holds if \(\vartheta _n(t)<0\). Before turning to the direct estimation of \(\vartheta _n(t)\) in Sect. 5, we construct in the next section estimators for the following processes:

$$\begin{aligned} L^{n,1}_t=\int _0^t\sin \left( \frac{\pi }{h_n}\vartheta _n(s)\right) \Sigma _s^{12}\mathrm {d}s,\qquad L^{n,2}_t=\int _0^t\cos \left( \frac{\pi }{h_n}\vartheta _n(s)\right) \Sigma _s^{12}\mathrm {d}s,\qquad t\in [0,1], \end{aligned}$$

where \(h_n\) is a tuning parameter introduced in the next section. Note that \(L^{n,2}_1\rightarrow \int _0^1\Sigma _s^{12}\mathrm {d}s\) a.s. as \(n\rightarrow \infty\); hence, \(L^{n,2}_1\) is asymptotically non-zero, because we assume \(\int _0^1\Sigma ^{12}_s\mathrm {d}s\ne 0\) a.s. Now, if \(\vartheta _n(t)\) does not depend on t and \(|\vartheta _n|<h_n/2\) (which holds for sufficiently large n), we have \(\vartheta _n\equiv (h_n/\pi )\arctan (L^{n,1}_1/L^{n,2}_1)=:\mathbf {SLL}_n\), so we can construct an estimator for \(\vartheta _n\) by plugging estimators for \(L^{n,1}_1\) and \(L^{n,2}_1\) into \(\mathbf {SLL}_n\). Even if \(\vartheta _n(t)\) depends on t, the quantity \(\mathbf {SLL}_n\) remains meaningful as an index capturing the averaged behavior of the process \(\vartheta _n(t)\) on the interval [0, 1]. We call \(\mathbf {SLL}_n\) the spectral lead–lag index and will use it as a descriptive statistic for assessing lead–lag relationships in our empirical study.
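The inversion behind \(\mathbf {SLL}_n\) can be checked in the constant-lag case. In the sketch below (function names are ours), `population_L` evaluates \(L^{n,1}_1\) and \(L^{n,2}_1\) for a constant lag and a given value of \(\int _0^1\Sigma ^{12}_s\mathrm {d}s\). Because arctan takes values in \((-\pi /2,\pi /2)\), the lag is recovered only when \(|\vartheta _n|<h_n/2\); larger lags are aliased, which is why the lag has to be small relative to \(h_n\).

```python
import math


def population_L(theta, h, int_sigma12):
    """L^{n,1}_1 and L^{n,2}_1 when the lag theta and Sigma^{12} do not depend on time;
    int_sigma12 stands for the integral of Sigma^{12}_s over [0, 1]."""
    x = math.pi * theta / h
    return math.sin(x) * int_sigma12, math.cos(x) * int_sigma12


def spectral_lead_lag_index(l1, l2, h):
    """SLL_n = (h_n / pi) * arctan(L^{n,1}_1 / L^{n,2}_1)."""
    return (h / math.pi) * math.atan(l1 / l2)


h = 0.01
# |theta| < h/2: the lag is recovered, whatever the sign of the integral of Sigma^{12}.
print(spectral_lead_lag_index(*population_L(0.3 * h, h, -0.4), h))  # approx 0.003
# |theta| > h/2: the lag 0.7h is aliased to 0.7h - h = -0.3h.
print(spectral_lead_lag_index(*population_L(0.7 * h, h, 0.4), h))   # approx -0.003
```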

Since \(\mathbf {SLL}_n\) tends to zero as \(n\rightarrow \infty\), the estimation of \(\mathbf {SLL}_n\) is meaningful only in a sense stronger than the usual consistency. More precisely, since \(\mathbf {SLL}_n\) tends to zero at the rate \(n^{-\alpha }\), in the sense that \(n^\alpha \mathbf {SLL}_n\rightarrow ^p\int _0^1(c^2_\vartheta (s)-c^1_\vartheta (s))\Sigma ^{12}_s\mathrm {d}s/\int _0^1\Sigma ^{12}_s\mathrm {d}s\) as \(n\rightarrow \infty\), a sequence \(\widehat{\mathbf {SLL}}_n\) of estimators for \(\mathbf {SLL}_n\) provides a meaningful estimation result if and only if:

$$\begin{aligned} n^\alpha (\widehat{\mathbf {SLL}}_n-\mathbf {SLL}_n)\rightarrow ^p0 \end{aligned}\tag{2.2}$$

as \(n\rightarrow \infty\). In the next section, we construct estimators for \(\mathbf {SLL}_n\) having the above property.

3 Estimation of the spectral lead–lag index

3.1 Construction of the estimators

First, following Bibinger et al. (2014), we define the spectral statistics as follows:

$$\begin{aligned} S^p_{k}=\sum _{t^p_i\in J^n_k}\left( Y^p_{i}-Y^p_{i-1}\right) \Phi _{k}\left( {\bar{t}}^p_i\right) ,\qquad k=0,1,\dots ,h_n^{-1}-1, \end{aligned}$$

where \({\bar{t}}^p_i=(t^p_{i-1}+t^p_i)/2\), \(\Phi _{k}(t)=\sin \left( \pi h_n^{-1}(t-kh_n)\right)\), \(J^n_k=(kh_n,(k+1)h_n]\) and \(h_n\) is a positive number such that \(h_n^{-1}\in \mathbb {N}\). We assume that the sequence \(h_n\) satisfies \(\sqrt{n}h_n\rightarrow c\) as \(n\rightarrow \infty\) for some \(c>0\). Thus, \(S^p_k\) is the Fourier sine coefficient of the observed returns of the pth asset on the interval \(J^n_k\). We cannot directly use the cosine version of \(S^p_k\) because of end effects: \(\cos (0)=-\cos (\pi )=1\ne 0\). To deal with this issue, we rely on the same trick as in Bibinger and Winkelmann (2015); namely, we also consider the spectral statistics on the shifted blocks \(J^n_{k-\frac{1}{2}}=((k-\frac{1}{2})h_n,(k+\frac{1}{2})h_n]\), i.e., \(S^p_{k-\frac{1}{2}}\) (\(k=1,\dots ,h_n^{-1}-1\)). Bibinger and Winkelmann (2015) use these statistics to handle jumps in their spectral covariance estimators. The following formula plays a key role:

$$\begin{aligned} \Phi _{k-1}(t)=-\Phi _{k}(t)=\cos \left( \pi h_n^{-1}\left( t-\left( k-1/2\right) h_n\right) \right) , \end{aligned}$$

so \(\Phi _{k-1}\) and \(-\Phi _k\) behave as the cosine function on the interval \(J^n_{k-1/2}\).
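A direct transcription of \(\Phi _k\) and \(S^p_k\) may help fix the indexing. The sketch below assumes that observation times and prices are plain Python lists (`times[i]` \(=t^p_i\), `obs[i]` \(=Y^p_i\)); a half-integer k gives the shifted-block statistics, and the sum starts at i = 1 because \(Y^p_{-1}\) is not defined in the text. Function names are ours.

```python
import math


def phi(k, t, h):
    """Phi_k(t) = sin(pi * (t - k*h) / h); k may be an integer or a half-integer."""
    return math.sin(math.pi * (t - k * h) / h)


def spectral_stat(times, obs, k, h):
    """S^p_k: sum over t_i in (k*h, (k+1)*h] of (Y_i - Y_{i-1}) * Phi_k(tbar_i)."""
    lo, hi = k * h, (k + 1) * h
    s = 0.0
    for i in range(1, len(times)):
        if lo < times[i] <= hi:
            tbar = 0.5 * (times[i - 1] + times[i])  # midpoint tbar_i
            s += (obs[i] - obs[i - 1]) * phi(k, tbar, h)
    return s


# Key identity: Phi_{k-1}(t) = -Phi_k(t) = cos(pi * (t - (k - 1/2) h) / h).
h, k, t = 0.05, 3, 0.16
c = math.cos(math.pi * (t - (k - 0.5) * h) / h)
print(abs(phi(k - 1, t, h) - c) < 1e-12, abs(-phi(k, t, h) - c) < 1e-12)
```

With float times, an observation lying exactly on a block boundary may be assigned to either neighbouring block; an integer-based block index avoids this in practice.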

Now, we explain the idea behind the construction of our estimators. For exposition, we assume that \(b_s\equiv 0\), \(\Sigma _s\equiv \Sigma\), \(t^p_i=i/n\), \(\vartheta _n^1\equiv 0\), \(\vartheta _n^2\equiv \vartheta \in \{k/n:k\in \mathbb {Z}_+\}\) and \(E[\epsilon ^1_i\epsilon ^2_j]=0\) for all i, j. We set:

$$\begin{aligned} \ell ^{n,1}_k=\left( S^1_{k-1}-S^1_{k}\right) S^2_{k-\frac{1}{2}}-S^1_{k-\frac{1}{2}}\left( S^2_{k-1}-S^2_{k}\right) ,\qquad k=1,\dots ,h_n^{-1}-1. \end{aligned}$$

Then, noting that \(|\vartheta |\le h_n/2\) for sufficiently large n and:

$$\begin{aligned} E[(Y^1_i-Y^1_{i-1})(Y^2_j-Y^2_{j-1})] =\left\{ \begin{array}{ll} \Sigma ^{12}/n &{} \text {if }t^2_j=t^1_i+\vartheta , \\ 0 &{} \text {otherwise}, \end{array} \right. \end{aligned}$$

we have:

$$\begin{aligned}&E\left[ \ell ^{n,1}_k\right] \approx \frac{\Sigma ^{12}}{n}\sum _{(k-\frac{1}{2})h_n<{\bar{t}}^1_i\le (k+\frac{1}{2})h_n}\left\{ \cos \left( \pi h_n^{-1}\left( {\bar{t}}^1_i-\left( k-1/2\right) h_n\right) \right) \right. \\&\qquad \sin \left( \pi h_n^{-1}\left( {\bar{t}}^1_i+\vartheta -\left( k-1/2\right) h_n\right) \right) \\&\qquad \left. -\sin \left( \pi h_n^{-1}\left( {\bar{t}}^1_i-\left( k-1/2\right) h_n\right) \right) \cos \left( \pi h_n^{-1}\left( {\bar{t}}^1_i+\vartheta -\left( k-1/2\right) h_n\right) \right) \right\} . \end{aligned}$$

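Now, applying the identity \(\sin (y-x)=\cos (x)\sin (y)-\sin (x)\cos (y)\), we see that each summand equals \(\sin (\pi h_n^{-1}\vartheta )\); since the sum runs over approximately \(nh_n\) indices, we obtain: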

$$\begin{aligned} E\left[ \ell ^{n,1}_k\right] \approx \Sigma ^{12} h_n\sin \left( \pi h_n^{-1}\vartheta \right) . \end{aligned}$$

The quantity on the right side of the above equation is equal to the integrand of \(L^{n,1}_t\) multiplied by \(h_n\), so we naturally consider the following estimator for \(L^{n,1}_t\):

$$\begin{aligned} {\widehat{L}}^{n,1}_t=\sum _{k=1}^{\lfloor th_n^{-1}\rfloor -1}\ell ^{n,1}_k. \end{aligned}$$
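The approximation \(E[\ell ^{n,1}_k]\approx \Sigma ^{12}h_n\sin (\pi h_n^{-1}\vartheta )\) can be reproduced numerically from the sum displayed earlier in this subsection. The sketch below assumes equidistant midpoints \({\bar{t}}^1_i=(i-1/2)/n\) and that \(nh_n\) is an even integer, so that no midpoint falls on a window boundary; each summand then reduces to \(\sin (\pi h_n^{-1}\vartheta )\) by the sine subtraction identity, and the window contains exactly \(nh_n\) midpoints. The function name is ours.

```python
import math


def expected_ell1(theta, h, n, sigma12, k=1):
    """Evaluate the sum approximating E[ell^{n,1}_k] for equidistant sampling t_i = i/n."""
    lo, hi = (k - 0.5) * h, (k + 0.5) * h  # shifted block J^n_{k-1/2}
    total = 0.0
    i = 1
    while (i - 0.5) / n <= hi:
        tbar = (i - 0.5) / n
        if tbar > lo:
            a = math.pi * (tbar - lo) / h  # argument of the cosine/sine weights
            b = a + math.pi * theta / h    # same argument shifted by the lag
            total += sigma12 / n * (math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b))
        i += 1
    return total


n, h, sigma12 = 1000, 0.1, 0.4  # n * h = 100 midpoints per window
theta = 3 / n                   # lag on the sampling grid, |theta| <= h / 2
print(expected_ell1(theta, h, n, sigma12), sigma12 * h * math.sin(math.pi * theta / h))
```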

An analogous argument suggests the following estimator for \(L^{n,2}_t\):

$$\begin{aligned} {\widehat{L}}^{n,2}_t=\sum _{k=1}^{\lfloor th_n^{-1}\rfloor -1}\ell ^{n,2}_k, \end{aligned}$$

where:

$$\begin{aligned} \ell ^{n,2}_k= & {} \left( S^1_{k}S^2_{k}-S^1_kS^2_{k-1}-S^1_{k-1}S^2_k\right) \\&\quad +\left( S^1_{k-\frac{1}{2}}S^2_{k-\frac{1}{2}}-S^1_{k-\frac{1}{2}}S^2_{k-\frac{3}{2}}-S^1_{k-\frac{3}{2}}S^2_{k-\frac{1}{2}}\right) ,\qquad k=1,\dots ,h_n^{-1}-1 \end{aligned}$$

with the convention \(S^p_{-\frac{1}{2}}:=0\) for \(p=1,2\).Footnote 1
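Putting the pieces together, the estimators \({\widehat{L}}^{n,1}_t\), \({\widehat{L}}^{n,2}_t\) and the plug-in index \((h_n/\pi )\arctan ({\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1)\) can be sketched as follows. This is an unoptimized illustration (each \(S^p_k\) is computed by a full pass over the data), assuming plain Python lists of observation times and prices and an integer \(h_n^{-1}\); all function names are ours. Two exact properties make convenient sanity checks: \(\ell ^{n,1}_k\) vanishes when both inputs coincide, and it changes sign when the two assets are swapped.

```python
import math
import random


def _phi(k, t, h):
    return math.sin(math.pi * (t - k * h) / h)


def _spectral_stat(times, obs, k, h):
    # S_k on the block (k*h, (k+1)*h]; k may be a half-integer, and S_{-1/2} := 0.
    if k < 0:
        return 0.0
    s = 0.0
    for i in range(1, len(times)):
        if k * h < times[i] <= (k + 1) * h:
            s += (obs[i] - obs[i - 1]) * _phi(k, 0.5 * (times[i - 1] + times[i]), h)
    return s


def ell_terms(t1, y1, t2, y2, h):
    """Return ([ell^{n,1}_k], [ell^{n,2}_k]) for k = 1, ..., 1/h - 1."""
    H = round(1 / h)
    # Key j stands for the block k = j / 2, so the shifted (half-integer) blocks are included.
    S1 = {j: _spectral_stat(t1, y1, j / 2, h) for j in range(-1, 2 * H - 1)}
    S2 = {j: _spectral_stat(t2, y2, j / 2, h) for j in range(-1, 2 * H - 1)}
    ell1, ell2 = [], []
    for k in range(1, H):
        j = 2 * k  # S_k -> j, S_{k-1/2} -> j-1, S_{k-1} -> j-2, S_{k-3/2} -> j-3
        ell1.append((S1[j - 2] - S1[j]) * S2[j - 1] - S1[j - 1] * (S2[j - 2] - S2[j]))
        ell2.append((S1[j] * S2[j] - S1[j] * S2[j - 2] - S1[j - 2] * S2[j])
                    + (S1[j - 1] * S2[j - 1] - S1[j - 1] * S2[j - 3] - S1[j - 3] * S2[j - 1]))
    return ell1, ell2


def L_hat(ell, t, h):
    """hat L_t = sum_{k=1}^{floor(t/h) - 1} ell_k, where ell[0] holds k = 1."""
    upper = math.floor(t / h + 1e-9) - 1  # tolerance guards against float rounding
    return sum(ell[:max(upper, 0)])


def sll_hat(t1, y1, t2, y2, h):
    """Plug-in index (h / pi) * arctan(hat L^{n,1}_1 / hat L^{n,2}_1)."""
    ell1, ell2 = ell_terms(t1, y1, t2, y2, h)
    return (h / math.pi) * math.atan(L_hat(ell1, 1.0, h) / L_hat(ell2, 1.0, h))


# Usage on a synthetic random walk: identical inputs give hat L^{n,1}_1 = 0 exactly.
rng = random.Random(0)
n, h = 400, 0.1
times = [i / n for i in range(n + 1)]
y = [0.0]
for _ in range(n):
    y.append(y[-1] + rng.gauss(0.0, 1.0) / math.sqrt(n))
ell1, ell2 = ell_terms(times, y, times, y, h)
print(L_hat(ell1, 1.0, h), L_hat(ell2, 1.0, h))
```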

Remark 3.1

(Cross-correlated noise) There is some empirical evidence that microstructure noise is cross-correlated across assets; see Voev and Lunde (2007) and Ubukata and Oya (2009). One may expect the estimator \(\widehat{L}^{n,1}_t\) constructed above to be robust against such cross-correlations as long as the serial dependence in the noise process is sufficiently weak in tick time.Footnote 2 To see this, note that summation by parts yields the following approximation for the noise part of \(S^p_k\) (cf. the last equation on page 363 of Bibinger and Winkelmann (2015)):

$$\begin{aligned} \sum _{t^p_i\in J^n_k}\left( \epsilon ^p_{i}-\epsilon ^p_{i-1}\right) \Phi _{k}\left( {\bar{t}}^p_i\right) \approx -\frac{\pi }{nh_n}\sum _{t^p_i\in J^n_k}\epsilon ^p_{i}\cos \left( \pi h_n^{-1}(t^p_i-kh_n)\right) . \end{aligned}\tag{3.1}$$

This suggests that, even if \(E[\epsilon ^1_i\epsilon ^2_i]\ne 0\), the expectation of the noise part of \(\ell ^{n,1}_k\) is negligible by an argument similar to the one above. This statement continues to hold even if \(E[\epsilon ^1_i\epsilon ^2_j]\ne 0\) for \(i\ne j\), as long as \(|E[\epsilon ^1_i\epsilon ^2_j]|\) decays sufficiently fast as \(|i-j|\) increases. This is because, in such a situation, we can replace the summand on the right side of (3.1) by a martingale difference using Gordin’s martingale approximation method (cf. Sect. 19 of Billingsley (1999)). In fact, this is exactly what we do in the proof to handle the serial dependence in the noise process; see (A.1) and (A.15). On the other hand, the estimator \(\widehat{L}^{n,2}_t\) is biased in the presence of such cross-correlations. In the equidistant sampling case, it is not difficult to see that the bias is proportional to the long-run cross-covariance of the noise process, which can be estimated at a faster rate than \(\widehat{L}^{n,2}_t\) and can thus be easily corrected; see Bibinger and Reiß (2014) for the serially uncorrelated case. However, bias correction is not straightforward in the non-synchronous case, mainly because it has not yet been established how to model cross-sectional dependence in microstructure noise in a manner that is both mathematically and empirically satisfactory (cf. the discussion after Assumption 3 in Bibinger et al. (2019)). For this reason, this paper focuses on the situation where \(\epsilon ^1_i\) and \(\epsilon ^2_j\) are uncorrelated for all i, j.
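The approximation (3.1) is easy to check numerically in the simplest configuration. The sketch below assumes equidistant times \(t_i=i/n\), the first block \(J^n_0\), and an integer number \(m=nh_n\) of observations per block (function names are ours). Summation by parts shows that the two sides of (3.1) then differ only by two boundary terms and a higher-order remainder, whose total is bounded by \(2\pi /m\) times the largest noise value involved.

```python
import math
import random


def noise_part_exact(eps, m):
    """Left side of (3.1) on block k = 0 with t_i = i/n and n * h_n = m (an integer):
    sum_{i=1}^{m} (eps_i - eps_{i-1}) * sin(pi * (i - 1/2) / m)."""
    return sum((eps[i] - eps[i - 1]) * math.sin(math.pi * (i - 0.5) / m)
               for i in range(1, m + 1))


def noise_part_approx(eps, m):
    """Right side of (3.1): -(pi / m) * sum_{i=1}^{m} eps_i * cos(pi * i / m)."""
    return -(math.pi / m) * sum(eps[i] * math.cos(math.pi * i / m)
                                for i in range(1, m + 1))


rng = random.Random(1)
m = 200
eps = [rng.gauss(0.0, 1.0) for _ in range(m + 1)]  # eps_0, ..., eps_m
gap = abs(noise_part_exact(eps, m) - noise_part_approx(eps, m))
# Boundary terms are O(max|eps| / m); the gap stays below 2 * pi / m * max|eps|.
print(gap, 2 * math.pi / m * max(abs(e) for e in eps))
```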

3.2 Asymptotic theory

In this section, we present an asymptotic theory for the process \({\widehat{L}}^n_t:=({\widehat{L}}^{n,1}_t,{\widehat{L}}^{n,2}_t)^\top\), \(t\in [0,1]\). First, we enumerate the assumptions which we impose. Let \(\lambda\) be a positive constant.

A1:

(i) There is a constant \(\eta \in (0,\frac{1}{2})\) such that \(t^p_i\) is an \((\mathcal {F}_{(t-n^{-\eta })_+})_{t\ge 0}\)-stopping time for any n, i and every \(p=1,2\).

(ii) \(r_n(t)=o_p(n^{-\xi })\) as \(n\rightarrow \infty\) for any \(t>0\) and any \(\xi \in (0,1)\).

(iii) For any \(n\in \mathbb {N}\) and \(p=1,2\), there is a filtration \((\mathcal {H}^{n,p}_t)_{t\ge 0}\) of \(\mathcal {F}\), such that \(t^p_i\) is an \((\mathcal {H}^{n,p}_t)\)-stopping time for every i.

(iv) For each n there is a random subset \(\mathcal {N}^n\) of \(\mathbb {Z}_+\), such that \(\{(\omega ,p)\in \Omega \times \mathbb {Z}_+:p\in \mathcal {N}^n(\omega )\}\) is a measurable set of \(\Omega \times \mathbb {Z}_+\). Moreover, there is a constant \(\kappa \in (0,\frac{1}{2})\), such that \(\#(\mathcal {N}^n\cap \{i:t^p_i\le t\})=O_p(n^\kappa )\) as \(n\rightarrow \infty\) for every \(t>0\) and every \(p=1,2\).

(v) For any \(n\in \mathbb {N}\), \(p=1,2\) and \(r=1,2\), there is a càdlàg \((\mathcal {H}^{n,p}_t)\)-adapted positive-valued process \(G(r)^{n,p}\), such that \(E[|n(t^p_{i+1}-t^p_i)|^r\big |\mathcal {H}^{n,p}_{t^p_i}]=G(r)^{n,p}_{t^p_i}\) for every \(i\in \mathbb {Z}_+\setminus \mathcal {N}^n\). Moreover, there is a càdlàg \((\mathcal {F}_t)\)-adapted positive-valued process \(G(r)^{p}\), such that \(G(r)^{n,p}\rightarrow ^pG(r)^p\) as \(n\rightarrow \infty\) for the Skorokhod topology.

(vi) \(G(1)^p_{t-}>0\) for every \(t>0\) and every \(p=1,2\).

A2:

For \(p=1,2\), \(c^p_\vartheta\) is \((\mathcal {F}_t)\)-adapted and its paths are almost surely Lipschitz continuous.

\(\hbox {N}_\lambda\):

The measurement errors are of the form \(\epsilon ^p_i=\sqrt{v^p_{t^p_i}}u^p_i\) for \(p=1,2\) and \(i=0,1,\dots\), where \(u^p_i\)s are random variables and \(v^p\) is a nonnegative \((\mathcal {F}_t)\)-adapted process. Moreover, they satisfy the following conditions.

(i) \(u^p\) is strictly stationary and independent of \(\mathcal {F}_\infty :=\bigvee _{t>0}\mathcal {F}_t\) for every \(p=1,2\).

(ii) For every \(p=1,2\), \(E[|u^p|^r]<\infty\) for any \(r>0\) and \(E[u^p]=0\).

(iii) The \(\alpha\)-mixing coefficients \(\alpha _p(j)\) of \(u^p\) satisfy \(\sum _{j=1}^\infty \alpha _p(j)^\lambda <\infty\) for every \(p=1,2\).

(iv) \(u^1\) and \(u^2\) are mutually independent.

(v) For every \(p=1,2\), the paths of \(v^p\) are almost surely \(\varpi\)-Hölder continuous for some \(\varpi >0\).

Remark 3.2

(Assumptions on observation times) (a) Assumptions of type [A1](i) are sometimes called the strong predictability condition in the literature and can be found in, e.g., Hayashi and Yoshida (2011) and Koike (2014, 2016). In our situation, this type of condition is necessary to ensure that the “delayed” observation times \((t^p_i-\vartheta ^p_n(t^p_i))_+\) are nearly \((\mathcal {F}_t)\)-stopping times (see Lemma A.2). We emphasize that this requirement stems from the possible existence of lead–lag relationships: in our setting, the process \(X^p\) is essentially sampled at the times \(t^p_i-\vartheta ^p_n(t^p_i)\) rather than at the times \(t^p_i\). Even if the \(t^p_i\) themselves are stopping times, the \(t^p_i-\vartheta ^p_n(t^p_i)\) are not necessarily stopping times unless we impose some kind of predictability condition. Therefore, to develop an asymptotic theory without such a condition in our setting, we would presumably need to depart from the framework of Itô calculus and rely on anticipative calculus (e.g., Malliavin calculus), which would be mathematically challenging. In fact, Hoffmann et al. (2013) also impose an analogous condition for the same reason (see Assumption B2 of Hoffmann et al. (2013)). Indeed, their assumption is stronger than ours, because they consider lead–lag times that do not shrink as n tends to infinity. We also remark that our assumption allows, e.g., random sampling times independent of the \(\sigma\)-field \(\mathcal {F}_\infty\), because such times can be assumed to be \(\mathcal {F}_0\)-measurable without loss of generality; hence, the transaction times need not be known completely in advance (i.e., there may still be exogenous randomness).

(b) Assumptions of type [A1](ii)–(iv) are more or less standard in the literature and can be found in, for example, Barndorff-Nielsen et al. (2011) and Koike (2014, 2016). Here, we remark that the random set \(\mathcal {N}^n\) is introduced mainly to ensure the stability of Assumption [A1] under the localization procedure used in the proof; see the proof of Lemma 6.3 of Koike (2017b) for details. Of course, one can take \(\mathcal {N}^n\) to be the empty set, which corresponds to the standard situation in the literature. Another reason for introducing \(\mathcal {N}^n\) is that it excludes some trivial exceptions that arise when \(\mathcal {N}^n=\emptyset\). For example, if \(t^p_0=\log n/n\) and \(t^p_i=t^p_{i-1}+1/n\), \(i=1,2,\dots\), then [A1](v) is not satisfied with \(\mathcal {N}^n=\emptyset\).

Remark 3.3

(Assumptions on microstructure noise) An assumption of type [\(\hbox {N}_\lambda\)] is used in Jacod et al. (2019, 2017). It allows the noise process to have time-varying variance and serial autocorrelation, both of which are stylized facts of ultra-high-frequency financial data (see, e.g., Hansen and Lunde 2006). Assumption [\(\hbox {N}_\lambda\)](iv) excludes cross-correlations between \((\epsilon ^1_i)_{i=0}^\infty\) and \((\epsilon ^2_i)_{i=0}^\infty\); we impose it for the reason explained in Remark 3.1. We also remark that the existence of all moments of the noise, as required by [\(\hbox {N}_\lambda\)], is standard in the literature (see, e.g., Assumption 16.1.1 of Jacod and Protter (2012) and Assumption (N-v) of Jacod et al. (2019)) and is not a serious practical restriction, as noted in Remark 16.1.2 of Jacod and Protter (2012). It would be possible to weaken the assumption to the existence of finite moments up to a suitable order depending on other parameters such as \(\xi ,\lambda\) and so on.

Let us recall the notion of stable convergence. Given a sequence \((Z_n)\) of random variables taking values in a Polish space S and a sub-\(\sigma\)-field \(\mathcal {G}\) of \(\mathcal {F}\), we say that the variables \(Z_n\) converge \(\mathcal {G}\)-stably in law to an S-valued variable Z, which is defined on an extension of \((\Omega ,\mathcal {F},P)\), if \(E[Uf(Z_n)]\rightarrow E[Uf(Z)]\) as \(n\rightarrow \infty\) for any \(\mathcal {G}\)-measurable bounded variable U and any bounded continuous function f on S. Then, we write \(Z_n\rightarrow ^{\mathcal {G}-d_s}Z\). In this case, for any variables \(U_n\) converging in probability to a \(\mathcal {G}\)-measurable variable U, we have \((U_n,Z_n)\rightarrow ^{\mathcal {G}-d_s}(U,Z)\) as \(n\rightarrow \infty\) for the product topology on the space \(\mathbb {R}\times S\).

Now, we are ready to state our asymptotic result.

Theorem 3.1

Suppose that [A1]–[A2] and [\(\hbox {N}_\lambda\)] are satisfied for some \(\lambda \in (0,\frac{1}{2})\). Then, the bivariate processes \(h_n^{-\frac{1}{2}}\left( {\widehat{L}}^{n,1}-L^{n,1},{\widehat{L}}^{n,2}-L^{n,2}\right)\) converge \(\mathcal {F}_\infty\)-stably in law to \(({\widetilde{W}}^1_{\int _0^\cdot {\mathfrak {v}}^1_s\mathrm {d}s},{\widetilde{W}}^2_{\int _0^\cdot {\mathfrak {v}}^2_s\mathrm {d}s})\) as \(n\rightarrow \infty\) for the Skorokhod topology, where \({\widetilde{W}}^1\) and \({\widetilde{W}}^2\) are mutually independent standard Brownian motions independent of \(\mathcal {F}_\infty\), and:

$$\begin{aligned} {\mathfrak {v}}^{1}_s&=\left\{ {\mathfrak {S}}_s^{1,+}{\mathfrak {S}}_s^{2,+}-\left( \Sigma _s^{12}\right) ^2\right\} +\pi ^{-2}\left\{ {\mathfrak {S}}_s^{1,-}{\mathfrak {S}}_s^{2,-}-\left( \Sigma _s^{12}\right) ^2\right\} ,\\ {\mathfrak {v}}^{2}_s&=\frac{3}{2}\left\{ {\mathfrak {S}}_s^{1,+}{\mathfrak {S}}_s^{2,+}+\left( \Sigma _s^{12}\right) ^2\right\} +\pi ^{-2}\left\{ {\mathfrak {S}}_s^{1,-}{\mathfrak {S}}_s^{2,-}+\left( \Sigma _s^{12}\right) ^2\right\} \end{aligned}$$

with \({\mathfrak {S}}^{p,\pm }_s=\Sigma ^{pp}_s\pm \pi ^2c^{-2}v^p_s\Psi ^p_s\),

$$\begin{aligned} \Psi ^p_s&=\frac{G(2)^p_s+\left( G(1)^p_s\right) ^2}{2G(1)^p_s}\gamma _p(0) +\frac{G(2)^p_s+3\left( G(1)^p_s\right) ^2}{2G(1)^p_s}\gamma _p(1) +2G(1)^p_s\sum _{j=2}^\infty \gamma _p(j) \end{aligned}$$

and \(\gamma _p(j)=E\left[ u^{p}_0u^p_j\right]\) \((j\in \mathbb {Z}_+)\) for \(p=1,2\) and \(s\ge 0\).
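For reference, the spot variances \({\mathfrak {v}}^1_s\) and \({\mathfrak {v}}^2_s\) can be evaluated directly from the formulas above. The sketch below is a plain transcription at a fixed time s, assuming all spot quantities are supplied as numbers and truncating the series in \(\Psi ^p_s\) at the autocovariances supplied; function names are ours. Without noise (\(v^1_s=v^2_s=0\)) it reduces to \({\mathfrak {v}}^1_s=(1+\pi ^{-2})\{\Sigma ^{11}_s\Sigma ^{22}_s-(\Sigma ^{12}_s)^2\}\).

```python
import math


def psi(G1, G2, gamma):
    """Psi^p_s; gamma = [gamma_p(0), gamma_p(1), gamma_p(2), ...] (series truncated)."""
    g0 = gamma[0]
    g1 = gamma[1] if len(gamma) > 1 else 0.0
    tail = sum(gamma[2:])  # sum_{j >= 2} gamma_p(j), truncated
    return ((G2 + G1 ** 2) / (2 * G1) * g0
            + (G2 + 3 * G1 ** 2) / (2 * G1) * g1
            + 2 * G1 * tail)


def spot_variances(s11, s22, s12, v1, v2, psi1, psi2, c):
    """Return (frak v^1_s, frak v^2_s) of Theorem 3.1; c is the limit of sqrt(n) * h_n."""
    w = math.pi ** 2 / c ** 2
    sp1, sm1 = s11 + w * v1 * psi1, s11 - w * v1 * psi1  # frak S^{1,+}, frak S^{1,-}
    sp2, sm2 = s22 + w * v2 * psi2, s22 - w * v2 * psi2  # frak S^{2,+}, frak S^{2,-}
    q = s12 ** 2
    frak_v1 = (sp1 * sp2 - q) + (sm1 * sm2 - q) / math.pi ** 2
    frak_v2 = 1.5 * (sp1 * sp2 + q) + (sm1 * sm2 + q) / math.pi ** 2
    return frak_v1, frak_v2


# Equidistant sampling (G(1) = G(2) = 1) and i.i.d. noise: Psi^p_s = gamma_p(0).
print(psi(1.0, 1.0, [0.3]))
print(spot_variances(1.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, c=1.0))
```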

Remark 3.4

(Mixing condition on the noise process) It may be worth remarking that the mixing condition imposed in Theorem 3.1 is stronger than the condition \(\sum _{j=1}^\infty \alpha _p(j) < \infty\) usual in classical time-series analysis, but conditions of this kind are standard in the literature on volatility/covariance estimation from high-frequency data. For example, Jacod et al. (2019) require \(\alpha _p(j)=O(j^{-v})\) for some \(v >3\), which implies that \(\sum _{j=1}^\infty \alpha _p(j)^\lambda < \infty\) for some \(\lambda < 1/3\); Ikeda (2016) requires (at least) \(\alpha _p(j)=O(j^{-\varpi /(\varpi -4)-\delta })\) for some \(\varpi >4\) and \(\delta >0\) as well as \(\sum _{j=1}^\infty j^\varpi \gamma _p(j)<\infty\), where \(\gamma _p(j)\) is the auto-covariance function of \(u^p\), a requirement much stronger than \(\sum _{j=1}^\infty \alpha _p(j) < \infty\); Varneskov (2016) requires \(\sum _{j=1}^\infty j\alpha _p(j)<\infty\), which implies that \(\sum _{j=1}^\infty \alpha _p(j)^\lambda < \infty\) for any \(\lambda > 1/2\).

Theorem 3.1 has some important consequences. First, since \(L^{n,1}_1\rightarrow 0\) a.s., \(L^{n,2}_1\rightarrow \int _0^1\Sigma _s^{12}\mathrm {d}s\) a.s., and:

$$\begin{aligned} {\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1-L^{n,1}_1/L^{n,2}_1 =\frac{L^{n,2}_1({\widehat{L}}^{n,1}_1-L^{n,1}_1)-L^{n,1}_1({\widehat{L}}^{n,2}_1-L^{n,2}_1)}{{\widehat{L}}^{n,2}_1L^{n,2}_1}, \end{aligned}$$

the property of stable convergence and the continuous mapping theorem imply that:

$$\begin{aligned} h_n^{-1/2}\left( {\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1-L^{n,1}_1/L^{n,2}_1\right) \rightarrow ^{d_s}\sqrt{\mathcal {V}}\cdot \zeta \end{aligned}$$

as \(n\rightarrow \infty\), where \(\zeta\) is a standard normal variable independent of \(\mathcal {F}_\infty\) and \(\mathcal {V}:=\int _0^1{\mathfrak {v}}^1_s\mathrm {d}s/(\int _0^1\Sigma _s^{12}\mathrm {d}s)^2\) (recall \(\int _0^1\Sigma _s^{12}ds\ne 0\) a.s.). Next, noting \(L^{n,1}_1/L^{n,2}_1\rightarrow 0\) a.s., we obtain:

$$\begin{aligned} h_n^{-3/2}\left( \widehat{\mathbf {SLL}}_n-\mathbf {SLL}_n\right) \rightarrow ^{d_s}\pi ^{-1}\sqrt{\mathcal {V}}\cdot \zeta \end{aligned}$$

as \(n\rightarrow \infty\) by an argument similar to the proof of the delta method, where:

$$\begin{aligned} \widehat{\mathbf {SLL}}_n:=(h_n/\pi )\arctan ({\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1). \end{aligned}\tag{3.2}$$

In particular, it holds that:

$$\begin{aligned} \widehat{\mathbf {SLL}}_n=\mathbf {SLL}_n+O_p(h_n^{3/2}) \end{aligned}$$

as \(n\rightarrow \infty\). Since \(\sqrt{n}h_n\rightarrow c\), we have \(h_n^{3/2}=O(n^{-3/4})\); therefore, the estimators \(\widehat{\mathbf {SLL}}_n\) satisfy property (2.2) as long as \(\alpha <3/4\).
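The delta-method step leading to the limit of \(h_n^{-3/2}(\widehat{\mathbf {SLL}}_n-\mathbf {SLL}_n)\) can be spelled out as follows. Writing \(r_n=L^{n,1}_1/L^{n,2}_1\) and \({\widehat{r}}_n={\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1\) (shorthand used only here), the mean value theorem applied to \(\arctan\) gives, for some \({\bar{r}}_n\) between \(r_n\) and \({\widehat{r}}_n\),

$$\begin{aligned} h_n^{-3/2}\left( \widehat{\mathbf {SLL}}_n-\mathbf {SLL}_n\right) =\frac{h_n^{-1/2}}{\pi }\left\{ \arctan ({\widehat{r}}_n)-\arctan (r_n)\right\} =\frac{1}{\pi (1+{\bar{r}}_n^2)}\,h_n^{-1/2}\left( {\widehat{r}}_n-r_n\right) . \end{aligned}$$

Since \(r_n\rightarrow 0\) a.s. and \({\widehat{r}}_n-r_n=O_p(h_n^{1/2})\), we have \({\bar{r}}_n\rightarrow ^p0\), so the stable limit of the left side is \(\pi ^{-1}\sqrt{\mathcal {V}}\cdot \zeta\).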

Remark 3.5

(Necessity of the condition \(\alpha <3/4\)) It is worth mentioning that there is no sequence of estimators having property (2.2) if \(\alpha \ge 3/4\) in the following sense. Let us suppose that the observation data \((Y^1_i,Y^2_i)_{i=1}^n\) are generated by the following simpler model:

$$\begin{aligned} \left\{ \begin{array}{lll} Y^1_i=B^1_{i/n}+\epsilon ^1_i,&{}Y^2_i=B^2_{i/n-\vartheta }+\epsilon ^2_i&{}\text {if }\vartheta \ge 0,\\ Y^1_i=B^1_{i/n+\vartheta }+\epsilon ^1_i,&{}Y^2_i=B^2_{i/n}+\epsilon ^2_i&{}\text {if }\vartheta <0 \end{array}\right. \qquad \text {for }i=1,\dots ,n, \end{aligned}$$

where \((B^1_t,B^2_t)\) \((t\in \mathbb {R})\) is a bivariate two-sided Brownian motion such that \(B_0=0\), \(E[(B^1_1)^2]=E[(B^2_1)^2]=1\) and \(E[B^1_1B^2_1]=\rho\) for some \(\rho \in (-1,0)\cup (0,1)\), \((\epsilon ^1_i,\epsilon ^2_i)\), \(i=1,\dots ,n\), are i.i.d. Gaussian variables with mean 0 and variance \(v>0\), and \(\vartheta \in \mathbb {R}\) is the lead–lag parameter. Note that this model corresponds to the simplified version of our model (2.1) with \(b_s\equiv 0\),

$$\begin{aligned} \Sigma _s\equiv \left( \begin{array}{cc} 1 &{} \rho \\ \rho &{} 1 \end{array}\right) , \end{aligned}$$

\(t^p_i=i/n\), \(\vartheta _n(t)\equiv \vartheta\) and \(\epsilon ^p_i\overset{i.i.d.}{\sim }{\mathbf {N}}(0,v)\). When the time-lag process \(\vartheta _n(t)\) does not depend on t, we may set \(\vartheta ^1_n\equiv 0\) or \(\vartheta ^2_n\equiv 0\) according to the sign of \(\vartheta _n(t)\); we have adopted this specification in the above model.
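Data from this simplified model are straightforward to simulate, which is convenient for small experiments with the estimators. The sketch below assumes, for exact sampling, that the lag lies on the grid, \(\vartheta =d/n\) for an integer d; following (2.1) with \(\vartheta =\vartheta ^2-\vartheta ^1\), the lagging asset (asset 2 if \(d\ge 0\), asset 1 otherwise) is read off the path \(|d|\) grid steps earlier. The function name and the use of Python's `random` module are our choices.

```python
import math
import random


def simulate_toy_model(n, d, rho, v, seed=None):
    """Return lists (Y^1_1..Y^1_n, Y^2_1..Y^2_n) for lag theta = d / n (d an integer)."""
    rng = random.Random(seed)
    D = abs(d)
    # Correlated Brownian paths on the grid j/n, j = -D, ..., n (list index j + D).
    b1, b2 = [0.0], [0.0]
    for _ in range(n + D):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1.append(b1[-1] + z1 / math.sqrt(n))
        b2.append(b2[-1] + (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2) / math.sqrt(n))
    # Re-centre so that B_0 = 0; this yields a two-sided Brownian motion on the grid.
    b1 = [x - b1[D] for x in b1]
    b2 = [x - b2[D] for x in b2]
    lag1 = D if d < 0 else 0   # asset 1 lags when theta < 0
    lag2 = D if d >= 0 else 0  # asset 2 lags when theta >= 0
    sd = math.sqrt(v)
    y1 = [b1[D + i - lag1] + rng.gauss(0.0, sd) for i in range(1, n + 1)]
    y2 = [b2[D + i - lag2] + rng.gauss(0.0, sd) for i in range(1, n + 1)]
    return y1, y2


y1, y2 = simulate_toy_model(n=1000, d=5, rho=0.5, v=1e-4, seed=42)
```

With \(\rho =1\) and \(v=0\) (outside the range allowed in the remark, but a handy check), the lagging series is an exact shifted copy of the leading one.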

Now, we denote by \(P_{n,\vartheta }\) the law of \((Y^1_1,\dots ,Y^1_n,Y^2_1,\dots ,Y^2_n)\). Then, from Corollary 2.1 of Koike (2017a), \((P_{n,\vartheta })_{\vartheta \in \mathbb {R}}\) has the LAN property at \(\vartheta =0\) with rate \(n^{-3/4}\) and asymptotic Fisher information \(\rho ^2v^{3/2}/\{2(\sqrt{1+\rho }+\sqrt{1-\rho })\}\). In particular, by the local asymptotic minimax theorem (see, e.g., Theorem 1 from Ch. 6 of Le Cam and Yang (2000)), we obtain:

$$\begin{aligned} \lim _{c\rightarrow \infty }\liminf _{n\rightarrow \infty }\sup _{|\vartheta |\le cn^{-3/4}}P_{n,\vartheta }\left( n^{\frac{3}{4}}|{\widehat{\vartheta }}_n-\vartheta |>\eta \right) >0 \end{aligned}$$

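for any \(\eta >0\) and any sequence \({\widehat{\vartheta }}_n\) of estimators. Since \(n^{3/4}\le n^\alpha\) for \(\alpha \ge 3/4\), no sequence of estimators can achieve \(n^\alpha ({\widehat{\vartheta }}_n-\vartheta )\rightarrow ^p0\) uniformly over \(|\vartheta |\le cn^{-3/4}\); as \(\mathbf {SLL}_n=\vartheta\) for a constant lag, this is the sense in which property (2.2) cannot be attained.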

Another application of Theorem 3.1 is the construction of confidence intervals for \(\mathbf {SLL}_n\). For this purpose, we need estimators for the asymptotic variances of \({\widehat{L}}^{n,1}_1\) and \({\widehat{L}}^{n,2}_1\), as well, and we address this issue in the next subsection.

3.3 Asymptotic variance estimation

Since the analytic expressions for the asymptotic variances of \({\widehat{L}}^{n,1}_t\) and \({\widehat{L}}^{n,2}_t\) are complex, we estimate them by subsampling.Footnote 3 Because we consider infill asymptotics, traditional subsampling methods (e.g., Politis and Romano (1994)) must be modified, as they are incorrectly centered in this setting. Several subsampling methods for high-frequency data have recently been proposed; cf. Kalnina (2011), Sect. 4 of Christensen et al. (2013), Mykland and Zhang (2017), Christensen et al. (2017), and Sect. 3.1 of Ikeda (2016). In this paper, we adopt an “overlapping version” of the method proposed in Sect. 4 of Christensen et al. (2013).

Set \(\ell ^n_k=(\ell ^{n,1}_k,\ell ^{n,2}_k)^\top\) for \(k=1,\dots ,h_n^{-1}-1\), and take a positive integer \(K_n\) as the number of blocks in each subsample. Then, we define:

$$\begin{aligned} {\widehat{L}}^{n}(\beta )=\sum _{k=\beta }^{\beta +K_n-1}\ell ^{n}_{k},\qquad (\beta =1,\dots ,h_n^{-1}-K_n). \end{aligned}$$

We may expect that adjacent subsampled estimators \({\widehat{L}}^{n}(\beta +K_n)\) and \({\widehat{L}}^{n}(\beta )\) would have conditional expectations close to each other. Motivated by this, we introduce the following estimator for the asymptotic covariance matrix of \({\widehat{L}}^n_t\), \(t\in [0,1]\):

$$\begin{aligned} {\widehat{V}}^{n}_t=\frac{h_n^{-1}}{2K_n}\sum _{\beta =1}^{\lfloor th_n^{-1}\rfloor -2K_n}\left\{ {\widehat{L}}^{n}(\beta +K_n)-{\widehat{L}}^{n}(\beta )\right\} \left\{ {\widehat{L}}^{n}(\beta +K_n)-{\widehat{L}}^{n}(\beta )\right\} ^\top . \end{aligned}$$

The validity of this subsampling method is ensured by the following theorem (see Footnote 4):

Theorem 3.2

Suppose that [A1]–[A2] and [\(\hbox {N}_\lambda\)] are satisfied for some \(\lambda \in (0,\frac{1}{4})\). Suppose also that the paths of \(\Sigma ^{12}\) are almost surely \(\gamma\)-Hölder continuous for some \(\gamma \in (0,1]\). Then, we have: \(\sup _{0\le t\le 1}|{\widehat{V}}^{n,fg}_t-1_{\{f=g\}}\int _0^t{\mathfrak {v}}^f_s\mathrm {d}s|\rightarrow ^p0\) as \(n\rightarrow \infty\) for every \(f,g=1,2\), provided that \(K_n\rightarrow \infty\) and \(K_n=O(n^z)\) for some \(z<\gamma /\{2(\gamma +1)\}\).

Now, under the assumptions of Theorems 3.1 and 3.2, we can construct confidence intervals of \(\mathbf {SLL}_n\) as follows. Since we have:

$$\begin{aligned} (h_n\mathcal {V}_n)^{-\frac{1}{2}}\left( {\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1-L^{n,1}_1/L^{n,2}_1\right) \rightarrow ^d{\mathbf {N}}(0,1) \end{aligned}\tag{3.3}$$

as \(n\rightarrow \infty\) (see Footnote 5), where

$$\begin{aligned} \mathcal {V}_n=\frac{({\widehat{L}}^{n,1}_1)^2{\widehat{V}}^{n,22}_1+({\widehat{L}}^{n,2}_1)^2{\widehat{V}}^{n,11}_1-2{\widehat{L}}^{n,1}_1{\widehat{L}}^{n,2}_1{\widehat{V}}^{n,12}_1}{({\widehat{L}}^{n,2}_1)^4}, \end{aligned}$$

a \(100(1-\alpha )\)% confidence interval of \(\mathbf {SLL}_n\) for \(\alpha \in (0,1)\) is given by:

$$\begin{aligned} \left[ \frac{h_n}{\pi }\arctan \left( \frac{{\widehat{L}}^{n,1}_1}{{\widehat{L}}^{n,2}_1}-\sqrt{h_n\mathcal {V}_n}z_{\alpha /2}\right) ,\frac{h_n}{\pi }\arctan \left( \frac{{\widehat{L}}^{n,1}_1}{{\widehat{L}}^{n,2}_1}+\sqrt{h_n\mathcal {V}_n}z_{\alpha /2}\right) \right] \end{aligned}$$

with \(z_{\alpha /2}\) being the upper \(\alpha /2\)-quantile of the standard normal distribution.
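The interval combines the delta-method variance \(\mathcal {V}_n\) with the monotone map \(r\mapsto (h_n/\pi )\arctan r\). Below is a minimal sketch of that computation in plain Python. The function name and the `var_scale` argument are hypothetical conveniences, and the inputs \({\widehat{L}}^{n,f}_1\) and \({\widehat{V}}^{n}_1\) are assumed to have been computed already.

```python
import math
from statistics import NormalDist


def _to_time(ratio: float, h: float) -> float:
    """Map a ratio L1/L2 back to the time scale via (h/pi) * arctan(ratio)."""
    return (h / math.pi) * math.atan(ratio)


def lead_lag_confidence_interval(L1, L2, V, h, alpha=0.05, var_scale=None):
    """Point estimate and 100(1 - alpha)% interval via the delta method.

    L1, L2: hat L^{n,1}_1 and hat L^{n,2}_1; V: 2x2 estimate hat V^n_1;
    h: h_n; var_scale: factor multiplying calV inside the square root (default h_n).
    """
    if var_scale is None:
        var_scale = h
    V11, V22, V12 = V[0][0], V[1][1], V[0][1]
    # Delta-method variance of the ratio L1 / L2, i.e. calV_n
    cal_v = (L1**2 * V22 + L2**2 * V11 - 2.0 * L1 * L2 * V12) / L2**4
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    ratio = L1 / L2
    half_width = math.sqrt(var_scale * cal_v) * z
    point = _to_time(ratio, h)
    return point, (_to_time(ratio - half_width, h), _to_time(ratio + half_width, h))
```

Multiplying the endpoints by \(n\) expresses them in units of \(\Delta _n\). The pointwise intervals of Sect. 5 follow the same pattern, with the kernel quantities in place of \({\widehat{L}}^{n}_1\) and \({\widehat{V}}^{n}_1\) and with `var_scale` set to \(h_nH_n^{-1}\).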

Remark 3.6

(Relation to Mykland and Zhang (2017)) The diagonal entries of \({\widehat{V}}^{n}_t\) are related to the rolling quadratic variations of integrated processes introduced in (Mykland and Zhang 2017, Definition 2). In fact, for \(f=1,2\), \(2h_n{\widehat{V}}^{n,ff}_1\) is almost identical to the quantity \(QV_{B,K}({\hat{\Theta }})\) of Mykland and Zhang (2017) if we identify \(\ell ^{n,f}_k\), \(K_n\), and \(h_n^{-1}\) with \({\hat{\Theta }}_{((k-\frac{1}{2})h_n,(k+\frac{1}{2})h_n]}\), K, and B, respectively. Mykland and Zhang (2017) have shown that, even after appropriate rescaling, \(QV_{B,K}({\hat{\Theta }})\) is generally a biased estimator of the asymptotic variance of \({\hat{\Theta }}_{(0,1]}\); see Sect. 3.2 ibidem. However, they have also shown that this is not the case when the edge effects therein are “tiny”; see Sect. 3.3 ibidem for details. The proofs of Propositions A.1 and A.3 in the Appendix suggest that our estimator would satisfy this tiny-edge-effects condition. Indeed, \(n^{-\alpha }\) and \(K_n\Delta T_n\) in Mykland and Zhang (2017) correspond to \(\sqrt{h_n}\) and \(K_nh_n\), respectively, while the average of the squared edge effects in \({\widehat{L}}^{n,f}_1\) given by Eq. (21) of Mykland and Zhang (2017) would be of order \(O_p(h_n^{1+a})\) for any \(a\in (0,1)\). Under the assumptions of Theorem 3.2, \(K_nh_n\) has the same order as \(h_n^b\) for some \(b\in (\frac{1}{2},1)\). Altogether, Remark 9 of Mykland and Zhang (2017) would imply that \({\widehat{V}}^{n,ff}_1\) is a consistent estimator of the asymptotic variance of \({\widehat{L}}^{n,f}_1\).

4 Testing the absence of the lead–lag relationship

In this section, we present another application of Theorem 3.1: deciding whether the spot lead–lag time \(\vartheta _n(t)\) is identically zero on the interval [0, 1]. Since \(\vartheta _n(t)\) is stochastic, we formulate this problem as follows (cf. Aït-Sahalia and Jacod (2009)). We decompose the sample space \(\Omega\) into two disjoint subsets:

$$\begin{aligned} \Omega ^0=\{\omega \in \Omega :\vartheta _n(t)=0\text { for all }t\in [0,1]\},\qquad \Omega ^1=\{\omega \in \Omega :\vartheta _n(t)\ne 0\text { for some }t\in [0,1]\}, \end{aligned}$$

and we decide whether the realized outcome \(\omega \in \Omega\) belongs to \(\Omega ^0\) or to \(\Omega ^1\). In the following, we set “\(\omega \in \Omega ^0\)” as the null hypothesis and “\(\omega \in \Omega ^1\)” as the alternative.

To construct the test statistic, we borrow ideas from Dette and Podolskij (2008) and Vetter and Dette (2012). Note that \(\omega \in \Omega ^0\) is equivalent to \(L^{n,1}(\omega )_t=0\) for all \(t\in [0,1]\), provided that \(\Sigma ^{12}(\omega )_t\ne 0\) for all \(t\in [0,1]\). This suggests the following Kolmogorov–Smirnov type test statistic:

$$\begin{aligned} T_n^{\text {KS}}=(h_n{\widehat{V}}^{n,11}_1)^{-\frac{1}{2}}\sup _{t\in [0,1]}\left| {\widehat{L}}^{n,1}_t\right| . \end{aligned}$$

We can also consider the Kuiper type test statistic as follows:

$$\begin{aligned} T_n^{\text {Kuiper}}=(h_n{\widehat{V}}^{n,11}_1)^{-\frac{1}{2}}\left( \sup _{t\in [0,1]}{\widehat{L}}^{n,1}_t-\inf _{t\in [0,1]}{\widehat{L}}^{n,1}_t\right) . \end{aligned}$$
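Both statistics can be computed directly from the partial sums of \(\ell^{n,1}_k\). The sketch below assumes that \({\widehat{L}}^{n,1}_t\) is the step function of these partial sums and equals 0 at \(t=0\). Under that assumption, the supremum and infimum over \(t\in [0,1]\) reduce to a maximum and minimum over the grid. The function name is hypothetical.

```python
import numpy as np


def lead_lag_test_statistics(ell1, V11, h):
    """KS- and Kuiper-type statistics T_n^KS and T_n^Kuiper.

    ell1: first components ell^{n,1}_k, k = 1, ..., h_n^{-1} - 1;
    V11: hat V^{n,11}_1; h: h_n.
    Assumes hat L^{n,1}_t is the step function of partial sums, starting at 0.
    """
    path = np.concatenate(([0.0], np.cumsum(ell1)))  # hat L^{n,1} on the grid
    scale = np.sqrt(h * V11)                         # (h_n hat V^{n,11}_1)^{1/2}
    t_ks = np.max(np.abs(path)) / scale
    t_kuiper = (np.max(path) - np.min(path)) / scale
    return float(t_ks), float(t_kuiper)
```

Because the path starts at zero, its maximum is non-negative and its minimum is non-positive. The Kuiper statistic is therefore never smaller than the KS statistic.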

Then, an application of Theorem 3.1 and the continuous mapping theorem yields the following convergence:

$$\begin{aligned} 1_{\Omega ^0}T_n^{\text {KS}}\rightarrow ^{\mathcal {F}_\infty -d_s}1_{\Omega ^0}\sup _{t\in [0,1]}|B_t|,\qquad 1_{\Omega ^0}T_n^{\text {Kuiper}}\rightarrow ^{\mathcal {F}_\infty -d_s}1_{\Omega ^0}\left( \sup _{t\in [0,1]}B_t-\inf _{t\in [0,1]}B_t\right) \end{aligned}$$

as \(n\rightarrow \infty\), where B is a standard Brownian motion independent of \(\mathcal {F}_\infty\). Now, let \({\bar{F}}_{\text {KS}}\) and \({\bar{F}}_{\text {Kuiper}}\) be the survival functions of the variables \(\sup _{t\in [0,1]}|B_t|\) and \(\sup _{t\in [0,1]}B_t-\inf _{t\in [0,1]}B_t\), respectively. They can be evaluated analytically using formulae 3.1.1.4 and 1.1.15.4(1) of Borodin and Salminen (2002). The p values of the test statistics \(T^\text {KS}_n\) and \(T^\text {Kuiper}_n\) are then given by \({\bar{F}}_{\text {KS}}(T^\text {KS}_n)\) and \({\bar{F}}_{\text {Kuiper}}(T^\text {Kuiper}_n)\), respectively. More formally, given a significance level \(\alpha \in (0,1)\), we have:

$$\begin{aligned} P({\bar{F}}_{\text {KS}}(T^\text {KS}_n)<\alpha |\Omega ^0)\rightarrow \alpha ,\qquad P({\bar{F}}_{\text {Kuiper}}(T^\text {Kuiper}_n)<\alpha |\Omega ^0)\rightarrow \alpha \end{aligned}$$

as \(n\rightarrow \infty\), provided that \(P(\Omega ^0)>0\). Moreover, by the construction of the test statistics, we have:

$$\begin{aligned} P({\bar{F}}_{\text {KS}}(T^\text {KS}_n)<\alpha |\Omega ^1)\rightarrow 1,\qquad P({\bar{F}}_{\text {Kuiper}}(T^\text {Kuiper}_n)<\alpha |\Omega ^1)\rightarrow 1 \end{aligned}$$

as \(n\rightarrow \infty\), provided that \(P(\Omega ^1)>0\) and \(\Sigma ^{12}_t\ne 0\) for all \(t\in [0,1]\). Consequently, the test based on \(T^\text {KS}_n\) (resp. \(T^\text {Kuiper}_n\)) rejects the null hypothesis if and only if \({\bar{F}}_{\text {KS}}(T^\text {KS}_n)<\alpha\) (resp. \({\bar{F}}_{\text {Kuiper}}(T^\text {Kuiper}_n)<\alpha\)). This yields a test with asymptotic level \(\alpha\) and asymptotic power 1.
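The p values require \({\bar{F}}_{\text {KS}}\) and \({\bar{F}}_{\text {Kuiper}}\). The sketch below does not transcribe the Borodin–Salminen formulae. It uses two classical series instead:

- the reflection-principle series for the maximum of \(|B|\): \({\bar{F}}_{\text {KS}}(x)=4\sum _{j\ge 0}(-1)^j\{1-\Phi ((2j+1)x)\}\);
- Feller's series for the range of Brownian motion on [0, 1]: \({\bar{F}}_{\text {Kuiper}}(x)=8\sum _{k\ge 1}(-1)^{k-1}k\{1-\Phi (kx)\}\).

Both expressions should be cross-checked against the cited formulae before use. The helper names are hypothetical. The bisection recovers the 5% critical values of roughly 2.24 and 2.50 quoted in Sect. 6.

```python
import math


def _norm_sf(x: float) -> float:
    """Standard normal survival function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sf_ks(x: float) -> float:
    """P(sup_{0<=t<=1} |B_t| > x) = 4 * sum_{j>=0} (-1)^j (1 - Phi((2j+1)x))."""
    if x <= 0.0:
        return 1.0
    total, j = 0.0, 0
    while True:
        term = _norm_sf((2 * j + 1) * x)
        if term < 1e-18:
            break
        total += term if j % 2 == 0 else -term
        j += 1
    return min(1.0, max(0.0, 4.0 * total))


def sf_range(x: float) -> float:
    """P(sup B - inf B > x on [0, 1]) = 8 * sum_{k>=1} (-1)^(k-1) k (1 - Phi(kx))."""
    if x <= 0.0:
        return 1.0
    total, k = 0.0, 1
    while True:
        term = k * _norm_sf(k * x)
        if term < 1e-18:
            break
        total += term if k % 2 == 1 else -term
        k += 1
    return min(1.0, max(0.0, 8.0 * total))


def critical_value(sf, alpha: float, lo: float = 0.05, hi: float = 10.0) -> float:
    """Solve sf(x) = alpha by bisection; sf is decreasing in x."""
    if not sf(hi) < alpha < sf(lo):
        raise ValueError("alpha is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Given statistics `t_ks` and `t_kuiper`, the p values are `sf_ks(t_ks)` and `sf_range(t_kuiper)`. The null is rejected at level \(\alpha\) when the p value falls below \(\alpha\).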

5 Estimation of the spot lead–lag time

Now, we focus on the direct estimation of the spot lead–lag time \(\vartheta _n(t)\), \(t\in [0,1]\). Note that \(\vartheta _n(t)\) can be rewritten as \(\vartheta _n(t)=(h_n/\pi )\arctan \left[ \sin \left( \frac{\pi }{h_n}\vartheta _n(t)\right) /\cos \left( \frac{\pi }{h_n}\vartheta _n(t)\right) \right]\). In Sect. 3, we constructed estimators of the integrated quantities of \(\sin \left( \frac{\pi }{h_n}\vartheta _n(t)\right)\) and \(\cos \left( \frac{\pi }{h_n}\vartheta _n(t)\right)\). Estimators of their spot values can therefore be constructed by a kernel approach, as in the literature on spot volatility estimation (cf. Kristensen (2010), Kanaya and Kristensen (2016)). Specifically, for a function \(\mathcal {K}:\mathbb {R}\rightarrow \mathbb {R}\) and \(t\in (0,1)\), we set:

$$\begin{aligned} {\widehat{L}}(\mathcal {K})^n_t=\sum _{k=1}^{h_n^{-1}-1}\mathcal {K}_{H_n}\left( kh_n-t\right) \ell ^n_k, \end{aligned}$$

where \(\mathcal {K}_{H_n}(s)=\mathcal {K}(s/H_n)/H_n\) for every \(s\in \mathbb {R}\) and \(H_n>0\) is a bandwidth parameter.

Theorem 5.1

Let \(\mathcal {K}:\mathbb {R}\rightarrow \mathbb {R}\) be a piecewise Lipschitz continuous function supported on the interval \([-1,1]\), such that \(\int _{-\infty }^\infty \mathcal {K}(s)\mathrm {d}s=1\). Let \(H_n\) be a sequence of positive numbers, such that \(H_n\rightarrow 0\), \(H_n^{-1}h_n^{\nu }\rightarrow 0\) for some \(\nu \in (0,1)\) and \((h_n^{-1}H_n)^{3/2}n^{-\alpha }\rightarrow 0\) as \(n\rightarrow \infty\). Suppose that \(t\in (0,1)\) satisfies \(\Sigma _t=\Sigma _{t-}\) a.s. Then, under the assumptions of Theorem 3.1, the variables

$$\begin{aligned} \sqrt{h_n^{-1}H_n}\left( {\widehat{L}}(\mathcal {K})^n_t-\left( \begin{array}{c} \sin (\pi h_n^{-1}\vartheta _n(t))\\ \cos (\pi h_n^{-1}\vartheta _n(t)) \end{array}\right) \int _0^1\mathcal {K}_{H_n}(s-t)\Sigma _s^{12}\mathrm {d}s\right) \end{aligned}$$

converge \(\mathcal {F}_\infty\)-stably in law to the variable \((\sqrt{{\mathfrak {v}}^1_t\int _{-\infty }^\infty \mathcal {K}(s)^2\mathrm {d}s}\cdot w_1,\sqrt{{\mathfrak {v}}^2_t\int _{-\infty }^\infty \mathcal {K}(s)^2\mathrm {d}s}\cdot w_2)^\top\) as \(n\rightarrow \infty\), where \(w_1\) and \(w_2\) are mutually independent standard normal variables independent of \(\mathcal {F}_\infty\).

Now, the estimator for the spot lead–lag time is constructed as:

$$\begin{aligned} {\widehat{\vartheta }}_n(t)=(h_n/\pi )\arctan ({\widehat{L}}(\mathcal {K})^{n,1}_t/{\widehat{L}}(\mathcal {K})^{n,2}_t),\qquad t\in (0,1). \end{aligned}\tag{5.1}$$

Under the assumptions of Theorem 5.1, we have:

$$\begin{aligned} {\widehat{\vartheta }}_n(t)=\vartheta _n(t)+O_p(h_n^{3/2}H_n^{-1/2}) \end{aligned}$$

as \(n\rightarrow \infty\), provided that \(\Sigma ^{12}_t\ne 0\). In particular, \({\widehat{\vartheta }}_n(t)\) is a meaningful estimator for \(\vartheta _n(t)\) as long as \(n^\alpha h_n^{3/2}H_n^{-1/2}\rightarrow 0\) as \(n\rightarrow \infty\). This is possible if \(\alpha <\frac{3}{4}\) while taking \(H_n\propto h_n^\varpi\) for some \(\varpi \in (1-\frac{4}{3}\alpha ,3-4\alpha )\).
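The admissible range for \(\varpi\) can be traced by a short exponent calculation. This check assumes \(h_n\asymp n^{-1/2}\), a scaling not restated in this section. Under that scaling the error \(h_n\cdot O_p(\sqrt{h_n})=O_p(h_n^{3/2})\) of \(\widehat{\mathbf {SLL}}_n\) matches the rate \(n^{-3/4}\). Treat the calculation as an illustrative reading of the condition rather than part of its formal statement:

$$\begin{aligned} n^{\alpha }h_n^{3/2}H_n^{-1/2}&\asymp n^{\alpha -(3-\varpi )/4}\rightarrow 0 &&\Longleftrightarrow \quad \varpi <3-4\alpha ,\\ (h_n^{-1}H_n)^{3/2}n^{-\alpha }&\asymp n^{-3(\varpi -1)/4-\alpha }\rightarrow 0 &&\Longleftrightarrow \quad \varpi >1-\tfrac{4}{3}\alpha ,\\ 1-\tfrac{4}{3}\alpha &<3-4\alpha &&\Longleftrightarrow \quad \alpha <\tfrac{3}{4}. \end{aligned}$$

The remaining requirements of Theorem 5.1, namely \(H_n\rightarrow 0\) and \(H_n^{-1}h_n^{\nu }\rightarrow 0\) for some \(\nu \in (0,1)\), additionally restrict \(\varpi\) to (0, 1) under the same scaling.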

Remark 5.1

(Vanishing spot covariation case) It would be useful to investigate what happens if \(\Sigma ^{12}_t=0\). For this purpose, we additionally suppose that paths of \(\Sigma ^{12}\) are almost surely \(\gamma\)-Hölder continuous for some \(\gamma \in (0,1]\). Then, we have:

$$\begin{aligned} \int _0^1\mathcal {K}_{H_n}(s-t)\Sigma _s^{12}\mathrm {d}s=\Sigma _t^{12}+o_p\left(\sqrt{h_nH_n^{-1}}\right) \end{aligned}$$

as \(n\rightarrow \infty\), provided that \(H_n=o(h_n^{1/(2\gamma +1)})\). Therefore, Theorem 5.1, the continuous mapping theorem, and Eq. (5) in Marsaglia (1965) imply that the variables:

$$\begin{aligned} 1_{\{\Sigma ^{12}_t=0\}}\sqrt{h_n^{-1}H_n}\tan ((\pi /h_n){\widehat{\vartheta }}_n(t)) \end{aligned}$$

converge \(\mathcal {F}_\infty\)-stably in law to the variable \(\sqrt{{\mathfrak {v}}^1_t/{\mathfrak {v}}^2_t}\cdot {\mathfrak {c}}\) as \(n\rightarrow \infty\), where \({\mathfrak {c}}\) is a standard Cauchy variable independent of \(\mathcal {F}_\infty\). In particular, we have \(1_{\{\Sigma ^{12}_t=0\}}{\widehat{\vartheta }}_n(t)=O_p(h_n^{3/2}H_n^{-1/2})\) as \(n\rightarrow \infty\). From the above discussion, we should choose \(h_n\), such that \(h_n^{3/2}H_n^{-1/2}=o(n^{-\alpha })\) to make \({\widehat{\vartheta }}_n(t)\) a meaningful estimator for \(\vartheta _n(t)\), and in this case, we have \(1_{\{\Sigma ^{12}_t=0\}}{\widehat{\vartheta }}_n(t)=o_p(n^{-\alpha })\) as \(n\rightarrow \infty\). Consequently, if \(\Sigma ^{12}_t=0\), \({\widehat{\vartheta }}_n(t)\) will behave as if there is no significant time-lag between two assets.
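The Cauchy limit uses the fact (Marsaglia 1965) that the ratio of two independent centered normal variables with standard deviations \(s_1\) and \(s_2\) equals \(s_1/s_2\) times a standard Cauchy variable. With \(s_f=\sqrt{{\mathfrak {v}}^f_t\int \mathcal {K}(s)^2\mathrm {d}s}\) from Theorem 5.1, the common factor \(\int \mathcal {K}(s)^2\mathrm {d}s\) cancels and leaves \(\sqrt{{\mathfrak {v}}^1_t/{\mathfrak {v}}^2_t}\). A quick Monte Carlo check of this fact follows. The median of \(|{\mathfrak {c}}|\) for a standard Cauchy variable equals 1, so the sample median of the absolute ratio should be close to \(s_1/s_2\). The function name is hypothetical.

```python
import numpy as np


def median_abs_ratio(s1, s2, n_draws=200_000, seed=0):
    """Median of |(s1 w1) / (s2 w2)| for independent standard normals w1, w2.

    Since (s1 w1) / (s2 w2) = (s1 / s2) * c with c standard Cauchy, and the
    median of |c| is 1, the result should be close to s1 / s2.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal(n_draws)
    w2 = rng.standard_normal(n_draws)
    return float(np.median(np.abs((s1 * w1) / (s2 * w2))))
```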

To estimate the asymptotic variances of the estimators, we use a kernel counterpart of the subsampling estimator developed in Sect. 3.3. Namely, we define:

$$\begin{aligned} {\widehat{V}}(\mathcal {K})^{n}_t= & {} \frac{h_n^{-1}}{2K_n} \sum _{\beta =1}^{h_n^{-1}-2K_n}\mathcal {K}_{H_n}((\beta +K_n) h_n-t)\left\{ {\widehat{L}}^{n}(\beta +K_n) -{\widehat{L}}^{n}(\beta )\right\} \\&\left\{ {\widehat{L}}^{n}(\beta +K_n) -{\widehat{L}}^{n}(\beta )\right\} ^\top . \end{aligned}$$
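The kernel-weighted estimator differs from \({\widehat{V}}^{n}_t\) only through the weights \(\mathcal {K}_{H_n}((\beta +K_n)h_n-t)\) attached to each outer product. A self-contained NumPy sketch follows. It uses the same hypothetical array layout for \(\ell^n_k\) as in Sect. 3.3 and the Epanechnikov kernel as an example. The function name is hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel 3/4 (1 - u^2) on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_subsampling_avar(ell, K, H, t, kernel=epanechnikov):
    """Kernel-weighted subsampling estimator hat V(K)^n_t at a single t in (0, 1)."""
    h_inv = ell.shape[0] + 1
    h = 1.0 / h_inv
    cs = np.vstack([np.zeros((1, ell.shape[1])), np.cumsum(ell, axis=0)])
    L = cs[K:] - cs[:-K]                      # L(beta), beta = 1, ..., h_inv - K
    n_terms = h_inv - 2 * K
    D = L[K:K + n_terms] - L[:n_terms]        # L(beta + K) - L(beta)
    beta = np.arange(1, n_terms + 1)
    w = kernel(((beta + K) * h - t) / H) / H  # K_H((beta + K) h_n - t)
    return (h_inv / (2 * K)) * ((D * w[:, None]).T @ D)
```

For independent \(\ell^n_k\) with variance \(h_n^2v\), the expected diagonal is approximately \(v\int \mathcal {K}_{H_n}(s-t)\mathrm {d}s\approx v\) at interior \(t\). This is the pointwise analogue of the behaviour described in Theorem 5.2.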

Theorem 5.2

Let \(\mathcal {K}:\mathbb {R}\rightarrow \mathbb {R}\) be a piecewise Lipschitz continuous function supported on the interval \([-1,1]\), such that \(\int _{-\infty }^\infty \mathcal {K}(s)\mathrm {d}s=1\). Let \(H_n\) be a sequence of positive numbers, such that \(H_n\rightarrow 0\) and \(H_n^{-1}K_nh_n^{\nu }\rightarrow 0\) for some \(\nu \in (0,1)\) as \(n\rightarrow \infty\). Then, under the assumptions of Theorem 3.2, we have \({\widehat{V}}(\mathcal {K})^{n,fg}_t\rightarrow ^p1_{\{f=g\}}{\mathfrak {v}}^f_t\) as \(n\rightarrow \infty\) for any \(f,g=1,2\).

As in Sect. 3.3, we can construct confidence intervals of \(\vartheta _n(t)\) under the assumptions of Theorems 5.1 and 5.2 as follows. Since we have:

$$\begin{aligned} (h_nH_n^{-1}\mathcal {V}(\mathcal {K})_n)^{-\frac{1}{2}}\left( {\widehat{L}}(\mathcal {K})^{n,1}_t/{\widehat{L}}(\mathcal {K})^{n,2}_t-\tan \left( \pi \vartheta _n(t)/h_n\right) \right) \rightarrow ^d{\mathbf {N}}(0,1) \end{aligned}$$

as \(n\rightarrow \infty\), where

$$\begin{aligned} \mathcal {V}(\mathcal {K})_n=\frac{({\widehat{L}}(\mathcal {K})^{n,1}_t)^2{\widehat{V}}(\mathcal {K})^{n,22}_t+({\widehat{L}}(\mathcal {K})^{n,2}_t)^2{\widehat{V}}(\mathcal {K})^{n,11}_t-2{\widehat{L}}(\mathcal {K})^{n,1}_t{\widehat{L}}(\mathcal {K})^{n,2}_t{\widehat{V}}(\mathcal {K})^{n,12}_t}{({\widehat{L}}(\mathcal {K})^{n,2}_t)^4}, \end{aligned}$$

a \(100(1-\alpha )\)% confidence interval of \(\vartheta _n(t)\) for \(\alpha \in (0,1)\) is given by:

$$\begin{aligned}&\left[ \frac{h_n}{\pi }\arctan \left( \frac{{\widehat{L}}(\mathcal {K})^{n,1}_t}{{\widehat{L}} (\mathcal {K})^{n,2}_t}-\sqrt{h_nH_n^{-1}\mathcal {V}(\mathcal {K})_n} z_{\alpha /2}\right) ,\right. \\&\quad \left. \frac{h_n}{\pi }\arctan \left( \frac{{\widehat{L}} (\mathcal {K})^{n,1}_t}{{\widehat{L}}(\mathcal {K})^{n,2}_t}+\sqrt{h_nH_n^{-1}\mathcal {V}(\mathcal {K})_n}z_{\alpha /2}\right) \right] . \end{aligned}$$

6 Simulation study

In this section, we conduct a Monte Carlo study to assess the finite sample performance of the proposed methodology. We simulate the efficient log-price process \(X^p\) (\(p=1,2\)) from the Rough Fractional Stochastic Volatility (RFSV) model of Gatheral et al. (2018):

$$\begin{aligned} \mathrm {d}X^p_t=\exp (Z^p_t)\mathrm {d}B^p_t,\qquad \mathrm {d}Z^p_t=-\alpha (Z^p_t-m)\mathrm {d}t+\nu \mathrm {d}B^{H,p}_t,\qquad t\in [0,1], \end{aligned}$$

where \(B^1\) and \(B^2\) are standard Brownian motions with \([B^1,B^2]_t=Rt\), and \(B^{H,p}\) is a fractional Brownian motion with Hurst parameter H. We assume that \((B^1,B^2)\), \(B^{H,1}\), and \(B^{H,2}\) are mutually independent. As in Sect. 3.4 of Gatheral et al. (2018), we set \(H=0.14\), \(\nu =0.3\), \(m=-5\), and \(\alpha =5\times 10^{-4}\). \(Z^p_0\) is drawn from the stationary distribution of \(Z^p\). We vary the correlation parameter R over \(R=0.5,0.7,0.9\).
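A small-scale NumPy sketch of the RFSV dynamics on an equidistant grid is given below. It does not reproduce the authors' simulation code, and it makes three stated simplifications:

- The fractional Gaussian noise is generated by a Cholesky factorization of its exact covariance. This costs \(O(n^3)\) and is only practical for small n; at \(n=23,400\) a faster exact method, such as circulant embedding, would be needed.
- \(Z^p\) starts at its mean level m instead of being drawn from the stationary law.
- Time lags are not included.

The function names are hypothetical.

```python
import numpy as np


def fgn_cholesky(n, H, rng):
    """n increments of fBm on the grid i/n, i = 0, ..., n, via Cholesky (O(n^3))."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    # Covariance of fractional Gaussian noise with step 1/n
    cov = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))
    cov *= n ** (-2.0 * H)
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)


def simulate_rfsv_pair(n, R, H=0.14, nu=0.3, m=-5.0, alpha=5e-4, seed=0):
    """Euler-Maruyama paths of (X^1, X^2) and (Z^1, Z^2) on the grid i/n."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    Z = np.empty((2, n + 1))
    for p in range(2):
        dBH = fgn_cholesky(n, H, rng)   # independent fBm for each asset
        Z[p, 0] = m                     # simplification: start at the mean level
        for i in range(n):
            Z[p, i + 1] = Z[p, i] - alpha * (Z[p, i] - m) * dt + nu * dBH[i]
    dW = rng.standard_normal((2, n)) * np.sqrt(dt)
    # Correlated Brownian increments with [B^1, B^2]_t = R t
    dB = np.vstack([dW[0], R * dW[0] + np.sqrt(1.0 - R**2) * dW[1]])
    X = np.zeros((2, n + 1))
    X[:, 1:] = np.cumsum(np.exp(Z[:, :-1]) * dB, axis=1)  # dX = exp(Z) dB
    return X, Z
```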

We generate the observation data as follows. First, we simulate the equidistant sampling \((X^p_{i/n-\vartheta ^p_n(i/n)})_{i=0}^n\) of the time-lagged efficient log-price process by the Euler–Maruyama scheme with \(n=23,400\). We regard the interval [0, 1] as 6.5 h, so that \(\Delta _n:=1/n\) corresponds to 1 s. Then, for each i, we keep the value \(X^p_{i/n-\vartheta ^p_n(i/n)}\) as an observation for the p-th asset with probability \(\pi _p\), where \(\pi _1=2/3\) and \(\pi _2=1/2\). Finally, we add microstructure noise to the observations, generating the noise process \((\epsilon ^p_i)\) from the following AR(1) process:

$$\begin{aligned} \epsilon ^p_i=\phi \epsilon ^p_{i-1}+\eta ^p_i,\qquad \eta ^p_i\overset{i.i.d.}{\sim }{\mathbf {N}}(0,\delta ), \end{aligned}$$

where \(\delta >0\) is chosen so that \({{\,\mathrm{Var}\,}}[\epsilon ^p_i]=\gamma {{\,\mathrm{Var}\,}}[X^p_{t^p_i}-X^p_{t^p_{i-1}}]\), and \(\gamma\) is the noise ratio of Oomen (2006). We set \(\phi =0.77\) and \(\gamma =0.5\) as in Sect. 2.3 of Christensen et al. (2014). For the latency processes \(\vartheta ^1_n(t)\) and \(\vartheta ^2_n(t)\), we consider the following two scenarios:

Scenario 1:

\(\vartheta ^1_n(t)\equiv 0\), \(\vartheta ^2_n(t)\equiv \ell \Delta _n\).

Scenario 2:

\(\vartheta _n^1(t)=\ell \Delta _n\cos ^2(\pi t)\),   \(\vartheta _n^2(t)=\ell \Delta _n\sin ^2(\pi t)\).

Here, we vary the parameter \(\ell\) over \(\ell =0,1,2,3,4\), and we generate 10,000 paths for each scenario. We use \(h_n=20\Delta _n\) to calculate the estimator \({\widehat{L}}^{n}\) and \(K_n=20\) to compute the subsampling-based asymptotic variance estimators. To compute the spot lead–lag time estimator \({\widehat{\vartheta }}_n(t)\), we set \(H_n=2h_n^{1/3}\) and use the Epanechnikov kernel \(\mathcal {K}(t)=\frac{3}{4}(1-t^2)1_{[-1,1]}(t)\).
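The observation scheme can be sketched as random thinning followed by additive stationary AR(1) noise, together with the two latency scenarios. The sketch assumes the efficient prices have already been sampled at the lagged times \(i/n-\vartheta ^p_n(i/n)\). It also replaces \({{\,\mathrm{Var}\,}}[X^p_{t^p_i}-X^p_{t^p_{i-1}}]\) by the sample variance of the efficient returns between retained observations; this is an assumption, not necessarily the authors' calibration. Because the stationary variance of the AR(1) process is \(\delta /(1-\phi ^2)\), the target noise variance fixes \(\delta\). The function names are hypothetical.

```python
import numpy as np


def latencies(t, ell, dt, scenario):
    """Latency processes (theta^1_n(t), theta^2_n(t)) of Scenarios 1 and 2."""
    t = np.asarray(t, dtype=float)
    if scenario == 1:
        return np.zeros_like(t), np.full_like(t, ell * dt)
    if scenario == 2:
        return ell * dt * np.cos(np.pi * t) ** 2, ell * dt * np.sin(np.pi * t) ** 2
    raise ValueError("scenario must be 1 or 2")


def observe_with_noise(x_lagged, keep_prob, phi, noise_ratio, rng):
    """Random thinning followed by additive AR(1) microstructure noise.

    x_lagged: efficient log-prices already sampled at the lagged times.
    Returns the retained indices i and the noisy observations.
    """
    idx = np.flatnonzero(rng.random(x_lagged.size) < keep_prob)
    x_obs = x_lagged[idx]
    # Assumption: the return variance is the sample variance of the efficient
    # returns between consecutive retained observations.
    target_var = noise_ratio * np.var(np.diff(x_obs))
    delta = (1.0 - phi**2) * target_var            # stationary Var = delta / (1 - phi^2)
    eta = rng.normal(0.0, np.sqrt(delta), idx.size)
    eps = np.empty(idx.size)
    eps[0] = rng.normal(0.0, np.sqrt(target_var))  # start from the stationary law
    for i in range(1, idx.size):
        eps[i] = phi * eps[i - 1] + eta[i]
    return idx, x_obs + eps
```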

Table 1 presents the results of estimating the spectral lead–lag index \(\mathbf {SLL}_n\). We report the bias and the root-mean-square error (RMSE) of the estimator \(\widehat{\mathbf {SLL}}_n\) in seconds, i.e., the bias and the RMSE of \(n\cdot \widehat{\mathbf {SLL}}_n\). The table shows that the biases are less than 1% of the true values and the RMSEs are moderate, indicating good finite sample performance of our estimator. The table also reveals that both the biases and the RMSEs decrease as the correlation parameter increases. This agrees with our asymptotic theory and is also intuitive, because lead–lag relationships should be more pronounced at higher correlations.

Table 2 shows the results for the studentization of \({\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1\), which assess the accuracy of the standard normal approximation. We report the sample means, sample standard deviations (SD), and 95% and 99% coverages of the studentized \({\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1\), i.e., the variable on the left-hand side of (3.3). The results show that the standard normal approximation works very well.

Table 3 reports the rejection rates of the test presented in Sect. 4 at the 5% level. We implement the test with two statistics, \(T^\text {KS}_n\) and \(T^\text {Kuiper}_n\). As the table reveals, the size of the test is well controlled for both statistics. Both statistics achieve very high power in Scenario 1. In Scenario 2, however, the power is much higher for \(T^\text {Kuiper}_n\) than for \(T^\text {KS}_n\), especially when \(\ell\) or R is small. This might be explained as follows. Under the alternative hypothesis, we expect the values of \(T_n^{\text {KS}}\) and \(T_n^{\text {Kuiper}}\) to be dominated by \(\xi \sup _{0\le t\le 1}|\int _0^t\vartheta _n(s)\mathrm {d}s|\) and \(\xi (\sup _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s-\inf _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s)\), respectively, where \(\xi\) is a positive random variable. In Scenario 2, we have:

$$\begin{aligned} \sup _{0\le t\le 1}|\int _0^t\vartheta _n(s)\mathrm {d}s|=\frac{\ell \Delta _n}{2\pi }\qquad \text {and}\qquad \sup _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s-\inf _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s=\frac{\ell \Delta _n}{\pi }, \end{aligned}$$

so the latter is twice the former. Meanwhile, we numerically obtain \({\bar{F}}_{\text {KS}}^{-1}(0.05)\approx 2.24\) and \({\bar{F}}_{\text {Kuiper}}^{-1}(0.05)\approx 2.50\). Moving from \(T_n^{\text {KS}}\) to \(T_n^{\text {Kuiper}}\) therefore doubles the dominant term but raises the 5% critical value only slightly. These computations suggest why \(T_n^{\text {Kuiper}}\) performs better. Since the test based on \(T^\text {Kuiper}_n\) performs satisfactorily in most situations, we recommend \(T^\text {Kuiper}_n\) as the test statistic.
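The closed-form values in the Scenario 2 display follow from a short computation. The derivation below takes \(\vartheta _n(t)=\vartheta ^1_n(t)-\vartheta ^2_n(t)\); the opposite sign convention gives the same supremum of the absolute value and the same range. It then restates the heuristic rejection thresholds implied by the critical values 2.24 and 2.50:

$$\begin{aligned} \vartheta _n(t)&=\ell \Delta _n\{\cos ^2(\pi t)-\sin ^2(\pi t)\}=\ell \Delta _n\cos (2\pi t),\qquad \int _0^t\vartheta _n(s)\mathrm {d}s=\frac{\ell \Delta _n}{2\pi }\sin (2\pi t),\\ \sup _{0\le t\le 1}\left| \int _0^t\vartheta _n(s)\mathrm {d}s\right|&=\frac{\ell \Delta _n}{2\pi }\quad (\text {attained at } t=\tfrac{1}{4},\tfrac{3}{4}),\qquad \sup _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s-\inf _{0\le t\le 1}\int _0^t\vartheta _n(s)\mathrm {d}s=\frac{\ell \Delta _n}{\pi },\\ \xi \frac{\ell \Delta _n}{2\pi }>2.24&\Longleftrightarrow \xi \ell \Delta _n>4.48\pi \approx 14.07,\qquad \xi \frac{\ell \Delta _n}{\pi }>2.50\Longleftrightarrow \xi \ell \Delta _n>2.50\pi \approx 7.85. \end{aligned}$$

Heuristically, the Kuiper-type test rejects once \(\xi \ell \Delta _n\) exceeds about 7.85, whereas the KS-type test needs about 14.07. This gap is the intuition behind the recommendation above.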

Table 4 provides the results of estimating the spot lead–lag time \(\vartheta _n(t)\). We report the root mean integrated square error (RMISE) of the estimator \({\widehat{\vartheta }}_n(t)\) in seconds, that is:

$$\begin{aligned} n\sqrt{E\left[ \int _0^1\left\{ {\widehat{\vartheta }}_n(t)-\vartheta _n(t)\right\} ^2\mathrm {d}t\right] }. \end{aligned}$$

The errors are relatively large compared with those for the spectral lead–lag index, reflecting the slower rate of convergence in Theorem 5.1. However, they still seem acceptable in high-correlation situations.
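The Monte Carlo summaries reported in Tables 1 and 4 reduce to a few array operations. In the sketch below, the integral over t is approximated by the trapezoidal rule on a user-supplied grid; this is our choice, and the authors' quadrature is not stated. Since \({\widehat{\vartheta }}_n(t)\) is defined for \(t\in (0,1)\), the grid would in practice cover the interior of [0, 1]. The function names are hypothetical.

```python
import numpy as np


def bias_rmse_seconds(estimates, truth, n):
    """Bias and RMSE of n * hat SLL_n, i.e. in seconds when Delta_n = 1/n is 1 s."""
    err = n * (np.asarray(estimates, dtype=float) - truth)
    return float(err.mean()), float(np.sqrt(np.mean(err**2)))


def rmise_seconds(theta_hat, theta_true, t_grid, n):
    """n * sqrt(E[int (hat theta_n(t) - theta_n(t))^2 dt]) with a trapezoidal rule in t.

    theta_hat: shape (n_paths, len(t_grid)); theta_true: shape (len(t_grid),).
    """
    t_grid = np.asarray(t_grid, dtype=float)
    sq = (np.asarray(theta_hat, dtype=float) - np.asarray(theta_true, dtype=float)) ** 2
    integrated = np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(t_grid), axis=1)
    return float(n * np.sqrt(integrated.mean()))
```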

In summary, our new methodology performs well in finite samples when analyzing small and time-varying lead–lag relationships, and it offers a promising way to investigate lead–lag relationships in high-frequency financial data.

Table 1 Bias and RMSE for \(\widehat{\mathbf {SLL}}_n\) (in seconds)
Table 2 Results for the studentization of \({\widehat{L}}^{n,1}_1/{\widehat{L}}^{n,2}_1\)
Table 3 Rejection rates of the hypothesis testing from Sect. 4 (5% level)
Table 4 Root-mean-integrated-square errors (RMISEs) for \({\widehat{\vartheta }}_n(t)\) (in seconds)

7 Empirical illustration

To illustrate how our new methodology works on real data, we analyze lead–lag relationships between the S&P 500 index and two of its derivative products: the E-mini S&P 500 futures and the S&P 500 Standard and Poor’s Depository Receipt (SPDR) Exchange-Traded Fund (ETF). The sample period is January 2017, which contains 20 trading days. We use intraday transaction data from Bloomberg, with timestamps recorded to the second. We set \(h_n=\text {20 seconds}\), \(K_n=20\), \(H_n=2h_n^{1/3}\), and \(\mathcal {K}(t)=\frac{3}{4}(1-t^2)1_{[-1,1]}(t)\) as in the simulation study.

Figure 1 shows the estimates \(\widehat{\mathbf {SLL}}_n\) of the spectral lead–lag indices and the associated 95% confidence intervals. Estimates shown in red are significantly non-zero at the 5% level. The left panel shows strong evidence that the futures lead the index, which is consistent with previous studies (see, e.g., Kawaller et al. (1987) and de Jong and Nijman (1997)). The middle panel also reveals strong evidence that the ETF leads the index. This means that the ETF dominates the index in the price discovery process, which is again consistent with previous studies such as Tse et al. (2006). The right panel indicates that the futures tend to lead the ETF, although this relationship is less pronounced than those between the index and the futures/ETF. Tse et al. (2006) report a similar finding on the price discovery process.

Figure 2 shows the spot lead–lag time estimates \({\widehat{\vartheta }}_n(t)\) averaged over the sample period, together with the corresponding 95% confidence bands. We find U-shaped intraday patterns for the pairs of the index and the futures/ETF. Namely, the spot lead–lag times are shorter at the beginning and the end of the trading day and longer in the middle of the day, with the longest lead–lag times around 14:00. This may be because the market is active at the beginning and the end of the day, which yields fast responses from traders and hence shorter lead–lag times. On the other hand, the spot lead–lag time between the futures and the ETF exhibits an increasing trend throughout the day. The confidence bands for these estimates are comparatively wide, however, so the significance of this trend is unclear.

Fig. 1

Values of \(\widehat{\mathbf {SLL}}_n\) and their 95% confidence intervals between the S&P 500 index, the E-mini futures, and the SPDR ETF on the trading days in January 2017. Note: left panel: \(Y^1\) is the S&P 500 index and \(Y^2\) is the E-mini futures; middle panel: \(Y^1\) is the S&P 500 index and \(Y^2\) is the SPDR ETF; right panel: \(Y^1\) is the E-mini futures and \(Y^2\) is the SPDR ETF. \(\widehat{\mathbf {SLL}}_n>0\) implies that \(Y^1\) leads \(Y^2\), and vice versa. Significantly non-zero values are colored red.

Fig. 2

Estimates of \({\widehat{\vartheta }}_n(t)\) averaged over the sample period and their 95% confidence bands between the S&P 500 index, the E-mini futures, and the SPDR ETF. Note: left panel: \(Y^1\) is the S&P 500 index and \(Y^2\) is the E-mini futures; middle panel: \(Y^1\) is the S&P 500 index and \(Y^2\) is the SPDR ETF; right panel: \(Y^1\) is the E-mini futures and \(Y^2\) is the SPDR ETF. The blue lines denote the estimates \({\widehat{\vartheta }}_n(t)\) averaged over the sample period, i.e., \(\frac{1}{20}\sum _{i=1}^{20}{\widehat{\vartheta }}_n^{(i)}(t)\), where \({\widehat{\vartheta }}_n^{(i)}(t)\) denotes the estimate \({\widehat{\vartheta }}_n(t)\) on the ith day. The dotted gray lines denote the corresponding 95% confidence bands.