Estimating a gradual parameter change in an AR(1)-process

We discuss the estimation of a change-point $t_0$ at which the parameter of a (non-stationary) AR(1)-process possibly changes in a gradual way. Making use of the observations $X_1, \ldots, X_n$, we shall study the least squares estimator $\widehat{t}_0$ for $t_0$, which is obtained by minimizing the sum of squares of the residuals with respect to the given parameters.
As a first result it can be shown that, under certain regularity and moment assumptions, $\widehat{t}_0/n$ is a consistent estimator for $\tau_0$, where $t_0 = \lfloor n\tau_0 \rfloor$, with $0 < \tau_0 < 1$, i.e., $\widehat{t}_0/n \stackrel{P}{\rightarrow} \tau_0$ $(n \rightarrow \infty)$. Based on the rates obtained in the proof of the consistency result, a first, but rough, convergence rate statement can immediately be given.
Under somewhat stronger assumptions, a precise rate can be derived via the asymptotic normality of our estimator. Some results from a small simulation study are included to give an idea of the finite sample behaviour of the proposed estimator.

Remark 1 (a) As in earlier works, we only study the case of a gradual change under "local alternatives" here, i.e., under $\beta_{1,n} \rightarrow 0$, but $\beta_{1,n}\sqrt{n} \rightarrow \infty$ as $n \rightarrow \infty$ (cf., e.g., Dümbgen (1991), Jarušková (1998a), Hušková (1998a), Hušková (2001), or Hušková and Steinebach (2002)). (b) In case of the gradual change function $g_n$ being unknown, it would be sufficient to have an estimating function $\widehat{g}_n$ (say), which approximates $g_n$ at a certain rate. For a more detailed discussion we refer to Remarks 5 and 7 below.
Note that, if $g(\cdot, t_0)$ is a bounded function, then
$$b := \sup_{t \ge 1} |\beta_0 + \beta_1 g(t, t_0)| < 1 \quad (1.4)$$
for sufficiently large $n$ and, by a repeated application of (1.1), $X_t$ can be expanded into a weighted sum of past errors, with weights given by products of the coefficients $\beta_0 + \beta_1 g(t - i, t_0)$ $(t = 1, 2, \ldots)$ (1.5). Before we turn to formulating our main results, we give a brief account of related works, particularly concerning the detection of gradual changes in various dependent data sets. Most of the earlier papers on change analysis in autoregressive processes deal with abrupt changes, either in the mean or in the autoregressive parameters, respectively in the variance of the error process. Picard (1985) proposed a procedure for testing changes in the covariance structure of an AR(p)-process based on a likelihood ratio approach and obtained the asymptotic distribution of the likelihood estimators of the change parameters. Gaussian-type likelihood ratio procedures for testing abrupt changes in autoregressive models were studied in Davis et al. (1995), where also the limit distribution of the test statistic was established. Gombay (2008) used efficient score vectors to develop statistics that are able to test changes in any of the parameters of a Gaussian AR(p)-model separately, or in any collection of them, and she also studied large sample properties of the change-point estimator.
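The expansion behind (1.5) can be checked numerically. The following sketch (the parameter values, the choice $g_0(x) = x_+$, and the zero initial value are our illustrative assumptions, not the paper's) compares the recursion (1.1) with the weighted-sum-of-errors representation:

```python
import numpy as np

# Numerical check of the expansion behind (1.5): repeated substitution of (1.1)
# writes X_t as a weighted sum of past errors, the weight of e_{t-j} being the
# product of the coefficients beta_0 + beta_1 * g(s, t0) for s = t, ..., t-j+1.
rng = np.random.default_rng(1)
n, t0, beta0, beta1 = 200, 100, 0.3, 0.4
g = lambda t, t0: max((t - t0) / n, 0.0)       # g(t, t0) = g0((t - t0)/n), g0 = x_+

e = rng.standard_normal(n + 1)
X = np.zeros(n + 1)                             # recursion (1.1), with X_0 = 0
for t in range(1, n + 1):
    X[t] = (beta0 + beta1 * g(t, t0)) * X[t - 1] + e[t]

t = n
acc, w = 0.0, 1.0
for j in range(t):                              # term j carries the error e_{t-j}
    acc += w * e[t - j]
    w *= beta0 + beta1 * g(t - j, t0)
print(abs(acc - X[t]))                          # agrees up to rounding error
```

Note that the chosen values respect the stability bound (1.4): here $\sup_t |\beta_0 + \beta_1 g(t, t_0)| = 0.5 < 1$.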
Other results were based on partial sums of residual processes, see, e.g., Horváth (1993) for testing or Bai (1994) for proving consistency of the change-point estimator. Hušková et al. (2007) used an approach based on partial sums of weighted residuals and obtained asymptotic distributions for various max-type test statistics together with proving the consistency of the change-point estimator in an AR(p)-model. Moreover, bootstrap versions of the proposed tests were studied in Hušková et al. (2008).
A quasi-maximum likelihood method was used in Bai (2000) to analyze vector autoregressive models (VAR) possessing multiple structural changes. The approach developed in Davis et al. (1995) has been extended to VAR models by Dvořák (2015), who dealt with asymptotic tests for an abrupt change in the autoregressive parameters and the variance structure under various assumptions on the correlations of the errors. Kirch et al. (2015) extended the class of max-type change-point statistics considered in Hušková et al. (2007) to the VAR case and epidemic change alternatives and developed a new approach taking possible misspecification of the model into account. Slama and Saggou (2017) considered a Bayesian analysis of a possible change in the parameters of an AR(p)-model and developed a test, which can detect a change in any of the parameters separately. Moreover, the posterior density of the change-point is given by using a Gibbs sampler. Many references on the change-point analysis in time series can also be found in a survey paper by Aue and Horváth (2013).
Concerning gradual changes in autoregression, Salazar (1982) studied a model similar to (1.1) from a Bayesian point of view. Under the assumption of normality of the error process, and with some joint prior distribution of the change-point $t_0$ and the other parameters under consideration, he obtained a joint posterior distribution from which the marginal distribution of the change-point could be obtained via numerical integration. A similar, though not identical, problem was solved by Venkatesan and Arumugam (2007), who considered an AR(p)-model with a gradual switch in the parameters over a finite interval. Here again, computation of the posterior distribution of the change-point requires the use of numerical integration. He et al. (2008) derived a parameter constancy test in a stationary vector autoregressive model against the hypothesis that the parameters of the model change smoothly over time. Though model (1.1) could be considered a special case of the model studied in He et al. (2008), the authors treat other types of smooth functions and do not consider any estimator of the breaking point.
Our approach below is motivated by the previous work of Hušková, see, e.g., Hušková (1998a, b, 1999, 2001), Jarušková (1998a, b, 1999, 2001, 2002, 2003), or Hušková and Steinebach (2000), Hušková and Steinebach (2002), respectively Albin and Jarušková (2003). In the above cited papers a gradual-type change in the mean of a location model is considered, and asymptotic tests for detecting the change, together with limit properties of the estimator of the change-point, are developed for various types of smoothly changing parameters. More specifically, the mentioned model can be written in the form (1.6) for $t = 1, \ldots, n$, where $\mu$, $\delta_n$, $t_0$ are unknown parameters, $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. errors, with zero mean and finite moments of order $2 + \Delta$ for some $\Delta > 0$, and the function $g$ satisfies the corresponding assumption, together with other assumptions specified for the formulated problems. Döring and Jensen (2015), and also Döring (2015a, b), extended this methodology to regression models with independently distributed random regressors. Wang (2007) studied the same location model with errors that exhibit long memory dependence, and Slabý (2001) considered a test based on ranks. Hlávka and Hušková (2017), motivated by gender differences observed in a real data set, proposed a two-sample gradual change test that leads to more precise results than the application of a procedure based on the standard two-sample t-test. Račkauskas and Tamulis (2013) studied epidemic changes in a location model, in which the transition between regimes is gradual.
Several authors studied smooth changes in other contexts. For example, Aue and Steinebach (2002) discuss an extension of this approach to certain statistical models, which cover more general classes of stochastic processes satisfying an invariance principle (see also Kirch and Steinebach (2006), Steinebach (2000), Steinebach and Timmermann (2011), or Timmermann (2014, 2015)). Vogt and Dette (2015) developed a nonparametric method to estimate a smooth change-point in a locally stationary framework and established the rate of convergence of the change-point estimator. Their procedure allows one to deal with a wide variety of stochastic characteristics including the mean, covariances and higher moments. Hoffmann et al. (2018) and Hoffmann and Dette (2019) discuss statistical inference for the detection and the localization of gradual changes in the jump characteristic of a discretely observed Itô semimartingale. Quessy (2019) proposed a general class of consistent test statistics for the detection of gradual changes in copulas and derived their large-sample properties.

Now, let us turn to our problem. We shall study the least squares estimator $\widehat{t}_0$ for $t_0$, which is obtained by minimizing (1.7).

Remark 2 The technical condition $t^* \le \lfloor n(1-\delta) \rfloor$, with $\delta > 0$ fixed, could be weakened to allow for $\delta = \delta_n \rightarrow 0$ $(n \rightarrow \infty)$ at a certain rate, which, however, would depend on the parameter $\beta_1 = \beta_{1,n}$ from (1.2) and the function $g = g_n$ from (1.3) as well. Since $\beta_{1,n}$ is unknown, one should choose $\delta > 0$ fixed, but small, for practical use.
Via partial derivatives, it is not difficult to show that, for fixed $t^*$, the minimizing parameter values satisfy (1.9). On plugging this into (1.7), we obtain (1.10). Since the first term in (1.10) does not depend on $t^*$, a combination of (1.7)-(1.10) eventually results in (1.11). The corresponding extremal statistic can be used as a test statistic for testing "no change" versus "there is a change", even if the true function $g$ is unknown; just a certain integral has to be nonzero (see, e.g., Hušková and Steinebach (2002)). In practice, before starting to estimate $t_0 = \lfloor n\tau_0 \rfloor$, one should first carry out such a test for the existence of a change-point $\tau_0$, with $0 < \tau_0 < 1$.
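The least squares minimization can be sketched computationally: for each admissible candidate $t^*$, the parameters $\beta_0, \beta_1$ are profiled out by an ordinary regression of $X_t$ on $X_{t-1}$ and $g_0((t - t^*)/n)\,X_{t-1}$, and the estimate is the $t^*$ minimizing the residual sum of squares. This is a minimal illustration under our own assumptions ($g_0(x) = x_+^{\kappa}$, illustrative parameter values), not the paper's exact implementation:

```python
import numpy as np

def g0(x, kappa=1.0):
    """Example gradual change function g_0(x) = x_+^kappa."""
    return np.maximum(np.asarray(x, dtype=float), 0.0) ** kappa

def change_point_estimate(X, delta=0.05, kappa=1.0):
    """Least squares change-point estimate: for each admissible t*, profile out
    (beta_0, beta_1) by regressing X_t on X_{t-1} and g0((t - t*)/n) * X_{t-1},
    then return the t* minimizing the residual sum of squares (cf. (1.7))."""
    X = np.asarray(X, dtype=float)
    n = len(X) - 1                       # observations X_0, X_1, ..., X_n
    y, x_lag = X[1:], X[:-1]
    t_idx = np.arange(1, n + 1)
    best_t, best_rss = 0, np.inf
    for t_star in range(0, int(n * (1 - delta)) + 1):
        z = g0((t_idx - t_star) / n, kappa) * x_lag
        A = np.column_stack([x_lag, z])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_t = rss, t_star
    return best_t

# toy usage: simulate a series with a change at t0 = 200 and re-estimate it
rng = np.random.default_rng(0)
n, t0, beta0, beta1 = 600, 200, 0.2, 0.7
X = np.zeros(n + 1)
for t in range(1, n + 1):
    X[t] = (beta0 + beta1 * g0((t - t0) / n)) * X[t - 1] + rng.standard_normal()
t_hat = change_point_estimate(X)
print(t_hat)       # should lie near t0 = 200
```

Profiling out the linear parameters for each fixed $t^*$ mirrors the derivation via (1.9)-(1.10): the objective in $t^*$ is obtained after the inner least squares step has been solved in closed form.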
For our theoretical studies of $\widehat{t}_0$ below, it will be convenient to make use of the model equation (1.1) and rewrite (1.11), after a multiplication with $1/n$, as (1.13). For later asymptotics it may also be convenient to express $\widehat{t}_0$ as (1.15).

The paper is organized as follows. Based on the required assumptions, which are collected first, Sect. 2 presents the main results of our work. As a first statement it can be shown in Theorem 1 that $\widehat{t}_0/n$ is a consistent estimator for $\tau_0$, where $t_0 = \lfloor n\tau_0 \rfloor$, with $0 < \tau_0 < 1$, i.e., $\widehat{t}_0/n \stackrel{P}{\rightarrow} \tau_0$ $(n \rightarrow \infty)$. Based on the rates obtained in the proof of Theorem 1, a rough convergence rate estimate can immediately be given (see Theorem 2). Under somewhat stronger assumptions, a precise rate can then be derived in Theorem 3 by showing that our estimator has an asymptotically normal limit distribution. In Sect. 3, some results from a small simulation study are included to give an idea of the finite sample behaviour of the proposed estimator. Section 4 collects some auxiliary results which are used in the proofs of the main theorems. The latter are finally given in Sect. 5.

Assumptions and main results
For our asymptotic results we assume the gradual change function $g(\cdot, t^*)$ to satisfy the following assumptions: (A.1) For every $t^* = 0, 1, \ldots, n-1$, the function $g(\cdot, t^*)$ is of the given form, where $g_0 : (-\infty, 1] \rightarrow \mathbb{R}$ is a real function satisfying: (A.2) It holds that, where $g'_{0+}(0)$ denotes the right derivative at $0$, $g'_0(\cdot)$ denotes the derivative, assumed to be bounded and Riemann integrable, the exponent involved lies in $(1/2, 1]$, and $D_3$ is a positive constant.
(b) The function $g_0$, for example, could be such that $g_0(x) = \pm\, x_+^{\kappa}$ $(x \le 1)$, where $x_+$ denotes the positive part of $x$ and $\kappa \ge 1$ is a fixed exponent.

Remark 5
In case of an unknown change function $g$, it will be obvious from the proof of Theorem 1 that (2.1) still holds if $g$ in (1.11) is replaced by an estimator $\widehat{g}_n$ at a rate $o_P(\beta_1)$; more precisely, if there is an estimating function $\widehat{g}_0 = \widehat{g}_{0,n}$ such that, as $n \rightarrow \infty$, the stated approximation holds, then, in view of the rate $o_P(\beta_1)$, the convergence in (2.1) will be retained. If, for example, $g_0(x) = x_+^{\kappa}$, with some $\kappa \ge 1$, it would be sufficient to have an estimator $\widehat{\kappa}_n$ such that $|\widehat{\kappa}_n - \kappa| = o_P(\beta_1)$, e.g., $|\widehat{\kappa}_n - \kappa| = O_P(1/\sqrt{n})$ as $n \rightarrow \infty$. Such estimates have been obtained in other settings (cf., e.g., Döring and Jensen (2015) for a regression model). In our time series setting, this is an open question and has to be left for future research.
Another possible model to deal with would be the case $\beta_1 g_0(x) = \beta_1 x_+ + \beta_2 x_+^2$, with unknown parameters $\beta_1, \beta_2$. Here, least squares estimation means to minimize the corresponding sum of squared residuals, with $\delta > 0$, and then to modify the corresponding steps in the proofs. For the sake of conciseness, this may also be left for further investigations.
The proofs of Theorem 1 and Remark 5 are postponed to Sect. 5.
Remark 6 It would also be quite straightforward to get a consistent estimator of $\beta_1$, i.e., $\widehat{b}_1(\widehat{t}_0)$, together with some limiting properties. For the sake of conciseness, we omit the details here. Also, if the function $g(\cdot)$ is only known up to a multiplicative constant, then the resulting estimator is still consistent, but the limit distribution below would then depend on this unknown multiplicative constant.
On checking the proof of Theorem 1 more carefully, a rough rate of consistency for our estimator $\widehat{t}_0$ can be obtained as follows.
The proofs of Theorem 2 and Remark 7 are also postponed to Sect. 5.

Remark 8
If, for example, $\beta_1 = n^{-\alpha}$, with $0 < \alpha < 1/2$, then $\varepsilon_n$ could be chosen as $(\log n)^{-p}$, with $p > 0$, so that one would have the polynomial consistency rate (2.5). Next it will be shown that the estimator $\widehat{t}_0$ of $t_0$ (or, equivalently, $\widehat{\tau}_0$ of $\tau_0$) has an asymptotically normal limit distribution.

Theorem 3 Let Assumptions (A.1)-(A.4) be satisfied. Then, as n → ∞,
or, in a standardized form, (2.9).

Remark 9 Note that the limit distribution in Theorem 3 does not depend on $\sigma^2$.
For the proof of Theorem 3, see Sect. 5.

Some simulations
In this section, before we turn to the proofs of Theorems 1-3, we first present some results from a small simulation study. We simulated observations of the time series (1.1) with the function $g_0(x) = x_+$, $x \le 1$, for various combinations of $\beta_0$ and $\beta_1$ and for various change-points $t_0$. The errors $e_t$ were taken to be i.i.d. with a standard normal distribution. The first 50 simulated values were deleted in order to start the computations with stationary observations $X_t$ for $t = 1, \ldots, t_0$. We simulated either $n = 500$, $1000$ or $5000$ observations of (1.1). For each realization of $\{X_t, t = 1, \ldots, n\}$, we estimated the change-point $t_0$ according to (1.11) with the given function $g_0$ and with $t^*$ running from $0$ to $\lfloor n(1-\delta) \rfloor$, where $\delta > 0$ denotes the proportion of observations which were excluded. For each combination we used 10,000 simulation runs. The change-point was chosen to be either $t_0 = n/4$, $n/2$ or $3n/4$, $\tau_0 = t_0/n$, and $\delta = 0.05$. The parameters $\beta_0$ and $\beta_1$ should satisfy the asymptotic relation (1.2), i.e., $|\beta_0| < 1$, $\beta_1 \rightarrow 0$, and $|\beta_1|\sqrt{n} \rightarrow \infty$ as $n \rightarrow \infty$. For $\beta_1$ we used small multiples of $1/\sqrt{\log\log n}$ such that the asymptotic condition (1.2) and also the condition $|\beta_0 + \beta_1 g_0((t - t_0)/n)| < 1$ for $t = 1, \ldots, \lfloor n(1-\delta) \rfloor$ were satisfied. Though in fact $\beta_1$ depends on $n$, we always used the same value for all considered sample sizes $n$, to make the presentation of the results more transparent.
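The simulation protocol just described can be sketched as follows; the concrete parameter values, the seed, and the burn-in handling are our illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def simulate_ar1_gradual(n, tau0=0.5, beta0=0.4, beta1=0.4, burn_in=50, seed=None):
    """Simulate the AR(1) series (1.1) with coefficient beta_0 + beta_1 * g0((t - t0)/n),
    g0(x) = x_+, discarding `burn_in` initial values so that the pre-change part
    starts (approximately) stationary, as in the simulation study."""
    rng = np.random.default_rng(seed)
    t0 = int(n * tau0)
    e = rng.standard_normal(burn_in + n)
    X = np.zeros(burn_in + n + 1)
    for t in range(1, burn_in + n + 1):
        s = t - burn_in                               # index within the retained series
        coef = beta0 + beta1 * max((s - t0) / n, 0.0)
        assert abs(coef) < 1, "stability |beta_0 + beta_1 g_0(.)| < 1 violated"
        X[t] = coef * X[t - 1] + e[t - 1]
    return X[burn_in:], t0                            # X_0, X_1, ..., X_n

X, t0 = simulate_ar1_gradual(1000, tau0=0.5, seed=42)
```

The in-loop assertion enforces the stability condition $|\beta_0 + \beta_1 g_0((t - t_0)/n)| < 1$ at every time point, so that an inadmissible parameter combination fails loudly instead of producing an explosive path.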
It can be seen that the point estimates of $\tau_0$ (or $t_0$, respectively) and of $\beta_0$ systematically underestimate the true values slightly, but otherwise the estimators behave quite well for all considered variants. The estimators of $\beta_1$ are more volatile, but they converge to the true value with a growing number of observations. To study the behaviour of the estimator of the change-point in finite samples, we also displayed histograms and Q-Q plots of the standardized statistic in (2.8). Some results are presented in Figs. 1, 2, 3, 4, 5, 6. The simulations show that the behaviour of the estimator $\widehat{t}_0$ depends both on the size of the sample and on the parameters $\beta_0$ and $\beta_1$. The assumption $|\beta_0 + \beta_1 g_0((t - t_0)/n)| < 1$ guarantees the stability of the time series (1.1) even after the local change, expressed by the time-varying part of the autoregressive coefficient. Due to the local character of the change, the convergence to the normal distribution is slow and can only be expected for sufficiently large samples. In Figs. 1, 2, 3, 4, 5, 6, convergence to normality is demonstrated for the respective sample sizes $n = 500$ (left panels) and $n = 5000$ (right panels). The role of the parameter $\beta_1$, which represents the rate of the change of the autoregressive coefficient, can be seen by comparing Figs. 3 and 4. In Fig. 3, the value of $\beta_0 + \beta_1 g_0((t - t_0)/n)$ changes from 0 to 0.9, while in Fig. 4 it varies from 0.5 to 0.9 for the same values of $t = t_0 + 1, \ldots, n$. The gradual change in the first case is much faster than in the second one, where the change is slow and the increments to the autoregressive coefficient are very small. In this case, the graph of the function on the right-hand side of (1.11) can be very flat and its global maximum may be detected incorrectly, i.e., either very early or very late. This explains the large counts in the outermost bins of the histogram and the large skewness of the test statistic, especially in the left part of Fig. 4.
Also the position of $t_0$, and thus the length of the stationary part of the time series under consideration, plays a role. In general, we observe better results when the change occurs in the middle of the observed time interval. For smaller sample sizes, the kurtosis of the standardized statistic is larger and the finite sample distribution has heavier tails than the normal distribution, but this improves with growing sample size.

Some auxiliary results
In this section, we collect a series of auxiliary results which are used in the proofs of our Theorems 1-3. In the sequel, $C$ denotes a generic positive constant, independent of $t$, $t^*$ and $n$, which may vary from case to case.
The following two lemmas from real analysis will also be used in the proof of Theorem 1.
Proof For $n$ sufficiently large, there exists an integer $k_n$ such that $2^{k_n} \le b_n < 2^{k_n+1}$. Obviously, $k_n \rightarrow \infty$ as $n \rightarrow \infty$, so that the choice $\varepsilon_n = 1/k_n$ completes the proof.
Lemma 6 Let $f$ be a continuous real function on a compact set $K$ and let $x_0$ be the unique maximizer of $f$, i.e., $x_0 = \arg\max_{x} f(x)$. Furthermore, assume that $\lim_{n\rightarrow\infty} \max_{x \in K} |f_n(x) - f(x)| = 0$ and let $\widehat{x}_n = \arg\max_{x} f_n(x)$ be a maximizer of $f_n$ (not necessarily unique). Then, $\widehat{x}_n \rightarrow x_0$ as $n \rightarrow \infty$.
Proof Suppose $\widehat{x}_n \not\rightarrow x_0$ as $n \rightarrow \infty$. Since $K$ is compact, there exist a subsequence $\{\widehat{x}_{k_n}\}$ and an $x_1 \ne x_0$ such that $\widehat{x}_{k_n} \rightarrow x_1$, hence $f_{k_n}(\widehat{x}_{k_n}) \rightarrow f(x_1) < f(x_0)$, by the uniform convergence and the uniqueness of the maximizer $x_0$. This, however, contradicts our assumptions, since $f_{k_n}(\widehat{x}_{k_n}) \ge f_{k_n}(x_0) \rightarrow f(x_0)$, so that the proof is complete.
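Lemma 6 can be illustrated numerically with toy functions of our own choice: a uniformly small perturbation of $f$ moves its maximizer only slightly.

```python
import numpy as np

# Toy illustration of Lemma 6: f has the unique maximizer x0 = 0.3 on K = [0, 1];
# f_n = f + sin(5x)/n converges to f uniformly (sup-distance <= 1/n), so the
# maximizers of f_n converge to x0.
xs = np.linspace(0.0, 1.0, 100001)       # discretized compact set K
f = -(xs - 0.3) ** 2

maximizers = []
for n in (10, 100, 10000):
    f_n = f + np.sin(5 * xs) / n
    maximizers.append(xs[np.argmax(f_n)])
print(maximizers)                        # approaches 0.3
```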
Next, we need some extensions of Lemmas 1-4. In particular, we study properties of the following quantities for $|t_0 - t^*| \le b_n$, with $b_n \rightarrow \infty$, $b_n/n \rightarrow 0$. We start with an extension of Lemma 3.

Lemma 7
Let the assumptions of Theorem 3 be satisfied and let $t^*$ be such that $|t^* - t_0| \le r_n \sqrt{n}\,|\beta_1|^{-1}$, with $r_n$ as specified. Then, for $j = 0, 1, 2$, $L_j(t_0, t^*)$ has an asymptotically normal limit distribution, with zero mean and variance $\sigma_j^2$. Moreover, as $n \rightarrow \infty$, (4.24) holds for some $c_n \rightarrow 0$ and any $t^*$ such that $|t_0 - t^*| \le b_n$, $b_n \rightarrow \infty$, $b_n/n \rightarrow 0$.
Proof We focus on $L_0(t_0, t^*)$. The desired results for $L_1(t_0, t^*)$ and $L_2(t_0, t^*)$ can be derived in the same way; the expressions obtained are just somewhat more complex and can therefore be omitted. Note that, for fixed $t^*$ and $t_0$, the $L_j(t_0, t^*)$, $j = 0, 1, 2$, are sums of martingale difference arrays. Hence, to obtain their limit properties, we can apply Theorem 24.3 in Davidson (1994), p. 383, which for $L_0(t_0, t^*)$ means to verify the validity of the conditions (4.25) and (4.26). We make repeated use of Assumption (A.4). Consider first the denominator in (4.25). Direct calculations, combined with Assumption (A.4) and the same arguments as in the proofs of Lemmas 2 and 4, yield, as $n \rightarrow \infty$, the convergence (4.27). For the numerator in (4.25), the first term on the r.h.s. is a sum of martingale difference arrays, with zero mean and a variance whose form follows from the finiteness of $Ee_t^4$, Assumption (A.3), and the uniform boundedness of $EX_{t-1}^4$ (cf. (4.1)). Next, proceeding in the same way as in the proofs of Lemmas 1 and 4, and combining (4.28), (4.29) and (4.27), we obtain (4.25).
To verify (4.26) note that, due to (4.27), it suffices to prove the corresponding maximal bound for the weighted sums $\frac{1}{n}\sum_t e_t X_{t-1}\,\frac{g(t, t^*) - g(t, t_0)}{(t_0 - t^*)/n}$. For this we have, using similar arguments as above, the required estimate, and we can conclude that the asymptotic normality of $L_0(t_0, t^*)$ holds for fixed $t^*$. Next we show (4.24) for $j = 0$. We proceed as in the proof of Lemma 3, part c). Toward this, we study, for $t_0 < t^* < t^{**}$ and $|t_0 - t^*| + |t^* - t^{**}| \le b_n$, with $b_n \rightarrow \infty$, $b_n/n \rightarrow 0$, the corresponding increment, making use of $|t_0 - t^*| \vee |t^* - t^{**}| \le |t_0 - t^{**}|$. In view of the exponent condition in Assumption (A.4), assertion (4.24) can now be finished by again making use of Theorem 15.6 in Billingsley (1968), as in the proof of Lemma 3, part c).
The next lemma is an extension of Lemma 4.

Lemma 8 Under the assumptions of Theorem 3 we have
(4.31) Moreover, as $n \rightarrow \infty$, (4.32) holds for some $c_n \rightarrow 0$ and any $|t^* - t_0| \le b_n$, with $b_n \rightarrow \infty$, $b_n/n \rightarrow 0$.
Proof Note first the decomposition of $L_4(t^*, t_0)$ given there. The latter sum on the r.h.s. has only $|t_0 - t^*| \le 2b_n$ summands, with $b_n \rightarrow \infty$, $b_n/n \rightarrow 0$, and the terms $\frac{g(j, t_0) - g(j, t^*)}{(t^* - t_0)/n}$ are bounded in $j$, so the respective terms are not influential. By (A.4), we obtain a bound whose last relation follows from (4.1) and (4.11). Proceeding as in the proofs of Lemmas 1 and 2, we get the stated limit. The result for $L_{42}(t^*, t_0)$ is obtained in the same way, and thus the assertion on $L_4(t^*, t_0)$ follows. The proof for $L_3(t^*, t_0)$ follows the same lines and can therefore be omitted.
To prove (4.32) we proceed as in the proof of (4.24). To avoid too many technicalities, we study only the leading term. As in the proof of Lemma 4, and utilizing Assumption (A.4), we obtain the corresponding bounds, where we also used arguments from the proof of Lemma 7. Finally, on combining all these estimates, we can conclude that (4.32) holds true.
By Theorem 1, and since $Q(t_0)$ (defined below) does not depend on $t^*$, the estimator $\widehat{t}_0$ has the same limit distribution as the maximizer considered below. We need to study the properties of $Q_j(t^*) - Q_j(t_0)$, $j = 1, 2, 3$, separately. This is formulated in the next three propositions.

Proposition 1 Under the Assumptions (A.1)-(A.4) we get
Proof Direct but long calculations give the stated expansion. Using the assertions of Lemmas 1, 2 and 4 and the calculations in Lemma 8, we get, after some steps, the required bound uniformly in $|t^* - t_0| \le b_n$, which suffices for the proof.
On applying Lemmas 1-3 and 7 to the above sums, we find that the first term on the r.h.s. is influential, while the latter one is negligible. Both assertions then follow.

Proof Direct calculations give
uniformly for $|t^* - t_0| \le b_n$, where the last relation is implied by a combination of Lemmas 7 and 8. Then both assertions follow immediately.

Proofs of the main theorems
Now we are ready to turn to the proofs of Theorems 1-3. We first prove the consistency of our least squares estimator $\widehat{t}_0$.
Via the subsequence principle for convergence in probability, the proof of Theorem 1 can now be completed from (5.7) by applying Lemma 6.

Proof of Remark 5
In case of an unknown change function $g$, it is obvious from the proof of Theorem 1 that, under (2.2), $g(t, t^*) = g_0((t - t^*)/n)$ in (1.11) resp. (1.13) can be replaced by $\widehat{g}_n(t, t^*) = \widehat{g}_0((t - t^*)/n)$. The reason is that, in view of (5.1)-(5.3) and the rate $o_P(\beta_1)$, the convergence in (5.4) still holds with the estimated $g(t, t^*)$'s, so that the proof can be completed as before.

Proof of Theorem 2
In view of the consistency obtained in Theorem 1, it suffices to concentrate on a small neighbourhood $[\tau_1, \tau_2]$ of $\tau_0$. With the notations in (5.4), we have the stated expansion, where $\tau_n$ lies between $\widehat{\tau}_0$ and $\tau_0$.