On the Whittle estimator for linear random noise spectral density parameter in continuous-time nonlinear regression models

A continuous-time nonlinear regression model with a Lévy-driven linear noise process is considered. Sufficient conditions for the consistency and asymptotic normality of the Whittle estimator of the noise spectral density parameter are obtained.


Introduction
The paper focuses on an important aspect of the study of regression models with correlated observations: the estimation of functional characteristics of the random noise. In this problem the unknown parameter of the regression function becomes a nuisance parameter and complicates the analysis of the noise. To neutralise its presence, we must first estimate this parameter and then build estimators, say, of the spectral density parameter of a stationary random noise using the residuals, that is, the differences between the values of the observed process and the fitted regression function.
So, in the first step we employ the least squares estimator (LSE) of the unknown parameter of the nonlinear regression, because of its relative simplicity. Asymptotic properties of the LSE in nonlinear regression models have been studied by many authors. Numerous results on the subject can be found in the monographs by Ivanov and Leonenko (1989) and Ivanov (1997).
In the article by Koul and Surgailis (2000) the asymptotic properties of the Whittle estimator of the spectral density parameters of strongly dependent random noise in a linear regression model were studied in a discrete-time setting.
In the paper by Ivanov and Prykhod'ko (2016) sufficient conditions for the consistency and asymptotic normality of the Whittle estimator of the spectral density parameter of Gaussian stationary random noise in a continuous-time nonlinear regression model were obtained using the residual periodogram. The current paper continues this research, extending it to the case of Lévy-driven linear random noise and to more general classes of regression functions, including trigonometric ones. We use the scheme of the proof for the Gaussian case (Ivanov and Prykhod'ko 2016) and some results of the papers Avram et al. (2010) and Anh et al. (2004). For linear random noise, however, the proofs rely on essentially different types of limit theorems. In comparison with the Gaussian case, this leads to special conditions on the linear Lévy-driven random noise and to new consistency and asymptotic normality conditions.
In the present publication a continuous-time model is considered. However, the results obtained can also be used for discrete-time observations by means of statements like Theorem 3 of Alodat and Olenko (2017) or Lemma 1 of Leonenko and Taufer (2006).
We introduce a two-sided Lévy process L(t), t ∈ R, defined for t < 0 as an independent copy of −L(−t).
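This two-sided construction is easy to sketch numerically. The minimal illustration below is ours, not the paper's; it uses Brownian motion, the simplest Lévy process, and the function name `two_sided_bm` is an assumption for the sketch.

```python
import numpy as np

def two_sided_bm(n_steps, t_max, rng):
    """Simulate a two-sided Brownian motion (the simplest Levy process) on
    [-t_max, t_max]: for t < 0 the path is an independent copy of -L(-t)."""
    dt = t_max / n_steps
    # one-sided copy for t >= 0 and an independent copy used for t < 0
    pos = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))))
    neg = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))))
    t = np.linspace(-t_max, t_max, 2 * n_steps + 1)
    # L(t) = pos(t) for t >= 0 and L(t) = -neg(-t) for t < 0
    path = np.concatenate((-neg[:0:-1], pos))
    return t, path

rng = np.random.default_rng(0)
t, L = two_sided_bm(1000, 5.0, rng)
```

By construction the two halves are driven by independent increments and the path is pinned at L(0) = 0, as the definition requires.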
From the condition A_1 it follows (Anh et al. 2002) that for any r ≥ 1 … In turn, from (6) it can be seen that the stochastic process ε is stationary in the strict sense. Denote by … the moment and cumulant functions, correspondingly, of order r, r ≥ 1, of the process ε.
is the covariance function of ε, and … is the fourth moment function. The explicit expression for the cumulants of the stochastic process ε can be obtained from (6) by direct calculation: …, where d_r is the r-th cumulant of the random variable L(1). In particular, … Under the condition A_1, the spectral densities of the stationary process ε of all orders exist and can be obtained from (8) as …, where a ∈ L_2(R), a(λ) = ∫_R â(t) e^{-iλt} dt, λ ∈ R, if the complex-valued functions f_r ∈ L_1(R^{r-1}), r > 2; see, e.g., Avram et al. (2010) for the definitions of the spectral densities of higher order f_r, r ≥ 3. For r = 2, we denote the spectral density of the second order by f(λ) = f(λ, θ), θ ∈ Θ^τ, where τ > 0 is some number and Θ ⊂ R^m is a bounded open convex set, with the true value of the parameter θ_0 ∈ Θ. In the condition A_2 (ii) above, θ^{(1)} represents the parameters of the kernel â in (4), while θ^{(2)} represents the parameters of the Lévy process.

Remark 2
The last part of the condition A_1 is fully used in the proofs of Lemma 5 and Theorem B.1 in "Appendix B". The condition A_2 (i) is fully used only in the proof of Lemma 5. When we refer to these conditions elsewhere in the text, we use them partially: see, for example, Lemma 3, where we need only the existence of f_4.

Definition 1
The least squares estimator (LSE) of the parameter α_0 ∈ A obtained from observations of the process {X(t), t ∈ [0, T]} is said to be any random vector … We consider the residual periodogram and the Whittle contrast field …, where w(λ), λ ∈ R, is an even nonnegative bounded Lebesgue measurable function for which the integral (10) is well defined. The existence of the integral (10) follows from the condition C_4 introduced below.

Definition 2
The minimum contrast estimator (MCE) of the unknown parameter θ_0 ∈ Θ is said to be any random vector θ_T = (θ_1T, …, θ_mT) such that … The minimum in Definition 2 is attained due to the continuity of the integral (10) in θ ∈ Θ^c, as follows from the condition C_4 introduced below.
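The residual-periodogram and Whittle-contrast machinery can be sketched numerically. The sketch below is ours and makes explicit simplifying assumptions: the periodogram is discretised with the FFT, the CAR(1) (Ornstein-Uhlenbeck) density is used only as a placeholder for the parametric family f(λ, θ), and the weight w is just one admissible even bounded function. The final check is the deterministic fact that the Whittle integrand log f + I/f is minimised pointwise when f matches I.

```python
import numpy as np

def periodogram(x, dt):
    """Periodogram I_T(lambda) of a sampled path x with step dt: a discrete
    stand-in for (2*pi*T)^{-1} |int_0^T x(t) e^{-i t lambda} dt|^2."""
    n = len(x)
    lam = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
    I = dt * np.abs(np.fft.rfft(x)) ** 2 / (2.0 * np.pi * n)
    return lam, I

def f_car1(lam, theta):
    """CAR(1) (Ornstein-Uhlenbeck) spectral density, a placeholder for f(lam, theta)."""
    return theta / (np.pi * (theta ** 2 + lam ** 2))

def whittle_contrast(theta, lam, I, w):
    """Discretised Whittle contrast: sum_l w(l) [log f(l, th) + I(l) / f(l, th)]."""
    f = f_car1(lam, theta)
    return float(np.sum(w(lam) * (np.log(f) + I / f)))

# deterministic sanity check: replacing I by the true density f(., theta0)
# makes the contrast minimal at theta0, since log x + c/x >= log c + 1 pointwise
lam = np.linspace(0.01, 20.0, 500)
I0 = f_car1(lam, 1.0)
w = lambda l: np.exp(-0.1 * np.abs(l))   # an even, nonnegative, bounded weight
vals = {th: whittle_contrast(th, lam, I0, w) for th in (0.5, 1.0, 2.0)}

rng = np.random.default_rng(0)
lam_p, I_p = periodogram(rng.normal(size=256), 0.1)
```

In practice the minimisation over θ would be carried out on the contrast built from the residual periodogram rather than from the true density; the placeholder check above isolates the contrast-function property itself.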

Consistency of the minimum contrast estimator
Suppose the function g(t, α) in (1) is continuously differentiable with respect to α ∈ A^c for any t ≥ 0, and its derivatives g_i(t, α) = ∂g(t, α)/∂α_i, i = 1, …, q, are locally integrable with … We assume that the following conditions are satisfied. C_1. The LSE α_T is a weakly consistent estimator of α_0 ∈ A in the sense that … C_2. There exists a constant c_0 < ∞ such that for any α_0 ∈ A and T > T_0, where c_0 and T_0 may depend on α_0, … The fulfillment of the conditions C_1 and C_2 is discussed in more detail in "Appendix A". We also need three more conditions.
are continuous with respect to θ ∈ Θ^c almost everywhere in λ ∈ R, and … C_5. There exists an even positive Lebesgue measurable function v(λ), λ ∈ R, such that … To prove the theorem we need some additional assertions.

Lemma 1 Under condition
Proof For any ρ > 0, by the Chebyshev inequality and (7), … From A_1 it follows that I_2 = O(T^{-1}). Using the expression (8) for the cumulants of the process ε we get … The functions F_T^{(k)}(u_1, …, u_k), k ≥ 3, are multidimensional analogues of the Fejér kernel; for k = 2 we obtain the usual Fejér kernel.
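For k = 2 the kernel just mentioned is the classical Fejér kernel, which in this continuous-time setting reads F_T(u) = (2πT)^{-1} |∫_0^T e^{itu} dt|^2 = 2 sin^2(Tu/2)/(πTu^2). A quick numerical check (ours, not the paper's) confirms that it behaves as an approximate identity: unit total mass, concentrating at u = 0 as T grows.

```python
import numpy as np

def fejer(u, T):
    """Continuous-time Fejer kernel
    F_T(u) = (2*pi*T)^{-1} |int_0^T e^{itu} dt|^2 = 2 sin^2(T u / 2) / (pi T u^2),
    with the removable singularity F_T(0) = T / (2 pi)."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    small = np.abs(u) < 1e-12
    out[small] = T / (2.0 * np.pi)
    us = u[~small]
    out[~small] = 2.0 * np.sin(T * us / 2.0) ** 2 / (np.pi * T * us ** 2)
    return out

T = 50.0
u = np.linspace(-40.0, 40.0, 400001)
# rectangle-rule mass; the tail beyond |u| = 40 is O(1/T) and negligible here
mass = float(np.sum(fejer(u, T)) * (u[1] - u[0]))
```

The peak value T/(2π) growing linearly in T while the mass stays at 1 is exactly the delta-like concentration that drives the limit theorems above.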
Lemma 2 Let the function G(u_1, …, u_k), u_k = −(u_1 + … + u_{k−1}), be bounded and continuous at the point (u_1, …, u_{k−1}) = (0, …, 0). Then … We set … and write the residual periodogram in the form … Let ϕ = ϕ(λ, θ), (λ, θ) ∈ R × Θ^c, be a weight function, even and Lebesgue measurable with respect to the variable λ for each fixed θ. We have … Then, by the Plancherel identity and condition C_2, … Taking into account conditions A_1, C_1, C_2 and the result of Lemma 1, we obtain … On the other hand, …, and again, thanks to C_1, C_2, … Lemma 3 Suppose conditions A_1, A_2 are fulfilled and the weight function ϕ(λ, θ) introduced above satisfies (11). Then, as T → ∞, … Proof The lemma is in fact an application of the reasoning of Lemma 2 in Anh et al. (2002) and Theorem 1 in Anh et al. (2004) to the linear process (4). It is sufficient to prove … Omitting the parameters θ_0, θ in some formulas below, we derive … To apply Lemma 2 we have to show that the functions G_2(u), u ∈ R, and G_4^{(1)}(u), u = (u_1, u_2, u_3) ∈ R^3, are bounded and continuous at the origin.
Boundedness of G_2 follows from (11). Thanks to (11), sup_{u ∈ R^3} |G_4^{(1)}(u)| < ∞. On the other hand, by (9), |G_4^{(2)}| … The integral I_4 admits the same upper bound. So, …, where … > 0 is the excess of the L(1) distribution, and the functions G_2, G_4 are bounded. The continuity of these functions at the origin follows from the conditions of Lemma 3 as well.
, then under conditions A 1 , A 2 , C 1 , C 2 and C 4 Consider the Whittle contrast function with K (θ 0 , θ) = 0 if and only if θ = θ 0 due to C 3 .

Lemma 4
If the conditions A_1, A_2, C_1, C_2, C_4 and C_5 are satisfied, then … by Corollary 1. On the other hand, … By the condition C_5 (i), … Since, by Lemma 3 and the condition C_5 (ii), …, and the 2nd term under the probability sign in (14) can be made arbitrarily small by choosing δ, then P_1 → 0, as T → ∞, taking into account that the 3rd and the 4th terms converge to zero in probability, thanks to (12) and (13), if ϕ = w f.

Proof of Theorem 1 By Definition 2 for any
when T → ∞ due to Lemma 4 and the property of the contrast function K .

Asymptotic normality of minimum contrast estimator
The first three conditions relate to properties of the regression function g(t, α) and the LSE α_T. They are commented on in "Appendix B".
The function g(t, α) is continuously differentiable with respect to t ≥ 0 for any α ∈ A^c, and for any α_0 ∈ A and T > T_0 there exists a constant c_0 (T_0 and c_0 may depend on α_0) such that … The function g(t, α) is twice continuously differentiable with respect to α ∈ A^c for any t ≥ 0, and for any R ≥ 0 and all sufficiently large T (T > T_0(R)) … with positive constants c_i, c_il, c̃_il, possibly depending on α_0. We also assume that the function f(λ, θ) is twice differentiable with respect to θ ∈ Θ^c for any λ ∈ R. Set … and introduce the following conditions.
(4) ϕ i are differentiable and ϕ i are uniformly continuous on R.
Conditions N 5 (iii) and C 5 (ii) look the same, however the function v in these conditions must satisfy different conditions N 5 (ii) and C 5 (i), and therefore, generally speaking, the functions v in these two conditions can be different.
The next three matrices appear in the formulation of Theorem 2: > 0 is the excess of the random variable L(1), ∇ θ is a column vector-gradient, ∇ θ is a row vector-gradient. N 6 . Matrices W 1 (θ ) and W 2 (θ ) are positive definite for θ ∈ .

Theorem 2 Under conditions
The proof of the theorem is preceded by several lemmas. The next statement is Theorem 5.1 of Avram et al. (2010), formulated in a form convenient for us.

Lemma 5 Let the stochastic process ε satisfy
and … Then the central limit theorem holds: …, where "⇒" means convergence in distribution and … > 0 is the excess of the random variable L(1). In particular, the statement is true for p = 2 and q = ∞.
An alternative form of Lemma 5 is given in Bai et al. (2016). We formulate their Theorem 2.1 in a form convenient for us.

Remark 3
It is important to note that the conditions of Lemma 5 are given in the frequency domain, while Lemma 6 employs time domain conditions. Theorems similar to Lemmas 5 and 6 can be found in the paper by Giraitis et al. (2017), where the case of martingale differences was considered. An overview of analogous results for different types of processes is given in the paper by Ginovyan et al. (2014). Set … Proof Let B_σ be the set of all bounded entire functions on R of exponential type 0 ≤ σ < ∞ (see "Appendix C"), and let δ > 0 be an arbitrarily small number. Then there exists a function ϕ_σ ∈ B_σ such that … Let T_n(λ), n ≥ 1, be a sequence of the Levitan polynomials that corresponds to ϕ_σ. For any ε > 0 there exists n_0 = n_0(δ, ε) such that for n > n_0, sup … So, under the condition C_2, for any ρ > 0 … The probability P_4 → 0, as T → ∞, and the probability P_3, under the condition N_1, can for sufficiently large T (we will write T > T_0) be made less than a preassigned number by choosing δ > 0 for a fixed ρ > 0.
Since the function ϕ_σ ∈ B_σ and the corresponding sequence of Levitan polynomials T_n are bounded by the same constant, we obtain … The integral in the term D_1 can be majorized by an integral over R and bounded as earlier.
We have further

Under the conditions of the lemma,
Obviously, …, and for any ρ > 0 and i = 1, …, q, … according to N_1 (or C_1). On the other hand, by condition N_1 the value R can be chosen so that for T > T_0 the probability P_6 becomes less than a preassigned number. So, …, and the second probability is equal to zero if … > R ρ. Thus for any fixed ρ > 0, similarly to the probability P_3, the probability P_7 = P{D_2 ≥ ρ} for T > T_0 can be made less than a preassigned number by the choice of the value … Consider … It means that … For j > 0 consider the value … It means that the sum S_1T → 0 in probability, as T → ∞. For the general term S_2T^{ik} of the sum S_2T and any ρ > 0, R > 0, … Under the condition ‖d_T(α_0)(α_T − α_0)‖ ≤ R, using assumptions N_3 (ii) and N_3 (iii), we get, as in the estimation of the probability P_5, … By Lemma 1, … For j ≤ 0 the reasoning is similar, and … Lemma 8 Let the function ϕ(λ, θ)w(λ) be continuous in θ ∈ Θ^c for each fixed λ ∈ R, with |ϕ(λ, θ)| ≤ ϕ(λ), θ ∈ Θ^c, and ϕ(·)w(·) ∈ L_1(R).
Proof By the Lebesgue dominated convergence theorem, the integral I(θ), θ ∈ Θ^c, is a continuous function. The further argument is standard. For any ρ > 0 and ε = ρ/2 we find δ > 0 such that …, due to the choice of ε, and …
Lemma 10 Suppose that under conditions A_1, A_2 there exist an even positive Lebesgue measurable function v(λ), λ ∈ R, and a function ϕ(λ, θ), (λ, θ) ∈ R × Θ^c, even and Lebesgue measurable in λ for any fixed θ ∈ Θ^c, such that … By Lemma 3 and the condition (iii), … On the other hand, for any r > 0, under the condition (i) there exists δ = δ(r) such that for …, and by the condition (ii), … The relations (19)-(21) prove the lemma.

Proof of Theorem 2
By the definition of the MCE θ_T, formally using the Taylor formula, we get … Since there is no vector Taylor formula, (22) must be understood coordinatewise; that is, each row of the vector equality (22) depends on its own random vector θ*_T such that ‖θ*_T − θ_0‖ ≤ ‖θ_T − θ_0‖. In turn, from (22) we formally have … Since the condition N_4 implies the possibility of differentiation under the sign of the integrals in (10), the terms … contain the values Re{ε_T(λ) s_T(λ, α_T)} and |s_T(λ, α_T)|², respectively.
Bearing in mind the 1st part of the condition N 4 (i), we take in Lemma 7 the functions Then in the formula (23) A (2) T P −→ 0, as T → ∞.

Consider the term A_T^{(3)}, where ϕ_i(λ) are as before. Under conditions C_1, C_2, N_1 and (1) of N_4 (i), A_T^{(3)} → 0 in probability. Examine the behaviour of the terms B_T^{(1)} in formula (24). Under conditions C_1 and N_4 (iii) we can use Lemma 8 with the functions … to obtain the convergence … Under the condition N_5 (i) we can use Lemma 9 with the functions … Under conditions C_1 and N_5, if we take in Lemma 10, in conditions (i) and (iii), … So, under conditions C_1, C_2, N_4 (iii) and N_5, …, because W_1(θ_0) is the sum of the right-hand sides of (25) and (26). From the facts obtained it follows that for the proof of Theorem 2 it is necessary to study the asymptotic behaviour of the vector A_T from (23): … We will take … Under conditions (1) and (2) of N_4 (i) (Bentkus 1972b; Ibragimov 1963), … On the other hand, … Thus we can apply Lemma 5, taking b(λ) = (2π)^{-1} …(λ) in the formula (18), to obtain, for …, where … The relations (28) and (29) are equivalent to the convergence … From (27) and (30), (15) follows.

Remark 4
It also follows from the conditions of Theorem 2 that the conditions of Lemma 6 are fulfilled for the functions â and b̂. Indeed, by condition A_1, â ∈ L_1(R) ∩ L_2(R), and we can take p = 1 in Lemma 6. On the other hand, if we look at b = (2π)^{-1} … as an original of the Fourier transform, from N_4 (i) 1) we have b ∈ L_1(R) ∩ L_2(R). Then, according to the Plancherel theorem, b̂ ∈ L_2(R), and we can take q = 2 in Lemma 6. Thus … and the conclusion of Lemma 6 is true.

Example: The motion of a pendulum in a turbulent fluid
First of all we review a number of results discussed in Parzen (1962), Anh et al. (2002), and Leonenko and Papić (2019); see also the references therein. We examine the stationary Lévy-driven continuous-time autoregressive process ε(t), t ∈ R, of order two (CAR(2) process) in the under-damped case (see Leonenko and Papić 2019 for details).
The motion of a pendulum is described by the equation …, in which ε(t) is the displacement from its rest position, α is a damping factor, and 2π/ω is the damped period of the pendulum (see, e.g., Parzen 1962, pp. 111-113).
We consider the Green function solution of the equation (31), in which L̇ is the Lévy noise, i.e. the derivative of a Lévy process in the distributional sense (see Anh et al. 2002; Leonenko and Papić 2019 for details). The solution can be defined as the linear process …, where the Green function â … The formula (33) for the covariance function of the process ε corresponds to the formula (2.12) in Leonenko and Papić (2019) for the correlation function … On the other hand, for â(t) given by (32), … Then the positive spectral density of the stationary process ε can be written as (compare with Parzen 1962) … It is convenient to rewrite (34) in the form …, where α = θ_1 is a damping factor, β = −κ^{(2)}(0) = d_2(θ_2) = θ_2, and γ = ω = θ_3 is the damped cyclic frequency of the pendulum oscillations. Suppose that … The condition C_3 is fulfilled for the spectral density (35). Assume that … More precisely, the value of a will be chosen below.
Thus the function Z 1 (λ) in the condition C 4 (i) exists.
As for condition C_4 (ii), if a ≥ 2, then … As the function v in condition C_5 we take … Further, it will be helpful to use the notation s(λ) = (λ^2 − α^2 − γ^2)^2 + 4α^2 λ^2. Then … To check the condition N_4 (i) 1), consider the functions … Then the condition N_4 (i) 1) is satisfied for ϕ_α and ϕ_γ when a > 3/2, and for ϕ_β when a > 5/2. The same values of a are also sufficient to meet the condition N_4 (i) 2).
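Assuming, consistently with the Green function (32) and the notation s(λ), that the density (35) has the standard under-damped CAR(2) form f(λ) = β/(2π s(λ)), this density can be checked numerically against the closed-form variance B(0) = β/(4α(α² + γ²)) obtained by integrating the squared Green function. The sketch below is ours; the function name and the specific parameter values are illustrative.

```python
import numpy as np

def car2_density(lam, alpha, beta, gamma):
    """Under-damped CAR(2) spectral density, assumed form of (35):
    f(lam) = beta / (2*pi*s(lam)),
    s(lam) = (lam^2 - alpha^2 - gamma^2)^2 + 4*alpha^2*lam^2."""
    s = (lam ** 2 - alpha ** 2 - gamma ** 2) ** 2 + 4.0 * alpha ** 2 * lam ** 2
    return beta / (2.0 * np.pi * s)

# consistency check: the total spectral mass must equal the variance
# B(0) = beta / (4*alpha*(alpha^2 + gamma^2)) of the CAR(2) process
alpha, beta, gamma = 1.0, 1.0, 2.0
step = 2e-4
lam = np.arange(-200.0, 200.0, step)
var_spectral = float(np.sum(car2_density(lam, alpha, beta, gamma)) * step)
var_exact = beta / (4.0 * alpha * (alpha ** 2 + gamma ** 2))
```

Since s(λ) is a sum of squares that never vanishes for α > 0, the density is everywhere positive and bounded, which is what makes the choice v = w admissible here, in contrast to densities with singularities.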
In our example, to satisfy the consistency conditions C_4 and C_5 the weight functions w(λ) and v(λ) should be chosen so that a ≥ b > 2. On the other hand, to satisfy the asymptotic normality conditions N_4 and N_5 the functions w(λ) and v(λ) should be such that a > 5/2 and a ≥ b > 2.
The spectral density (35) has no singularity at zero, so the functions v(λ) in the conditions C_5 (i) and N_5 (ii) could be chosen equal to w(λ), for example with a = b = 3. However, we prefer to keep the function v(λ) in the text, since it is needed when the spectral density has a singularity at zero or elsewhere; see, e.g., Example 1 in Leonenko and Sakhno (2006), where a linear process driven by Brownian motion and the regression function g(t, α) ≡ 0 have been studied. Specifically, in the case of the Riesz-Bessel spectral density …, where θ = (θ_1, θ_2, θ_3) = (α, β, γ) ∈ Θ = (α, α) × (β, β) × (γ, γ) with lower and upper bounds α > 0, α < 1/2, β > 0, β < ∞, γ > 1/2, γ < ∞, and where the parameter α signifies the long-range dependence, while the parameter γ indicates the second-order intermittency (Anh et al. 2004; Gao et al. 2001; Lim and Teo 2008), the weight functions have been chosen in the form … Unfortunately, our conditions do not so far cover the case of a general nonlinear regression function and a Lévy-driven continuous-time strongly dependent linear random noise such as the Riesz-Bessel motion.

Appendix A: LSE consistency
Some results on the consistency of the LSE α_T in observation models of the type (1) with stationary noise ε(t), t ∈ R, were obtained, for example, in Ivanov and Leonenko (1989, 2004, 2007), Ivanov (1980, 2010), and Ivanov et al. (2015), to mention several of the relevant works. In this section we formulate a generalization of the Malinvaud theorem (Malinvaud 1970) on the consistency of α_T for the linear stochastic process (4) and consider an example of a nonlinear regression function g(t, α) satisfying the conditions of this theorem and the conditions C_1, C_2. Then we consider other possibilities for the fulfillment of C_1 and C_2. Set … Assume the following.
(1) For any ε > 0 and R > 0 there exists δ = δ(ε, R) such that … (2) For some R_0 > 0 and any ρ ∈ (0, R_0) there exist numbers a = a(R_0) > 0 and … It was proven in Lemma 1 that under condition A_1 … Proof By formula (7), E w_4 … By condition A_1 and the Fubini-Tonelli theorem, … On the other hand, by formula (8), …

For the integral I_7^{(2)} we get the same bound. So, we obtain inequality (52). If assumptions (1), (2), and A_1 are valid, then for any ρ > 0 … Proof The proof of this Malinvaud theorem generalization is similar to the proof of Theorem 3.2.1 in Ivanov and Leonenko (1989) and uses the relations (51) and (52).
Instead of C_2 consider the stronger condition C_2′. There exist positive constants c_0, c_1 < ∞ such that for any α ∈ A^c and T > T_0 … Let us point out a sufficient condition for the fulfillment of C_2′. Introduce a diagonal matrix … (ii) For some numbers c*_0, c*_1 and T > T_0, … Under this condition, as is easily seen, one can take in C_2′ … The next example demonstrates the fulfillment of the condition C_2′ [compare with Ivanov and Orlovskyi (2018)].

Example A.1 Let
where J is a positive definite matrix, and the set A in the model (1) is bounded. Set … Then for any δ > 0 and T > T_0 …, and condition C_2 (i) is fulfilled with the matrix s_T = T^{1/2} I_q, where I_q is the identity matrix of order q, and … Let us check the condition C_2 (ii). We have e^{⟨α_1, y(t)⟩} − e^{⟨α_2, y(t)⟩} = e^{⟨α_2, y(t)⟩} (e^{⟨α_1 − α_2, y(t)⟩} − 1).
On the other hand, …, where λ_max(J) is the maximal eigenvalue of the matrix J. It means that condition C_2 (ii) is valid for the matrix s_T = T^{1/2} I_q. So the condition C_2 is valid as well, and in (53) one can choose for T > T_0 some numbers … The inequalities (53) can be rewritten in the equivalent form … From the right-hand side of (55), (48) follows. Similarly, from the left-hand side of (55), taking ν = 0, we obtain (49) for any R_0 > 0, and it is possible to choose R_0 > 0 satisfying (50). In our Example A.1, due to the inequalities (54) with s_{iT} = T^{1/2}, i = 1, …, q, the set U_T(α) is bounded uniformly in T, and it is not necessary to use condition (50). However, in the Malinvaud theorem we cannot ignore the condition (50) of parameter distinguishability in the cases when the sets U_T(α) expand to infinity as T → ∞ or the set A is unbounded.
It goes without saying that not all interesting classes of nonlinear regression functions satisfy consistency conditions of the Malinvaud or, say, Jennrich (1969) type. An important example of such a class is given by the trigonometric regression functions.

Example A.2 Let
Under some conditions of distinguishability on the angular frequencies ϕ = (ϕ_1, …, ϕ_N) (see Walker 1973; Ivanov 1980; Ivanov et al. 2015) it is possible to prove that at least … The convergence in (57) can be a.s. In turn, from (57) it follows (see the cited papers) … Note that … From (58) and (59) we obtain the relation of condition C_1 for trigonometric regression: … To check the fulfillment of the condition C_2 for the regression function (56) we get …, k = 1, …, N, and therefore … Using again the relation (59) we arrive at the inequality of the condition C_2
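When the angular frequencies are treated as known, the amplitude part of the trigonometric model (56) is linear in the parameters and the LSE reduces to ordinary linear least squares. The sketch below is ours and deliberately simplified: in (56) the frequencies are themselves estimated, which is the genuinely nonlinear part of the problem.

```python
import numpy as np

def trig_lse(t, x, freqs):
    """LSE for the amplitudes of g(t) = sum_k (A_k cos(phi_k t) + B_k sin(phi_k t))
    with the angular frequencies phi_k treated as known; a simplified,
    linear variant of the trigonometric regression (56)."""
    # design matrix with a cosine and a sine column per frequency
    D = np.column_stack([f(p * t) for p in freqs for f in (np.cos, np.sin)])
    coef, *_ = np.linalg.lstsq(D, x, rcond=None)
    return coef  # [A_1, B_1, A_2, B_2, ...]

# synthetic data: one harmonic with A = 2, B = -1 plus white noise
rng = np.random.default_rng(1)
t = np.linspace(0.0, 100.0, 5001)
x = 2.0 * np.cos(1.3 * t) - 1.0 * np.sin(1.3 * t) + rng.normal(0.0, 0.5, t.size)
coef = trig_lse(t, x, [1.3])
```

The T^{1/2}-normalization of the amplitude estimators discussed around (57)-(59) is visible here: the amplitude errors shrink like T^{-1/2}, while frequency estimation (not sketched) enjoys the faster T^{-3/2} rate.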
with a constant c_0 depending on … The next lemma is the main part of the proof of the convergence (57).

Lemma 11 Under condition
Proof By formula (7), … and … By formula (8), … Obviously, … From the inequalities (63)-(67) it follows … The result of the lemma can be strengthened to a.s. convergence in (62). Note also that in the proof we did not use the condition â ∈ L_1(R).
The next CLT is an important part of the proof of the asymptotic normality of the LSE α_T in the model (1), and it fully uses condition A_1.
We use the Leonov-Shiryaev formula (see, e.g., Ivanov and Leonenko 1989). Let …; then the application of formula (73) to (74) shows that, to obtain (72) for all i = 3, …, n, … Taking into account the equality E ε(t) = 0, from (75) it will follow that in (72) all the odd moments E η^{2ν+1} = 0. On the other hand, for the even moments E η^{2ν} we shall find that in (74), thanks to (73), only those terms corresponding to the partitions of the set I = {1, 2, …, 2ν} into pairs of indices will remain nonzero, i.e. the "Gaussian part": all l_p = 2. In (73) there will be (2ν − 1)!! such terms, and each of them will be equal to σ^{2ν}(z). Let us prove (75). We note that condition (… To obtain (76) we have used only â ∈ L_1(R).
Using the theorem, just as in the works cited above (for definiteness we refer to Ivanov et al. 2015), it can be proved that, if a number of additional conditions on the regression function are satisfied, the normalized LSE d_T(α_0)(α_T − α_0) is asymptotically normal N(0, Σ_LSE), with … Note that, firstly, our conditions N_3, (1), (2) are included in the conditions for the LSE asymptotic normality of Ivanov et al. (2015), and, secondly, the trigonometric regression function (56) satisfies the conditions of Ivanov et al. (2015). Moreover, using (70) and (59) we conclude that for the trigonometric model the normalized LSE … The matrix Σ_TRIG is positive definite if f(ϕ^0_k) > 0, k = 1, …, N. However, this follows from our condition A_2 (iii).
Note also that condition N_2 is satisfied, for example, for the trigonometric regression function (56). Indeed, in this case …, and similarly to (60), … Then the sequence of the Levitan polynomials that corresponds to F can be written as